User:Onecuriouspineapple/Big Data and Discrimination
Introduction
Big Data and Discrimination is a conceptual framework describing an ongoing debate in artificial intelligence (AI) research: whether programs or algorithms that rely on Big Data may produce discriminatory results that disadvantage the groups surfaced when a user queries an AI program drawing on Big Data as its database. The underlying theme of the two concepts closely relates to the ideas laid out in Algorithmic bias.
Within this larger area of artificial intelligence (AI) research, algorithmic bias helps explain how Big Data perpetuates, reduces, or eliminates discriminatory results, whether in a user-given query or in the larger projects that companies deploy. The concept of algorithmic bias stems from the "systematic and unfair discrimination" observed in the real world, where privacy violations occur along intersectional pillars such as race, ethnicity, gender, class, sexuality, and so forth.[1] In peer-reviewed books such as AI Snake Oil, the authors discuss how Big Data and artificial intelligence (AI) systems often produce results that exhibit "racial bias, gender bias, and ageism".[2]
In addition, the relationship between Big Data and discrimination has been an ongoing subject of debate among communication studies, information technology, and social science scholars alike. Studies have increasingly shown that applications built on Big Data do produce discriminatory results, and some authors argue that this is an inherent problem of algorithmic systems themselves.[3]
However, when used through appropriate channels, Big Data can also produce promising results. In the context of anti-discrimination law, for example, one proposed solution to human error is to automate the decision-making process so that decisions cannot be conditioned on protected characteristics, a step intended to mirror the smooth operation of existing legal procedures through automation.[4]
Although Big Data's production of discriminatory results has often been scrutinized by research scholars and social scientists alike, whether it is ethical to use Big Data to produce such results remains a delicate and contested question.
Nevertheless, both the advantages and disadvantages of using Big Data are prominent, and one key characteristic observed in its use is discrimination.
Different strands of research question the rationale on both ends, asking what it means for artificial intelligence (AI) programmers and developers to observe such results and attempt solutions, given that discriminatory outputs can disadvantage one group relative to another and affect their social activities in the real world.[4][5][6][7] Some authors argue that these outcomes are unintended consequences that may not reflect the accuracy of artificial intelligence (AI) systems.[5][8]
Definitions
Big Data, according to Favaretto et al., has been described as a "one-size-fits-all (so long as it's triple XL) answer" to some of the most challenging problems in climate change, healthcare, education, and criminology.[5] They explain that "traditionally it has been defined in terms of four dimensions (the four V’s of Big Data): volume, velocity, variety, and veracity—although some scholars also include other characteristics such as complexity and value—and it consists of capturing, storing, analyzing, sharing and linking huge amount of data created through computer-based technologies and networks, such as smartphones, computers, cameras, sensors etc".[5] Big Data can thus be understood as the data set from which artificial intelligence programs produce desirable or undesirable results, depending on what is asked in a user's query and how it is asked.
Discrimination, also according to Favaretto et al., has been described as "acts, practices or policies that impose a relative disadvantage on persons because of their membership of a salient social or recognized vulnerable group based on gender, race, skin color, language, religion, political opinion, ethnic minority etc".[5] Other definitions appear in the Collins dictionary, for example, which defines the term as "the practice of treating one person or group of people less fairly or less well than other people or groups".[9] In the context of Big Data, discrimination can thus be understood as both an intended and an unintended result of artificial intelligence programs: outputs shaped by discriminatory data sets that put one group at a disadvantage and favour another, along the lines of the characteristics Favaretto et al. list.
History
Early Critiques
The concept of Big Data in its contemporary sense did not emerge until attempts were made to gather and organize archival information more effectively and efficiently, manual organization having become tedious by the 1800s.[6] That is not to say the term did not exist; historical antecedents can be traced to governments wanting to know "statistics and govern populations".[6] Beer's article indicates that organizing the social data recorded in government publications, such as census data, was not in itself the issue; the issue arose when the amount of data (the "volume", one of the "V's" that characterize Big Data) "suddenly got big" between the 1820s and the 1830s.[6]
Beer analyzes the theoretical background of Big Data in a different light by reflecting on Foucault's work to explain the term's contemporary usage. He reasons that the definition of Big Data is akin to Foucault's ideology that "my problem is to see how men govern (themselves and others) by the production of truth",[6] and that Foucault's notion of social realities, the "types of programmes or imagined possibilities" in social data, may explain what Big Data has come to mean in the contemporary social world. Favaretto et al. also explain in their article that, of the V's listed among Big Data's characteristics, volume and velocity are the two most important for explaining the correlation with the discrimination and biases pertinent to the two terms.[5]
When Big Data and discrimination are discussed together as both technological and sociological phenomena, Favaretto et al. indicate that "it has gained momentum recently, in particular after the publication of the White House report of 2014 which firmly warned that discrimination might be the inadvertent outcome of Big Data technologies. Since then, possible discriminatory outcomes of profiling and scoring systems have increasingly come to the attention of the general public".[5]
Contemporary Critiques
Disadvantages of using Big Data that perpetuate discrimination
As governments gather more and more population data, whether gender, class, height, weight, eye colour, or the like, Beer's article indicates that "...the emergence of new metrics also led to people being classified in new ways, which had powerful implications for how individuals and groups were perceived and treated".[6] He also highlights that this "avalanche" of data led to all kinds of 'classificatory struggles',[6] as many users pour information out to the public through smartphones and health-tracking devices such as smartwatches. Beer further points out that "...these emerging numbers quickly came to define how people saw themselves, how they saw others and, complimenting these, how limits and boundaries were placed around actions and opportunity".[6]
One of the most pressing issues with companies' use of Big Data is the perpetuation of discriminatory results, and the opportunistic use of those results in ways that advantage one group over another. This phenomenon may or may not be an unintended consequence.
The overarching consensus among information security scholars is that Big Data analytics, algorithms, and the programs and companies utilizing Big Data produce results that call privacy into question. Big Data ethics is likewise a rising concern, given how deeply the technology is ingrained in the lives of users of smartphones, smartwatches, and home surveillance cameras. Draper's article suggests that privacy concerns will continue to worsen as long as companies disregard the importance of user privacy, and that reframing the issue will be important work for the many companies whose corporate data practices rely on Big Data.[10]
Turow's case study is a clear example of how Big Data embeds discriminatory practices: his analysis of artificial voice recognition systems shows how voice profiling is used in major companies' products such as Google Assistant and Amazon's Alexa.[7]
In addition, he argues that companies keep their work secret, not revealing their activities to the public,[7] because disclosing business strategies might jeopardize their financial gain. Although Canada has FIPPA (the Freedom of Information and Protection of Privacy Act) and the United States has the U.S. Privacy Act of 1974, each company's privacy policy is presented in a highly questionable form before users agree to it, and Turow further indicates that "most states, even those following the California privacy law, don’t even require a push-button opt-in (for example, pressing “OK” in response to information about what data are collected and why)".[7]
Turow also notes that in 2019, China's Ministry of Industry and Information Technology blacklisted fourteen apps it claimed were collecting sensitive personal data, and that voice profiling has become increasingly prominent since.[7]
Crawford's book offers an in-depth analysis and case study of Big Data's discriminatory practices through an examination of NIST (the National Institute of Standards and Technology). Her study looks at mugshots taken in 2017, and the results are troubling enough that she questions whether the data that became part of this NIST image database was ever collected with ethics in mind.[11] According to her study, the mugshots showed people with "visible wounds, bruising, and black eyes; some are distressed and crying".[11]
Furthermore, regarding facial recognition systems employed by law enforcement agencies, she argues that these technologies do nothing to counter how the people photographed, potential criminals and non-criminals alike, are essentially "dehumanized": once the information is recorded in the database, "they are not seen to carry meanings or ethical weight as images of individual people or as representations of structural power in the carceral system".[11] She argues that this is inherent to machine learning, in which such images eventually become part of the grand scale of information she refers to as data.[11]
Advantages of Big Data usage and solutions toward discrimination
Favaretto et al.'s article suggests that it may be possible to eliminate algorithmic biases and discriminatory results through "pre-processing methods that involve the sanitization or distortion of the training data set to remove a possible bias in order to prevent the new model from learning discriminatory behaviours".[5] That is, one way to address the issue is for computer scientists and developers to clean or adjust the data before an algorithmic model learns from it, so that discriminatory patterns are removed before they can surface in the model's results, as the sketch below illustrates.
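The following is a minimal sketch of one published pre-processing technique of this kind, reweighing (after Kamiran and Calders), offered as an illustration rather than as the specific method Favaretto et al. describe; the records and group labels are hypothetical. Each training record is weighted so that the protected attribute becomes statistically independent of the outcome label before any model is trained.

```python
from collections import Counter

# Hypothetical training records: (protected_group, outcome_label) pairs.
records = [
    ("A", 1), ("A", 1), ("A", 0), ("A", 0), ("A", 0),
    ("B", 1), ("B", 0), ("B", 0), ("B", 0), ("B", 0),
]

n = len(records)
group_counts = Counter(g for g, _ in records)   # counts per group
label_counts = Counter(y for _, y in records)   # counts per label
joint_counts = Counter(records)                 # counts per (group, label)

# Reweighing: w(g, y) = P(g) * P(y) / P(g, y), so that group and label
# are independent in the weighted data the model will train on.
weights = {
    (g, y): (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
    for (g, y) in joint_counts
}

for (g, y), w in sorted(weights.items()):
    print(f"group={g} label={y} weight={w:.3f}")
```

Under-represented group–label combinations receive weights above 1 and over-represented ones weights below 1, so a model trained with these sample weights no longer sees a raw correlation between group membership and outcome.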
Gillis and Spiess's article, written in the context of United States anti-discrimination law and of bringing more individuals into mortgage applications and the housing market, suggests that a "discrimination stress test" may be practical before a system is applied in the real world: hypothetical buyer-and-lender scenarios are fed into the program to surface outcomes that would severely disadvantage certain population groups, including marginalized individuals and vulnerable minorities, although the practicality of the human labour this solution involves has not been tested.[4]
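A minimal sketch of what such a stress test could look like follows, assuming the lender's model is exposed as a simple scoring function; the model, feature names, districts, and threshold are hypothetical illustrations, not the procedure Gillis and Spiess specify. Hypothetical applicant profiles are scored per population group, and approval rates are compared across groups.

```python
def score(applicant: dict) -> float:
    """Stand-in credit model; a real stress test would wrap the deployed model."""
    s = 0.5
    s += 0.3 if applicant["income"] > 50_000 else -0.1
    # An unintended proxy: penalizing certain postal districts, which in
    # this hypothetical data correlate with group membership.
    s -= 0.3 if applicant["district"] == "D3" else 0.0
    return s

# Hypothetical "practice" buyer profiles for two population groups.
profiles_by_group = {
    "group_A": [{"income": 60_000, "district": "D1"},
                {"income": 40_000, "district": "D2"}],
    "group_B": [{"income": 60_000, "district": "D3"},
                {"income": 40_000, "district": "D3"}],
}

THRESHOLD = 0.6
for group, profiles in profiles_by_group.items():
    rate = sum(score(p) >= THRESHOLD for p in profiles) / len(profiles)
    print(f"{group}: approval rate {rate:.0%}")
# group_A approves 50% while group_B approves 0%, flagging the proxy
# feature for review before the model reaches real applicants.
```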
Beer argues that the problem observed in the articles above lies not in the information Big Data produces but "in how those data and their potential is imagined and envisioned".[6] His point is that because many users actively use Big Data and understand its power in social, economic, and political sectors, users and companies alike must pay close attention to the realities it presents them. Stepping away from the technical infrastructures and "re-imagining" the potential possibilities is up to the users and societies that utilize them.[6]
Critical Perspectives and Transparency as a solution
Critical perspectives on the relationship between Big Data and discrimination examine not only the ethics of Big Data producing discriminatory results but also the power dynamics among those who design and apply the programs. Critical scholars argue that discriminatory results appear in algorithms and Big Data programs because there is not enough diversity among the creators of these applications in the field of artificial intelligence (AI).[12][13]
Richterich's article indicates that an awareness of power relations, biases, and inequalities runs as an underlying tone through big data studies.[12] Power asymmetry is an important aspect of social inequality, and it is reflected in a digital world where the majority of developers are not diverse.[13] Only a small percentage of artificial intelligence scientists are women, and this diversity crisis has been an ongoing issue surrounding Big Data studies.[13] Obar's article likewise indicates that "writing unbiased algorithms is difficult and, often by design or error, programmers build in misinformation, racism, prejudice and bias", which aligns with Richterich's research.[14]
There is also a definite digital divide separating researchers with computational skills, the majority of whom are men, and "data is mainly gathered from white, educated people leaving out racial minorities such as Latinos".[5] Another possible solution to discriminatory results in Big Data is to implement transparency measures for its usage, giving clear explanations of the steps by which results are produced in various programs, as sketched below.[5]
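As a minimal sketch of what such a transparency measure could mean in practice, assuming a simple rule-based scoring pipeline (the rules and field names below are hypothetical), the program can record every step that contributed to a result alongside the result itself, so the path to any output can be audited:

```python
def transparent_score(applicant: dict):
    """Score an applicant while keeping a human-readable trace of each step."""
    trace = ["base score: 0.50"]
    score = 0.5
    if applicant["income"] > 50_000:
        score += 0.3
        trace.append("income above 50,000: +0.30")
    if applicant["credit_history_years"] < 2:
        score -= 0.2
        trace.append("credit history under 2 years: -0.20")
    return score, trace

result, trace = transparent_score({"income": 30_000, "credit_history_years": 1})
print(f"final score: {result:.2f}")
for step in trace:
    print(" -", step)
# Publishing or logging such traces gives affected users and auditors a
# clear account of how each result was produced.
```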
Reference List
- ^ Marabelli, Marco. "AI, Ethics, and Discrimination in Business". Palgrave Studies in Equity, Diversity, Inclusion, and Indigenization in Business. Springer. Retrieved 11 April 2025.
- ^ Narayanan, Arvind; Kapoor, Sayash (2024). AI snake oil: what artificial intelligence can do, what it can't, and how to tell the difference. Princeton: Princeton University Press. p. 11. ISBN 978-0-691-24913-1.
- ^ Coen, Rena. "A User Centered Perspective on Algorithmic Personalization" (PDF). UC Berkeley School of Information. UC Berkeley. Retrieved 11 April 2025.
- ^ a b c Gillis, Talia B; Spiess, Jann L (March 2019). "Big Data and Discrimination". The University of Chicago Law Review. 86 (2): 459–488. Retrieved 11 April 2025.
- ^ a b c d e f g h i j Favaretto, Maddalena; De Clercq, Eva; Elger, Bernice Simone (December 2019). "Big Data and discrimination: perils, promises and solutions. A systematic review". Journal of Big Data. 6 (1): 2, 12, 15, 16. doi:10.1186/s40537-019-0177-4.
- ^ a b c d e f g h i j Beer, David (1 June 2016). "How should we do the history of Big Data?". Big Data & Society. 3 (1): 2, 3. doi:10.1177/2053951716646135. Retrieved 9 April 2025.
- ^ a b c d e Turow, Joseph (2021). The voice catchers: how marketers listen in to exploit your feelings, your privacy, and your wallet. New Haven, London: Yale University Press. pp. 227, 234, 250, 255. ISBN 978-0-300-25873-8. Retrieved 3 April 2025.
- ^ Weizenbaum, Joseph (1976). Computer power and human reason: from judgment to calculation. San Francisco: Freeman. p. 300. ISBN 978-0716704645. Retrieved 11 April 2025.
- ^ "Definition of 'discrimination'". Collins dictionaries. Retrieved 11 April 2025.
- ^ Draper, Nora A; Pieter Hoffmann, Christian; Lutz, Christoph; Ranzini, Giulia; Turow, Joseph (September 2024). "Privacy resignation, apathy, and cynicism: Introduction to a special theme". Big Data & Society. 11 (3): 6. doi:10.1177/20539517241270663. Retrieved 23 March 2025.
- ^ a b c d Crawford, Kate (2021). Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence. New Haven, CT: Yale University Press. pp. 90, 91, 93. ISBN 978-0-300-25239-2. Retrieved 3 April 2025.
- ^ a b Richterich, Annika (13 April 2018). The Big Data Agenda: Data Ethics and Critical Data Studies. University of Westminster Press. p. 18. ISBN 978-1-911534-72-3. Retrieved 10 April 2025.
- ^ a b c Snow, Jackie. ""We're in a diversity crisis": cofounder of Black in AI on what's poisoning algorithms in our lives". MIT Technology Review. Retrieved 10 April 2025.
- ^ Obar, Jonathan; McPhail, Brenda. "Preventing Big Data Discrimination in Canada: Addressing Design, Consent and Sovereignty Challenges". Data Governance in the Digital Age. Centre for International Governance Innovation. Retrieved 9 April 2025.