Section 10 Ethics and Machine Learning
You find a wallet full of cash in the bathroom, and no one else is around. What do you do?
What are ethics?
Is Spam, aka unsolicited email, good or bad business practice?
What is bias?
Subsection 10.1 Discussion of articles
What did you think about the facial recognition article? What bias was present, and why?
There are some interesting updates to this topic. What are they?
Timnit Gebru, one of the authors of the paper referenced in the article above, was fired/forced to resign from Google's ethical AI division after examining some of Google's algorithms and reporting on similar bias. (December 2020)
There is a documentary about this topic called "Coded Bias" available on Netflix. I highly recommend it. (As of April 2021)
A number of cities have banned facial recognition. Numerous bills for federal regulation have been proposed, but no laws yet. Maybe this is the year. (As of February 2021)
What did you think about the deep fake article? Should we create ML algorithms to do this? What other ML algorithms are ethically questionable?
What did you think about the article on ML algorithms for screening job candidates?
Subsection 10.2 Weapons of Math Destruction
Is data objective? Are machine learning algorithms objective?
Let's watch a TED talk by Cathy O'Neil, author of Weapons of Math Destruction, and consider the following questions. [13:18]
- Why are data and algorithms not objective? Answer: Who defines success? Who defines what is associated with success? Making predictions based on data from the past will repeat historical biases. And if we don't know how an algorithm works, we can't always tell whether it is making terrible decisions.
- What three examples does she give of algorithms used in an unfair way? Answer: The value-added formula for teachers, the Fox News hiring algorithm, and predictive policing / recidivism risk.
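O'Neil's point that predictions based on past data repeat historical biases can be sketched in a few lines. Everything here — the groups, the counts, and the "model" — is invented for illustration:

```python
# Toy illustration (all data invented): a "hiring model" that simply
# learns the historical hire rate for each group will reproduce any
# bias baked into the past decisions it was trained on.

from collections import defaultdict

# Hypothetical historical hiring records: (group, hired?)
history = [("A", True)] * 80 + [("A", False)] * 20 \
        + [("B", True)] * 20 + [("B", False)] * 80

def fit_hire_rates(records):
    """Estimate P(hired | group) from past decisions."""
    counts = defaultdict(lambda: [0, 0])  # group -> [hires, total]
    for group, hired in records:
        counts[group][0] += hired
        counts[group][1] += 1
    return {g: hires / total for g, (hires, total) in counts.items()}

def predict_hire(rates, group, threshold=0.5):
    """'Predict' a hire whenever the group's historical rate clears the threshold."""
    return rates[group] >= threshold

rates = fit_hire_rates(history)
print(rates)                     # {'A': 0.8, 'B': 0.2}
print(predict_hire(rates, "A"))  # True  -- group A keeps getting hired
print(predict_hire(rates, "B"))  # False -- group B keeps getting rejected
```

Nothing in the code mentions fairness or discrimination; the model is "just math," yet it faithfully automates whatever bias produced the training data.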
Subsection 10.3 Is data the new oil?
The phrase "data is the new oil" is said a lot. The analogy isn't perfect, but there is power in data and machine learning algorithms, and there are important questions to ask about who has that power and how it is being used.
- Who is doing the work of data science (and who is not)?
- Whose goals are prioritized in data science (and whose are not)?
- And who benefits from data science (and who is either overlooked or actively harmed)?
(Questions from Data Feminism, Chapter 1.)
There are many areas where machine learning can be problematic, including:
- Data collection and privacy
- Potential for misuse of ML algorithms
Subsection 10.4 Encoding Bias
"Social scientist Kate Crawford has advanced the idea that the biggest threat from artificial intelligence systems is not that they will become smarter than humans, but rather that they will hard-code sexism, racism, and other forms of discrimination into the digital infrastructure of our societies." (Data Feminism, Chapter 1.)
We already discussed examples of facial recognition, predictive policing, and job candidate screening. There are lots of other examples.
Recidivism risk (risk assessment for criminal behavior): a 2016 ProPublica article, "Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks."
Speech recognition has similar issues with racial bias. "There Is a Racial Divide in Speech-Recognition Systems, Researchers Say"
Bias issues in internet search engines: "We Teach A.I. Systems Everything, Including Our Biases"
Another bias example in the Google search engine (from Safiya Umoja Noble in Algorithms of Oppression): as recently as 2016, searches for "three Black teenagers" returned mugshots, while "three white teenagers" returned wholesome stock photography.
Using models to predict the risk of child abuse (from Virginia Eubanks in Automating Inequality): wealthier parents (with private health care and mental health services) contributed little data to the model, while poorer parents (more likely to rely on public services) had far more data available. As a result, the model overpredicted the risk of child abuse for children of poorer parents.
Subsection 10.5 Data Collection and Privacy
We already mentioned bias in facial recognition, but facial recognition also raises a huge surveillance issue.
- "Before Clearview Became a Police Tool, It Was a Secret Plaything of the Rich" https://www.nytimes.com/2020/03/05/technology/clearview-investors.html
- "The facial-recognition app Clearview sees a spike in use after Capitol attack." https://www.nytimes.com/2021/01/09/technology/facial-recognition-clearview-capitol.html
- "The Secretive Company That Might End Privacy as We Know It"
Who decides what data to collect? Who decides how to use it? Is it protected?
In some cases not having enough data is a problem.
In some cases too much data is a problem.
"the databases and data systems of powerful institutions are built on the excessive surveillance of minoritized groups."
(from Data Feminism, chapter 1.)
From a 2012 New York Times article by Charles Duhigg, “How Companies Learn Your Secrets”:
- Target created a pregnancy detection score based on customer purchases.
- Target developed an automated system to send coupons to possibly pregnant customers.
- A teenager received coupons for baby clothes in the mail.
- Her father was infuriated at Target for this.
- She was, in fact, pregnant, but had not yet told her family.
(from Data Feminism, chapter 1.)
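The purchase-based scoring described above can be sketched as follows. Target's actual model is proprietary, so the signal items, weights, and threshold here are entirely invented:

```python
# Hypothetical sketch (Target's real model is proprietary): score a
# customer by summing invented weights for "signal" purchases, then
# mail coupons when the score clears a threshold.

SIGNAL_WEIGHTS = {              # invented weights, for illustration only
    "unscented lotion": 0.3,
    "prenatal vitamins": 0.9,
    "large tote bag": 0.2,
    "cotton balls": 0.1,
}

def pregnancy_score(purchases):
    """Sum the weights of any signal items in the purchase history."""
    return sum(SIGNAL_WEIGHTS.get(item, 0.0) for item in purchases)

def send_coupons(purchases, threshold=1.0):
    """Decide whether to mail baby-product coupons."""
    return pregnancy_score(purchases) >= threshold

basket = ["unscented lotion", "prenatal vitamins", "cotton balls"]
print(round(pregnancy_score(basket), 2))  # 1.3
print(send_coupons(basket))               # True
```

The privacy harm in the story comes not from any single line of this logic but from acting on the inference — the customer never consented to having a sensitive condition deduced and revealed.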
Neural networks can leak personal information.
2018 paper, "The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks"
A neural network (a common generative sequence model) trained on sensitive data can memorize that data well enough for it to be extracted from the model, even when the training data itself is kept private and only the model is released to the public.
Example: Google's Smart Compose, a commercial text-completion neural network trained on millions of users' email messages.
2020 paper, "Extracting Training Data from Large Language Models"
Individual training examples (names, phone numbers, email addresses, etc.) can be recovered from language models trained on private datasets. Larger models are more vulnerable.
Example: the GPT-2 language model. The paper's authors demonstrate the attack on GPT-2, a language model trained on scrapes of the public Internet.
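A toy sketch of unintended memorization — far simpler than the models in these papers, but the same failure mode: a character n-gram model trained on text containing a unique secret will regurgitate the secret verbatim when prompted with its first few characters. The corpus and the "SSN" here are invented:

```python
# Toy sketch (not the papers' method): a character n-gram model trained
# on text containing a unique "canary" secret regenerates that secret
# verbatim under greedy completion.

from collections import defaultdict, Counter

corpus = "the quick brown fox. " * 5 + "my ssn is 078-05-1120. "

def fit_ngrams(text, order=3):
    """Count next-character frequencies after each length-`order` context."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model

def greedy_complete(model, prompt, length, order=3):
    """Extend the prompt one most-likely character at a time."""
    out = prompt
    for _ in range(length):
        nxt = model[out[-order:]].most_common(1)
        if not nxt:
            break
        out += nxt[0][0]
    return out

model = fit_ngrams(corpus)
# Prompting with the first characters of the secret extracts the rest.
print(greedy_complete(model, "078", length=9))  # 078-05-1120.
```

Because the secret appears only once, its contexts are unique in the training text, so the model's "most likely" continuation is an exact replay — precisely the unintended memorization the papers measure in real neural networks.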
Subsection 10.6 Potential for Misuse of ML algorithms
Read about deep fakes and how easily they can spread misinformation.
Who makes 'moral' decisions for autonomous cars?
"An autonomous car is barreling down on five persons, and cannot stop in time to save them. The only way to save them is to swerve and crash into an obstacle, but the passenger of the car would then die. What should the car do?"
Even modification of images is easily available.
Should McDonald's use AI to get you to buy more fast food?
"Would You Like Fries With That? McDonald’s Already Knows the Answer" https://www.nytimes.com/2019/10/22/business/mcdonalds-tech-artificial-intelligence-machine-learning-fast-food.html
Should China use facial recognition software to shame people for wearing pajamas in public? (FYI, pre-covid)
"Chinese City Uses Facial Recognition to Shame Pajama Wearers" https://www.nytimes.com/2020/01/21/business/china-pajamas-facial-recognition.html
Subsection 10.7 Environmental Costs
A 2017 Greenpeace report estimated that the global IT sector, which is largely US-based, accounted for around 7 percent of the world’s energy use.
The cost of constructing Facebook’s newest data center in Los Lunas, New Mexico, is expected to reach $1 billion. The electrical cost of that center alone is estimated at $31 million per year.
Subsection 10.8 Enriching Big Data
We should not have blind faith in big data, nor should we stop using it. Rather, we must use our humanity to provide additional context, and we should demand oversight of algorithms. We'll conclude with a video of Tricia Wang speaking about how "thick data" can enrich our big data.
Subsection 10.9 Recommended References
Weapons of Math Destruction, Cathy O'Neil
Coded Bias, Documentary available on Netflix.