Section 10 Ethics and Machine Learning
You find a wallet full of cash in the bathroom, and no one else is around. What do you do?
What are ethics?
Is Spam, aka unsolicited email, good or bad business practice?
What is bias?
Subsection 10.1 Discussion of articles
What did you think about the facial recognition article? What bias was present, and why?
There are some interesting updates to this topic. What are they?
Timnit Gebru, one of the authors of the paper referenced in the article above, was fired/forced to resign from Google's ethical AI division after examining some of Google's algorithms and reporting on similar bias. (December 2020)
There is a documentary about this topic called "Coded Bias" available on Netflix. I highly recommend it. (As of April 2021)
A number of cities have banned facial recognition. Numerous bills for federal regulation have been proposed, but no laws yet. Maybe this is the year. (As of February 2021)
What did you think about the deep fake article? Should we create ML algorithms to do this? What other ML algorithms are ethically questionable?
What did you think about the article on ML algorithms for screening job candidates?
Subsection 10.2 Weapons of Math Destruction
Is data objective? Are machine learning algorithms objective?
Let's watch a TED talk by Cathy O'Neil, author of Weapons of Math Destruction, and consider the following questions. [13:18]
- Why are data and algorithms not objective? Answer: Who defines success? Who defines what is associated with success? Making predictions based on data from the past will repeat historical biases. And if we don't know how an algorithm works, we can't always tell whether it is making terrible decisions.
- What three examples does she give of algorithms used in an unfair way? Answer: The value-added formula for teachers, the Fox News hiring algorithm, and predictive policing / recidivism risk.
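O'Neil's point that predictions based on past data repeat historical biases can be sketched in a few lines. Everything here — the groups, the counts, and the "model" — is invented for illustration:

```python
# Toy illustration (all data invented): a "hiring model" that simply
# learns the historical hire rate for each group will reproduce any
# bias baked into the past decisions it was trained on.

from collections import defaultdict

# Hypothetical historical hiring records: (group, hired?)
history = [("A", True)] * 80 + [("A", False)] * 20 \
        + [("B", True)] * 20 + [("B", False)] * 80

def fit_hire_rates(records):
    """Estimate P(hired | group) from past decisions."""
    counts = defaultdict(lambda: [0, 0])  # group -> [hires, total]
    for group, hired in records:
        counts[group][0] += hired
        counts[group][1] += 1
    return {g: hires / total for g, (hires, total) in counts.items()}

def predict_hire(rates, group, threshold=0.5):
    """'Predict' a hire whenever the group's historical rate clears the threshold."""
    return rates[group] >= threshold

rates = fit_hire_rates(history)
print(rates)                     # {'A': 0.8, 'B': 0.2}
print(predict_hire(rates, "A"))  # True  -- group A keeps getting hired
print(predict_hire(rates, "B"))  # False -- group B keeps getting rejected
```

Nothing in the code mentions fairness or discrimination; the model is "just math," yet it faithfully automates whatever bias produced the training data.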
Subsection 10.3 Is data the new oil?
The phrase "data is the new oil" is said a lot. The analogy isn't perfect, but there is power in data and machine learning algorithms, and there are important questions to ask about who has that power and how it is being used.
- Who is doing the work of data science (and who is not)?
- Whose goals are prioritized in data science (and whose are not)?
- And who benefits from data science (and who is either overlooked or actively harmed)?
(Questions from Data Feminism, Chapter 1.)
There are many areas where machine learning can be problematic, including:
- Data collection and privacy
- Potential for misuse of ML algorithms
Subsection 10.4 Encoding Bias
"Social scientist Kate Crawford has advanced the idea that the biggest threat from artificial intelligence systems is not that they will become smarter than humans, but rather that they will hard-code sexism, racism, and other forms of discrimination into the digital infrastructure of our societies." (Data Feminism, Chapter 1.)
We already discussed examples of facial recognition, predictive policing, and job candidate screening. There are lots of other examples.
Recidivism risk (risk assessment for criminal behavior): a 2016 ProPublica article, "Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks."
Speech recognition has similar issues with racial bias. "There Is a Racial Divide in Speech-Recognition Systems, Researchers Say"
Bias issues in internet search engines: "We Teach A.I. Systems Everything, Including Our Biases"
Another bias example in the Google search engine (from Safiya Umoja Noble in Algorithms of Oppression): as recently as 2016, searches for "three Black teenagers" returned mugshots, while "three white teenagers" returned wholesome stock photography.
Using models to predict the risk of child abuse (from Virginia Eubanks in Automating Inequality): wealthier parents (with private health care and mental health services) contributed little data to the model, while poorer parents (more likely to rely on public services) had far more data available. As a result, the model overpredicted the risk of child abuse for children of poorer parents.
Subsection 10.5 Data Collection and Privacy
We already mentioned bias in facial recognition, but facial recognition also raises a huge surveillance issue.
- "Before Clearview Became a Police Tool, It Was a Secret Plaything of the Rich" https://www.nytimes.com/2020/03/05/technology/clearview-investors.html
- "The facial-recognition app Clearview sees a spike in use after Capitol attack." https://www.nytimes.com/2021/01/09/technology/facial-recognition-clearview-capitol.html
- "The Secretive Company That Might End Privacy as We Know It"
Who decides what data to collect? Who decides how to use it? Is it protected?
In some cases not having enough data is a problem.
In some cases too much data is a problem.
"the databases and data systems of powerful institutions are built on the excessive surveillance of minoritized groups."
(from Data Feminism, chapter 1.)
From a 2012 New York Times article by Charles Duhigg, “How Companies Learn Your Secrets”:
- Target created a pregnancy detection score based on customer purchases.
- Target developed an automated system to send coupons to possibly pregnant customers.
- A teenager received coupons for baby clothes in the mail.
- Her father was infuriated at Target for this.
- She was, in fact, pregnant, but had not yet told her family.
(from Data Feminism, chapter 1.)
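The purchase-based scoring described above can be sketched as follows. Target's actual model is proprietary, so the signal items, weights, and threshold here are entirely invented:

```python
# Hypothetical sketch (Target's real model is proprietary): score a
# customer by summing invented weights for "signal" purchases, then
# mail coupons when the score clears a threshold.

SIGNAL_WEIGHTS = {              # invented weights, for illustration only
    "unscented lotion": 0.3,
    "prenatal vitamins": 0.9,
    "large tote bag": 0.2,
    "cotton balls": 0.1,
}

def pregnancy_score(purchases):
    """Sum the weights of any signal items in the purchase history."""
    return sum(SIGNAL_WEIGHTS.get(item, 0.0) for item in purchases)

def send_coupons(purchases, threshold=1.0):
    """Decide whether to mail baby-product coupons."""
    return pregnancy_score(purchases) >= threshold

basket = ["unscented lotion", "prenatal vitamins", "cotton balls"]
print(round(pregnancy_score(basket), 2))  # 1.3
print(send_coupons(basket))               # True
```

The privacy harm in the story comes not from any single line of this logic but from acting on the inference — the customer never consented to having a sensitive condition deduced and revealed.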
Neural networks can leak personal information.
2018 paper, "The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks"
A neural network (a common generative sequence model) trained on sensitive data can memorize that data well enough for it to be extracted from the model, even when the training data itself is kept private and only the model is released to the public.
Example: Google's Smart Compose, a commercial text-completion neural network trained on millions of users' email messages.
2020 paper, "Extracting Training Data from Large Language Models"
Individual training examples (names, phone numbers, email addresses, etc.) can be recovered from language models trained on private datasets. Larger models are more vulnerable.
Example: the GPT-2 language model. The paper's authors demonstrate the attack on GPT-2, a language model trained on scrapes of the public Internet.
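A toy sketch of unintended memorization — far simpler than the models in these papers, but the same failure mode: a character n-gram model trained on text containing a unique secret will regurgitate the secret verbatim when prompted with its first few characters. The corpus and the "SSN" here are invented:

```python
# Toy sketch (not the papers' method): a character n-gram model trained
# on text containing a unique "canary" secret regenerates that secret
# verbatim under greedy completion.

from collections import defaultdict, Counter

corpus = "the quick brown fox. " * 5 + "my ssn is 078-05-1120. "

def fit_ngrams(text, order=3):
    """Count next-character frequencies after each length-`order` context."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model

def greedy_complete(model, prompt, length, order=3):
    """Extend the prompt one most-likely character at a time."""
    out = prompt
    for _ in range(length):
        nxt = model[out[-order:]].most_common(1)
        if not nxt:
            break
        out += nxt[0][0]
    return out

model = fit_ngrams(corpus)
# Prompting with the first characters of the secret extracts the rest.
print(greedy_complete(model, "078", length=9))  # 078-05-1120.
```

Because the secret appears only once, its contexts are unique in the training text, so the model's "most likely" continuation is an exact replay — precisely the unintended memorization the papers measure in real neural networks.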
Subsection 10.6 Potential for Misuse of ML algorithms
Read about deep fakes and how easily they can spread misinformation.
Who makes 'moral' decisions for autonomous cars?
"An autonomous car is barreling down on five persons, and cannot stop in time to save them. The only way to save them is to swerve and crash into an obstacle, but the passenger of the car would then die. What should the car do?"
Even modification of images is easily available.
Should McDonald's use AI to get you to buy more fast food?
"Would You Like Fries With That? McDonald’s Already Knows the Answer" https://www.nytimes.com/2019/10/22/business/mcdonalds-tech-artificial-intelligence-machine-learning-fast-food.html
Should China use facial recognition software to shame people for wearing pajamas in public? (FYI, pre-covid)
"Chinese City Uses Facial Recognition to Shame Pajama Wearers" https://www.nytimes.com/2020/01/21/business/china-pajamas-facial-recognition.html
Subsection 10.7 Environmental Costs
A 2017 Greenpeace report estimated that the global IT sector, which is largely US-based, accounted for around 7 percent of the world’s energy use.
The cost of constructing Facebook’s newest data center in Los Lunas, New Mexico, is expected to reach $1 billion. The electrical cost of that center alone is estimated at $31 million per year.
Subsection 10.8 Enriching Big Data
We should not have blind faith in big data, nor should we stop using it. Rather, we must use our humanity to provide additional context, and we should demand oversight of algorithms. We'll conclude with a video of Tricia Wang speaking about how "thick data" can enrich our big data.
Subsection 10.9 Recommended References
Weapons of Math Destruction, Cathy O'Neil
Coded Bias, Documentary available on Netflix.