Lies, Damn Lies, and Data Mining

    In the wild (i.e., when we're not talking about contrived examples), data mining involves significant amounts of statistics. There are two common quotes that come to mind when talking about statistics:

    There are three kinds of lies: lies, damned lies, and statistics. (Popularized by Mark Twain/Samuel Clemens, who attributed it to Benjamin Disraeli, but with uncertain provenance.)

    and

    The old saying is that “figures will not lie,” but a new saying is “liars will figure.” It is our duty, as practical statisticians, to prevent the liar from figuring; in other words, to prevent him from perverting the truth, in the interest of some theory he wishes to establish. (Carroll D. Wright, a prominent statistician employed by the U.S. government in 1889)

    Although it's true that there are many who would use statistics to obfuscate, the more common danger comes from people who don't intend to misuse statistics but use them without properly understanding them. Here are a couple of stories from Leonard Mlodinow's The Drunkard's Walk, the first of which concerned him personally:

    My most memorable encounter with the Reverend Bayes came one Friday afternoon in 1989, when my doctor told me by telephone that the chances were 999 out of 1,000 that I'd be dead within a decade. …


    The adventure started when my wife and I applied for life insurance. The application procedure involved a blood test. … [Mlodinow's HIV test] came back positive. Though I was too shocked initially to quiz him about the odds he quoted, I later learned that he had derived my 1-in-1,000 chance of being healthy from the following statistic: the HIV test produced a positive test when the blood was not infected with the AIDS virus in only 1 in 1,000 blood samples. That might sound like the same message he passed on, but it wasn't. My doctor had confused the chances that I would test positive if I was not HIV-positive with the chance that I would not be HIV-positive if I tested positive.

    He goes into more detail explaining it (and I highly recommend that book), but the take-away is that his chances of having HIV (and hence AIDS) were not 999 in 1,000. Using the a priori odds of a heterosexual, non-drug-abusing, white male American having HIV, which are 1 in 10,000, the actual odds that Mlodinow was infected were about 10%. (Consider those 10,000 people: about 10 of them would get a false-positive result on an HIV test, while only 1 of them would get a true-positive result.)
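    That 10% figure falls straight out of Bayes' theorem. Here is a minimal sketch of the arithmetic, using the numbers quoted above (a 1-in-10,000 prior and a 1-in-1,000 false-positive rate) and assuming, for simplicity, that the test catches every true infection:

```python
# Bayes' theorem applied to Mlodinow's HIV test result.
# Prior and false-positive rate are from the story; the
# sensitivity of 1.0 is an assumed simplification.
prior = 1 / 10_000          # P(infected), a priori for his demographic
false_positive = 1 / 1_000  # P(positive | not infected)
sensitivity = 1.0           # P(positive | infected), assumed

# Total probability of a positive test
p_positive = sensitivity * prior + false_positive * (1 - prior)

# Bayes' theorem: P(infected | positive)
posterior = sensitivity * prior / p_positive
print(f"P(infected | positive) = {posterior:.1%}")  # 9.1%
```

    The doctor, in effect, reported P(positive | not infected) when the question that mattered was P(infected | positive); the tiny prior is what drives the two numbers so far apart.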

    Another excellent example from his book, one that gets to the heart of the problem with over-relying on data mining, concerned Sally Clark, in Britain. Her first child died at 11 weeks, and the death was attributed to SIDS (sudden infant death syndrome). However, when her second baby also apparently died from SIDS (this time at 8 weeks), she was accused of smothering both children. The odds calculated for having two children die from SIDS were 1 in 73,000,000. Putting aside the fact that there are far more than 73 million people in the world (and almost that many in Britain alone), those odds were calculated by taking the odds that a single child would die from SIDS (which had been calculated at 1 in 8,543) and squaring them. The problem is that this process assumes statistical independence, which, when one considers the possible medical explanations for SIDS, seems even more unlikely than having two of your children in a row die from it. The good news is that after three and a half years in jail, Sally Clark was eventually released from prison when it was uncovered that the pathologist working for the prosecution had withheld the information that Clark's second child had been suffering from a bacterial infection at the time of her death.
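    The squaring step is easy to reproduce, and so is its failure mode. The sketch below uses the 1-in-8,543 figure from the case; the 1-in-100 conditional risk in the second calculation is a purely hypothetical number I've chosen to illustrate what a shared genetic or environmental cause would do to the arithmetic:

```python
# The prosecution's arithmetic: square the single-death rate,
# which is only valid if the two deaths are independent.
p_single = 1 / 8_543
p_both_if_independent = p_single ** 2
print(f"1 in {1 / p_both_if_independent:,.0f}")  # 1 in 72,982,849

# If a first SIDS death signals a shared underlying cause, the
# second child's risk is conditional, not independent. Suppose
# (hypothetically) it rises to 1 in 100:
p_both_if_dependent = p_single * (1 / 100)
print(f"1 in {1 / p_both_if_dependent:,.0f}")  # 1 in 854,300
```

    Under even a modest dependence assumption, the headline number shrinks by two orders of magnitude, which is the whole objection to the prosecution's figure.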

    So, other than statistics, what does this have to do with data mining, since neither of these stories involved data mining? Well, while I was working on my Ph.D., I was also working for a data mining facility, and my boss shared a story with us about an interesting statistical result that had been uncovered: average school SAT scores are negatively correlated with the amount of taxpayer dollars spent on those schools. This was surprising, so people dug deeper. It turns out that average school SAT scores are higher out west, where far fewer students per school take the SAT and where less money is spent per school on average. The reasonable explanation for the result was selection bias: students who take the SAT out west are usually interested in applying to out-of-state colleges, and those students (regardless of location) usually do better on the SAT.

    So what? Well, the real lesson our boss wanted us to learn from this was that if the result had been the reverse (i.e., that SAT scores were positively correlated with the amount spent on schools) it's less likely that people would've been considering alternate explanations.

    Think about that.

    Now, consider a hypothetical: data mining statistics come back suggesting that the odds that John Doe is guilty of X are 99.999%. John Doe is a minority with a juvenile record that includes crimes similar to X, and he happens to live within 20 miles of Alan Street and 1st Avenue, in New York City, where the crime happened. Although 99.999% seems pretty ironclad, it means that there are 1-in-100,000 odds that a non-guilty person would also be implicated. How many people live within 20 miles of Alan Street and 1st Avenue? How many of them have juvenile records? How many of those records might include similar crimes? Do we even know if the person actually guilty of the crime has a juvenile record? (No, we don't. This is my hypothetical, after all, and it would seem weird to assume otherwise regardless.)
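    To see how the base rate undercuts that 99.999%, here is a quick sketch. Only the 1-in-100,000 false-implication rate comes from the hypothetical above; the population and record counts are invented for illustration:

```python
# Base-rate sketch for the John Doe hypothetical.
false_rate = 1 / 100_000  # odds a non-guilty person is implicated

# Hypothetical: people within 20 miles of the scene in NYC
population = 10_000_000
false_matches = population * false_rate
print(int(false_matches))  # 100 innocent people implicated

# Even restricting to a (hypothetical) 1% with similar records,
# John Doe is still one of several candidates, not a certainty.
with_similar_records = population * 0.01
print(int(with_similar_records * false_rate))  # 1
```

    The point is the same as in the HIV story: a tiny error rate applied to a huge pool of non-guilty people still produces a crowd of false positives.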

    So, who here now thinks John Doe is most likely guilty? Who here thinks he would be found guilty in a court of law?

    Comments

    Given the context of this post, it would be interesting and important to know how the government is using the data it gathers. But I don't suppose that they'll tell us that.


    It would be just as interesting and important (if not more so) to know how the government is gathering the data it uses, but it's even less likely they'll tell us that!


    The statistical explanation for the "collateral damage" deaths with drone strikes.


    Your post reminded me of something I snipped from an article a few days ago.  Thought you might be interested.

    Graduate courses are springing up to meet the demand for analysts

    Obtaining data is easy; it can come from a huge variety of automated sources, including RFID tags, mouse clicks, or sales receipts. And the analytic software systems—such as SAS Institute’s eponymous SAS and IBM’s SPSS—that are required to work with this data are getting better, says Michael Hasler, director of a new M.S. in Business Analytics program at the University of Texas at Austin. But what’s missing are the people: “You need to take these large unstructured data sets, clean them up, and find insights, but there’s a shortage of talent to do that work,” says Hasler.



    Well I am going to start a company that, for a small fee, figures out what the government is figuring out about you.

    It's a free country, right?


    Ummm...too late?

    http://panopticon.com/



    Drat.

