To make a sweeping generalization, we live in a society where quantitative data are becoming more and more important. Some of this is due to the enormous increase in the availability of data, and some to the enormous increase in our capacity to process it, both largely thanks to computers. Think about Hans Rosling’s TED Talks, or the rise of sabermetrics (the “Moneyball” phenomenon) not only in baseball but in many other sports, or the importance of standardized test scores in K-12 education, or Karl Rove’s use of data mining to identify likely supporters, or the FiveThirtyEight revolution in electoral forecasting, or the quantification of the financial markets, or zillions of other examples. I believe one of my professors has written a book about this phenomenon.
But this trend comes with a problem: we do not currently collect and scrub good enough data to support this fascination with numbers, and on top of that our brains are not wired to understand data. And when a lot rides on bad data that is poorly understood, people will distort the data or find other ways to game the system to their advantage.
Readers of this blog will all be familiar with the phenomenon of rating subprime mortgage-backed securities and their structured offspring using data exclusively from a period of rising house prices — because those were the only data available. But the same issue crops up in stories covering many different aspects of society.
CompStat, an approach to policing that focuses on tracking detailed crime metrics, was widely credited with helping New York and other cities reduce crime in the 1990s. Last year, This American Life ran a story, based on a police officer’s secret recordings, detailing how, in at least one precinct, officers were pressured to boost their numbers through dubious arrests and citations. The story also found another precinct where serious crimes were recorded as less serious crimes in order to make the numbers look better than they really were.
In a recent New York Times story, David Segal describes how law schools massage their metrics to score higher in the U.S. News & World Report rankings. Segal focuses on the tricks that some schools seem to use to boost the number of graduates employed nine months after graduation; for example, some schools apparently hire their own graduates to temporary positions that happen to span the date on which employment rates are measured. The rankings are based on statistics that are defined by the American Bar Association but are self-reported by the schools and not audited by anyone.
The big, well-known example of how the importance of data breeds data manipulation is standardized testing. In the early days of the standardized testing boom, the key statistic was the percentage of students at or above grade level, defined as the fiftieth percentile on some standardized test. (For those wondering if this is circular: the scaled score required to be at the fiftieth percentile is set before the test, based on the attributes of the questions included in the test, not after the test based on students’ actual performances.) So one obvious tactic would be to focus on students in roughly the thirtieth to sixtieth percentiles while ignoring the others. Another, more problematic tactic would be to classify as many low-performing students as possible into special education so that they would not be in the denominator. (Then there is blatant cheating, like giving your students more time to take the test or simply correcting their answers afterward — Freakonomics has a chapter on this — since few if any school districts have the capacity or the motivation to oversee the tests rigorously.) Even leaving aside data manipulation, there is also the basic problem that test difficulty varies from year to year. The test in year N + 1 is calibrated to be the same difficulty as the test in year N, but that calibration is statistical, and there is this thing called random variation to deal with.
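The denominator trick behind that second tactic is simple arithmetic, and a toy calculation makes the incentive obvious. Here is a minimal sketch; the scores, the cutoff, and the function are all invented for illustration, not drawn from any real testing data:

```python
def pct_at_grade_level(scores, cutoff, excluded=frozenset()):
    """Percent of counted students scoring at or above the cutoff.

    Students whose indices appear in `excluded` (e.g., those
    reclassified into special education) drop out of the denominator.
    """
    counted = [s for i, s in enumerate(scores) if i not in excluded]
    passing = sum(1 for s in counted if s >= cutoff)
    return 100 * passing / len(counted)

# Hypothetical scaled scores for ten students; 50 plays the role of
# the pre-set fiftieth-percentile score.
scores = [30, 35, 42, 48, 51, 55, 60, 65, 70, 80]
cutoff = 50

print(pct_at_grade_level(scores, cutoff))          # 60.0
# Reclassify the two lowest scorers out of the tested population:
print(pct_at_grade_level(scores, cutoff, {0, 1}))  # 75.0
```

Nothing about any student's performance changed, but the headline number jumped fifteen points just by shrinking the denominator.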
And I recently read Natalie Obiko Pearson’s story in Bloomberg on the problems with greenhouse gas emissions data. Most of the numbers we read are self-reported by countries and by the companies in those countries, and even if they are honest (a big if), they are “bottom up” estimates, derived from how much fossil fuel is being consumed. But when scientists actually measure changes in greenhouse gas concentrations in the atmosphere, the results differ from what the bottom-up estimates predict. In every example cited in the Bloomberg story, the atmospheric measurements are higher than the bottom-up estimates. That could simply be because the article omitted cases where atmospheric measurements came in lower than official data predicted. But it could also be because both the companies burning the fossil fuels and the countries aggregating the data have the same incentive to underreport: companies because they then need to buy fewer carbon permits, and countries because they can claim to be under their Kyoto Protocol targets.
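The gap between the two kinds of estimates is easy to illustrate with a toy calculation. Everything below is invented for illustration: the fuel quantities, the roughly-in-the-ballpark emission factors, and the hypothetical "top-down" atmospheric figure do not come from the Bloomberg story.

```python
# "Bottom up": self-reported fuel consumption times standard emission
# factors (kg CO2 per unit of fuel; rough illustrative values).
fuel_burned = {"coal_tonnes": 1_000_000, "gas_m3": 500_000_000}
emission_factor = {"coal_tonnes": 2_400.0, "gas_m3": 1.9}

bottom_up_kg = sum(fuel_burned[f] * emission_factor[f] for f in fuel_burned)

# "Top down": a hypothetical total inferred from atmospheric measurements.
top_down_kg = 3.8e9

gap = (top_down_kg - bottom_up_kg) / bottom_up_kg
print(f"bottom-up: {bottom_up_kg:.2e} kg, top-down: {top_down_kg:.2e} kg")
print(f"atmospheric estimate exceeds reported total by {gap:.0%}")
```

The point of the sketch is that the bottom-up number is only as good as its two self-reported inputs, so anyone with an incentive to shade either the fuel figures or the factors shades the final total with them.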
Greenhouse gases are a good example of how we think data will help save us — if we can track how much carbon dioxide each company is producing, we can make it pay for that carbon — but we may just not have good enough data. In general, I think the current trend toward using more and more data is a good thing. I mean, what’s the alternative: gut intuition? But this only increases the importance of having good data to begin with. And when some parties benefit from bad data, this can be a big challenge with no easy solution.