As the world goes more digital and talk of “big data” enters mainstream dialogue, nearly all firms have become obsessed with data-driven thinking: measure, analyze, modify, repeat, a seemingly endless loop. I love data-driven decisions because the approach offers a democratic way to solve problems, rather than purely deferring to those in positions of authority. Indeed, I suspect that much like Codecademy is making programming a mainstream skill, data analysis will become a required skill for most professions in the next few years.
However, there are several risks to data-driven decision making that are important to think about, so that we as managers, and more generally as the public, can make more informed decisions.
Is the analysis mathematically accurate?
Every statistics class teaches the importance of intuition in data analysis, because data is easily misinterpreted. Confounding variables, sampling bias, correlation versus causation, too little data to tell an outlier from a trend, poorly chosen axes or time periods, even measurement error: the list of risks in statistical analysis is long, and they are far more common than most people realize. Yet people are increasingly conditioned to blindly believe a statement like “x was a statistically significant change due to y.” The term “statistically significant” merely indicates that an observed change is unlikely to be chance at some confidence level; it does not mean any or all of the risks above were correctly accounted for.
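To make the point concrete, here is a minimal simulation (Python standard library only, illustrative numbers of my own choosing) of one of these risks: if you run enough tests where no real effect exists, some will come back “statistically significant” at the 95% level purely by chance.

```python
import math
import random

random.seed(42)

def two_sample_p(a, b):
    """Approximate two-sided p-value for a difference in means,
    using a normal approximation (adequate for large samples)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    # p-value from the standard normal CDF via erf
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

trials = 1000
false_positives = 0
for _ in range(trials):
    # Both samples come from the SAME distribution: no true difference.
    a = [random.gauss(0, 1) for _ in range(100)]
    b = [random.gauss(0, 1) for _ in range(100)]
    if two_sample_p(a, b) < 0.05:
        false_positives += 1

print(f"{false_positives} of {trials} tests were 'significant' by chance alone")
```

Roughly 5% of the tests will cross the significance threshold despite there being nothing to find, which is exactly why a lone “statistically significant” claim deserves scrutiny.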
Landmark studies with seemingly solid data-driven conclusions have been overturned years later because the data was misinterpreted by the PhDs who originally analyzed it. If doctorates who do analysis for a living can make data errors, isn’t it possible that we might as well in our zeal to be data-driven managers?
Even putting aside these seemingly esoteric math issues, the risks of a data-driven world are more prevalent than we realize.
Does the analysis tell the full story?
In many cases, quantitative metrics are blunt instruments that ignore qualitative factors in the outcomes. There’s an excellent article by Columbia professor Peter Marber on how economic indicators like GDP, the unemployment rate, and the inflation rate are rife with such problems. Two representative examples follow; first, on that all-important metric of Gross Domestic Product:
To any government statistician tallying GDP, $100 spent on textbooks is sadly no more valuable to society than $100 spent on cigarettes. Americans spend more than $80 billion on smoking each year and an estimated $160 billion on the health care costs related to smoking-induced illnesses. Together that’s about 1.5 percent of American GDP—nothing to boast about.
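The arithmetic behind the quote checks out, as a quick sketch shows. Note the GDP figure below is my assumption (roughly $16 trillion, the approximate US GDP around the time Marber was writing), not a number from the article.

```python
# Sanity-check the quoted claim that smoking-related spending is
# "about 1.5 percent of American GDP".
smoking_spend = 80e9          # direct spending on smoking, per the quote
smoking_health_costs = 160e9  # estimated related health care costs, per the quote
gdp = 16e12                   # assumed US GDP (~$16 trillion)

share_of_gdp = (smoking_spend + smoking_health_costs) / gdp
print(f"{share_of_gdp:.1%} of GDP")  # -> 1.5% of GDP
```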
Next Marber tackles the ubiquitous unemployment rate:
The United States has posted unemployment rates below 5 percent for the majority of the last 15 years. Nobel-winning economist Michael Spence suggests America’s employment “success” was actually the replacement of some 10 million manufacturing and export-related jobs with low-wage, low-skilled service jobs like construction workers, interior decorators, or paint department managers—domestic jobs that cannot be outsourced to lower-cost labor markets. As soon as the economy took a hit in 2008, these were the first to go, because they weren’t central to consumer needs.
These examples show how metrics can hide important qualitative elements while capturing only quantitative ones. Sure, supplementary metrics could address the specific problems above, but as the list of metrics grows to fully describe a complex system like the economy, people will find it harder to understand the interactions between them: if one part goes up, is another part supposed to go up, and should we be concerned if it does or does not?
Indeed, Apple's Steve Jobs disliked slides, and data-heavy presentations in general, in meetings because he felt they were a crutch that kept people from thinking. Steve understood that slides and data analysis can sometimes mask what’s really important to the situation at hand.
Is the analysis something that benefits society?
Even more disturbing than the difficulty of conducting accurate and complete analysis is how data can be used to engineer outcomes for short-term gain.
Take, for example, a recent analysis of what factors correlate with a political candidate’s chances of winning elections. Height, facial features, voice pitch, stance and gait… the list is detailed and exacting. It’s not a stretch to imagine that political strategists are now building up candidates with this and other data to win elections (Mitt Romney?), rather than trying to find candidates who best represent the public’s interest.
Of course, data being used for nefarious purposes should not take away from the tool that is data-driven analysis; after all, every tool has positive and negative applications. But it should make us pause and think about the consequences of advocating that we measure everything. (There’s a whole separate discussion to be had about the public’s willingness to hand over personal data to “free” services like search engines and social media sites, only to have that data used in ways we wouldn’t be comfortable with.)
Einstein is often credited with saying, “Not everything that counts can be counted, and not everything that can be counted counts.” It seems a fitting counterbalance for our obsession with big data. After all, what statistic could have described the beauty and simplicity that was the iPhone during its initial design?