Omics

Large-scale data: the end of the scientific method?

Chris Anderson, editor in chief of Wired Magazine, recently wrote an interesting article entitled “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.” He subsequently took some serious criticism, both in the comments section of the online article and in further counter-articles, for proclaiming, basically, that the scientific method is dead. In one of his concluding statements he writes,

“The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.”

While it is no doubt true that the availability of the types of large-scale data we have today makes de novo pattern-finding much more feasible and powerful, that doesn’t negate the power of time-tested hypothesis testing. I would contend that this isn’t an either/or scenario. Both ways of doing things are powerful and useful, and in fact it’s when you use the two together in an iterative cycle that you gain the greatest value from, and the deepest insight into, your data.

For example, in much of the work that we’ve done, it is in the process of examining data to test a hypothesis that we see a novel pattern or correlation which offers new insight. The follow-up to that insight is then critical, and it is, rightly, a process of hypothesis testing.

There is another aspect to all of this that requires consideration as well. At some point, the massive amounts of data, and all of the analyses conducted on them, need to yield knowledge. That, to me, is possibly the greatest hurdle in this brave new data-filled world we’re in.

Chris’s article contains a number of examples, each of which includes a snapshot of gorgeous visualizations rendered to convey the results of the various analyses run. That’s because, at the very core of all of this, visualization is one of the best bridges, if not the best, that we have between data and knowledge, between a sea of information and something that we can act on, whether it is to design a novel therapeutic, buy or sell stock, or detect an influenza outbreak.