The Art of Data Science A Guide for Anyone Who Works with Data
Data analysis is hard, and part of the problem is that few people can explain how to do it. It’s not that there aren’t any people doing data analysis on a regular basis. It’s that the people who are really good at it have yet to enlighten us about the thought process that goes on in their heads.
Imagine you were to ask a songwriter how she writes her songs. There are many tools upon which she can draw. We have a general understanding of how a good song should be structured: how long it should be, how many verses, maybe there’s a verse followed by a chorus, etc. In other words, here’s an abstract framework for songs in general. Similarly, we have music theory that tells us that certain combinations of notes and chords work well together and other combinations don’t sound good. As good as these tools might be, ultimately, knowledge of song structure and music theory alone doesn’t make for a good song. Something else is needed.
Science is knowledge which we understand so well that we can teach it to a computer. Everything else is art.
Describing data analysis presents a difficult conundrum. On the one hand, developing a useful framework involves characterizing the elements of a data analysis using abstract language in order to find the commonalities across different kinds of analyses. Sometimes, this language is the language of mathematics. On the other hand, it is often the very details of an analysis that makes each one so difficult and yet interesting. How can one effectively generalize across many different data analyses, each of which has important unique aspects?
What we have set out to do in this book is to write down the process of data analysis. What we describe is not a specific “formula” for data analysis—something like “apply this method and then run that test”— but rather is a general process that can be applied in a variety of situations. Through our extensive experience both managing data analysts and conducting our own data analyses, we have carefully observed what produces coherent results and what fails to produce useful insights into data. Our goal is to write down what we have learned in the hopes that others may find it useful.