This year, I embarked on the half-exciting/half-terrifying task of conquering the theories and practices of Big Data analysis. For some, the thought processes involved in writing code may be intuitive and come as second nature. If you are one of those lucky folk, keep moving along; this is not the blog you’re looking for. For those who, like me, are coming from the comforts of the SPSS world, here is what you may experience in your first attempts at Big Data research:
1. Denial – “Surely, it can’t be that hard.”
In this initial stage, sometime after the first lesson/workshop, you may tell yourself that there has been some sort of mistake and cling to a false reality. ‘If I simply re-read those introductory chapters,’ you’ll tell yourself, ‘I will surely find that crucial, overlooked paragraph that will shine a light on what all this means.’ But a computer language is no different from any other language – if you’ve never been exposed to it before, it’s going to come across as pretty unintelligible. And, just as with learning any other language, there’s no hidden paragraph in your textbook that will suddenly make you, say, a fluent Bosnian speaker. There’s only patience and practice and then some more patience and some more practice.
2. Anger – “Look it up?! I don’t even know what to Google!”
Once you have recognized that there was no mistake and denial of your incompetence cannot continue, you may become frustrated, especially at proximate individuals*. The learning curve here is incredibly steep. With every new function there are more things you can do, and with that comes the frustration of wanting to do more but not yet knowing how. The trick is to channel your anger into playing around with the things you do know. Tinker with different datasets, explore the new expressions you have learned and see if you can casually work the phrase “parsing HTML pages” into your next conversation (sample dialogue: ‘Perhaps you could do the cooking this evening, honey, I’m busy parsing HTML pages’).
3. Bargaining – “I’d give anything to understand why my code doesn’t work.”
Even when your code does work, it can be so puzzling that you’d agree to almost anything the Devil had on offer in exchange for understanding why it did so. This is possibly the longest and trickiest stage to cope with. It’s a result of finally feeling that you have a grasp on things but also realizing just how much there is still to learn. It’s a bit like knowing those essential four chords on the guitar that enable you to play any pop song of the last decade, earning you all the cheers at the family picnic – and then trying your hand at writing a symphony. You might spend a day merrily coding away, feeling like you’re about to change all of your projects to include some sort of automated content analysis, only to discover that none of it works, probably because you’ve missed a hyphen somewhere. Even though fluency in these languages no longer seems the impossible dream it once was, it’s important to remember that, as with other languages, even the smallest error can have a drastic impact (just think of the havoc something as small as a comma could play with the sentence ‘Let’s eat, Grandma!’). Sometimes there is no other way to detect these errors than to have a second pair of eyes go over your code, or to step away from it and try again once you’ve given your brain a breather. But remember: the worst is now over. Also remember: patience.
4. Depression – “This is never going to work, why even bother?”
Some may transition from bargaining to a short-lived period of gloom. During this fourth stage, you might despair at the lack of recognition for your suffering and trade thoughts of improvement for dreams of the day when you can uninstall the virtual box from your computer and forget this ever happened. If you find yourself in this spot, rest assured that this is a) brief, and b) good! It indicates a commitment to not just understanding but also mastering the tools you’ve been handed. Just remember: you’re getting better not just at using those tools, but also at troubleshooting. You’ll probably have a deadline or project to force you onwards, and when you finally figure out your mistake (without Googling!), amend your code and lay eyes on a perfectly structured output, that means you’re just about to hit stage 5.
5. Acceptance – “This is so cool!”
In this last stage, you get to embrace the future: you can now (kind of) conduct Big Data analyses! You could, if you wanted to, predict the sharing of online news articles, use Twitter data to assess public opinion on a TV debate as it happens, analyze patterns of media consumption and political targeting – and that’s just in the realm of political communication. You find yourself looking at your other projects and thinking: ‘It’s pretty good, but it still needs a pinch of automated content analysis.’ This stage is a return to calm and stable emotions, a boost to your research ego and a gateway to exciting new research opportunities of which more and more social scientists are taking advantage.
*I’d like to acknowledge my co-workers, Anna, Linda, Lisanne, Robin, Tom and especially Susan, with whom I share an office and whose patience is unwavering.