Several years ago, when I discovered ‘the wonderful world of online dating’, I struggled with some fundamental questions—and no, I am not only referring to the timeless: “What am I actually doing here?”
Questions like: (a) “How long can I swipe (on Tinder) until I run out of potential matches?”, (b) “What is up with everyone indicating their height in their biographies?”, and (c) “Are music, movies, and festivals really the only three interests people have?”
It was these types of questions that got me interested in examining people’s online dating behavior on platforms like Tinder. I wanted to know what people actually write on their profiles.
Around this time, I also discovered the power of data mining and computational social sciences. I figured that if I were able to find a way to scrape Tinder profiles, I could use automatic natural language processing techniques to analyze what people write.
Tinder profiles are freely available online data and can therefore be mined quite easily with a couple of lines of code.
I set myself a goal: to collect all Tinder profiles from the Netherlands.
Stage 1: Initial Meeting
Step one was to write a script in the programming language Python.
I started by integrating the Python package called Pynder. This package could call the Tinder API, in other words: show me a few profiles of potential matches. I added a loop of a few additional lines of code (seventeen, to be precise), which together enabled me to collect and save all Tinder profiles in the area.
The loop would (1) call for a batch of profiles, (2) save the profiles, (3) swipe the profiles left (sorry!), (4) wait several seconds (in order to trick the Tinder API into thinking I was a real person), and then (5) repeat the process until there were no more profiles available in the area. I repeated this entire process five times, from different locations in the Netherlands, and eventually covered about 95% of the country.
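For the curious, the five steps of that loop can be sketched roughly as follows. This is not my original seventeen lines: the `client` object and its `fetch_batch()` and `swipe_left()` methods are hypothetical stand-ins I made up for illustration, not Pynder's actual API, and `FakeClient` only exists so the sketch runs without touching Tinder.

```python
import json
import time


def collect_profiles(client, out_path, pause_seconds=3.0):
    """Save profiles batch by batch until the client has none left.

    `client` is a hypothetical object (an assumption, not Pynder) with:
      fetch_batch()          -> list of dicts, each with an "id" key
      swipe_left(profile_id) -> None
    Returns the number of profiles saved.
    """
    saved = 0
    with open(out_path, "a", encoding="utf-8") as out:
        while True:
            batch = client.fetch_batch()               # (1) call for a batch
            if not batch:                              # (5) stop when the area is empty
                break
            for profile in batch:
                out.write(json.dumps(profile) + "\n")  # (2) save the profile
                client.swipe_left(profile["id"])       # (3) swipe left
                saved += 1
            time.sleep(pause_seconds)                  # (4) wait a few seconds
    return saved


class FakeClient:
    """In-memory stand-in that serves a fixed list of batches."""

    def __init__(self, batches):
        self.batches = list(batches)
        self.swiped = []

    def fetch_batch(self):
        return self.batches.pop(0) if self.batches else []

    def swipe_left(self, profile_id):
        self.swiped.append(profile_id)


if __name__ == "__main__":
    import os
    import tempfile

    demo = FakeClient([[{"id": 1, "bio": "1.85m"}], [{"id": 2, "bio": "music"}]])
    path = os.path.join(tempfile.mkdtemp(), "profiles.jsonl")
    print(collect_profiles(demo, path, pause_seconds=0), "profiles saved")
```

Writing one JSON object per line keeps the file appendable, which is handy when a run spans several days and several locations.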
After about 72 hours of data collection, nearly all of the Tinder profiles in the Netherlands were collected (goal reached!). The end result was a dataset containing a quarter million (!) Tinder profiles—of which, fun fact, only about 60 thousand were from women.
With this dataset I was able to answer my questions.
Stage 2: Curiosity, Interest, and Infatuation
a) Assume that, on average, a Tinder user spends about two seconds assessing and swiping a potential match (either to the left or to the right). With about 180 thousand potential male matches to assess, this would take a woman about 100 hours of swiping. Men, on the other hand, have only about 34 hours of fun swipe-time at their disposal, until there is no one left to swipe.
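The arithmetic behind these estimates is simple enough to check yourself. Below is a small sketch; the function name `swipe_hours` is mine, and the inputs are the rounded profile counts from this post. With those rounded counts the second figure comes out at roughly 33 hours, so the small gap to the number quoted above may simply come from rounding.

```python
def swipe_hours(profiles, seconds_per_swipe=2.0):
    """Hours needed to swipe through `profiles` at a fixed pace."""
    return profiles * seconds_per_swipe / 3600


print(round(swipe_hours(180_000)))    # 100
print(round(swipe_hours(60_000), 1))  # 33.3
```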
But what about the (b) height and (c) interests of Tinder users?
b) It became clear that both men and women report their height rather often. The average reported height for both men and women, however, is about five centimeters greater than the respective average height for men and women in the Netherlands (as reported by Statistics Netherlands). This could imply that tall people report their height on Tinder more often than shorter people do. Or, alternatively, that people lie about their size.
c) A frequency analysis of all words used in the biographies showed that people do seem to have interests other than music, movies, and festivals. These three interests were, however, mentioned the most by far.
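To give an idea of how reported heights can be pulled out of free-text biographies, here is a minimal sketch using regular expressions. It is an illustration under my own assumptions, not the exact method behind the numbers above: the patterns only cover notations such as "1.85m", "1,85", "1m85", and "185 cm", and the 140 to 220 cm plausibility window is an arbitrary choice.

```python
import re
from statistics import mean

# Meters written as 1.85, 1,85 or 1m85 (optionally followed by "m").
METERS = re.compile(r"(?<!\d)([12])[.,m](\d{2})(?!\d)", re.IGNORECASE)
# Centimeters written as 185cm or 185 cm.
CENTIMETERS = re.compile(r"(?<!\d)([12]\d{2})\s*cm\b", re.IGNORECASE)


def extract_height_cm(bio):
    """Return the first plausible height in cm found in `bio`, else None."""
    match = METERS.search(bio)
    if match:
        height = int(match.group(1)) * 100 + int(match.group(2))
    else:
        match = CENTIMETERS.search(bio)
        if not match:
            return None
        height = int(match.group(1))
    # Discard values that are unlikely to be a human height (assumed window).
    return height if 140 <= height <= 220 else None


bios = ["1.85m, love festivals", "Music, movies, 172 cm", "Just here for the dogs"]
heights = [h for h in map(extract_height_cm, bios) if h is not None]
print(heights)        # [185, 172]
print(mean(heights))  # 178.5
```

The resulting average can then be set against the national figures, keeping in mind that only people who chose to mention their height are counted.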
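A frequency analysis like this takes only a few lines with Python's standard library. The sketch below is a simplified illustration with made-up example biographies; the `min_length` filter is my own crude way of skipping very short words, and a real analysis would also want a proper stop-word list.

```python
import re
from collections import Counter


def word_frequencies(bios, min_length=3):
    """Count lowercase words of at least `min_length` letters across all bios."""
    counts = Counter()
    for bio in bios:
        # Runs of letters only; digits, punctuation and emoji are skipped.
        words = re.findall(r"[^\W\d_]+", bio.lower())
        counts.update(word for word in words if len(word) >= min_length)
    return counts


example_bios = [
    "Music, movies and festivals",
    "Love music and travel",
    "Festivals! Music!",
]
counts = word_frequencies(example_bios)
print(counts["music"], counts["festivals"], counts["travel"])  # 3 2 1
```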
Stage 3: Enlightenment
The findings I describe in this post are by no means groundbreaking, though it is hopefully clear that a dataset like this has a lot of potential. Therefore, if there is one thing I would like you to take home from this post, it is:
Empower yourself by learning how to code.
It took me a couple of weeks to get familiar enough with Python to collect this data. Of course, no one likes learning curves, but an investment of only a few hours a week for a couple of weeks might just enable you to collect insightful data on that one online behavior you are interested in. So what are you waiting for?