On some of the more forthcoming websites, you may have read a line such as ‘we may collect data anonymously from time to time’. Others don’t even extend that courtesy to their users.
What if you were told that someone intent on uncovering your browsing data could do just that with careful analysis and a bit of patience? Put simply, your supposedly anonymous browsing habits can be traced back to you.
Two German researchers were able to dig up the browsing history of a judge and a German Member of Parliament, along with the browsing habits of three million other unsuspecting German citizens.
So how did they do it? Should we all be worried about our ‘private’ browsing histories?
The Power of Raw Data
Uncovering someone’s browsing history may sound like a complicated hacking feat. In reality, it is much simpler.
A journalist, Svea Eckert, teamed up with a data expert, Andreas Dewes, to see for themselves how private our browsing activity really is. The pair obtained a database of 3 billion URLs from 9 million different websites, covering a 30-day observation period.
How they obtained the data is quite ingenious.
At first they considered the direct approach of buying the data from third-party sources. Instead, they took an indirect route, which let them obtain the data for free.
The two set up a fake marketing company, complete with standard marketing jargon and attractive pictures and animations. They then approached data brokers and pitched their marketing program, asking the brokers to contribute raw data. The company would supposedly feed this data into its ‘machine learning algorithm’ and use it for targeted advertising.
Eventually a data broker agreed to provide them with ‘anonymous’ data, which they later ‘de-anonymized’ with little difficulty.
The ‘De-Anonymization’ Process
There are two main methods behind this process. The first, and simpler, relies on the fact that some users can be identified directly by their usernames on sites like Twitter or Facebook.
Whenever users visit their own analytics page on a site like Twitter, the page URL contains their username. Normally only the account owner ever sees that URL, but once someone else obtains it (in this case, via the data broker), the username is right there for the taking. It can then be used to identify the person and to link every other URL visited by the same individual, exposing their complete browsing history.
The second method is more probabilistic. It works by connecting the unique ‘routines’ of individuals to publicly available information. For example, each individual has a unique Facebook ID, bank account, Google account and so on, and the pages tied to these can be combined into a unique identifier. That identifier can then be correlated with information about the user from public sources such as YouTube playlists, SoundCloud lists and other publicly shared content.
The last piece of the puzzle is the apps and plugins behind the collection of this data. According to Dewes, the prime offender was Web of Trust, a browser extension marketed as a tool for safe surfing.
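To make the first method concrete, here is a minimal Python sketch of how a leaked username can unmask an entire pseudonymous history. It assumes the broker's data arrives as (pseudonymous ID, URL) pairs and that the analytics URL has the hypothetical form `https://analytics.twitter.com/user/<username>/...`; the function names and sample records are illustrative, not the researchers' actual code.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Assumed URL layout (hypothetical): https://analytics.twitter.com/user/<username>/...
ANALYTICS_HOST = "analytics.twitter.com"
USER_PATH = re.compile(r"^/user/([A-Za-z0-9_]{1,15})(?:/|$)")


def username_from_url(url):
    """Return the username embedded in an analytics URL, or None if there is none."""
    parsed = urlparse(url)
    if parsed.hostname != ANALYTICS_HOST:
        return None
    match = USER_PATH.match(parsed.path)
    return match.group(1) if match else None


def reidentify(clickstream):
    """Map each leaked username to every URL recorded under the same pseudonymous ID.

    clickstream: iterable of (pseudonymous_id, url) pairs, as a broker might supply.
    """
    history = defaultdict(list)
    names = {}
    for pid, url in clickstream:
        history[pid].append(url)
        name = username_from_url(url)
        if name is not None:
            names[pid] = name
    # Only IDs that leaked a username are unmasked, but their whole history comes along.
    return {names[pid]: history[pid] for pid in names}


if __name__ == "__main__":
    records = [
        ("u1", "https://news.example.com/article/42"),
        ("u1", "https://analytics.twitter.com/user/alice_example/home"),
        ("u1", "https://shop.example.com/cart"),
        ("u2", "https://news.example.com/article/7"),
    ]
    print(reidentify(records))
    # {'alice_example': ['https://news.example.com/article/42',
    #  'https://analytics.twitter.com/user/alice_example/home',
    #  'https://shop.example.com/cart']}
```

The point of the sketch is that a single identifying URL is enough: the pseudonymous ID does the rest by tying that URL to everything else the same person visited.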
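The probabilistic method can be sketched as a matching problem: given the URLs a known person has shared publicly, find the pseudonymous history that contains them. The Python sketch below is a simplified stand-in, not Dewes' actual algorithm; it assumes that rarely visited URLs are the most identifying and weights each one by an inverse-document-frequency score, log(total users / users who visited it). The input formats, the `min_score` threshold and the sample data are all hypothetical.

```python
import math
from collections import Counter


def url_rarity(histories):
    """Weight each URL by how few users visited it (inverse document frequency)."""
    counts = Counter(url for urls in histories.values() for url in set(urls))
    total = len(histories)
    return {url: math.log(total / c) for url, c in counts.items()}


def best_match(public_urls, histories, min_score=1.0):
    """Return (pseudonymous_id, score) for the history that best explains public_urls.

    public_urls: URLs a known person shared publicly (e.g. playlist links; hypothetical).
    histories: dict mapping pseudonymous ID -> list of visited URLs.
    Returns (None, score) if no history scores at least min_score.
    """
    weights = url_rarity(histories)
    best_id, best_score = None, 0.0
    for pid, urls in histories.items():
        visited = set(urls)
        # Sum the rarity of every public URL this user also visited;
        # a URL everyone visits scores log(1) = 0 and identifies no one.
        score = sum(weights[u] for u in set(public_urls) if u in visited)
        if score > best_score:
            best_id, best_score = pid, score
    if best_score < min_score:
        return None, best_score
    return best_id, best_score


if __name__ == "__main__":
    histories = {
        "u1": ["https://popular.example.com/",
               "https://video.example.com/playlist/rare1",
               "https://music.example.com/set/rare2"],
        "u2": ["https://popular.example.com/", "https://news.example.com/a"],
        "u3": ["https://popular.example.com/",
               "https://video.example.com/playlist/rare1"],
    }
    public = ["https://video.example.com/playlist/rare1",
              "https://music.example.com/set/rare2",
              "https://popular.example.com/"]
    match_id, score = best_match(public, histories)
    print(match_id)  # u1
```

Weighting by rarity reflects the intuition behind the second method: visiting a popular news site says almost nothing about who you are, while a couple of obscure, publicly shared links can single out one person among millions.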