Be Prepared for a Surprise if You Thought Anonymous Browsing Protected You

You may have read the phrase ‘we may collect data anonymously from time to time’ on some generous websites. Others don’t even have the courtesy to tell the user this much.

What if you were told that someone intent on uncovering all your browsing data could do just that with careful analysis and a bit of patience? Simply put, they can uncover your supposedly anonymous browsing habits.

Two German researchers were able to dig up the browsing history of a judge and a German member of parliament, along with the browsing habits of three million other unsuspecting German citizens.

So how did they do it? Should we all be worried about our ‘private’ browsing histories?

The Power of Raw Data

It may sound like a complicated hacking feat to find someone’s browsing history. In reality it is much simpler.

A journalist, Svea Eckert, teamed up with a data expert, Andreas Dewes, to see for themselves how private our browsing activities are. The duo obtained a database containing 3 billion URLs from 9 million different websites, collected over their 30-day observation period.

How they obtained the data is quite brilliant and interesting.

At first they considered the direct approach: buying the data from various third-party sources. Instead, they took an indirect approach, which got them the data for free.

The two set up a fake marketing company, complete with all the standard marketing jargon and attractive pictures and animations. Next, they approached data brokers and asked them to invest in their marketing program by providing raw data. The pitch was that they would feed the data to their ‘machine learning algorithm’ and use it for targeted advertising.

Eventually a data broker agreed to provide them with ‘anonymous data’ which they later ‘de-anonymized’ rather easily.

The ‘De-Anonymization’ Process

There are two major methods behind this process. The first is the simpler of the two: some users can be identified directly by their usernames on sites like Twitter or Facebook.

Whenever a user visits their own Analytics page on a site like Twitter, the URL of that page contains their username. Normally only the user ever sees that URL. But if the URL is obtained some other way (in this case via the data broker), the username is right there for the taking. It can then be used to identify the person and to link all the other URLs recorded for the same individual, revealing their complete browsing history.
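
To make the first method concrete, here is a minimal sketch in Python. It is not the researchers' actual code. It assumes the broker's data arrives as (pseudonym, URL) pairs and that the analytics page URL has the shape shown in the pattern below; the record format, the pattern, the function name `link_histories` and the sample data are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed shape of a per-user analytics URL; illustrative only.
ANALYTICS_URL = re.compile(
    r"^https?://analytics\.twitter\.com/user/([A-Za-z0-9_]+)(?:/|$)"
)


def link_histories(records):
    """Map each username found in a URL to that pseudonym's full URL list.

    records: iterable of (pseudonym, url) pairs, the assumed format
    of the 'anonymous' dataset.
    """
    history = defaultdict(list)  # pseudonym -> every URL recorded for it
    names = {}                   # pseudonym -> username leaked by a URL
    for pseudonym, url in records:
        history[pseudonym].append(url)
        match = ANALYTICS_URL.match(url)
        if match:
            names[pseudonym] = match.group(1)
    # Keep only pseudonyms for which a username was recovered.
    return {names[p]: urls for p, urls in history.items() if p in names}


# Made-up sample data.
sample = [
    ("user-481", "https://example.org/news/article-1"),
    ("user-481", "https://analytics.twitter.com/user/jane_doe/home"),
    ("user-481", "https://example.com/clinic/appointments"),
    ("user-907", "https://example.net/recipes"),
]
linked = link_histories(sample)
```

In this sample, one revealing URL is enough to attach a name to every other URL stored under the same pseudonym, while the second pseudonym stays unidentified because none of its URLs leaks a username.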

The second is a more probabilistic approach. It works by connecting the unique ‘routines’ of individuals to publicly available information. For example, each individual has a unique Facebook ID, bank account, Google account and so on. Together, the pages a person visits form a kind of unique identifier, which can be correlated with information the user has shared publicly, such as YouTube playlists, SoundCloud lists and other public content.
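
The probabilistic method can be sketched in the same spirit. The Python below is an illustration under stated assumptions, not the researchers' implementation: it assumes each pseudonym's history is available as a set of URLs and that an attacker has collected a handful of URLs one person shared publicly (for example, the videos in a public playlist). The function name `rank_candidates` and all data are hypothetical.

```python
def rank_candidates(histories, public_urls):
    """Rank pseudonyms by how many publicly shared URLs their history contains.

    histories: dict mapping pseudonym -> set of visited URLs (assumed format).
    public_urls: URLs a known person has shared publicly.
    Returns (pseudonym, overlap) pairs, best match first, skipping zero overlap.
    """
    public = set(public_urls)
    scores = [(p, len(public & urls)) for p, urls in histories.items()]
    matches = [(p, n) for p, n in scores if n > 0]
    # Highest overlap first; ties are broken alphabetically for stable output.
    return sorted(matches, key=lambda item: (-item[1], item[0]))


# Made-up sample data.
histories = {
    "user-481": {"https://v.example/a", "https://v.example/b",
                 "https://v.example/c", "https://example.com/bank"},
    "user-907": {"https://v.example/a", "https://example.net/recipes"},
    "user-112": {"https://example.org/weather"},
}
playlist = ["https://v.example/a", "https://v.example/b", "https://v.example/c"]
ranked = rank_candidates(histories, playlist)
```

The more publicly shared URLs an attacker has, the fewer pseudonyms contain all of them. In this sample only one history contains all three playlist entries, so it ranks first.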

The last piece of the puzzle is the apps and plugins behind the collection of this data. According to Dewes, the prime offender was a tool for safe surfing called Web of Trust.

The irony here is hard to miss, folks. A fun fact: the developers of the plugin changed their privacy policy after these findings were presented. They now acknowledge that they sell user data but maintain that it remains ‘anonymous’, which is clearly not the case.

      • If you know how this kind of data grabbing works, then @AbdulWAHABAlam:disqus has pointed this out very correctly. Web pages that use these tools carry tracking and capture code, which populates their databases with your sensitive details, such as your email address, and sells them to marketing agencies. That is why you often receive inbox spam. In other cases, even your account passwords are at risk of being spoofed.
