A new software tool developed at Columbia University is providing valuable insights into how some very popular websites make use of the sensitive data they collect from their users. The software could help sniff out potential abuses by advertisers and help make the use of sensitive data far more transparent to the end user.

It is no secret that many of the most popular websites and online services actively track and store sensitive information from their users. The data they collect can include location, emails and search histories, which companies then attempt to monetize to the best of their ability – for instance, by producing better targeted ads, video suggestions and product recommendations.

Some of these services can be very useful, as they improve the user experience; however, it is very difficult to tell exactly how this sensitive data is being used, and that is a problem. As web services keep aggressively collecting more and more personal data to profile their users and maximize profits, it is important to make sure that this information is used in an ethical way, preventing abuses or morally questionable business practices (such as in the case of credit companies reportedly adjusting loan offers based on users' Facebook activity).

Columbia University researchers Roxana Geambasu, Augustin Chaintreau and Mathias Lecuyer have developed XRay, a software tool that aims to address this issue and bring more transparency to the web.

Their system works by tracking how the user's behavior influences "user targeting", including personalized advertisements, product recommendations and video suggestions. It then uses a probabilistic mathematical model to correlate the inputs and the outputs – the user behavior and the targeting from the website – to give users a good sense of how their personal data is being used. According to the researchers, their system has been able to predict user targeting with an accuracy of 80 to 90 percent.
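To make the idea of correlating inputs with outputs concrete, here is a minimal Python sketch. It is not XRay's actual model, which the researchers describe only as probabilistic. It assumes a much simpler scheme: several hypothetical test accounts each hold a different subset of inputs (here, email topics), and each input is scored by how much more often an ad appears in accounts that contain it than in accounts that do not. The function name, account names and data are all invented for illustration.

```python
from typing import Dict, Set


def targeting_scores(
    accounts: Dict[str, Set[str]], saw_ad: Dict[str, bool]
) -> Dict[str, float]:
    """Score each input by the difference in ad frequency between
    accounts that contain the input and accounts that do not.

    This is an illustrative stand-in, not the model used by XRay.
    """
    all_inputs = set().union(*accounts.values())
    scores = {}
    for item in all_inputs:
        with_item = [saw_ad[a] for a, held in accounts.items() if item in held]
        without_item = [saw_ad[a] for a, held in accounts.items() if item not in held]
        # Fraction of accounts showing the ad, with and without this input.
        p_with = sum(with_item) / len(with_item) if with_item else 0.0
        p_without = sum(without_item) / len(without_item) if without_item else 0.0
        scores[item] = p_with - p_without
    return scores


# Hypothetical test accounts, each holding emails on different topics.
accounts = {
    "acct1": {"debt", "travel"},
    "acct2": {"debt", "cooking"},
    "acct3": {"travel", "cooking"},
    "acct4": {"cooking"},
}
# Whether a given (made-up) loan ad was observed in each account.
saw_ad = {"acct1": True, "acct2": True, "acct3": False, "acct4": False}

scores = targeting_scores(accounts, saw_ad)
best = max(scores, key=scores.get)
print(best, scores[best])
```

In this toy data the ad appears only in the accounts that contain the "debt" email, so that input gets the highest score. A real system would need many more accounts and a model that accounts for noise, since ads also vary for reasons unrelated to any single input.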

In its current iteration, XRay can analyze data from Google's Gmail, Amazon and YouTube. However, because of its highly flexible black-box approach, Geambasu and colleagues say it could be easily adapted to new websites, even tracking data across multiple services.

The scientists created a set of emails with keywords, some of which included sensitive information, and then used XRay to examine which ads appeared to specifically target those messages.

The analysis that followed concluded that it is indeed possible for advertisers to target sensitive topics in users' inboxes, particularly with respect to health issues – including cancer, depression and pregnancy. The scientists also discovered actual examples of such abuses, such as advertisements that targeted the topic of debt in users' inboxes in order to promote subprime loans for second-hand cars.

XRay is still in its early stages of development, but the researchers hope that releasing the software under an open-source license, as they have done, will help the development of a new generation of software tools that can ultimately help make the web a lot more transparent.

A live online demo of the software helps users better understand how ads are targeted in Gmail specifically.