VPN won't save you: how personal data is collected through SuperCookies





Thomas Dunning said: "At 300 percent [profit] there is no crime that he [capital] would not risk, even if only on pain of the gallows." These words, spoken in the 19th century, are still relevant today. Companies that do business on the Internet are inventing ever more sophisticated ways to spy on users. 



Cookies have been through several privacy scandals in their history. The way browsers handle them changed gradually and, it would seem, finally became civilized. We learned to protect our data, and eating cookies became relatively safe.



But then Supercookies came along - a pretty sticky thing that literally can't be bypassed. 



Actually, supercookies are not one specific technology but a general name for various means of collecting and storing private information that work covertly and bypass the known restrictions. Let's see what supercookies are, how exactly they collect our data and how to protect ourselves from them.



How developer naivety created the foundations for collecting personal data 



The idea of storing site data locally arose at the very dawn of the Internet. Cookies were created with the best of intentions, but, like most protocols and technologies of the time, they were designed by idealists who did not worry much about security and simply could not imagine how large the network would eventually become.



These problems date back to the time when computers were large and programmers were somewhat naive. One of the vulnerabilities of cellular networks, for example, arose from the same naivety: see "Fear and Terror of SS7".



It was the same with cookies. The idea is great: remember the products the user has added to the cart, spare them from entering a password on every visit, remember how they like the site to look, and so on. What could go wrong?



Statistics are a tasty morsel for advertisers: information about users, their behavior on sites, their preferences in goods, the time they spend on a page. Businesses are ready to do anything to get it, and advertising strategies are built on it.



The first alarm was raised in 1996. Just a couple of years after cookies were introduced, the Financial Times published an article about the threat they posed to privacy.





Investigations conducted in 1996 and 1997 by the US Federal Trade Commission resulted in a cookie specification. One of its provisions was that third-party cookies should either be completely blocked, or at least not work by default.



In addition to allowing covert monitoring of user actions, the first versions had other weaknesses. For example, a cookie could be intercepted and substituted, and then used to log in to a site as another user. As cookie handling evolved, several directives were issued that tightened the rules for their use, for example by limiting their lifetime.



Later, cookies were equated with personal data, and the requirements for collecting information about users were gradually tightened, all the way to the most ridiculous and annoying measure: a pop-up notice that the site uses cookies, with a request to agree to it. Everyone got so fed up with it that some browsers added a setting to hide these banners, and several extensions were written to block the annoying warnings.



How Flash-Supercookie Collected Personal Data



Supercookies were unpleasant and potentially dangerous from the start. Unlike a regular cookie, the original supercookie was bound not to a specific site address but to a higher-level domain. For example, instead of being tied to habr.com, the cookie was assigned to the com domain and could follow the user across any site in that zone. This possibility was so obvious that browsers blocked it from the very beginning. But when other ways of covertly storing private information appeared, the name was recalled, and it stuck.



For a very long time, rich interaction on a website (video, animated banners, browser games) was possible almost exclusively through Flash. The notorious Adobe Flash Player loaded the processor heavily and made it hard to handle errors properly, which caused browsers to slow down and crash. In addition, the engine contained many vulnerabilities that attackers of all stripes exploited mercilessly.



Since commercial companies are not much different from scoundrels in their choice of means, they did not hesitate to use Flash to collect information about users. For this they used a technology called Local Shared Objects (LSO). It was originally intended, for example, to save progress in a browser Flash game or to remember the volume setting in an audio player. LSOs are shared across browsers because they belong to the Flash player rather than to any one browser. With their help, a site can restore regular HTTP cookies after the user has deleted them, and the LSOs themselves can hold a lot of collected information about the user.
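The cookie "respawning" described above can be illustrated with a toy model. This is only a sketch: the two dictionaries stand in for the browser's cookie jar and the Flash LSO storage, and the function name visit and the key "uid" are hypothetical, not taken from any real tracker.

```python
import secrets


def visit(http_cookies: dict, flash_lso: dict) -> str:
    """Simulate one page visit on a site that respawns its tracking cookie.

    http_cookies and flash_lso are plain dicts standing in for the two
    storages (an assumption made for illustration).
    """
    # Prefer the regular cookie, fall back to the LSO copy,
    # and mint a new ID only if both are missing.
    uid = http_cookies.get("uid") or flash_lso.get("uid") or secrets.token_hex(8)
    # Write the ID back to both storages so each can restore the other.
    http_cookies["uid"] = uid
    flash_lso["uid"] = uid
    return uid


cookies, lso = {}, {}
first_id = visit(cookies, lso)
cookies.clear()                  # the user deletes regular cookies
second_id = visit(cookies, lso)  # the ID comes back from the LSO copy
print(first_id == second_id)
```

As long as one of the two storages survives, the identifier does too, which is why clearing cookies alone did not help.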



For a long time this technology escaped the attention of security professionals, but in 2009 Jeremy Kirk published a piece on the privacy issues: "Study: Adobe Flash cookies pose vexing privacy questions". Gradually, third-party extensions began to appear that let users control and delete Flash cookies, but the sluggish browser vendors and Adobe were in no hurry to deal with the problem. It wasn't until 2011 that mainstream browsers learned to handle these cookies the same way they handle regular ones.



But in the end everyone got fed up with the leaky and slow Flash player, and HTML5 became widespread enough to make it possible to get rid of Flash completely. It was killed slowly and painfully, because the company behind it was very reluctant to abandon its brainchild. In 2012, Adobe promised that it would end support for the technology within about a decade. In 2017, the deadline for retiring Flash Player was announced: December 2020. Those three years, according to the company, were necessary for developers to adapt their sites to HTML5. A month ago this period came to an end, and annoying warnings that the plugin is being removed everywhere, finally and irrevocably, began to appear in all browsers. It is not clear why this should be announced to users who are not particularly interested in the internal workings of websites.



With that, the topic of Flash cookies can be considered closed.



Supercookies based on ETag worked in roughly the same way. ETag is an HTTP response header that identifies a particular version of a resource, so that the browser can later ask whether the current version differs from the one it has already loaded. Such cookies were discovered around the same time as Flash supercookies and, after a lawsuit in 2011, became relatively rare.
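To make the ETag trick more concrete, here is a minimal sketch of a tracking endpoint, assuming the tracker hides a per-visitor ID in the ETag value. The function handle_request and its return shape are hypothetical; only the ETag, If-None-Match and Cache-Control headers themselves are standard HTTP.

```python
import secrets


def handle_request(headers: dict[str, str]) -> tuple[int, dict[str, str], str]:
    """Toy tracker endpoint: returns (status, response_headers, visitor_id)."""
    returned_tag = headers.get("If-None-Match")
    if returned_tag:
        # The browser is revalidating its cached copy and echoes back
        # the tag it was given earlier, which doubles as the visitor ID.
        visitor_id = returned_tag.strip('"')
        return 304, {"ETag": returned_tag}, visitor_id
    # First visit: mint a fresh ID and hide it in the ETag.
    # "no-cache" makes the browser revalidate on every later request.
    visitor_id = secrets.token_hex(8)
    response_headers = {"ETag": f'"{visitor_id}"', "Cache-Control": "no-cache"}
    return 200, response_headers, visitor_id


status, resp, first_id = handle_request({})
status_again, _, seen_id = handle_request({"If-None-Match": resp["ETag"]})
print(status, status_again, first_id == seen_id)
```

No cookie is ever set here: the identifier lives in the browser cache, so it survives until the cache is cleared.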



HTTP Supercookie: How Verizon and Access Sell Data on the Sly



Flash isn't the only way to spy on users through cookies. 



Both of the stories of data collection and trading told below were possible because ordinary users did not really understand the intricacies of the technology. Security experts diligently sounded the alarm, but they were not listened to, and for a very long time little attention was paid to the privacy of the HTTP connection.



HTTP without SSL encryption lived on for a very long time. The first reason is that SSL certificates cost money, sometimes quite a lot (some companies sold them for more than $100). The second reason is complexity. Today a certificate is installed and renewed by running a single script, but before Let's Encrypt (an initiative backed by Mozilla, among others) was launched, not every admin thought it necessary to learn how to set up SSL.



But in the meantime, ISPs were exploiting and selling the ability to track users over HTTP to advertisers. 



It worked in a rather sophisticated way. When a user visited a site, the provider inserted a special HTTP header into the request: UIDH (Unique Identifier Header), a value unique to each subscriber, which made it possible to unambiguously identify the computer or smartphone from which the page was opened. This was done with the much-discussed DPI (deep packet inspection) technology.



The problem is that the user has practically no influence on this process, because everything happens on the side of the Internet access provider. The ID is embedded after the request has left the browser, on its way to the site.
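The injection step can be sketched as a function that a carrier's middlebox might apply to each plain-HTTP request. This is an illustration under stated assumptions: the header name X-UIDH is the one reported in the Verizon case, while the function inject_uidh, the secret and the way the token is derived are hypothetical.

```python
import hashlib


def inject_uidh(raw_request: bytes, subscriber: str,
                secret: bytes = b"carrier-secret") -> bytes:
    """Add a per-subscriber tracking header to an unencrypted HTTP request."""
    # Split the header section from the body at the blank line.
    head, sep, body = raw_request.partition(b"\r\n\r\n")
    # Derive a stable token from the subscriber identity (hypothetical scheme).
    token = hashlib.sha256(secret + subscriber.encode()).hexdigest()[:32]
    # Append the extra header after the ones the browser sent.
    head += b"\r\nX-UIDH: " + token.encode()
    return head + sep + body


request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
print(inject_uidh(request, "+15550100").decode())
```

The browser never sees the added header, which is why nothing on the user's device can remove it. The sketch also shows why it only works on plain HTTP: with HTTPS the middlebox cannot read or modify the header section at all.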



The information in this supercookie is not stored locally and therefore cannot be deleted. Ad blockers can't do anything either. In addition to the address of the site the browser is going to and the time of the request, UIDH can convey the mobile phone number from which the user is surfing the Internet and other data.



The most famous scandal around this tracking method involves Verizon, a US cellular provider. Verizon began using UIDH in 2012 to serve personalized ads, actively trading in the personal data of its customers. Only in 2014 did the company publicly admit this, burying the mention deep in a Q&A on its website. Nevertheless, it was noticed, and a flurry of criticism fell on Verizon for such a shameless attitude towards its users. In 2015, the company was forced to add a setting to the user's account that disables UIDH for their devices, and in 2016 it was finally finished off: the FCC fined the company $1.35 million.



Alas, Verizon isn't alone. 



The organization Access created a special website, Amibeingtracked.com (current address: www.accessnow.org/aibt/), and began to analyze the HTTP headers of mobile phone users who agreed to be tested. Users from all over the world took part. It turned out that 15.3% of the requests contained supercookies, and that almost all major mobile operators tracked their users this way.



There is a suspicion that Verizon backed down only because this method stopped being relevant with the widespread adoption of SSL, which I mentioned above. And the fine does not look impressive next to recent antitrust fines, where the bill runs into hundreds of millions of dollars. It is likely that Verizon made much more money from selling personal data.



It should also be noted that protection against such surveillance comes not only from sites that work over HTTPS but also from surfing through a VPN. In that case, too, the provider cannot insert UIDH. Both methods of protection are relevant: the recent blanket blocking of sites has taught many people some computer literacy, and a lot of them have learned about VPNs.



HSTS-Supercookie: Why SSL Won't Save Us



It would seem that encryption should protect against eavesdropping, but it turned out not to guarantee privacy. The next method is purely academic and hardly practical. It was demonstrated in 2015 by Sam Greenhalgh of Radical Research as a showcase of ingenuity.



It is based on the fact that for each site the browser keeps a special boolean flag (HSTS, HTTP Strict Transport Security) that records whether the site was opened over HTTPS and asked to be opened only that way from then on. For example, the last time the user visited habr.com it was over HTTPS, and flibusta.is was visited over HTTP (alas, this is still the case because the library's admins are slow to act). The browser will then hold data something like this:



habr.com: 1;
      
      





And flibusta.is will not be mentioned in the HSTS database. 



Thus, you can register several domains of the form: 00-hsts-supercookie.net, 01-hsts-supercookie.net, 02-hsts-supercookie.net, 03-hsts-supercookie.net. 



Then write a script that, when a user enters your site, requests these "cookie" domains in a pattern that is unique to that user, leaving a table like this in the HSTS database of their browser:



00-hsts-supercookie.net: 1;

02-hsts-supercookie.net: 1;
      
      





Later the "cookies" are read back, substituting 0 for the sites that are not in the database. In this example the number 1010 is formed. If you register two or three dozen domains, there will be enough unique identifiers for every visitor who could theoretically enter the site.
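The write and read steps can be modeled in a few lines. This is a simulation only: a Python set stands in for the browser's HSTS database, and the helper names write_id and read_id are hypothetical. The domain names follow the pattern given in the text.

```python
# Four helper domains, one per bit of the identifier.
DOMAINS = [f"{i:02d}-hsts-supercookie.net" for i in range(4)]


def write_id(hsts_db: set[str], bits: str) -> None:
    """Make the browser visit, over HTTPS, only the domains whose bit is 1."""
    for domain, bit in zip(DOMAINS, bits):
        if bit == "1":
            hsts_db.add(domain)  # the visit leaves an HSTS entry behind


def read_id(hsts_db: set[str]) -> str:
    """Recover the bits by checking which domains have an HSTS entry."""
    return "".join("1" if domain in hsts_db else "0" for domain in DOMAINS)


browser_hsts = set()
write_id(browser_hsts, "1010")
print(sorted(browser_hsts))  # the 00- and 02- domains
print(read_id(browser_hsts))  # 1010

# With n domains there are 2**n possible identifiers.
print(2 ** 30)  # 1073741824
```

Thirty domains already give more than a billion distinct identifiers, which is why "two or three dozen" is enough.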



To be fair, browser developers reacted to this, and the data in this table is now cleared together with cookies. But besides this flag, modern browsers store a lot of other information, which will be discussed below.



HTML5-Supercookie



Progress marches on: HTML5 has confidently conquered the Internet, Flash has been killed, and the new standard makes possible miracles on a page that used to be out of reach. But has the Internet become safer? Unfortunately, no.



All of these technologies provide a lot of information through which a unique "digital fingerprint" of the browser can be formed. 



If you visit modern sites, you use the capabilities of HTML5, which means the site receives the user-agent, canvas size, screen resolution and color depth, system fonts and much more, all so that it can be rendered adequately in the browser. In addition, the HTML5 standard allows data to be saved in localStorage, a special storage that the user cannot reach through the usual menu for clearing cookies, browsing history or browser cache.
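How such attributes turn into an identifier can be sketched as follows. This is a simplified illustration, not any particular tracker's algorithm: the function name fingerprint, the attribute names and the sample values are assumptions, and real fingerprinting scripts collect far more signals.

```python
import hashlib
import json


def fingerprint(attrs: dict) -> str:
    """Hash a set of browser attributes into a short, stable identifier."""
    # Serialize with sorted keys so the same attributes always produce
    # the same string regardless of the order they were collected in.
    canonical = json.dumps(attrs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Sample values, made up for the example.
browser = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/85.0",
    "screen": "1920x1080",
    "color_depth": 24,
    "fonts": ["DejaVu Sans", "Liberation Serif", "Noto Sans"],
}
print(fingerprint(browser))
```

Nothing is stored on the device: the identifier is recomputed from scratch on every visit, so there is nothing for the user to delete.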



You can see everything that the browser sends to the site at www.deviceinfo.me . Try to enter - the amount of information is impressive! 



There are also sites that compare all the information transmitted and calculate how unique your particular device is among many others. For example, I went to coveryourtracks.eff.org, a site run by the non-profit human rights organization Electronic Frontier Foundation (EFF), and found the following:



Your Results



Your browser fingerprint appears to be unique among the 300,802 tested in the past 45 days.



Currently, we estimate that your browser has a fingerprint that conveys at least 18.2 bits of identifying information.



The measurements we used to obtain this result are listed below. You can read more about our methodology, statistical results, and some defenses against fingerprinting here.


In theory, you can slightly reduce the uniqueness of your browser by using a proxy or VPN together with private browsing mode. Alas, this did not help me much: a private tab through the VPN built into Opera gave almost the same results:



Your Results



Your browser fingerprint appears to be unique among the 300,854 tested in the past 45 days.



Currently, we estimate that your browser has a fingerprint that conveys at least 18.2 bits of identifying information.



The measurements we used to obtain this result are listed below. You can read more about our methodology, statistical results, and some defenses against fingerprinting here.


Still, being one device in three hundred thousand is not that bad: in a sea where hundreds of millions of smartphones and desktops float, there is a place to hide.
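The "18.2 bits" figure in the results quoted above appears to be consistent with a simple relationship: a fingerprint that singles out one browser among N carries log2(N) bits of identifying information. A small check, assuming that relationship (the function name identifying_bits is hypothetical):

```python
import math


def identifying_bits(population: int) -> float:
    """Bits needed to single out one browser among `population` browsers."""
    return math.log2(population)


print(round(identifying_bits(300_802), 1))  # 18.2
print(round(identifying_bits(300_854), 1))  # 18.2

# For comparison: singling out one device among 500 million would take
# about 29 bits, so 18.2 bits alone does not pinpoint a device globally.
print(round(identifying_bits(500_000_000), 1))
```

This is why the site says "at least": the test can only show that the fingerprint is unique within its own sample.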



Cache-Supercookie: A sophisticated way to identify a user through the cache



Another way to spy on a user is the sophisticated use of cached information. Different sites often use the same images on their pages, and browsers save disk space and bandwidth by caching such data. An image or a font is downloaded once and from then on is loaded from local storage. The technology is as old as the Internet, but even here a way was found to tell one user from another.



For example, a tracker encodes a unique ID into an image that the browser caches while the user is on one site. A second site embeds the same image, and when the user visits it, the tracker extracts the ID from the cached copy.



You can fight such supercookies by keeping separate caches for different sites. A week ago, the Firefox team reported that they had included this mechanism in version 85 of the browser. The amount of cached data increases, but it becomes more difficult to track the user.
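The difference between a shared and a per-site cache can be shown with a toy model. This is a sketch only: the class BrowserCache, the example host names and the "id=..." payloads are hypothetical, and a real browser cache is considerably more involved.

```python
class BrowserCache:
    """Toy HTTP cache, either shared by all sites or partitioned per site."""

    def __init__(self, partitioned: bool):
        self.partitioned = partitioned
        self.store = {}

    def fetch(self, top_site: str, url: str, fresh_body: str) -> str:
        """Return the cached body if present, else cache and return fresh_body."""
        # A partitioned cache adds the visited site to the cache key.
        key = (top_site, url) if self.partitioned else url
        if key not in self.store:
            self.store[key] = fresh_body
        return self.store[key]


IMAGE = "https://tracker.example/pixel.png"

shared = BrowserCache(partitioned=False)
shared.fetch("site-a.example", IMAGE, "id=AAA")
print(shared.fetch("site-b.example", IMAGE, "id=BBB"))  # id=AAA

split = BrowserCache(partitioned=True)
split.fetch("site-a.example", IMAGE, "id=AAA")
print(split.fetch("site-b.example", IMAGE, "id=BBB"))   # id=BBB
```

With the shared cache, the second site gets back the ID issued on the first one, which links the two visits. With partitioning, each site sees its own copy, at the cost of storing the image twice.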



What kind of cookies will we be fed in the future?



All this struggle seems right: privacy is something that the modern world is trying to tear away from us by any means. Cameras connected to face and license plate recognition services have long been a reality; they catch criminals and issue speeding tickets. Mobile operators use DPI to shape the traffic of the few remaining torrent clients and cut their bandwidth so that they do not get in the way of others watching YouTube videos. "Prohibited" sites are blocked.



I was recently told a story about how, in a small Russian town, surveillance cameras helped catch a group of street thugs (gopniks) who amused themselves by beating up lone passers-by and then intimidating them so that they would not go to the police. When the patrol arrived at the scene, the gopniks stood there calmly, not worried about their safety, but the officers who got out of the cars did not even ask questions and simply detained everyone involved. The cameras that had filmed the beating had recognized all of them by their faces, and no additional testimony was needed.



So is it really that scary when advertising companies track our actions on the Internet?



Hard to tell. It's not just about surveillance but also about annoying "personalized" selling. On the one hand, as far as I can remember, I have only once taken up an offer from search results that was marked "Advertising". Over the years people have developed "banner blindness", and even without an ad blocker this method does not work very well. On the other hand, the high search relevance on Google, Amazon or AliExpress works thanks to clever trackers that follow our activity.



But almost everyone can recall a mysterious story in which they mentioned some category of goods in a face-to-face conversation or a Telegram chat, and a few minutes later their smartphone was showing sites with banners for the very things discussed. And the most annoying innovation is the pop-up warnings about the use of cookies that do not go away until you press the button to agree.



Is it good or bad? Probably neither one nor the other. It has become commonplace, and privacy is an illusion.





