Blacklight - Website Privacy Inspector





Blacklight is a real-time website privacy inspector.



This tool emulates the ways a user browsing the web can be observed. A user enters a URL into Blacklight; the inspector then navigates to the website, scans for known types of privacy violations, and immediately returns a privacy analysis of the inspected site.



Blacklight visits each website with a headless browser (a browser without a graphical interface) running specialized software created by The Markup. The software monitors which scripts on the site could potentially observe the user by running seven tests, each examining a different known observation method.



Blacklight monitors the following types of surveillance:



  • Third-party cookies
  • Advertising trackers
  • Keylogging
  • Session recording
  • Canvas fingerprinting
  • Facebook pixel tracking
  • Google Analytics "Remarketing Audiences"


These tests and their limitations are described in more detail below.



Blacklight is built on the NodeJS JavaScript environment and the Puppeteer Node library, which provides a high level of control over the Chromium browser (the open-source version of Chrome). When a user enters a URL into Blacklight, the tool launches a headless browser with a fresh profile and visits the site's home page, as well as a randomly selected page deeper within the same website.






While the browser visits the website, it runs specialized software in the background that monitors scripts and network requests to understand when and how user data is collected. To monitor scripts, Blacklight modifies various properties of the browser's Window API that can be used for fingerprinting. This lets Blacklight track which script called a specific function, using the Stacktrace-js package. Network requests are collected using the monitoring tool built into the Puppeteer API.
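
The instrumentation idea described above can be sketched as follows. This is a minimal illustration, not Blacklight's code: the helper `instrumentMethod`, the `CallRecord` shape, and the `fakeCanvasApi` object standing in for a browser API are all hypothetical, and the sketch stores the raw `Error().stack` string rather than resolving the calling script with Stacktrace-js.

```typescript
// One recorded call to an instrumented function.
interface CallRecord {
  method: string;
  args: unknown[];
  stack: string; // raw stack trace; a real tool would parse out the script URL
}

// Replace target[method] with a wrapper that logs the call, then delegates
// to the original implementation so page behavior is unchanged.
function instrumentMethod(
  target: Record<string, any>,
  method: string,
  log: CallRecord[],
): void {
  const original = target[method];
  target[method] = function (this: unknown, ...args: unknown[]) {
    log.push({ method, args, stack: new Error().stack ?? "" });
    return original.apply(this, args);
  };
}

// Hypothetical stand-in for a fingerprintable browser API.
const fakeCanvasApi: Record<string, any> = {
  toDataURL: (type: string) => `data:${type};base64,AAAA`,
};

const callLog: CallRecord[] = [];
instrumentMethod(fakeCanvasApi, "toDataURL", callLog);
const dataUrl: string = fakeCanvasApi.toDataURL("image/png");
```

After the call, `callLog` holds one entry whose `stack` field identifies the caller, which is the information a tool of this kind needs to attribute the call to a script.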



Blacklight uses script data and network requests to run the seven tests listed above. After that, it closes the browser and generates a report for the user.



Blacklight records a list of all URLs that the inspected website requested. It also creates a list of all requested domains and subdomains. The publicly available tool does not save these lists unless the user chooses to share the results with us using the appropriate option.



We define domain names using the Public Suffix + 1 method. By first-party domain, we mean any domain corresponding to the visited website, including its subdomains. By third-party domain, we mean any domain that does not correspond to the visited website. The tool compares the list of third-party domains from the website's requests against DuckDuckGo's Tracker Radar dataset.
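
A minimal sketch of the Public Suffix + 1 comparison, assuming a tiny hard-coded suffix set. `SAMPLE_PUBLIC_SUFFIXES`, `registrableDomain`, and `isThirdParty` are hypothetical names; a real implementation would load the full Public Suffix List instead of the four entries used here.

```typescript
// Tiny illustrative subset of the Public Suffix List; the real list has
// thousands of entries and should be loaded from a maintained source.
const SAMPLE_PUBLIC_SUFFIXES = new Set(["com", "org", "net", "co.uk"]);

// Return the registrable domain ("public suffix + 1") for a hostname.
function registrableDomain(hostname: string): string {
  const labels = hostname.toLowerCase().split(".");
  // Scan from the longest candidate suffix to the shortest, then keep
  // one more label to the left of the first suffix that matches.
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join(".");
    if (SAMPLE_PUBLIC_SUFFIXES.has(candidate)) {
      return labels.slice(Math.max(i - 1, 0)).join(".");
    }
  }
  return hostname.toLowerCase(); // unknown suffix: fall back to the full host
}

// A request is third-party when its registrable domain differs from the page's.
function isThirdParty(requestHost: string, pageHost: string): boolean {
  return registrableDomain(requestHost) !== registrableDomain(pageHost);
}
```

Under this definition, a subdomain of the visited site compares as first-party, while a host under a different registrable domain compares as third-party.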



Merging these two data sources allows Blacklight to add the following information about third-party domains found on the inspected site:



  1. The name of the domain's owner.
  2. The categories DuckDuckGo assigns to each domain, describing its observed purposes and intent.


This additional information about third-party domains is provided to users as context for Blacklight test results. Among other things, this information is used to calculate the number of ad-related trackers present on the website.



Blacklight runs its tests from the root of the URL entered into the tool's interface. For example, if the user enters example.com/sports, Blacklight starts its scan at example.com and drops the /sports path. If the user enters sports.example.com, Blacklight starts its scan at sports.example.com.
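
The normalization described above can be sketched as a small function. The name `scanRoot` is hypothetical, and defaulting to `https://` when the user omits a scheme is an assumption of this sketch, not something the text specifies.

```typescript
// Reduce whatever the user typed to the site root that gets scanned.
function scanRoot(userInput: string): string {
  // Assume https when the user omits the scheme (an assumption of this sketch).
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(userInput)
    ? userInput
    : `https://${userInput}`;
  const parsed = new URL(withScheme);
  // Only the hostname is kept; path, query, and fragment are discarded.
  return parsed.hostname;
}
```

Note that the subdomain is preserved, so `sports.example.com` and `example.com` are treated as different starting points.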



Blacklight's results for each requested domain are cached for 24 hours, and the cached report is returned to any user who requests the same website within that period. This prevents malicious use of the tool to overload a website with thousands of automated visits.
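
A minimal sketch of such a 24-hour cache, assuming an in-memory store keyed by domain. The `ReportCache` class and its injectable clock are hypothetical; the text does not describe how Blacklight actually stores cached reports.

```typescript
const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

interface CachedReport<T> {
  report: T;
  storedAt: number; // timestamp in milliseconds
}

class ReportCache<T> {
  private entries = new Map<string, CachedReport<T>>();
  private now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Return the cached report, or undefined if it is missing or expired.
  get(domain: string): T | undefined {
    const hit = this.entries.get(domain);
    if (!hit) return undefined;
    if (this.now() - hit.storedAt >= CACHE_TTL_MS) {
      this.entries.delete(domain); // expired: force a fresh scan
      return undefined;
    }
    return hit.report;
  }

  set(domain: string, report: T): void {
    this.entries.set(domain, { report, storedAt: this.now() });
  }
}
```

A caller would check `get(domain)` first and only launch a new scan, followed by `set(domain, report)`, when the result is `undefined`.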



Blacklight also tells users whether a site's results are higher than, lower than, or roughly equal to those of the top 100,000 websites on the Tranco List. More on this below.



The Blacklight codebase is open source and available on GitHub; it can also be downloaded as an NPM module.



Our analysis is limited. Blacklight emulates a user visiting a website, but its automated behavior differs from that of a human, and this difference can trigger different kinds of surveillance. For example, an automated request may trigger more fraud checks but fewer ads.



Given the dynamic nature of web technologies, some of these tests may also become obsolete over time. In addition, new and acceptable uses of a technology may emerge that Blacklight still flags as violations.



For this reason, Blacklight results should not be treated as a final verdict on a website's potential privacy violations. Rather, they should be regarded as an initial automated survey that requires further investigation before any conclusion is drawn.



Previous work



Blacklight builds on various privacy measurement tools written over the past decade.



It uses JavaScript instrumentation, which allows it to track calls to the browser's JavaScript APIs. This aspect of the work is based on OpenWPM, an open-source web privacy measurement tool created by Steven Englehardt, Gunes Acar, Dillon Reisman, and Arvind Narayanan of Princeton University. The tool is currently maintained by Mozilla.



OpenWPM was used by Princeton's Web Transparency and Accountability Project, which monitored websites and services to study how companies collect and use data and mislead users.



Through a series of studies conducted between 2015 and 2019, Princeton researchers identified a range of privacy-infringing technologies. These include browser fingerprinting and cookie synchronization, as well as session replay scripts that collect passwords and sensitive user data. One notable example is the leak of prescription and health data from walgreens.com.



Four of the seven tests that Blacklight performs are based on the techniques described in the Princeton research mentioned above: canvas fingerprinting, keylogging, session recording, and third-party cookies.



OpenWPM contains code and techniques from other privacy research tools, including FourthParty, Privacy Badger, and FP Detective:



  • FourthParty was an open-source platform for measuring dynamic web content, launched in August 2011 and maintained until 2014. It has been used in various studies, notably one describing how websites such as Home Depot leaked their users' usernames to third parties. Blacklight uses FourthParty's methodology to monitor the transmission of user information over the network to third parties.
  • Privacy Badger, released by the Electronic Frontier Foundation in 2014.
  • FP Detective, from 2013.


Blacklight's data analysis was inspired in part by the Website Evidence Collector, developed by the European Data Protection Supervisor (EDPS). Website Evidence Collector is a NodeJS package that uses the Puppeteer library to study how a website collects users' personal data. Some of the categories of data Blacklight collects were chosen by the EDPS.



Other projects that influenced the development of Blacklight include UC Berkeley's Web Privacy Census of 2012 and the Wall Street Journal's "What They Know" series.



How we analyzed each type of tracking



Third-party cookies



Third-party cookies are small pieces of data that tracking companies store in a user's web browser when the user visits a website. A cookie is a piece of text, usually a unique number or string of characters, that identifies the visitor on other websites containing tracking code from the same company. Hundreds of companies use third-party cookies to build user profiles and display customized ads based on users' behavior.



Several popular browsers, including Edge, Brave, Firefox, and Safari, block third-party tracking cookies by default, and Chrome's developers have announced that they will phase them out.



What Blacklight Tests



Blacklight monitors network requests for the "Set-Cookie" header and monitors all domains that set cookies through the document.cookie JavaScript property. Blacklight identifies third-party cookies as cookies whose domain does not match the website being visited. We look up these third-party domains in DuckDuckGo's Tracker Radar to see who owns them, how prevalent they are, and what types of services they provide.
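
The cookie check can be sketched as below, assuming the site's registrable domain has already been computed. `cookieFromHeader`, `isThirdPartyCookie`, and the `ObservedCookie` shape are hypothetical names, and the header parsing is deliberately simplified (it ignores attributes other than Domain).

```typescript
interface ObservedCookie {
  name: string;
  domain: string;
}

// Pull the Domain attribute out of a Set-Cookie header value, falling back
// to the host that sent the response when the attribute is absent.
function cookieFromHeader(setCookie: string, responseHost: string): ObservedCookie {
  const parts = setCookie.split(";").map((p) => p.trim());
  const name = parts[0].split("=")[0];
  const domainAttr = parts.find((p) => p.toLowerCase().startsWith("domain="));
  const domain = domainAttr ? domainAttr.slice("domain=".length) : responseHost;
  // A leading dot is legacy syntax and does not change the match.
  return { name, domain: domain.replace(/^\./, "").toLowerCase() };
}

// A cookie is third-party when its domain is neither the site's domain
// nor one of its subdomains.
function isThirdPartyCookie(cookie: ObservedCookie, siteDomain: string): boolean {
  const site = siteDomain.toLowerCase();
  return !(cookie.domain === site || cookie.domain.endsWith("." + site));
}
```

Cookies set through document.cookie would go through the same `isThirdPartyCookie` comparison once the setting script's domain is known.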



Keylogging



Keylogging is when third parties track the text a user types into a web page before the user clicks the submit button. This technique is used for a variety of purposes, including identifying anonymous users by matching them to postal addresses and real names.



There are other reasons for keylogging as well, such as providing an autocomplete feature. Blacklight has no way of recognizing the intent with which the target website is using this technique.



What Blacklight Tests



To test if the site is keylogging, Blacklight enters predefined text (see Appendix) into all input fields, but never clicks the submit button. It monitors network requests to check if the entered data is being passed on to any servers.
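
A minimal sketch of the matching step, assuming the network requests have already been collected. `findKeyloggingRequests` and the `OutgoingRequest` shape are hypothetical; the two sample values are taken from the Appendix, and only plain and URL-encoded forms are checked here, not the hashed or base64 variants.

```typescript
interface OutgoingRequest {
  url: string;
  postData?: string;
}

// Values typed into form fields during the scan (subset of the Appendix table).
const TYPED_VALUES = ["blacklight-headless@themarkup.org", "idaaaa_tarbell"];

// Return the requests that carry any typed value, plain or URL-encoded,
// in either the URL or the request body.
function findKeyloggingRequests(
  requests: OutgoingRequest[],
  typedValues: string[] = TYPED_VALUES,
): OutgoingRequest[] {
  const needles = typedValues.flatMap((v) => [v, encodeURIComponent(v)]);
  return requests.filter((req) => {
    const haystack = `${req.url}\n${req.postData ?? ""}`;
    return needles.some((needle) => haystack.includes(needle));
  });
}
```

Because the submit button is never clicked, any request returned by this filter was sent while the text was merely being typed.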



Session recording



Session recording is a technology that allows third parties to track and record everything a user does on a web page, including mouse movements, clicks, page scrolling, and all form input, even if the submit button is never pressed.



In a 2017 study, researchers at Princeton University found that session recorders collect sensitive information such as passwords and credit card numbers. When the researchers contacted the companies involved, most responded quickly and fixed the cause of the data leaks. However, the study emphasizes that these are not just bugs but unsafe practices that, according to the researchers, should be stopped entirely. Most companies that provide session recording say they use the data to give their customers (the websites that install the technology) useful information about how to improve the usability of the site. One company, Inspectlet, describes its service as tracking "the behavior of individual users on a site as if we were behind them." (Inspectlet did not respond to an email asking for comment.)





Screenshot of Inspectlet, a well-known session recording service provider.



What Blacklight Tests



By session recording, we mean the loading of a special type of script by a company known for providing session recording services.



Blacklight monitors network requests for specific URL substrings, which, according to a list compiled by researchers at Princeton University in 2017, are only encountered when recording sessions.
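
The substring check can be sketched in a few lines. The patterns in `SESSION_RECORDER_PATTERNS` are placeholders invented for this illustration, not entries from the Princeton list, and `findSessionRecorders` is a hypothetical name.

```typescript
// Placeholder patterns; the actual list was compiled by the Princeton researchers.
const SESSION_RECORDER_PATTERNS = [
  "replay-vendor.example/recorder.js",
  "sessions.example/rec/",
];

// Return every requested URL that contains a known session-recorder substring.
function findSessionRecorders(
  requestUrls: string[],
  patterns: string[] = SESSION_RECORDER_PATTERNS,
): string[] {
  return requestUrls.filter((url) =>
    patterns.some((pattern) => url.includes(pattern)),
  );
}
```

A non-empty result only shows that a recorder script was loaded; as noted in this section, it does not show that a given visit was actually recorded.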



Sometimes keylogging is performed as part of session recording. In such cases, Blacklight reports both keylogging and session recording, since both behaviors are observed, even though both tests are detecting the same script.



Blacklight accurately recognizes when a website loads these scripts. However, companies usually record only a sample of site visits, so not every user or every visit is recorded.



Canvas fingerprinting



Fingerprinting refers to a group of techniques that try to identify a browser without creating a cookie. They can identify users even if they have blocked all cookies.



Canvas fingerprinting is a type of fingerprinting that identifies a user by drawing shapes and text on the user's web page and measuring the slight differences in how they are rendered.





Four examples of canvas fingerprinting found by Blacklight.



These differences in font rendering, smoothing, anti-aliasing, and other aspects are used by marketers and others to identify individual devices. All major browsers except Chrome try to discourage canvas fingerprinting, either by refusing data requests from scripts known to engage in the practice or by trying to standardize users' fingerprints.



The image above shows examples of canvas types used by fingerprinting scripts. Such canvases are usually invisible to the user.



What Blacklight Tests



We follow the methodology described in a paper by researchers at Princeton University to recognize when the HTML canvas element is being used for tracking. We use the following criteria to identify canvases that are rendered for fingerprinting:



  • The canvas element's height and width properties must not be less than 16px.
  • Text must be written to the canvas with at least ten characters.
  • The script should not call the save, restore, or addEventListener methods of the rendering context.
  • The script extracts the image using toDataURL or a single call to getImageData specifying an area of at least 16px × 16px.
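
The four criteria above can be expressed as a single predicate over a summary of what a script did with a canvas. This is a sketch under stated assumptions: the `CanvasActivity` shape and `looksLikeCanvasFingerprinting` are hypothetical, and the text criterion is interpreted here as "at least one written string of ten or more characters".

```typescript
// Summary of one script's activity on one canvas (hypothetical shape).
interface CanvasActivity {
  width: number;
  height: number;
  textsWritten: string[]; // strings the script drew onto the canvas
  methodsCalled: string[]; // rendering-context methods the script called
  toDataURLCalls: number;
  getImageDataAreas: { width: number; height: number }[];
}

const MIN_SIZE_PX = 16;
const MIN_TEXT_CHARS = 10;
const EXCLUDED_METHODS = ["save", "restore", "addEventListener"];

function looksLikeCanvasFingerprinting(a: CanvasActivity): boolean {
  const bigEnough = a.width >= MIN_SIZE_PX && a.height >= MIN_SIZE_PX;
  const enoughText = a.textsWritten.some((t) => t.length >= MIN_TEXT_CHARS);
  const noExcludedCalls = !a.methodsCalled.some((m) => EXCLUDED_METHODS.includes(m));
  // Either toDataURL was used, or exactly one getImageData call read a
  // region of at least 16px x 16px.
  const singleLargeRead =
    a.getImageDataAreas.length === 1 &&
    a.getImageDataAreas[0].width >= MIN_SIZE_PX &&
    a.getImageDataAreas[0].height >= MIN_SIZE_PX;
  const extractsImage = a.toDataURLCalls > 0 || singleLargeRead;
  return bigEnough && enoughText && noExcludedCalls && extractsImage;
}
```

All four conditions must hold at once, which is what keeps ordinary drawing code (which typically calls `save` and `restore`) from being flagged.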


We have not seen this in practice, but Blacklight could mistakenly flag a legitimate use of canvas that happens to match these heuristics. To account for this, the tool captures the image drawn by the script and displays it. Users can judge how the canvas is being used simply by looking at the image. The results of a typical fingerprinting script are shown above.



Advertising trackers



Advertising trackers (ad trackers) are technologies that identify users and collect information about them. Such technologies are usually (but not always) present with some degree of consent from the website's owners. They are used to collect analytics about website users and to target advertising, and data brokers and other data collectors use them to build user profiles. They usually take the form of JavaScript scripts and web beacons.



Web beacons are small 1px × 1px images placed on websites by third parties for tracking purposes. With this technique, third parties can observe user behavior: when a particular user visited the site, what type of browser they used, and their IP address.



What Blacklight Tests



Blacklight checks all network requests against the EasyPrivacy list of URLs and URL substrings known to be used for tracking, and flags requests made to matching URLs.



Blacklight records only requests made to third-party domains. It ignores any URL pattern in the EasyPrivacy list that matches the domain of the site being scanned. For example, EFF hosts its own analytics, which is why it makes requests to its analytics subdomain https://anon-stats.eff.org. If the user enters eff.org, Blacklight does not treat calls to anon-stats.eff.org as requests to a third-party domain.



We look up these third-party domains in the DuckDuckGo Tracker Radar dataset to see who owns them, how prevalent they are, and what types of services they provide. We only include third-party domains that fall into the Ad Motivated Tracking categories of the Tracker Radar dataset.
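
A minimal sketch of this filter, assuming the site's own domain is known. The entries in `TRACKER_SUBSTRINGS` are invented placeholders rather than real EasyPrivacy rules, and EasyPrivacy's actual filter syntax is richer than plain substrings; `findThirdPartyTrackers` and its helpers are hypothetical names.

```typescript
// Hypothetical patterns standing in for EasyPrivacy entries.
const TRACKER_SUBSTRINGS = ["/pixel.gif?", "analytics-vendor.example/collect"];

function hostOf(url: string): string {
  return new URL(url).hostname;
}

// True when the host is the scanned site's domain or one of its subdomains.
function isSameSite(host: string, siteDomain: string): boolean {
  return host === siteDomain || host.endsWith("." + siteDomain);
}

// Keep only requests that match a tracker pattern AND go to another site.
function findThirdPartyTrackers(requestUrls: string[], siteDomain: string): string[] {
  return requestUrls.filter(
    (url) =>
      !isSameSite(hostOf(url), siteDomain) &&
      TRACKER_SUBSTRINGS.some((pattern) => url.includes(pattern)),
  );
}
```

With `siteDomain` set to `eff.org`, a request to `anon-stats.eff.org` is skipped even if its URL matches a pattern, mirroring the EFF example above.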



Facebook pixel



The Facebook pixel is a piece of code created by Facebook that allows other websites to target their visitors with ads on Facebook. Some of the most common actions tracked by the pixel are viewing a page or specific content, adding billing information, and making a purchase.



What Blacklight Tests



Blacklight looks for network requests from the site to Facebook and examines the URL query parameters for data matching the patterns described in the Facebook pixel documentation. We look for three types of data: "standard events", "custom events", and "advanced matching".



Google Analytics "Remarketing Audiences"



Google Analytics is the most popular website analytics platform today. According to whotracks.me, 41.7% of web traffic is analyzed by Google Analytics. Most of the service's functionality gives developers and website owners information about how a site's audience interacts with it, but the tool also lets a website build custom audience lists based on user behavior and then target ads to those visitors across the web using Google Ads and Display & Video 360. Blacklight checks whether the sites it scans use this feature, but not how it is used.



What Blacklight Tests



Blacklight looks for network requests from the site being scanned to a URL that starts with "stats.g.doubleclick" and that also contains a Google account ID prefixed with "UA-". This is described in more detail in the Google Analytics developer documentation.
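
The check can be sketched as a predicate over a request URL. `isRemarketingRequest` is a hypothetical name, and the regular expression assumes the account ID appears in the query string as "UA-" followed by digits; the text only states that the ID is prefixed with "UA-".

```typescript
// True when the request targets a stats.g.doubleclick host and carries an
// account ID with the "UA-" prefix somewhere in its query string.
function isRemarketingRequest(requestUrl: string): boolean {
  const parsed = new URL(requestUrl);
  const hostMatches = parsed.hostname.startsWith("stats.g.doubleclick");
  const hasUaId = /UA-\d+/.test(parsed.search); // assumed ID shape
  return hostMatches && hasUaId;
}
```

Both conditions are required: a "UA-" ID sent to any other host, or a request to the doubleclick host without such an ID, is not counted.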



Survey



To determine the prevalence of tracking technologies on the web, we used Blacklight to scan the 100,000 most popular websites according to the Tranco List. The data and analysis code can be found on GitHub. Blacklight successfully captured data for 81,593 of these URLs. For the rest, either the domain failed to resolve, the request timed out after several attempts, or the web page could not be loaded. The percentages shown below are based on 81,617 successful results.



The main findings of our survey:



  • 6% of websites used canvas fingerprinting.
  • 15% of websites loaded scripts from known session recording services.
  • 4% of websites logged keystrokes.
  • 13% of websites did not load any third-party cookies or tracking network requests.
  • The median number of third-party cookies was three.
  • The median number of ad trackers loaded was seven.
  • 74% of websites loaded Google tracking technology.
  • 33% of websites loaded Facebook tracking technology.
  • 50% of websites used the Google Analytics remarketing feature.
  • 30% of websites used the Facebook pixel.


We have classified as Google tracking technology any network requests made to any of the following domains:



  • google-analytics.com
  • doubleclick.net
  • googletagmanager.com
  • googletagservices
  • googlesyndication.com
  • googleadservices
  • 2mdn.net


We have classified as Facebook tracking technology any network requests made to any of the following Facebook domains:



  • facebook.com
  • facebook.net
  • atdmt.com


Limitations



Blacklight's analysis is limited by four main factors:



  1. Blacklight simulates user behavior rather than reproducing real behavior, so it may trigger different responses from tracking systems.
  2. The website being scanned may track users' actions for benign purposes.
  3. False positives (possible with canvas fingerprinting): Very rarely, reasonable use of the HTML canvas element matches the heuristics Blacklight uses to identify canvas fingerprinting.
  4. : Javascript- Blacklight window API . , jQuery, jQuery , Blacklight , . , ; , 100 000 .


Regarding false positives: when Blacklight visits a site, the site can see that the request comes from computers hosted in Amazon's AWS cloud infrastructure. Because botnets often run on cloud infrastructure, our tool may trigger the site's bot detection software, which can include canvas fingerprinting. This can produce false positives in the canvas fingerprinting test, even though in this case the technique is used to detect bots rather than to track users.



To test this, we took a random sample of 1,000 sites from the top of the Tranco List that we had already run through Blacklight on AWS. We ran this sample through the Blacklight software on a local computer with a New York IP address and found that the results of the local scan were very similar, but not identical, to the results from the cloud infrastructure.



Sample Results: Local Machine and AWS



Test                           Local   AWS
Canvas fingerprinting          8%      10%
Session recording              18%     19%
Keylogging                     4%      6%
Median third-party cookies     4       5
Median third-party trackers    7       8


Not all tracking that is invisible to the user is necessarily malicious. For example, canvas fingerprinting is used for fraud prevention because it allows devices to be identified, and keylogging can be used to implement autocomplete functionality.



Blacklight does not attempt to infer the reasons for using the specific tracking technologies it detects.



Nor can Blacklight accurately determine how a website is using the user data it collects by loading scripts to record sessions and monitor user behavior such as mouse movements and keystrokes.



Blacklight does not review a website's terms of service or privacy policy for any disclosure of its user tracking activities.



Appendix



Input Field Values



The table below lists the values we programmed Blacklight to type into input fields on websites. We used the Mozilla article on the autocomplete attribute for reference. Blacklight also checks for base64, md5, sha256, and sha512 versions of these values.



Autocomplete attribute    Blacklight value
Date                      01/01/2026
Email                     blacklight-headless@themarkup.org
Password                  SUPERS3CR3T_PASSWORD
Search                    TheMarkup
Text                      IdaaaaTarbell
Url                       themarkup.org
Organization              The markup
Organization Title        Non-profit newsroom
Current Password          S3CR3T_CURRENT_PASSWORD
New Password              S3CR3T_NEW_PASSWORD
Username                  idaaaa_tarbell
Family Name               Tarbell
Given Name                Idaaaa
Name                      IdaaaaTarbell
Street Address            PO Box # 1103
Address Line 1            PO Box # 1103
Postal Code               10159
CC-Name                   IDAAAATARBELL
CC-Given-Name             IDAAAA
CC-Family-Name            TARBELL
CC-Number                 4479846060020724
CC-Exp                    01/2026
CC-Type                   Visa
Transaction Amount        13371337


Acknowledgments



We thank Gunes Acar (University of Leuven), Steven Englehardt (Mozilla), and Arvind Narayanan and Jonathan Mayer (Princeton, CITP) for comments and suggestions on a draft of this article.





