Meduza, passports and shit code: why the passport numbers of all Internet voting participants ended up on the Internet

After the Internet voting ended, and it ended surprisingly well, I and many others had a lingering feeling that in Russia nothing could simply go that well. Now you can relax: reality did not disappoint, and we got double madness, both in the architecture of the solution and in its cryptography.



By the way, the Ministry of Telecom and Mass Communications still rules out ANY possibility of a leak of voters' passport data.



Meanwhile, the distribution of series of passports looks like this:



[image: distribution of passport series]



Let's reconstruct the events and try to understand how all this could have been avoided.



What happened?



On July 9, Meduza published the article "The authorities actually made public the personal data of all Internet voters", which described the degvoter.zip archive.



How can I find the degvoter.zip archive?



I found it like this. A careful search through Yandex led me to the page:

vudu7.vuduwiki.duckdns.org/mk.ru/https_check.ege.edu.ru.html



The text "Https checkvoter.gosuslugi.ru degvoter.zip" was found there. At that time it was dated 7.7.2020 (before Meduza's publication!); since then the text has "moved" to the top of the page and the date has changed.



The archive itself was removed from the Gosuslugi website, but a copy was preserved on web.archive.org, from where everyone interested in studying it, including me, downloaded it. To understand why this happened, I recommend consulting the primary source: the robots.txt file on the Gosuslugi website.



What's inside degvoter.exe?



The degvoter program is written in C# and is a hastily thrown-together WinForms application that works with a SQLite database. The files in the archive are dated 2020-06-30 22:17 (June 30, 2020). The application was clearly written in the shortest possible time: at that moment it was already 7:17 on July 1 in Kamchatka, and since polling stations there opened at 8:00, the deadline was closer than ever (good thing that only Moscow and Nizhny Novgorod voted electronically).
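The timing argument can be checked with a few lines of Python. This is only a sketch, and it assumes the archive timestamps are in Moscow time (UTC+3), which the text implies but does not state; Kamchatka is taken as UTC+12.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the file timestamps in the archive are Moscow time (UTC+3).
MSK = timezone(timedelta(hours=3))
KAMCHATKA = timezone(timedelta(hours=12))

file_time = datetime(2020, 6, 30, 22, 17, tzinfo=MSK)
kamchatka_time = file_time.astimezone(KAMCHATKA)

# Polling stations in Kamchatka opened at 8:00 local time on July 1.
opening = datetime(2020, 7, 1, 8, 0, tzinfo=KAMCHATKA)
minutes_left = (opening - kamchatka_time).total_seconds() / 60

print(kamchatka_time.strftime("%Y-%m-%d %H:%M"), int(minutes_left), "minutes before opening")
```

Under that assumption the build was finished less than an hour before the first polling stations opened.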



Passport verification code:

[image: passport verification code]

The application is the worst kind of shit code, both architecturally and cryptographically. Here is why:



Description of the architecture flaws and the principle of the attack on the recovery of passport identifiers



The program shipped with a local database containing a passports table with two fields, num and used, where num was SHA256(<series> + <number>).
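To make the scheme concrete, here is a minimal Python sketch of such a lookup (the original is a C# WinForms application, so this is not its actual code). The table and column names come from the description above. The ASCII encoding, the hex representation of the hash and the sample passport number are assumptions made for illustration.

```python
import hashlib
import sqlite3

def passport_hash(series: str, number: str) -> str:
    """Unsalted SHA-256 of series + number (encoding and hex form are assumptions)."""
    return hashlib.sha256((series + number).encode("ascii")).hexdigest()

# In-memory stand-in for the SQLite database shipped with the client.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE passports (num TEXT PRIMARY KEY, used INTEGER)")
# Hypothetical sample record; not real data.
db.execute("INSERT INTO passports VALUES (?, ?)", (passport_hash("4510", "123456"), 0))

def is_registered(series: str, number: str) -> bool:
    """Check whether the hash of the entered passport is present in the table."""
    row = db.execute(
        "SELECT used FROM passports WHERE num = ?",
        (passport_hash(series, number),),
    ).fetchone()
    return row is not None
```

Everything needed for the check (the hashes and the hashing method) sits on the client machine, which is exactly what makes the recovery attack possible.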



Very often, when a programmer without relevant experience tackles cryptography, he makes the same typical mistakes. One of them is using a hash function without any salt. A passport identifier consists of a 4-digit series and a 6-digit number [xxxx xxxxxx], i.e. there are 10^10 possible values. A phone number, by the way, also has 10 digits [+7 (xxx) xxx-xx-xx]. By the standards of the modern digital world these are not large numbers: one GB is at least 10^9 bytes, so 100 GB is enough to write down every possible identifier (10^10 values of 10 digits each). Such a space can simply be brute-forced.

I measured that, in single-threaded mode, a modern Intel Core i5 computes the sha256 hashes of all numbers of one passport series (000000-999999) in 5 seconds, using the standard sha256 implementation without any additional tweaks. With 10^4 possible series, an exhaustive search of the whole space therefore takes less than a day on a regular computer. If the search is spread over several threads, an average processor copes with the task in a few hours. This shows that the system's developer does not understand how hash functions should be used.

But even correct use of hash functions does not save passport data from disclosure under this architecture if the adversary has unlimited resources. Whoever gets the database can recover the passport identifiers in finite time, because checking a single passport has to take finite time. The only question is how many resources that requires. Still, had they simply applied a couple of million rounds of hashing, even a decision as questionable as distributing the database together with the application would not have caused such a scandal, since it would have protected the data at least from journalists. Meduza merely demonstrated the incompetence of the people who designed this part of the system.
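The exhaustive search over one series can be sketched in a few lines of Python. This is an illustration, not the DegvoterDecoder code. The sample identifiers are made up, and the input format of the hash (series and number concatenated as ASCII digits) is an assumption.

```python
import hashlib

def sha256_hex(passport_id: str) -> str:
    return hashlib.sha256(passport_id.encode("ascii")).hexdigest()

def recover_series(series: str, leaked: set[str]) -> list[str]:
    """Try all 10^6 numbers of one series and keep those whose hash was leaked."""
    found = []
    for n in range(1_000_000):
        candidate = f"{series}{n:06d}"
        if sha256_hex(candidate) in leaked:
            found.append(candidate)
    return found

# Hypothetical "leaked" hashes standing in for rows of the passports table.
leaked = {sha256_hex("4510123456"), sha256_hex("4510000042")}
recovered = recover_series("4510", leaked)

# Estimate for the whole space, using the 5 seconds per series measured in the text.
SECONDS_PER_SERIES = 5
total_hours = 10_000 * SECONDS_PER_SERIES / 3600

print(recovered, f"full search: about {total_hours:.1f} hours single-threaded")
```

At 5 seconds per series, the 10^4 series add up to roughly 14 hours in a single thread, which matches the "less than a day" estimate. Pure Python will be slower per series than the measurement quoted above, but the order of magnitude of the attack does not change.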



Let's try to figure out how to do much better while still fitting into a single night of development.



Quick-and-dirty architecture



Suppose we have no time at all and need to write a solution during the night.

The obvious requirement is that the database of passport hashes lives on the server, so the solution must be a client-server application. The question immediately arises: what if the Internet connection at a polling station suddenly goes down? For that case, we need an Android version of the client application, which PEC (precinct election commission) members must also be able to download. In places with neither Internet nor cellular coverage, nobody was voting online in this vote anyway.



The hash stored in the database should not be computed directly from the passport ID, so that the hashes cannot be cracked with existing precomputed tables. First, a strong hash function must be used. The main question is HOW to use it. Many implementations are possible, but in essence they all boil down to an algorithm with three parameters: the type of hash function, the number of iterations, and the value(s) mixed into the hash (common to all hashes). The final requirement is that a strong hash function is used within each iteration and that hashing is slow: only a few hashes per second. Even after stealing the database from the server, an attacker would then need a significant amount of time to recover all the data.
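One possible instance of such a three-parameter scheme is PBKDF2 from the Python standard library. PBKDF2 is my choice for the sketch, not something the scheme above prescribes. The constant names, the secret value and the iteration count are hypothetical, and the iteration count would have to be tuned on the real server hardware to reach a few hashes per second.

```python
import hashlib

# The three parameters of the scheme (all values here are illustrative assumptions).
HASH_NAME = "sha256"                           # type of hash function
ITERATIONS = 2_000_000                         # number of iterations; tune per hardware
COMMON_SECRET = b"hypothetical-server-secret"  # value mixed into every hash

def slow_passport_hash(series: str, number: str, iterations: int = ITERATIONS) -> str:
    """Iterated, keyed hash of the passport ID; deliberately expensive to compute."""
    passport_id = (series + number).encode("ascii")
    return hashlib.pbkdf2_hmac(HASH_NAME, passport_id, COMMON_SECRET, iterations).hex()
```

The server stores slow_passport_hash(series, number) instead of a bare SHA-256. Because the secret value is common to all records, a lookup by hash still works, but an attacker who holds only the database can neither use precomputed tables nor enumerate 10^10 candidates quickly.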



Each client application is just an input field plus an HTTP client that sends a request to the server.



The server works only over HTTPS, only during voting, and limits each IP to 1 RPS. We use Redis as the rate limiter: the IP address is written as a key with a TTL of one second. If the key exists, the request from that IP is rejected; if it does not, the request is allowed. This prevents brute force from the outside.
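The limiter logic can be sketched without a running Redis. The TtlStore class below is a hypothetical in-memory stand-in that mimics what the Redis command `SET key 1 NX EX 1` does: it creates the key only if it is absent and lets it expire after one second. The key prefix and function names are made up for the example.

```python
import time

class TtlStore:
    """Tiny in-memory stand-in for Redis keys with expiry (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expires = {}  # key -> expiry timestamp

    def set_if_absent(self, key: str, ttl_seconds: float) -> bool:
        """Mimics `SET key 1 NX EX ttl`: True only if the key was not present."""
        now = self._clock()
        expiry = self._expires.get(key)
        if expiry is not None and expiry > now:
            return False  # key still alive: this IP already made a request
        self._expires[key] = now + ttl_seconds
        return True

def allow_request(store: TtlStore, ip: str) -> bool:
    """Allow at most one request per second per IP address."""
    return store.set_if_absent(f"ratelimit:{ip}", 1.0)
```

In a real deployment the store would be Redis itself, so that the check stays atomic across several server processes.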



Written this way, our solution, literally made of shit and sticks, will be an order of magnitude more secure than the current degvoter. The difference in development time is small, and the work can be split among 3 people (server, Windows client, Android client).



Let's look at possible leak scenarios.



We have the following points where information about the system can be obtained:



  1. Server source code
  2. Compiled backend files
  3. Server DB
  4. Client applications


Client applications in this case carry no information, even though the largest number of people have access to them and they are therefore the most likely place for a leak (which is what happened).



To recover the data, an attacker needs the database together with the hashing method, i.e. access to points (1,3) or (2,3). With the database alone and no knowledge of the hashing method, nothing can be recovered.



Conclusions



  1. Whenever you need to work with personal data in any form, involve an architect
  2. Whenever you need to work with personal data in any form, involve a developer with experience or education in cryptography or information security


These two simple rules help avoid the kind of shame we saw with the degvoter application. (Remember that an ordinary developer may not understand the nuances of using hash functions.)



DegvoterDecoder, a utility that demonstrates how the personal data can be recovered, is located in the repository dedicated to the analysis of voting data (linked below). By default, it is configured for 8 threads. If you have already downloaded the degvoter.zip archive and you program in C#, you can easily figure out how it works.



github.com/AlexeiScherbakov/Voting2020


