TL;DR: I explore whether Telegram accounts can be matched to phone numbers in the Russian segment of Telegram.
Many people in the world would like to be able to deanonymize an arbitrary user: businesses that don't shy away from spam, intelligence agencies, scammers, and ordinary stalkers. Social services try to balance two goals: attracting the largest possible audience through contact import, and limiting access to that same information. They strike this balance in different ways. Some position themselves as maximally social, while others value privacy more. The latter become targets of criticism from advocates of maximum privacy.
By default, Telegram, like less private messengers, lets you find a user's account if you know their phone number. The owner of the number can restrict this to mutual contacts only, using a dedicated privacy option. That option is off by default, which means Telegram is full of careless and deliberately public users. The option appears to have been introduced in response to a leak of the user database. I decided to figure out how much it would cost to build a similar database, and whether I could build one myself.
I limited my interest to Russian users only. The database of number ranges, as it turned out, is published by Rossvyaz, which simplified my task further by removing the need to scrape sites with such information. As of September 7, operators had been allocated almost six hundred million numbers, or, to be exact, 598,035,003.
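A total like the one above is just the sum of the sizes of all allocated ranges. Below is a minimal sketch of that count, broken down by region as well. It assumes a semicolon-separated file with a header row and the column order code; from; to; capacity; operator; region. That layout, the function name `count_numbers`, and the sample data are hypothetical, so check the actual Rossvyaz files (including their encoding) before relying on it.

```python
import csv
import io
from collections import Counter


def count_numbers(csv_text: str) -> tuple[int, Counter]:
    """Sum the sizes of all number ranges, overall and per region.

    Assumed (hypothetical) column order: code; from; to; capacity; operator; region.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=";")
    next(reader, None)  # skip the header row
    total = 0
    per_region: Counter = Counter()
    for row in reader:
        if len(row) < 6:
            continue  # skip blank or malformed lines
        start, end = int(row[1]), int(row[2])
        size = end - start + 1  # both range bounds are inclusive
        total += size
        per_region[row[5].strip()] += size
    return total, per_region


# Hypothetical two-range sample, not real Rossvyaz data.
sample = (
    "code;from;to;capacity;operator;region\n"
    "900;0000000;0099999;100000;Op A;Region X\n"
    "900;0100000;0100999;1000;Op B;Region Y\n"
)
total, regions = count_numbers(sample)
print(total, dict(regions))
```

The per-region counter is what makes it easy to restrict the pool to a single region rather than the whole country.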
I took several SIM cards and `telethon` (a Python module with a full MTProto implementation) and tried to build such a database at home.
Sharing contacts and adding to a group
Remember the story of the Hong Kong deanonymization? A bot added users to a group by phone number, thereby obtaining the account linked to each phone. In the same article, a ZDNet journalist contacted a Telegram representative, who said that mass imports would be problematic.
"We have suspected that some government-sponsored attackers have exploited this bug and use it to target Hong Kong protesters, in some cases posing immediate dangers to the life of the protestors."
So I decided to start with contact sharing. The official Telegram clients only let you share users whose number you can see in one way or another. However, `telethon` lets you share a contact with an arbitrary number. Judging by its API, sharing a contact amounts to sending a file of a certain type. For a preliminary check, I sketched a script that sent a given contact to my main account, and it went through without any objections.
To test with the script, I registered a "clean" account (hereinafter the Bot) and sent three numbers to another account (hereinafter the Recipient): Pasha's, Dasha's and my own. All three have Telegram. Pasha has made his number visible to everyone. Dasha has added the Recipient to her contacts. I sent my own number twice: the first time after adding the Bot to my contacts, the second time after removing the Bot from my contacts and adding myself to the Bot's contacts instead.
The picture shows the result: a contact resolves to an account only if the phone number is visible to the Bot. Adding users to a chat is even worse: the Bot can't add accounts even when it knows their phone, if they have disabled this in their settings. Besides, the Bot can be banned quickly if users start reporting spam. So I won't deanonymize anyone this way; time to drop this idea.
Synchronization
Synchronizing contacts, as mentioned above, can lead to account restrictions. But what does that look like in practice? I wrote another script that takes random numbers from the database and adds them to contacts. It then parses the contact list, writes the identifiers of the accounts it found back into the database, marks the remaining numbers with zeros, and removes the contacts from the list.
Then I ran 5,000 random numbers through it; rumor has it that this is Telegram's limit. The output contained no identifiers except the Bot's own. To rule out bugs in my code and any trickery on Telegram's side, I manually added Dasha's number to the random numbers, disabled deleting contacts from the list, reduced the sample to 3,000 numbers, and ran it again. Dasha appeared neither in the contact list nor in the database. An attempt to add Dasha manually suggested that Telegram's limits had kicked in.
So far so good. To make sure, I deleted the account, registered it again, and ran the script once more on 3,000 numbers, Dasha's included. The result was the same. It seems the limits apply not to the account but to the phone number from which contacts are synchronized. Everything looks fine, and Telegram does provide a reasonable level of brute-force protection.
Or does it?
The Internet mentions at least two services that allegedly scanned a wide range of numbers. I checked one of them, and it works poorly. Suppose the second one is not lying and really succeeded. Let's even assume they don't need to process all 600 million numbers and somehow know 150 million actually active ones (a little more than one number per resident of the Russian Federation). How much would it cost to scan all of them in six months while respecting all of the messenger's restrictions? In three months? In a month? In a single day?
Let's say one number can check 5,000 contacts on the first day and 100 more on each subsequent day. Then in six months each number can check about 23,000 contacts (180 × 100 + 5,000), and covering everyone takes about 6,500 numbers (150,000,000 / 23,000). Not much, right? Even at 150 rubles per SIM card (which is expensive!), buying them all costs less than a million rubles. An affordable sum even for a small business! And you don't need much equipment to keep the SIM cards going: you log in once and run the script once a day.
But let's go to the maximum. Take the entire pool of 600 million numbers and shrink the time frame to a month. That requires 75,000 SIM cards (600,000,000 / (5,000 + 30 × 100)), which will cost only about ten times more: 75,000 × 150 = 11,250,000 rubles. Finding that many SIM cards will take some effort, but buying in such bulk can bring the unit price down. One could also use services that register an account for as little as 3.5 rubles apiece, and the worst offenders could spread Trojans that steal SMS. Then it gets much cheaper. Development doesn't seem overly complicated, and hosting the clients shouldn't be a big deal either. Many proxies might be needed, but that isn't certain.
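The cost estimates above boil down to a few lines of arithmetic. Here is a small sketch that reproduces them. It uses the same approximation as the text, days × 100 + 5,000 checks per number, which assumes (as the text does) that the rumored limits hold; the function names and default prices are illustrative.

```python
import math


def contacts_per_sim(days: int, first_day: int = 5000, per_day: int = 100) -> int:
    """Checks one number can make in `days` days, using the text's approximation."""
    return days * per_day + first_day


def sims_needed(pool: int, days: int) -> int:
    """SIM cards required to check `pool` numbers within `days` days."""
    return math.ceil(pool / contacts_per_sim(days))


def cost_rub(pool: int, days: int, sim_price: float = 150) -> float:
    """Total SIM card cost in rubles at `sim_price` per card."""
    return sims_needed(pool, days) * sim_price


# 150 million active numbers in six months, then the full pool in a month.
print(sims_needed(150_000_000, 180), cost_rub(150_000_000, 180))
print(sims_needed(600_000_000, 30), cost_rub(600_000_000, 30))
```

The same functions cover the other scenarios: pass a single region's pool size instead of the national total, or a lower `sim_price` such as 3.5 for bulk account registration, for which the month-long full-pool case comes to 262,500 rubles.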
I was unable to collect the database because of Telegram's limits, and that's good: it means there is a certain entry threshold for such actions. I put the useless scripts on git... But with some resources, there is nothing especially difficult about it. Especially if you restrict yourself to specific regions: for example, the Jewish Autonomous Region has fewer than a million numbers in the pool, and Bashkortostan has 12.5 million.
I suggested to the Telegram team that they tell users about the option to hide their phone number. And let me remind you that anonymity on social services is conditional. If you don't want to fall victim to mass deanonymization, hide your Telegram phone number from everyone except your contacts.