Parsing the Smart Voting website and the new API on the CEC website

image



On September 13, 2020, Russia held its single voting day. In some regions the opposition used the “Smart Voting” strategy, whereby opposition-minded voters all vote for the one candidate with the best chance of defeating the representative of the authorities.



For the second year in a row, the process of selecting candidates for “Smart Voting” has sparked debate about its transparency. I am also personally troubled by how difficult it is for independent analysts to sum up the results of the strategy. The organizers of Smart Voting (abbreviated UMG below) do not publish detailed results, only diagrams showing how many opposition candidates entered each regional parliament.



On the "Smart Voting" site you cannot get a list of supported candidates by specifying, for example, a city and district. Anyone who wants to collect data for a region has to do the monotonous work of picking an address for each district.



I am in no way reproaching the developers of the UMG website: it has all the functionality needed to carry out the voting strategy. But since nobody collected and published detailed data on the UMG results in 2019 (outside the Moscow elections), I decided to take the initiative myself for these elections.



The result is a summary table like this. In this article, I will explain how this data set was obtained: how information was collected from the Smart Voting site and from the new CEC web service.



image



Smart Voting Site



First, let's see what data we can extract from the Smart Voting site. On the main page of the site there is a field for entering the user's registration address. When you enter a string, a list of suggested addresses appears in the following format:



image




Choosing one of the suggested addresses takes us to the page of the polling station to which that address is assigned:



image


The page lists the election campaigns taking place at this polling station. For each campaign there is a list of candidates the site recommends voting for or against:



image


In this case we see a gubernatorial election, for which the UMG did not name an opposition candidate. The reason is that gubernatorial elections are held in two rounds, so it does not matter which opposition candidate voters choose in the first round.

We also see three candidates at once recommended for the city parliament election. This is because the elections in Sochi use multi-member constituencies.

All other election campaigns covered by the UMG this year had only single-member constituencies.



Looking at the page source, we find that all of this data is collected in a convenient JSON format. The element with id="__NEXT_DATA__", which is used to render the page, contains information about the polling station, the corresponding election campaigns and the candidates:



Content of the __NEXT_DATA__ element
{
   "props":{
      "pageProps":{
         "id":"440384",
         "settings":{
            "id":1,
            "share_photo":"/ganimed-media/share_photo/smartvote_sharepic_1200x628.jpg",
            "video_on_main_page":"https://youtu.be/w8gapDGwWMY",
            "fake_mode":false,
            "title_share":",    ",
            "text_share":" ,      —    « ».   — .",
            "telegram_bot_link":"https://tlinks.run/smartvotebot",
            "viber_bot_link":"viber://public?id=smartvote",
            "facebook_bot_link":"https://facebook.com/umnoegolosovanie/",
            "alice_link":null,
            "vk_bot_link":null
         },
         "serverData":{
            "commission":{
               "id":440384,
               "number":"4317",
               "address":"354340,  ,  ,  ,   , 24",
               "descr":"   № 49 . .. ",
               "lat":"43.425923",
               "lon":"39.920152",
               "region_id":26,
               "region_intid":"135637827259064320000372513"
            },
            "campaigns":[
               {
                  "id":26,
                  "code":"krasnodar-gub-2020",
                  "title":"   ",
                  "is_regional":true,
                  "ready_date":null,
                  "district":{
                     "id":458,
                     "code":"oik-0",
                     "name":"0",
                     "leaflet":""
                  },
                  "candidates":[
                     {
                        "id":998,
                        "name":"  ",
                        "share_image":"/elections-api-media/share/26/998.png",
                        "anticandidate":true,
                        "self_nominated":false,
                        "has_won":false,
                        "has_second_round":false,
                        "party":{
                           "title":" ",
                           "antiparty":true
                        }
                     }
                  ]
               },
               {
                  "id":28,
                  "code":"krasnodar-sochi-gorduma-2020",
                  "title":"    ",
                  "is_regional":false,
                  "ready_date":null,
                  "district":{
                     "id":526,
                     "code":"oik-2",
                     "name":"2",
                     "leaflet":"/elections-api-media/28/526-1334-1335-5385.pdf"
                  },
                  "candidates":[
                     {
                        "id":1334,
                        "name":"  ",
                        "share_image":"/elections-api-media/share/28/1334.png",
                        "anticandidate":false,
                        "self_nominated":true,
                        "has_won":false,
                        "has_second_round":false,
                        "party":null
                     },
                     {
                        "id":1335,
                        "name":"  ",
                        "share_image":"/elections-api-media/share/28/1335.png",
                        "anticandidate":false,
                        "self_nominated":true,
                        "has_won":false,
                        "has_second_round":false,
                        "party":null
                     },
                     {
                        "id":5385,
                        "name":"  ",
                        "share_image":"/elections-api-media/share/28/5385.png",
                        "anticandidate":false,
                        "self_nominated":false,
                        "has_won":false,
                        "has_second_round":false,
                        "party":{
                           "title":"",
                           "antiparty":false
                        }
                     }
                  ]
               }
            ]
         },
         "error":null,
         "currentUrl":"https://votesmart.appspot.com/candidates/440384"
      }
   },
   "page":"/candidates/[id]",
   "query":{
      "id":"440384"
   },
   "buildId":"U8hjaoxZw8TINu-DU_Ixw",
   "runtimeConfig":{
      "HOST":"https://votesmart.appspot.com"
   },
   "isFallback":false,
   "customServer":true,
   "gip":true
}
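As a rough illustration, the payload can be pulled out of a saved page and reduced to the recommended candidates per campaign. This is a minimal sketch, not the script used for the project: it assumes the element is a script tag (as is usual for Next.js pages), it uses the standard library's html.parser instead of a third-party parser, and the function names are made up for the example.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collects the text inside the element with id="__NEXT_DATA__"."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html):
    """Return the decoded JSON payload embedded in the page."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    return json.loads("".join(parser.chunks))


def recommended_candidates(next_data):
    """Map campaign code -> (district code, ids of candidates to vote for)."""
    server_data = next_data["props"]["pageProps"]["serverData"]
    result = {}
    for campaign in server_data["campaigns"]:
        # Entries flagged "anticandidate" are the ones to vote against.
        ids = [c["id"] for c in campaign["candidates"] if not c["anticandidate"]]
        result[campaign["code"]] = (campaign["district"]["code"], ids)
    return result
```

For the page shown above, this would give an empty list for the gubernatorial campaign and three ids for the Sochi city parliament campaign.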




For the polling station, the data gives the number of the corresponding PEC ("number") and its identifier in the UMG website's database ("id"). The value id = 440384 matches the number in the page URL (/candidates/440384).



Knowing the PEC number and region, can we compute the commission identifier on the UMG website? I could not find an obvious pattern, since the identifiers are distributed quite chaotically:

Sochi, PEC #4512 -> id = 440834

Sochi, PEC #4513 -> id = 441403

Sochi, PEC #4514 -> id = 1781216



How can we build a mapping from PEC numbers to page ids? Iterating over every identifier from 1 to 2,000,000 would be extremely inefficient, since most of these identifiers are not in use.



But, if we have a list of addresses, we can relatively easily put together a list of relevant polling stations. When you enter a string on the initial screen, a list of suitable addresses is returned from the server along with the corresponding commission identifiers:



Search for a polling station by address



https://votesmart.appspot.com/api/v1/cik/addresses?query=ADDRESS


  • ADDRESS - the address, preferably in the format "Federal subject, city, street, house". It is also best to omit abbreviations such as "st." and "d.", since the parser on the server does not handle them well


Example request:



https://votesmart.appspot.com/api/v1/cik/addresses?query=Lenin's Smolensk



Query result
{
   "suggestions":[
      {
         "value":" ,  ,  ,  ",
         "data":{
            "fullname":" ,  ,  ,  ",
            "level":"7",
            "region_id":69,
            "commission_id":null,
            "intid":"138474570115456000000347353",
            "path":"135637827259064320000359815,135637827259064320000359819,135637827259064320000359820,138474570115456000000347353",
            "snippet":" ,  <em></em>,  , <em></em> ",
            "score":118.84238
         }
      },
      {
         "value":" ,  ,  ,  , 12",
         "data":{
            "fullname":" ,  ,  ,  , 12",
            "level":"8",
            "region_id":69,
            "commission_id":1124357,
            "intid":"135659820348349440000359937",
            "path":"135637827259064320000359815,135637827259064320000359819,135637827259064320000359822,135659820348349440000359708,135659820348349440000359937",
            "snippet":" ,  <em></em>,  , <em></em> , 12",
            "score":115.14931
         }
      },
...
   ]
}
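Building the request and picking the commission identifiers out of the response might look like the sketch below. It only constructs the URL and filters an already decoded response; the helper names are hypothetical, and encoding spaces as "+" is an assumption about what the server accepts.

```python
from urllib.parse import urlencode

ADDRESS_API = "https://votesmart.appspot.com/api/v1/cik/addresses"


def build_address_url(address):
    """Return the search URL with the address safely percent-encoded."""
    return ADDRESS_API + "?" + urlencode({"query": address})


def commission_ids(response):
    """Return the distinct non-null commission ids, in order of appearance."""
    seen = []
    for suggestion in response.get("suggestions", []):
        cid = suggestion["data"].get("commission_id")
        # Street-level suggestions carry "commission_id": null; skip them.
        if cid is not None and cid not in seen:
            seen.append(cid)
    return seen
```

Only suggestions that resolve to a concrete house carry a commission id in the example above, so the street-level entry is ignored.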




Where can we get a list of addresses for extracting data from the site? Enumerating every address in the country would be inefficient, because we only need one address per constituency.



Each constituency contains, on average, 2 to 8 precincts. Although in rare cases the address of a polling station may not lie in the constituency it belongs to, I put forward the following hypothesis: by going through the PEC addresses on the UMG website, one can collect information about every constituency.



This hypothesis later let me collect information on almost all constituencies. Because of the inconsistent address formats in the election commission database, I had to pick addresses manually for only 10 of the 1100 constituencies.



On the Internet you can find a regularly updated database of Russian election commissions, containing their addresses and even the composition of PECs. But to keep the data as fresh and reliable as possible (and also because I was not happy with the format of one of the fields), I decided to collect the list of addresses myself; as it turned out, the CEC website has everything needed for this.



New CEC web service. API methods



GAS "Vybory" is an automated system developed in 1995, intended for the preparation and conduct of elections and referendums in the Russian Federation.



If you have ever followed an election campaign, you have probably come across this site, which publishes basic information from the GAS "Vybory" system, including vote counts, even before the election results are approved:



image



Data miners used to extract election results from this site, but during the vote on the amendments to the Constitution a captcha suddenly appeared on it. The captcha is very persistent and appears whenever you open any page of the site:



image


As you can judge for yourself, the captcha is very simple, and surely someone has already found ways to bypass it. Instead of resorting to machine learning, I turned to a new section of the CEC website that few people know about yet: Digital services.



image



This section appeared right during the vote on the amendments and contains several web services that communicate via HTTP requests with an internal API to receive data from the GAS "Vybory" system. A Habr user has already drawn attention to this functionality. Let's look at it in more detail.



The following is a description of the main requests of the new API that were used in this project:



Each data structure in the system contains a VRN key: a unique identifier for the entity, be it a polling station, campaign, district or candidate.




PEC information



http://cikrf.ru/iservices/voter-services/committee/subjcode/SUBJECT_CODE/num/COMMITTEE_NUM




Example request:



http://cikrf.ru/iservices/voter-services/committee/subjcode/01/num/2



Query result
{
   "vrn":"4014001117979",
   "name":"   №2",
   "subjCode":"01",
   "numKsa":"01T001",
   "vid":"5",
   "address":{
      "address":"385200,  ,   ,  ,   .., 16",
      "descr":"  №1",
      "phone":"8-87772-9-23-72",
      "lat":"44.882893",
      "lon":"39.187187"
   },
   "votingAddress":{
      "address":"385200,  ,   ,  ,   .., 16",
      "descr":"  №1",
      "phone":"8-87772-9-23-72",
      "lat":"44.882893",
      "lon":"39.187187"
   }
}







Information about election campaigns at a polling station



http://cikrf.ru/iservices/voter-services/vibory/committee/COMMITTEE_VRN


  • COMMITTEE_VRN - PEC identifier


Request example:



http://cikrf.ru/iservices/voter-services/vibory/committee/4544028162533



Query result
[
   {
      "vrn":"100100163596966",
      "date":"2020-07-01",
      "name":"         ",
      "subjCode":"0",
      "pronetvd":null,
      "vidvibref":"0"
   },
   {
      "vrn":"25420001876696",
      "date":"2020-09-13",
      "name":"       ",
      "subjCode":"54",
      "pronetvd":"0",
      "vidvibref":"2"
   },
   {
      "vrn":"4544220183446",
      "date":"2020-09-13",
      "name":"        ",
      "subjCode":"54",
      "pronetvd":null,
      "vidvibref":"2"
   }
]





List of electoral districts



http://cikrf.ru/iservices/sgo-visual-rest/vibory/CAMPAIGN_VRN/tvd


  • CAMPAIGN_VRN - campaign ID


Request example:



http://cikrf.ru/iservices/sgo-visual-rest/vibory/457422069597/tvd



Query result
{
   "_embedded":{
      "tvdDtoList":[
         {
            "vrn":457422069601,
            "namtvd":"    ",
            "namik":"    ",
            "numtvd":"0",
            "vidtvd":"ROOT",
            "_links":{
               "results":{
                  "href":"http://cikrf.ru/iservices/sgo-visual-rest/vibory/457422069597/results/457422069601/proportion"
               }
            }
         },
         {
            "vrn":457422069602,
            "namik":"   № 1",
            "numtvd":"1",
            "vidtvd":"OIK",
            "_links":{
               "results":{
                  "href":"http://cikrf.ru/iservices/sgo-visual-rest/vibory/457422069597/results/457422069602/major"
               }
            }
         },
         ...
      ]
   },
   "_links":{
      "self":{
         "href":"http://cikrf.ru/iservices/sgo-visual-rest/vibory/457422069597/tvd"
      }
   }
}




NUMTVD is the district number. District number zero usually holds the results for the single, jurisdiction-wide district. For example, if elections are held under a mixed system, the "zero district" covers the vote under the proportional system. The remaining districts are single-member or multi-member constituencies.



As you can see, the data structure also contains a link that can be used to find out the election results. The link is generated even before the publication of the voting results.
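A district list like this can be turned into a lookup table from district number to identifier and results link. The following is a minimal sketch with hypothetical helper names, assuming every entry carries the fields shown in the example response.

```python
def districts_by_number(tvd_response):
    """Map district number (numtvd) -> vrn, kind and results link."""
    districts = {}
    for item in tvd_response["_embedded"]["tvdDtoList"]:
        districts[item["numtvd"]] = {
            "vrn": item["vrn"],
            "kind": item["vidtvd"],
            "results_url": item["_links"]["results"]["href"],
        }
    return districts


def constituency_numbers(districts):
    """Return the district numbers other than the jurisdiction-wide ROOT entry."""
    return [num for num, info in districts.items() if info["kind"] != "ROOT"]
```

Keeping the results link next to the VRN means the later download of results does not have to rebuild the URL, including its trailing "proportion" or "major" segment.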






List of candidates participating in the election campaign



http://cikrf.ru/iservices/sgo-visual-rest/vibory/CAMPAIGN_VRN/candidates/?page=PAGE_NUM&numokr=NUMTVD


  • CAMPAIGN_VRN - campaign ID
  • PAGE_NUM - list page number
  • NUMTVD - district number (optional)


Request example:



http://cikrf.ru/iservices/sgo-visual-rest/vibory/4674220125616/candidates/?page=1&numokr=11



Query result
{
   "_embedded":{
      "candidateDtoList":[
         ...
         {
            "index":50,
            "vrn":4674020270868,
            "fio":"  ",
            "datroj":"23.04.1964 00:00:00",
            "vidvig":"",
            "registr":"",
            "vrnio":4674220132098,
            "namio":"    \"     \"   ",
            "numokr":11,
            "tekstat2":"1",
            "_links":{
               "self":{
                  "href":"http://cikrf.ru/iservices/sgo-visual-rest/vibory/4674220125616/candidates/4674020270868"
               }
            }
         },
         {
            "index":56,
            "vrn":4674020269642,
            "fio":"  ",
            "datroj":"15.02.1986 00:00:00",
            "vidvig":"",
            "registr":"  ",
            "namio":"",
            "numokr":11,
            "tekstat2":"1",
            "_links":{
               "self":{
                  "href":"http://cikrf.ru/iservices/sgo-visual-rest/vibory/4674220125616/candidates/4674020269642"
               }
            }
         },
         {
            "index":105,
            "vrn":4674020271181,
            "fio":"  ",
            "datroj":"15.07.1994 00:00:00",
            "vidvig":"",
            "registr":"",
            "vrnio":4674220134054,
            "namio":"     \"   \"",
            "numokr":11,
            "tekstat2":"1",
            "_links":{
               "self":{
                  "href":"http://cikrf.ru/iservices/sgo-visual-rest/vibory/4674220125616/candidates/4674020271181"
               }
            }
         },
         ...
         
      ]
   },
   "_links":{
      "self":{
         "href":"http://cikrf.ru/iservices/sgo-visual-rest/vibory/4674220125616/candidates?page=1&numokr=11"
      }
   },
   "page":{
      "size":20,
      "totalElements":9,
      "totalPages":1,
      "number":1
   }
}




The "page" structure contains the total number of pages, which can be used to detect the last page; alternatively, you can stop when the server returns an empty list.
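Both stopping conditions fit into a short loop. In this sketch the HTTP call is passed in as a function so that the loop itself stays testable; fetch_page and the function name are assumptions made for the example, and pages are assumed to be numbered from 1, as in the request above.

```python
def fetch_all_candidates(fetch_page, first_page=1):
    """Collect candidates from every page of the list.

    fetch_page(page_number) must return the decoded JSON of one page.
    """
    candidates = []
    page = first_page
    while True:
        data = fetch_page(page)
        batch = data.get("_embedded", {}).get("candidateDtoList", [])
        if not batch:
            break                      # empty list: nothing more to read
        candidates.extend(batch)
        total_pages = data.get("page", {}).get("totalPages", page)
        if page >= total_pages:
            break                      # last page according to the server
        page += 1
    return candidates
```

In a real run, fetch_page would wrap a call such as requests.get(url).json() for the candidates URL of one campaign.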






The API contains other methods, mainly for retrieving more details about elections and candidates. If necessary, you can easily trace the requests you need yourself. Now we can start downloading the data.



Downloading data from the CEC website



Before downloading the data, I had to draw up the list of election campaigns to be used in the project. “Smart Voting” did not take place everywhere, but only in the following elections:

— ,

— ,

— ( 200 )

( 4 ).



I decided to ignore the by-elections to the State Duma because that data is insignificant. A Wikipedia article on the election day helped compile the list of local council elections, because it lists precisely the elections in large cities.



I asked a friend (who helped me with this project by doing the necessary manual work) to compile a list of URLs for the relevant election campaigns, taken from the main page of the classic CEC website. Each URL contains the region and campaign identifiers that we need for further parsing.



vybory.izbirkom.ru/region/izbirkom?action=show&vrn=21120001136916&region=11&prver=1&pronetvd=1


As a result, the list consisted of 43 election campaigns. In total, more than 9,000 separate election campaigns for bodies at various levels were held on the Single Voting Day.



Now, with the list of elections and the API methods listed earlier, downloading the data was easy. I wrote a Python script that makes ordinary HTTP requests with the requests module and saved the candidate and polling station data in the original JSON format.



The main thing to keep in mind when downloading polling station data: it is not enough to go through the numbers starting from 1 until the server returns an empty value. PEC numbering within a region can have gaps and look, for example, like this:

... #1001-#1016, #1101-#1136, #1138 ...

or:

#0-#700, #900-#1002, #1004 ...

To find the maximum PEC number in a region without making unnecessary requests, I proceeded as follows: I downloaded the data for the first 1000 numbers and then, taking i as the last number checked, tested whether the numbers i + 1, i + 5, i + 100, i + 500 and i + 1000 correspond to any PEC (if so, I continued downloading).
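One possible reading of this probing scheme is sketched below. It is an interpretation rather than the original script: exists(n) stands in for a single API request, the function name and parameters are hypothetical, and gaps wider than the largest probe offset will still end the scan early.

```python
def scan_pec_numbers(exists, start=1, initial=1000, probes=(1, 5, 100, 500, 1000)):
    """Return the PEC numbers for which exists(n) is true, in ascending order."""
    cache = {}

    def check(n):
        # Remember answers so that no number is requested twice.
        if n not in cache:
            cache[n] = exists(n)
        return cache[n]

    found = []
    n = start
    limit = start + initial            # every number below `limit` gets checked
    while True:
        if n < limit:
            if check(n):
                found.append(n)
            n += 1
            continue
        last = n - 1                   # the last number checked so far
        hit = next((last + step for step in probes if check(last + step)), None)
        if hit is None:
            return found               # nothing ahead: assume the numbering ended
        limit = hit + 1                # scan everything up to and including the hit
```

Inside a contiguous run the first probe (i + 1) succeeds immediately, so the lookahead costs extra requests only at the edge of a gap.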



I also recommend storing the PEC number that was used to request each precinct's data. The returned data does not contain the PEC number, only a name of the form "Precinct Election Commission No. 100". Recovering the original PEC number afterwards, which I later had to do, led to short-lived bugs and frustration. As it turned out, the numbering in PEC names has a different format in some regions.



For example, in Udmurtia the PEC names were numbered "№1/01, №1/02, №1/03", and in the Lipetsk region "№01-01, №01-02, №01-03". In the Orenburg region I came across something truly exotic: it was the only region where some election commissions were named after someone, for example "Precinct Election Commission No. 1696 named after the Pustovitov Brothers".
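If the request number was not stored, a tolerant regular expression can recover the label from the name. This is only a sketch covering the formats mentioned above; it assumes the number always follows a "№" sign, and the sample names in the usage note are illustrative, not real records.

```python
import re

# "№", then digits, optionally continued by "/" or "-" and more digits.
PEC_NUMBER = re.compile(r"№\s*(\d+(?:\s*[/-]\s*\d+)*)")


def pec_label(name):
    """Return the number part of a PEC name (e.g. "1/01"), or None if absent."""
    match = PEC_NUMBER.search(name)
    if match is None:
        return None
    # Drop any spaces around the separator: "1 / 01" -> "1/01".
    return re.sub(r"\s+", "", match.group(1))
```

For example, pec_label("Precinct Election Commission №1 / 01") yields "1/01", and a name ending in "№1696 named after ..." yields "1696". The compound labels are kept as strings because they are not plain integers.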



Downloading data from the site of "Smart Voting"



Now, for each PEC address collected, we download the voting data from the UMG website. Before that, a few peculiarities are worth noting (which I only discovered along the way):



First, addresses in the CEC database come in different formats, sometimes even within a single region. I had to remove the abbreviations "d." (house), "g." (city) and "st." (street), since the “Smart Voting” site could not cope with address searches containing them. I also recommend removing the postal code from the address, as well as the occasional "Russian Federation" prefix.
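A cleanup along these lines can be done with a few regular expressions. The sketch assumes the original Russian spellings of the abbreviations ("д.", "г.", "ул."), a six-digit postal code at the start of the string and the prefix "Российская Федерация"; the address in the usage note is hypothetical.

```python
import re

POSTAL_CODE = re.compile(r"^\s*\d{6}\s*,\s*")
COUNTRY_PREFIX = re.compile(r"^\s*Российская Федерация\s*,\s*", re.IGNORECASE)
# City, street and house abbreviations as standalone tokens ending in a dot.
ABBREVIATIONS = re.compile(r"(?<!\w)(?:г|ул|д)\.\s*", re.IGNORECASE)


def normalize_address(address):
    """Strip the postal code, country prefix and common abbreviations."""
    address = POSTAL_CODE.sub("", address)
    address = COUNTRY_PREFIX.sub("", address)
    address = ABBREVIATIONS.sub("", address)
    # Collapse any leftover runs of whitespace.
    return re.sub(r"\s+", " ", address).strip()
```

With this, "354000, Российская Федерация, г. Сочи, ул. Ленина, д. 1" becomes "Сочи, Ленина, 1". Real data will contain more variants than these three patterns cover, so the output is worth spot-checking per region.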



Second, the UMG website has strong protection against DDoS attacks: even a hundred requests at 0.3-second intervals will get your IP banned. Paid proxies would have been an option, but I simply used free proxies and alternated requests between my own IP and a third-party one. To avoid a ban, I kept an interval of about 0.7 seconds between requests. As a result, downloading all the data took about a day.



Using the queries from the first chapter, the algorithm is as follows:



  1. Format the PEC address
  2. Request the list of matching addresses
  3. Receive a list containing polling station page IDs
  4. Check whether the data for this identifier has already been downloaded
  5. Load the HTML page of the polling station for this identifier
  6. Extract the "__NEXT_DATA__" element and save the data in JSON format
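The steps above can be sketched as a single loop. The network calls are injected as functions so the control flow can be shown on its own: search, load_page and normalize are hypothetical callables, and the 0.7-second default pause mirrors the interval mentioned earlier.

```python
import time


def collect_station_pages(addresses, search, load_page, normalize=str.strip,
                          delay=0.7, sleep=time.sleep):
    """Run steps 1-6 for every address and return {station id: page data}.

    search(address)       -> iterable of station ids (steps 2-3, one request)
    load_page(station_id) -> decoded "__NEXT_DATA__" payload (steps 5-6)
    """
    pages = {}
    for address in addresses:
        query = normalize(address)                      # step 1
        station_ids = search(query)                     # steps 2-3
        sleep(delay)                                    # stay under the rate limit
        for station_id in station_ids:
            if station_id is None or station_id in pages:
                continue                                # step 4: skip known pages
            pages[station_id] = load_page(station_id)   # steps 5-6
            sleep(delay)
    return pages
```

Because neighbouring addresses usually resolve to the same polling station, the duplicate check in step 4 saves a large share of the page requests.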


The page was parsed using the beautifulsoup4 library.



This process is not flawless: typically the script fails to find about a dozen polling stations per region on the website, or the address of one PEC leads to the page of a completely different PEC.



This is not a problem, because for each district we only need to find at least one corresponding page on the site.



To validate the completeness of the data, we write a simple script that checks whether the data set downloaded from the UMG site contains information about every constituency. If something is missing, we fill in the dataset manually. Again, there were fewer than 10 such exceptions out of 1100 constituencies.
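Such a check might look like the following sketch. It assumes that the district "name" on the UMG pages equals the district number known from the CEC data, and that expected has been prepared as a mapping from UMG campaign code to the set of district numbers; both the mapping and the function name are assumptions made for the example.

```python
def missing_districts(expected, pages):
    """Return {campaign code: sorted district numbers with no UMG page}.

    expected: {campaign code: set of district numbers (strings)}
    pages:    iterable of "serverData" dicts taken from downloaded pages
    """
    seen = {}
    for page in pages:
        for campaign in page["campaigns"]:
            seen.setdefault(campaign["code"], set()).add(campaign["district"]["name"])
    missing = {}
    for code, names in expected.items():
        absent = names - seen.get(code, set())
        if absent:
            missing[code] = sorted(absent)
    return missing
```

An empty result means every constituency is covered; anything else is the list of districts to look up by hand.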



Combining data from UMG and CEC websites



At this stage, we build a convenient data structure with information about each candidate by district: candidate ID, full name, party, and a flag indicating whether the candidate is supported by the UMG.



Example of a collected candidate dataset
{
    "33": [
        {
            "name": "  ",
            "vrn": 4444032121758,
            "birthdate": "05.05.1958 00:00:00",
            "party": "",
            "smart_vote": 0
        },
        {
            "name": "  ",
            "vrn": 4444032122449,
            "birthdate": "16.11.1977 00:00:00",
            "party": "",
            "smart_vote": 0
        },
        {
            "name": "  ",
            "vrn": 4444032122782,
            "birthdate": "27.02.1996 00:00:00",
            "party": "",
            "smart_vote": 0
        },
        {
            "name": "  ",
            "vrn": 4444032123815,
            "birthdate": "20.11.1991 00:00:00",
            "party": "",
            "smart_vote": 1
        },
        {
            "name": "  ",
            "vrn": 4444032124060,
            "birthdate": "21.07.1996 00:00:00",
            "party": "",
            "smart_vote": 0
        },
        {
            "name": "  ",
            "vrn": 4444032123597,
            "birthdate": "21.05.1974 00:00:00",
            "party": "",
            "smart_vote": 0
        }
    ],
    ...
}




The algorithm is fairly straightforward:



  1. Based on the data array from the UMG website, we create a list of supported candidates for each district
  2. Using the data array from the CEC website, we create a filtered list of admitted candidates for each constituency
  3. In each district, we match each UMG candidate to the corresponding CEC candidate by full name


Of course, such a simple algorithm must account for many potential problem situations.



First, there is a chance that candidates in one constituency have completely identical names. Fortunately, among 5000 candidates this happened only once, and none of the candidates involved was supported by the UMG.



Second, the CEC database itself may contain errors. The most common ones are line breaks and extra spaces in the full name. Also, when collecting the voting results, I found a case in which the letter "ё" in a surname had been replaced by "e".
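Matching by full name becomes more robust if both sides are normalized first. The sketch below collapses whitespace, ignores case and treats "ё" and "е" as the same letter (an assumption about the substitution described above); the function names and the sample names in the usage note are hypothetical.

```python
import re


def normalize_name(name):
    """Collapse whitespace and line breaks, lowercase, and fold "ё" into "е"."""
    name = re.sub(r"\s+", " ", name).strip().lower()
    return name.replace("ё", "е")


def tag_supported(cec_candidates, supported_names):
    """Flag CEC candidates of one district whose names are on the UMG list.

    Returns (tagged candidates, UMG names that matched nobody).
    """
    supported = {normalize_name(n): n for n in supported_names}
    matched = set()
    tagged = []
    for cand in cec_candidates:
        key = normalize_name(cand["fio"])
        if key in supported:
            matched.add(key)
        tagged.append({"vrn": cand["vrn"], "name": cand["fio"],
                       "smart_vote": int(key in supported)})
    unmatched = [n for k, n in supported.items() if k not in matched]
    return tagged, unmatched
```

For instance, a CEC record "Иванов  Пётр\nИванович" would match the UMG name "Иванов Петр Иванович". Any UMG name left in the second return value deserves a manual look, since it signals a spelling difference the normalization did not cover.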



Third, the freshness of the data must be taken into account. The data on the CEC and UMG websites kept changing until Saturday: some candidates were removed or reinstated, and in some districts the UMG changed its endorsement.



To validate the UMG lists, I wrote a simple script that makes one request per district (the dataset we have collected now lets us uniquely identify a page for each district) and checks whether the names match those received earlier.



An interesting task was identifying parties by the names of their branches. This step could have been skipped, but I decided to do it to make the data uniform. The problem is that the same party can appear under different names in the CEC database. For example, the KPRF alone had more than 40 variants:



  ()    "   " 
-   ""
  
     "   "
...


The situation turns into an interesting parsing problem when there are 25 parties and almost every one is spelled differently in each region. Fortunately, with the help of my friend, who did all the manual work, we compiled a list of keywords that uniquely identify a candidate's party.
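A keyword lookup of this kind can be sketched as follows. The keyword table here is a small hypothetical excerpt, not the list compiled for the project, and the branch names in the usage note are illustrative.

```python
# Hypothetical excerpt: canonical party label -> lowercase keywords.
PARTY_KEYWORDS = {
    "КПРФ": ("кпрф", "коммунистическая партия российской федерации"),
    "ЛДПР": ("лдпр", "либерально-демократическая партия россии"),
    "Яблоко": ("яблоко",),
}


def detect_party(branch_name, keywords=PARTY_KEYWORDS):
    """Return the canonical party label, or None if unknown or ambiguous."""
    text = branch_name.lower()
    matches = [party for party, words in keywords.items()
               if any(word in text for word in words)]
    # Exactly one party must match; anything else goes to manual review.
    return matches[0] if len(matches) == 1 else None
```

A branch name containing "КОММУНИСТИЧЕСКАЯ ПАРТИЯ РОССИЙСКОЙ ФЕДЕРАЦИИ" maps to "КПРФ", while an empty string (as for a self-nominated candidate) maps to None. Returning None for ambiguous names keeps a too-generic keyword from silently assigning the wrong party.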



Downloading election results from the CEC website



The collected data set was sufficient to achieve the initial goal of the project - we compiled lists of UMG-2020 candidates for each constituency. But if there is a technical opportunity to get the election results, why not use it?






District election results



http://cikrf.ru/iservices/sgo-visual-rest/vibory/CAMPAIGN_VRN/results/DISTRICT_VRN/major


  • CAMPAIGN_VRN - campaign ID
  • DISTRICT_VRN - District ID


Request example:

http://cikrf.ru/iservices/sgo-visual-rest/vibory/457422069597/results/457422069602/major



Query result
{
   "report":{
      "tvd":"",
      "date_sign":"none",
      "vrnvibref":"457422069597",
      "line":[
         {
            "txt":"     ",
            "kolza":"8488",
            "index":"1"
         },
         {
            "txt":" ,   ",
            "kolza":"6700",
            "index":"2"
         },
         ...
         {
            "txt":"  ",
            "kolza":"65",
            "index":"9"
         },
         {
            "txt":"  ",
            "kolza":"1948",
            "index":"10"
         },
         ...
         {
            "delimetr":"1"
         },
         {
            "txt":"  ",
            "numsved":"1",
            "kolza":"112",
            "index":"11",
            "namio":"    ",
            "perza":"5.56",
            "numsvreestr":"4574030258379"
         },
         {
            "txt":"  ",
            "numsved":"2",
            "kolza":"186",
            "index":"12",
            "namio":"     ",
            "perza":"9.24",
            "numsvreestr":"4574030258723"
         },
         {
            "txt":"  ",
            "numsved":"3",
            "kolza":"54",
            "index":"13",
            "namio":"",
            "perza":"2.68",
            "numsvreestr":"4574030258555"
         },
         ...
      ],
      "data_gol":"13.09.2020 00:00:00",
      "is_uik":"0",
      "type":"423",
      "version":"0",
      "sgo_version":"5.6.0",
      "isplann":"0",
      "podpisano":"1",
      "versions":{
         "ver":{
            "current":"true",
            "content":"0"
         }
      },
      "vibory":"        ",
      "repforms":"1",
      "generation_time":"14.09.2020 07:59:21",
      "nazv":"    () ",
      "datepodp":"14.09.2020 05:44:00"
   }
}




As you can see, the results are returned in the form of the protocol of the regional commission. Each region has its own protocol format and number of introductory lines, so the extracted data must be validated carefully.
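Because the number of introductory lines varies, it seems safer to tell candidate lines apart by their fields than by their position. The sketch below assumes that candidate lines are exactly those carrying "numsvreestr" and that this value identifies the candidate; both are inferences from the example response, not documented behaviour.

```python
def parse_protocol(report):
    """Split a results protocol into summary rows and candidate votes.

    Returns ({line index: value}, {candidate id: votes}).
    """
    summary = {}
    votes = {}
    for line in report["line"]:
        if "kolza" not in line:
            continue                      # e.g. the {"delimetr": "1"} separator
        if "numsvreestr" in line:
            votes[line["numsvreestr"]] = int(line["kolza"])
        else:
            summary[int(line["index"])] = int(line["kolza"])
    return summary, votes
```

The function expects the value under the top-level "report" key. The summary rows are keyed by index only, so which index holds registered voters or invalid ballots still has to be checked per region.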






When GAS "Vybory" began publishing preliminary results, I was somewhat disappointed: it turned out that the API only returns results that have been officially approved. Preliminary results can still be viewed on the old election commission website, but not through the new web services.



A day later, half of the results were available, and by the end of the week almost all elections had been finalized, although some regions had still not approved their results. At the time of writing, 7 days have passed and the election results in Tambov have still not been approved. In addition, some constituencies are undergoing a recount, so their results are not available through the API either.



Conclusion: the API methods are currently not suitable for obtaining voting results promptly. You either have to wait more than a week for the results to be approved, or parse the old election commission site and find a way around the captcha.



I got tired of waiting for the results to be approved in about 30 of the 1100 constituencies, so I wrote a script using the selenium library that loads data from the classic election commission site and asks me to solve the captcha manually for each request. With so few requests, solving captchas by hand does not take long.



As a result, I collected the voting results into the following structure:



Example of constituency voting results
{
...
"33": {
        "candidate_total": {
            "4444032121758": 880,
            "4444032122449": 236,
            "4444032122782": 143,
            "4444032123597": 152,
            "4444032123815": 149,
            "4444032124060": 72
        },
        "is_final": 1,
        "non_valid_votes": 132,
        "registered_voters": 6928,
        "valid_votes": 1632
    },
...
}




For each constituency I saved the total number of registered voters (to calculate turnout) and the numbers of valid and invalid ballots. The structure also contains a dictionary mapping each candidate identifier to the number of votes that candidate received.
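From one such record, turnout and the leading candidate follow directly. The sketch below treats turnout as valid plus invalid ballots divided by registered voters and expresses the leader's share relative to all ballots cast; both conventions are assumptions, and the function name is made up for the example.

```python
def summarize_district(result):
    """Compute turnout, the leading candidate and a consistency flag."""
    totals = result["candidate_total"]
    ballots = result["valid_votes"] + result["non_valid_votes"]
    winner = max(totals, key=totals.get)
    return {
        "turnout": ballots / result["registered_voters"],
        "winner": winner,
        "winner_share": totals[winner] / ballots,
        # Candidate votes should add up to the number of valid ballots.
        "totals_consistent": sum(totals.values()) == result["valid_votes"],
    }
```

For district "33" above this gives a turnout of about 25.46% and a leader with 880 of 1764 ballots, roughly 49.89%. The consistency flag is a cheap way to catch a protocol that was parsed with the wrong number of introductory lines.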



Publication of the results of UMG-2020



First, I published the collected data in JSON format on GitHub. The data will be updated until the results are approved in all districts.



Second, to draw attention to the project, I decided to generate a Google Spreadsheet that presents all the collected data in a form convenient for visual analysis.



I will not go into details; apart from learning the Google Sheets API, no difficulties should arise. I recommend this article, which describes working with the Google Sheets API in Python in detail.



image



As a result, we got the following table, which contains:





Afterword



The idea for this mini-project came up 3 days before voting day, and I am personally pleased that I managed to learn and implement everything in such a short time (although the code turned out terrible).



I am not going to draw any conclusions about the results of the Smart Voting strategy; I have simply provided tools for fans of electoral statistics. I am sure there are some among you, and that we will soon see wonderful studies with interesting graphs and charts.


