Voting anomalies on amendments to the Russian Constitution. Part 2

Link to the first part



The main goal of this second part is to examine in detail, through specific examples, the phenomenon of mass "drawing" (fabrication) of voting results.



As in the first part, all calculations, visualizations, and data parsing are provided in a Google Colab notebook, available via this Google Colab link.





Why is it important to analyze election data at scale?



95 . .



-, 14 . .



-, ( ) 78 . - 8.8 . -.



.



, RUElectionData "". . .



โ€” , . . , IT - .



, โ€œโ€ () . , .



โ€œโ€ , . . , .



:



  1. , . 327 19 , 25 . 19 . .
  2. โ€œโ€ . , , , 2014 2014 .
  3. 2000-2018 .


. , . . .



.



:



In region 10, out of 46 stations, 12 show a 'yes' result of exactly 91.0% and another 10 show exactly 90.0%:







In region 43, 43 stations cluster at 79%, 78%, 75%, 74.8% and 74%:







In region 60, out of 72 stations, 21 show a 'yes' result of 79.0%, 5 show 78.0% and 4 show 78.9%:







In region 76, commission № 2: out of 85 stations, 14 show a 'yes' result of 75.0% and 8 show 74.9%:







In region 22, stations show 'yes' results of 80%, 79%, 78%, 77% and 75%:















































, , . , , SOS .



, . () ( ):





, -, '' '' ( ):





.





Here the 'yes' share ranges from 71.1% to 85.6%. Duplicate values are collected into a DataFrame ds: the share is rounded to one decimal place, and only the values repeated at least min_n_duplicates times are kept; their total count is returned as total_duplicates. This is done by the function get_duplicates:



import pandas as pd

def get_duplicates(dq, col_name='yes_pct', min_n_duplicates=3):
    # Round the share to one decimal place and count how many stations
    # share each rounded value.
    ds = (pd.DataFrame(dq[col_name].values.round(1), columns=[col_name])
            .groupby(col_name).size().to_frame('size'))
    # Keep only the values repeated at least min_n_duplicates times.
    ds = ds[ds['size'] >= min_n_duplicates].sort_values(ascending=False, by='size')
    total_duplicates = ds['size'].sum()
    ds.reset_index(level=0, inplace=True)
    return total_duplicates, ds
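
For illustration only, here is a call on a toy commission (the column name yes_pct matches the code above; the numbers are made up):

# Toy example: three of five stations share the rounded value 79.0.
dq_example = pd.DataFrame({'yes_pct': [79.04, 78.96, 79.01, 85.6, 71.1]})
total, ds_example = get_duplicates(dq_example, min_n_duplicates=3)
print(total)       # 3
print(ds_example)  # one row: yes_pct = 79.0, size = 3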


Here is the ds table for commission № 5 (the size column shows how many stations share a given rounded value):





:





The shares are rounded to 0.1%. The results are then collected into a summary DataFrame dr with the columns total_duplicates, pct_duplicates and prob_duplicates.





, , .



, 5%. , 9%, โ€˜โ€™ 7%.



Assume the share can take one of n_levels=50 equally likely values. For n_stations=40 stations, the probability that n_identicals=10 of them show exactly the same value is computed by get_p:



from scipy import special

def get_p(n_identicals=10, n_stations=40, n_levels=50):
    # Probability that exactly n_identicals of n_stations stations land on
    # one particular value out of n_levels equally likely values.
    bin_coeff = special.binom(n_stations, n_identicals)
    prob = (bin_coeff * (1 / n_levels) ** n_identicals
            * ((n_levels - 1) / n_levels) ** (n_stations - n_identicals))
    return prob
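
With the default parameters this probability is tiny:

p = get_p(n_identicals=10, n_stations=40, n_levels=50)
print(f'{p:.1e}')  # about 4.7e-09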


The more general case, where several different values are each repeated across stations, is handled by the function get_prob_duplicates:




import numpy as np
from scipy.special import factorial

def multinomial_coeff(c):
    # Multinomial coefficient for a vector of group sizes c.
    return factorial(c.sum()) / factorial(c).prod()

def get_prob_duplicates(duplicates=[10, 5], n_stations=40, n_levels=50):
    # Probability that, out of n_stations stations, the groups listed in
    # `duplicates` each hit their own fixed value, while the remaining
    # stations take any of the other n_levels - n_duplicates values.
    n_duplicates = len(duplicates)
    sum_duplicates = sum(duplicates)
    coeffs = np.array(duplicates + [n_stations - sum_duplicates])
    mc = multinomial_coeff(coeffs)
    prob = (mc * (n_levels - n_duplicates) ** (n_stations - sum_duplicates)
            / n_levels ** n_stations)
    return prob
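
For example, the default case of one value repeated 10 times and another repeated 5 times among 40 stations:

p = get_prob_duplicates(duplicates=[10, 5], n_stations=40, n_levels=50)
print(f'{p:.1e}')  # roughly 1.4e-12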


. , .



, . , 0.1%.



, n_levels n_stations, , .





โ€œโ€ .



. .



The shares are rounded to one decimal place (numpy .round(1)), so the first digit after the decimal point takes values from 0 to 9; its distribution is plotted with plot_first_digit:
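
A minimal sketch of such a check (my own reimplementation, not the notebook's plot_first_digit; it assumes yes_pct is a pandas Series of station-level 'yes' percentages and uses matplotlib):

import pandas as pd
import matplotlib.pyplot as plt

def plot_decimal_digit(yes_pct):
    # First digit after the decimal point of the share rounded to 0.1.
    digit = (yes_pct.round(1) * 10).round().astype(int) % 10
    counts = digit.value_counts().reindex(range(10), fill_value=0)
    counts.plot(kind='bar', grid=True)
    plt.axhline(len(yes_pct) / 10, linestyle='--')  # uniform expectation
    plt.xlabel('first digit after the decimal point')
    plt.ylabel('number of stations')
    plt.show()

# plot_decimal_digit(yes_pct)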









.



.



.



The histogram of the fractional part of the 'yes' share, x - np.floor(x):



yes_pct.apply(lambda x: x-np.floor(x)).hist(bins=25,grid=True)




.

And the deviation from the nearest integer percent, x - np.round(x):



yes_pct.apply(lambda x: x-np.round(x)).hist(bins=25,grid=True)




, . , ( -!). .



Suppose, for example, that the target result 82.0% was drawn for a station with N = 1021 ballots. Then 0.82*1021 = 837.2, which is rounded to 837 votes, and 837/1021 = 81.98%, which again rounds to 82.0%.



In other words, if the percentage is drawn first and the vote count is derived from it by rounding, the reported share deviates from the drawn round value by no more than ±1/(2N).
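
This bound is easy to turn into a check. Below is my own illustration (not code from the notebook): a reported pair (votes, total) is consistent with a share drawn on a round grid if the exact percentage lies within 100/(2N) percentage points of a grid value.

def close_to_round_percent(votes, total, step=1.0):
    # True if the exact share is within 100/(2*total) percentage points of a
    # multiple of `step` percent, i.e. consistent with a result that was first
    # drawn as a round percentage and then converted into a vote count.
    pct = 100.0 * votes / total
    nearest = round(pct / step) * step
    return abs(pct - nearest) <= 100.0 / (2 * total)

# The worked example above: 837 of 1021 votes is 81.98%, which is within
# 100/(2*1021) = 0.049 p.p. of the round value 82.0%.
print(close_to_round_percent(837, 1021))  # True

A single station passing such a check proves nothing by itself; the signal only appears in the aggregate distribution.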



, . โ€˜โ€™ 100%-โ€™โ€™. . , .



,



. . . .



, , 2014 , 2014 , 2014 , 16 2014. . .



โ€œโ€.



Take, for example, the Syrian presidential election of 3 June 2014. According to Wikipedia:



Bashar al-Assad: 10 319 723 (88.7%)

Hassan al-Nouri: 500 279 (4.3%)

Maher Hajjar: 372 301 (3.2%)



Registered voters: 15 845 575

Votes cast: 11 634 412

Invalid ballots: 442 108 (3.8%)



Let us check these percentages:

With more than 10 million votes cast, the quantization step of a percentage here is about 0.00001%. The probability that a share lands exactly on a multiple of 0.1%, such as 88.70000%, is therefore about 1/10000.



10 319 723 / 11 634 412 * 100 = 88.70000%

372 301 / 11 634 412 * 100 = 3.20000%

442 108 / 11 634 412 * 100 = 3.80000%



The probability of all three coincidences happening by chance is about (1/10000)^3 = 10^(-12).
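
The arithmetic is easy to reproduce:

total = 11_634_412
for votes in (10_319_723, 372_301, 442_108):
    print(f'{100 * votes / total:.5f}%')
# 88.70000%
# 3.20000%
# 3.80000%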



Another example: the referendum in Sevastopol on 16 March 2014 (Wikipedia).



Registered voters: 306 258

Took part in the vote: 274 101

Voted «for joining Russia»: 262 041



The turnout, 274 101 out of 306 258, is exactly 274101/306258 = 89.500%, and the «for» share, 262 041 out of 274 101, is exactly 262041/274101 = 95.600%.
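
Again, this is straightforward to check:

print(f'{100 * 274_101 / 306_258:.3f}%')  # 89.500%
print(f'{100 * 262_041 / 274_101:.3f}%')  # 95.600%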



, , .



With roughly 300 000 voters, the quantization step of a percentage is on the order of 0.001%. The probability that both 89.500% and 95.600% land exactly on multiples of 0.1% is about (1/100)^2 = 0.0001; even multiplied by 10, that is only 0.001.



-.



. ( ) โ€˜โ€™ ( โ€œโ€) . - . . โ€” .





, 2000 2020 2008 .





. , , . , .



. , . , () . , .



: , .



, . -, . , .



- , , . .



, , . , .



1: 50% 371 . (c )





2: 260 33 , - ( ) . 259 .



โ„– 259 ( 32%, "" 50.79%, "" 48.37%),โ„– 260 (33, 44.48, 55.11 )



(64.84%) 260 (33.5%). =64.8%-33.5=31%.



, , 259 260 ( ) , .





3: , 1108 850/1219=70%.

, 482/1219=40%. , =70-40=30%. , 7.36, ยซ . . ยป.



, ( ) . , 30% 50% - - /, . , , .



, 80 90 55 . โ€œโ€:"Our power comes from the perception of our power".



IT , - . ", ".



: ยซ , , , ยป III; " , , ; , โ€” ยป ; ยซ , โ€” ยป .



-.



, , . , - ( ), .



?



, .



, c โ„– 2236 . -. , 99% โ€“ ยซยป.





. , . .



, , , . .



. , 2020 , .



, , .



, . , โ€œ โ€. .



The openness and accessibility of the data, as well as the reproducibility of the analysis, are important. In publishing the two parts of this article, I pursued exactly this goal. If readers do not agree with the conclusions or do not trust the mathematical model used to interpret the data, they can build their own model using the data and code provided.



