The main goal of this second part is to study in detail, using specific examples, the phenomenon of mass fabrication ('drawing') of voting results.
As in the first part, all calculations, visualizations and data parsing are provided in a Google Colab notebook, available at this Google Colab link.
Why is it important to analyze election data at scale?
95 . .
-, 14 . .
-, ( ) 78 . - 8.8 . -.
.
As before, the underlying data comes from the RUElectionData project.
, โโ () . , .
:
- , . 327 19 , 25 . 19 . .
- โโ . , , , 2014 2014 .
- 2000-2018 .
. , . . .
:
10 46 12 91.0 10 90.0 ( ):
, . () ( ):
, -, '' '' ( ):
.
For example, values such as 71.1% and 85.6% are repeated at many polling stations. Such repeated values (duplicates) are collected into the DataFrame ds. The parameter min_n_duplicates sets the minimum number of repetitions for a value to count as a duplicate, and total_duplicates is the total number of stations that fall on repeated values. This is done by the function get_duplicates:
import pandas as pd

def get_duplicates(dq, col_name='yes_pct', min_n_duplicates=3):
    # count how many stations share each value of col_name, rounded to 0.1
    ds = pd.DataFrame(dq[col_name].values.round(1),
                      columns=[col_name]).groupby(col_name).size().to_frame('size')
    # keep only the values repeated at least min_n_duplicates times
    ds = ds[ds['size'] >= min_n_duplicates].sort_values(ascending=False, by='size')
    total_duplicates = ds['size'].sum()
    ds.reset_index(level=0, inplace=True)
    return total_duplicates, ds
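A quick usage sketch; the tiny DataFrame below is made up purely for illustration, while in the notebook dq is the full per-station table with a yes_pct column:
dq = pd.DataFrame({'yes_pct': [71.1, 71.1, 71.1, 85.6, 85.6, 85.6, 64.3, 52.7]})
total_duplicates, ds = get_duplicates(dq, col_name='yes_pct', min_n_duplicates=3)
print(total_duplicates)  # 6 stations fall on repeated values
print(ds)                # the repeated values and how often each occurs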
On the real data, the first five rows of ds, sorted by size (the number of polling stations sharing the same rounded value), look like this:
The values are rounded to 0.1%. The results are then collected into a summary DataFrame dr with the columns total_duplicates, pct_duplicates and prob_duplicates.
, , .
, 5%. , 9%, โโ 7%.
How likely are such coincidences by chance? Assume that the result at each polling station takes one of n_levels=50 equally likely values. Then the probability that exactly n_identicals=10 out of n_stations=40 stations show one and the same given value is computed by the binomial formula:
from scipy import special

def get_p(n_identicals=10, n_stations=40, n_levels=50):
    # probability that exactly n_identicals of n_stations stations
    # land on one given value out of n_levels equally likely values
    bin_coeff = special.binom(n_stations, n_identicals)
    prob = (bin_coeff * (1 / n_levels) ** n_identicals *
            ((n_levels - 1) / n_levels) ** (n_stations - n_identicals))
    return prob
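For the numbers above this is a vanishingly small quantity; a quick check (the parameter values are the illustrative ones from the text):
print(get_p(n_identicals=10, n_stations=40, n_levels=50))  # roughly 5e-9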
The probability of observing a whole pattern of repeated values is computed through the multinomial coefficient:
import numpy as np
from scipy.special import factorial

def multinomial_coeff(c):
    # multinomial coefficient for the group sizes in the array c
    return factorial(c.sum()) / factorial(c).prod()

def get_prob_duplicates(duplicates=[10, 5], n_stations=40, n_levels=50):
    # duplicates=[10, 5] means: one value repeated 10 times and another
    # value repeated 5 times among n_stations stations
    n_duplicates = len(duplicates)
    sum_duplicates = sum(duplicates)
    coeffs = np.array(duplicates + [n_stations - sum_duplicates])
    mc = multinomial_coeff(coeffs)
    prob = (mc * (n_levels - n_duplicates) ** (n_stations - sum_duplicates) /
            n_levels ** n_stations)
    return prob
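And a similar check for the whole pattern, again with the illustrative parameter values:
print(get_prob_duplicates(duplicates=[10, 5], n_stations=40, n_levels=50))  # on the order of 1e-12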
. , .
, . , 0.1%.
Even if n_levels and n_stations are varied over a wide range, the probability of such coincidences arising by chance remains negligibly small.
โโ .
. .
The results are rounded to one decimal place (numpy.round(1)), so the digit in the first decimal place takes values from 0 to 9. The distribution of this digit is plotted by the function plot_first_digit:
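The author's plot_first_digit is defined in the Colab notebook; a minimal stand-in, assuming yes_pct is a pandas Series of percentages and that the chart shows how often each digit 0-9 occurs in the first decimal place, might look like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def plot_first_digit(series):
    # digit in the first decimal place, e.g. 71.3 -> 3
    digits = pd.Series(np.round(series.values * 10).astype(int) % 10)
    digits.value_counts().sort_index().plot(kind='bar', grid=True)
    plt.xlabel('first decimal digit')
    plt.ylabel('number of polling stations')
    plt.show()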
Another view is the distribution of the fractional part of the result, x - np.floor(x):
yes_pct.apply(lambda x: x-np.floor(x)).hist(bins=25,grid=True)
And the distribution of the distance to the nearest integer, x - np.round(x):
yes_pct.apply(lambda x: x-np.round(x)).hist(bins=25,grid=True)
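For comparison, here is a sketch that builds the same histogram for simulated 'honest' stations; the binomial model, the station sizes and the spread of support below are purely illustrative assumptions:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_voters = rng.integers(500, 2500, size=50_000)  # made-up station sizes
p_true = rng.beta(8, 3, size=50_000)             # made-up spread of true support
yes_pct = pd.Series(100 * rng.binomial(n_voters, p_true) / n_voters)

# distance to the nearest whole percentage: close to uniform, no spike at zero
yes_pct.apply(lambda x: x - np.round(x)).hist(bins=25, grid=True)
plt.show()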
Both histograms show a clear excess of results sitting almost exactly on whole percentages (note the spike at zero!).
How does a 'drawn' target turn into an integer vote count? Take a station with N=1021 ballots and a target of 82.0%: 0.82*1021 = 837.2, so the nearest achievable count is 837, and 837/1021 = 81.98%, which is reported as 82.0% after rounding to one decimal place.
In general, any target share can be realized to within ±1/(2N), i.e. to within about 0.05 percentage points for N on the order of a thousand, so the discreteness of vote counts is no obstacle to hitting round values.
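A couple of lines make this arithmetic explicit (N and the target are taken from the example above):
N, target = 1021, 82.0
votes = round(target / 100 * N)  # nearest achievable vote count: 837
print(votes, 100 * votes / N)    # 837 81.978... -> reported as 82.0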
, . โโ 100%-โโ. . , .
,
. . . .
As examples, let us look at two votes held in 2014: the presidential election in Syria on June 3, 2014, and the referendum of March 16, 2014 in Crimea and Sevastopol.
Here are the official results of the Syrian presidential election of June 3, 2014, as given on Wikipedia:
Bashar al-Assad: 10 319 723 (88.7%)
Hassan al-Nouri: 500 279 (4.3%)
Maher Hajjar: 372 301 (3.2%)
Registered voters: 15 845 575
Ballots cast: 11 634 412
Invalid ballots: 442 108 (3.8%)
Now compute the same shares to five decimal places (a precision of 0.00001%). For a random ratio of two large numbers, all four digits after the first decimal place would have to be zero for the result to look as round as 88.70000%, which happens with probability about 1/10000:
10 319 723 / 11 634 412 * 100 = 88.70000%
372 301 / 11 634 412 * 100 = 3.20000%
442 108 / 11 634 412 * 100 = 3.80000%
Three shares are round to five decimal places at once; the probability of such a coincidence is about (1/10000)^3 = 10^(-12).
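The same calculation in Python, using only the official totals quoted above:
total_votes = 11_634_412
for votes in (10_319_723, 372_301, 442_108):
    print(f'{100 * votes / total_votes:.5f}%')  # 88.70000%, 3.20000%, 3.80000%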
A similar picture appears in the official results of the referendum of March 16, 2014 in Sevastopol (wikipedia):
Registered voters: 306 258
Ballots cast: 274 101
Votes for joining Russia: 262 041
The turnout is 274 101 out of 306 258, i.e. 274101/306258 = 89.500%, and the share of votes for joining Russia is 262041/274101 = 95.600%.
, , .
With denominators of about 300 000, the shares are determined to a precision of better than 0.001%, so for a random ratio the chance that both the second and third decimal digits are zero is about 1/100. The probability of getting both 89.500% and 95.600% by chance is therefore about (1/100)^2 = 0.0001, i.e. one in ten thousand.
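And the corresponding check for the Sevastopol totals quoted above:
ballots, registered, yes_votes = 274_101, 306_258, 262_041
print(f'{100 * ballots / registered:.3f}%')  # 89.500%
print(f'{100 * yes_votes / ballots:.3f}%')   # 95.600%
print((1 / 100) ** 2)                        # chance of both being round: 0.0001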
-.
. ( ) โโ ( โโ) . - . . โ .
, 2000 2020 2008 .
. , , . , .
. , . , () . , .
: , .
, . -, . , .
, , . , .
1: 50% 371 . (c )
, ( ) . , 30% 50% - - /, . , , .
, 80 90 55 . โโ:"Our power comes from the perception of our power".
: ยซ , , , ยป III; " , , ; , โ ยป ; ยซ , โ ยป .
-.
, , . , - ( ), .
?
, .
, c โ 2236 . -. , 99% โ ยซยป.
. , . .
, , , . .
. , 2020 , .
, , .
, . , โ โ. .
The openness and accessibility of the data, as well as the reproducibility of the results of the analysis, are what matter most. In publishing the two parts of this article, I pursued exactly this goal: if readers do not agree with the conclusions, or do not trust the mathematical model used to interpret the data, they can build their own model from the data and code provided.