Black [O] lives Matter: Race, Crime, and Fire to Kill in the United States. Part 2

In the first part of the article, I described the background for the study, its goals, assumptions, inputs and tools. Now, without further ado, we can say what Gagarin said ...

Go!

We import the libraries and define the path to the directory with all the files:

import pandas as pd, numpy as np

# root folder with all the data files
ROOT_FOLDER = r'c:\_PROG_\Projects\us_crimes'

Death at the hands of the law

Let's start with the data on people killed by police. Load the CSV file into a DataFrame:

# path to the Fatal Encounters (FENC) data file
FENC_FILE = ROOT_FOLDER + '\\fatal_enc_db.csv'

# load into a DataFrame
df_fenc = pd.read_csv(FENC_FILE, sep=';', header=0, usecols=["Date (Year)", "Subject's race with imputations", "Cause of death", "Intentional Use of Force (Developing)", "Location of death (state)"])

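Note that we load only the fields we need rather than the whole table: the year, the subject's race (with imputations), the cause of death, the intentional use of force flag, and the state where the death occurred.

A note on "race with imputations". In some records the subject's race is not given in the source reports, and FENC fills such gaps with an imputed value. For details, see the Fatal Encounters database itself (the Excel file).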

df_fenc.columns = ['Race', 'State', 'Cause', 'UOF', 'Year']
df_fenc.dropna(inplace=True)
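Assigning `df_fenc.columns` by position works only if the columns arrive in exactly the listed order, and `read_csv` with `usecols` returns them in file order rather than in the order of the `usecols` list. A minimal sketch of a name-based alternative follows; the two-row sample stands in for the real FENC file, and `SAMPLE_CSV`, `RENAME_MAP` and `load_fenc` are illustrative names that do not appear in the original code:

```python
import io

import pandas as pd

# Made-up two-row sample that only mimics the FENC column names used above.
SAMPLE_CSV = (
    "Subject's race with imputations;Location of death (state);Cause of death;"
    "Intentional Use of Force (Developing);Date (Year)\n"
    "African-American/Black;GA;Gunshot;Deadly force;2000\n"
    "European-American/White;TX;Vehicle;;2001\n"
)

# Long source names mapped to the short names used in the rest of the analysis.
RENAME_MAP = {
    "Subject's race with imputations": 'Race',
    "Location of death (state)": 'State',
    "Cause of death": 'Cause',
    "Intentional Use of Force (Developing)": 'UOF',
    "Date (Year)": 'Year',
}


def load_fenc(source):
    """Read only the needed columns, rename them by name, drop incomplete rows."""
    df = pd.read_csv(source, sep=';', header=0, usecols=list(RENAME_MAP))
    return df.rename(columns=RENAME_MAP).dropna()


df_sample = load_fenc(io.StringIO(SAMPLE_CSV))
print(df_sample)
```

Because the mapping is keyed by the source column name, reordering the columns in the file cannot silently swap, say, `State` and `Year`.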

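Next, we bring the race categories in line with the ones used in the rest of the study. Besides White and Black, FENC has separate categories for Hispanic/Latino, Asian/Pacific Islander and Middle Eastern subjects. We merge Hispanic/Latino into White and the other two into Asian: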

# the nested dict already carries the replacement values, so no separate `value` argument is needed
df_fenc = df_fenc.replace({'Race': {'European-American/White': 'White', 'African-American/Black': 'Black', 
                          'Hispanic/Latino': 'White', 'Native American/Alaskan': 'American Indian',
                          'Asian/Pacific Islander': 'Asian', 'Middle Eastern': 'Asian',
                          'NA': 'Unknown', 'Race unspecified': 'Unknown'}})

For this study we keep only the White and Black subjects:

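df_fenc = df_fenc.loc[df_fenc['Race'].isin(['White', 'Black'])]

What about the "UOF" (use of force) field? The database also contains deaths that did not result from police deliberately using force. For this study we are interested only in cases where force was used intentionally, so we keep the records marked as "Deadly force" or "Intentional use of force":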

df_fenc = df_fenc.loc[df_fenc['UOF'].isin(['Deadly force', 'Intentional use of force'])]

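For convenience, add the full state names. They are stored in a separate CSV file, which we load and merge with our data: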

df_state_names = pd.read_csv(ROOT_FOLDER + '\\us_states.csv', sep=';', header=0)
df_fenc = df_fenc.merge(df_state_names, how='inner', left_on='State', right_on='state_abbr')
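An inner merge silently drops every record whose state abbreviation has no match in the lookup table. The sketch below shows one way to check for such losses before merging, using `indicator=True` on a left merge; the three-row `df_cases` and the two-row `df_states` are made-up stand-ins for the real tables:

```python
import pandas as pd

# Hypothetical stand-ins for df_fenc and df_state_names.
df_cases = pd.DataFrame({'State': ['GA', 'TX', 'ZZ'],
                         'Race': ['Black', 'White', 'White']})
df_states = pd.DataFrame({'state_name': ['Georgia', 'Texas'],
                          'state_abbr': ['GA', 'TX']})

# A left merge keeps every case; the _merge column tells where each row matched.
df_check = df_cases.merge(df_states, how='left',
                          left_on='State', right_on='state_abbr',
                          indicator=True)

# Abbreviations that an inner merge would have dropped.
unmatched = df_check.loc[df_check['_merge'] == 'left_only', 'State'].unique().tolist()
print(unmatched)
```

If `unmatched` is empty, the inner merge used above loses nothing.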

Calling df_fenc.head() shows the first rows of the result:

	Race	State	Cause	UOF	Year	state_name	state_abbr
0	Black	GA	Gunshot	Deadly force	2000	Georgia	GA
1	Black	GA	Gunshot	Deadly force	2000	Georgia	GA
2	Black	GA	Gunshot	Deadly force	2000	Georgia	GA
3	Black	GA	Gunshot	Deadly force	2000	Georgia	GA
4	Black	GA	Gunshot	Deadly force	2000	Georgia	GA

Now count the cases by year and race:

# group by year and race and count the cases
ds_fenc_agg = df_fenc.groupby(['Year', 'Race']).count()['Cause']
df_fenc_agg = ds_fenc_agg.unstack(level=1)
# cast the counts to UINT16 to save memory
df_fenc_agg = df_fenc_agg.astype('uint16')
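One caveat with this step: `unstack()` leaves NaN for any year/race combination that has no records, and a NaN cannot be cast to an unsigned integer, so the `astype('uint16')` call would fail. A small sketch of a more defensive variant, on made-up data (`df_demo` is hypothetical), uses `size()` with `fill_value=0`:

```python
import pandas as pd

# Made-up records: there is no Black case in 2001.
df_demo = pd.DataFrame({
    'Year': [2000, 2000, 2000, 2001],
    'Race': ['Black', 'White', 'White', 'White'],
    'Cause': ['Gunshot'] * 4,
})

# size() counts rows per group; fill_value=0 replaces the missing combination
# with 0 instead of NaN, so the integer cast is always safe.
counts = (df_demo.groupby(['Year', 'Race'])
                 .size()
                 .unstack(level='Race', fill_value=0)
                 .astype('uint16'))
print(counts)
```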

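The result is a table with 2 columns: White (number of White victims) and Black (number of Black victims), indexed by year (2000 to 2020). Let's plot it: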

# plot the number of police victims by year (absolute counts)
plt = df_fenc_agg.plot(xticks=df_fenc_agg.index, color=['olive', 'g'])
plt.set_xticklabels(df_fenc_agg.index, rotation='vertical')
plt.set_xlabel('')
plt.set_ylabel('Number of police victims')
plt

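In absolute numbers, White victims outnumber Black victims by a factor of about 2.4. This is to be expected, since the White population is several times larger than the Black population. To compare the two groups fairly, we need per capita figures.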

Load the population data (by race):

# CSV file with population data (1991 - 2018)
POP_FILE = ROOT_FOLDER + '\\us_pop_1991-2018.csv'
df_pop = pd.read_csv(POP_FILE, index_col=0, dtype='int64')

# keep only the White and Black populations for 2000 - 2018
df_pop = df_pop.loc[2000:2018, ['White_pop', 'Black_pop']]

# join the tables and drop rows with missing values
df_fenc_agg = df_fenc_agg.join(df_pop)
df_fenc_agg.dropna(inplace=True)

# cast the population columns to unsigned integers
df_fenc_agg = df_fenc_agg.astype({'White_pop': 'uint32', 'Black_pop': 'uint32'})

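Good. Now add 2 columns with the per capita figures, dividing the number of victims by the population (and multiplying by 1 million):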

df_fenc_agg['White_promln'] = df_fenc_agg['White'] * 1e6 / df_fenc_agg['White_pop']
df_fenc_agg['Black_promln'] = df_fenc_agg['Black'] * 1e6 / df_fenc_agg['Black_pop']
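As a quick sanity check of the formula, the 2000 row can be recomputed by hand from the counts and population figures shown in the table below. The helper name `per_million` is only illustrative:

```python
def per_million(count, population):
    """Number of cases per 1 million people."""
    return count * 1e6 / population


# Figures for the year 2000, copied from the table in this article.
white_2000 = per_million(291, 218_756_353)
black_2000 = per_million(148, 35_410_436)

print(round(white_2000, 6), round(black_2000, 6))
```

The two values should agree with the `White_promln` and `Black_promln` entries for 2000.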

Here is the resulting table:

	Black	White	White_pop	Black_pop	White_promln	Black_promln
Year
2000	148	291	218756353	35410436	1.330247	4.179559
2001	158	353	219843871	35758783	1.605685	4.418495
2002	161	363	220931389	36107130	1.643044	4.458953
2003	179	388	222018906	36455476	1.747599	4.910099
2004	157	435	223106424	36803823	1.949742	4.265861
2005	181	452	224193942	37152170	2.016112	4.871855
2006	212	460	225281460	37500517	2.041890	5.653255
2007	219	449	226368978	37848864	1.983487	5.786171
2008	213	442	227456495	38197211	1.943229	5.576323
2009	249	478	228544013	38545558	2.091501	6.459888
2010	219	506	229397472	38874625	2.205778	5.633495
2011	290	577	230838975	39189528	2.499578	7.399936
2012	302	632	231992377	39623138	2.724227	7.621809
2013	310	693	232969901	39919371	2.974633	7.765653
2014	264	704	233963128	40379066	3.009021	6.538041
2015	272	729	234940100	40695277	3.102919	6.683822
2016	269	723	234644039	40893369	3.081263	6.578084
2017	265	743	235507457	41393491	3.154889	6.401973
2018	265	775	236173020	41617764	3.281493	6.367473

The last 2 columns are the per capita figures (per 1 million people). Let's plot them:

plt = df_fenc_agg.loc[:, ['White_promln', 'Black_promln']].plot(xticks=df_fenc_agg.index, color=['g', 'olive'])
plt.set_xticklabels(df_fenc_agg.index, rotation='vertical')
plt.set_xlabel('')
plt.set_ylabel('Number of police victims\nper 1 million people')
plt

df_fenc_agg.loc[:, ['White_promln', 'Black_promln']].describe()

	White_promln	Black_promln
count	19.000000	19.000000
mean	2.336123	5.872145
std	0.615133	1.133677
min	1.330247	4.179559
25%	1.946485	4.890977
50%	2.091501	5.786171
75%	2.991827	6.558062
max	3.281493	7.765653

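1. Over the whole period, police killed on average 5.9 Black people per 1 million Black population per year, against 2.3 White people per 1 million White population (a difference of 2.6 times with these rounded means, or about 2.5 times with the unrounded ones).
2. The spread (standard deviation) of the Black series is 1.8 times that of the White series, so the Black figures fluctuate more from year to year. (This is visible in the plot, where the White curve is much smoother.)
3. The maximum for Black was reached in 2013 (7.8 per 1 million); the maximum for White was reached in 2018 (3.3 per 1 million).
4. White deaths grow steadily (by 0.1 - 0.2 per 1 million per year on average), whereas Black deaths returned to roughly their 2009 level after a peak in 2011 - 2013.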
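The ratios quoted in the list above can be recomputed directly from the `describe()` output. This is only a small arithmetic sketch; the dictionary `stats` is an illustrative container for values copied from the table, not part of the original code:

```python
# Values copied from the describe() output above.
stats = {
    'White_promln': {'mean': 2.336123, 'std': 0.615133, 'min': 1.330247, 'max': 3.281493},
    'Black_promln': {'mean': 5.872145, 'std': 1.133677, 'min': 4.179559, 'max': 7.765653},
}

# How many times higher the Black per capita rate is, on average.
mean_ratio = stats['Black_promln']['mean'] / stats['White_promln']['mean']

# How much more the Black series varies from year to year.
std_ratio = stats['Black_promln']['std'] / stats['White_promln']['std']

# Average yearly change of the White rate: its minimum falls in 2000 and its
# maximum in 2018, which are 18 yearly steps apart.
white_growth = (stats['White_promln']['max'] - stats['White_promln']['min']) / 18

print(round(mean_ratio, 2), round(std_ratio, 2), round(white_growth, 2))
```

The ratio of the unrounded means comes out slightly lower than the 2.6 obtained from 5.9 / 2.3, which is purely a rounding effect.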

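So, we can now answer the question:

- Are Black people more likely than White people to die at the hands of police, relative to their population?

- Yes. Per capita, Black people are killed by police about 2.6 times as often as White people (5.9 against 2.3 per 1 million per year).

With this in mind, let's move on to the second dataset, the crime data, and see how the two compare.

Load the crime data from CSV: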

CRIMES_FILE = ROOT_FOLDER + '\\culprits_victims.csv'
df_crimes = pd.read_csv(CRIMES_FILE, sep=';', header=0, index_col=0, usecols=['Year', 'Offense', 'Offender/Victim', 'White', 'White pro capita', 'Black', 'Black pro capita'])

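As with the previous file, we load only the fields we need: the year, the offense type, the offender/victim flag, and the number of crimes by race (absolute counts in "White" and "Black", per capita values in "White pro capita" and "Black pro capita").

Here are the first rows (`df_crimes.head()`):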

	Offense	Offender/Victim	Black	White	Black pro capita	White pro capita
Year
1991	All Offenses	Offender	490	598	1.518188e-05	2.861673e-06
1991	All Offenses	Offender	4	4	1.239337e-07	1.914160e-08
1991	All Offenses	Offender	508	122	1.573958e-05	5.838195e-07
1991	All Offenses	Offender	155	176	4.802432e-06	8.422314e-07
1991	All Offenses	Offender	13	19	4.027846e-07	9.092270e-08

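We don't need the victim data for now. Keep only the offenders and the same period as before:

# keep only offenders (not victims)
df_crimes1 = df_crimes.loc[df_crimes['Offender/Victim'] == 'Offender']
# keep only the study period (2000-2018) and the columns we need
df_crimes1 = df_crimes1.loc[2000:2018, ['Offense', 'White', 'White pro capita', 'Black', 'Black pro capita']]

The result (1295 rows * 5 columns):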

	Offense	White	White pro capita	Black	Black pro capita
Year
2000	All Offenses	679	0.000003	651	0.000018
2000	All Offenses	11458	0.000052	30199	0.000853
2000	All Offenses	4439	0.000020	3188	0.000090
2000	All Offenses	10481	0.000048	5153	0.000146
2000	All Offenses	746	0.000003	63	0.000002
...	...	...	...	...	...
2018	Larceny Theft Offenses	1961	0.000008	1669	0.000040
2018	Larceny Theft Offenses	48616	0.000206	30048	0.000722
2018	Drugs Narcotic Offenses	555974	0.002354	223398	0.005368
2018	Drugs Narcotic Offenses	305052	0.001292	63785	0.001533
2018	Weapon Law Violation	70034	0.000297	58353	0.001402
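The selection `.loc[2000:2018, ...]` above is a label slice (inclusive on both ends) over an index in which every year repeats many times. As far as I know, pandas resolves such a slice reliably only when the index is sorted, so it may be worth sorting first if the file order is not guaranteed. A minimal sketch on made-up data (`df_demo` is hypothetical):

```python
import pandas as pd

# Made-up frame with an unsorted Year index that contains repeated labels.
df_demo = pd.DataFrame(
    {'v': [1, 2, 3, 4]},
    index=pd.Index([2001, 1999, 2001, 2019], name='Year'),
)

# Sort the index first, then slice by label; both bounds are inclusive.
sel = df_demo.sort_index().loc[2000:2018]
print(sel)
```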

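Convert the per capita values to the number of crimes per 1 million people (the same unit we used for police victims). To do this, multiply them by 1 million: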

df_crimes1['White_promln'] = df_crimes1['White pro capita'] * 1e6
df_crimes1['Black_promln'] = df_crimes1['Black pro capita'] * 1e6

To see the overall picture, sum the absolute counts by offense type (over all years):

df_crimes_agg = df_crimes1.groupby(['Offense']).sum().loc[:, ['White', 'Black']]

	White	Black
Offense
All Offenses	44594795	22323144
Assault Offenses	12475830	7462272
Drugs Narcotic Offenses	9624596	3453140
Larceny Theft Offenses	9563917	4202235
Murder And Nonnegligent Manslaughter	28913	39617
Sex Offenses	833088	319366
Weapon Law Violation	829485	678861

plt = df_crimes_agg.plot.barh(color=['g', 'olive'])
plt.set_ylabel('Offense type')
plt.set_xlabel('Number of crimes (total for 2000 - 2018)')

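Looking at the table and the chart, we can note the following:

in absolute numbers, White offenders outnumber Black offenders in every category except "Murder And Nonnegligent Manslaughter",
for all offenses taken together, the difference is about 2 times (see "All Offenses")

These, however, are "raw" counts that ignore the size of each population. Let's look at the same data per capita: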

df_crimes_agg1 = df_crimes1.groupby(['Offense']).sum().loc[:, ['White_promln', 'Black_promln']]

	White_promln	Black_promln
Offense
All Offenses	194522.307758	574905.952459
Assault Offenses	54513.398833	192454.602875
Drugs Narcotic Offenses	41845.758869	88575.523095
Larceny Theft Offenses	41697.303725	108189.184125
Murder And Nonnegligent Manslaughter	125.943007	1016.403706
Sex Offenses	3633.777035	8225.144985
Weapon Law Violation	3612.671402	17389.163849

plt = df_crimes_agg1.plot.barh(color=['g', 'olive'])
plt.set_ylabel('Offense type')
plt.set_xlabel('Number of crimes per 1 million people (total for 2000 - 2018)')
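To read the per capita table above more easily, it helps to divide one column by the other. The sketch below rebuilds the table from the printed values and adds a ratio column; `per_mln` and `Black_to_White` are illustrative names, not part of the original code:

```python
import pandas as pd

# Per capita totals copied from the table above (crimes per 1 million, 2000 - 2018).
per_mln = pd.DataFrame(
    {
        'White_promln': [194522.307758, 54513.398833, 41845.758869, 41697.303725,
                         125.943007, 3633.777035, 3612.671402],
        'Black_promln': [574905.952459, 192454.602875, 88575.523095, 108189.184125,
                         1016.403706, 8225.144985, 17389.163849],
    },
    index=['All Offenses', 'Assault Offenses', 'Drugs Narcotic Offenses',
           'Larceny Theft Offenses', 'Murder And Nonnegligent Manslaughter',
           'Sex Offenses', 'Weapon Law Violation'],
)

# How many times higher the Black per capita figure is for each offense type.
per_mln['Black_to_White'] = per_mln['Black_promln'] / per_mln['White_promln']
ratios = per_mln['Black_to_White'].sort_values(ascending=False)
print(ratios.round(2))
```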

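The picture is reversed. Per capita (per 1 million people of each race), Black offenders lead in every category. For "All Offenses" the difference is about 3 times.

From here on we use only the "All Offenses" category as the overall crime indicator (the commented-out line in the code shows how to restrict the analysis to specific offense types instead, for example assault and murder).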

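# keep only 'All Offenses' = total crime
df_crimes1 = df_crimes1.loc[df_crimes1['Offense'] == 'All Offenses']
# to keep only specific offenses instead, for example assault and murder:
#df_crimes1 = df_crimes1.loc[df_crimes1['Offense'].str.contains('Assault|Murder')]

# sum by year and keep only the per capita columns
df_crimes1 = df_crimes1.groupby(level=0).sum().loc[:, ['White_promln', 'Black_promln']]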

	White_promln	Black_promln
Year
2000	6115.058976	17697.409882
2001	6829.701429	20431.707645
2002	7282.333249	20972.838329
2003	7857.691182	22218.966500
2004	8826.576863	26308.815799
2005	9713.826255	30616.569637
2006	10252.894313	33189.382429
2007	10566.527362	34100.495064
2008	10580.520024	34052.276749
2009	10889.263592	33954.651792
2010	10977.017218	33884.236826
2011	11035.346176	32946.454471
2012	11562.836825	33150.706035
2013	11211.113491	32207.571607
2014	11227.354594	31517.346141
2015	11564.786088	31764.865490
2016	12193.026562	33186.064958
2017	12656.261666	34900.390499
2018	13180.171893	37805.202605

plt = df_crimes1.plot(xticks=df_crimes1.index, color=['g', 'olive'])
plt.set_xticklabels(df_crimes1.index, rotation='vertical')
plt.set_xlabel('')
plt.set_ylabel('Number of crimes\nper 1 million people')
plt

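1. In absolute numbers White offenders commit about 2 times more crimes than Black offenders, but per capita it is the other way around: about 3 times fewer.
2. Crime among White people grows fairly steadily over the whole period (roughly 2 times in 18 years). Among Black people it moves in stages: a sharp rise from 2001 to 2006, a plateau with a slight decline from 2007 to 2016, and renewed growth from 2017. Over the whole period it also grows roughly 2 times (as for White).
3. Apart from the 2007-2016 plateau, then, crime among Black people grows faster than among White people.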

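So, we can answer one more question:

- Do Black people commit more crimes than White people?

- Yes, about 3 times more per capita.

Moving on to the next question: is there a relationship between crime and deaths at the hands of police?

- To find out, we need to put the two datasets side by side.

Combine the police victims data with the crime data: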

# join the two tables
df_uof_crimes = df_fenc_agg.join(df_crimes1, lsuffix='_uof', rsuffix='_cr')
# drop the unneeded columns (absolute victim counts)
df_uof_crimes = df_uof_crimes.loc[:, 'White_pop':'Black_promln_cr']

	White_pop	Black_pop	White_promln_uof	Black_promln_uof	White_promln_cr	Black_promln_cr
Year
2000	218756353	35410436	1.330247	4.179559	6115.058976	17697.409882
2001	219843871	35758783	1.605685	4.418495	6829.701429	20431.707645
2002	220931389	36107130	1.643044	4.458953	7282.333249	20972.838329
2003	222018906	36455476	1.747599	4.910099	7857.691182	22218.966500
2004	223106424	36803823	1.949742	4.265861	8826.576863	26308.815799
2005	224193942	37152170	2.016112	4.871855	9713.826255	30616.569637
2006	225281460	37500517	2.041890	5.653255	10252.894313	33189.382429
2007	226368978	37848864	1.983487	5.786171	10566.527362	34100.495064
2008	227456495	38197211	1.943229	5.576323	10580.520024	34052.276749
2009	228544013	38545558	2.091501	6.459888	10889.263592	33954.651792
2010	229397472	38874625	2.205778	5.633495	10977.017218	33884.236826
2011	230838975	39189528	2.499578	7.399936	11035.346176	32946.454471
2012	231992377	39623138	2.724227	7.621809	11562.836825	33150.706035
2013	232969901	39919371	2.974633	7.765653	11211.113491	32207.571607
2014	233963128	40379066	3.009021	6.538041	11227.354594	31517.346141
2015	234940100	40695277	3.102919	6.683822	11564.786088	31764.865490
2016	234644039	40893369	3.081263	6.578084	12193.026562	33186.064958
2017	235507457	41393491	3.154889	6.401973	12656.261666	34900.390499
2018	236173020	41617764	3.281493	6.367473	13180.171893	37805.202605

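For reference, here is what each column holds:

White_pop - White population
Black_pop - Black population
White_promln_uof - White victims of police use of force (per 1 million people)
Black_promln_uof - Black victims of police use of force (per 1 million people)
White_promln_cr - crimes committed by White offenders (per 1 million people)
Black_promln_cr - crimes committed by Black offenders (per 1 million people)

A wall of numbers like this is hard to read, so let's plot it :)

We start with the White series, with crimes on the left axis and police victims on the right axis: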

plt = df_uof_crimes['White_promln_cr'].plot(xticks=df_uof_crimes.index, legend=True)
plt.set_ylabel('Number of crimes per 1 million people')
plt2 = df_uof_crimes['White_promln_uof'].plot(xticks=df_uof_crimes.index, legend=True, secondary_y=True, style='g')
plt2.set_ylabel('Number of police victims per 1 million people', rotation=90)
plt2.set_xlabel('')
plt.set_xlabel('')
plt.set_xticklabels(df_uof_crimes.index, rotation='vertical')
plt

That was the White series. Now the same plot for the Black series:

plt = df_uof_crimes['Black_promln_cr'].plot(xticks=df_uof_crimes.index, legend=True)
plt.set_ylabel('Number of crimes per 1 million people')
plt2 = df_uof_crimes['Black_promln_uof'].plot(xticks=df_uof_crimes.index, legend=True, secondary_y=True, style='g')
plt2.set_ylabel('Number of police victims per 1 million people', rotation=90)
plt2.set_xlabel('')
plt.set_xlabel('')
plt.set_xticklabels(df_uof_crimes.index, rotation='vertical')
plt

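Judging "by eye", there does seem to be a relationship: in both plots the crime curve and the police victims curve rise over the period.

To check this impression, compute the correlation matrix: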

df_corr = df_uof_crimes.loc[:, ['White_promln_cr', 'White_promln_uof', 'Black_promln_cr', 'Black_promln_uof']].corr(method='pearson')
df_corr.style.background_gradient(cmap='PuBu')

	White_promln_cr	White_promln_uof	Black_promln_cr	Black_promln_uof
White_promln_cr	1.000000	0.885470	0.949909	0.802529
White_promln_uof	0.885470	1.000000	0.710052	0.795486
Black_promln_cr	0.949909	0.710052	1.000000	0.722170
Black_promln_uof	0.802529	0.795486	0.722170	1.000000

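The coefficients we are interested in are: White_promln_cr against White_promln_uof = 0.885, and Black_promln_cr against Black_promln_uof = 0.722. In both cases the correlation between crime and police victims is positive and fairly strong, and it is noticeably higher for White than for Black.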
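For readers who want to see what `corr(method='pearson')` does under the hood, here is a small sketch that recomputes the White coefficient by hand from the yearly values printed in the table earlier. The helper `pearson` and the list names are illustrative, not part of the original code:

```python
import numpy as np
import pandas as pd

# Yearly values for 2000 - 2018, copied from the joined table above.
white_cr = [6115.058976, 6829.701429, 7282.333249, 7857.691182, 8826.576863,
            9713.826255, 10252.894313, 10566.527362, 10580.520024, 10889.263592,
            10977.017218, 11035.346176, 11562.836825, 11211.113491, 11227.354594,
            11564.786088, 12193.026562, 12656.261666, 13180.171893]
white_uof = [1.330247, 1.605685, 1.643044, 1.747599, 1.949742,
             2.016112, 2.041890, 1.983487, 1.943229, 2.091501,
             2.205778, 2.499578, 2.724227, 2.974633, 3.009021,
             3.102919, 3.081263, 3.154889, 3.281493]


def pearson(x, y):
    """Pearson r: covariance of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


r_manual = pearson(white_cr, white_uof)
r_pandas = pd.Series(white_cr).corr(pd.Series(white_uof), method='pearson')
print(round(r_manual, 3), round(r_pandas, 3))
```

Both numbers should agree with each other and with the White_promln_cr / White_promln_uof cell of the matrix.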

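Now let's look at the same data from another angle. We introduce one more indicator: the ratio of the number of police victims to the number of crimes (multiplied by 100 to express it in %):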

# aggregate values (over all years)
df_uof_crimes_agg = df_uof_crimes.loc[:, ['White_promln_cr', 'White_promln_uof', 'Black_promln_cr', 'Black_promln_uof']].agg(['mean', 'sum', 'min', 'max'])
# ratio of police victims to crimes, in %
df_uof_crimes_agg['White_uof_cr'] = df_uof_crimes_agg['White_promln_uof'] * 100. / df_uof_crimes_agg['White_promln_cr']
df_uof_crimes_agg['Black_uof_cr'] = df_uof_crimes_agg['Black_promln_uof'] * 100. / df_uof_crimes_agg['Black_promln_cr']

	White_promln_cr	White_promln_uof	Black_promln_cr	Black_promln_uof	White_uof_cr	Black_uof_cr
mean	10238.016198	2.336123	30258.208024	5.872145	0.022818	0.019407
sum	194522.307758	44.386338	574905.952459	111.570747	0.022818	0.019407
min	6115.058976	1.330247	17697.409882	4.179559	0.021754	0.023617
max	13180.171893	3.281493	37805.202605	7.765653	0.024897	0.020541
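One thing to keep in mind when reading this table: the ratio columns are computed after aggregation, so the `min` and `max` rows divide the smallest (largest) victims figure by the smallest (largest) crime figure, and those may come from different years. For Black, for example, the victims maximum falls in 2013 and the crime maximum in 2018. The sketch below, on three years copied from the joined table (`df_demo` is an illustrative name), contrasts this with computing the ratio year by year first:

```python
import pandas as pd

# Three years of the Black series, copied from the joined table above.
df_demo = pd.DataFrame(
    {
        'Black_promln_cr': [17697.409882, 32207.571607, 37805.202605],
        'Black_promln_uof': [4.179559, 7.765653, 6.367473],
    },
    index=pd.Index([2000, 2013, 2018], name='Year'),
)

# Ratio computed per year, then aggregated.
yearly_ratio = df_demo['Black_promln_uof'] * 100. / df_demo['Black_promln_cr']

# Ratio computed from already aggregated columns (as in the 'max' row above).
ratio_of_max = df_demo['Black_promln_uof'].max() * 100. / df_demo['Black_promln_cr'].max()

print(yearly_ratio.round(6))
print(round(ratio_of_max, 6))
```

The `mean` and `sum` rows do not have this problem: they give the ratio of the averages (or totals), which is a meaningful figure, although it is not the same thing as the average of the yearly ratios.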

plt = df_uof_crimes_agg.loc['mean', ['White_uof_cr', 'Black_uof_cr']].plot.bar(color=['g', 'olive'])
plt.set_ylabel('Ratio of police victims to crimes, %')
plt.set_xticklabels(['White', 'Black'], rotation=0)

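As the table and the chart show, the ratio is similar for the two races: on average about 0.023% for White and 0.019% for Black, that is, slightly higher for White.

1. Deaths at the hands of police correlate with crime (the number of offenses committed).
2. The correlation is uneven across races: it is strong for White (0.885) and noticeably weaker for Black (0.722).
3. Relative to the number of crimes committed, police victims are about equally frequent for both races, with a slightly higher ratio for White.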

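So, the answer to the next question:

- Is there a correlation between crime and deaths at the hands of police?

- Yes, such a correlation is observed, although it is uneven across races: for White it is strong (0.885), for Black it is noticeably weaker (0.722).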

In the next part of the article, we will look at the geographic distribution of the analyzed data across the US states.

Link to the English version of the article (by popular demand).
