Motivation
A couple of months ago, as part of the collaboration between Amazon and Formula 1, researchers "using the power of cloud technologies" published a comparison of the speed of drivers of all eras ( link ). Naturally, the material was hype-driven advertising, and it achieved its goal: for a couple of days the whole Formula 1 world talked only in the spirit of "why is driver N not in the ranking?" and "how can M be faster than K if K beat him in season L?" I became curious to reproduce this research, at least roughly, and, if possible, to do it without "all the power of cloud technologies."
The task
. . , , , , , . . AWS – . , . .
F1, – 10% , 90% - . - , .. , , . , , . , , .
. , , «» . , 2 , , « » - , . , . , , , .
. , , . «» , , 1. , , , .
, , kp, .
.. V = V0*kp*kc*Tr, kc – , Tr – , .
, ln(ki)-ln(kj)=ln(tj)-ln(ti) .
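The identity ln(ki)-ln(kj)=ln(tj)-ln(ti) can be checked numerically. The sketch below assumes that kp, kc and Tr in V = V0*kp*kc*Tr are the driver, car and track factors, and that lap time is track length divided by V; both readings are assumptions, and all numbers are made up for illustration.

```python
import math


def lap_time(track_length_m, v0, k_driver, k_car, k_track):
    """Lap time under the multiplicative speed model V = V0*kp*kc*Tr.

    Assumption: time = length / speed, with speed in m/s.
    """
    speed = v0 * k_driver * k_car * k_track
    return track_length_m / speed


# Hypothetical driver coefficients for two teammates i and j.
k_i, k_j = 1.010, 0.995

# Same car and same track for both drivers, so those factors cancel.
t_i = lap_time(5000.0, 60.0, k_driver=k_i, k_car=0.98, k_track=1.05)
t_j = lap_time(5000.0, 60.0, k_driver=k_j, k_car=0.98, k_track=1.05)

lhs = math.log(k_i) - math.log(k_j)
rhs = math.log(t_j) - math.log(t_i)
print(abs(lhs - rhs) < 1e-9)
```

Because the car and track factors are identical for both drivers, they drop out of the ratio tj/ti, which is why only the two driver coefficients remain on the left-hand side.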
. m*(n+1), n – , m – . 1 -1 , , .
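One possible reading of the m*(n+1) matrix with entries 1 and -1 is a signed pairwise-comparison table: one row per pair of drivers compared in the same car on the same track, one column per driver, plus a final column for the target ln(tj)-ln(ti). The sketch below follows that reading; the function name, the tuple layout and the meaning of the extra column are assumptions, not the author's code.

```python
import math


def build_pair_matrix(pairs, drivers):
    """Build m rows of length n+1 from pairwise lap-time comparisons.

    pairs:   list of (driver_i, driver_j, t_i, t_j) tuples (hypothetical layout)
    drivers: list of the n driver names, fixing the column order
    """
    column = {name: idx for idx, name in enumerate(drivers)}
    rows = []
    for d_i, d_j, t_i, t_j in pairs:
        row = [0.0] * (len(drivers) + 1)
        row[column[d_i]] = 1.0    # +1 for the first driver of the pair
        row[column[d_j]] = -1.0   # -1 for the second driver of the pair
        row[-1] = math.log(t_j) - math.log(t_i)  # target: ln(tj) - ln(ti)
        rows.append(row)
    return rows


# Made-up lap times in seconds.
drivers = ["A", "B", "C"]
pairs = [("A", "B", 80.0, 80.4), ("B", "C", 75.0, 75.3)]
matrix = build_pair_matrix(pairs, drivers)
for row in matrix:
    print(row)
```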
AWS, , , - . , , « » - , . .. «» 2 0 .
, F1. . , , , , . , , , - .
, , 2 . 3 . 1 – , . , «». – . , . . – . , , .
, . , 1, 2 3 . , , .
import os
import re
import sys
import urllib.request


def get_wikipedia_page(title, lang='en'):
    """Download a Wikipedia article and return its HTML as a string."""
    url = 'https://' + lang + '.wikipedia.org/wiki/' + title.replace(' ', '_')
    with urllib.request.urlopen(url) as fp:
        return fp.read().decode('utf8')


# Collect links to the individual race articles from the index page.
title = 'List of Formula One Grands Prix'
list_of_GP = []
try:
    print('process: ' + title)
    th = get_wikipedia_page(title)
    r1 = re.findall(r'href="/wiki/\d{4}_\w*_Grand_Prix"', th)
    list_of_GP = list(set(r1))  # drop duplicate links
except Exception:
    print("Unexpected error:", sys.exc_info()[0])

# Strip the leading 'href="/wiki/' (12 characters) and the closing quote.
titles = [x[12:-1].replace('_', ' ') for x in list_of_GP]

# Save every race page to texts/<title>.txt for offline parsing.
os.makedirs('texts', exist_ok=True)
for title in titles:
    try:
        print('process: ' + title)
        th = get_wikipedia_page(title)
        with open('texts/' + title + '.txt', 'w', encoding='utf8') as the_file:
            the_file.write(th)
    except Exception:
        print("Unexpected error:", sys.exc_info()[0])
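The link-extraction step can be tried offline on a small HTML fragment. The sketch below uses the same pattern and the same [12:-1] slice as the script; the helper name extract_gp_titles and the sample markup are invented for illustration.

```python
import re

# Same pattern as in the download script: a four-digit year, a race name
# made of word characters, then the "_Grand_Prix" suffix.
GP_LINK = re.compile(r'href="/wiki/\d{4}_\w*_Grand_Prix"')


def extract_gp_titles(html):
    """Return sorted, de-duplicated race titles found in an HTML string."""
    links = set(GP_LINK.findall(html))
    # 'href="/wiki/' is 12 characters long; [12:-1] also drops the closing quote.
    return sorted(link[12:-1].replace('_', ' ') for link in links)


# Made-up fragment: one duplicated race link, one other race, one unrelated link.
sample = ('<a href="/wiki/1950_British_Grand_Prix">1950</a> '
          '<a href="/wiki/1950_British_Grand_Prix">again</a> '
          '<a href="/wiki/2019_Monaco_Grand_Prix">2019</a> '
          '<a href="/wiki/Formula_One">F1</a>')

print(extract_gp_titles(sample))
# ['1950 British Grand Prix', '2019 Monaco Grand Prix']
```

Since \w matches only letters, digits and underscores, a link whose race name contains any other character (a hyphen or a percent-encoded letter, for example) would be skipped by this pattern.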
html . . ( ), ( ), «» . csv.
DataFrame. , :
+ . ,
+ – , 1 , 2
= + . , . , . - , .
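import pandas as pd

qual_df = pd.read_csv('qual_results.csv')

# Composite keys: track + layout length, constructor + season, driver + season.
qual_df['Track_pl_len'] = qual_df['Track'] + '_' + qual_df['Track_len'].apply(str)
qual_df['Car'] = qual_df['Constructor'] + '_' + qual_df['Year'].apply(str)
qual_df['Driver_pl_year'] = qual_df['Driver'] + '_' + qual_df['Year'].apply(str)

# Copy every row under the previous season's driver key, so that each
# Driver_pl_year label covers that season and the one after it.
qual_df_2 = qual_df.copy()
qual_df_2['Driver_pl_year'] = qual_df_2['Driver'] + '_' + (qual_df_2['Year'].apply(int) - 1).apply(str)

double_df = pd.concat([qual_df, qual_df_2])
del qual_df_2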
. , . . , , . , , . .
. 2 , .
, ..
, 2 –
, «» «» - 6 2%. , «» 1 – , . ,
2 One Hot Encoding , x = 1 , y -1
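With the signed encoding (1 for one driver of a pair, -1 for the other), estimating ln(k) becomes an ordinary linear least-squares problem. The sketch below is a generic illustration, not the author's code: the small ridge penalty, the plain-Python solver and all numbers are assumptions chosen to keep the example self-contained.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_log_coefficients(x_rows, y, ridge=1e-6):
    """Ridge least squares: minimise |Xw - y|^2 + ridge * |w|^2."""
    n = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) + (ridge if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * t for r, t in zip(x_rows, y)) for i in range(n)]
    return solve_linear_system(xtx, xty)


# Three hypothetical drivers A, B, C; each row is one pairwise comparison.
x_rows = [[1.0, -1.0, 0.0],   # A vs B
          [0.0, 1.0, -1.0],   # B vs C
          [1.0, 0.0, -1.0]]   # A vs C
y = [0.01, 0.01, 0.02]        # made-up ln(tj) - ln(ti) targets

w = fit_log_coefficients(x_rows, y)
print([round(v, 4) for v in w])
```

Pairwise differences fix the coefficients only up to a common additive constant, so without the penalty the normal equations are singular; the small ridge term selects the solution with the smallest norm.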
, . , – 1 , .. 1, 0. –
. ln(k) - .
, .
, , «» , .. , F1 Amazon.
, :
,
18 , . , + + , . , 10. , . . , .
– . 2019 , . , 1.
– . . , . , 1 . . 1, / .
, :
N | Driver | Total min |
1 | Ayrton Senna | -0.435 |
2 | Michael Schumacher | -0.408 |
3 | Alain Prost | -0.289 |
4 | Damon Hill | -0.037 |
5 | Lewis Hamilton | -0.037 |
6 | Charles Leclerc | 0.016 |
7 | Rubens Barrichello | 0.024 |
8 | Fernando Alonso | 0.067 |
9 | Nico Rosberg | 0.081 |
10 | Nigel Mansell | 0.102 |
11 | Carlos Pace | 0.117 |
12 | Mika Häkkinen | 0.145 |
13 | Max Verstappen | 0.147 |
14 | Valtteri Bottas | 0.153 |
15 | Elio de Angelis | 0.164 |
16 | Daniel Ricciardo | 0.165 |
17 | Jarno Trulli | 0.172 |
18 | Giancarlo Fisichella | 0.184 |
( , .. 1 ):
, .
:
2020 – , ( – 1979 , ).
, Renault , . Racing Point, 2 , Alpha Tauri 2020, Red Bull 2019 Ferrari 2020, - 2018 .
, 2019 Mercedes, 10 , Red Bull . , , , , .
. , . - , . . , .
, ( , .. 2 , 1):
Driver | Car | Time_predicted_sec |
Lewis Hamilton | Mercedes | 77.711 |
Valtteri Bottas | Mercedes | 77.850 |
Max Verstappen | Red Bull Racing-Honda | 78.252 |
Lando Norris | McLaren-Renault | 78.324 |
Sergio Pérez | Racing Point-BWT Mercedes | 78.345 |
Lance Stroll | Racing Point-BWT Mercedes | 78.439 |
Daniel Ricciardo | Renault | 78.451 |
Carlos Sainz Jr. | McLaren-Renault | 78.549 |
Esteban Ocon | Renault | 78.665 |
Alexander Albon | Red Bull Racing-Honda | 78.878 |
Pierre Gasly | AlphaTauri-Honda | 78.985 |
Daniil Kvyat | AlphaTauri-Honda | 79.108 |
Charles Leclerc | Ferrari | 79.116 |
Sebastian Vettel | Ferrari | 79.531 |
Romain Grosjean | Haas-Ferrari | 79.656 |
Kevin Magnussen | Haas-Ferrari | 79.738 |
Kimi Räikkönen | Alfa Romeo Racing-Ferrari | 80.399 |
Antonio Giovinazzi | Alfa Romeo Racing-Ferrari | 80.658 |
1 Amazon . : .
, , , / , , .
, . , , – 70 1. , .
I did not manage to build a decent regression: even a seemingly problem-free model still produces unexpected results. Every attempt to construct a unified model that accounts for the influence of both the car and the driver ended up requiring strong regularization.
I am attaching a GitHub repository with the source data and pivot tables for anyone who wants to explore them. I will add the code as soon as I am less ashamed of it.