Formula 1 and that same table of driver speeds

Motivation

A couple of months ago, as part of the collaboration between Amazon and Formula 1, researchers "using the power of cloud technologies" published a comparison of the speed of drivers of all eras (link). Naturally, the material was promotional hype, and it achieved its goal: for a couple of days the whole Formula 1 world talked about little else than "why is driver N missing from the rating?" and "how can M be faster than K if K did it in season L?" I became curious to roughly reproduce this research and, if possible, to do so without "all the power of cloud technologies."

original rating

The task

. . , , , , , . . AWS – . , . .

F1, – 10% , 90% - . - , .. , , . , , . , , .

. , , «» . , 2 , , « » - , . , . , , , .

. , , . «» , , 1. , , , .

, , kp, .

.. V = V0*kp*kc*Tr, kc – , Tr – , .

, ln(ki)-ln(kj)=ln(tj)-ln(ti) .
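The log-difference relation above can be derived from the multiplicative speed model in a few steps. This is only a sketch under assumptions that are not stated in the surviving text: lap time is taken as track length divided by speed, k_i and k_j are read as the driver coefficients (k_p) of two drivers, and both drivers use the same car on the same track so that k_c and T_r cancel. The symbol S for track length is introduced here purely for illustration.

```latex
\begin{align}
V_i &= V_0 \, k_i \, k_c \, T_r, \qquad t_i = \frac{S}{V_i} \\
\frac{t_j}{t_i} &= \frac{V_i}{V_j} = \frac{k_i}{k_j} \\
\ln k_i - \ln k_j &= \ln t_j - \ln t_i
\end{align}
```

Taking logarithms turns the product of coefficients into a sum, which is what makes a linear model applicable.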

. m*(n+1), n – , m – . 1 -1 , , .
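A minimal sketch of how such a system of pairwise comparisons could be solved, assuming each comparison contributes one row with +1 for one driver and -1 for the other, and ln(t_j) - ln(t_i) as the target. One possible reading of the m*(n+1) size is n coefficient columns plus one target column; that reading, the function name fit_log_coefficients and the lap times below are all assumptions made up for illustration, not the author's code or data.

```python
import numpy as np


def fit_log_coefficients(pairs, n_drivers):
    """Least-squares estimate of ln(k) from pairwise lap-time comparisons.

    pairs: iterable of (i, j, t_i, t_j) for two drivers in comparable
    conditions. Each pair yields one equation
        ln(k_i) - ln(k_j) = ln(t_j) - ln(t_i).
    """
    pairs = list(pairs)
    A = np.zeros((len(pairs), n_drivers))
    b = np.zeros(len(pairs))
    for row, (i, j, t_i, t_j) in enumerate(pairs):
        A[row, i] = 1.0    # driver i enters with +1
        A[row, j] = -1.0   # driver j enters with -1
        b[row] = np.log(t_j) - np.log(t_i)
    # Differences determine ln(k) only up to an additive constant;
    # lstsq returns the minimum-norm solution, recentred here to mean zero.
    ln_k, *_ = np.linalg.lstsq(A, b, rcond=None)
    return ln_k - ln_k.mean()


# Synthetic example: three hypothetical drivers with known coefficients.
true_k = np.array([1.00, 0.99, 0.98])
times = 80.0 / true_k  # lap time inversely proportional to k
pairs = [
    (0, 1, times[0], times[1]),
    (1, 2, times[1], times[2]),
    (0, 2, times[0], times[2]),
]
ln_k = fit_log_coefficients(pairs, n_drivers=3)
k_relative = np.exp(ln_k - ln_k[0])  # normalise so driver 0 has k = 1
print(k_relative)
```

Because only differences of logarithms are observed, the coefficients are meaningful only relative to a chosen reference driver, which is why the example normalises against driver 0.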

AWS, , , - . , , « » - , . .. «» 2 0 .

, F1. . , , , , . , , , - .

  , , 2 . 3 . 1 – , . , «». – . , . . – . , , .

, . , 1, 2 3 . , , .

import os
import re
import sys
import urllib.request


def get_wikipedia_page(title, lang='en'):
    """Download a Wikipedia article and return its HTML as a string."""
    url = 'https://' + lang + '.wikipedia.org/wiki/' + title.replace(' ', '_')
    with urllib.request.urlopen(url) as fp:
        return fp.read().decode('utf8')


title = 'List of Formula One Grands Prix'

# Collect links to individual race pages, e.g. /wiki/1950_British_Grand_Prix
list_of_GP = []  # stays empty if the index page cannot be fetched
try:
    print('process: ' + title)
    th = get_wikipedia_page(title)
    r1 = re.findall(r'href="/wiki/\d{4}_\w*_Grand_Prix"', th)
    list_of_GP = list(set(r1))  # drop duplicate links
except Exception:
    print("Unexpected error:", sys.exc_info()[0])

# Strip the leading 'href="/wiki/' (12 characters) and the closing quote
titles = [x[12:-1].replace('_', ' ') for x in list_of_GP]

os.makedirs('texts', exist_ok=True)  # the output directory must exist

# Save the raw HTML of every race page for later parsing
for title in titles:
    try:
        print('process: ' + title)
        th = get_wikipedia_page(title)
        with open('texts/' + title + '.txt', 'w', encoding='utf8') as the_file:
            the_file.write(th)
    except Exception:
        print("Unexpected error:", sys.exc_info()[0])

html . . ( ), ( ), «» . csv.

DataFrame. , :

+ . ,

+ – , 1 , 2

= + . , . , . - , .

import pandas as pd

qual_df = pd.read_csv('qual_results.csv')

# Composite keys: track + track length, constructor + year, driver + year
qual_df['Track_pl_len'] = qual_df['Track'] + '_' + qual_df['Track_len'].apply(str)
qual_df['Car'] = qual_df['Constructor'] + '_' + qual_df['Year'].apply(str)
qual_df['Driver_pl_year'] = qual_df['Driver'] + '_' + qual_df['Year'].apply(str)

# Second copy of the data in which each driver is tagged with the previous year
qual_df_2 = qual_df.copy()
qual_df_2['Driver_pl_year'] = qual_df_2['Driver'] + '_' + (qual_df_2['Year'].apply(int) - 1).apply(str)
double_df = pd.concat([qual_df, qual_df_2])
del qual_df_2

. , . . , , . , , . .

. 2 , .

  • , ..

  • , 2 –

  • , «» «» - 6 2%. , «» 1 – , . ,

  • 2 One Hot Encoding , x = 1 , y -1

  • , . , – 1 , .. 1, 0. –

  • . ln(k) - .

, .

, , «» , .. , F1 Amazon.

, :

,

18 , . , + + , . , 10. , . . , .

– . 2019 , . , 1.

– . . , . , 1 . . 1, / .

, :

N    Driver                  Total min
1    Ayrton Senna            -0.435
2    Michael Schumacher      -0.408
3    Alain Prost             -0.289
4    Damon Hill              -0.037
5    Lewis Hamilton          -0.037
6    Charles Leclerc          0.016
7    Rubens Barrichello       0.024
8    Fernando Alonso          0.067
9    Nico Rosberg             0.081
10   Nigel Mansell            0.102
11   Carlos Pace              0.117
12   Mika Häkkinen            0.145
13   Max Verstappen           0.147
14   Valtteri Bottas          0.153
15   Elio de Angelis          0.164
16   Daniel Ricciardo         0.165
17   Jarno Trulli             0.172
18   Giancarlo Fisichella     0.184

  ( , .. 1 ):

  , .

:

2020 – , ( – 1979 , ).

, Renault , . Racing Point, 2 , Alpha Tauri 2020, Red Bull 2019 Ferrari 2020, - 2018 .

, 2019 Mercedes, 10 , Red Bull . , , , , .

. , . - , . . , .

, ( , .. 2 , 1):

Driver               Car                          Time_predicted_sec
Lewis Hamilton       Mercedes                     77.711
Valtteri Bottas      Mercedes                     77.850
Max Verstappen       Red Bull Racing-Honda        78.252
Lando Norris         McLaren-Renault              78.324
Sergio Pérez         Racing Point-BWT Mercedes    78.345
Lance Stroll         Racing Point-BWT Mercedes    78.439
Daniel Ricciardo     Renault                      78.451
Carlos Sainz Jr.     McLaren-Renault              78.549
Esteban Ocon         Renault                      78.665
Alexander Albon      Red Bull Racing-Honda        78.878
Pierre Gasly         AlphaTauri-Honda             78.985
Daniil Kvyat         AlphaTauri-Honda             79.108
Charles Leclerc      Ferrari                      79.116
Sebastian Vettel     Ferrari                      79.531
Romain Grosjean      Haas-Ferrari                 79.656
Kevin Magnussen      Haas-Ferrari                 79.738
Kimi Räikkönen       Alfa Romeo Racing-Ferrari    80.399
Antonio Giovinazzi   Alfa Romeo Racing-Ferrari    80.658

1 Amazon . : .

, , , / , , .

, . , , – 70 1. , .

I did not manage to build a decent regression: a seemingly problem-free model still produces unexpected results. Every attempt to build a unified model that accounts for the influence of both the car and the driver ended up requiring strong regularization.

I am attaching a GitHub repository with the source data and pivot tables for anyone who wants to explore them. I will add the code once I am less ashamed of it.

Link to GitHub



