The Best of Kaggle: What Competitive Data Science Is and How to Succeed in It

Hello Habr! On our website's blog, we regularly publish articles about data and everything related to it. We republish some of those materials here.



How do companies decide which data scientist is stronger when hiring? How can you show your talent and become known in the community? What is the ranking based on that can later land you a prestigious position? We will tell you about the most famous competition platform, its possibilities and rules, and reveal the list of the best participants from Russia.










Data science is, by definition, a science. So to evaluate developers and analysts, the Hirsch index, widespread among scientists, has long been applied. It uses the number of publications and their citations to show how much a scientific work, and hence its author, is in demand.

The Hirsch index h is the largest number h such that the author has h articles, each cited at least h times. To calculate it, take all of the scientist's articles that were cited by colleagues, arrange them in decreasing order of citation count, and number them starting from one. Then find the last article whose number does not exceed its number of citations. That number is the Hirsch index.
Complicated? Not especially, and real data scientists grasp it right away, but it is not very suitable for evaluating their work. After all, the result of their work is far more often code than a scientific text. In addition, data scientists are in demand on the market, and the market cares more about working algorithms than about achievements in science.
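
The calculation described above can be sketched in a few lines of Python. This is only an illustration of the definition: the function name `h_index` and the sample citation counts are our own assumptions, not part of any library.

```python
def h_index(citations):
    """Return the Hirsch index for a list of per-article citation counts."""
    # Arrange the articles in decreasing order of citations.
    ranked = sorted(citations, reverse=True)
    h = 0
    # Number the articles from 1 and keep the last number that does not
    # exceed the citation count of the article at that position.
    for number, cited in enumerate(ranked, start=1):
        if number <= cited:
            h = number
        else:
            break
    return h


if __name__ == "__main__":
    # Hypothetical citation counts for five articles.
    print(h_index([10, 8, 5, 4, 3]))  # 4
```

Here four articles have at least four citations each, but there are not five articles with at least five citations, so the index is 4.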



But companies often keep information about their employees and their work secret. Data scientists are hidden especially carefully in Russia, where there is a huge shortage of personnel in this area.



In response to this demand, competition platforms for developers have grown in popularity. The most famous service is Kaggle (pronounced "KAG-ul"), which is owned by Google. Students use it, and professional developers explain how to raise your ranking there. The solutions applied on Kaggle set the fashion among data scientists, and companies in Russia and around the world pay attention to a candidate's place in Kaggle's rankings when hiring.



In 2017, more than a million users were registered on Kaggle, and in August 2020, users from Russia googled the service almost as often as the phrase "Big Data".







Kaggle is completely free, and any user can host a data mining competition or participate in an existing one. The platform hosts open data sets as well as cloud tools for processing them and for machine learning. There are also learning materials and a job board, where competitions help to select the best candidates.



How it works



One of the interesting features of Kaggle, and a reason it became so popular in the data science community, is its rating system.



Users can earn points and improve their ranking in four different categories: 



  • Competitions.  Alone or as a team, you solve machine learning problems. Competitions are very diverse: from the simple and straightforward task of predicting who survived on the Titanic to evaluating defensive players on pass plays in the NFL Big Data Bowl 2021.
  • Code.  Share your code with the community by running it in Kaggle Notebooks, a cloud computing environment.
  • Data sets.  You can help other data scientists by sharing new data.
  • Discussions.  Discuss tasks and share your best solutions, as well as rate other users' posts.


Progress in each category is independent of the others. Each category has several achievement levels:



  • Beginner.  You just need to register.
  • Participant.  You have filled out your profile, interacted with the community, and tried each of the platform's features:

    - Ran one script.

    - Took part in one competition.

    - Wrote one comment.

    - Cast one vote for another participant.

  • Expert.
  • Master.
  • Grandmaster.


Medals are awarded for excellent results in competitions, popular code, or useful data sets, and they stay with you forever. Points, by contrast, lose their value over time, which keeps the overall ranking up to date.



Who comes first?



Kaggle has the most registered users from India and the USA. Russia holds a steady fifth place in the overall country ranking, between China and Japan. First place in the overall competitions ranking belongs to Guanshuo Xu, a data scientist from New York. In five years, he has scored over 255 thousand points in Kaggle competitions, an absolute record.



Guanshuo earned a Bachelor's degree in Electrical and Electronic Engineering from Tongji University in Shanghai, and then entered a Master's program at the University of New Jersey. Since 2010, he has been working on image recognition and machine learning algorithms. In 2017 he first became a Grandmaster on Kaggle, and since 2019 he has been working as a Data Scientist at H2O.ai (Cisco, Intel and PayPal use this company's algorithms).



The best data scientists from Russia according to Kaggle



To compile a list of the best practicing data scientists in Russia, we used the data of Kaggle competition participants whose profiles include personal information.



The strongest Russian developer in Kaggle competitions, Dmitry Gordeev (dott), also works at H2O.ai. He signed up for Kaggle eight years ago and has 114,000 points today.



He is ranked ninth in the overall Kaggle ranking. Dmitry graduated from Moscow State University in 2010, where he worked on image recognition and data mining. He joined the retail risk modeling group at a bank in 2008, rose to divisional director, and moved to Austria in 2013. In 2014, he completed a data science course on Coursera, and in 2020 he joined the team at H2O.ai.



Second place among Russian data scientists in the Kaggle competitions ranking goes to Arthur Kuzin (n01z3). He is 28th in the overall Kaggle ranking, with more than 71 thousand points.



Arthur graduated from the Moscow Institute of Physics and Technology in 2011 and worked in research analytics from 2008 to 2016. After that, he got a job at Avito as a Data Scientist, and for the past few years he has been leading the Computer Vision team at X5 Retail Group. Arthur has several physics publications and a patent for a device for calibrating transmission electron microscopes.



Third place among Russians in the Kaggle competitions ranking is taken by Artem Kulakov (Art). He is 29th in the overall ranking, with 71 thousand Kaggle points earned over two years of competing. Artem is studying Computer Science at the Higher School of Economics and has already worked as a Data Analyst at Tinkoff Bank and Megafon. He is now freelancing and specializes in Computer Vision and NLP tasks.



In fourth place is Roman Soloviev (ZFTurbo). He has 69 thousand points and is 31st in the overall Kaggle competitions ranking. Roman is a leading researcher at the Institute for Design Problems in Microelectronics of the Russian Academy of Sciences.



In fifth place is Ilya Larchenko (ilialar), currently ranked 37th in the overall Kaggle rankings with 65 thousand points. Ilya graduated from the Moscow Institute of Physics and Technology in 2014 and then worked as an analyst and developer. From 2017 he led the data science team at DOC+, and in 2020 he moved to Thailand, where he works as Data Science Manager at Agoda.



A small element of gamification that allows users to earn points and medals in Kaggle competitions has changed the hiring game. 



The example of the best data scientists from Russia shows that formal education and years of experience with data are not decisive for building a successful career. For example, Artem Kulakov is still studying at university and began taking part in Kaggle competitions only two years ago. Now he is on the list of the best data scientists in Russia and works as a freelancer. Guanshuo Xu graduated with a bachelor's degree in Electrical and Electronic Engineering and now works at H2O.ai, a leader in open source data science solutions.



Start with simple tasks today, and who knows, maybe in a year or two you will be in the ranking of the best data scientists and will push progress forward by implementing HIV research technologies, models for predicting highway congestion, and much more. The main thing is to have the desire to grow in the field of Data Science and to practice as much as possible.


