"Bicycle Building", or Creating Homemade Datasets for Data Analysis and Machine Learning

A few years ago I became briefly interested in machine learning and data analysis, and even wrote a short series of articles about diving into this fascinating field from the point of view of a complete beginner.



As often happens when learning something new, I badly wanted to build my own "bicycle", that is, to reinvent the wheel. Unfortunately, I am not strong in mathematics or programming, so the role of the "bicycle" went to a dataset of my own.



More than two years have passed since then, and I have finally got around to sharing my modest experience.



In this article we will look at several potential sources for collecting data on your own (including some less popular ones) and try to find at least some benefit in the process.





Table of Contents:

Part I: Introduction

Part II: Data Sources

Part III: Does It Benefit?

Part IV: Conclusion





Part I: Introduction



From the introduction you have probably already guessed that I am no guru of data analysis and machine learning, and I can hardly be called a pioneer in finding sources of open data. So this article is not about best practices; it is about scratching that itch if you have caught yourself wanting to build your own dataset.



.



. . (, ), «5 ».



, « » , « » .





Part II: Data Sources



.





. .



Kaggle. Kaggle .



, - , .



.



« » 2010- , - 2015 .



2017 . . API . , , : « , ».



.





. , API .



, , :



  • , .
  • , « ».




, . , . «, ...» , , .



, .



.



. , API .

, .



( ), .



, - () - .





« », .

, .



, , .



.





, « » , :



  • – , .
  • – . , .csv. ( ).

  • , , Sportradar API. . , .
  • , , .
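Whichever source the data comes from, it usually ends up in a flat file such as a .csv. Below is a minimal sketch using Python's standard `csv` module to save the collected records and read them back. It assumes the records have already been collected into a list of Python dictionaries that all share the same keys. The function names and example fields are hypothetical.

```python
import csv


def save_records(records, path):
    """Write a list of dicts to a CSV file, one row per record.

    Hypothetical helper: column order is taken from the first record,
    and every record is assumed to have the same keys.
    """
    if not records:
        raise ValueError("nothing to save")
    fieldnames = list(records[0].keys())
    # newline="" is what the csv module expects for files it writes.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


def load_records(path):
    """Read the CSV back as a list of dicts; every value comes back as a string."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Hypothetical example rows standing in for whatever your own collector gathered.
    rows = [
        {"id": 1, "title": "first item", "value": 3.5},
        {"id": 2, "title": "second item", "value": 7.0},
    ]
    save_records(rows, "dataset.csv")
    print(load_records("dataset.csv"))
```

Note that `csv.DictReader` returns every value as a string. Numeric columns therefore need converting (by hand, or by loading the file with a library such as pandas) before any analysis.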


,



. . .



. , .



. , . 2016 2020 GitHub.

, -, .



, , (, ).





Part III: Does It Benefit?



. , .



, :



  1. Python ( ) . , - .
  2. , .
  3. . , . , , .
  4. , . , , «», . , , , .
  5. , - .




Part IV: Conclusion



, «», , .



, , : « », .



.



- , .



