A few years ago, I became briefly interested in machine learning and data analysis and even wrote a short series about my immersion in this amazing world from a complete beginner's point of view.
As often happens when learning something new, I really wanted to reinvent the wheel. Unfortunately, I don't know much about mathematics or programming, so the wheel I chose to reinvent was a dataset of my own.
More than two years have passed since then, and I have finally got around to sharing my modest experience with you.
In this article, we will look at several potential sources for collecting data yourself (including some not very popular ones) and try to find at least some benefit in the process.
Table of Contents:
Part I: Introduction
Part II: Data Sources
Part III: Does It Benefit?
Part IV: Conclusion
Part I: Introduction
From the introduction, you have probably already guessed that I am no guru of data analysis or machine learning, and I can hardly be called a pioneer in the search for open data sources. So this article is not about good practices but about scratching the itch in case you get the idea of creating your own dataset.
.
, « » , « » .
Part II: Data Sources
.
. .
Kaggle. Kaggle .
, - , .
.
« » 2010- , - 2015 .
.
, , :
.
. , API .
, .
( ), .
, « » , :
,
. . .
. , .
. , . 2016 2020 GitHub.
, -, .
Part III: Does It Benefit?
. , .
, :
- Python ( ) . , - .
- , .
- . , . , , .
- , . , , «», . , , , .
- , - .
Part IV: Conclusion
, «», , .
, , : « », .
.
- , .