Neural networks: where do you get data for fine-tuning algorithms?

Hello, Habr! My name is Alisa Neveikina, and I work at SmartCoders, a startup from Belarus. We develop neural networks and AI-based solutions for business. This post grew out of a lot of thinking about what makes machine learning projects different to develop, and about how these technologies can be monetized. If you have already worked with AI or plan to, I invite you to join the discussion.



How well an AI system works depends on the algorithms built into the neural network. Those algorithms, however, can only be validated against significant amounts of data. For a battle-tested system that has already been through its baptism of fire at one or more companies, this is not a problem. But what do you do when a neural network needs data like air just to prove that it is viable?



We started out as contractors on various projects, carrying out our customers' tasks. That is how our solution for the Salary2.me project came about; it helps determine the real salary of an IT worker in Moscow, Kiev, Minsk, and many European cities.



But before technologies like these can claim to be universal, they still need to be trained on real, existing data sets.



Where can I get data for machine learning?



While the algorithms themselves are being developed, you can get by with synthetic data sets. That is not enough, however, to keep improving the AI afterwards. We need live data sets to find corner cases, check how the algorithms behave on different samples, and so on. But getting such a data set turns out to be far from easy, because:



Everyone is worried about data privacy






