When we redesigned our Deep Learning course at the end of last year to make it more visual and to build it around cases from real business practice, we added a new module on data labeling on the Yandex.Toloka crowdsourcing platform.
But since crowdsourcing is not the only way to label data, we have translated this article from the Lionbridge blog for the course's new students. It gives an overview of the main approaches to data labeling. We hope you find it useful too.
The quality of a machine learning project depends directly on how you handle three main tasks: collecting the data, preprocessing it, and labeling it.
Labeling is usually a complex and time-consuming process. For example, image recognition systems often require drawing bounding boxes around objects, while product recommendation and sentiment analysis systems may require knowledge of the cultural context. Keep in mind, too, that a dataset can contain tens of thousands of samples or more, all of which need labels.
, , . , 5 .
:
In-house: , . : . , , , -.
: , . ., . , , . , ; , . , , .
: β . - , . , , . , , .
: , , . - (GAN). GAN ( ), . - . GAN . . , , , .
Β« Β»: . , , . , , , . , , .
:
| ||
In-house |
|
|
|
|
, |
|
| |
, |
|
|
|
|
|
|
|
|
. : , , , . .
-------------
Deep Learning 6.0 Newprolab 9 .
- Deep Learning 7.0 - c 30 22 2021 .