Saving Private Data Scientist. How to work on computer vision, see the project through, and not lose yourself



My name is Alexandra Tsareva. My colleagues and I work on computer vision projects at the Machine Learning Center of Jet Infosystems, and I would like to share our experience in developing and implementing them.



In this article I will describe what a data scientist's work on a project looks like, not from the technical, purely data science point of view, but from an organizational one. I hope this post will be followed by several more, so that it grows into a small series.



I'll make an important point right away:



These steps apply to almost any data science project. But some of them are shaped by the hype around CV, computer vision's reputation as a "silver bullet", and the customer's desire to "have it with a neural network".
  2. , , — , - . , , ( , ..) , — - .


Step one: is this really a computer vision problem?



When a customer decides they need data scientists and some kind of artificial intelligence to help them, the first thing to understand is what problem they actually want to solve. At this stage the data scientist acts as a data "psychoanalyst", asking in detail about the data, the external business constraints, and the problems the customer would like to solve in an ideal world. The customer often already knows everything about the future task; you just need to help them realize and formalize this knowledge (understand their internal data world, and sometimes come to terms with its peculiarities).



Of course, computer vision is a fascinating area: there is always something interesting to compute and build with your own hands. But all of this is quite expensive, in development hours, in the cost of specialists, and in the hardware they require. So we have to ask whether the optimal solution really requires CV. Perhaps other machine learning tools are a better fit and can solve the problem with less development time and higher accuracy?



Let me illustrate the idea with a simple example. One retailer wanted to use image recognition on CCTV footage to track how many people are queuing at the checkout. It seems an obvious task: there is a video archive, there are even pre-trained "people counter" neural networks. Sign the contract and get to work.
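

To make the "counter" idea concrete: a minimal sketch of counting people in a single frame with a pre-trained detector might look like this. The model choice, confidence threshold, and file name are my assumptions, not the project's actual code:

```python
# A minimal sketch of a "people counter" on one CCTV frame.
# Model choice, threshold, and image path are illustrative assumptions.
import torch
from torchvision.io import read_image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import convert_image_dtype

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Load a frame as a float tensor in [0, 1], as the detector expects.
frame = convert_image_dtype(read_image("checkout_frame.jpg"), torch.float)

with torch.no_grad():
    detections = model([frame])[0]

# COCO class 1 is "person"; keep only confident detections.
is_person = detections["labels"] == 1
confident = detections["scores"] > 0.7
print("people in frame:", int((is_person & confident).sum()))
```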



But from a conversation with the retailer, the data scientist learns that the task is not really about the load on the cashiers at any particular moment. The global goal is to avoid calling in unnecessary replacement staff while still preventing queues. The retailer already has a large database with the number of shoppers who visited the store (if you have seen the counting frames at a store exit, you have seen the simplest implementation of such a counter), purchase data from the checkouts, and so on. In fact, the task is not to count people in the queue, but to predict the load on the cashiers and optimize their work schedule.



You could, of course, solve this by counting people in the archived video. But tabular data usually goes back further and is easier to process. It is only fair to offer the alternative: perhaps the customer has simply heard that CV is the hot new thing.
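

For comparison, the tabular alternative can be sketched in a few lines; the CSV file, column names, and feature set here are purely illustrative assumptions:

```python
# A minimal sketch of the tabular alternative: forecast checkout load
# from historical counters instead of counting people on video.
# File name, column names, and features are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("store_traffic.csv", parse_dates=["timestamp"])
df["hour"] = df["timestamp"].dt.hour
df["weekday"] = df["timestamp"].dt.weekday

features = ["hour", "weekday", "visitors_last_hour", "receipts_last_hour"]
train, test = df.iloc[:-7 * 24], df.iloc[-7 * 24:]  # hold out the last week

model = GradientBoostingRegressor().fit(train[features], train["queue_load"])
predictions = model.predict(test[features])
print("MAE, people in queue:", mean_absolute_error(test["queue_load"], predictions))
```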



So, at the first stage we make sure that the problem the customer wants to solve really is a computer vision problem, and that this technology is the best fit for solving it.



Step two: how will we solve the CV problem and assess the success of the solution?



The second stage is formulating the mathematical problem and choosing the metrics.



Are there known solutions, neural networks that can solve this problem? Maybe even an off-the-shelf product? If the task is new, are there publications we can rely on to make a preliminary estimate of the achievable quality?



At this stage we discuss the metrics for the solution both from our point of view as data scientists and from the customer's point of view, in the context of the business problem.
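

As a toy illustration of these two views on quality, here is a sketch that computes an ML metric (precision and recall) next to a business-style metric (the share of frames where the queue estimate is off by more than one person); all the numbers are made up:

```python
# A toy sketch of two views on quality: an ML metric (precision/recall)
# and a business metric (share of frames where the count is off by more
# than one person). All arrays are made-up examples, not project data.
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Per-detection labels on a validation set (1 = person really present).
y_true = np.array([1, 1, 0, 1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1, 1, 0])
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))

# Business view: predicted vs. actual queue length per frame.
actual = np.array([3, 5, 2, 4, 6])
predicted = np.array([3, 4, 2, 6, 6])
print("share off by > 1 person:", np.mean(np.abs(actual - predicted) > 1))
```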



Work with the data sometimes begins in parallel with agreeing on the solution and the metrics, but for convenience we will treat it as a separate step.



Step three: explore and understand our data



It is important to assess whether we have enough data to solve the task at hand. Obviously, tiny datasets will be rejected at the problem-statement stage, but new nuances can surface as we get to know the business problem. Situations vary: we may have 1000 images of which only 10 belong to the required class, and no secret machine vision technology will help us.
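

A quick sanity check on class balance can be done before any modeling; this sketch assumes a dataset folder with one subfolder per class:

```python
# A minimal sanity check on class balance before any modeling.
# Assumes a layout with one subfolder per class: dataset/<class>/*.jpg
from collections import Counter
from pathlib import Path

counts = Counter(path.parent.name for path in Path("dataset").glob("*/*.jpg"))
total = sum(counts.values())
for cls, n in counts.most_common():
    print(f"{cls}: {n} images ({n / total:.1%})")
# 10 images out of 1000 for the target class is exactly the red flag
# we want to catch at this step.
```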



Maybe we will see right away that the existing dataset is easy to improve: ask employees to photograph more objects than they routinely do, or collect additional data through outsourcing or open datasets.



At the same stage we note shortcomings from the point of view of future model training, and many of them can send us back to the step where we re-discuss whether the problem is solvable at all. The most common example is low diversity in the data or underrepresentation of one of the classes. We can try to increase the diversity with various augmentation techniques, but this does not always produce a model ready for the real world. Still, if it seems we can cope with the difficulties in the data, the fourth step comes to the rescue.
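

As a sketch of such augmentation techniques (the specific transforms and parameters are illustrative assumptions, and augmentation alone will not fix a genuinely unrepresentative dataset):

```python
# A minimal sketch of augmentations that raise the apparent diversity
# of training data. Transforms and parameters are illustrative assumptions.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.3, contrast=0.3),
    transforms.RandomRotation(degrees=10),
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
])

# Applied on the fly during training, e.g. inside a Dataset's __getitem__:
#     image = augment(image)
```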



Step four: develop a prototype model



At this stage we are not talking about a model ready for deployment, but about a prototype that answers, for us and for the customer, the question of whether it is worth continuing in this direction and whether the likely result meets expectations (ours and, most importantly, the customer's). After working with the data, we develop a pilot model and evaluate its quality. Some obvious things: at this stage the model is validated against a held-out dataset. The two main options are images we set aside ourselves, or images our client collected while we were working on the project.
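

A minimal sketch of the first option, setting images aside before training and touching them only for the final evaluation (the stand-in file list and the 20% split size are assumptions):

```python
# A minimal sketch of a held-out set: images set aside before training
# and used only to evaluate the prototype.
import random

random.seed(42)
all_images = [f"img_{i:04d}.jpg" for i in range(1000)]  # stand-in file list
random.shuffle(all_images)

holdout = all_images[:200]     # never touched during training
train_pool = all_images[200:]  # training and internal validation

# Train on train_pool, then report the prototype's quality on holdout only.
print(len(train_pool), "training images,", len(holdout), "held out")
```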



Work on the prototype at the pilot stage takes at least a month. In most cases that is enough time for the customer to accumulate data for testing. This is a good test of how seriously the data scientists take the problem: on data we have never seen, we check how well our model generalizes and whether its answers are consistent with the held-out validation dataset (which is, of course, also deferred data, but if everything in the real world has changed since then, it would be quite a shame to find out too late).



On the other hand, when we receive the fresh held-out data, we can find out how stable the objects of interest to the project's customer (this applies to internal customers too) really are, and whether they still match the sample used to train and validate the model. A situation like this is quite possible: computer vision is being introduced as part of a large project to digitize everything and everyone, and at that very stage the data source changes in an important way (for example, the main conveyor has been completely rebuilt, and the images coming from it have fundamentally changed).
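

A crude way to notice such a change is to compare simple statistics of the training images against the newly collected ones; this sketch uses mean brightness as the drift signal, which is an assumption, not a universal recipe:

```python
# A crude drift check: compare brightness statistics of the training
# images with newly collected ones. Folder names and the use of mean
# brightness as the drift signal are illustrative assumptions.
import numpy as np
from pathlib import Path
from PIL import Image

def brightness_stats(folder):
    values = [np.asarray(Image.open(p).convert("L")).mean()
              for p in Path(folder).glob("*.jpg")]
    return np.mean(values), np.std(values)

train_mean, train_std = brightness_stats("train_images")
new_mean, _ = brightness_stats("new_images")

# A shift of several standard deviations hints that the source changed
# (say, a rebuilt conveyor line) and the model may need retraining.
if abs(new_mean - train_mean) > 3 * train_std:
    print("possible data drift: new images differ from the training set")
```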



Analysis of the pilot's results helps decide the future of the project. It may turn out that the customer has very high accuracy requirements: for example, 99% of answers must be correct, and in the previous steps it seemed to us that this was, in principle, achievable. But in the pilot we reached 93% accuracy, and of course we cannot promise a guaranteed gain of 6 percentage points. It is reasonable to discuss with the customer the options for developing the project given the pilot results: collecting additional data, lowering the required metric, or even freezing the project until new breakthroughs in the CV field.
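

Whether a measured 93% can honestly be weighed against a 99% target also depends on the size of the test set. One hedged way to see this is a binomial confidence interval; the test-set size of 500 below is made up:

```python
# A minimal sketch: how certain is "93% accuracy"? A binomial confidence
# interval shows what the test-set size allows us to claim. n=500 is made up.
from statsmodels.stats.proportion import proportion_confint

correct, n = 465, 500  # 93% of a hypothetical 500-image test set
low, high = proportion_confint(correct, n, alpha=0.05, method="wilson")
print(f"95% CI for accuracy: [{low:.3f}, {high:.3f}]")
# If even the upper bound is far below 0.99, the 99% target is out of
# reach without more data or a different approach.
```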



The first four steps can be shown with the following diagram:







These steps take much less time than the actual development. Nevertheless, this is where the future success of the project is laid down: how well it will meet the customer's expectations and really solve their problem.



Step five: do the rest ;)



The whole project, from proof of concept to an implemented solution, looks something like this:





The plan and timelines of project development are approximate. Real projects always bring their own nuances: everyone's data is different, it is collected at different speeds, and the process of introducing computer vision inside the company can vary greatly depending on what the integration is planned with, who is carrying it out, and so on. It is one thing when we are installing an identification system for employees entering the office and recordings from surveillance cameras already exist for a long period: we can start working with them immediately. It is quite another when we have only a small number of example images to build a proof of concept with: we will check whether the problem is solvable in principle, but we do not know how long collecting a full-fledged dataset will take.



Thus, the diagram shows very approximate timelines for an abstract project in a vacuum, but I think it is useful to know that the full cycle from proof of concept to implementation takes about a year. These timelines can grow for complex tasks, or shrink when an off-the-shelf solution is preferable or the problem is well known and requires no research work.


