How VK Data Scientists make advertising effective

Last year we hosted Artem Popov, team lead of the VK performance advertising team. We are sharing the transcript of the broadcast along with the recording.

My name is Artem, and I head performance advertising at VK. On the one hand, our team works to make advertising on VK more effective, more profitable for advertisers, and more interesting for users. That is the big product goal.



On the other hand, technically we are a team of ML engineers, fairly ordinary developers who spend a lot of time on data science and ML tasks. Today I want to cover both of these topics, because they both interest me and I enjoy talking about them. I very much hope we will have a live conversation; if you are watching the broadcast, it will be more interesting if you write in your questions.



Overall, I want to divide our conversation into two blocks. In the first, I will talk about the different tasks that come up at the intersection of advertising and data science, that is, why advertising can be an interesting area for an ML or data science specialist. In the second, I want to share my experience of moving into an engineering ML project: what I learned over the four years I have been doing ML as an engineer, what you encounter in large companies but which courses don't mention, and which skills are hard to pick up while studying data science or ML at a university or online. I will try to devote half an hour to each topic.



First, I'll talk about advertising technologies and computational advertising as a field for research, scientific, and engineering work: what tasks it contains, and why it is fun to work on them at a time when NLP and other hyped topics get all the attention while advertising is rarely discussed.



The overall setup is this: we have a set of advertisers. They can be anyone who wants to spread knowledge of their product or business. They all have their own, different goals. Some just want to show their ad to as many people as possible. A company like Coca-Cola, say, wants everyone to know its brand: to remember that this drink exists, that it goes with New Year's, and that there is no alternative to it when you walk into the store. Another good example is Fairy: how many other dish detergents can you name? This is all brand awareness; large advertisers set the goal of making sure everyone carries in their head the knowledge that a certain product exists.



There are other, more applied goals, the so-called performance goals. This is when the advertiser wants the user to follow the ad and take an action: go to the site and submit a loan application, buy something in an online store, and so on.

In general, the task of advertising is to bring new users (so-called leads) to a business and to get those users to do something useful for the advertiser that brings profit.



We have a platform, a place where we display ads. In the case of VK, this is the feed; on some other site it could be, for example, a banner. The platform's purpose is to make money from advertising: it sells users' attention. Thanks to this, VK remains a free service; other monetization models are conceivable, but the ad model works well for a service like this.



Users usually don't want to see ads: that's not why they came to the service. But the advertising contract works exactly like this: the user pays with their attention for using the service. So our goal as an ad network is to make sure ads don't anger users, push them away, or scare them off.



It would be absolutely great if the ads turned out to be useful for the user, that is, if the promoted businesses and posts were as interesting to the user as the regular content. That is the perfect case.



We have three forces: the user, the advertiser, and the platform. We must set up the interaction between them so that each achieves its goals. Imagine: you open the VK feed and reach the place where we insert an ad. Many advertisers compete for this slot and want their ad to be shown, but how do we choose which ad to show to each user at any given moment?



The standard method actively used in advertising is the online auction. You may have seen various kinds of auctions in real life or on eBay, for example: situations where everyone can bid openly. One participant says, "I bid 10 rubles"; another comes in with 20; a third with 100; and so on. However, it is impractical to run open "whoever paid more wins" auctions on the Internet, so sealed-bid auctions are used instead. Each participant submits a bid privately. Figuratively speaking, all the slips of paper with bids go into a pot, then someone opens them, finds the slip with the highest number, and says: you won.



Say there are two advertisers, for example Nike and Coca-Cola. One is willing to pay 5 kopecks per impression, the other 10. The second one wins, and what happens next depends on the type of auction. There are two main types of advertising auctions: first-price and second-price. In a first-price auction, the winner pays the price they named. For example, Coca-Cola says, "I'll pay 10 kopecks," without seeing the other bids; the auction says, OK, 10 kopecks. Nike says "5 kopecks"; Coca-Cola wins and pays 10 kopecks.



There is also the second-price auction: here the winner pays exactly as much as is needed to beat all the other bids. In our case, a step of 1 kopeck might be used. Imagine the same Coca-Cola and Nike. Coca-Cola says, "I am ready to pay 100 rubles for the impression," and Nike says, "I am ready to pay 1 kopeck." Coca-Cola would be very upset to learn it could have won by paying 2 kopecks instead of 100 rubles. The second-price auction is considered fairer to all participants.
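To make the two payment rules concrete, here is a minimal sketch of a sealed-bid auction. The function name, the advertiser labels, and the 1-kopeck step are illustrative, not how any real ad system is implemented.

```python
def run_auction(bids, pricing="second", step=0.01):
    """bids: dict of advertiser -> bid. Returns (winner, price paid)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, top_bid = ranked[0]
    if pricing == "first":
        # first price: the winner pays exactly what they named
        price = top_bid
    else:
        # second price: just enough to beat the runner-up
        runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
        price = min(top_bid, runner_up + step)
    return winner, price

# Coca-Cola bids 100 rubles, Nike bids 1 kopeck:
winner, price = run_auction({"coca_cola": 100.00, "nike": 0.01})
# in the second-price auction coca_cola wins but pays only 0.02
```

Note that in the second-price case the winner's own bid does not affect the price paid, only whether they win; that is exactly why bidding the honest maximum is safe.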



This gives rise to a property considered very important for advertisers: at such auctions, always bid the honest maximum price you are willing to pay for an impression. Truthful bidding is the foundation of any competent bidding strategy here.



In a first-price auction you have to invent a cunning strategy and reason about how much other advertisers might bid; in a second-price auction this is unnecessary. Everyone bids the maximum they are willing to pay. If you win, you pay that amount or less; if you lose, you were not willing to pay more anyway. This is a wonderful property of this auction type, and it spread from theory into widespread use in advertising systems.



But it didn't turn out that way. From a theoretical standpoint the second-price auction looks very good, and its properties make it practical. Yet in real systems, several factors keep first-price auctions popular; for some reason people prefer them. The first of the two main points is that the second-price auction is not transparent. When an ad auction is run by a platform you know nothing about, where you merely participate as an advertiser, it tells you that your bid of, say, 10 kopecks won and that you must pay the second price, say 9 kopecks. That second price is opaque; it is not clear where it came from. The platform can easily deceive the advertiser with so-called fake bids. There are also fair mechanisms for additional bids, for example a reserve price: the platform declares that a given slot cannot be sold for less than 9 kopecks, and that honest bid enters the auction. But transparency matters a great deal, and its absence turns advertisers off. When you don't know what is happening under the hood of an advertising system, you again have to invent strategies: you can no longer simply bid only the prices you are willing to pay.



The second point is that the second-price auction works well in an idealized setting where only one auction happens at a time. In a real advertising system, roughly speaking, millions happen per second. Under those conditions advertisers have to devise strategies anyway, and the idea that the advertiser always has a simple, ready-made strategy for perfect bidding breaks down.



I would like to mention the questions that arise around auction design: how to make auctions more interesting, more useful, and so on. Two points. First, it seems everything in an auction should be driven by maximum value, the most money the ad network can earn by selling the slot. In reality, the value being sold can be expressed in different ways for the platform, not only as the amount the advertiser is willing to pay. For example, it is very important to us that ads do not infuriate users; ideally, ads should be useful to them. So when designing the auction you end up inventing additional metrics for ranking ads. You get a set of metrics: expected profit, the probability of a negative user reaction, and so on.
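A hedged sketch of what ranking by more than the bid alone might look like; the score formula, field names, and penalty weight are invented for the example, not a production formula.

```python
def rank_ads(candidates, hide_penalty=5.0):
    """candidates: list of dicts with keys bid, p_click, p_negative."""
    def score(ad):
        expected_revenue = ad["bid"] * ad["p_click"]       # eCPM-style term
        expected_annoyance = hide_penalty * ad["p_negative"]
        return expected_revenue - expected_annoyance
    return sorted(candidates, key=score, reverse=True)
```

With a score like this, an ad with a lower bid can outrank a richer bidder if its predicted chance of annoying the user is much smaller.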



The second variant of the problem is a "non-greedy" auction. Here we realize that we, as an ad network, could earn more if we did not always sell the slot to the auction winner, but instead distributed slots across all advertisers so as to spend as much of their combined budgets as possible, while keeping the auction reasonably honest. That is, if one advertiser is richer than the others and constantly buys out the auctions, perhaps we should not always hand him the win. It is a kind of knapsack problem; you may remember those from courses on programming or algorithms and data structures. It is a very cool task, very interesting to work on, and there are scientific papers about it.
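The knapsack allusion can be made concrete with the classic dynamic-programming solution; treat this as an illustration of the problem's shape, not a production impression allocator.

```python
def knapsack(items, capacity):
    """items: list of (value, weight) pairs, e.g. (budget spent,
    impressions used) per advertiser; capacity: impressions available.
    Returns the best achievable total value."""
    best = [0] * (capacity + 1)
    for value, weight in items:
        # iterate capacity downwards so each item is used at most once
        for cap in range(capacity, weight - 1, -1):
            best[cap] = max(best[cap], best[cap - weight] + value)
    return best[capacity]
```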



At VK we are only now getting to this task; we have plenty of others, and another large class of tasks in advertising sits closer to ML. Because we have to fulfill advertisers' different goals, we need to predict the probability that a user will take a particular action, and use that prediction when forming the bidding strategy for the auction.



That is, the advertiser says: I want as many purchases in my store as possible for my budget. We, as the advertising platform, think about how to find the users most likely to buy something there. A typical ML problem arises: binary classification, where '1' means the user clicked the ad and did something useful (bought), and '0' means the user ignored it. From this we build a classifier whose output probability of the useful event feeds into the calculation of the bid we place at auction. In ML terms the task sounds trivial: just go and do it. But in real life it has many difficulties, and I will mention a few. The main one is that advertising is usually a high-load project, both in data volume and in request load. You need to respond to a request very quickly; the window is, say, 100 milliseconds. This imposes engineering constraints on the models we use to make predictions. You can invent lots of cool things, complex neural networks that extract nonlinear feature interactions through several layers, but in practice all of it must run very fast.
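A minimal sketch of the prediction-to-bid step described above; the feature names, weights, and target cost per action are all invented for illustration, not taken from a real system.

```python
import math

# hypothetical hand-set weights of a tiny logistic model
WEIGHTS = {"bias": -4.0, "user_clicked_before": 1.5, "category_match": 2.0}

def predict_action_prob(features):
    """Logistic model: features is a dict of name -> value."""
    z = WEIGHTS["bias"]
    for name, value in features.items():
        z += WEIGHTS.get(name, 0.0) * value
    return 1.0 / (1.0 + math.exp(-z))

def bid_for(features, target_cpa=50.0):
    # expected-value bidding: pay up to cost-per-action * P(action)
    return target_cpa * predict_action_prob(features)
```

The point of the sketch is the pipeline shape: the classifier's probability is not an end in itself but an input to the bid.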



Training happens on large amounts of data; the task is large-scale, with terabyte datasets. So you usually end up doing distributed computation and distributed model training, and thinking hard about which models to use at all. Advertising traditionally uses linear models such as logistic regression, or gradient boosting; sometimes teams move to fancier things like factorization machines. The choice of models is wide, but fairly simple ones are usually used because of the heavy load.
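One standard way to keep a linear model fast and memory-bounded on terabyte-scale categorical data is the hashing trick: map arbitrary feature strings into a fixed-size weight vector. This is a sketch of the idea with an arbitrary dimension, not any particular system's implementation.

```python
import zlib

DIM = 2 ** 20  # fixed model size, regardless of how many feature values exist

def hash_feature(name, value):
    """Map an arbitrary feature string into a fixed-size index space."""
    return zlib.crc32(f"{name}={value}".encode()) % DIM

def sparse_dot(weights, features):
    """weights: list of DIM floats; features: dict of name -> value."""
    return sum(weights[hash_feature(k, v)] for k, v in features.items())
```

Collisions are possible but rare at this dimension, and the model never needs a vocabulary table, which is what makes the approach practical under load.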



Another ML-adjacent point: the feedback data we use is extremely imbalanced. There is roughly one 1 per 10,000 zeros. How to train models under these conditions is a good question. You use various tricks; the main one is the problem formulation itself: don't try to predict a hard click-or-no-click (purchase-or-no-purchase) label, but work with probabilities, smooth low-cardinality data as much as possible, and so on. Much can be invented, but the core difficulty remains that positive events are rare.
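One concrete trick for this kind of 1-per-10,000 imbalance is to downsample negatives during training and then correct the predicted probability back. The correction below is the standard one for a given negative keep rate; the example numbers are illustrative.

```python
def calibrate(p_sampled, neg_rate):
    """Undo negative downsampling in a probability prediction.

    neg_rate is the fraction of negatives kept for training,
    e.g. 0.01 means one negative in a hundred was kept."""
    return p_sampled / (p_sampled + (1.0 - p_sampled) / neg_rate)

# a model trained on 1:100 downsampled negatives that outputs 0.5
# really means a true probability of about 0.0099
```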



When you click through an ad and buy something, you don't always do it right away. Very often the target events advertisers want happen with a long delay. It may not even be a purchase; it might be registering in a game and leveling a character to level 10, or getting a loan application approved. Such things can take days. Yet we need to train models that must say right now, at ad-serving time, whether the user will act, while the positive feedback arrives only days later. This creates additional problems, and science and practice have produced several solutions for them. A funny sort of complexity.



Another point: a user can encounter ads for a product through various marketing channels. Say you walk down the street and see a billboard for "Taxi VKontakte"; you start scrolling your feed and there is an ad for "Taxi VKontakte" there too; you go to another service and see yet another one, served by Yandex or Google. You say: fine, I get it, and install the app. After that, Yandex, VK, Google, and the billboard need to figure out what percentage of this action, useful to the advertiser, each of them contributed.



This process is called "attribution": deciding which contact with the advertiser influenced your decision to take the target action, and by how much. Usually simple models are used, for example last-click attribution: the last ad you clicked gets all the credit. But then the first "cold contact", showing ads to a user unfamiliar with the product, becomes unprofitable. So there are many models (ML ones included) that distribute the credit for impressions more fairly. They use, among other things, attention mechanisms and other interesting pieces from neural networks. A cool task. And its goal, in the end, is to bid at auction better and more correctly.
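Two simple attribution rules from the discussion above fit in a few lines; the channel names are illustrative.

```python
def last_click(touches):
    """Last-click attribution: the final touch gets all the credit."""
    return {touches[-1]: 1.0}

def linear_attribution(touches):
    """Equal credit to every touch in the user's path."""
    share = 1.0 / len(touches)
    credit = {}
    for channel in touches:
        credit[channel] = credit.get(channel, 0.0) + share
    return credit

path = ["billboard", "vk_feed", "search_ad"]
# last_click(path) gives everything to search_ad, so the billboard's
# "cold contact" earns nothing, which is exactly the problem described
```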



The better you understand who influenced the final purchase decision, the better you understand how much to bid at any given moment. As a result, the whole industry is moving toward the realization that predicting the probability that you will buy is less effective than estimating the degree of influence on the purchase decision. Perhaps you were already going to buy the board game whose page you visited, and ads only started chasing you afterwards. Perhaps that ad would not have been shown at all if the system had known you would buy the game anyway; your decision was already made. The relevant terms are "incrementality testing" and the ML field of causal inference. That is, we move from predicting the probability of the target action to predicting the impact: the difference between the probability that you buy after seeing the ad and the probability that you would have bought anyway. This is a very cool transition to a new idea of what to predict.
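The shift from probability to impact can be written down directly; the numbers below are invented purely to show why the difference matters.

```python
def uplift(p_buy_with_ad, p_buy_without_ad):
    """Incremental effect of showing the ad to this user."""
    return p_buy_with_ad - p_buy_without_ad

# a "persuadable" user: the ad changes the outcome a lot
persuadable = uplift(0.30, 0.05)
# a "sure thing": they were going to buy the board game anyway
sure_thing = uplift(0.95, 0.94)
# a pure probability model would prefer the second user (0.95 > 0.30),
# but almost all of that budget would be wasted
```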



In general, there are many tasks in advertising, and almost all of them follow from the initial desire to fulfill the advertiser's goal. The goal "I want as many purchases in my store as possible" turns into a sequence of bids in the auction, because all advertisers compete in the auction for impressions. That is the strategy. And we need to make the smartest possible transition from the original goal to that sequence of bids. From this follow the desire to understand user behavior, extract commercial interests, target ads automatically, assemble creatives (the way the ads look), and so on.



If anyone wants to learn more about this, I have a meetup talk where I describe in detail what a data scientist can do in advertising and why working on it can be interesting. It was a discovery for me: when I first entered the industry I thought, well, advertising, what could be interesting there. A first year passed, a second, a third, and I realized how many cool tasks there are and how interesting it is from an engineer's point of view.



I will now move on to ML in production. Say you have taken machine learning courses or studied at a university. You are a great Kaggle specialist, cracking data analysis tasks in competitions. Then you join a real company where a large product is developed iteratively and incrementally, and something breaks. It turns out you simply lack many of the skills you would want at the moment you enter the industry. Nobody taught you this or explained how things work in reality, what the difficulties are, how different it is from solving problems with a clear formulation. In programming this is talked about a lot; in data science, not enough.



What challenges do newcomers face in product teams that have worked on the same product for a long time, for example, making VK advertising effective? By "newcomers" I also mean myself: I've been doing this for four years and still feel like a noob. This is a very difficult and interesting thing to figure out.

The first thing I want to mention is the urge to attack an existing task straight away with some cool cutting-edge method from the literature, the urge to build a spaceship that does everything at once. In real product development, data science is a place where we predict very poorly which method will work in the end. Working from the original product requirements is hard precisely because you don't yet know the path to the solution.



Take a task formulation: build an automatic moderation system that should solve some initial problems. You can run off, find a top-tier paper where the task is solved perfectly, and wire up transformers. Or you can write a simple heuristic: look at how many times we approved this advertiser before, take the share of approvals, compare it with 70%, say, and conclude "this one was usually approved before." Such a thing can help the business immediately and deliver useful information. A complex system takes a long time, and it may never pay off. In data science you must internalize this idea quickly: you constantly work in hypothesis mode, you don't know what will work, and to reduce risk and deliver value to the end user and the business as fast as possible, you move from simple solutions to complex ones. Often that means starting with heuristics, without any ML at all; there may be no data for ML yet. This can jar a data scientist who wants to spin up neural models. But without it you won't get the initial baseline, the small step from which you can start. Otherwise you can spend a very long time building something that won't work at all.
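The approval-share heuristic described above fits in a few lines. The 70% threshold comes from the example; the minimum-history cutoff is an assumption added to make the sketch sane.

```python
def auto_approve(approved_before, submitted_before,
                 threshold=0.7, min_history=10):
    """Approve automatically if this advertiser's past approval share
    is high enough; otherwise fall back to human moderation."""
    if submitted_before < min_history:
        return False  # not enough history: send to a moderator
    return approved_before / submitted_before >= threshold
```

A baseline like this takes an hour to ship, and whatever ML system comes later now has something concrete to beat.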



The most difficult and unusual part of working in industry is that after you have built a model and rolled it into production, its life is only beginning. It needs to be actively supported and changed. All sorts of things happen to a model over its lifetime. First, there is such a thing as training-serving skew.



Put simply: if a model runs in production on certain data, on the general stream of cases it must handle, but was trained on other data obtained in other ways, then errors often creep in at that seam and the model starts to work poorly. Ideally, you build a system in which models, during analysis and initial preparation, are trained on the same data they will see in production.



Second, at any moment some feature, or the model itself, can start behaving incorrectly. You have a store of characteristics; something changes: nulls start flowing instead of zeros, different values come back instead of "-1", or someone multiplies the values by 100 because they thought they were percentages rather than fractions. And your model suddenly misbehaves. You have to notice such changes somehow. The nastiest property of an ML model is the silent error: it will never say something went wrong; it will always produce some result for whatever you feed it. Garbage in, garbage out. So there is a huge amount to track while a model runs in production. You need to build a monitoring system, watch that feature distributions don't shift too much, and so on. You have to accept that the model can output complete nonsense at any moment and learn to live in those conditions. What if its results directly move our users' money, bidding in an ad auction, or trading on an exchange, or whatever? There are many ways to handle this, but the important thing is to think about it up front.
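A toy version of the kind of monitoring described above; the two checks and their thresholds are simplistic placeholders for real statistical tests, and the alert names are made up.

```python
def drift_alerts(reference, live, max_mean_shift=0.5, max_null_share=0.01):
    """Compare a live feature stream against its training reference.

    Returns a list of alert names; an empty list means nothing suspicious."""
    alerts = []
    live_vals = [v for v in live if v is not None]
    ref_vals = [v for v in reference if v is not None]
    # alert if nulls suddenly start flowing instead of values
    if 1 - len(live_vals) / len(live) > max_null_share:
        alerts.append("null_share")
    # alert if the mean shifts a lot, e.g. fractions became percentages
    if ref_vals and live_vals:
        ref_mean = sum(ref_vals) / len(ref_vals)
        live_mean = sum(live_vals) / len(live_vals)
        if abs(live_mean - ref_mean) > max_mean_shift * (abs(ref_mean) or 1.0):
            alerts.append("mean_shift")
    return alerts
```

Even a crude check like this catches the "someone multiplied by 100" class of silent failures before the model quietly bids nonsense.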



Given all this, how does the life of a data scientist change in an environment of iterative model development? It seems to me you need to be a good engineer first and a good researcher second. Your code, your conclusions, the insights you get from research and analysis, your models, everything your work as a data scientist produces, will be used not 2 or 10 times but constantly. Many people will look at how your experiments are set up, how the results were obtained, why and how things work, and how they are used in production. So what people coming into the industry cold, from a university, swept up by data science, analytics, or ML, most often lack is engineering skill. A data scientist is, first of all, a kind of developer: he also works with code, code that then runs in production in a changing environment that people rely on. Others will read your code. You will need to produce efficient, well-maintained, testable solutions. This is the part a great many candidates lack. So if you are a beginner in data science, pay serious attention to developer skills: how to write efficient, understandable, well-maintained code, and how to engineer effective data-exchange processes. All of this will help your career enormously.



The second tip, and the second thing that separates cool data scientists from those who struggle, is immersion in the product context and in the engineering environment around your model. Say you're developing a model. As a data scientist it is easy to say: "My job is to develop the model; everything else is outside my responsibility. I train models, that's my business. The backend team will embed them, the data engineer will prepare the data, the tester will test, the product manager will decide how to use it." But a huge number of insights, and ways to make the model cooler and more valuable, lie outside the design of the model itself. An example: if you are ranking search results, you are ranking documents that come from outside; there is some candidate-selection stage. If you know how that selection works, you may easily discover that the bottleneck is not that the model ranks poorly, but that incorrect, uninteresting, incomplete documents are fed to its input. Conversely, if you know that in your product the model may, under certain circumstances, not work the way you would like, and making it work better is very hard, then you can change the product to fit the model. You can say: the product now works differently; instead of the user invoking the model directly, the model drives some automatic tool that tunes several parameters the user could never track by hand. The idea is that you can change the product for the model, and not only the other way around. If you are immersed in these areas, then as a data scientist and ML engineer you can generate enormous value for your product and your users.



Coming back to the point that ML is a realm of experimentation: we never know in advance how to build a great product, so we have to try different paths to the final solution. Therefore, you need to build the workflow a little differently. Engineers, especially early in their careers, often think that all the management frameworks - SCRUM, Agile - are nonsense that doesn't work. However, they often don't work because they are used in the wrong context. For example, if you ever end up in a SCRUM data science team, it will be hard and painful for you. It suddenly turns out that research is difficult to predict, you don't know how you will arrive at the result, and yet there are two-week iterations and the rest of it - in general, the management layer generates unnecessary overhead. The processes within which you work should help you, not hinder you. That is, when data science takes methods from conventional software development and applies them as-is, it is not always effective.



Therefore, I want to say separately: if you work as a data scientist and have to interact with different people - customers, colleagues - then it is worth investing in understanding how to build a collaborative process and self-organized team work. A good way to learn this is a community called LeanDS, which gathers people interested in how to better organize work on ML problems in a product environment. From them you can pick up a lot of useful practices that specialists already apply in different companies. The first thing I would advise is to switch to formulating all tasks as product hypotheses. You don't know which one will bring results, but you reason: I think such-and-such a change will help users in such-and-such a way and improve these metrics, and it can be checked in such-and-such a time. Hypotheses like that are much easier to work with.

For this kind of unpredictable workflow, where it is very difficult to estimate how long a task will take and how you will arrive at the result, Kanban, in my opinion, works very well. I won't dwell on it for long; I simply recommend looking at the LeanDS community and their materials. I think everyone who works in data science and runs into processes migrated from conventional development will find it interesting to see what can be done differently and how to make processes work to their advantage.



Finally, I'll tell you what people who come to do data science tasks at the start of their careers tend to lack, and how you can become a stronger specialist and increase your chances of getting a job in a place that you like.



First, as I said before, engineering skills are very important for a data scientist - no less than skills related to ML, data analysis, probability theory and so on. Above all, I encourage you to be a strong engineer and developer. Second, many people lack the distinct skill of reformulating a business problem into a data science problem. This is a situation where you need to understand exactly what the customer wants from you - let's call them the person who wants something to work well. Returning to the auto-moderation example: what exactly do they want? The auto-moderation task can be set up in very different ways, with different specific things we want our system to do better. Depending on the task, the data science problem is formulated differently; depending on the data science problem, the optimized metrics, the way the dataset is collected, the quality assessment, and so on are all formulated differently. This skill is very valuable for all data scientists. Let's say a customer says their moderators cannot cope with the flow of tasks: checking whether an ad is good or needs to be banned for a specific reason. Then you find out that there are many different reasons for a ban, and during moderation they need to be clearly described so that the advertiser can correct the ad. Based on this, you decide that you need some kind of multi-class classification that will also generate a text explaining the reason - and the task becomes very difficult. But wait - maybe the problem can be reformulated differently. It turns out you can focus not on rejecting ads, but on selecting the good ones. If an ad is good, you can just let it through; if it is bad, it can be handed to live moderators, and no explanation needs to be generated.
Based on this, you understand: if you need to concentrate on what is good and can be let through, then you need to understand how to manage this stream of ads passing through your system. You realize: for this task I can choose ROC AUC as a suitable metric - it describes well the relationship between the accuracy of the model and the number of ads that will pass through our system automatically. And so on. That is, through this dialogue between the conditional customer and you as a specialist, you can greatly simplify your task if you have a good understanding of how to reformulate a business problem into a data science problem.
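As a toy illustration of that trade-off (all scores, labels and numbers below are synthetic assumptions, not data from a real moderation system): pick a score threshold so that auto-approved ads keep a target precision, and route everything below the threshold to human moderators.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic moderation scores: higher = the model is more confident
# the ad is fine. Good ads cluster high, bad ads cluster low.
scores_good = rng.beta(8, 2, size=1000)   # label 1: acceptable ads
scores_bad = rng.beta(2, 6, size=200)     # label 0: ads that need a ban
scores = np.concatenate([scores_good, scores_bad])
labels = np.concatenate([np.ones(1000), np.zeros(200)])

def auto_pass_share(scores, labels, threshold, target_precision=0.99):
    """Share of ads auto-approved at `threshold`, or 0.0 if the
    precision among approved ads falls below `target_precision`."""
    approved = scores >= threshold
    if not approved.any():
        return 0.0
    precision = labels[approved].mean()
    return float(approved.mean()) if precision >= target_precision else 0.0

# Sweep thresholds: everything above passes automatically,
# everything below is routed to live moderators.
for t in (0.5, 0.7, 0.9):
    print(f"threshold={t}: auto-approved share = {auto_pass_share(scores, labels, t):.2f}")
```

Sweeping the threshold like this is exactly the accuracy-vs-volume curve that ROC AUC summarizes in one number.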



I would like to mention one more thing that helps a lot: understanding what specific signals are passed into the model you are developing in the form of features, and how it processes them. This is a skill that belongs squarely to the competence of an ML development team. For some reason many candidates, in my experience, follow the "ML will eat everything" approach - throw everything into gradient boosting and it will figure it out. I am exaggerating, but in general it is very valuable when you clearly understand that the feature you are using carries exactly the information you actually passed in, and not just the information you planned to convey.



For example, suppose you decide that a good indicator of a user's responsiveness to ads is the click-through rate. That is, we take the number of ads seen over all time and the number of clicks, divide one by the other, and get a per-user indicator. In one case it says the user likes to click on ads; in another, that they don't click at all. We pass this number into our model - gradient boosting or linear regression. Then a thought may arise: the model has no way to distinguish users for whom we have a lot of statistics from those for whom we have little. The same value may mean not that a user always clicks on ads, but that they only had one impression. The question arises: how do we present this feature so that the model distinguishes a large amount of statistics from a small one? The first thing that comes to mind is to put the number of ad impressions into the model. You can put in the raw number of impressions, but the dependence of our confidence in the statistics on the number of impressions is nonlinear. It turns out you need to put in not raw impressions, but a square or a logarithm of impressions. Then it turns out that in a linear model these two features do not interact with each other. You cannot build rules like "if a user has this much statistics, then trust their CTR this much and use the feature with such-and-such weight". Linear regression cannot build such connections, but gradient boosting can. Or you can reformulate the feature: instead of raw statistics, smooth it with Bayesian approaches - add some prior knowledge of how users click on average and mix the two with a certain formula. And so on. It turns out that it is very important to understand what specific signal you are transmitting as a feature.
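A minimal sketch of that smoothing idea (the prior values are illustrative assumptions, not real production numbers): blend each user's raw CTR with a global prior, so that users with little statistics stay near the average and users with lots of data keep their own rate.

```python
def smoothed_ctr(clicks, impressions, prior_ctr=0.02, prior_strength=100):
    """Blend a user's raw CTR with a global prior (Bayesian smoothing).

    With few impressions the estimate stays near prior_ctr; with many
    impressions it converges to raw clicks / impressions. The prior
    values here are made up for illustration.
    """
    return (clicks + prior_ctr * prior_strength) / (impressions + prior_strength)

# One click out of one impression is NOT treated as a 100% clicker:
print(smoothed_ctr(1, 1))         # pulled strongly toward the prior
# With lots of statistics the estimate tracks the raw CTR:
print(smoothed_ctr(100, 10_000))
```

A side benefit: a user with zero impressions gets exactly the prior, so the feature is defined for everyone without special-casing division by zero.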



And second, it is very important to know how the feature will be used in the model. In linear regression it is used one way, in gradient boosting another, and a neural network works with data differently again - you need to understand how it handles context.
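As a tiny illustration of the linear-model limitation described above (all numbers synthetic): a linear model treats each column independently, so if you want it to weight CTR by the amount of evidence behind it, you have to hand it an explicit interaction feature - something gradient boosting could learn from the raw pair on its own.

```python
import numpy as np

impressions = np.array([1.0, 10.0, 100.0, 1000.0])
clicks = np.array([1.0, 2.0, 5.0, 30.0])
ctr = clicks / impressions
log_imps = np.log1p(impressions)  # nonlinear transform of raw impressions

# For a linear model each column contributes independently, so the pair
# (ctr, log_imps) cannot express "trust the CTR only when there is
# enough data behind it".
X_plain = np.column_stack([ctr, log_imps])

# Adding an explicit interaction column gives the linear model a way
# to weight CTR by the amount of evidence behind it.
X_interact = np.column_stack([ctr, log_imps, ctr * log_imps])
```

The point is not this particular transform but the habit: before adding a feature, ask whether the model class you chose can actually use it the way you intend.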

You don't need to know exactly how the model works internally, but you need an intuitive understanding of what is possible and what is not. This is a valuable skill for an ML specialist. If you ask me how a particular algorithm inside gradient boosting works, it will be difficult for me to explain it in detail, but I can explain it in broad strokes - and in most practical situations that is enough to use the tool effectively.



In the end, I would like to advise everyone to use this iterative approach in real products, moving from simple to complex. You start with simple hypotheses that are quickly tested, and gradually arrive at those very complex, interesting scientific articles. By then the baseline is ready, and you can already write a paper for KDD, for example.



I also want to mention - perhaps these will be helpful - I have two meetup talks. One is devoted to what kinds of data science tasks there are in advertising (and, in general, why advertising technologies can interest a data science specialist as a springboard for applying their skills to interesting engineering challenges). The second is a story about the traps we fell into as a team of ML developers, how we got out of them, and how you can avoid falling into the same traps for lack of experience. I would like to share this experience; I think there is a lot of useful material there.



And one more thing: earlier I talked about the LeanDS community dedicated to data science processes and managing ML projects. I strongly advise you to look at their materials as well - they are doing very cool things.



Have you ever built a complete sales funnel model?



In fact, we haven't had occasion to do this. Here it is very important that the external advertiser has everything set up very well, so it is easier to work with sales funnels when you are a data scientist on the advertiser's side. Say you work for a large company that uses many marketing channels and are building analytics to understand exactly how the different channels perform, how well the sales funnels are built, and so on. For us, on the VK side, as a system for advertisers, it is crucial that the advertiser has everything configured correctly - that advertising pixels always report accurate information about how the user entered the site, added something to the cart, and bought. Then we can use this information to make advertising strategies better and more effective. I would like to do this; in practice we almost never did, because setting up such systems is often difficult for an advertiser. It is probably easier when you have full control over the setup.



And such a question: how to connect entities (set attributes) when building a model?



For example, site visitor -> client



It's probably good to start from some previous user activity. In general this is done in a variety of ways; I can tell you about one that is used in building advertising systems, called look-alike. You may have heard of it. The idea: here are the users who visited our site, and here are those who bought something. Let's find the users who are most similar to those who bought and least similar to those who did nothing. When we train such a model - where "1" is those who bought, "0" is those who did nothing, and, conditionally, "0.5" is those who merely visited the site - we can learn to rank all users of our system by similarity to a potential client. We can use this knowledge in our model and tell the client which features, from the model's point of view, separate customers from ordinary visitors.
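A minimal sketch of the look-alike idea on synthetic data (the features, model, and all numbers are illustrative assumptions, not VK's actual system; for simplicity it uses only the two extreme classes, buyers and inactive users): train a classifier on buyers vs. everyone else, then rank all users by the predicted score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic user features: buyers tend to have higher activity scores
# than random visitors who did nothing (purely illustrative data).
X_buyers = rng.normal(loc=1.0, size=(200, 3))    # label 1: bought
X_others = rng.normal(loc=-1.0, size=(200, 3))   # label 0: did nothing
X = np.vstack([X_buyers, X_others])
y = np.concatenate([np.ones(200), np.zeros(200)])

# Minimal logistic regression trained by gradient descent - a stand-in
# for whatever model a real look-alike system would use.
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

# Rank all users by similarity to buyers; the learned weights also hint
# at which features separate customers from ordinary visitors.
scores = 1.0 / (1.0 + np.exp(-(X @ w + b)))
ranking = np.argsort(-scores)   # most "buyer-like" users first
```

The same scores can then seed an ad campaign: target the top of the ranking, and show the client the per-feature weights as an explanation.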





