Two captains of "Digital Breakthrough - 2020", or How to Solve a Case in a Few Hours

image



Greetings, Habr! We are two captains of teams of Rosneft employees that took part in the finals of the Digital Breakthrough 2020 IT marathon, and we cannot keep quiet about it.



Rosneft is no stranger to IT events. The company itself runs hackathons and challenges every year ( https://rn.digital/it2020 ) to get the IT community working on problems that matter to it. Its own employees, however, are not allowed to take part in Rosneft's hackathons and marathons, since that would be unfair to outside participants. Yet they, the developers of the company's science-intensive corporate software, also want to test their strength!



For Rosneft employees, then, taking part in outside hackathons is a way to keep the team in fighting shape, a source of inspiration and ideas, and an encouragement to approach IT tasks in non-standard ways. The obvious condition is that it must not come at the expense of the main job. The company, a leader in the oil industry, is focused on digitalizing business processes and on import substitution of information technologies in oil and gas production, so a competitive spirit among employees and a willingness to measure themselves against the rest of the world are considered good form.



In this article, RN-BashNIPIneft employees Chingiz Akhmetov and Maya Bikmetova describe in vivid detail how their teams made it to the final of a great IT event, and share insights and life hacks.



Chingiz Akhmetov and the "Inn BTG" team



Let's introduce ourselves first. “Inn BTG” stands for “Innovatively. Brilliantly. Technologically. Grandly.” It is also Vladimir Ryzhikov, Radmir Karimov, Murad Musin and Chingiz Akhmetov.



A little over a year ago, when one of the stages of the Digital Breakthrough - 2019 hackathon took place in our native Ufa, we put together a team and threw ourselves into solving the problem. That memorable time we reached the final, which was held in Kazan! Unfortunately, not all of our team could attend. So the remaining members urgently had to form a new "gang" of like-minded renegades from other cities (Cherepovets and St. Petersburg joined our Ufa). In the final, unfortunately, we did not win or even earn a nomination. So all we took away from last year's hackathon was merch, in the form of beautiful bright sweatshirts with the competition logo, and an unclosed gestalt: winning the final ...



So in 2020, two weeks before the start of the Ural IT hub of the Digital Breakthrough - 2020 marathon, we decided to assemble a new team for a brilliant and unconditional victory, first in the regional stage and then in the final of the competition.



As last year, the core of the team (three fighters) came from one department of our research and design institute: colleagues who are now developing "RN-GEOSIM", a software package for geological modeling of oil fields. For reliability, we also added a bank employee to the team, not for the sake of loans and mortgages, but because he is an old friend from our student days and knows his craft. So we assembled a "fantastic four" in which each member had experience in particular areas of programming and was something of an expert.



image

Screenshot of the broadcast during the announcement of the case winners.



As experienced participants, we noticed that the hackathon was better organized than last year. There were more tasks this year, and they were more varied: for example, not all of them were about web services any more. The hackathon was held online, which made life much easier for participants and saved on travel costs. The main news, interviews with interesting people, and various contests and quizzes were broadcast on the YouTube channel. Information for participants was posted in Telegram channels. In the main Telegram chat of the competition, support answered general questions right away. There was also a channel for finding teammates and teams. For communication within the team we chose Discord (it was also used to talk to the case experts at checkpoints). In a word, everything was done for people!



We also want to give a separate "plus" for the fact that teams are no longer rigidly tied to a region: our team is from Ufa, but we were busy during the Volga hub and could not take part in it, so we competed in the Ural hub instead. We were also pleased that recordings of all broadcasts and of the streamed project defenses are posted on the YouTube channel. Transparent and informative.



Our team chose the “Cycling Route” case from the Department of Informatization of the Tyumen Region: “Development of software that allows analyzing the initial placement of pedestrian crossings, bicycle paths and the actual walking and cycling of city residents”. Not that we are such big cycling fans. We chose this task because the case holder described it in great detail in an interview ( https://www.youtube.com/watch?v=hLPGCZ-5HRc ), and because the main goal was to come up with a method, an algorithm. That is exactly our kind of problem.



image

The teams ranked the cases by priority before the start. The final distribution of teams across cases at the Ural hub became visible only after the start.



We understood that it was very important to choose tools that would let us build an MVP (Minimum Viable Product, or as we put it, “a product with minimal functions, but sufficient to satisfy the first consumers”) as quickly as possible. For us these were Python with the libraries NetworkX (for working with graphs) and OSMnx (for representing OpenStreetMap data as a graph). The performance of such a solution leaves much to be desired, since Python itself is slow and NetworkX did not have specialized algorithms for road graphs. In our prototype, the running time for a fairly small clustering reached tens of minutes. That is clearly unacceptable for a user, but we showed that the idea works. Going forward, we think it would be better to rewrite the computational core in a compiled language, using specialized algorithms and parallelization.



In brief, our algorithm works as follows. The entire city is clustered into sections (scheme of the algorithm, top left). For each section, the degree of demand for bike paths is determined (top right). The road graph is filtered, leaving only the roads where bike paths can be built. The filtered graph is then matched against the clusters, and only the roads that fall into high-demand sections are kept (bottom left). The resulting islands are connected using a shortest-path algorithm so that the bike path network is connected, that is, a cyclist can travel from any area to any other (bottom right).



image

The scheme of the algorithm.
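The steps described above can be sketched roughly as follows. This is an illustrative sketch in plain Python, not the prototype from the hackathon (which used NetworkX and OSMnx). The toy road list, the cluster labels, the demand scores, the threshold and all function names are assumptions made for the example. Linking the islands only along roads suitable for bike paths is also an assumption, since the text does not say which graph the shortest paths are computed on.

```python
import heapq
from collections import defaultdict


def nearest_path(graph, sources, targets):
    """Dijkstra from a set of source nodes to the closest target node."""
    dist = {s: 0.0 for s in sources}
    prev = {}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry
        if node in targets:
            path = [node]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return path[::-1]
        for nxt, length in graph[node].items():
            if d + length < dist.get(nxt, float("inf")):
                dist[nxt] = d + length
                prev[nxt] = node
                heapq.heappush(heap, (d + length, nxt))
    return None  # no target is reachable


def islands_of(edges):
    """Connected components (sets of nodes) of an undirected edge set."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, islands = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, island = [start], set()
        while stack:
            node = stack.pop()
            if node not in island:
                island.add(node)
                stack.extend(adj[node] - island)
        seen |= island
        islands.append(island)
    return islands


def plan_bike_network(roads, cluster_of, demand, threshold):
    # Filter the road graph: keep only roads where a bike path can be built.
    suitable = [(u, v, length) for u, v, length, bike_ok in roads if bike_ok]
    graph = defaultdict(dict)
    for u, v, length in suitable:
        graph[u][v] = graph[v][u] = length
    # Clusters whose demand score reaches the (assumed) threshold.
    hot = {c for c, score in demand.items() if score >= threshold}
    # Keep suitable roads that lie entirely inside high-demand clusters.
    network = {tuple(sorted((u, v))) for u, v, _ in suitable
               if cluster_of[u] in hot and cluster_of[v] in hot}
    islands = islands_of(network)
    if not islands:
        return network
    connected, remaining = set(islands[0]), islands[1:]
    # Repeatedly link the nearest remaining island via a shortest path.
    while remaining:
        path = nearest_path(graph, connected, set().union(*remaining))
        if path is None:
            break
        network.update(tuple(sorted(e)) for e in zip(path, path[1:]))
        connected.update(path)
        for island in remaining:
            if path[-1] in island:
                connected |= island
        remaining = [i for i in remaining if path[-1] not in i]
    return network


roads = [  # (node, node, length, can a bike path be built here?)
    ("a", "b", 1.0, True),
    ("b", "c", 1.0, True),
    ("c", "d", 1.0, True),
    ("d", "e", 1.0, True),
    ("b", "d", 0.5, False),  # shorter, but unsuitable for a bike path
]
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 2, "e": 2}
demand = {0: 10, 1: 1, 2: 8}
network = plan_bike_network(roads, cluster_of, demand, threshold=5)
print(sorted(network))
```

In the toy data, the two high-demand islands (a-b and d-e) end up joined through the low-demand node c, because the shorter road b-d is marked as unsuitable for a bike path.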



We split the work into subtasks: one of us did the computational part, the second set up the server, the third did the visualization on the web page, and the fourth did the desktop view. We managed to get almost everything done. We decided to send the resulting graph to the client in full and draw it with OpenLayers, although it would probably have been better to deploy our own tile server.



image

Scheme of the MVP we built (suboptimal).



Final results



In the final of the Digital Breakthrough competition, we got a case from the Ministry of Energy: “Development of a system for modeling the dependence of electricity consumption and economic indicators of the Russian Federation by territories and industries”. Compared to the regional stage, the final was on a larger scale: there were more cases (15 versus 9) and more teams per case (26 versus 10). The prize fund for the winners also grew; the exact figures are on the website https://leadersofdigital.ru .



While working on the problem and after consulting the case holders, we decided to implement the project in Python in a Jupyter Notebook and to use econometrics for the modeling. The dependent variable of the model was the actual volume of energy consumption, and the independent variable was the volume of mining and manufacturing output. The dependent variable was forecast using the following algorithm:



1. Industrial production is forecast (a linear trend is built):

a. A section of the graph is selected and used to calculate the trend coefficients.

b. The linear trend coefficients are determined by the least squares method.

2. A model of the dependence of electricity consumption on the volume of industrial production is built.



image



3. The volume of energy consumption is predicted using the constructed model and the SARIMA method (the article https://habr.com/ru/company/ods/blog/327242/ helped us a lot with this).



image

Residual forecast results.
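The regression part of this procedure can be illustrated with a short sketch. It is written in plain Python rather than with the notebook's actual tooling. The monthly figures are synthetic and invented for the example, and the choice of trend window and all variable names are assumptions. The SARIMA step is not shown; here it is only assumed that it would be applied to what the regression leaves unexplained.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic monthly data, invented for illustration only.
months = [0, 1, 2, 3, 4, 5]
production = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0]
consumption = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]

# Step 1: fit a linear trend to a chosen window of the production series.
window = slice(2, 6)  # assumed choice: the last four observations
trend_slope, trend_intercept = fit_line(months[window], production[window])
production_forecast = trend_slope * 6 + trend_intercept  # next month

# Step 2: regress consumption on production.
k, b = fit_line(production, consumption)

# Step 3 (regression part only): consumption implied by the forecast.
consumption_forecast = k * production_forecast + b

# What the regression leaves unexplained; a seasonal model such as
# SARIMA could be fitted to this series (not shown here).
residuals = [c - (k * p + b) for p, c in zip(production, consumption)]
print(production_forecast, consumption_forecast)
```

Because the synthetic series are exactly linear, the residuals here are zero. With real data they would carry the seasonal pattern that a model such as SARIMA is meant to capture.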



Unfortunately, the chosen approach fell slightly short of the top three, and we took 22nd place. Will we take part in similar hackathons in the future? Of course! Solving problems unrelated to work lets you look at your work tasks with fresh eyes afterwards.



Maya Bikmetova and "NEII"



A few weeks before Digital Breakthrough 2020, I took part in another, smaller hackathon, and my team took first place. That probably influenced my decision to try my hand at the all-Russian competition.



I invited my colleagues and friends Marina Semyonova, Guzel and Nail Akmurzin to join the team, since it is much easier to work under time pressure with people you know. No time is lost getting used to each other, and all 30-36 hours of the hackathon can go into developing an MVP. We entered the Volga IT hub of Digital Breakthrough - 2020 as team NEII. This name has been with us for a long time. We believe it best reflects who we are: employees of a research institute who develop systems based on artificial intelligence (AI).



As my colleagues (and competitors) wrote above, the hackathon was held online because of the difficult epidemiological situation. We had three checkpoints at which we talked to the moderator, a technical expert and a representative of the case holder. They listened to our ideas, gave advice, looked at what we had done, and evaluated our presentation.



All teams were given a list of 10 cases. When registering, we ranked them by priority, and at the hackathon each team was to be assigned one of its highest-priority cases. Each case came with only a general description of the problem; the detailed one was given at the start of the competition. We got a case that required developing an intelligent system to automate informing the population about social benefits and providing them. The customer of this case was the Ministry of Social, Demographic and Family Policy of the Samara Region.



image

The general scheme of the developed solution.



We also used Discord to discuss tasks. Together we threw in ideas for solutions and decided who would do what. Marina was responsible for the server logic (the backend and the database), Guzel for the presentation, and Nail for the frontend. I took on the development of the machine learning model, team management and communication with the case holder.



As a result of our work on the task, we implemented prototypes of two systems.



1. A web application in the form of a chatbot that informs the population about benefits.



On the client side, the citizen describes their life situation. The resulting request is sent to the NLP service, which is responsible for natural language processing. Under the hood, the category of the user's request is determined using machine learning methods. In other words, a text classification problem is being solved. The NLP model's prediction is sent to the database. Using it as a key, the relevant information about the applicable benefits is retrieved from the database and returned to the client.



image

The main backend script accesses the NLP service and the database.
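The flow from request to answer can be sketched as below. The article does not say which model or libraries the NLP service used, so this is only an assumed stand-in: a tiny multinomial Naive Bayes classifier in plain Python, with a dictionary playing the role of the database. The training phrases, category names and benefit texts are all hypothetical placeholders.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Tiny multinomial Naive Bayes over whitespace-separated words."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        # Words never seen in training carry no information and are skipped.
        words = [w for w in text.lower().split() if w in self.vocab]
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus log likelihoods with add-one smoothing.
            scores[label] = math.log(n_docs / total_docs) + sum(
                math.log((counts[w] + 1) / denom) for w in words
            )
        return max(scores, key=scores.get)


# Hypothetical training requests and categories.
TRAINING = [
    ("i had a baby and need child support", "child"),
    ("newborn child birth allowance", "child"),
    ("i am retired and need a pension supplement", "pension"),
    ("pension for elderly veteran", "pension"),
]

# Stands in for the database: placeholder texts keyed by category.
BENEFITS = {
    "child": "placeholder: benefits and documents for families with children",
    "pension": "placeholder: benefits and documents for pensioners",
}

model = NaiveBayesClassifier().fit(*zip(*TRAINING))


def answer(request):
    """Classify the request, then look up the benefit info by category."""
    return BENEFITS[model.predict(request)]


print(answer("my child was born last week"))
```

Keeping the classifier and the lookup separate mirrors the split between the NLP service and the database described above: the model only produces a key, and the benefit texts can be updated without retraining.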



In the dialog box, the user sees which benefits they are entitled to and which documents need to be collected.



image

Screenshot of a chatbot prototype to inform the population about benefits.



2. A service that checks regulatory and legislative acts for updates.



While talking to a representative of the case holder, we learned that a major pain point for social protection workers is the need to regularly review many regulatory documents in order to spot changes in legislation quickly. If there are changes, the employee updates the local database. Naturally, this takes a lot of time and effort, and employees have less time to work directly with people. We are convinced that such routine work can and should be automated.



As a first approximation, we proposed a service that regularly compares documents in the local social security database with the same documents in the database of some online service, for example "Consultant Plus". If it detects differences, the service sends the social security worker a notification like “There have been changes in document X. Refresh your local database.” The worker is thus freed from having to dig through a pile of documents. The machine detects changes in laws for them while they work with citizens.



On the technical side, as a baseline for comparing texts we used a classic approach from Information Retrieval: representing documents as numerical vectors and then estimating the cosine distance between them.



image

Formula for calculating the cosine distance between vectors of two documents.
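For two document vectors A and B, the cosine similarity is their dot product divided by the product of their lengths, and the cosine distance is one minus that value. A minimal sketch of such a comparison is given below. It assumes the simplest possible vectorization (raw word counts); the article does not say which vectorization was actually used. The tokenizer, the tolerance and the sample sentences are also assumptions made for the example.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: word -> number of occurrences."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_distance(a, b):
    """1 - cos(angle) between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty document is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def has_changed(local_text, remote_text, tolerance=1e-9):
    """Flag a document whose remote copy differs from the local one."""
    distance = cosine_distance(vectorize(local_text), vectorize(remote_text))
    return distance > tolerance


# Invented sample texts, not taken from any real regulation.
local_doc = "The allowance is 5000 rubles per child."
remote_doc = "The allowance is 6000 rubles per child."

if has_changed(local_doc, remote_doc):
    print("There have been changes in document X. Refresh your local database.")
```

Word counts ignore word order, so a reordering of the same words would go unnoticed. That is acceptable for a baseline, but a production service would probably need a stricter comparison.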



And now a couple of hackathon survival hacks. We recommend agreeing on the list and versions of the libraries you will work with at the very beginning. This avoids the situation where the project does not build because of version conflicts and you have to submit your code for review in 10 minutes. By the way, it is better to post the links to the presentation and the solution half an hour before the end of the competition rather than leave it to the last moment. At the hackathon, people got really nervous when the site crashed five minutes before the final submission deadline ...



image

Technologies used: long live open source! The main thing is to keep track of the versions of the frameworks.
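One way to keep an eye on versions is to write the agreed versions down once and have every team member run a quick check before starting work. The sketch below uses the standard `importlib.metadata` module (Python 3.8+). The package names and version numbers in `PINS` are hypothetical examples, not the stack used at the hackathon.

```python
from importlib import metadata


def check_pins(pins, get_version=metadata.version):
    """Return {package: (wanted, installed)} for every mismatch.

    `installed` is None when the package is not installed at all.
    """
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            problems[name] = (wanted, installed)
    return problems


# Hypothetical versions agreed on by the team at the start.
PINS = {"flask": "1.1.2", "numpy": "1.19.4"}

for package, (wanted, installed) in check_pins(PINS).items():
    print(f"{package}: expected {wanted}, found {installed}")
```

The `get_version` parameter exists only to make the function easy to test without installing anything. In everyday use the default lookup of installed packages is enough.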



On the one hand, a hackathon is a competition of technical specialists, and the jury evaluates the code: readability, whether it works, the solution architecture, the documentation. On the other hand, there is the case holder, whose representatives are far from the IT world. What they value first of all is a beautiful, coherent and clear presentation and a well-designed interface. In other words, it is better to have a person on the team who deals only with the presentation and who will use it to “sell” the solution to the case holder.



Final results



Unfortunately, we had no time to prepare for the final of Digital Breakthrough - 2020: as usual, there is more work towards the end of the year. So we relied on luck and on each other!

In the final of the hackathon, our line-up was the same. This time we got a case from Sberbank: create a solution that would help speed up work with email. This turned out to be a big problem, since senior executives have to sort through 2,000 emails a day, which takes at least 3 hours!



At the first checkpoint, we were told that the Sberbank developers had been puzzling over this problem for a long time but had not yet come up with a concrete solution. They needed an idea that would help them finally arrive at something.



So we started thinking. In 2 hours, we generated many interesting ideas (this was the most creative part of the hackathon: one person suggested something, another built on it, and the idea grew into a perfect tool that would replace regular email =D).



In the end, we came up with our own mail client, "Sber-secretary", which would turn correspondence into chats, group the chats by subject into folders, and automatically determine the importance of each correspondence (chat). In a chat, you could listen to unread messages and dictate message texts, a new email could be created with a single voice command, and to speed things up the main window could show from 1 to 4 conversations at once instead of one. There were a few additional features as well.



image

Email client interface layout



Next, we assigned responsibilities: I was in charge of the server side of the project, Nail was in charge of the client side, Marina developed the design in Figma, and Guzel made a beautiful, information-rich presentation and prepared for the defense.



We knew right away that we would not have time to build a working application, but we hoped that at least the client side would take shape. There was a lot of work, and in the end we showed in Figma what functionality our mail client would have, described the architecture of the project, listed the advantages of such a solution, compared it with Outlook, and presented the cost of the project and the stages of implementation.



I think we can say that the jury liked our solution, because at one of the checkpoints the trackers told us that we were moving in the right direction. We did not manage to build a working application, but we finished in the top 5 (5th place) out of 11 teams. What does 5th place mean to us? It means we have room to do better next time!



If you also took part in "Digital Breakthrough - 2020", share your impressions and thoughts in the comments!


