How I built a bus ticket refund rule matcher by calling a human API

We automate bus travel here; recently, with our help, all bus tickets in Russia became electronic. The market is only just starting to move into IT, and a lot of work is still done by hand in paper ledgers.

I'll tell you about one simple piece of automation that aviation and the railways completed decades ago but that has only just begun for us. Here is the situation: about a hundred different information systems send us data about bus routes. They are a mix of carriers' homegrown automation and competing commercial products. Each system has its own format for describing how a ticket refund works. Most often it is a human-readable note in Russian written for operators and cashiers, and about 20% of systems send no refund data at all.

Some rules overlap, and there can be several levels of nesting: “All tickets are non-refundable, but on this route outbound tickets are refunded under 259-FZ, and return tickets under these conditions.”

We need to show the passenger the refund terms (whether the ticket is refundable, whether the refund is 100%, and when a refund is possible), and use these parameters for search, comparison and, ultimately, automated refunds.

So I needed to figure out how to turn several thousand texts in Russian into ticket parameters, where to store them, and how to manage it all.

What do the data sources' responses look like?

Here are some examples:

The first thing that comes to mind is NLP. To train a neural network to parse all this, you need a set of rules to serve as the training corpus. And to get that set of rules, you have to parse everything by hand anyway and reduce it to a single format.

The solution turned out to be dead simple. Almost all refund rules have not changed for years, and only a few new texts arrive each month. We have a content department that collects data from various sources: it calls bus stations, gathers stop data, and so on. Some of this work is automated and some is not. What is automated is covered by scripts and tests and goes to production. The manual part can be simplified by sending the content department already-prepared data; in other words, we call a human operator through an API with a standard request form.

Parsing everything by hand once and maintaining the changes by hand turned out to be cheaper than building and maintaining automation and then monitoring its correctness. So we did end up using complex neural networks after all: the operators' own brains. And they performed very well.

Then we started hashing each rule text with MD5, after stripping insignificant whitespace and converting it to a single case, so that we can tell when a rule has changed. If it has changed, the automation creates a task for the content department, and the content department enters the new rule into our system.
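A minimal sketch of this normalize-then-hash change check in Python. The article does not spell out the exact normalization, so collapsing every whitespace run to one space, trimming, and lowercasing is an assumption; the function names `normalize_rule_text`, `rule_hash`, and `detect_new_rules` are hypothetical, and "creating a task" is reduced to collecting the unknown texts in a list.

```python
import hashlib
import re


def normalize_rule_text(text: str) -> str:
    """Collapse whitespace runs, trim the ends, and lowercase.

    Assumption: "insignificant whitespace" means any run of whitespace.
    """
    return re.sub(r"\s+", " ", text).strip().lower()


def rule_hash(text: str) -> str:
    """MD5 hex digest of the normalized rule text (a change detector, not security)."""
    return hashlib.md5(normalize_rule_text(text).encode("utf-8")).hexdigest()


def detect_new_rules(incoming_texts, known_hashes):
    """Return (hash, text) pairs for rule texts we have never seen.

    Each returned pair stands in for a task for the content department.
    Hashes are added to `known_hashes` so one batch yields no duplicate tasks.
    """
    tasks = []
    for text in incoming_texts:
        h = rule_hash(text)
        if h not in known_hashes:
            tasks.append((h, text))
            known_hashes.add(h)
    return tasks
```

With this normalization, "Tickets  are NON-refundable" and "tickets are non-refundable " produce the same hash, so only a real change in wording triggers a new task.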

Strictly speaking, the proper way to store a large set of rules is a BRMS-class solution. But in our case things were simpler: the whole rule set reduced to matrices like these:

In this iteration, we decided not to bother with the modifiers. First, it is not clear what exactly they mean. Second, they seem to be used in only a few places. At least so far, there has been no real need for them.

This becomes text in a unified format like this:

So we store the rules directly in the system that manages ticket parameters. In practice, each ticket in the database simply gets a link to that carrier's refund rules.
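To make "intervals with retention percentages" concrete, here is a hypothetical Python data model: a parsed rule is a list of thresholds (hours before departure) with the share of the fare the carrier keeps, and a ticket stores only the id of its rule. The class and field names and the interval semantics (the largest threshold the request still meets wins) are assumptions for illustration, not the team's actual schema; the numbers in the usage note are invented.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RefundInterval:
    # Hypothetical encoding of one matrix row: applies when the refund is
    # requested at least `min_hours_before` hours before departure.
    min_hours_before: float
    retention_percent: float  # share of the fare the carrier keeps


@dataclass(frozen=True)
class RefundRule:
    rule_id: str
    refundable: bool
    intervals: Tuple[RefundInterval, ...]  # any order


@dataclass
class Ticket:
    fare: float
    refund_rule_id: Optional[str]  # the "link" to the carrier's rule


def refund_amount(rule: RefundRule, fare: float, hours_before: float) -> float:
    """Refund due for a request made `hours_before` hours before departure."""
    if not rule.refundable:
        return 0.0
    # Keep only intervals whose threshold the request still meets...
    applicable = [i for i in rule.intervals if hours_before >= i.min_hours_before]
    if not applicable:
        return 0.0  # too late under this (hypothetical) rule
    # ...and use the strictest-met one, i.e. the largest threshold.
    best = max(applicable, key=lambda i: i.min_hours_before)
    return round(fare * (100 - best.retention_percent) / 100, 2)
```

For a rule with intervals (24 h, 0%), (2 h, 10%), (0 h, 25%) and a 1000-ruble fare, a request 5 hours before departure falls into the 2-hour interval and yields 900.0; a ticket only needs `refund_rule_id` to look the rule up in a dictionary of parsed rules.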

This is how it began to look:

The GDSs are the sources; next comes the "collapsing" of flights (the same flight can arrive from several sources with slight differences; there is more about that particular hell here, for example).

This is how the rule matcher works: we take the refund rule text from each flight, look up our corresponding rule (already parsed into the form we need) by its hash, and, if a match is found, apply it:
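The lookup step can be sketched in Python as a dictionary keyed by the hash of the normalized text, reusing the normalization described above (whitespace collapsed, single case). The `Flight` fields, the `match_rule` name, and the hash-to-rule-id mapping are hypothetical; the source only says that the parsed rule is found by hash and applied when found.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Dict, Optional


def rule_hash(text: str) -> str:
    # Assumed normalization: collapse whitespace, trim, lowercase, then MD5.
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass
class Flight:
    flight_id: str
    raw_refund_text: Optional[str]  # what the GDS sent, possibly nothing
    refund_rule_id: Optional[str] = None


def match_rule(flight: Flight, parsed_by_hash: Dict[str, str]) -> bool:
    """Attach our parsed rule id to the flight if its text hash is known.

    `parsed_by_hash` maps a text hash to the id of the rule the content
    department entered (hypothetical storage layout).
    Returns True if a rule was applied.
    """
    if not flight.raw_refund_text:
        return False  # no text from the GDS: fall back to manual rules
    rule_id = parsed_by_hash.get(rule_hash(flight.raw_refund_text))
    if rule_id is None:
        return False  # unknown text: a task for the content department
    flight.refund_rule_id = rule_id
    return True
```

Because the key is the hash rather than the raw text, cosmetic differences in spacing or case between GDSs still resolve to the same parsed rule.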

Often a GDS sends no refund rules for a given flight. In that case we can apply our own "manual" refund rules, for example, the standard ones prescribed by federal law. Interestingly, in theory these should be the minimum conditions for everyone, but in practice carriers often either improve or worsen them.

Carriers may have local rules, as in the example above: “on all routes it works like this, but on Moscow to St. Petersburg it works like that”. For exactly this case, we added a "priority" parameter to the "manual" rules. As a result, a "manual" refund rule consists of three parts: the parameters that tell us the rule applies (departure/arrival city, carrier, GDS), a priority, and a result (the actual intervals with retention percentages). When a GDS delivers a flight without refund rules, we query the database of "manual" rules, select all that match, and take the one with the highest priority. The flight is then enriched with the resulting rules.
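A minimal Python sketch of this selection: filter the manual rules whose parameters match the flight, then take the one with the highest priority. Treating an unset parameter (`None`) as "matches anything" is an assumption, as are the `ManualRule` and `pick_manual_rule` names; `result` is reduced to an id standing in for the interval set. On equal priorities, `max` keeps the first rule in list order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ManualRule:
    # Matching parameters; None is assumed to mean "any value".
    departure_city: Optional[str]
    arrival_city: Optional[str]
    carrier: Optional[str]
    gds: Optional[str]
    priority: int
    result: str  # id of the intervals-with-retention set to apply

    def matches(self, departure_city: str, arrival_city: str,
                carrier: str, gds: str) -> bool:
        pairs = [
            (self.departure_city, departure_city),
            (self.arrival_city, arrival_city),
            (self.carrier, carrier),
            (self.gds, gds),
        ]
        return all(want is None or want == got for want, got in pairs)


def pick_manual_rule(rules: List[ManualRule], departure_city: str,
                     arrival_city: str, carrier: str,
                     gds: str) -> Optional[ManualRule]:
    """Highest-priority matching rule, or None if the route is not covered."""
    candidates = [r for r in rules
                  if r.matches(departure_city, arrival_city, carrier, gds)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.priority)
```

With a carrier-wide rule at priority 10 and a Moscow to St. Petersburg rule for the same carrier at priority 20, a Moscow to St. Petersburg flight gets the route-specific rule, any other route of that carrier gets the carrier-wide one, and a carrier with no rules at all gets `None`.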

Of course, the "manual" rules may not cover everything. For that, we built a report listing routes that no rule covers; the content department staff go through it by hand.
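Such a report could be as simple as counting, per route, the flights left without a rule after both GDS matching and the manual rules have been tried. This Python sketch assumes a route is identified by (departure city, arrival city, carrier) and that each flight arrives as a (route, rule_id) pair; both are hypothetical choices, and sorting by flight count is just one way to prioritize the manual work.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

Route = Tuple[str, str, str]  # (departure city, arrival city, carrier)


def uncovered_report(
    flights: Iterable[Tuple[Route, Optional[str]]],
) -> List[Tuple[Route, int]]:
    """Routes whose flights ended up with no refund rule, largest gaps first.

    `rule_id` is None when neither the GDS text nor a manual rule matched.
    """
    counts = Counter(route for route, rule_id in flights if rule_id is None)
    # most_common() with no argument returns every route, sorted by count.
    return counts.most_common()
```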

That's it. As I said, it's all quite simple, but the market is still full of situations like this: the bus market is only now opening up to electronic sales, and there is a huge zoo of homegrown solutions, or often no automation at all.

As a result, we now have a unified database of ticket refund rules for every official bus route in Russia that we know of.
