Hi-tech communications, or how we build a voice agent from just 500 call recordings




Voice robots, how they work, and the tasks they can solve have been covered on Habr many times. The general principles of building such robots (we prefer to call them “digital agents”) are therefore familiar to many readers. That is good, because in this article we would like to talk about training robots quickly.



We have managed to train agents on a very limited base of calls. The minimum number of recordings needed to develop a full-fledged digital agent is only 500. (Spoiler alert: this is about specializing an assistant rather than training it from scratch.) How does the training work, what are the pitfalls, and what lies at the heart of the technology? That is what we will talk about today.



What should a digital agent be able to do?



At the moment, the digital agents we design for the b2c segment, which rely on an intent classifier, can maintain a full-fledged dialogue. This became possible because we taught them to:



  • Detect and classify various answers, questions, and objections in a person's speech.
  • Choose a response or reaction that fits the meaning.
  • Detect when the subscriber is not inclined to talk and reacts negatively, or when the subscriber is a child and/or an elderly person, and end the call correctly in such cases.
  • Detect and, if necessary, record the entities the subscriber mentions: names, addresses, dates, phone numbers, etc.


What is all this for? So that the digital agent can handle a call center's incoming line and answer standard customer questions. In our experience, a digital agent can process up to 90% of requests on its own. Meanwhile, human operators can take on more creative tasks and help with non-standard issues. The AI can be entrusted with conversations with call center subscribers, company support, and so on.



And most importantly for this segment, digital agents can sell no worse (and in many cases even better) than a live operator. We build such advanced digital agents for large telecom operators, for example.







How to train a robot to conduct a dialogue



This is a very interesting challenge. We now solve it in a completely different way than a couple of years ago, and today a few hundred recordings are enough to train an agent. Of course, we did not get here right away; it took a lot of work.



How was it before?



Several years ago, intents and entities were extracted from human speech and classified using regular expressions (regex). Put simply, a regular expression is a text search language: a sample string (the pattern), written in a special syntax, defines the search rule. But this method had several disadvantages:



  • Creating regular expressions requires a large team of skilled specialists.
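For illustration, a regex-based intent classifier of the kind described above might look like the following minimal Python sketch. The patterns and the `classify` helper are hypothetical and are not our production rules; the intent labels are borrowed from the dialogue example in this article.

```python
import re

# Hypothetical patterns, invented for this sketch. Each intent needs its own
# hand-written rule, which is exactly what makes the approach labor-intensive.
INTENT_PATTERNS = {
    "question=schedule": re.compile(r"\b(schedule|working hours|shifts?)\b", re.IGNORECASE),
    "question=what_company": re.compile(r"\bwhere are you calling from\b|\bwhat company\b", re.IGNORECASE),
    "confirmation=true": re.compile(r"\b(yes|yeah|sure|okay)\b", re.IGNORECASE),
    "confirmation=false": re.compile(r"\b(no|nope|not interested)\b", re.IGNORECASE),
}


def classify(utterance):
    """Return the first intent whose pattern matches, or None if nothing matches."""
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(utterance):
            return intent
    return None


print(classify("What is your schedule?"))
print(classify("No, probably"))
print(classify("I'd rather not"))
```

The last call returns `None`: the refusal is obvious to a person, but no rule covers this phrasing, so someone has to write yet another pattern.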


How does it work now?



We built a base dataset from millions of calls that had been handled using regular expressions: we verified and labeled the data and created a model that, in effect, imitates the output of the regex-based classifier, but with better quality.
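The idea of a model that imitates a regex classifier can be sketched in a few lines of Python. This is only a toy under loud assumptions: the two regex rules, the six utterances, and the naive Bayes model are all invented for the example and say nothing about the architecture of the real model, and the human verification step is omitted.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical regex "teacher" that produces the training labels.
TEACHER = {
    "confirm": re.compile(r"\b(yes|sure|okay)\b"),
    "refuse": re.compile(r"\b(no|never)\b"),
}


def regex_label(text):
    """Label an utterance with the first matching regex rule, or None."""
    for label, pattern in TEACHER.items():
        if pattern.search(text):
            return label
    return None


def train(utterances):
    """Fit a multinomial naive Bayes model on regex-labeled utterances."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text in utterances:
        label = regex_label(text)
        if label is None:
            continue  # the teacher could not label this utterance
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the most probable label, using add-one smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denominator = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:
                score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


calls = [
    "yes that works for me", "sure that works", "okay sounds good to me",
    "no thanks not interested", "no i am not interested", "never call me again",
]
model = train(calls)
print(regex_label("that works for me"), predict(model, "that works for me"))
```

The point of the sketch is the last line: the regex teacher has no rule for "that works for me", while the model trained on the teacher's labels still picks an intent, because it has learned from the words that surrounded the keywords.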



As the model is used on real projects, we train it further through a special labeling interface on our platform. Content managers find utterances that were classified inaccurately, label them, and “feed” them to the model so that it improves on those cases.



Training now consists of two stages: training the model on the dataset, and further training during commercial operation. At the moment, connecting to the NLU engine and running quick recognition tests takes us only a few hours.



The quality that used to take weeks of meticulous work is now available immediately thanks to the base dataset. For example, in the b2c segment, the initial error rate in recognizing consent or refusal to perform the target action dropped roughly threefold (from 10% to 2-3% of all cases).



Training begins with the client company providing recordings of conversations between its operators and customers. Ideally, the dataset should contain at least 500 recordings. We also request additional information, including deadlines, priorities, and existing instructions and scripts for call center employees (optional, but desirable).



To process this data, we use a specialized NLU Engine. It is based on semantic parsing of the text that comes from ASR systems. During recognition, there are two main types of objects:

  • Entities.
  • Intents (intentions).



An important point: we do not have a dialogue system in the classical sense. The machine learning system does not compose answers dynamically; the answers are defined by script designers. Bot phrases are strictly predetermined, and the artificial intelligence has no free will, for better or worse.



The logic of the conversation is also predetermined: the intent classifier is used to determine what the subscriber said and why. Once the intent is determined, we can select the appropriate bot phrase to respond with and, in effect, conduct a full-fledged dialogue.
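A predetermined script of this kind can be pictured as a lookup table from the recognized intent to a fixed bot phrase. The sketch below is hypothetical: the step names, the `SCRIPT` table, the fallback phrase, and the `respond` helper are assumptions made for illustration, with intent labels and phrases borrowed from the dialogue example in this article.

```python
# Hypothetical script table: for each step of the conversation, a map from the
# recognized intent to a fixed bot phrase and the next step.
SCRIPT = {
    "offer": {
        "question=what_company": ("This is the retail network NNN.", "offer"),
        "question=schedule": ("The schedule is drawn up two weeks in advance.", "offer"),
        "confirmation=true": ("Please tell me your last name, first name, and patronymic.", "collect_name"),
        "confirmation=false": ("Thank you for your time. Goodbye.", "end"),
    },
}

# Assumed fallback for intents the script designer did not anticipate.
FALLBACK_PHRASE = "Sorry, could you repeat that?"


def respond(step, intent):
    """Return (bot phrase, next step); nothing is generated dynamically."""
    options = SCRIPT.get(step, {})
    if intent in options:
        return options[intent]
    return FALLBACK_PHRASE, step  # stay on the same step and ask again


print(respond("offer", "question=schedule"))
print(respond("offer", "confirmation=true"))
```

Answering a question keeps the conversation on the same step, while a confirmation moves it forward; every phrase the bot can say is written in advance by the script designer.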



Dialog example



Below is a complex HR script annotated with intents and entities; the recognized entities and intents are shown under each of the subscriber's utterances.



Bot: Hello! My name is Elena, and I am calling about your resume. Tell me, is a job offer still relevant for you?



Subscriber: Yes, hello



Intent: hello_confirm = true



Bot: Look, we currently have a vacancy for a sales office specialist. Responsibilities include consulting subscribers, selling equipment, and working with the cash register. Employment is fully official, the salary is paid officially, the average income is N rubles, the upper limit is practically unlimited, and there are opportunities for career growth. The schedule is 5 working days with two floating days off. We also provide all social guarantees, voluntary medical insurance, bonuses, and company discounts.



Would you consider these conditions?



Subscriber: Sorry, where are you calling from again?



Intent: question = what_company



Bot: This is the retail network NNN.



Subscriber: What is your schedule?



Intent: question = schedule



Bot: The work schedule is generated automatically by the program, taking into account customer flow and the office workload, with two floating days off. That is, no more than 40 hours per week, with shifts of 6, 8, 10, or 12 hours per day. The schedule is drawn up 2 weeks in advance, so you know your shifts and days off ahead of time and can plan around them.



Subscriber: Well,



Intent: confirmation = true



Bot: Then let's go over your personal details.

Please tell me your last name, first name, and patronymic.



Subscriber: Elena Vyacheslavovna Gudkova



Entity: name = true



Bot: Please tell me your date of birth



Subscriber: thirteenth zero second eighty-seventh



Entities:

day = [13]

month = [February]

year = [1987]




Bot: Tell me, can I record the number I'm calling you on as your contact number?



Subscriber: No, probably



Intent: confirmation = false



Bot: Then please tell me a number where I can reach you.



Subscriber: nine hundred ten eight hundred eighty fifty twenty five



Entity: telephone = true



Bot: Tell me, what is your education?



Subscriber: Higher



Entity: education = higher



Bot: Could you clarify whether you are currently studying full-time?



Subscriber: No



Intent: confirmation = false



Bot: Tell me, do you have Russian citizenship?



Subscriber: Russian citizenship



Entity: citizenship = russia



Bot: OK, that was the last question. Thank you for your answers. I will now send your application form to the manager for review. Applications are reviewed within 3 days, and if the decision is positive, our employees will contact you. Goodbye.
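To show what turning a spoken reply into entities involves, here is a toy Python sketch for the date of birth from the dialogue above. The word table, both helper functions, and the "two-digit years are 19xx" rule are assumptions for this example only; a real NLU engine has to cover far more ways of saying a date.

```python
import calendar

# Tiny hypothetical vocabulary: just enough words for the sample reply.
WORDS = {
    "zero": 0, "first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
    "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10,
    "eleventh": 11, "twelfth": 12, "thirteenth": 13,
    "twenty": 20, "thirty": 30, "eighty": 80, "ninety": 90,
}


def spoken_numbers(utterance):
    """Sum consecutive number words; an ordinal word closes the current group."""
    numbers, current = [], 0
    for word in utterance.lower().replace("-", " ").split():
        current += WORDS[word]  # raises KeyError for words outside the toy table
        if word.endswith(("st", "nd", "rd", "th")):
            numbers.append(current)
            current = 0
    return numbers


def parse_birth_date(utterance):
    """Map a spoken day, month, and year to entity values."""
    day, month, year = spoken_numbers(utterance)
    if year < 100:
        year += 1900  # assumption: two-digit birth years belong to the 1900s
    return {"day": day, "month": calendar.month_name[month], "year": year}


print(parse_birth_date("thirteenth zero second eighty-seventh"))
```

For the reply from the dialogue this yields day 13, month February, and year 1987, the same entity values as shown above.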



OK, the script is developed. Then what?



After all this, the developed scripts are agreed with the customer. In some cases clients want to add or change something, and we do so. Sometimes technical parameters also need to be clarified:



  • Integration method.
  • Input / output parameters.
  • SIP trunk connection (if it is planned to use the customer's telephony).
  • SMS connection or connection to third-party customer systems (CRM, Campaign management).


What are the input and output parameters? Input parameters are the variables our digital agent needs to initiate a call. First of all, of course, this is the phone number or ID of the subscriber we are calling. Optionally, depending on the customer and the project, there may be other data, for example:



  • the components and prices of products and services that the assistant should announce to different subscribers, depending on specific conditions;
  • the names of service packages or services that the assistant mentions to different subscribers;
  • the names by which the assistant can address subscribers when greeting them;
  • additional information.


That is, for the assistant to perform one action or another during or after the call depending on certain conditions, those conditions must be passed to it. They are called “input parameters”.



The output parameters, in turn, are the set of data that the assistant should return to us after the call.



For example: the subscriber's phone number, the duration of the call, the name of the project within which the call was made, and so on. The output also contains the main result of the call, which depends on how the dialogue went on a specific project (the simplest example is the “Consent” result if the subscriber agreed to perform the target action, or the “Refusal” result if the subscriber refused). Finally, there is technical data on the call status and codes for possible errors (the call was made, the call failed due to telephony problems, the call failed due to incorrect input data, etc.).

Some of this data simply passes from the input to the output (for example, we pass the subscriber's number to the assistant in the input data, and after the call the assistant returns the same number in the output data).



The robot can also “collect” data from the subscriber during the call: for example, it can record the names, addresses, phone numbers, and other information the subscriber mentions and write them to the output data. Reporting and analytics are built on this data.
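As a rough sketch, the input and output parameters described above could be modeled as two small Python records. All field names, the `CallStatus` values, and the `build_output` helper are hypothetical; the actual set of parameters is agreed with each customer.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class CallStatus(Enum):
    """Hypothetical technical status codes for a call."""
    COMPLETED = "completed"
    TELEPHONY_ERROR = "telephony_error"
    INVALID_INPUT = "invalid_input"


@dataclass
class CallInput:
    phone_number: str                       # the only mandatory input
    subscriber_name: Optional[str] = None   # how to address the subscriber
    service_package: Optional[str] = None   # package name to announce
    price: Optional[float] = None           # price to announce


@dataclass
class CallOutput:
    phone_number: str                       # passed through from the input
    project: str
    duration_seconds: int
    status: CallStatus
    result: Optional[str] = None            # e.g. "Consent" or "Refusal"
    collected: Dict[str, str] = field(default_factory=dict)  # entities named by the subscriber


def build_output(call_input, project, duration_seconds, status, result=None, collected=None):
    """Assemble the output record, copying the phone number from the input."""
    return CallOutput(
        phone_number=call_input.phone_number,
        project=project,
        duration_seconds=duration_seconds,
        status=status,
        result=result,
        collected=dict(collected or {}),
    )


output = build_output(
    CallInput(phone_number="+70000000000", subscriber_name="Elena"),
    project="hr_screening",
    duration_seconds=95,
    status=CallStatus.COMPLETED,
    result="Consent",
    collected={"education": "higher"},
)
print(output)
```

Keeping the pass-through fields, the call result, and the collected entities in one record makes it straightforward to build reports on top of the output.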



Next come stages such as script scoring, logic development, pattern development, software testing and, finally, handing the project over to the client.



That, in essence, is all. Of course, creating a digital agent is somewhat more complicated than described above; a single article simply cannot cover every nuance. We plan to continue with a second part on the technical aspects of training and the company's internal “kitchen”. If you want to know something that is not in this article, ask and we will definitely answer.


