A chat, a bot, and a speech therapist: how to develop an ML-based service for diagnosing speech defects in children

"Sasha walked along the highway", "Say: rrrrryba", "Cuckoo, cuckoo..." Remember these phrases that traumatized us all in childhood? It felt like an endless parental experiment with no clearly defined goal, driven above all by the fear that the child would grow up with a speech impediment.





Hi! My name is Dima Pukhov, and I am the technical director at Cleverbots. I want to tell you how we taught a chatbot to recognize speech defects and reached 80% accuracy in the diagnoses of an AI speech therapist.





Problem

Every second schoolchild has pronunciation problems, speech therapists say. These problems can be corrected at an early stage, but they are often written off as something the child will outgrow, and by the time the speech defects become obvious they are hard to fix. Spectrogram, a service for remote primary diagnostics, is meant to help prevent problems from developing and worsening, and to signal when a specialist's intervention is required.





Last year the pharmaceutical company Geropharm, aiming to counter fears and stereotypes about cognitive development, launched the PRO.MOZG portal, which offers plenty of useful and accessible material on how the brain works, how diseases "work", and how the body changes. The site also hosts Spectrogram, a service that helps parents test their child in a game format and determine whether the child has speech defects.





Briefly about the service

For users, the Spectrogram interface is a chatbot embedded as a widget on the website. Testing is playful: guided by a parent, the child pronounces the suggested phrases, which are then sent to the bot as audio messages, and the ML model automatically determines whether each phrase was pronounced with a defect.





It is worth stepping back a little to recall how things worked at the beginning.

When a similar service was first introduced, every questionnaire was sent to a speech therapist, who had to listen to each recording, assess whether the child's speech skills matched their age, give an expert opinion, and enter the corresponding mark in the system. That is more than 10 recordings per questionnaire.










  • [...] (MFCC) [...] feature engineering;

  • Deep Learning [...] speech2text [...];

  • [...] Yandex, Google, AWS [...] speech2text [...].










[...] 3Sigma [...].
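The 3Sigma mentioned above presumably refers to the three-sigma rule, under which values lying more than three standard deviations from the mean are treated as outliers. A minimal sketch of that rule follows. It assumes the rule is applied to some per-recording scalar such as duration in seconds; the variable names and the data are hypothetical and not taken from the project.

```python
from statistics import mean, pstdev


def three_sigma_filter(values):
    """Keep only values within three standard deviations of the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so there is nothing to reject.
        return list(values)
    return [v for v in values if abs(v - mu) <= 3 * sigma]


# Hypothetical recording durations in seconds: 20 typical clips and one
# that is clearly too long (for example, a recording left running).
durations = [2.0] * 10 + [2.2] * 10 + [60.0]
kept = three_sigma_filter(durations)
print(len(durations), "->", len(kept))
```

Note that the rule needs a reasonably large sample to work: with only a handful of values, a single outlier inflates the standard deviation so much that it can never fall outside three sigma.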





[...] spectral & rhythm features (librosa) [...] PCA [...] 0.99 ROC_AUC.

[...] speech2text [...]: Yandex, Google, Amazon. [...] speech2text [...].

[...] timestamp [...].





[...] spectral & rhythm features (librosa), tsfresh, PCA [...]: ROC_AUC 0.85 [...].
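For reference, ROC_AUC, the metric quoted above, equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The dependency-free sketch below computes it directly from that definition. The labels and scores are made up for illustration and are not the project's data; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Made-up labels (1 = defect, 0 = no defect) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of examples, which is fine for a sanity check but not for large datasets.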





[...] (52 [...]; <100 [...]) [...].

[...] onset_detection [...] balanced_accuracy_score 0.80 [...].
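The balanced_accuracy_score named above is the mean of per-class recall, which makes it more informative than plain accuracy when one class (for example, recordings without a defect) dominates the data. A small dependency-free sketch of the same quantity, with made-up labels that are not the project's data:

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Made-up example: 8 recordings without a defect (0), 2 with a defect (1).
# The model finds only one of the two defects.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(plain, balanced)
```

Here plain accuracy looks high because the majority class is easy, while the balanced score exposes that half of the defects were missed.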





[...] DTW [...] Audio Fingerprinting [...].
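As a reminder of what DTW (dynamic time warping) computes: it aligns two sequences that may differ in tempo and returns the cost of the best alignment, so the same phrase spoken slower can still match a reference. The sketch below is the textbook dynamic-programming formulation on one-dimensional sequences. How the technique would be applied to the audio features in this project is not stated here, so the toy sequences are purely illustrative.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute difference as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Toy contours: the second is the first stretched in time, the third differs.
reference = [0.0, 1.0, 2.0, 1.0, 0.0]
slower = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 0.0]
different = [2.0, 2.0, 0.0, 2.0, 2.0]

print(dtw_distance(reference, slower))
print(dtw_distance(reference, different))
```

For real audio, the scalar values would typically be replaced by per-frame feature vectors and the absolute difference by a vector distance.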





production

  • Python;

  • Kafka;

  • Django.





[...] (~10 [...]) [...].

[...] 80% [...].





In the future, we plan to move completely from the human-in-the-loop model, in which a person must take part in the diagnosis, to full automation of the process with a retrained model.







