Building a voice assistant in Python, part 1

Good afternoon. You have probably seen the Iron Man films and wanted a voice assistant like Jarvis. In this post, I'll show you how to build one from scratch. My program is written in Python 3 on Windows. So let's go!



Implementation



Our assistant will work as follows:



  1. Constantly "listen" to the microphone
  2. Recognize the spoken words using Google's speech recognition
  3. Execute the command, or respond
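
The three steps above can be sketched as a loop. This is only an outline under assumptions: `listen`, `recognize` and `respond` are hypothetical callables standing in for the microphone, the recognition service and the answering code built later in this post, and `max_turns` is added here only so the sketch can stop.

```python
from typing import Callable, Optional


def run_assistant(
    listen: Callable[[], bytes],
    recognize: Callable[[bytes], Optional[str]],
    respond: Callable[[str], None],
    max_turns: Optional[int] = None,
) -> int:
    """Run the listen -> recognize -> respond cycle.

    All three callables are hypothetical placeholders. Returns the number
    of turns in which speech was recognized and handled.
    """
    handled = 0
    turn = 0
    while max_turns is None or turn < max_turns:
        turn += 1
        audio = listen()          # 1. listen to the microphone
        text = recognize(audio)   # 2. turn the audio into text
        if text is None:          # nothing intelligible was heard
            continue
        respond(text.lower())     # 3. execute the command or answer
        handled += 1
    return handled
```

With `max_turns=None` the loop runs forever, which matches the "constantly listen" idea; passing a number is handy for trying the loop out with stub functions.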


1) Speech synthesis



First, install Russian voices on Windows. To do this, follow the link and download a voice from the SAPI 5 -> Russian section. There are four voices there; choose whichever you like. Install it and move on.



We need to install the pyttsx3 library for speech synthesis:



pip install pyttsx3


Then run this test program to check that speech synthesis works:



import pyttsx3

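text = 'some text'  # placeholder: any phrase you want spoken
tts = pyttsx3.init()
rate = tts.getProperty('rate') # speech rate
tts.setProperty('rate', rate-40)

volume = tts.getProperty('volume') # voice volume, from 0.0 to 1.0
tts.setProperty('volume', min(1.0, volume+0.9)) # clamp to the valid maximum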

voices = tts.getProperty('voices')

# Set the default voice
tts.setProperty('voice', 'ru')

# Try to set the preferred voice
for voice in voices:
    if voice.name == 'Anna':
        tts.setProperty('voice', voice.id)

tts.say(text)
tts.runAndWait()
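
The name an installed voice reports depends on the voice package, so an exact comparison such as `voice.name == 'Anna'` may not match. A minimal sketch of a more forgiving lookup follows; `pick_voice` is a hypothetical helper, and the sample voices are made-up stand-ins for the objects returned by `tts.getProperty('voices')`.

```python
from types import SimpleNamespace


def pick_voice(voices, preferred_name, fallback_id=None):
    """Return the id of the first voice whose name contains preferred_name.

    `voices` is any iterable of objects with `name` and `id` attributes.
    Matching is case-insensitive; fallback_id is returned when nothing matches.
    """
    wanted = preferred_name.lower()
    for voice in voices:
        if wanted in (voice.name or '').lower():
            return voice.id
    return fallback_id


# Stand-in voices for illustration; the names and ids are made up.
sample = [
    SimpleNamespace(name='Voice A (English)', id='voice-a'),
    SimpleNamespace(name='Anna (Russian)', id='voice-anna'),
]
print(pick_voice(sample, 'anna'))
print(pick_voice(sample, 'Elena', fallback_id='voice-a'))
```

In the test program this could be used as `voice_id = pick_voice(tts.getProperty('voices'), 'Anna')`, followed by `tts.setProperty('voice', voice_id)` when the result is not `None`.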


2) Speech Recognition



There are many speech recognition tools, but most of the ones I looked at are paid. So I searched for a free solution for my project and found one: the SpeechRecognition library (imported as speech_recognition).



pip install SpeechRecognition


We also need the PyAudio library to work with the microphone.



pip install PyAudio


Some people have trouble installing PyAudio this way. If that happens, follow this link and download the version of PyAudio you need, then enter in the console:



pip install <name of the downloaded file>


Then run the test program. Before that, change device_index = 1 in it to the index of your microphone. You can find the index with this program:



import speech_recognition as sr
for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print("Microphone with name \"{1}\" found for `Microphone(device_index={0})`".format(index, name))
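
If the list is long, the index can also be looked up by name instead of being read off by eye. The sketch below assumes a hypothetical helper, `find_microphone_index`, that works on a plain list of names such as the one returned by `sr.Microphone.list_microphone_names()`; the device names are invented for the example.

```python
def find_microphone_index(names, keyword):
    """Return the index of the first device whose name contains keyword.

    `names` is a list of device names; entries may be None.
    Returns None if nothing matches.
    """
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if name and keyword in name.lower():
            return index
    return None


# Made-up device names for illustration only.
names = ['Speakers (Example Audio)', 'Microphone (Example USB Mic)']
print(find_microphone_index(names, 'microphone'))
```

The returned value can then be passed as `device_index` when creating `sr.Microphone`.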


Speech recognition test:



import speech_recognition as sr

def record_volume():
    r = sr.Recognizer()
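    with sr.Microphone(device_index = 1) as source: # use your microphone index
        print('Calibrating.')
        r.adjust_for_ambient_noise(source, duration=0.5) # calibrate for ambient noise
        print('Listening...')
        audio = r.listen(source)
    print('Heard.')
    try:
        query = r.recognize_google(audio, language = 'ru-RU')
        text = query.lower()
        print(f'You said: {text}')
    except (sr.UnknownValueError, sr.RequestError):
        # speech was unintelligible, or the recognition service could not be reached
        print('Error')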

while True:
    record_volume()


If everything is fine, move on.



If you want the assistant to simply talk to you (no AI), you can do this with Google's free Dialogflow tool. After you log in, you will see a screen where you can create your first bot. Click Create agent, come up with a name for the bot (Agent name), select the language (Default Language) and click Create. The bot has been created!



To add new answers to different questions, create a new intent. In the Intents section, click Create intent. Fill in the "Title" and Training phrases fields, then add the answers. Click Save. That's all.
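
To make the idea of an intent more concrete, here is a rough local sketch: an intent pairs training phrases with answers. This is not how Dialogflow works internally; the exact-match lookup is a deliberate simplification, and the `Intent` class, `match_intent` and the sample phrases are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Intent:
    title: str
    training_phrases: List[str]
    responses: List[str] = field(default_factory=list)


def match_intent(intents, text):
    """Return the first intent with a training phrase equal to text.

    The comparison ignores case and surrounding whitespace.
    Returns None when no intent matches.
    """
    normalized = text.strip().lower()
    for intent in intents:
        if any(normalized == p.strip().lower() for p in intent.training_phrases):
            return intent
    return None


greeting = Intent(
    title='Greeting',
    training_phrases=['hello', 'good afternoon'],
    responses=['Hello! How can I help?'],
)
found = match_intent([greeting], 'Good afternoon')
print(found.responses[0] if found else 'No matching intent')
```

The more training phrases an intent has, the more ways of asking the same question it can cover, which is why that field matters when filling in the form.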



To control the bot from Python, write the following code. In my program, the bot speaks all of its answers aloud.



import apiai, json
import pyttsx3
import speech_recognition as sr

tts = pyttsx3.init()
rate = tts.getProperty('rate')
tts.setProperty('rate', rate-40)
volume = tts.getProperty('volume')
tts.setProperty('volume', min(1.0, volume+0.9)) # volume must stay within 0.0 to 1.0
voices = tts.getProperty('voices')
tts.setProperty('voice', 'ru')
for voice in voices:
    if voice.name == 'Anna':
        tts.setProperty('voice', voice.id)

def record_volume():
    r = sr.Recognizer()
    with sr.Microphone(device_index = 1) as source: # use your microphone index
        print('Calibrating.')
        r.adjust_for_ambient_noise(source, duration=1) # calibrate for ambient noise
        print('Listening...')
        audio = r.listen(source)
    print('Heard.')
    try:
        query = r.recognize_google(audio, language = 'ru-RU')
        text = query.lower()
        print(f'You said: {text}')
        textMessage( text )
    except (sr.UnknownValueError, sr.RequestError):
        print('Recognition error.')

def talk( text ):
    tts.say( text )
    tts.runAndWait()

def textMessage( text ):
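    request = apiai.ApiAI('your token').text_request() # Dialogflow API token
    request.lang = 'ru' # language of the request
    request.session_id = 'your id' # dialog session ID (needed later to train the bot)
    request.query = text # send the user's message to the bot
    responseJson = json.loads(request.getresponse().read().decode('utf-8'))
    response = responseJson['result']['fulfillment']['speech'] # parse the JSON and extract the answer
    # If the bot returned an answer, speak it; if not, the bot did not understand
    if response:
        request.audio_output = response
        talk(response)
    else:
        talk('Sorry. I did not quite understand you.')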

while True:
    record_volume()
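
The lookup `responseJson['result']['fulfillment']['speech']` raises an error if any of those keys is missing, for example when the service returns an error body. A minimal sketch of a more defensive parser is shown here; `extract_speech` is a hypothetical helper, the key layout is taken from the code above, and both sample bodies are made up.

```python
import json


def extract_speech(raw_body):
    """Return the bot's answer from a response body, or '' if it is missing.

    Expects the JSON layout result -> fulfillment -> speech.
    """
    try:
        data = json.loads(raw_body)
    except (TypeError, ValueError):
        # not a string/bytes value, or not valid JSON
        return ''
    node = data
    for key in ('result', 'fulfillment', 'speech'):
        if not isinstance(node, dict):
            return ''
        node = node.get(key)
    return node if isinstance(node, str) else ''


# Made-up response bodies for illustration only.
sample = '{"result": {"fulfillment": {"speech": "Hi there"}}}'
print(extract_speech(sample))
print(repr(extract_speech('{"status": {"code": 401}}')))
```

Because an empty string is falsy, the result fits the existing `if response:` check: a missing answer falls through to the "did not understand" branch.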


That's all for today. In the next part, I will show how to make the bot smart, so that it can not only answer but also do things.


