The simplest Python voice assistant





You do not need deep programming knowledge to create a voice assistant; the main thing is to understand what functionality it should have. Many companies place such assistants on the first line of customer communication for convenience, workflow optimization, and better call classification. This article presents a program that can serve as the basis for your own chatbot or, more precisely, a voice assistant that recognizes speech and then executes commands. It will also help us understand how the most widely used voice assistants work.



First, let's import the libraries we need:



import speech_recognition as sr
import os
import sys
import webbrowser
import pyttsx3 as p
import datetime
import random


We should also keep a log file, which will be useful if we later decide to extend the bot with a neural network. Many companies use neural networks in their voice assistants to understand and respond to customer emotions.



In addition, analyzing the logs lets us find the weak points of the bot's algorithm and improve its interaction with customers.
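
As a rough illustration of such log analysis, the sketch below counts messages per author and how often the bot had to ask the user to repeat. It assumes rows in the SESSION_ID, DATE, AUTHOR, TEXT, AUDIO_NUM layout used for the chat log in this article; the function name `summarize_log`, the `retry_text` argument, and the sample rows are hypothetical.

```python
from collections import Counter


def summarize_log(rows, retry_text):
    """Count messages per author and how often the bot asked the user to repeat."""
    header, *messages = rows
    author_col = header.index('AUTHOR')
    text_col = header.index('TEXT')
    per_author = Counter(row[author_col] for row in messages)
    retries = sum(
        1 for row in messages
        if row[author_col] == 'Bot' and row[text_col] == retry_text
    )
    return per_author, retries


# Hypothetical sample rows; a real log would come from a saved session
sample_log = [
    ['SESSION_ID', 'DATE', 'AUTHOR', 'TEXT', 'AUDIO_NUM'],
    ['1', '2024-01-01 10:00:00', 'Bot', 'Hello! How can I help you?', ''],
    ['1', '2024-01-01 10:00:05', 'Bot', 'Please repeat.', ''],
    ['1', '2024-01-01 10:00:09', 'User', 'open the site', '1.wav'],
]
per_author, retries = summarize_log(sample_log, retry_text='Please repeat.')
print(per_author['Bot'], per_author['User'], retries)
```

A high share of repeat requests would point to recognition problems rather than to missing commands.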



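# Chat log: the first row is the header
chat_log = [['SESSION_ID', 'DATE', 'AUTHOR', 'TEXT', 'AUDIO_NUM']]
# Find the first unused session number and create a folder for it
i = 1
found = False
while not found:
    session_id = str(i)
    if session_id not in os.listdir():
        os.mkdir(session_id)
        found = True
    else:
        i = i + 1
# First message from the bot
author = 'Bot'
text = 'Hello! How can I help you?'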
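
The search for a free session folder can also be written more compactly. The following is only a sketch of the same idea, not the article's code: `create_session_dir` is a hypothetical helper, and a temporary directory stands in for the working directory so the example leaves nothing behind.

```python
import os
import tempfile
from itertools import count


def create_session_dir(base_dir):
    """Create and return the first numbered folder (1, 2, ...) that does not exist yet."""
    for n in count(1):
        path = os.path.join(base_dir, str(n))
        try:
            os.mkdir(path)  # raises FileExistsError if the folder is already there
            return path
        except FileExistsError:
            continue


# Demonstration in a temporary directory
with tempfile.TemporaryDirectory() as base:
    first = create_session_dir(base)
    second = create_session_dir(base)
    print(os.path.basename(first), os.path.basename(second))
```

Attempting `os.mkdir` directly, instead of checking `os.listdir()` first, avoids a gap between the check and the creation.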


In the log file, we write the time of the message, the author (bot or user) and the actual text itself.



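# Add a message to the log and save its audio (if any) in the session folder
def log_me(author, text, audio):
    now = datetime.datetime.now()
    audio_num = ''
    if audio is not None:
        # Pick the first unused file name: 1.wav, 2.wav, ...
        i = 1
        while str(i) + '.wav' in os.listdir(session_id):
            i = i + 1
        audio_num = str(i) + '.wav'
        # Build the path instead of calling os.chdir, so repeated calls keep working
        with open(os.path.join(session_id, audio_num), "wb") as file:
            file.write(audio.get_wav_data())
    chat_log.append([session_id, now.strftime("%Y-%m-%d %H:%M:%S"), author, text, audio_num])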
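
Because a message may itself contain commas or tabs, one reasonably safe way to store such rows is the standard `csv` module, which quotes fields when needed. The sketch below assumes rows shaped like the `chat_log` header; `write_chat_log` and the sample rows are hypothetical, and an in-memory buffer stands in for a real file.

```python
import csv
import io


def write_chat_log(rows, stream):
    """Write log rows as CSV; fields containing commas are quoted automatically."""
    writer = csv.writer(stream)
    writer.writerows(rows)


# Hypothetical rows, including a message with a comma in it
rows = [
    ['SESSION_ID', 'DATE', 'AUTHOR', 'TEXT', 'AUDIO_NUM'],
    ['1', '2024-01-01 10:00:00', 'Bot', 'Hello! How can I help you?', ''],
    ['1', '2024-01-01 10:00:07', 'User', 'repeat one, two, three', '1.wav'],
]
buffer = io.StringIO()
write_chat_log(rows, buffer)
print(buffer.getvalue())

# Reading the text back yields the same rows
restored = list(csv.reader(io.StringIO(buffer.getvalue())))
```

When writing to a real file, the `csv` documentation recommends opening it with `newline=''`.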


We display the first message authored by the bot: Hello! How can I help you?



# Print the bot's first message and log it (bot messages have no audio)
print("Bot: " + text)
log_me(author, text, None)


The following procedure, run here in Jupyter Notebook, speaks the given words through the default playback device:



# Initialize the text-to-speech engine once
engine = p.init()

# Speak the given words aloud
def talk(words):
    engine.say(words)
    engine.runAndWait()
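
The voice can be tuned before it is used. pyttsx3 engines expose `setProperty` for the `'rate'` and `'volume'` properties; the helper below is a hypothetical sketch that takes the engine as a parameter, so it does not import pyttsx3 itself, and its default values are arbitrary.

```python
def configure_voice(engine, rate=150, volume=0.9):
    """Set the speaking rate (words per minute) and volume (0.0 to 1.0) on an engine."""
    engine.setProperty('rate', rate)
    engine.setProperty('volume', volume)
    return engine


# Usage with the real library (requires pyttsx3 and an audio device):
#   import pyttsx3
#   engine = configure_voice(pyttsx3.init(), rate=140)
#   engine.say('Hello! How can I help you?')
#   engine.runAndWait()
```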


We have seen how to voice text, but how do we turn our voice into text? Google's speech recognition and a little work with the microphone will help us here.



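# Listen to the microphone and return the recognized text
def command():
    rec = sr.Recognizer()
    with sr.Microphone() as source:
        # The bot is waiting for us to speak
        print('Bot: ...')
        # Seconds of silence that mark the end of a phrase
        rec.pause_threshold = 1
        # Calibrate for background noise
        rec.adjust_for_ambient_noise(source, duration=1)
        audio = rec.listen(source)
    try:
        # Recognize speech with Google; the language must match the trigger
        # phrases (the original used "ru-RU" with Russian phrases)
        text = rec.recognize_google(audio, language="en-US").lower()
        # Print the recognized text with a capital first letter
        print('You: ' + text[0].upper() + text[1:])
        log_me('User', text, audio)
    # Speech could not be recognized
    except sr.UnknownValueError:
        text = 'I did not understand you. Please repeat.'
        print('Bot: ' + text)
        talk(text)
        log_me('Bot', text, None)
        # Listen again
        text = command()
    return text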
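
Here `command()` retries by calling itself, so a long stretch of unrecognized speech keeps nesting calls. A bounded loop is one alternative. The sketch below assumes a hypothetical `listen_once` callable that returns the recognized text, or `None` on failure; a simple iterator stands in for the microphone.

```python
def listen_with_retries(listen_once, max_attempts=3):
    """Call listen_once() until it returns text or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        text = listen_once()
        if text is not None:
            return text
        print('Bot: I did not catch that (attempt %d of %d).' % (attempt, max_attempts))
    return None


# Stand-in for a recognizer: fails twice, then returns a phrase
answers = iter([None, None, 'open the site'])
result = listen_with_retries(lambda: next(answers))
print(result)
```

If every attempt fails, the function returns `None`, and the caller decides what to do next.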


What can our assistant do besides listen to us? The only limit is our imagination! Let's look at a few interesting examples.



Let's start with something simple: on command, the bot opens a website.



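# Decide what to do based on the recognized command
# (the English trigger phrases stand in for the original Russian ones)
def makeSomething(text):
    if 'open the site' in text:
        print('Bot: Opening the NewTechAudit site.')
        talk('Opening the NewTechAudit site.')
        log_me('Bot', 'Opening the NewTechAudit site.', None)
        webbrowser.open('https://newtechaudit.ru/')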


Sometimes it is useful to hear your own words from someone else's lips. Let the bot repeat after us:



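    # Repeat the user's words (the trigger words are illustrative)
    elif text.startswith(('repeat ', 'say ', 'pronounce ')):
        # Drop the trigger word and capitalize what is left
        phrase = text.split(' ', 1)[1]
        phrase = phrase[:1].upper() + phrase[1:]
        print('Bot: ' + phrase)
        talk(phrase)
        log_me('Bot', phrase, None)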


Let it also hold a conversation; for now, we will start with just an introduction:



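    # Introduce itself (the trigger phrases are illustrative)
    elif 'your name' in text or 'what are you called' in text or 'who are you' in text:
        print('Bot: My name is Bot.')
        talk('My name is Bot')
        log_me('Bot', 'My name is Bot', None)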


We can also ask the voice assistant to name a random number within limits we choose, using the format: Name a random number from (1st number) to (2nd number).



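    # Name a random number between the two spoken limits
    elif 'random number' in text:
        # Positions of "from" and "to" in the phrase
        ot = text.find('from ')
        do = text.find(' to ', ot)
        f_num = int(text[ot + 5:do])
        l_num = int(text[do + 4:])
        r = str(random.randint(f_num, l_num))
        print('Bot: ' + r)
        talk(r)
        log_me('Bot', r, None)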
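
Slicing by character position breaks as soon as the phrase deviates from the expected format. A regular expression is one more tolerant way to pull out the two limits. This is a sketch under the assumption that the recognizer returns the numbers as digits; `parse_range` and `random_from_command` are hypothetical helper names.

```python
import random
import re

# Matches "from <number> to <number>" anywhere in the phrase
RANGE_PATTERN = re.compile(r'from\s+(-?\d+)\s+to\s+(-?\d+)')


def parse_range(text):
    """Return (low, high) taken from the phrase, or None if no range is present."""
    match = RANGE_PATTERN.search(text)
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    # Swap the limits if they were spoken in reverse order
    return (low, high) if low <= high else (high, low)


def random_from_command(text):
    """Return a random integer within the spoken limits, or None if they are missing."""
    bounds = parse_range(text)
    if bounds is None:
        return None
    return random.randint(*bounds)


print(parse_range('name a random number from 3 to 17'))  # (3, 17)
print(parse_range('name a random number'))               # None
```

Returning `None` instead of raising lets the bot ask the user to rephrase when the limits are missing.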


To end the program, just say goodbye to the bot:



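    # Say goodbye, save the log, and exit (the trigger phrases are illustrative)
    elif 'bye' in text or 'see you' in text:
        print('Bot: Goodbye!')
        talk('Goodbye')
        log_me('Bot', 'Goodbye', None)
        # Write the text log into the session folder, one tab-separated row per line
        log_path = os.path.join(session_id, session_id + ".txt")
        with open(log_path, "w", encoding="utf-8") as log_file:
            for row in chat_log:
                log_file.write('\t'.join(row) + '\n')
        sys.exit()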
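
As the list of commands grows, a long `if`/`elif` chain becomes hard to maintain. One common alternative is a table that pairs trigger phrases with handler functions. The sketch below is not the article's implementation: the handlers only return the reply text, and all names and English trigger phrases are hypothetical.

```python
def open_site(text):
    return 'Opening the NewTechAudit site.'


def introduce(text):
    return 'My name is Bot.'


def say_goodbye(text):
    return 'Goodbye!'


# Each entry pairs trigger phrases with the handler to run
COMMANDS = [
    (('open the site',), open_site),
    (('what is your name', 'who are you'), introduce),
    (('goodbye', 'see you'), say_goodbye),
]


def dispatch(text):
    """Run the first handler whose trigger phrase occurs in the text."""
    for triggers, handler in COMMANDS:
        if any(trigger in text for trigger in triggers):
            return handler(text)
    return None  # no command matched


print(dispatch('please open the site'))
print(dispatch('what is the weather'))
```

Adding a new command then means adding one function and one table entry, without touching the matching logic.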


And to make it all work continuously, we create an endless loop.



# Keep listening for commands and executing them
while True:
    makeSomething(command())


Let's conduct a test dialogue:







In the created session folder, all audio recording files of our voice and a text log file are stored:







The text log file is written:







In this article, we examined a very simple voice bot and the core functionality it needs for later work with a neural network. The log file lets us analyze the quality of the assistance provided and guide further improvements.



This bot can be the basis for your very own Jarvis!


