Today, smartphones, smartwatches, and fitness trackers are everywhere. They monitor us and our surroundings, send notifications, and can even detect serious problems such as atrial fibrillation. And we are only at the beginning of the micromonitoring movement.
In this article, we'll explore the idea of a dog mood detector: a device that listens to ambient sound and, if a dog is present, tries to classify the sound it is making as friendly barking, a frightened whine, or an aggressive growl. Depending on the user's preference, the device vibrates when it decides the dog needs checking on. This could help owners keep an eye on their dogs when they are out of earshot. Of course, this is just a prototype, and the idea has not yet been tested in real-world conditions.
We'll prototype the device on an Arm-based Raspberry Pi computer, a great platform for running machine learning on end-user devices. Arm processors are not only used in the Raspberry Pi; they also power many cell phones, mobile game consoles, and a host of other devices. These energy-efficient processors offer ample processing power and can be bought at an affordable price just about anywhere.
Unfortunately, small devices are still often constrained by limited processing power, a lack of Internet connectivity, and small amounts of data storage. Although such devices can record many observations in a short time, without an Internet connection the storage limits often make it impossible to keep every observation for later synchronization. In addition, transmitting large amounts of data wirelessly drains an already small battery.
To make the most of the recorded signals, the signal processing therefore has to move onto the end-user devices themselves.
Over the past decade, machine learning has made significant advances in accuracy on many signal processing tasks, such as object detection in images, gesture recognition in video, and speech recognition. And we are only scratching the surface of what is possible: running ML on small devices opens up countless other ways to improve people's lives.
Getting started
For training, we'll use Google AudioSet, the largest collection of labeled 10-second audio clips, drawn from YouTube videos. The data is provided in a preprocessed format compatible with the YouTube-8M starter kit, and we will use it to train a model capable of classifying audio clips.
Training this model may take some time, so we will move the processing to the Google Cloud AI Platform and download the model once training completes. With all the components ready, we will transfer the model to the Raspberry Pi and write a Python script that grabs input from a connected microphone and, every second, tries to predict which dog sounds it has detected.
Creating the model
First, let's create a folder somewhere for all of the work we're going to do.
To create the model, we first need to download the dataset. It is available from the link under the heading "Features dataset"; the easiest option is to download the single gzipped archive to your local computer.
Then unpack the archive and extract the files. It contains three folders: one with the balanced training set, one with the evaluation set, and one with the unbalanced training set. Each folder contains over 4,000 files.
The TFRecord files contain the preprocessed features and labels; each file name starts with the first two characters of the YouTube video IDs it covers. Because video IDs are case sensitive, be careful when extracting the files on a case-insensitive file system, such as the default on Windows.
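To see what the reader code expects, it helps to peek inside one of these files. Below is a minimal sketch using the TensorFlow 1.14 API that we install later for training; the file name is a placeholder, since the actual names vary with the video IDs:

```python
# Minimal sketch: inspect one AudioSet TFRecord file.
import tensorflow as tf

path = "audioset_v1_embeddings/bal_train/00.tfrecord"  # any extracted file
for record in tf.python_io.tf_record_iterator(path):
    example = tf.train.SequenceExample.FromString(record)
    context = example.context.feature
    print("video_id:", context["video_id"].bytes_list.value[0])
    print("labels:", list(context["labels"].int64_list.value))
    frames = example.feature_lists.feature_list["audio_embedding"].feature
    print("frames:", len(frames), "(one 128-D quantized embedding per second)")
    break  # one record is enough for a sanity check
```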
Helpful tip: the 7-Zip program can extract these feature files, and it supports command-line options that automatically rename colliding files, ensuring files are renamed rather than overwritten.
Once the dataset is correctly extracted, clone the YouTube-8M GitHub repository, which contains the code to train the model. It is recommended to clone it into the folder created for the extracted dataset.
Then update the readers.py file in the youtube-8m folder to support the older AudioSet TFRecord files. This involves two changes:
- Change all occurrences of "id" to "video_id".
- Change the default value of the num_classes parameter to 527. This number corresponds to the number of different categories in this audio dataset.
The string "id" needs to be changed in five places and num_classes in two; a throwaway script like the sketch below can apply both edits.
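If you prefer not to patch the file by hand, something along these lines works. This is only a sketch: it assumes the quoted string "id" appears in readers.py solely where the feature name is meant, so review the result before training:

```python
# Sketch: patch youtube-8m/readers.py for the AudioSet TFRecord format.
import re
from pathlib import Path

path = Path("youtube-8m/readers.py")
source = path.read_text()
# Rename the ID context feature to match the AudioSet files.
source = source.replace('"id"', '"video_id"')
# Point the default label count at AudioSet's 527 categories.
source = re.sub(r"num_classes=\d+", "num_classes=527", source)
path.write_text(source)
```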
To run the training code, create a new Python 3.6+ virtual environment and install tensorflow==1.14. This is also a convenient moment to install the requirements for the inference script we will create later. Although version numbers vary from package to package, the only hard requirement is TensorFlow 1.14; for the other packages you can simply install the latest versions.
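As a starting point, a requirements file along these lines should cover it; apart from the TensorFlow pin, the package list is an assumption based on what the microphone script and the VGGish tools introduced later typically need:

```
tensorflow==1.14    # the only hard version requirement
numpy
resampy
soundfile
pyaudio
```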
At this point, you are ready to train the model. First, run the training script locally to test it; on the balanced training set this won't take long. Open a command prompt, navigate to the folder created in the first step of this section, and enter the following command (note that it is logically a single command split across lines):
python youtube-8m/train.py \
  --train_data_pattern='./audioset_v1_embeddings/bal_train/*.tfrecord' \
  --num_epochs=100 \
  --feature_names="audio_embedding" \
  --feature_sizes="128" \
  --frame_features \
  --batch_size=512 \
  --train_dir=./trained_models/yt8m \
  --model=FrameLevelLogisticModel \
  --start_new_model
Note that while the backslash line continuations work fine on Linux systems, they must be replaced with the ^ character on Windows.
Training for 100 epochs amounts to roughly 8,500 steps. The FrameLevelLogisticModel tops out at an accuracy of approximately 58–59%. On our test system, the entire process took just under 20 minutes.
Other models ship with the starter kit, including DbofModel and LstmModel. Each achieves near-perfect accuracy on the training data, but both overfit the balanced training set heavily when tested against the evaluation set.
Training the model in the cloud
An alternative is to train on the full range of sounds using the unbalanced dataset. Processing takes much longer in this case, but GPUs on the Google Cloud AI Platform help significantly. The same simple logistic model achieves an accuracy of about 88% on the unbalanced training set.
To run this process in the cloud, sign up for and log in to your Google Cloud AI Platform account, enable billing, and install the command-line tools, as detailed here.
With everything set up, go to the cloud console and create a new project and a new storage bucket. The bucket name must be globally unique; it is easiest to include your user account name. Upload the entire audioset_v1_embeddings and youtube-8m folders into this bucket.
If everything went correctly, you should be able to open the Google Cloud SDK shell and run the commands below to get started. Be sure to replace your-project-name and your-storage-bucket-name with the appropriate values for your account. The commands are written for Unix-based systems; make the appropriate adjustments for Windows.
BUCKET_NAME=gs://${USER}_yt8m_train_bucket
gsutil mb -p your-project-name $BUCKET_NAME
JOB_NAME=yt8m_train_$(date +%Y%m%d_%H%M%S)
gcloud --verbosity=debug ml-engine jobs submit training $JOB_NAME \
  --python-version 3.5 \
  --package-path=youtube-8m \
  --module-name=youtube-8m.train \
  --staging-bucket=$BUCKET_NAME \
  --region=us-east1 \
  --config=youtube-8m/cloudml-gpu.yaml \
  -- \
  --train_data_pattern='gs://your-storage-bucket-name/audioset_v1_embeddings/unbal_train/*.tfrecord' \
  --model=FrameLevelLogisticModel \
  --train_dir=$BUCKET_NAME/yt8m_train_frame_level_logistic_model
Again, note that this gcloud invocation is one long command; everything after the bare -- is passed through to the training script.
Training will take more than half a day. When it is done, download the model output from your cloud storage bucket:
$BUCKET_NAME/yt8m_train_frame_level_logistic_model
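For example, with the shell variable defined above, gsutil can copy the folder recursively into a local trained_model directory (the destination name is our choice):

gsutil cp -r $BUCKET_NAME/yt8m_train_frame_level_logistic_model ./trained_model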
Running on the Raspberry Pi
We demonstrate this application on an Arm-based Raspberry Pi 4 running Raspbian with Python 3 installed. Install PyAudio on the device; if you run into problems, this answer should help.
Connect a USB microphone (and, optionally, a headset for audio output during testing). The microphone is easiest to use when set as the default device: on the Raspbian desktop, click the speaker icon next to the clock in the upper-right corner and select the microphone to use.
The final important step is to obtain the tools that compress raw audio into the same 128-D embeddings used by AudioSet. The tool for this is included in the TensorFlow models GitHub repository mentioned earlier. Follow the same installation procedure on the Pi, making sure to install against your Python 3 instance, and clone the repository into the same folder where you cloned the YouTube-8M dataset and repository.
Run the vggish_smoke_test.py script to make sure everything is installed correctly.
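In condensed form, the pipeline that the smoke test exercises looks roughly like the sketch below. It assumes the VGGish checkpoint (vggish_model.ckpt) and PCA parameters (vggish_pca_params.npz) have been downloaded as described in the repository:

```python
# Sketch: raw waveform -> log-mel examples -> 128-D VGGish embeddings.
import numpy as np
import tensorflow as tf
import vggish_input
import vggish_params
import vggish_postprocess
import vggish_slim

samples = np.random.uniform(-1.0, 1.0, 16000)  # one second of fake audio
examples = vggish_input.waveform_to_examples(samples, 16000)

with tf.Graph().as_default(), tf.Session() as sess:
    vggish_slim.define_vggish_slim()
    vggish_slim.load_vggish_slim_checkpoint(sess, "vggish_model.ckpt")
    features = sess.graph.get_tensor_by_name(vggish_params.INPUT_TENSOR_NAME)
    embedding = sess.graph.get_tensor_by_name(vggish_params.OUTPUT_TENSOR_NAME)
    [raw_embedding] = sess.run([embedding], feed_dict={features: examples})

# Quantize/whiten the embeddings to match the AudioSet release format.
pproc = vggish_postprocess.Postprocessor("vggish_pca_params.npz")
print(pproc.postprocess(raw_embedding).shape)  # (num_frames, 128)
```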
Now copy the model downloaded from the Google Cloud platform into the folder containing the microphone listening script.
Execute this script. It will start listening on the default device and will write the predictions to the console.
If the desired device cannot be made the default, run "python model-run.py list" to display all devices by index, find your device's index, and then run the script again with that index. For instance:
python model-run.py 3
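Under the hood, listing devices by index takes only a few lines of PyAudio; this sketch shows one plausible way a script like model-run.py could do it:

```python
# Sketch: enumerate audio input devices and their PyAudio indices.
import pyaudio

pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if info["maxInputChannels"] > 0:  # keep only devices that can record
        print(i, info["name"])
pa.terminate()
```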
Copy the entire contents of this folder to your Raspberry Pi and run the script again. Once a second, it should print a prediction of what kind of noise it thinks the dog is making! The output stage can be replaced with whatever mechanism best suits the device and its user, such as the vibration mentioned earlier.
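The scoring step itself can mirror the starter kit's inference.py, which restores the checkpoint's meta graph and reads back the tensor collections registered by train.py. A rough sketch, assuming a local checkpoint path and features produced by the VGGish pipeline shown earlier:

```python
# Rough sketch: score one second of VGGish features with the trained model.
import numpy as np
import tensorflow as tf

sess = tf.Session()
# Checkpoint path assumed; point it at the model downloaded after training.
saver = tf.train.import_meta_graph("trained_model/inference_model.meta")
saver.restore(sess, "trained_model/inference_model")

input_tensor = tf.get_collection("input_batch_raw")[0]
num_frames_tensor = tf.get_collection("num_frames")[0]
predictions_tensor = tf.get_collection("predictions")[0]

features = np.zeros((1, 1, 128), dtype=np.float32)  # stand-in for real features
scores = sess.run(predictions_tensor,
                  feed_dict={input_tensor: features,
                             num_frames_tensor: np.array([1])})
print(scores.shape)  # (1, 527): one probability per AudioSet label
```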
Conclusion
We have looked at one possible application of sound-based machine learning on Arm-based mobile devices. The concept needs more thorough testing before going to market, but the ability to run an arbitrary audio detection model on a mobile device already exists.
The AudioSet data includes 527 labels organized in a robust ontology of everyday sounds. There are also opportunities to improve the audio processing before it reaches the predictor, such as applying a cocktail-party (source separation) algorithm and passing each separated source through the VGGish filter.
Running a dog mood detector on a Raspberry Pi with an Arm processor is exciting in itself. To make it even more interesting, you can convert and quantize the model using the tools in the TensorFlow package and then run it on a low-cost, low-power Arm microcontroller with TensorFlow Lite for Microcontrollers.
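As a sketch of that direction, the TF 1.x converter can turn a SavedModel export into a .tflite flatbuffer. The export path here is an assumption, and not every op in every starter-kit model is guaranteed to convert cleanly:

```python
# Sketch: convert a SavedModel export to TensorFlow Lite (TF 1.x API).
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(
    "trained_model/export/step_8500")           # path assumed
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization
tflite_model = converter.convert()
with open("dog_mood.tflite", "wb") as f:
    f.write(tflite_model)
```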
Sound interesting? Experiment and see what problems this approach can solve. You never know how much you might influence someone's life.