In this tutorial (if you can call it that), I will show you a quick and simple way to play an audio file on the ESP32 microcontroller.
A bit of theory
As Wikipedia tells us, the ESP32 is a series of low-cost, low-power microcontrollers. They are systems on a chip (SoC) with integrated Wi-Fi and Bluetooth controllers and antennas, based on the Tensilica Xtensa LX6 core in single-core and dual-core variants. The radio-frequency path is integrated into the chip. The microcontroller was created and is developed by the Chinese company Espressif Systems, and is manufactured by TSMC on a 40 nm process. You can read more about the capabilities of the chip on the Wikipedia page and in the official documentation.
Once, while getting to know this controller, I wanted to play sound on it. At first I thought I would have to use PWM. However, after reading the documentation more closely, I discovered that the chip has two 8-bit DAC channels. Of course, this changed things radically.
The Technical Reference says that the DAC in the ESP32 is built on a chain of resistors (apparently meaning an R-2R ladder) with a buffer. The output voltage can be varied from 0 V to the supply voltage (3.3 V) with a resolution of 8 bits (i.e. 256 values). The two channels convert independently. There is also a built-in cosine waveform (CW) generator and DMA support.
I decided not to get into DMA for now and to limit myself to a timer-based player. As you know, to play the simplest PCM WAV file, it is enough to read the raw data from it at the sampling rate specified in the file and push it through the DAC channels, first reducing the bit depth of the data to that of the DAC if necessary. I was lucky: I found a set of sounds in 8-bit, 11025 Hz, mono PCM WAV format, ripped from the resources of an old game. This means that we will use only one DAC channel.
We will also need a timer capable of generating interrupts at 11025 Hz. According to the same Technical Reference, the ESP32 has two timer modules on board with two timers each, for a total of four timers. They are 64-bit, each with a 16-bit prescaler and the ability to generate a level or edge interrupt.
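The bit-depth reduction mentioned above was not needed for these 8-bit files, but for 16-bit PCM it could look roughly like this. This is a minimal sketch: the function name pcm16_to_dac8 is hypothetical (it is not part of esp-idf), and it relies on the usual WAV convention that 16-bit samples are signed while 8-bit samples are unsigned.

```c
#include <stdint.h>

/* Hypothetical helper (not part of esp-idf): convert one signed 16-bit PCM
 * sample to the unsigned 8-bit range expected by the DAC.
 * The zero level is moved to mid-scale first, then the low 8 bits are dropped. */
static inline uint8_t pcm16_to_dac8(int16_t sample)
{
    uint16_t shifted = (uint16_t)(sample + 32768); /* -32768..32767 -> 0..65535 */
    return (uint8_t)(shifted >> 8);                /* keep the top 8 bits */
}
```

With this mapping, digital silence (0) becomes 128, which is the mid-scale value of the 8-bit DAC.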
From theory to practice
Armed with the wave_gen example from esp-idf, I set off to write the code. I didn't bother with creating a file system: the goal was to get sound, and not make a full-fledged player out of ESP32.
To begin with, I converted one of the WAV files into a C array. The xxd utility that ships with Debian helped me a lot here. With a simple command
$ xxd -i file.wav > file.c
we get a C file containing the data as an array of hexadecimal bytes, plus a separate variable that holds the file size in bytes.
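For reference, the generated file has roughly the following shape. The listing is shortened to the first four bytes (the "RIFF" signature) purely for illustration; a real file.c holds the whole file and the matching length.

```c
/* Shape of the file generated by `xxd -i file.wav`, shortened here to the
 * first four bytes for illustration. The identifiers are derived from the
 * input file name, with the dot replaced by an underscore. */
unsigned char file_wav[] = {
  0x52, 0x49, 0x46, 0x46
};
unsigned int file_wav_len = 4;
```

The listings further on use the name sound_wav and the type const uint8_t for this array instead.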
Next, I commented out the first 44 bytes of the array, which are the header of the WAV file. Along the way, I broke it down field by field and found out everything I needed to know about the file:
const uint8_t sound_wav[] = {
// 0x52, 0x49, 0x46, 0x46, // chunk "RIFF"
// 0xaa, 0xb4, 0x01, 0x00, // chunk length
// 0x57, 0x41, 0x56, 0x45, // "WAVE"
// 0x66, 0x6d, 0x74, 0x20, // subchunk1 "fmt "
// 0x10, 0x00, 0x00, 0x00, // subchunk1 length
// 0x01, 0x00, // audio format PCM
// 0x01, 0x00, // 1 channel, mono
// 0x11, 0x2b, 0x00, 0x00, // sample rate
// 0x11, 0x2b, 0x00, 0x00, // byte rate
// 0x01, 0x00, // block align: bytes per sample across all channels
// 0x08, 0x00, // bits per sample per channel
// 0x64, 0x61, 0x74, 0x61, // subchunk2 "data"
// 0x33, 0xb4, 0x01, 0x00, // subchunk2 length, bytes
From this you can see that our file has one channel, a sampling rate of 11025 Hz and a resolution of 8 bits per sample. Note that if I wanted to parse the header programmatically, I would need to take the byte order into account: in WAV it is little-endian, that is, the least significant byte comes first.
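Parsing the header programmatically could be sketched as follows. All names here (read_le16, read_le32, wav_header_t, parse_wav_header, example_header) are hypothetical, and the sketch assumes the canonical 44-byte layout from the listing above, where a 16-byte "fmt " subchunk is immediately followed by "data". Real files may carry extra chunks, so a robust parser would have to walk the chunk list.

```c
#include <stdint.h>

/* Hypothetical helpers: assemble little-endian fields from raw bytes. */
static inline uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static inline uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* The fields the player cares about. */
typedef struct {
    uint16_t channels;
    uint32_t sampleRate;
    uint16_t bitsPerSample;
    uint32_t dataLength;
} wav_header_t;

/* Assumes the canonical 44-byte PCM header; offsets are in bytes. */
static inline wav_header_t parse_wav_header(const uint8_t *hdr)
{
    wav_header_t h;
    h.channels      = read_le16(hdr + 22);
    h.sampleRate    = read_le32(hdr + 24);
    h.bitsPerSample = read_le16(hdr + 34);
    h.dataLength    = read_le32(hdr + 40);
    return h;
}

/* The header bytes from the listing above, for trying the parser out. */
static const uint8_t example_header[44] = {
    0x52, 0x49, 0x46, 0x46, 0xaa, 0xb4, 0x01, 0x00,
    0x57, 0x41, 0x56, 0x45, 0x66, 0x6d, 0x74, 0x20,
    0x10, 0x00, 0x00, 0x00, 0x01, 0x00, 0x01, 0x00,
    0x11, 0x2b, 0x00, 0x00, 0x11, 0x2b, 0x00, 0x00,
    0x01, 0x00, 0x08, 0x00, 0x64, 0x61, 0x74, 0x61,
    0x33, 0xb4, 0x01, 0x00
};
```

Calling parse_wav_header(example_header) yields 1 channel, 11025 Hz, 8 bits per sample and a data length of 111667 bytes, the same values that are read off by hand here.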
I ended up creating a structure type for storing sound information:
typedef struct _audio_info
{
    uint32_t sampleRate;
    uint32_t dataLength;
    const uint8_t *data;
} audio_info_t;
And created an instance of the structure itself, filling it in as follows:
const audio_info_t sound_wav_info =
{
    11025, // sampleRate
    111667, // dataLength
    sound_wav // data
};
In this structure, the sampleRate field is the value of the header field of the same name, the dataLength field is the value of the subchunk2 length field, and the data field is a pointer to an array with data.
Next, I included the header files:
#include "driver/timer.h"
#include "driver/dac.h"
and created stubs for the timer initialization function and for the timer's Alarm interrupt handler, as in the wave_gen example:
static void IRAM_ATTR timer0_ISR(void *ptr)
{
}
static void timerInit()
{
}
Then I started filling in the initialization function.
The timers in ESP32 end up clocked from APB_CLK_FREQ equal to 80 MHz:
driver/timer.h:
#define TIMER_BASE_CLK (APB_CLK_FREQ) /*!< Frequency of the clock on the input of the timer groups */
soc/soc.h:
#define APB_CLK_FREQ ( 80*1000000 ) //unit: Hz
To get the counter value at which the Alarm interrupt should be generated, divide the timer clock frequency by the prescaler value, and then by the frequency at which the interrupt should fire (for us it is 11025 Hz). We will pass the interrupt handler a pointer to the structure with the data that we want to play.
Thus, the timer initialization function looks like this:
static void timerInit()
{
    timer_config_t config = {
        .divider = 8,                  // prescaler: 80 MHz / 8 = 10 MHz
        .counter_dir = TIMER_COUNT_UP, // count up
        .counter_en = TIMER_PAUSE,     // do not start the counter yet
        .alarm_en = TIMER_ALARM_EN,    // enable the Alarm
        .intr_type = TIMER_INTR_LEVEL, // level interrupt
        .auto_reload = 1,              // reload the counter on Alarm
    };
    // Apply the configuration
    ESP_ERROR_CHECK(timer_init(TIMER_GROUP_0, TIMER_0, &config));
    // Set the initial counter value
    ESP_ERROR_CHECK(timer_set_counter_value(TIMER_GROUP_0, TIMER_0, 0x00000000ULL));
    // Set the counter value at which the Alarm fires
    ESP_ERROR_CHECK(timer_set_alarm_value(TIMER_GROUP_0, TIMER_0, TIMER_BASE_CLK / config.divider / sound_wav_info.sampleRate));
    // Enable the timer interrupt
    ESP_ERROR_CHECK(timer_enable_intr(TIMER_GROUP_0, TIMER_0));
    // Register the handler and pass it a pointer to the sound structure
    timer_isr_register(TIMER_GROUP_0, TIMER_0, timer0_ISR, (void *)&sound_wav_info, ESP_INTR_FLAG_IRAM, NULL);
    // Start the timer
    timer_start(TIMER_GROUP_0, TIMER_0);
}
The timer clock frequency is not evenly divisible by 11025, whatever prescaler we set. I therefore chose a divider that brings the resulting frequency as close as possible to the required one.
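The arithmetic can be checked on a PC with a few lines of C. This is a minimal sketch: TIMER_CLK_HZ, alarm_ticks and actual_rate_hz are hypothetical names, and the 80 MHz value is the APB clock quoted above. The division truncates, just like the expression passed to timer_set_alarm_value.

```c
#include <stdint.h>

#define TIMER_CLK_HZ 80000000u /* APB clock feeding the timers, as quoted above */

/* Hypothetical helper: Alarm count for a given prescaler and target rate.
 * Integer division truncates, as in the initialization code. */
static inline uint32_t alarm_ticks(uint32_t divider, uint32_t rate_hz)
{
    return TIMER_CLK_HZ / divider / rate_hz;
}

/* Interrupt rate actually produced by that Alarm count, in Hz. */
static inline double actual_rate_hz(uint32_t divider, uint32_t ticks)
{
    return (double)TIMER_CLK_HZ / divider / ticks;
}
```

For a divider of 8 and a target of 11025 Hz this gives an Alarm count of 907 and an actual rate of about 11025.36 Hz, so playback runs roughly 0.003% fast.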
Now let's move on to writing the interrupt handler. Everything is simple here: we take the next byte from the array, feed it to the DAC, and move further along the array. First of all, however, we need to clear the timer interrupt flag and re-enable the Alarm:
static uint32_t wav_pos = 0; // index of the next sample to play
static void IRAM_ATTR timer0_ISR(void *ptr)
{
    // Clear the timer interrupt flag
    timer_group_clr_intr_status_in_isr(TIMER_GROUP_0, TIMER_0);
    // Re-enable the Alarm
    timer_group_enable_alarm_in_isr(TIMER_GROUP_0, TIMER_0);
    const audio_info_t *audio = (const audio_info_t *)ptr;
    // At the end of the data, loop back to the start
    if (wav_pos >= audio->dataLength) wav_pos = 0;
    // Output the next sample
    dac_output_voltage(DAC_CHANNEL_1, *(audio->data + wav_pos));
    wav_pos++;
}
Yes, working with the built-in DAC in the ESP32 boils down to calling a single library function, dac_output_voltage (well, not really).
Actually, that's all. Now we need to enable the required DAC channel inside the app_main() function and initialize the timer:
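The position logic of the handler can be tried on a PC without any hardware. This is a minimal sketch: the helper names next_sample_looped and next_sample_once are hypothetical, and audio_info_t is repeated only so that the fragment compiles on its own. The second helper is a variation that is not in the original code: it reports the end of the data instead of looping.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same fields as the structure defined earlier, repeated for a standalone build. */
typedef struct {
    uint32_t sampleRate;
    uint32_t dataLength;
    const uint8_t *data;
} audio_info_t;

/* Hypothetical helper: return the sample at *pos and advance, wrapping to
 * the start when the end of the data is reached, as the handler does. */
static inline uint8_t next_sample_looped(const audio_info_t *audio, uint32_t *pos)
{
    if (*pos >= audio->dataLength) *pos = 0;
    return audio->data[(*pos)++];
}

/* Hypothetical one-shot variant: stores the sample in *out and returns true,
 * or returns false (leaving *out untouched) once the data has run out. */
static inline bool next_sample_once(const audio_info_t *audio, uint32_t *pos,
                                    uint8_t *out)
{
    if (*pos >= audio->dataLength) return false;
    *out = audio->data[(*pos)++];
    return true;
}
```

With the one-shot variant, the caller could stop the timer or keep outputting the mid-scale value 128 after the last sample, so that the sound plays once instead of repeating.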
void app_main(void)
{
    …
    ESP_ERROR_CHECK(dac_output_enable(DAC_CHANNEL_1));
    timerInit();
Build, flash, listen :) In principle, you can connect a speaker directly to the controller pin and it will play. But it is better to use an amplifier. I used a TDA7050 that was lying around in my parts bin.
That's all. Yes, when it finally began to sing, I also thought that everything had turned out to be much easier than I expected. Still, maybe this article will be of some help to those who have just started to master the ESP32.
Maybe someday (and if anyone likes this mini-article) I will drive the ESP32 DAC using DMA. Things get more interesting there, because in that case you have to work with the built-in I2S module.
UPD.
I decided to add a demonstration of how it works for me. This is a Heltec board with an OLED display and a LoRa transceiver, which, of course, are not used here.