When designing a project with the ability to process an audio signal or audio data, we typically consider a subset of the following components:
Input:
- Analog signal input to connect e.g. a microphone
- Storage media, e.g. a microSD card with audio files to read
- Wi-Fi interface to obtain an audio data stream from the internet
- Bluetooth interface to obtain an audio data stream from e.g. a BT headset
- I2S interface to obtain an audio data stream from a codec chip
- Ethernet interface to obtain an audio data stream from the internet
- The chip’s internal flash memory with some audio samples to play
- User Interface e.g. buttons or some other means to provide user input
Output:
- Analog signal output to connect headphones or an amplifier with speakers
- Storage media, e.g. a microSD card to write audio files, e.g. recordings
- Wi-Fi interface to send out an audio data stream to the internet
- Bluetooth interface to stream audio data to e.g. a BT headset
- I2S interface to stream audio data to a codec chip
- Ethernet interface to send an audio data stream to the internet
- The chip’s internal flash memory to store audio recordings
- User Interface e.g. a display, LEDs or some means of haptic feedback
Main Processing Unit:
A microcontroller or a computer with enough processing power to read the data from the input, process it (e.g. encode / decode) and send it to the output.
The ESP32 has all the above features or is able to support them (e.g. it can drive an Ethernet PHY). Considering that the ESP32 costs about $3 and that the ESP-ADF software development platform is available, we are able to develop an audio project with minimum additional components at a very low price.
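The main processing unit's job of reading from an input, processing, and sending to an output can be sketched as a simple block-by-block loop. This is a minimal, host-runnable illustration only: the names `audio_chain_t`, `audio_chain_run`, `mem_read`, `mem_write` and `halve_gain` are hypothetical and are not ESP-ADF APIs, and the in-memory source, sink and gain stage merely stand in for real inputs, outputs and an encoder or decoder.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SAMPLES 64

/* Hypothetical stage signatures (illustrative only, not ESP-ADF APIs). */
typedef size_t (*read_fn)(int16_t *buf, size_t max_samples, void *ctx);
typedef void (*process_fn)(int16_t *buf, size_t samples, void *ctx);
typedef size_t (*write_fn)(const int16_t *buf, size_t samples, void *ctx);

typedef struct {
    read_fn read;       /* input: returns samples read, 0 at end of stream */
    process_fn process; /* processing step; NULL means pass-through */
    write_fn write;     /* output: returns samples written */
    void *ctx;          /* shared state handed to every stage */
} audio_chain_t;

/* Pull blocks from the input until it is exhausted; return samples written. */
size_t audio_chain_run(const audio_chain_t *chain)
{
    int16_t block[BLOCK_SAMPLES];
    size_t total = 0;

    for (;;) {
        size_t n = chain->read(block, BLOCK_SAMPLES, chain->ctx);
        if (n == 0) {
            break;
        }
        if (chain->process != NULL) {
            chain->process(block, n, chain->ctx);
        }
        total += chain->write(block, n, chain->ctx);
    }
    return total;
}

/* In-memory stand-ins for a real input (e.g. a file) and output (e.g. I2S). */
typedef struct {
    const int16_t *src;
    size_t src_len;
    size_t src_pos;
    int16_t *dst;
    size_t dst_cap;
    size_t dst_pos;
} mem_ctx_t;

size_t mem_read(int16_t *buf, size_t max_samples, void *ctx)
{
    mem_ctx_t *m = (mem_ctx_t *)ctx;
    size_t left = m->src_len - m->src_pos;
    size_t n = left < max_samples ? left : max_samples;
    memcpy(buf, m->src + m->src_pos, n * sizeof(int16_t));
    m->src_pos += n;
    return n;
}

/* Placeholder for real processing such as decoding: halves the amplitude. */
void halve_gain(int16_t *buf, size_t samples, void *ctx)
{
    (void)ctx;
    for (size_t i = 0; i < samples; i++) {
        buf[i] = (int16_t)(buf[i] / 2);
    }
}

size_t mem_write(const int16_t *buf, size_t samples, void *ctx)
{
    mem_ctx_t *m = (mem_ctx_t *)ctx;
    size_t room = m->dst_cap - m->dst_pos;
    size_t n = samples < room ? samples : room;
    memcpy(m->dst + m->dst_pos, buf, n * sizeof(int16_t));
    m->dst_pos += n;
    return n;
}
```

Swapping `mem_read` or `mem_write` for a function backed by a different input or output leaves the loop itself unchanged, which is the point of treating input, processing and output as separate components.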
Depending on the application, required functionality and performance, we may consider two project groups.
- Minimum - with a minimum of additional components, using the on-board I2S or PDM interface as well as the DAC, if high quality audio on the output is not required.
- Typical - with an external codec chip and a power amplifier, for high quality output audio and multiple input / output options.
There may be several variations between the above projects, made by adding or removing features / components. Below are a couple of examples.
With several peripherals on the ESP32, the I2S, PDM or DAC interfaces can be used to implement a minimum project. With digital microphones, we could input voice signals and build a minimum voice command control project that communicates with a cloud service.
With two on-board DACs, if 8-bit width on the output is satisfactory, we may implement another minimum project: a device to play internet radio.
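To give a feel for what 8-bit output width means in practice, the sketch below maps signed 16-bit PCM samples to unsigned 8-bit codes of the kind an 8-bit DAC accepts. It is a host-side illustration with hypothetical function names (`pcm16_to_dac8`, `pcm16_block_to_dac8`), not an ESP-IDF or ESP-ADF API, and it simply truncates the low 8 bits without dithering.

```c
#include <stddef.h>
#include <stdint.h>

/* Map a signed 16-bit sample (-32768..32767) to an unsigned 8-bit code
 * (0..255): shift the range to 0..65535, then keep the top 8 bits. */
uint8_t pcm16_to_dac8(int16_t sample)
{
    return (uint8_t)(((int32_t)sample + 32768) >> 8);
}

/* Convert a whole block of samples; 'out' must hold 'count' bytes. */
void pcm16_block_to_dac8(const int16_t *in, uint8_t *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        out[i] = pcm16_to_dac8(in[i]);
    }
}
```

With this mapping, silence (sample value 0) becomes the mid-scale code 128, and the 65536 possible input levels collapse to 256 output levels, which is why this path suits projects that do not need high quality output audio.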
When looking for better audio quality and more interfacing options, we would use an external I2S codec to do all the analog input and output signal processing. The codec chip, depending on type, may provide additional functionality like an audio input signal preamplifier, a headphone output amplifier, multiple analog inputs and outputs, sound effects, etc. I2S is considered the industry standard for interfacing with audio codec chips, or in general for high speed, continuous transfer of audio data. To optimize the performance of audio data processing, additional memory may be required. For such cases consider using the ESP32-WROVER, which provides 4 MB PSRAM on a single module together with the ESP32 chip.
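As a rough worked example of the data rates involved and of why additional memory may be needed, the sketch below computes the I2S bit clock and the size of a PCM buffer. It assumes uncompressed PCM and an I2S slot width equal to the sample width; the function names are hypothetical helpers, not part of ESP-IDF or ESP-ADF.

```c
#include <stdint.h>

/* Bit clock for standard I2S, assuming slot width == bits per sample:
 * BCLK = sample rate x bits per sample x channels. */
uint32_t i2s_bit_clock_hz(uint32_t sample_rate_hz, uint32_t bits_per_sample,
                          uint32_t channels)
{
    return sample_rate_hz * bits_per_sample * channels;
}

/* Bytes needed to buffer 'duration_ms' of uncompressed PCM audio. */
uint32_t pcm_buffer_bytes(uint32_t sample_rate_hz, uint32_t bits_per_sample,
                          uint32_t channels, uint32_t duration_ms)
{
    uint64_t bytes_per_second =
        (uint64_t)sample_rate_hz * channels * (bits_per_sample / 8u);
    return (uint32_t)(bytes_per_second * duration_ms / 1000u);
}
```

For 44.1 kHz, 16-bit stereo audio this gives a bit clock of 1,411,200 Hz, and buffering one second of such audio takes 176,400 bytes, so a few seconds of buffered audio already runs to several hundred kilobytes.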
The ESP-ADF is designed primarily to support projects with a codec chip. The ESP32 LyraT board is an example of such a project. The software interfaces with the board through the Audio HAL and a driver. The codec chip used on the ESP32 LyraT is the ES8388. Boards with a different codec chip may be supported by providing a different driver.
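The idea of supporting another codec chip by providing a different driver can be illustrated with a table of function pointers that a hardware abstraction layer calls through. The sketch below is a simplified assumption of how such a layer might look: every type and function name in it (`codec_driver_t`, `hal_sketch_init`, `hal_sketch_set_volume`, the stub drivers) is hypothetical and does not reproduce the actual ESP-ADF Audio HAL API.

```c
#include <stddef.h>

/* Hypothetical driver interface: one table of operations per codec chip. */
typedef struct {
    const char *name;
    int (*init)(void);
    int (*set_volume)(int percent);
} codec_driver_t;

/* Hypothetical HAL state: which driver is bound, plus the cached volume. */
typedef struct {
    const codec_driver_t *driver;
    int volume;
} audio_hal_sketch_t;

/* Bind a driver to the HAL and initialize the codec; returns 0 on success. */
int hal_sketch_init(audio_hal_sketch_t *hal, const codec_driver_t *driver)
{
    if (hal == NULL || driver == NULL ||
        driver->init == NULL || driver->set_volume == NULL) {
        return -1;
    }
    hal->driver = driver;
    hal->volume = 0;
    return driver->init();
}

/* Clamp the request to 0..100 and forward it to the bound driver. */
int hal_sketch_set_volume(audio_hal_sketch_t *hal, int percent)
{
    if (percent < 0) {
        percent = 0;
    }
    if (percent > 100) {
        percent = 100;
    }
    int err = hal->driver->set_volume(percent);
    if (err == 0) {
        hal->volume = percent;
    }
    return err;
}

/* Stub operations; a real driver would configure the chip's registers. */
static int stub_init(void) { return 0; }
static int stub_set_volume(int percent) { (void)percent; return 0; }

/* Two stand-in drivers: application code stays the same for either one. */
const codec_driver_t es8388_stub = { "es8388-stub", stub_init, stub_set_volume };
const codec_driver_t other_codec_stub = { "other-stub", stub_init, stub_set_volume };
```

Application code calls only `hal_sketch_init` and `hal_sketch_set_volume`, so moving to a board with a different codec means passing a different `codec_driver_t` table rather than changing the application.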