ESP Audio Render is a high-level audio rendering component for Espressif SoCs. It multiplexes one or more PCM input streams, applies optional audio processing via ESP-GMF pipelines (ALC, Sonic, EQ, etc.), and outputs through a user-defined writer callback.
- Stream: An individual PCM input to the renderer (e.g., music, TTS, notification).
- Stream Processor: Audio effect applied before mixing (e.g., Sonic speed change, EQ).
- Mixed Processor: Effect applied after mixing (e.g., ALC, limiter).
- Element: A GMF processing node that implements a function (EQ, Sonic, etc.).
- Pool: The memory/object pool used by GMF to create processors.
- Writer: User callback that receives final PCM data for playback (e.g., I2S, Bluetooth sink).
- Multiple input streams mixed into a single output
- Optional per-stream and/or mixed processing using ESP-GMF elements
- Flexible sink: Customizable through write callbacks
- Dynamic processor pipeline generation for optimization
- Advanced control such as pause, resume, flush, and speed change
For one stream, processing (if any) is applied inline and the result is directly sent to the writer.
flowchart LR
A[Input Stream] -- PCM --> P[Optional Stream Processor]
P --> SINK[Writer]
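Because the writer at the end of this path is only a callback, it can be swapped for a test sink. The sketch below is a minimal example of that idea, assuming the callback shape `(pcm, len, ctx)` used by `my_write` in the usage example. `capture_sink_t` and `capture_write` are hypothetical names, and returning -1 on overflow is this sketch's own choice rather than a documented convention.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical capture sink: collects PCM in memory instead of playing it. */
typedef struct {
    uint8_t  data[4096];  /* capture storage */
    uint32_t used;        /* bytes captured so far */
} capture_sink_t;

/* Same parameter shape as the writer callback in the usage example.
 * Returns 0 on success; -1 on overflow is an assumption of this sketch. */
static int capture_write(uint8_t *pcm, uint32_t len, void *ctx)
{
    capture_sink_t *sink = (capture_sink_t *)ctx;
    if (len > sizeof(sink->data) - sink->used) {
        return -1;  /* not enough room left, nothing is copied */
    }
    memcpy(sink->data + sink->used, pcm, len);
    sink->used += len;
    return 0;
}
```

A sink like this could be passed as `out_writer` with a `capture_sink_t` as `out_ctx`, which makes it possible to inspect rendered PCM without audio hardware.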
In multi-stream mode, each stream has its own processor and buffer. A mixer thread combines them, applies optional mixed-processing, and outputs via the writer.
flowchart LR
A[Stream 0] --> P0[Processor] --> RB0[Buffer]
B[Stream 1] --> P1[Processor] --> RB1[Buffer]
N[Stream N] --> PN[Processor] --> RBN[Buffer]
subgraph Mixer Thread
RB0 --> M[Mixer]
RB1 --> M
RBN --> M
end
M --> Mixed[Mixed Processor]
Mixed --> SINK[Writer]
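To make the mixer step concrete, here is a minimal sketch of summing several 16-bit PCM buffers with saturation. It is an illustration only: the component's actual mixing algorithm is not described here, and `mix_s16` is a hypothetical helper, not part of the ESP Audio Render API.

```c
#include <stdint.h>
#include <stddef.h>

/* Sum the same sample position of every stream and clamp to the int16 range.
 * All input buffers are assumed to share one format and hold `samples` samples. */
static void mix_s16(const int16_t *const *streams, size_t stream_count,
                    int16_t *out, size_t samples)
{
    for (size_t i = 0; i < samples; i++) {
        int32_t acc = 0;  /* wide accumulator so the sum cannot wrap */
        for (size_t s = 0; s < stream_count; s++) {
            acc += streams[s][i];
        }
        if (acc > INT16_MAX) {
            acc = INT16_MAX;
        }
        if (acc < INT16_MIN) {
            acc = INT16_MIN;
        }
        out[i] = (int16_t)acc;
    }
}
```

Hard clamping like this is the simplest way to avoid wrap-around; a mixed processor such as ALC or a limiter, as mentioned above, is the usual way to control level more gracefully.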
The renderer is created from an esp_audio_render_cfg_t configuration:
| Field | Description | Required | Default |
|---|---|---|---|
| `max_stream_num` | Max number of streams (1 = no mixer, >1 = mixer) | ✅ | - |
| `out_writer` | Final PCM writer callback | ✅ | - |
| `out_ctx` | Context pointer for writer callback | Optional | NULL |
| `out_sample_info` | Desired output format (must match sink) | Optional | Can be changed dynamically through `esp_audio_render_set_out_sample_info` |
| `pool` | GMF pool handle (needed if using processors) | Optional | NULL |
| `process_period` | Mixer process unit in ms (e.g. 20 ms) | Optional | 20 ms |
stateDiagram-v2
[*] --> Created: esp_audio_render_create
Created --> Opened: esp_audio_render_stream_open
Opened --> Writing: esp_audio_render_stream_write
Writing --> Paused: esp_audio_render_stream_pause(true)
Paused --> Writing: esp_audio_render_stream_pause(false)
Writing --> Closed: esp_audio_render_stream_close
Closed --> Destroyed: esp_audio_render_destroy
Notes:
- Destroying the renderer automatically closes all of its streams; do not call any render-related API on it afterwards.
- After closing, a stream can be re-opened if needed.
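The lifecycle above can be sketched as a small transition table, which is handy for checking call order in application code. This is a hypothetical model built only from the diagram and notes; the enum and function names are invented for illustration, and the real library may accept transitions the diagram does not show (for example, closing a paused stream).

```c
#include <stdbool.h>

typedef enum { ST_CREATED, ST_OPENED, ST_WRITING, ST_PAUSED, ST_CLOSED, ST_DESTROYED } state_t;
typedef enum { EV_OPEN, EV_WRITE, EV_PAUSE, EV_RESUME, EV_CLOSE, EV_DESTROY } event_t;

/* Returns true and stores the new state if `ev` is allowed in `cur`. */
static bool next_state(state_t cur, event_t ev, state_t *next)
{
    if (cur == ST_DESTROYED) {
        return false;                 /* nothing may be called after destroy */
    }
    if (ev == EV_DESTROY) {
        *next = ST_DESTROYED;         /* destroy closes open streams itself */
        return true;
    }
    switch (cur) {
    case ST_CREATED:
    case ST_CLOSED:                   /* a closed stream can be re-opened */
        if (ev == EV_OPEN) { *next = ST_OPENED; return true; }
        break;
    case ST_OPENED:
        if (ev == EV_WRITE) { *next = ST_WRITING; return true; }
        break;
    case ST_WRITING:
        if (ev == EV_WRITE) { *next = ST_WRITING; return true; }
        if (ev == EV_PAUSE) { *next = ST_PAUSED; return true; }
        if (ev == EV_CLOSE) { *next = ST_CLOSED; return true; }
        break;
    case ST_PAUSED:
        if (ev == EV_RESUME) { *next = ST_WRITING; return true; }
        break;
    default:
        break;
    }
    return false;
}
```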
- 🎵 Single Stream Playback – Decode and render one audio source (e.g., MP3, WAV).
- 📱 TTS + Notification Mixing – Mix voice prompts with system sounds.
- 🎧 Background Music + Voice Chat – Simultaneous playback, with optional ducking (lowering background music volume during voice).
- 🎹 Music Generation (Auto-Generated Tracks) – Example: a piano piece split into 4 tracks (melody, chords, pedal, percussion), mixed in real-time.
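The ducking mentioned in the voice chat case can be approximated on the application side by scaling the background stream's PCM before it is written. The sketch below assumes 16-bit samples and a Q15 gain; `duck_gain_q15` and `apply_gain_q15` are hypothetical helpers, and the -12 dB duck level is an arbitrary choice, not something the component prescribes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GAIN_UNITY_Q15  32768   /* 1.0 in Q15 */
#define GAIN_DUCKED_Q15 8192    /* 0.25, roughly -12 dB (assumed duck level) */

/* Pick the background gain depending on whether voice is currently active. */
static int32_t duck_gain_q15(bool voice_active)
{
    return voice_active ? GAIN_DUCKED_Q15 : GAIN_UNITY_Q15;
}

/* Scale samples in place by a Q15 gain in the range 0..32768. */
static void apply_gain_q15(int16_t *pcm, size_t samples, int32_t gain_q15)
{
    for (size_t i = 0; i < samples; i++) {
        int32_t v = ((int32_t)pcm[i] * gain_q15) / 32768;
        if (v > INT16_MAX) {
            v = INT16_MAX;
        }
        if (v < INT16_MIN) {
            v = INT16_MIN;
        }
        pcm[i] = (int16_t)v;
    }
}
```

Switching the gain abruptly can cause audible clicks, so a real application would likely ramp between the two values over a few process periods.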
#include "esp_audio_render.h"
#include "esp_gmf_pool.h"
#include "esp_codec_dev.h"

// Writer callback: forwards the final PCM to the codec device passed as ctx
static int my_write(uint8_t *pcm, uint32_t len, void *ctx)
{
    esp_codec_dev_handle_t play_handle = ctx;
    esp_codec_dev_write(play_handle, pcm, len);
    return 0;
}

// Make sure codec device is opened and set to 48kHz, 2ch, 16bit
void example(esp_codec_dev_handle_t play_handle)
{
    esp_gmf_pool_handle_t pool = NULL;
    esp_gmf_pool_init(&pool);
    // Add your customized element into pool, or use esp-gmf loader

    esp_audio_render_cfg_t cfg = {
        .max_stream_num = 1,
        .out_writer = my_write,
        .out_ctx = play_handle,
        .out_sample_info = {
            .sample_rate = 48000,
            .channels = 2,
            .bits_per_sample = 16,
        },
        .pool = pool,
    };
    esp_audio_render_handle_t render = NULL;
    esp_audio_render_create(&cfg, &render);

    // Suppose input sample info is 16kHz, 2ch, 16bit
    esp_audio_render_sample_info_t in = {
        .sample_rate = 16000,
        .channels = 2,
        .bits_per_sample = 16,
    };
    esp_audio_render_stream_handle_t stream;
    esp_audio_render_stream_get(render, ESP_AUDIO_RENDER_FIRST_STREAM, &stream);
    esp_audio_render_stream_open(stream, &in);

    // Loop to feed data (buf and len stand for the application's PCM buffer and its length)
    esp_audio_render_stream_write(stream, buf, len);

    esp_audio_render_stream_close(stream);
    esp_audio_render_destroy(render);
    esp_gmf_pool_deinit(pool);
}

For more detailed usage, refer to the example code audio_render and simple_piano.
- Align PCM frame sizes with the configured process period (default 20 ms).
- In multi-stream mode, avoid underruns by ensuring all streams provide enough data.
- Use `esp_audio_render_stream_get_latency()` to monitor end-to-end buffering.
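
To align write sizes with the process period, it helps to compute how many bytes one period holds for a given format. The helper below is a small sketch of that arithmetic for interleaved PCM; `pcm_period_bytes` is a hypothetical name, not a library function.

```c
#include <stdint.h>

/* Bytes of interleaved PCM covering period_ms milliseconds.
 * Assumes bits_per_sample is a multiple of 8. */
static uint32_t pcm_period_bytes(uint32_t sample_rate, uint8_t channels,
                                 uint8_t bits_per_sample, uint32_t period_ms)
{
    uint32_t frames = sample_rate * period_ms / 1000;  /* frames per period */
    return frames * channels * (bits_per_sample / 8u);
}
```

For the formats in the usage example, a 20 ms period works out to 3840 bytes at 48 kHz, 2 ch, 16 bit and 1280 bytes at 16 kHz, 2 ch, 16 bit.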