
ESP Audio Render

Chinese version: README_CN.md

ESP Audio Render is a high-level audio rendering component for Espressif SoCs. It mixes one or more PCM input streams, optionally processes the audio with ESP-GMF pipelines (ALC, Sonic, EQ, etc.), and delivers the result through a user-defined writer callback.

Glossary

Stream: An individual PCM input to the renderer (e.g., music, TTS, notification).
Stream Processor: Processing applied to an individual stream before mixing (e.g., Sonic speed change, EQ).
Mixed Processor: Effect applied after mixing (e.g., ALC, limiter).
Element: A GMF processing node that implements a function (EQ, Sonic, etc.).
Pool: The ESP-GMF pool, which holds the registered elements that the renderer uses to build processors.
Writer: User callback that receives final PCM data for playback (e.g., I2S, Bluetooth sink).

Key Features

Multiple input streams mixed into a single output
Optional per-stream and/or mixed processing using ESP-GMF elements
Flexible sink: output is delivered through a user-defined write callback
Dynamic processor pipeline generation for optimized processing
Stream control: pause, resume, flush, and speed change

Architecture

Single Stream

With a single stream, no mixer is used: processing (if any) is applied inline and the result is sent directly to the writer.

```mermaid
flowchart LR
  A[Input Stream] -- PCM --> P[Optional Stream Processor]
  P --> SINK[Writer]
```

Multiple Streams

In multi-stream mode, each stream has its own optional processor and its own buffer. A mixer thread combines the buffered streams, applies the optional mixed processor, and outputs the result through the writer.

```mermaid
flowchart LR
  A[Stream 0] --> P0[Processor] --> RB0[Buffer]
  B[Stream 1] --> P1[Processor] --> RB1[Buffer]
  N[Stream N] --> PN[Processor] --> RBN[Buffer]

  subgraph Mixer Thread
    RB0 --> M[Mixer]
    RB1 --> M
    RBN --> M
  end
  M --> Mixed[Mixed Processor]
  Mixed --> SINK[Writer]
```
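Conceptually, the mixing step is a sample-by-sample sum of the stream buffers with saturation to the output range. The sketch below illustrates that idea only; it assumes interleaved 16-bit PCM of equal length in every buffer, and the function name `mix_s16` is hypothetical. The component's actual mixer may apply per-stream gain or a different overflow strategy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration, not the esp_audio_render mixer:
 * clamp a 32-bit sum back into the int16 range. */
static int16_t clip_s16(int32_t v)
{
    if (v > INT16_MAX) {
        return INT16_MAX;
    }
    if (v < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)v;
}

/* Sum num_inputs buffers of num_samples int16 samples each into out. */
void mix_s16(const int16_t *const *inputs, size_t num_inputs,
             int16_t *out, size_t num_samples)
{
    for (size_t i = 0; i < num_samples; i++) {
        int32_t acc = 0; /* wider accumulator so the sum itself cannot overflow */
        for (size_t s = 0; s < num_inputs; s++) {
            acc += inputs[s][i];
        }
        out[i] = clip_s16(acc);
    }
}
```

Hard clipping is the simplest choice; it is also why loud streams mixed together can distort, which is where a mixed processor such as ALC or a limiter helps.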

Configuration

The renderer is created by passing an `esp_audio_render_cfg_t` configuration to `esp_audio_render_create`.

| Field | Description | Required | Default |
|---|---|---|---|
| `max_stream_num` | Maximum number of streams (1 = no mixer, >1 = mixer) | ✅ | N/A |
| `out_writer` | Writer callback that receives the final PCM | ✅ | N/A |
| `out_ctx` | Context pointer passed to the writer callback | Optional | `NULL` |
| `out_sample_info` | Output format (must match the sink) | Optional | Can be set or changed later with `esp_audio_render_set_out_sample_info` |
| `pool` | GMF pool handle (required when processors are used) | Optional | `NULL` |
| `process_period` | Mixer processing period, in ms | Optional | 20 ms |
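To size buffers or writes, it helps to know how many bytes one process period represents. A minimal sketch, assuming interleaved PCM with whole-byte samples; the helper name `pcm_bytes_per_period` is hypothetical and not part of the component API.

```c
#include <stdint.h>

/* Hypothetical helper: bytes of interleaved PCM covered by period_ms. */
uint32_t pcm_bytes_per_period(uint32_t sample_rate, uint8_t channels,
                              uint8_t bits_per_sample, uint32_t period_ms)
{
    /* One frame = one sample for every channel. */
    uint32_t frame_bytes = (uint32_t)channels * (bits_per_sample / 8);
    /* Frames that fit in the period, e.g. 48000 Hz * 20 ms = 960 frames. */
    uint32_t frames = sample_rate * period_ms / 1000;
    return frames * frame_bytes;
}
```

For the default 20 ms period, 48 kHz stereo 16-bit output works out to 960 frames of 4 bytes, i.e. 3840 bytes, while a 16 kHz stereo 16-bit input needs 1280 bytes per period.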

Stream Lifecycle

```mermaid
stateDiagram-v2
  [*] --> Created: esp_audio_render_create
  Created --> Opened: esp_audio_render_stream_open
  Opened --> Writing: esp_audio_render_stream_write
  Writing --> Paused: esp_audio_render_stream_pause(true)
  Paused --> Writing: esp_audio_render_stream_pause(false)
  Writing --> Closed: esp_audio_render_stream_close
  Closed --> Destroyed: esp_audio_render_destroy
```

Notes:

`esp_audio_render_destroy` automatically closes all streams; do not call any renderer API on that handle afterwards.
After closing, a stream can be re-opened if needed.
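The lifecycle can be read as a small state machine. The sketch below is a hypothetical model, not part of the component: the enum and `render_transition_allowed` are illustrative names, and they encode only the transitions drawn in the diagram plus the two notes (re-open after close, destroy closes everything). Transitions the diagram does not draw, such as closing a paused stream, are rejected here only because the diagram omits them; the real component may accept them.

```c
#include <stdbool.h>

/* Hypothetical model of the lifecycle diagram above. */
typedef enum {
    RS_CREATED,
    RS_OPENED,
    RS_WRITING,
    RS_PAUSED,
    RS_CLOSED,
    RS_DESTROYED,
} render_state_t;

bool render_transition_allowed(render_state_t from, render_state_t to)
{
    if (from == RS_DESTROYED) {
        return false; /* no renderer API may be used after destroy */
    }
    if (to == RS_DESTROYED) {
        return true; /* destroy closes any open streams itself */
    }
    switch (from) {
    case RS_CREATED:
        return to == RS_OPENED;
    case RS_OPENED:
        return to == RS_WRITING;
    case RS_WRITING:
        return to == RS_PAUSED || to == RS_CLOSED;
    case RS_PAUSED:
        return to == RS_WRITING; /* pause(false) resumes writing */
    case RS_CLOSED:
        return to == RS_OPENED; /* a closed stream may be re-opened */
    default:
        return false;
    }
}
```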

Typical Scenarios

🎵 Single Stream Playback – Decode and render one audio source (e.g., MP3, WAV).
📱 TTS + Notification Mixing – Mix voice prompts with system sounds.
🎧 Background Music + Voice Chat – Simultaneous playback, with optional ducking (lowering background music volume during voice).
🎹 Music Generation (Auto-Generated Tracks) – Example: a piano piece split into 4 tracks (melody, chords, pedal, percussion), mixed in real-time.
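Ducking in the background-music scenario amounts to attenuating one stream while another is active, then summing. A minimal sketch, assuming 16-bit PCM and a Q15 gain (32768 = 1.0); `duck_mix_s16` is a hypothetical function, and this is not how esp_audio_render itself implements ducking, if it does so internally.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ducking mix: while voice_active is non-zero, scale music by
 * duck_q15 (Q15, 32768 = unity gain) before adding the voice samples. */
void duck_mix_s16(const int16_t *music, const int16_t *voice, int16_t *out,
                  size_t num_samples, int voice_active, int32_t duck_q15)
{
    int32_t gain = voice_active ? duck_q15 : 32768;
    for (size_t i = 0; i < num_samples; i++) {
        /* Division truncates toward zero and avoids shifting negative values. */
        int32_t acc = (int32_t)music[i] * gain / 32768 + voice[i];
        if (acc > INT16_MAX) {
            acc = INT16_MAX;
        } else if (acc < INT16_MIN) {
            acc = INT16_MIN;
        }
        out[i] = (int16_t)acc;
    }
}
```

Switching the gain instantly, as here, can cause audible clicks; a production version would typically ramp the gain over a short interval.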

Minimal Example

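```c
#include <stdint.h>
#include "esp_audio_render.h"
#include "esp_gmf_pool.h"
#include "esp_codec_dev.h"

// Writer callback: forwards the final PCM to the codec device passed as ctx
static int my_write(uint8_t *pcm, uint32_t len, void *ctx)
{
    esp_codec_dev_handle_t play_handle = (esp_codec_dev_handle_t)ctx;
    // Propagate the codec result (0 on success) back to the renderer
    return esp_codec_dev_write(play_handle, pcm, (int)len);
}

// Make sure the codec device is opened and set to 48 kHz, 2 ch, 16 bit.
// pcm/pcm_len hold the input audio (16 kHz, 2 ch, 16 bit) to play.
void example(esp_codec_dev_handle_t play_handle, uint8_t *pcm, uint32_t pcm_len)
{
    esp_gmf_pool_handle_t pool = NULL;
    if (esp_gmf_pool_init(&pool) != ESP_GMF_ERR_OK) {
        return;
    }
    // Add your customized elements into the pool, or use the esp-gmf loader

    esp_audio_render_cfg_t cfg = {
        .max_stream_num = 1,
        .out_writer = my_write,
        .out_ctx = play_handle,
        .out_sample_info = {
            .sample_rate = 48000,
            .channels = 2,
            .bits_per_sample = 16,
        },
        .pool = pool,
    };
    esp_audio_render_handle_t render = NULL;
    esp_audio_render_create(&cfg, &render);
    if (render == NULL) {
        esp_gmf_pool_deinit(pool);
        return;
    }

    // Input sample info: 16 kHz, 2 ch, 16 bit
    esp_audio_render_sample_info_t in = {
        .sample_rate = 16000,
        .channels = 2,
        .bits_per_sample = 16,
    };
    esp_audio_render_stream_handle_t stream = NULL;
    esp_audio_render_stream_get(render, ESP_AUDIO_RENDER_FIRST_STREAM, &stream);
    // A return value of 0 is assumed to mean success
    if (stream != NULL && esp_audio_render_stream_open(stream, &in) == 0) {
        // Feed data in 20 ms chunks: 16000 Hz * 2 ch * 2 bytes * 20 ms = 1280 bytes
        const uint32_t chunk = 1280;
        for (uint32_t off = 0; off < pcm_len; off += chunk) {
            uint32_t n = (pcm_len - off < chunk) ? (pcm_len - off) : chunk;
            if (esp_audio_render_stream_write(stream, pcm + off, n) != 0) {
                break;
            }
        }
        esp_audio_render_stream_close(stream);
    }
    esp_audio_render_destroy(render);
    esp_gmf_pool_deinit(pool);
}
```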

For more detailed usage, refer to the audio_render and simple_piano examples in the `examples` directory.

Best Practices

Write PCM in chunks aligned with the configured process period (default 20 ms); for example, 20 ms of 48 kHz, 2-channel, 16-bit PCM is 3840 bytes.
In multi-stream mode, avoid underruns by ensuring all streams provide enough data.
Use esp_audio_render_stream_get_latency() to monitor end-to-end buffering.
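Period-aligned writing can be wrapped in a small helper. In this sketch `pcm_write_fn`, `feed_in_periods`, `write_stats_t`, and `counting_write` are all hypothetical names; in an application the callback would wrap `esp_audio_render_stream_write` for one stream, and `counting_write` only stands in for it so the chunking can be checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical write callback type, standing in for a per-stream write. */
typedef int (*pcm_write_fn)(const uint8_t *data, uint32_t len, void *ctx);

/* Write len bytes in chunks of `chunk` bytes (e.g. one 20 ms period);
 * the final chunk may be shorter. Returns 0, or the first non-zero error. */
int feed_in_periods(const uint8_t *data, uint32_t len, uint32_t chunk,
                    pcm_write_fn write, void *ctx)
{
    if (chunk == 0) {
        return -1; /* invalid chunk size */
    }
    for (uint32_t off = 0; off < len; off += chunk) {
        uint32_t n = (len - off < chunk) ? (len - off) : chunk;
        int ret = write(data + off, n, ctx);
        if (ret != 0) {
            return ret;
        }
    }
    return 0;
}

/* Example callback that only records how many calls and bytes it received. */
typedef struct {
    uint32_t calls;
    uint32_t bytes;
} write_stats_t;

int counting_write(const uint8_t *data, uint32_t len, void *ctx)
{
    (void)data;
    write_stats_t *st = ctx;
    st->calls++;
    st->bytes += len;
    return 0;
}
```

With a 3840-byte chunk (20 ms of 48 kHz stereo 16-bit audio), a 10000-byte buffer is delivered in three writes: two full periods and a 2320-byte remainder.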
