Language bindings

All wrappers are thin shells over the same C-ABI surface in `src/crispasr_c_api.cpp`. Each CLI capability (transcribe, VAD, diarize, LID, align, download) maps to a single function call in every binding that exposes it; the status table below lists per-language coverage.

Session setter reference

All generation-control setters are available in every binding. Each call is a thin proxy over the C-ABI function of the same name. A setter that returns -2 is a soft no-op: the active backend doesn't expose that knob, but the call is safe to make.

C-ABI setter	Binding name (Python / Rust / Go / Java)	Notes
`set_temperature(temp, seed)`	`set_temperature` / `set_temperature` / `SetTemperature` / `setTemperature`	ASR and TTS backends that sample; rc=-2 if the active backend doesn't support it
`set_tts_seed(seed)`	`set_tts_seed` / `set_tts_seed` / `SetTTSSeed` / `setTtsSeed`	Chatterbox, vibevoice, qwen3-tts, orpheus; rc=-2 for others
`set_max_new_tokens(n)`	`set_max_new_tokens` / `set_max_new_tokens` / `SetMaxNewTokens` / `setMaxNewTokens`	AR backends; ≤ 0 clears override
`set_frequency_penalty(f)`	`set_frequency_penalty` / `set_frequency_penalty` / `SetFrequencyPenalty` / `setFrequencyPenalty`	AR backends; ≤ 0 disables
`set_tts_steps(n)`	`set_tts_steps` / `set_tts_steps` / `SetTTSSteps` / `setTtsSteps`	Chatterbox S3Gen CFM steps; vibevoice DPM-Solver++ steps
`set_top_p(p)`	`set_top_p` / `set_top_p` / `SetTopP` / `setTopP`	Chatterbox AR T3 loop
`set_min_p(p)`	`set_min_p` / `set_min_p` / `SetMinP` / `setMinP`	Chatterbox AR T3 loop
`set_repetition_penalty(r)`	`set_repetition_penalty` / `set_repetition_penalty` / `SetRepetitionPenalty` / `setRepetitionPenalty`	Chatterbox (1.0 = no penalty)
`set_cfg_weight(w)`	`set_cfg_weight` / `set_cfg_weight` / `SetCFGWeight` / `setCfgWeight`	Chatterbox (0.5 = upstream default; 0 = unconditional)
`set_exaggeration(e)`	`set_exaggeration` / `set_exaggeration` / `SetExaggeration` / `setExaggeration`	Chatterbox emotion scalar (0.5 = upstream default)
`set_max_speech_tokens(n)`	`set_max_speech_tokens` / `set_max_speech_tokens` / `SetMaxSpeechTokens` / `setMaxSpeechTokens`	Chatterbox AR loop token budget (default 1000 ≈ 20 s)
`set_length_scale(s)`	`set_length_scale` / `set_length_scale` / `SetLengthScale` / `setLengthScale`	Kokoro phoneme duration multiplier (1.0 = normal)
`set_best_of(n)`	`set_best_of` / `set_best_of` / `SetBestOf` / `setBestOf`	Best-of-N sampling for temperature > 0
`set_beam_size(n)`	`set_beam_size` / `set_beam_size` / `SetBeamSize` / `setBeamSize`	Beam search width
`set_grammar_text(gbnf, root, penalty)`	`set_grammar_text` / `set_grammar_text` / `SetGrammarText` / `setGrammarText`	GBNF constrained decoding (whisper); empty string clears
`set_fallback_thresholds(...)`	`set_fallback_thresholds` / `set_fallback_thresholds` / `SetFallbackThresholds` / `setFallbackThresholds`	Whisper entropy/logprob/no-speech thresholds + temp-inc
`set_alt_n(n)`	`set_alt_n` / `set_alt_n` / `SetAltN` / `setAltN`	Per-token alternative candidates (whisper greedy)
`set_whisper_decode_extras(...)`	`set_whisper_decode_extras` / `set_whisper_decode_extras` / `SetWhisperDecodeExtras` / `setWhisperDecodeExtras`	suppress_nst, suppress_regex, carry_initial_prompt
`set_ask(prompt)`	`set_ask` / `set_ask` / `SetAsk` / `setAsk`	Free-form prompt for LLM-style backends
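
The soft no-op convention can be handled uniformly on the caller side. The sketch below is only an illustration under stated assumptions: `FakeSession` and `apply_setters` are hypothetical names, and it assumes each setter returns an integer code with 0 meaning success and -2 meaning "not supported by the active backend". The real bindings may surface return codes differently (for example as exceptions or `Result` values).

```python
RC_OK = 0            # assumed success code (not confirmed by the reference above)
RC_UNSUPPORTED = -2  # soft no-op, as described in the setter reference


class FakeSession:
    """Hypothetical stand-in for a binding session; not the real crispasr class."""

    def set_max_new_tokens(self, n):
        return RC_OK

    def set_length_scale(self, s):
        return RC_UNSUPPORTED  # pretend the active backend lacks this knob


def apply_setters(sess, settings):
    """Call sess.set_<name>(*args) for each entry; return the names that were ignored."""
    ignored = []
    for name, args in settings.items():
        rc = getattr(sess, f"set_{name}")(*args)
        if rc == RC_UNSUPPORTED:
            ignored.append(name)  # safe to continue: the knob simply has no effect
        elif rc != RC_OK:
            raise RuntimeError(f"set_{name} failed with rc={rc}")
    return ignored


ignored = apply_setters(
    FakeSession(),
    {"max_new_tokens": (256,), "length_scale": (1.0,)},
)
print(ignored)  # prints ['length_scale']
```

Collecting the ignored names rather than raising lets one settings dictionary be reused across backends while still reporting which knobs had no effect.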

Tip: chunk-boundary dedup for bindings. When a binding drives a `CAP_UNBOUNDED_INPUT` backend (parakeet, canary, …) chunk-by-chunk and needs to stitch the output, call `crispasr_lcs_dedup_prefix_count` between adjacent chunks. It returns the number of leading tokens of `chunk[i]` that duplicate the tail of `chunk[i-1]` (NeMo-style sub-word LCS over emitted token ids). The binding then drops that many tokens from `chunk[i]` and rebuilds its own segment / word / text representation. The C declaration lives in `include/crispasr.h`; see also the `--lcs-dedup` / `--lcs-min-length` CLI flags.
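
To make the stitching step concrete, here is a pure-Python sketch of the idea. It is a simplified stand-in, not the algorithm behind `crispasr_lcs_dedup_prefix_count`: it finds the longest common contiguous run between the tail of the previous chunk and the head of the current one, then reports how many leading tokens of the current chunk to drop. The function name, the `window` parameter, and the `min_length` parameter (loosely modelled on the `--lcs-min-length` flag) are assumptions made for illustration.

```python
def dedup_prefix_count(prev_ids, curr_ids, min_length=2, window=32):
    """Return how many leading tokens of curr_ids repeat the tail of prev_ids."""
    tail = prev_ids[-window:]
    head = curr_ids[:window]
    best_len = 0   # length of the longest shared run found so far
    best_end = 0   # index in head just past that run
    prev_row = [0] * (len(head) + 1)
    # Longest-common-substring dynamic programme over token ids.
    for i in range(1, len(tail) + 1):
        row = [0] * (len(head) + 1)
        for j in range(1, len(head) + 1):
            if tail[i - 1] == head[j - 1]:
                row[j] = prev_row[j - 1] + 1
                if row[j] > best_len:
                    best_len, best_end = row[j], j
        prev_row = row
    # Runs shorter than min_length are treated as coincidental matches.
    return best_end if best_len >= min_length else 0


prev_chunk = [1, 2, 3, 4, 5]
curr_chunk = [4, 5, 6, 7]
drop = dedup_prefix_count(prev_chunk, curr_chunk)
stitched = prev_chunk + curr_chunk[drop:]
print(drop, stitched)  # prints 2 [1, 2, 3, 4, 5, 6, 7]
```

Everything in the current chunk up to the end of the shared run is dropped, including any stray tokens before it. A binding would then rebuild its segments, words, and text from the stitched ids.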

Language	Status	Surface
C / C++	✓	Full (the C-ABI is the source of truth)
Python	✓	Full — transcribe, VAD, diarize, LID, align, registry
Rust	✓	Full — same surface as Python
Dart / Flutter	✓	Full — used by CrisperWeaver
Go	✓	Full (all 11 capabilities)
Java	✓	Transcribe + align + LID
Ruby	✓	Transcribe
JavaScript	partial	WebAssembly approach; see PLAN.md #59

Python

from crispasr import (
    Session, DiarizeMethod, diarize_segments, detect_language_pcm,
    align_words, cache_ensure_file, registry_lookup,
    # Diarize pipeline primitives (#107):
    SpeakerEmbedder, PyannoteCache, agglomerative_cluster,
)

# Transcribe (any of the 24 ASR backends via one session object)
sess = Session("parakeet-tdt-0.6b-v3-q4_k.gguf")
sess.set_max_new_tokens(256)       # AR backends; <= 0 clears
sess.set_frequency_penalty(0.4)    # AR backends; <= 0 disables
segs = sess.transcribe_vad(pcm, "silero-v5.1.2.bin")  # stitched VAD pass

# Run each shared post-step standalone
lang = detect_language_pcm(pcm, model_path="ggml-tiny.bin")
diarize_segments(my_segs, pcm, method=DiarizeMethod.VAD_TURNS)
words = align_words("canary-ctc-aligner.gguf", "hello world", pcm)

# Auto-download a canonical model
entry = registry_lookup("parakeet")
path  = cache_ensure_file(entry.filename, entry.url)

# Custom diarize pipeline: pluggable embedder + cosine clustering.
# Same building blocks as `--diarize-embedder` in the CLI.
emb = SpeakerEmbedder("auto", n_threads=4)             # 'titanet'/'indextts'/.gguf
# Segment times are in seconds; convert to integer sample indices at 16 kHz.
embeddings = [emb.embed(pcm[int(s.t0 * 16000):int(s.t1 * 16000)]) for s in segs]
labels = agglomerative_cluster(embeddings, merge_threshold=0.5, max_speakers=8)
emb.close()

Install: pip install crispasr (or build locally from python/).
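
For intuition about the clustering step in the custom diarize pipeline, the sketch below shows one way cosine-based agglomerative clustering can work. It is not the implementation behind `agglomerative_cluster`: the centroid linkage, the reading of `merge_threshold` as a maximum cosine distance, and the handling of `max_speakers` are all assumptions, and `cluster_sketch` is a hypothetical name.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat a zero vector as unrelated to everything
    return 1.0 - dot / (na * nb)


def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cluster_sketch(embeddings, merge_threshold=0.5, max_speakers=8):
    """Greedy centroid-linkage clustering; returns one integer label per embedding."""
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            ca = centroid([embeddings[i] for i in clusters[a]])
            for b in range(a + 1, len(clusters)):
                cb = centroid([embeddings[i] for i in clusters[b]])
                d = cosine_distance(ca, cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Stop once the closest pair is too far apart and the speaker cap is met.
        if d > merge_threshold and len(clusters) <= max_speakers:
            break
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy 2-D "embeddings": two near [1, 0] and two near [0, 1].
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = cluster_sketch(toy, merge_threshold=0.5, max_speakers=8)
print(toy_labels)  # prints [0, 0, 1, 1]
```

Real speaker embeddings have far more dimensions, but the loop is the same: merge the closest pair until nothing is closer than the threshold.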

Rust

use crispasr::{
    Session, DiarizeMethod, DiarizeOptions, DiarizeSegment,
    LidMethod, detect_language_pcm, align_words,
    cache_ensure_file, registry_lookup,
    // Diarize pipeline primitives (#107):
    SpeakerEmbedder, PyannoteCache, agglomerative_cluster,
};

let sess = Session::open("cohere-transcribe-q4_k.gguf", 4)?;
sess.set_max_new_tokens(256)?;
sess.set_frequency_penalty(0.4)?;
let segs = sess.transcribe_vad(&pcm, "silero-v5.1.2.bin", None)?;

let entry = registry_lookup("canary")?.unwrap();
let path  = cache_ensure_file(&entry.filename, &entry.url, false, None)?;

// Custom diarize pipeline: pluggable embedder + cosine clustering.
let emb = SpeakerEmbedder::new("auto", 4, None)?;     // "titanet"/"indextts"/.gguf
let mut flat: Vec<f32> = Vec::new();
for s in &segs {
    if let Some(v) = emb.embed(&pcm[(s.t0 * 16000.0) as usize .. (s.t1 * 16000.0) as usize]) {
        flat.extend(v);
    }
}
let labels = agglomerative_cluster(&flat, (flat.len() / emb.dim() as usize) as i32,
                                   emb.dim(), 0.5, 8)?;

Crate: bindings/rust/.

Dart / Flutter

import 'package:crispasr/crispasr.dart' as crispasr;

final sess = crispasr.CrispasrSession.open(modelPath, backend: 'parakeet');
final segs = sess.transcribeVad(pcm, vadModelPath);

final lang = crispasr.detectLanguagePcm(
  pcm: pcm, method: crispasr.LidMethod.whisper, modelPath: tinyPath);
final words = crispasr.alignWords(
  alignerModel: ctcPath, transcript: text, pcm: pcm);

Package: flutter/crispasr/.

Reference application: CrisperWeaver — a cross-platform Flutter desktop/mobile transcription app built on package:crispasr. Ships with a model browser + downloader (all 10 backends + quants), drag-and-drop files, mic capture, SRT/VTT/TXT export, per-run performance metrics, and full en/de i18n. The v0.5.4 release uses transcribeVad so every non-whisper backend benefits from stitched Silero VAD with zero CrisperWeaver-side work.

Go

import "github.com/CrispStrobe/CrispASR/bindings/go/crispasr"

sess, _ := crispasr.OpenSession("parakeet.gguf", crispasr.SessionOpts{Threads: 4})
defer sess.Close()
_ = sess.SetMaxNewTokens(256)
_ = sess.SetFrequencyPenalty(0.4)
segs, _ := sess.Transcribe(pcm, crispasr.TranscribeOpts{Vad: true})

Module: bindings/go/crispasr/.

Java

import org.crispasr.CrispASR;

try (var sess = CrispASR.openSession("granite-speech.gguf")) {
    sess.setMaxNewTokens(256);
    sess.setFrequencyPenalty(0.4f);
    var segs = sess.transcribe(pcm);
}

JAR: bindings/java/.

Ruby

require "crispasr"

sess = CrispASR::Session.open("parakeet.gguf")
segs = sess.transcribe(pcm)

Gem: bindings/ruby/.

Mobile

./build-ios.sh                    # iOS xcframework with Metal
./build-android.sh --vulkan       # Android NDK with Vulkan GPU

The xcframework drops into a Swift/Objective-C app via package add crispasr.xcframework; the Android NDK build produces an .so that Flutter or native Android consumes through package:crispasr's FFI layer.
