tutorials/audio/callhome_diar/README.md
Lines changed: 6 additions & 5 deletions
@@ -1,14 +1,14 @@
 # Speaker Diarization on CallHome English with NeMo Curator
 
-This tutorial runs [Streaming Sortformer](https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2) speaker diarization on the [CallHome English](https://catalog.ldc.upenn.edu/LDC97S42) dataset using NeMo Curator's `InferenceSortformerStage`, then evaluates Diarization Error Rate (DER).
+This tutorial runs [Streaming Sortformer](https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2.1) speaker diarization on the [CallHome English](https://catalog.ldc.upenn.edu/LDC97S42) dataset using NeMo Curator's `InferenceSortformerStage`, then evaluates Diarization Error Rate (DER).
 
 Inference runs in parallel via `Pipeline` + `XennaExecutor` for high throughput.
 
 ## Prerequisites
 
 - Python 3.10+
 - NeMo Curator installed (see [installation guide](https://docs.nvidia.com/nemo/curator/latest/admin/installation.html))
-- [`sox`](https://sox.sourceforge.net/) command-line tool (for stereo-to-mono conversion; install via `apt install sox`, `brew install sox`, or `conda install -c conda-forge sox`)
+- [`ffmpeg`](https://ffmpeg.org/) command-line tool (for stereo-to-mono conversion; pre-installed in the NeMo Curator container)
 - CallHome English dataset with `.wav` files and `eng/*.cha` ground-truth annotations
 
 ### Dataset layout
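This change replaces `sox` with `ffmpeg` for the stereo-to-mono step. The diff does not show the command `EnsureMonoStage` actually runs, so the following is only a sketch under that assumption. It uses two hypothetical helpers (`build_mono_cmd`, `to_mono`) built on ffmpeg's standard `-ac 1` (mix down to one channel) and `-ar` (resample) options, and it produces the mono 16 kHz file the README describes.

```python
import shutil
import subprocess
from pathlib import Path


def build_mono_cmd(src: Path, dst: Path, sample_rate: int = 16_000) -> list[str]:
    """Hypothetical helper: ffmpeg argv that downmixes `src` to a mono WAV at `sample_rate` Hz."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-loglevel", "error",     # keep ffmpeg quiet unless something fails
        "-i", str(src),
        "-ac", "1",               # one output channel: both speaker channels are mixed together
        "-ar", str(sample_rate),  # resample to the target rate
        str(dst),
    ]


def to_mono(src: Path, dst: Path, sample_rate: int = 16_000) -> Path:
    """Hypothetical helper: run the conversion; raises CalledProcessError if ffmpeg fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    dst.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_mono_cmd(src, dst, sample_rate), check=True)
    return dst
```

Keeping the argv builder separate from the `subprocess` call lets you inspect or log the exact command without needing ffmpeg installed. A call would look like `to_mono(Path("call.wav"), Path("mono/call.wav"))`.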
@@ -51,7 +51,7 @@ Key arguments:
 | `--output-dir` | `output` | Root for RTTM files, results JSON, and checkpoints |
 | `--collar` | `0.25` | Collar tolerance (seconds) for DER scoring |
 | `--clean` | off | Remove entire output directory before re-running |
-| `--model` | `nvidia/diar_streaming_sortformer_4spk-v2` | Hugging Face model id |
+| `--model` | `nvidia/diar_streaming_sortformer_4spk-v2.1` | Hugging Face model id |
 
 ### Streaming configuration
 
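The README gives all streaming-configuration values in 80 ms frames. The following conversion sketch assumes nothing beyond that frame size. The helper names are hypothetical, and the numbers in the example are illustrative rather than the README's defaults.

```python
import math

FRAME_SEC = 0.08  # one streaming frame = 80 ms, as stated in the README


def frames_to_seconds(frames: int) -> float:
    """Convert a frame count (e.g. a chunk length) to seconds."""
    return round(frames * FRAME_SEC, 3)


def seconds_to_frames(seconds: float) -> int:
    """Smallest whole number of frames covering `seconds` (epsilon absorbs float error)."""
    return math.ceil(seconds / FRAME_SEC - 1e-9)


if __name__ == "__main__":
    print(frames_to_seconds(10))   # 10 frames -> 0.8 s
    print(seconds_to_frames(2.0))  # 2 s -> 25 frames
```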
@@ -67,7 +67,7 @@ All values are in **80 ms frames**. Override via `--chunk-len`, `--chunk-right-c
 ## What the script does
 
 1. **File discovery (`CallHomeReaderStage`)** — Scans the dataset directory for WAV files with matching `.cha` annotations, skipping already-processed files. Emits one `AudioTask` per file.
-2. **Mono conversion (`EnsureMonoStage`)** — CallHome WAVs are stereo (one channel per speaker). This stage downmixes to mono 16 kHz via `sox` so the model sees both speakers.
+2. **Mono conversion (`EnsureMonoStage`)** — CallHome WAVs are stereo (one channel per speaker). This stage downmixes to mono 16 kHz via `ffmpeg` so the model sees both speakers.
 3. **Diarization inference (`InferenceSortformerStage`)** — Runs Streaming Sortformer on each mono file. Also writes RTTM files to `<output-dir>/rttm/`.
 4. **DER evaluation (`DERComputationStage`)** — Compares predicted segments against CHA ground truth. Scoring is restricted to the UEM region (min/max annotated timestamps from CHA) with a configurable collar tolerance (default 0.25 s).
 
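Step 4 scores predictions with DER restricted to a UEM region, with a collar around reference boundaries. The code of `DERComputationStage` is not part of this diff. What follows is therefore a simplified frame-level sketch of the standard definition, DER = (missed speech + false alarm + speaker confusion) / scored reference speech. It assumes segments are already parsed into `(start, end, speaker)` tuples, since CHA parsing is not shown. The names `frame_der` and `_activity` are hypothetical. The optimal one-to-one speaker mapping is found by brute force, which is only practical for a few speakers.

```python
import itertools
from collections import defaultdict

Segment = tuple[float, float, str]  # (start_s, end_s, speaker_label)


def _activity(segments: list[Segment], n: int, step: float) -> dict[str, list[bool]]:
    """Per-speaker boolean activity over n frames of `step` seconds each."""
    act: dict[str, list[bool]] = defaultdict(lambda: [False] * n)
    for start, end, spk in segments:
        for i in range(max(0, round(start / step)), min(n, round(end / step))):
            act[spk][i] = True
    return dict(act)


def frame_der(ref: list[Segment], hyp: list[Segment], uem: tuple[float, float],
              collar: float = 0.25, step: float = 0.01) -> float:
    """Frame-level DER = (miss + false alarm + confusion) / scored reference speech."""
    n = round(uem[1] / step)
    first = round(uem[0] / step)
    # Score only frames inside the UEM region ...
    scored = [i >= first for i in range(n)]
    # ... and outside +/- collar around every reference segment boundary.
    for start, end, _ in ref:
        for t in (start, end):
            lo = max(0, round((t - collar) / step))
            hi = min(n, round((t + collar) / step))
            for i in range(lo, hi):
                scored[i] = False

    r = _activity(ref, n, step)
    h = _activity(hyp, n, step)
    frames = [i for i in range(n) if scored[i]]

    # Overlap (in scored frames) for every hyp/ref speaker pair.
    ov = {(hs, rs): sum(h[hs][i] and r[rs][i] for i in frames) for hs in h for rs in r}
    # Brute-force the one-to-one hyp->ref mapping with maximum overlap (None = unmapped).
    refs = list(r) + [None] * max(0, len(h) - len(r))
    best: dict = {}
    best_score = -1
    for perm in itertools.permutations(refs, len(h)):
        score = sum(ov[hs, rs] for hs, rs in zip(h, perm) if rs is not None)
        if score > best_score:
            best, best_score = dict(zip(h, perm)), score

    err = total = 0
    for i in frames:
        n_ref = sum(a[i] for a in r.values())
        n_hyp = sum(a[i] for a in h.values())
        n_ok = sum(1 for hs, rs in best.items() if rs is not None and h[hs][i] and r[rs][i])
        err += max(n_ref, n_hyp) - n_ok  # miss + false alarm + confusion in this frame
        total += n_ref
    return err / total if total else 0.0
```

Dedicated scorers work on exact segment boundaries rather than a 10 ms grid, so numbers from this sketch can differ slightly from theirs.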
@@ -102,7 +102,7 @@ pipeline = Pipeline(
     stages=[
         MyAudioReaderStage(data_dir="/path/to/audio"), # your reader stage
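Step 3 of the README writes RTTM files to `<output-dir>/rttm/`. The writer inside `InferenceSortformerStage` is not shown in this diff. The sketch below assumes it follows the standard NIST RTTM layout, with ten space-separated fields: type, file ID, channel, onset, duration, two `<NA>` placeholders, speaker, and two more `<NA>`. `to_rttm_lines`, `write_rttm`, and the `<file_id>.rttm` naming are hypothetical.

```python
from pathlib import Path

Segment = tuple[float, float, str]  # (start_s, end_s, speaker_label)


def to_rttm_lines(file_id: str, segments: list[Segment], channel: int = 1) -> list[str]:
    """Format segments as NIST RTTM SPEAKER lines, sorted by onset."""
    lines = []
    for start, end, speaker in sorted(segments):
        duration = end - start
        if duration <= 0:
            continue  # skip empty or inverted segments
        lines.append(
            f"SPEAKER {file_id} {channel} {start:.3f} {duration:.3f} "
            f"<NA> <NA> {speaker} <NA> <NA>"
        )
    return lines


def write_rttm(out_dir: Path, file_id: str, segments: list[Segment]) -> Path:
    """Write <out_dir>/<file_id>.rttm (hypothetical naming) and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{file_id}.rttm"
    path.write_text("\n".join(to_rttm_lines(file_id, segments)) + "\n")
    return path
```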