Commit aa1bc0d

ruby : add VAD::Context#segments_from_samples, allow Pathname, etc. (#3633)
* ruby : Bump version to 1.3.6
* Fix code in example
* Add sample code to transcribe from MemoryView
* Define GetVADContext macro
* Use GetVADContext
* Extract parse_full_args function
* Use parse_full_args in ruby_whisper_full_parallel
* Free samples after use
* Check return value of parse_full_args()
* Define GetVADParams macro
* Add VAD::Context#segments_from_samples
* Add tests for VAD::Context#segments_from_samples
* Add signature for VAD::Context#segments_from_samples
* Add sample code for VAD::Context#segments_from_samples
* Add test for Whisper::Context#transcribe with Pathname
* Make Whisper::Context#transcribe and Whisper::VAD::Context#detect accept Pathname
* Update signature of Whisper::Context#transcribe
* Fix variable name
* Don't free memory view
* Make parse_full_args return struct
* Fallback when failed to get MemoryView
* Add num of samples when too long
* Check members of MemoryView
* Fix a typo
* Remove unnecessary include
* Fix a typo
* Fix a typo
* Care the case of MemoryView doesn't fit spec
* Add TODO comment
* Add optimization option to compiler flags
* Use ALLOC_N instead of malloc
* Add description to sample code
* Rename and change args: parse_full_args -> parse_samples
* Free samples when exception raised
* Assign type check result to a variable
* Define wrapper function of whisper_full
* Change signature of parse_samples for rb_ensure
* Ensure release MemoryView
* Extract fill_samples function
* Free samples memory when filling it failed
* Free samples memory when transcription failed
* Prepare transcription in wrapper function
* Change function name
* Simplify function boundary
1 parent bf422cb commit aa1bc0d
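
The commit message says `Whisper::Context#transcribe` and `Whisper::VAD::Context#detect` now accept a `Pathname`. How the C extension converts the argument is not visible in the hunks shown here, so the following is only a Ruby-level illustration of the idea: the standard library's `File.path` takes either a String or an object responding to `to_path` and returns a String. `normalize_audio_path` is a hypothetical helper, not part of whispercpp.

```ruby
require "pathname"

# Hypothetical helper (not whispercpp API): accepts a String or any object
# responding to `to_path`, such as Pathname, and returns a String path.
def normalize_audio_path(path)
  File.path(path)
end

from_string = normalize_audio_path("test/fixtures/jfk.wav")
from_pathname = normalize_audio_path(Pathname("test/fixtures") / "jfk.wav")

puts from_string
puts from_pathname
```

Both calls yield the same String, which is what lets a caller pass either form without converting it first.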

19 files changed: +396 -150 lines changed

bindings/ruby/README.md

Lines changed: 33 additions & 2 deletions
@@ -323,7 +323,24 @@ whisper
   end
 ```
 
-The second argument `samples` may be an array, an object with `length` and `each` method, or a MemoryView. If you can prepare audio data as C array and export it as a MemoryView, whispercpp accepts and works with it with zero copy.
+The second argument `samples` may be an array, an object with `length` and `each` method, or a MemoryView.
+
+If you can prepare audio data as C array and export it as a MemoryView, whispercpp accepts and works with it with zero copy.
+
+```ruby
+require "torchaudio"
+require "arrow-numo-narray"
+require "whisper"
+
+waveform, sample_rate = TorchAudio.load("test/fixtures/jfk.wav")
+# Convert Torch::Tensor to Arrow::Array via Numo::NArray
+samples = waveform.squeeze.numo.to_arrow.to_arrow_array
+
+whisper = Whisper::Context.new("base")
+whisper
+  # Arrow::Array exports MemoryView
+  .full(Whisper::Params.new, samples)
+```
 
 Using VAD separately from ASR
 -----------------------------
@@ -334,13 +351,27 @@ VAD feature itself is useful. You can use it separately from ASR:
 vad = Whisper::VAD::Context.new("silero-v6.2.0")
 vad
   .detect("path/to/audio.wav", Whisper::VAD::Params.new)
-  .each_with_index do |segment, index|
+  .each.with_index do |segment, index|
     segment => {start_time: st, end_time: ed} # `Segment` responds to `#deconstruct_keys`
 
     puts "[%{nth}: %{st} --> %{ed}]" % {nth: index + 1, st:, ed:}
   end
 ```
 
+You may also low level API `Whisper::VAD::Context#segments_from_samples` as such `Whisper::Context#full`:
+
+```ruby
+# Ruby Array
+reader = WaveFile::Reader.new("path/to/audio.wav", WaveFile::Format.new(:mono, :float, 16000))
+samples = reader.enum_for(:each_buffer).map(&:samples).flatten
+
+# Or, object which exports MemoryView
+waveform, sample_rate = TorchAudio.load("test/fixtures/jfk.wav")
+samples = waveform.squeeze.numo.to_arrow.to_arrow_array
+
+segments = vad.segments_from_samples(Whisper::VAD::Params.new, samples)
+```
+
 Development
 -----------
 

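The VAD hunk above switches the README example from `.each_with_index` to `.each.with_index` and relies on `Segment` responding to `#deconstruct_keys`. The sketch below shows why that combination works on an object that only defines `each`. `Segment` and `Segments` here are hypothetical stand-ins, not the gem's classes, and the assumption that the real result object lacks `each_with_index` is a guess that the commit does not confirm.

```ruby
# Hypothetical stand-in for a VAD segment (not the gem's class).
class Segment
  def initialize(start_time, end_time)
    @start_time = start_time
    @end_time = end_time
  end

  # Enables `segment => {start_time: st, end_time: ed}` pattern matching.
  def deconstruct_keys(_keys)
    {start_time: @start_time, end_time: @end_time}
  end
end

# Hypothetical collection that defines `each` but does not include Enumerable.
class Segments
  def initialize(list)
    @list = list
  end

  # Without a block, return an Enumerator so `.each.with_index` can chain.
  def each(&block)
    return enum_for(:each) unless block

    @list.each(&block)
    self
  end
end

segments = Segments.new([Segment.new(0.0, 1.5), Segment.new(2.0, 3.25)])

lines = segments.each.with_index.map do |segment, index|
  segment => {start_time: st, end_time: ed}
  "[%{nth}: %{st} --> %{ed}]" % {nth: index + 1, st: st, ed: ed}
end

puts lines
```

Because `each` returns an Enumerator when called without a block, `with_index` is available even though the collection itself never defines `each_with_index`.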
bindings/ruby/ext/extconf.rb

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@
 have_library("gomp") rescue nil
 libs = Dependencies.new(cmake, options).to_s
 
+$CFLAGS << " -O3 -march=native"
 $INCFLAGS << " -Isources/include -Isources/ggml/include -Isources/examples"
 $LOCAL_LIBS << " #{libs}"
 $cleanfiles << " build #{libs}"

bindings/ruby/ext/ruby_whisper.c

Lines changed: 0 additions & 2 deletions
@@ -1,5 +1,3 @@
-#include <ruby.h>
-#include <ruby/memory_view.h>
 #include "ruby_whisper.h"
 
 VALUE mWhisper;

bindings/ruby/ext/ruby_whisper.h

Lines changed: 20 additions & 0 deletions
@@ -1,6 +1,8 @@
 #ifndef RUBY_WHISPER_H
 #define RUBY_WHISPER_H
 
+#include <ruby.h>
+#include <ruby/memory_view.h>
 #include "whisper.h"
 
 typedef struct {
@@ -55,6 +57,13 @@ typedef struct {
   struct whisper_vad_context *context;
 } ruby_whisper_vad_context;
 
+typedef struct parsed_samples_t {
+  float *samples;
+  int n_samples;
+  rb_memory_view_t memview;
+  bool memview_exported;
+} parsed_samples_t;
+
 #define GetContext(obj, rw) do { \
   TypedData_Get_Struct((obj), ruby_whisper, &ruby_whisper_type, (rw)); \
   if ((rw)->context == NULL) { \
@@ -69,6 +78,17 @@ typedef struct {
   } \
 } while (0)
 
+#define GetVADContext(obj, rwvc) do { \
+  TypedData_Get_Struct((obj), ruby_whisper_vad_context, &ruby_whisper_vad_context_type, (rwvc)); \
+  if ((rwvc)->context == NULL) { \
+    rb_raise(rb_eRuntimeError, "Not initialized"); \
+  } \
+} while (0)
+
+#define GetVADParams(obj, rwvp) do { \
+  TypedData_Get_Struct((obj), ruby_whisper_vad_params, &ruby_whisper_vad_params_type, (rwvp)); \
+} while (0)
+
 #define GetVADSegments(obj, rwvss) do { \
   TypedData_Get_Struct((obj), ruby_whisper_vad_segments, &ruby_whisper_vad_segments_type, (rwvss)); \
   if ((rwvss)->segments == NULL) { \
