
Commit 0d14756

ruby : add support for Parakeet (ggml-org#3885)
* Add Whisper::Parakeet::Params
* Add tests for Parakeet::Params
* Remove unused variable
* Add callbacks to Parakeet::Params
* Group callback and user_data params
* Undefine local macros
* Define GetParakeetParams
* Remove unused variable
* Use ITERATE_CALLBACK_PARAMS
* Use ITERATE_CALLBACK_PARAMS instead of ITERATE_USER_DATA_PARAMS
* Fix memsize
* Remove unnecessary macros
* Simplify params registration
* Define Parakeet
* Add hook methods to Parakeet::Params
* Fix typo
* Check callback container in GetParakeetParams
* Reduce if
* Free parakeet_full_params
* Implement Parakeet::Context#initialize
* Add TestParakeetContext
* Add Parakeet::Segment
* Prevent double-free
* Add Parakeet::Context#transcribe
* Add Parakeet::Context#each_segment
* Define Parakeet::Segment attributes
* Define Parakeet::Segment#deconstruct_keys
* Add tests for Parakeet::Segment#deconstruct_keys
* Run Parakeet::Context#transcribe without GVL
* Make it to abort for Parakeet
* Add Parakeet.log_set
* Define Parakeet::Token
* Define Parakeet::Segment#each_token
* Implement some hooks of Parakeet::Params
* Convert int to VALUE
* Implement hooks for Parakeet
* Implement Parakeet::Context#full
* Add tests for Parakeet::Context#full
* Add Parakeet to RBS
* Fix ruby_whisper_parakeet_params_free
* Free ruby_whisper_parakeet_context
* Add tests for hooks
* Add Parakeet section to README
* Add more attributes of Parakeet::Context
* Add tests for Parakeet::Context's attributes
* Update RBS
* Register parakeet-tdt-0.6b-v3
* Narrow scope of log constants
* Extract activate and deactivate of log_queue
* Make start_log_callback_thread private
* Don't call start_log_callback_thread unnecessarily
* Early return from log_queue_enqueue when not active
* Group log_queue members
* is_active -> is_open
* Fix English
* Share parakeet full body function
* ruby_whisper_parakeet_abort_callback_user_data -> ruby_whisper_abort_callback_user_data
* NULL check for callback containers
* Fix Parakeet.log_set
* Omit Parakeet tests on CI
* Extract Whisper::LogSettable
* Join log callback thread in a log queue function
* Revert Join log callback thread in a log queue function
* Extract output methods to modules
* Move Parakeet init functions into init_parakeet()
* Add output methods to Parakeet classes
* Add Parakeet's output methods to RBS
* Use Whisper::Output in RBS
* Add LogSettable to RBS
* Fix module Token -> class Token
* Add Parakeet::Model
* Add test for Parakeet::Model
* Add Parakeet::Model to RBS
* Move position of Parakeet::Model in RBS
* Parakeet -> TestBase::Parakeet
* Add Parakeet::Context#model in RBS
* Add Whisper::Output
* Fix nil check
* Define ruby_whisper_parakeet_model_memsize
* Fix order of declaration in ruby_whisper_parakeet_model_get_xxx
* Define Parakeet.system_info_str
* Add test for Parakeet.system_info_str
* Add signature of Parakeet.system_info_str
* Define Parakeet::VERSION
* Add test for Parakeet::VERSION
* Add signature of Parakeet::VERSION
* Add Parakeet::Context::Params
* Make Parakeet::Context.new accept Context::Params
* Add test for Parakeet::Context.new with Context::Params
* Update RBS
* Remove params from Parakeet::Params which are moved from whisper_parakeet_full_params
* Remove tests for removed params
* Make Parakeet tests follow original behavior changes
* Add Parakeet model shortcuts
* Alloc token data in factory instead of alloc func
* Fix variable name
* Update RBS
* Refactor log settable module
* Use log settable for Whisper
* Address deadlock
* Make test follow change of log queue implementation
* Refactor to make abort callback use the same way to parakeet's way
* Remove redundant structs
* Fix test name
* Fix README
* Add missing parallel transcription
* Fix test for parakeet info
* Remove removed params
* Wait for logs dequeued
* Fix instance variable name
* Load etc feature
* Remove unnecessary comment
* Remove unnecessary thread safety check
* Remove outdated comment
* Skip downloading model if cache exists
* Change Hugging Face URI for Parakeet models
* Bump required Ruby version to 3.3
* Fix English
1 parent 9efddaf commit 0d14756

38 files changed

Lines changed: 3005 additions & 333 deletions

.github/workflows/bindings-ruby.yml

Lines changed: 1 addition & 1 deletion
@@ -27,6 +27,6 @@ jobs:
     steps:
       - uses: ruby/setup-ruby@afeafc3d1ab54a631816aba4c914a0081c12ff2f # v1.310.0
         with:
-          ruby-version: '3.2'
+          ruby-version: '3.3'
       - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6
       - run: rake test

bindings/ruby/README.md

Lines changed: 31 additions & 0 deletions
@@ -396,6 +396,37 @@ whisper
   .full(Whisper::Params.new, samples)
 ```
 
+### Parakeet ###
+
+whispercpp gem now supports NVIDIA's ASR model Parakeet.
+
+If you want to use Parakeet instead of Whisper, the API should feel familiar.
+In most cases, replace `Whisper::Context` and `Whisper::Params` with `Whisper::Parakeet::Context` and `Whisper::Parakeet::Params`, then use `#transcribe`, `#full`, `#each_segment`, and `#each_token` in the same way.
+
+```ruby
+require "whisper"
+
+# It's useful to assign Whisper::Parakeet to top-level Parakeet constant unless you use Parakeet gem.
+Parakeet = Whisper::Parakeet
+
+parakeet = Parakeet::Context.new("path/to/model")
+
+params = Parakeet::Params.new(
+  no_context: true
+)
+
+parakeet
+  .transcribe("path/to/audio.wav", params)
+  .each_segment do |segment|
+    puts "[#{segment.start_time} --> #{segment.end_time}] #{segment.text}"
+  end
+```
+
+The main differences are:
+
+* Namespace is `Whisper::Parakeet`.
+* Parakeet also supports `on_new_token` / `new_token_callback` in addition to segment and progress callbacks.
+
 Custom context params
 ---------------------
 

bindings/ruby/Rakefile

Lines changed: 16 additions & 1 deletion
@@ -84,6 +84,21 @@ else
   end
 end
 
+TEST_PARAKEET_MODEL = "test/fixtures/for-tests-ggml-parakeet-tdt.bin"
+TEST_PARAKEET_MODEL_SRC = File.expand_path(File.join(__dir__, "..", "..", "models", "for-tests-ggml-parakeet-tdt.bin"))
+TEST_PARAKEET_MODEL_DIR = TEST_PARAKEET_MODEL.pathmap("%d")
+directory TEST_PARAKEET_MODEL_DIR
+if File.exist? TEST_PARAKEET_MODEL_SRC
+  file TEST_PARAKEET_MODEL => [TEST_PARAKEET_MODEL_SRC, TEST_PARAKEET_MODEL_DIR] do |t|
+    symlink t.source, t.name
+  end
+else
+  require "open-uri"
+  file TEST_PARAKEET_MODEL => TEST_PARAKEET_MODEL_DIR do |t|
+    File.write t.name, URI("https://github.com/ggml-org/whisper.cpp/raw/refs/heads/master/models/for-tests-ggml-parakeet-tdt.bin").read
+  end
+end
+
 TEST_MEMORY_VIEW = "test/jfk_reader/jfk_reader.#{RbConfig::CONFIG['DLEXT']}"
 file TEST_MEMORY_VIEW => "test/jfk_reader/jfk_reader.c" do |t|
   chdir "test/jfk_reader" do
@@ -93,4 +108,4 @@ file TEST_MEMORY_VIEW => "test/jfk_reader/jfk_reader.c" do |t|
 end
 CLEAN.include TEST_MEMORY_VIEW
 
-task test: [LIB_FILE, TEST_MEMORY_VIEW, TEST_FIXTURE_AUDIO]
+task test: [LIB_FILE, TEST_MEMORY_VIEW, TEST_FIXTURE_AUDIO, TEST_PARAKEET_MODEL]
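
Note on the Rakefile hunk above: the new tasks link the test model from the repository's `models` directory when it exists there and download it otherwise. The same decision can be sketched outside Rake as a plain Ruby method. This is only an illustration under stated assumptions: `provide_fixture` is a hypothetical helper that does not appear in the commit, the download is represented by a block instead of `open-uri`, and the sketch uses `File.binwrite` (the Rakefile uses `File.write`) so that model bytes pass through unchanged on platforms that translate line endings.

```ruby
require "fileutils"

# Hypothetical helper, not part of the Rakefile: link a local copy of a
# fixture when one exists, otherwise write the bytes produced by the block.
def provide_fixture(dest, local_src)
  FileUtils.mkdir_p(File.dirname(dest))
  return dest if File.exist?(dest)   # reuse an existing fixture

  if File.exist?(local_src)
    File.symlink(local_src, dest)    # no copy and no network access
  else
    File.binwrite(dest, yield)       # binary-safe write of the fetched bytes
  end
  dest
end
```

A caller would pass a block that returns the model bytes, for example one that reads the URL used in the Rakefile; the block only runs when no local copy is found.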

bindings/ruby/ext/ruby_whisper.c

Lines changed: 36 additions & 80 deletions
@@ -1,19 +1,29 @@
 #include "ruby_whisper.h"
 
 VALUE mWhisper;
+VALUE mLogSettable;
 VALUE mVAD;
+VALUE mParakeet;
 VALUE cContext;
 VALUE cParams;
 VALUE cVADContext;
 VALUE cVADParams;
 VALUE cVADSegments;
 VALUE cVADSegment;
+VALUE cParakeetContext;
+VALUE cParakeetContextParams;
+VALUE cParakeetParams;
+VALUE cParakeetSegment;
+VALUE cParakeetModel;
 VALUE eError;
 
 VALUE cSegment;
 VALUE cToken;
 VALUE cModel;
 
+VALUE mOutputContext;
+VALUE mOutputSegment;
+
 ID id_to_s;
 ID id_call;
 ID id___method__;
@@ -27,9 +37,11 @@ ID id_pre_converted_models;
 ID id_coreml_compiled_models;
 ID id_cache;
 ID id_n_processors;
-
-static bool is_log_callback_finalized = false;
-static bool is_ruby_log_callback_present = false;
+ID id_extended;
+ID id_start_log_callback_thread;
+ID id_log_callback_thread;
+ID id_alive_p;
+ID id_join;
 
 // High level API
 extern VALUE ruby_whisper_segment_allocate(VALUE klass);
@@ -45,8 +57,13 @@ extern void init_ruby_whisper_vad_params(VALUE *mVAD);
 extern void init_ruby_whisper_vad_context(VALUE *mVAD);
 extern void init_ruby_whisper_vad_segment(VALUE *mVAD);
 extern void init_ruby_whisper_vad_segments(VALUE *mVAD);
+extern void init_ruby_whisper_parakeet(VALUE *mWhisper);
 extern void register_callbacks(ruby_whisper_params *rwp, VALUE *context);
 
+static ruby_whisper_log_queue whisper_log_queue;
+
+LOG_SETTABLE_SETUP(whisper_log_queue, mWhisper, whisper_log_set)
+
 /*
  * call-seq:
  *   lang_max_id -> Integer
@@ -102,79 +119,6 @@ static VALUE ruby_whisper_s_system_info_str(VALUE self) {
   return rb_str_new2(whisper_print_system_info());
 }
 
-static VALUE ruby_whisper_s_finalize_log_callback(VALUE self, VALUE id) {
-  is_log_callback_finalized = true;
-  return Qnil;
-}
-
-typedef struct {
-  int level;
-  const char * buffer;
-} call_log_callbacks_args;
-
-static void*
-call_log_callbacks(void *v_args) {
-  VALUE log_callback = rb_iv_get(mWhisper, "log_callback");
-  if (NIL_P(log_callback)) {
-    return NULL;
-  }
-
-  call_log_callbacks_args *args = (call_log_callbacks_args *)v_args;
-  VALUE user_data = rb_iv_get(mWhisper, "user_data");
-  rb_funcall(log_callback, id_call, 3, INT2NUM(args->level), rb_str_new2(args->buffer), user_data);
-
-  return NULL;
-}
-
-static void
-ruby_whisper_log_callback(enum ggml_log_level level, const char * buffer, void * user_data) {
-  if (is_log_callback_finalized) {
-    return;
-  }
-  if (!is_ruby_log_callback_present) {
-    return;
-  }
-
-  call_log_callbacks_args args = {
-    level,
-    buffer,
-  };
-  if (ruby_thread_has_gvl_p()) {
-    call_log_callbacks((void *)&args);
-  } else {
-    rb_thread_call_with_gvl(call_log_callbacks, (void *)&args);
-  }
-}
-
-/*
- * call-seq:
- *   log_set ->(level, buffer, user_data) { ... }, user_data -> nil
- */
-static VALUE ruby_whisper_s_log_set(VALUE self, VALUE log_callback, VALUE user_data) {
-  VALUE old_callback = rb_iv_get(self, "log_callback");
-  if (!NIL_P(old_callback)) {
-    rb_undefine_finalizer(old_callback);
-  }
-
-  rb_iv_set(self, "log_callback", log_callback);
-  rb_iv_set(self, "user_data", user_data);
-
-  if (!NIL_P(log_callback)) {
-    VALUE finalize_log_callback = rb_funcall(mWhisper, rb_intern("method"), 1, rb_str_new2("finalize_log_callback"));
-    rb_define_finalizer(log_callback, finalize_log_callback);
-  }
-
-  if (NIL_P(log_callback)) {
-    whisper_log_set(NULL, NULL);
-    is_ruby_log_callback_present = false;
-  } else {
-    whisper_log_set(ruby_whisper_log_callback, NULL);
-    is_ruby_log_callback_present = true;
-  }
-
-  return Qnil;
-}
-
 void Init_whisper() {
   id_to_s = rb_intern("to_s");
   id_call = rb_intern("call");
@@ -189,9 +133,19 @@ void Init_whisper() {
   id_coreml_compiled_models = rb_intern("coreml_compiled_models");
   id_cache = rb_intern("cache");
   id_n_processors = rb_intern("n_processors");
+  id_extended = rb_intern("extended");
+  id_start_log_callback_thread = rb_intern("start_log_callback_thread");
+  id_log_callback_thread = rb_intern("@log_callback_thread");
+  id_alive_p = rb_intern("alive?");
+  id_join = rb_intern("join");
 
   mWhisper = rb_define_module("Whisper");
+  rb_require("whisper/log_settable");
+  mLogSettable = rb_path2class("Whisper::LogSettable");
   mVAD = rb_define_module_under(mWhisper, "VAD");
+  rb_require("whisper/output");
+  mOutputContext = rb_path2class("Whisper::Output::Context");
+  mOutputSegment = rb_path2class("Whisper::Output::Segment");
 
   rb_define_const(mWhisper, "VERSION", rb_str_new2(whisper_version()));
   rb_define_const(mWhisper, "LOG_LEVEL_NONE", INT2NUM(GGML_LOG_LEVEL_NONE));
@@ -222,8 +176,8 @@ void Init_whisper() {
   rb_define_singleton_method(mWhisper, "lang_str", ruby_whisper_s_lang_str, 1);
   rb_define_singleton_method(mWhisper, "lang_str_full", ruby_whisper_s_lang_str_full, 1);
   rb_define_singleton_method(mWhisper, "system_info_str", ruby_whisper_s_system_info_str, 0);
-  rb_define_singleton_method(mWhisper, "log_set", ruby_whisper_s_log_set, 2);
-  rb_define_private_method(rb_singleton_class(mWhisper), "finalize_log_callback", ruby_whisper_s_finalize_log_callback, 1);
+
+  LOG_SETTABLE_INIT(whisper_log_queue, mWhisper)
 
   cContext = init_ruby_whisper_context(&mWhisper);
   init_ruby_whisper_context_params(&cContext);
@@ -236,8 +190,10 @@ void Init_whisper() {
   init_ruby_whisper_vad_segment(&mVAD);
   init_ruby_whisper_vad_segments(&mVAD);
   init_ruby_whisper_vad_context(&mVAD);
+  init_ruby_whisper_parakeet(&mWhisper);
 
-  rb_require("whisper/context");
-  rb_require("whisper/segment");
   rb_require("whisper/model/uri");
+
+  rb_include_module(cContext, mOutputContext);
+  rb_include_module(cSegment, mOutputSegment);
 }
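
Note on the ruby_whisper.c hunks above: they remove the path that called the Ruby log callback directly through `rb_thread_call_with_gvl`, and they add IDs for `start_log_callback_thread`, `@log_callback_thread`, `alive?`, and `join`. Together with the log queue entries in the commit message, this suggests that log messages are now queued and delivered from a Ruby thread. The code that does this lives in files not shown in this excerpt, so the following is only a minimal Ruby sketch of that general pattern. `LogQueueSketch` and its methods are hypothetical names, not the extension's API.

```ruby
# Hypothetical sketch of a log queue drained by a Ruby thread.
# It illustrates the pattern only; it is not the extension's implementation.
class LogQueueSketch
  def initialize
    @queue = Thread::Queue.new
    @open = false
    @thread = nil
  end

  def open?
    @open
  end

  # Install a callback and start a consumer thread that forwards each entry.
  def open(&callback)
    close if open?
    @queue = Thread::Queue.new
    @open = true
    @thread = Thread.new do
      # pop returns nil once the queue is closed and empty, ending the loop
      while (entry = @queue.pop)
        callback.call(*entry)
      end
    end
  end

  # Producers call this; entries are dropped when the queue is not open.
  def enqueue(level, message)
    return false unless @open
    @queue << [level, message]
    true
  rescue ClosedQueueError
    false # the queue was closed between the check and the push
  end

  # Stop accepting entries, let the consumer drain the rest, then join it.
  def close
    return unless @open
    @open = false
    @queue.close
    @thread.join
    @thread = nil
  end
end
```

The early return in `enqueue` mirrors the commit message entry "Early return from log_queue_enqueue when not active", and draining before `join` corresponds to "Wait for logs dequeued"; how the C side actually achieves either is an assumption here.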
