I've set up CUDA on my GPU and, while the terminal output has changed, transcribing audio doesn't seem to be any faster. There's also no spike in GPU usage. However, no errors are thrown, so I'm not really sure what I'm doing wrong.
Stderr: error: input file not found 'true'
whisper_init_from_file_with_params_no_state: loading model from './models/ggml-base.en.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1660 SUPER, compute capability 7.5, VMM: yes
whisper_init_with_params_no_state: devices = 2
whisper_init_with_params_no_state: backends = 2
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CUDA0 total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_backend_init_gpu: using CUDA0 backend
whisper_init_state: kv self size = 6.29 MB
whisper_init_state: kv cross size = 18.87 MB
whisper_init_state: kv pad size = 3.15 MB
whisper_init_state: compute buffer (conv) = 17.22 MB
whisper_init_state: compute buffer (encode) = 85.86 MB
whisper_init_state: compute buffer (cross) = 4.65 MB
whisper_init_state: compute buffer (decode) = 97.27 MB
main: WARNING: model is not multilingual, ignoring language and translation options
system_info: n_threads = 4 / 12 | WHISPER : COREML = 0 | OPENVINO = 0 | CUDA : ARCHS = 500,610,700,750,800,860,890 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
main: processing '/home/99slayer/repos/audio-transcriber/data-files/audio/audio.wav' (3762190 samples, 235.1 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
output_json: saving output to '/home/99slayer/repos/audio-transcriber/data-files/audio/audio.wav.json'
whisper_print_timings: load time = 410.33 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 183.40 ms
whisper_print_timings: sample time = 3322.75 ms / 3817 runs ( 0.87 ms per run)
whisper_print_timings: encode time = 2693.98 ms / 9 runs ( 299.33 ms per run)
whisper_print_timings: decode time = 148.78 ms / 23 runs ( 6.47 ms per run)
whisper_print_timings: batchd time = 6594.69 ms / 3751 runs ( 1.76 ms per run)
whisper_print_timings: prompt time = 712.99 ms / 1496 runs ( 0.48 ms per run)
whisper_print_timings: total time = 14391.32 ms
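For reference, this is how I'm checking GPU usage in a second terminal while the transcription runs (it never spikes):

```shell
# Refresh nvidia-smi every second to watch utilization and VRAM during a run
watch -n 1 nvidia-smi
```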
Also, after building whisper.cpp with CUDA, it always tries to find a CUDA device regardless of whether the `withCuda` option is set to true or false. The output above had `withCuda` set to true, but when it's set to false the output is exactly the same.
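For comparison, I'd expect invoking the whisper.cpp binary directly to toggle GPU use like this (assuming `-ng`/`--no-gpu` is the right flag; paths are from my setup):

```shell
# GPU path (default after a CUDA build)
./main -m ./models/ggml-base.en.bin -f ./data-files/audio/audio.wav

# Force CPU-only, to compare timings against the GPU run
./main -m ./models/ggml-base.en.bin -f ./data-files/audio/audio.wav -ng
```

The `error: input file not found 'true'` line in the stderr output above also makes me wonder whether the wrapper is passing the boolean straight through as a positional argument instead of a flag.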