Hi, I would like to flag #2036 as not fixed. Here is a reproduction.
When running
build/bin/whisper-cli -m models/ggml-large-v3-turbo.bin --output-file domecium --output-json-full --language en -f domecium.mp3 --processors 2
on this file from this short poetry collection
domecium.mp3
whisper-cli reports that the audio has been split at around 1:35:
whisper_full_parallel: the audio has been split into 2 chunks at the following times:
whisper_full_parallel: split 1 - 00:01:35.460
whisper_full_parallel: the transcription quality may be degraded near these boundaries
Looking at domecium.json, the token-level timestamps reset at that boundary:
        {
          "text": "[_TT_1266]",
          "timestamps": {
            "from": "00:01:35,460",
            "to": "00:01:35,460"
          },
          "offsets": {
            "from": 95460,
            "to": 95460
          },
          "id": 51631,
          "p": 0.95917,
          "t_dtw": -1
        }
      ]
    },
    {
      "timestamps": {
        "from": "00:01:35,460",
        "to": "00:01:47,640"
      },
      "offsets": {
        "from": 95460,
        "to": 107640
      },
      "text": " in days bygone long gone my father's mother who is now blest with the blest would take me out to walk",
      "tokens": [
        {
          "text": "[_BEG_]",
          "timestamps": {
            "from": "00:00:00,000",
            "to": "00:00:00,000"
          },
          "offsets": {
            "from": 0,
            "to": 0
          },
          "id": 50365,
          "p": 0.997823,
          "t_dtw": -1
        },
        {
          "text": " in",
          "timestamps": {
            "from": "00:00:00,000",
            "to": "00:00:00,290"
          },
          "offsets": {
            "from": 0,
            "to": 290
          },
          "id": 294,
          "p": 0.75974,
          "t_dtw": -1
        },
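So the token-level timestamps still reset to zero at the chunk boundary: the segment starts at 00:01:35,460, but its first tokens start at 00:00:00,000. The segment-level timestamps are still correct, however.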
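In case it helps with triage, here is a rough Python sketch for spotting the reset automatically. It flags every token whose start offset is earlier than the start offset of its own segment. The field names (`offsets`, `from`, `tokens`, `text`) are the ones visible in the excerpt above. The top-level `transcription` key is an assumption about the full JSON layout, since the excerpt does not show it, and the function names are hypothetical.

```python
import json


def find_token_resets(segments):
    """Return (segment_index, token_text, token_from_ms, segment_from_ms)
    for every token that starts before its own segment starts."""
    resets = []
    for i, seg in enumerate(segments):
        seg_from = seg["offsets"]["from"]
        for tok in seg.get("tokens", []):
            tok_from = tok["offsets"]["from"]
            # A token cannot begin before the segment that contains it,
            # so a smaller offset indicates a reset timestamp.
            if tok_from < seg_from:
                resets.append((i, tok["text"], tok_from, seg_from))
    return resets


def load_segments(path):
    """Load the segment list from a --output-json-full file.
    Assumption: segments live under a top-level "transcription" key."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["transcription"]
```

Calling `find_token_resets(load_segments("domecium.json"))` should list the tokens of the second chunk if the reset is present, and return an empty list once token offsets are shifted by the chunk start.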