Commit ea120a1

committed
updated the mp3 to srt script
1 parent 0ad43e0 commit ea120a1

4 files changed

Lines changed: 32 additions & 43 deletions

tts_n_stt/README.md

Lines changed: 7 additions & 5 deletions
@@ -30,17 +30,19 @@ Run it with below command
 
 uv run stt_app.py
 
-text to
-
 ## text to speech with simpler pydub
 
 - text_to_mp3.py gives the speech file
 
-- Use the above to get synthetic speech, which is then used for speech to text in json format using stt_app.py
+- Use the above to get synthetic speech, which is then used for speech to text in json format using stt_app.py(use mp3_to_srt.py directly)
 
 - subtitles.srt is recieved in JSon Format. renamed it
 
-## Use the JSON to SRT converter:
+## Use the mp3 to SRT converter using mp3_to_srt.py:
+
+Ideally you will have your mp3 file. You will want to transcribe that.
+
+- uv run mp3_to_srt.py your_voice.mp3 description_subs.srt
 
--
+The above command will do the needful.
 

tts_n_stt/apple_description.mp3

-31.3 KB
Binary file not shown.
Lines changed: 25 additions & 9 deletions
@@ -2,12 +2,32 @@
 # requires-python = ">=3.11"
 # dependencies = [
 # "pydub",
+# "faster-whisper",
 # ]
 # ///
+
 import json
 import sys
 from pathlib import Path
 from pydub import AudioSegment
+from faster_whisper import WhisperModel
+import os
+
+# Load Whisper model (you can choose size: tiny, base, small, medium, large)
+model = WhisperModel("base", compute_type="auto")
+
+def transcribe(filepath):
+    segments, _ = model.transcribe(filepath)
+
+    result = []
+    for segment in segments:
+        result.append({
+            "start": segment.start,
+            "end": segment.end,
+            "text": segment.text
+        })
+
+    return result
 
 def format_timestamp(seconds: float) -> str:
     """Convert seconds to SRT timestamp format: HH:MM:SS,mmm"""
@@ -43,17 +63,13 @@ def generate_srt(transcript, audio_file, output_file):
     print(f"SRT file created: {output_file}")
 
 if __name__ == "__main__":
-    if len(sys.argv) != 4:
-        print("Usage: uv run generate_srt.py transcript.json input.mp3 output.srt")
+    if len(sys.argv) != 3:
+        print("Usage: uv run json_to_srt.py input.mp3 output.srt")
         sys.exit(1)
 
-    transcript_file = sys.argv[1]
-    audio_file = sys.argv[2]
-    output_file = sys.argv[3]
-
-    with open(transcript_file, "r", encoding="utf-8") as f:
-        data = json.load(f)
-    transcript = data["transcription"]
+    audio_file = sys.argv[1]
+    output_file = sys.argv[2]
 
+    transcript = transcribe(audio_file)
 
     generate_srt(transcript, audio_file, output_file)
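
The bodies of format_timestamp and generate_srt are not shown in this diff. As a rough illustration of how segment dictionaries with the "start", "end" and "text" keys built by transcribe could be rendered as SRT text, here is a minimal sketch. The names format_timestamp_sketch and segments_to_srt are hypothetical and are not taken from the commit; the sketch assumes non-negative times in seconds and may differ from what the script actually does.

```python
def format_timestamp_sketch(seconds: float) -> str:
    """Convert non-negative seconds to an SRT timestamp: HH:MM:SS,mmm.

    Hypothetical helper; the commit's own format_timestamp body is not shown.
    """
    # Work in whole milliseconds to avoid float rounding surprises.
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segment dicts (start, end, text) as SRT text.

    Hypothetical helper: each cue is an index line, a time-range line,
    and the text, with a blank line between cues.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp_sketch(seg["start"])
        end = format_timestamp_sketch(seg["end"])
        text = seg["text"].strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 1.25, "text": " Hello"},
        {"start": 1.25, "end": 3.0, "text": " world"},
    ]
    print(segments_to_srt(demo))
```

Working in integer milliseconds keeps the fractional part from drifting, and stripping each segment's text removes the leading space that speech-to-text output often carries.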

tts_n_stt/subtitles.json

Lines changed: 0 additions & 29 deletions
This file was deleted.
