Commit 214a497

tts and stt servers with readme updated
1 parent 09fac92 commit 214a497

4 files changed

Lines changed: 82 additions & 0 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@ media/
 slides/
 # Model stuff
 *.onnx
+*.bin
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]

tts_n_stt/README.md

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+This folder contains the Python servers for both
+Text to Speech and Speech to Text conversion. The
+scripts use the uv package manager for dependencies.
+You can refer to this YouTube video for more
+details: https://youtu.be/LZXps8KE4XM
+
+Text to Speech with Kokoro Model:
+
+The Kokoro TTS model files have to be downloaded for
+app.py to work.
+
+wget
+https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx
+
+wget
+https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin
+
+The app.py file is the Text to Speech Gradio
+server. Run it with the command below:
+
+uv run app.py
+
+The onnx and bin files are not committed to the
+repo, so you have to download them.
+
+Speech to Text with Whisper Model:
+
+stt_app.py is the Speech to Text Flask server.
+Run it with the command below:
+
+uv run stt_app.py
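
The README stops at starting the server. As a rough illustration of how it might be called once it is running, here is a minimal client sketch using only the Python standard library. The `/transcribe` route, the `file` form field, port 8000, and the `transcription` response key come from stt_app.py in this commit; the helper names `build_multipart` and `transcribe_file` are hypothetical and not part of the repo.

```python
import json
import sys
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body.

    Returns (body, content_type). Assumes `filename` contains no double quotes.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe_file(path: str, url: str = "http://localhost:8000/transcribe") -> list:
    """POST an audio file to the server and return the list of segments."""
    audio = Path(path)
    body, content_type = build_multipart("file", audio.name, audio.read_bytes())
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["transcription"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed): python client.py recording.wav
    for seg in transcribe_file(sys.argv[1]):
        print(f"[{seg['start']:.2f} -> {seg['end']:.2f}] {seg['text']}")
```

The multipart body is built by hand only to avoid a third-party dependency; an HTTP library with built-in file upload support would shorten this considerably.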
File renamed without changes.

tts_n_stt/stt_app.py

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
+# /// script
+# requires-python = ">=3.11"
+# dependencies = [
+#     "faster-whisper",
+#     "flask",
+# ]
+# ///
+from flask import Flask, request, jsonify
+from faster_whisper import WhisperModel
+import os
+
+app = Flask(__name__)
+
+# Load Whisper model (you can choose size: tiny, base, small, medium, large)
+model = WhisperModel("base", compute_type="auto")
+
+@app.route("/")
+def index():
+    return "Whisper Transcription API is running."
+
+@app.route("/transcribe", methods=["POST"])
+def transcribe():
+    if 'file' not in request.files:
+        return jsonify({"error": "No file uploaded"}), 400
+
+    file = request.files['file']
+    if file.filename == '':
+        return jsonify({"error": "Empty filename"}), 400
+
+    # Save file temporarily
+    filepath = os.path.join("/tmp", file.filename)
+    file.save(filepath)
+
+    segments, _ = model.transcribe(filepath)
+
+    result = []
+    for segment in segments:
+        result.append({
+            "start": segment.start,
+            "end": segment.end,
+            "text": segment.text
+        })
+
+    os.remove(filepath)  # clean up
+
+    return jsonify({"transcription": result})
+
+if __name__ == "__main__":
+    app.run(debug=True, port=8000)
+
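
One point worth flagging in the handler: it joins the client-supplied `file.filename` directly onto `/tmp`. A name containing path separators (or an absolute path) could then resolve outside `/tmp`, and two concurrent uploads with the same name would overwrite each other. The sketch below shows one possible alternative using only the standard library. `temp_upload_path` is a hypothetical helper, not part of this commit, and keeping the original extension is an assumption made so the audio decoder can still see the file type.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_upload_path(client_filename: str):
    """Yield a unique temp file path and delete the file on exit.

    Only the extension of the client-supplied name is reused; the rest of
    the name is generated by tempfile, so it cannot point outside the
    temp directory.
    """
    suffix = os.path.splitext(os.path.basename(client_filename))[1]
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # the caller reopens the path itself (e.g. file.save(path))
    try:
        yield path
    finally:
        # Clean up even if saving or transcription raised an exception
        if os.path.exists(path):
            os.remove(path)
```

Inside `transcribe()`, this could stand in for the `os.path.join` / `os.remove` pair: open `with temp_upload_path(file.filename) as filepath:`, call `file.save(filepath)` and `model.transcribe(filepath)` inside the block, and build `result` before leaving it. To my knowledge faster-whisper returns `segments` as a lazy generator, so the file has to exist until the loop over the segments has finished.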
