Commit 9c3a6d8

docs: professional README
1 parent 5706aa5 commit 9c3a6d8

1 file changed

Lines changed: 127 additions & 169 deletions

README.md

````diff
@@ -13,193 +13,149 @@ short_description: Speech & Meeting Intelligence — English · Hindi · Japanes
 
 <div align="center">
 
-# 🎙️ TranscriptAI
+# TranscriptAI
 
-### Speech & Meeting Intelligence Platform
+**Meeting intelligence that understands not just what was said — but what was meant.**
 
-<p>
-<a href="https://huggingface.co/spaces/KunalTheBeast/TranscriptAI">
-<img src="https://img.shields.io/badge/🚀%20Live%20Demo-HuggingFace-D96080?style=for-the-badge" alt="Live Demo"/>
-</a>
-&nbsp;
-<a href="https://github.com/aiKunalBisht/Transcript-ai">
-<img src="https://img.shields.io/badge/GitHub-Source%20Code-3C2416?style=for-the-badge&logo=github" alt="GitHub"/>
-</a>
-</p>
+[![Live Demo](https://img.shields.io/badge/🤗%20Live%20Demo-Hugging%20Face-FF9D00?style=for-the-badge)](https://huggingface.co/spaces/KunalTheBeast/TranscriptAI)
+[![GitHub](https://img.shields.io/badge/GitHub-Source-181717?style=for-the-badge&logo=github)](https://github.com/aiKunalBisht/Transcript-ai)
+[![Eval Score](https://img.shields.io/badge/Eval%20Score-93%25-brightgreen?style=for-the-badge)]()
+[![License MIT](https://img.shields.io/badge/License-MIT-blue?style=for-the-badge)]()
 
-<p>
-<img src="https://img.shields.io/badge/Python-3.10%2B-C45C74?style=flat-square&logo=python&logoColor=white"/>
-<img src="https://img.shields.io/badge/Streamlit-UI-D96080?style=flat-square&logo=streamlit&logoColor=white"/>
-<img src="https://img.shields.io/badge/FastAPI-REST%20API-486858?style=flat-square&logo=fastapi&logoColor=white"/>
-<img src="https://img.shields.io/badge/Groq-Free%20Tier-B87830?style=flat-square"/>
-<img src="https://img.shields.io/badge/Eval%20Score-93%25%20EXCELLENT-486858?style=flat-square"/>
-<img src="https://img.shields.io/badge/License-MIT-A8897C?style=flat-square"/>
-</p>
-
-<br/>
-
-**Turn any meeting or speech into structured intelligence.**
-Summaries · Action Items · Tone Analysis · Communication Risk Signals
-
-*Works in English, Hindi, and Japanese — output always in English.*
+Trilingual · English · Hindi · Japanese
 
 </div>
 
 ---
 
````

````diff
-## What It Does
+## The Problem
 
-Paste or upload any transcript and get structured intelligence in seconds.
+Most meeting tools extract *what* was said. They miss everything underneath.
 
-```
-Input: "Rahul: Vikram, client report Monday tak ready honi chahiye."
-"Vikram: Dekhte hain. Thoda mushkil hai."
-
-Output: ✅ Action Item → Prepare client report | Owner: Vikram | Deadline: Monday
-🔴 Hindi Signal → "dekhte hain" — Classic Indian soft no (80% confidence)
-🔴 Hindi Signal → "thoda mushkil hai" — Indirect refusal (85% confidence)
-⚠️ Risk Level → HIGH — Commitment unlikely to be followed through
-🟣 Tone → Hesitant / Uncertain (Intensity: 2/5)
-```
+Every language and culture has indirect communication patterns — polite rejections, soft commitments, face-saving agreements — that a generic summarizer will log as action items that never get done.
+
+TranscriptAI is built to catch exactly those signals.
 
 ---
 
-## The Core Idea
+## Live Demo
+
+**[→ Try it on Hugging Face](https://huggingface.co/spaces/KunalTheBeast/TranscriptAI)**
 
-Most meeting tools extract *what* was said. TranscriptAI extracts *how* people communicate.
+No setup. No API key. Paste any transcript and get structured intelligence in seconds.
 
-Each language has its own indirect communication patterns. A generic summarizer misses all of them:
+**Example — what a generic tool misses:**
 
-| What was said | Generic AI | TranscriptAI |
+| What was said | Generic AI output | TranscriptAI output |
 |---|---|---|
-| *"Dekhte hain"* | "We will see" — neutral | 🔴 Classic Indian soft no — unlikely to happen |
-| *"検討いたします"* | "We will consider it" — action item | ⚠️ 72% rejection confidence — follow up in writing |
-| *"We'll circle back"* | Meeting note | 🌀 Corporate hedging — no concrete next step |
-| *"Haan haan bilkul"* | "Yes absolutely" — agreement | 🟠 Hierarchical yes — agreeing to please, may not follow through |
+| Indirect verbal agreement | ✅ Action item logged | ⚠️ Soft commitment — low follow-through probability |
+| Japanese polite consideration phrase | ✅ Action item logged | 🔴 72% rejection confidence — request written confirmation |
+| Corporate hedge — "we'll circle back" | 📝 Meeting note | 🌀 No concrete next step — escalation recommended |
+| Enthusiastic but hierarchical yes | ✅ Agreement confirmed | 🟠 Agreeing to please, not necessarily to act |
 
 ---
````

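The removed Input/Output example and the comparison table both describe phrase-level detection of soft refusals, with a confidence score for each phrase. Below is a minimal sketch of that idea, assuming a plain substring lookup. The phrases and the 0.80, 0.85 and 0.72 confidences are the values quoted in this diff. The names `SOFT_SIGNALS`, `detect_soft_signals` and `risk_level`, and the 0.7 threshold for HIGH, are hypothetical and are not taken from `hindi_analyzer.py` or `soft_rejection.py`.

```python
from dataclasses import dataclass

# Phrase -> (signal label, confidence). Phrases and confidences are quoted in
# the README diff; this table layout is an illustrative assumption.
SOFT_SIGNALS: dict[str, tuple[str, float]] = {
    "dekhte hain": ("indirect no", 0.80),
    "thoda mushkil hai": ("indirect refusal", 0.85),
    "検討いたします": ("nemawashi soft rejection", 0.72),
}


@dataclass
class Signal:
    phrase: str
    label: str
    confidence: float


def detect_soft_signals(utterance: str) -> list[Signal]:
    """Return every known soft-signal phrase that occurs in one utterance."""
    text = utterance.casefold()  # Roman-script Hindi is matched case-insensitively
    return [
        Signal(phrase, label, confidence)
        for phrase, (label, confidence) in SOFT_SIGNALS.items()
        if phrase in text
    ]


def risk_level(signals: list[Signal], threshold: float = 0.7) -> str:
    """Hypothetical rule: any signal at or above the threshold means HIGH."""
    if not signals:
        return "NONE"
    return "HIGH" if max(s.confidence for s in signals) >= threshold else "LOW"


signals = detect_soft_signals("Vikram: Dekhte hain. Thoda mushkil hai.")
print([s.phrase for s in signals])  # ['dekhte hain', 'thoda mushkil hai']
print(risk_level(signals))  # HIGH
```

A real detector would also need Devanagari variants and word-boundary handling. Substring matching is only the simplest possible stand-in.
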
````diff
 
-## Language Intelligence Layers
+## Output
 
-Three separate NLP engines, auto-detected from the transcript:
+For every transcript, TranscriptAI produces:
 
-### 🇮🇳 Hindi / Hinglish
-- Indirect no — `dekhte hain`, `thoda mushkil hai`, `koshish karenge`
-- Hierarchical yes — `haan haan bilkul`, `jo aap kahenge`
-- Face-saving exits — `upar se baat karta hoon`
-- Jugaad framing — `kuch na kuch ho jayega`
-- Respect deflection — `aap jo theek samjhe`
-- Detects both **Roman script and Devanagari**
+- **Summary** — concise narrative paragraph plus key bullet points scaled to meeting length
+- **Action items** — extracted with owner, deadline, and commitment strength rating
+- **Communication risk signals** — indirect rejections, hedging language, power imbalance markers
+- **Speaker tone profile** — 6-level colour-coded scale with intensity score per speaker
+- **Meeting health score** — 0 to 100 composite across sentiment, action clarity, risk, and AI confidence
+- **Session trends** — risk drift, hallucination rate, and workload patterns across meetings
 
-### 🇬🇧 English
-- Commitment strength meter — "I will" vs "I'll try" vs "we'll see"
-- Escalation signals — "going to have to escalate", "reconsider the contract"
-- Power imbalance — "this is unacceptable", "you need to understand"
-- Corporate hedging — "circle back", "take under advisement", "touch base"
-- Passive aggression — "fine", "whatever works for you"
-- 40+ patterns across 4 categories
````

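The removed English list describes a commitment strength meter that separates "I will" from "I'll try" from "we'll see". One way to sketch this is with regular expressions checked in order, testing weaker phrasings first so that "I will try" is not graded as a firm promise. The tier names, the patterns and `commitment_strength` are illustrative assumptions, not the project's 40+ English patterns.

```python
import re

# Checked top to bottom: weaker phrasings come first so that "I will try"
# is not graded as a firm "I will". Tiers and patterns are illustrative.
COMMITMENT_TIERS: list[tuple[str, re.Pattern]] = [
    ("noncommittal", re.compile(r"\bwe(?:'ll| will) see\b", re.IGNORECASE)),
    ("tentative", re.compile(r"\bI(?:'ll| will) try\b", re.IGNORECASE)),
    ("strong", re.compile(r"\bI(?:'ll| will)\b", re.IGNORECASE)),
]


def commitment_strength(sentence: str) -> str:
    """Grade one sentence; 'none' means no commitment phrase was found."""
    for tier, pattern in COMMITMENT_TIERS:
        if pattern.search(sentence):
            return tier
    return "none"


for line in ["I will deliver the report.", "I'll try to finish it.", "We will see."]:
    print(line, "->", commitment_strength(line))
```

Because the ordering encodes precedence, adding a new tier means deciding where it sits relative to the broader "strong" pattern.
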
````diff
+---
 
-### 🇯🇵 Japanese
-- 16 nemawashi soft rejection patterns with confidence scores
-- Keigo formality detection via MeCab morphological analysis
-- Deterministic JA↔EN code-switch counting
-- Cross-script speaker normalization — 田中 and Tanaka are the same speaker
````

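The removed Japanese list mentions deterministic JA↔EN code-switch counting. Below is a minimal character-level sketch. It assumes that a switch is any change between a Japanese-script run (hiragana, katakana, common CJK ideographs) and a run of basic Latin letters, and it treats spaces, digits and punctuation as neutral. The project may count at the token level with its MeCab tokenizer instead; that code is not part of this diff. `char_script` and `count_code_switches` are hypothetical names.

```python
from typing import Optional


def char_script(ch: str) -> Optional[str]:
    """Return 'ja', 'en', or None for script-neutral characters."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF:
        return "ja"  # hiragana, katakana, common CJK ideographs
    if ch.isascii() and ch.isalpha():
        return "en"  # basic Latin letters
    return None  # spaces, digits, punctuation (including 。 and 、)


def count_code_switches(text: str) -> int:
    """Count JA<->EN transitions, skipping neutral characters."""
    switches = 0
    previous: Optional[str] = None
    for ch in text:
        script = char_script(ch)
        if script is None:
            continue
        if previous is not None and script != previous:
            switches += 1
        previous = script
    return switches


print(count_code_switches("この deadline は tight です。"))  # 4
```
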
````diff
+## Language Engines
 
----
+Three independent NLP modules, auto-detected from transcript content.
 
-## Features
+### English
+Commitment strength grading distinguishes "I will deliver" from "I will try" from "we will see." Detects escalation signals, power imbalance language, passive aggression, and corporate hedging. Over 40 patterns across 4 categories.
 
-**Summary Tab**
-- Full narrative paragraph — what was discussed, decided, and the outcome
-- 3–8 key bullet points scaled to transcript length
-- Previous session panel — meeting continuity tracking
+### Hindi
+Identifies indirect refusals, hierarchical agreement (saying yes to please rather than commit), face-saving exits, and vague reassurances. Handles both Roman script and Devanagari. Over 30 patterns.
 
-**Meeting Health Score** — 0–100 from 4 signals
+### Japanese
+16 nemawashi soft-rejection patterns with per-pattern confidence scores. Keigo formality detection via MeCab morphological analysis. Cross-script speaker normalization — the same person written in kanji and in romanization resolves to a single speaker identity.
````

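Cross-script speaker normalization (the removed example: 田中 and Tanaka are the same speaker) can be illustrated with a surname lookup. This sketch assumes a kanji-to-romaji table and a short honorific list. The three table entries are a hypothetical stand-in for the 500+ surname entries the README mentions, whose actual format does not appear in this diff. `canonical_speaker` and `merge_speakers` are also hypothetical names.

```python
# Hypothetical three-entry subset standing in for the project's surname table.
KANJI_TO_ROMAJI = {"田中": "tanaka", "佐藤": "sato", "鈴木": "suzuki"}
HONORIFICS = ("さん", "様", "-san")


def canonical_speaker(label: str) -> str:
    """Map a raw speaker label to one script-independent key."""
    name = label.strip()
    for suffix in HONORIFICS:
        if name.casefold().endswith(suffix):
            name = name[: -len(suffix)].strip()  # drop the honorific
            break
    return KANJI_TO_ROMAJI.get(name, name.casefold())


def merge_speakers(labels: list[str]) -> dict[str, list[str]]:
    """Group raw labels that resolve to the same canonical speaker."""
    groups: dict[str, list[str]] = {}
    for label in labels:
        groups.setdefault(canonical_speaker(label), []).append(label)
    return groups


print(merge_speakers(["田中さん", "Tanaka", "Tanaka-san", "佐藤"]))
```

Romanized names fall through to a case-folded key, so "Tanaka" and "tanaka" also merge without needing a table entry.
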
````diff
+
+---
+
+## Architecture
 
 ```
-Sentiment (30) + Action Clarity (25) + Communication Risk (25) + AI Confidence (20)
-```
+transcription/
+pii_masker.py Local anonymization — runs before any LLM call
+speaker_normalizer.py Cross-script speaker identity resolution
+audio_processor.py Whisper transcription pipeline
 
-| Score | Label |
-|---|---|
-| 80–100 | 🟢 Productive Meeting |
-| 60–79 | 🟡 Mostly Aligned |
-| 40–59 | 🟠 Needs Follow-up |
-| 0–39 | 🔴 High Risk |
+analysis/
+analyzer.py LLM orchestration — Groq → Ollama → Mock fallback
+english_analyzer.py English NLP engine
+hindi_analyzer.py Hindi NLP engine
+soft_rejection.py Japanese nemawashi detector
+hallucination_guard.py Rule-based output verification
+japanese_tokenizer.py MeCab morphological analysis
 
-**Speaker Tone Intelligence** — 6-level color-coded scale with intensity bars 1–5
+utils/
+evaluator.py ROUGE-L + F1 + semantic similarity scoring
+cache.py MD5 result caching — 24h TTL
+logger.py JSONL observability and trend analysis
 
-```
-🔴 Aggressive → 🟠 Assertive → 🟡 Neutral → 🟢 Cooperative → 🔵 Deferential → 🟣 Hesitant
+app.py Streamlit UI — 7 tabs, health score, trend dashboard
+api.py FastAPI REST endpoints
 ```
````

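The removed health-score formula weights four signals as 30 + 25 + 25 + 20 = 100 and maps the result onto four bands. The sketch below shows that arithmetic. It assumes each signal arrives as a value from 0 to 1 where 1 is best, so for communication risk, 1 means no risk signals were found. How the project actually derives those inputs is not shown here, and `health_score` and `health_label` are hypothetical names.

```python
# Weights from the removed README formula (they sum to 100).
WEIGHTS = {
    "sentiment": 30,
    "action_clarity": 25,
    "communication_risk": 25,
    "ai_confidence": 20,
}

# (lower bound, label) pairs from the removed score table, highest first.
BANDS = [
    (80, "Productive Meeting"),
    (60, "Mostly Aligned"),
    (40, "Needs Follow-up"),
    (0, "High Risk"),
]


def health_score(signals: dict[str, float]) -> int:
    """Weighted 0-100 composite. Each input is assumed to be a 0-1 value
    where 1 is best (communication_risk = 1 means no risk signals)."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = min(max(signals[name], 0.0), 1.0)  # clamp to [0, 1]
        total += weight * value
    return round(total)


def health_label(score: int) -> str:
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return BANDS[-1][1]  # unreachable for clamped inputs


score = health_score({"sentiment": 0.8, "action_clarity": 1.0,
                      "communication_risk": 0.4, "ai_confidence": 0.9})
print(score, health_label(score))  # 77 Mostly Aligned
```

For example, these inputs give 24 + 25 + 10 + 18 = 77, which falls in the Mostly Aligned band.
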
````diff
 
-**Production Features**
-- APPI-compliant PII masking — names, phones, emails anonymized before LLM; restored after
-- Hallucination guard — 100% rule-based token overlap, LLM never validates itself
-- Groq → Ollama → Mock fallback with explicit UX feedback per provider
-- Meeting trends dashboard — soft rejection trends, hallucination drift, workload
-- FastAPI REST endpoint for CRM integration
-- MD5 result caching (24h TTL) + JSONL observability logging
````

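The removed production-features list describes the hallucination guard as rule-based token overlap, so the model never validates its own output. Below is a minimal sketch of such a check. It assumes a claim counts as grounded when at least 60% of its content tokens occur in the transcript. The threshold, the stopword list and the function names are all assumptions.

```python
import re

# Tiny illustrative stopword list; a real guard would need a fuller one.
STOPWORDS = {"the", "a", "an", "to", "by", "of", "and", "will", "be", "is"}


def content_tokens(text: str) -> set[str]:
    """Lowercased word tokens with stopwords removed."""
    return {t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS}


def overlap_ratio(claim: str, transcript: str) -> float:
    """Fraction of the claim's content tokens that also occur in the transcript."""
    claim_tokens = content_tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & content_tokens(transcript)) / len(claim_tokens)


def is_grounded(claim: str, transcript: str, threshold: float = 0.6) -> bool:
    """Deterministic check: no model is asked to judge its own output."""
    return overlap_ratio(claim, transcript) >= threshold


transcript = "Rahul: Vikram, client report Monday tak ready honi chahiye."
print(is_grounded("Prepare client report by Monday", transcript))  # True
print(is_grounded("Sign the vendor contract", transcript))  # False
```

Japanese is written without spaces, so `\w+` returns whole runs of text. A tokenizer such as the project's MeCab layer would have to run first for this kind of overlap to be meaningful.
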
````diff
+**Processing pipeline — order is strict:**
+
+```
+1. PII Mask local, before LLM (privacy compliance)
+2. LLM Analysis Groq / Ollama / Mock
+3. PII Restore local, before normalization
+4. Normalize cross-script speaker deduplication
+5. Tone Classify per-speaker 6-level scoring
+6. NLP Layer language-specific signal detection
+7. Cache + Log MD5 cache write, JSONL append
+```
 
 ---
````

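Steps 1 to 3 of the pipeline are order-sensitive. Only masked text may reach the LLM, and placeholders must be restored before speaker normalization sees the names. The sketch below shows that ordering. It assumes regex masking of emails and phone numbers only; name masking, which the README also describes, would need a name list or NER and is omitted. `mask_pii`, `restore_pii` and `run_pipeline` are hypothetical names, and `analyze` stands in for whichever provider is used.

```python
import re
from typing import Callable

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d -]{7,}\d")  # crude: also catches long digit runs


def mask_pii(text: str) -> tuple[str, dict[str, str]]:
    """Swap emails and phone numbers for numbered placeholders."""
    mapping: dict[str, str] = {}

    def replacer(kind: str) -> Callable[[re.Match], str]:
        def repl(match: re.Match) -> str:
            placeholder = f"[{kind}_{len(mapping) + 1}]"
            mapping[placeholder] = match.group(0)
            return placeholder
        return repl

    text = EMAIL.sub(replacer("EMAIL"), text)
    text = PHONE.sub(replacer("PHONE"), text)
    return text, mapping


def restore_pii(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back in place of their placeholders."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


def run_pipeline(transcript: str, analyze: Callable[[str], str]) -> str:
    masked, mapping = mask_pii(transcript)  # 1. mask locally
    result = analyze(masked)                # 2. only masked text is sent out
    return restore_pii(result, mapping)     # 3. restore before normalization


def echo_llm(masked_text: str) -> str:  # stub standing in for the real LLM call
    return "Follow up: " + masked_text


print(run_pipeline("Priya: mail priya@example.com or call +91 98765 43210", echo_llm))
```

The phone pattern is deliberately simple. It would also mask dates such as 2024-01-15, so a production masker needs tighter rules.
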
````diff
 
-## Evaluation — 93% Overall Score
+## Evaluation
 
-Custom evaluation system with **cultural corrections** — standard NLP metrics have Western bias.
-Japanese professional neutral speech is NOT incorrect. Soft sentiment scoring applied.
+Standard NLP metrics carry Western assumptions. Formal neutral speech in Japanese or indirect communication in South Asian business contexts scores poorly on metrics calibrated for direct English. This project uses a custom evaluation framework with cultural corrections applied at each version iteration.
 
-| Version | Score | Key change |
-|---|---|---|
-| v1 | 30% | Baseline — exact matching only |
-| v2 | 55% | Fuzzy names, rule-based code-switch, semantic similarity |
-| v3 | 75% | Cultural ground truth, JA tokenization, soft sentiment |
-| v4 | 83% | Hallucination guard, nemawashi filter, speaker sort |
-| **v5** | **93% EXCELLENT** | Sentiment rules, tone intelligence, optimal bullet matching |
-
-| Metric | Score |
-|---|---|
-| Action Items F1 | 1.0 — EXCELLENT |
-| Sentiment (soft/cultural) | 1.0 — EXCELLENT |
-| Hallucination Risk | LOW |
-| **Overall** | **93% — EXCELLENT** |
+| Version | Score | Primary Change |
+|---------|-------|----------------|
+| v1 | 30% | Baseline — exact string matching |
+| v2 | 55% | Fuzzy matching, semantic similarity |
+| v3 | 75% | Cultural ground truth, Japanese tokenization |
+| v4 | 83% | Hallucination guard, soft rejection filter |
+| v5 | **93%** | Tone intelligence, optimal bullet assignment |
+
+| Metric | Result |
+|--------|--------|
+| Action Item F1 | 1.0 — Excellent |
+| Sentiment (cultural) | 1.0 — Excellent |
+| Hallucination Risk | Low |
+| Overall | **93%** |
 
 ---
````

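`evaluator.py` is described as scoring with ROUGE-L and F1. The sketch below implements the textbook definitions of both, assuming whitespace tokenization and exact-match F1. The README says later versions add fuzzy, semantic and optimal-assignment matching, which this sketch deliberately leaves out. The function names are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Longest common subsequence length via dynamic programming, one row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F-measure (beta = 1) over lowercased whitespace tokens."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def set_f1(predicted: set[str], gold: set[str]) -> float:
    """F1 over exactly matching items, e.g. normalized action-item strings."""
    if not predicted and not gold:
        return 1.0  # nothing expected, nothing predicted
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


print(round(rouge_l("the report is ready", "the client report is ready"), 3))  # 0.889
print(set_f1({"send report", "book room"}, {"send report", "call client"}))  # 0.5
```

Exact matching is what the v1 baseline row describes. Swapping `set_f1` for a fuzzy or assignment-based matcher is where the later score gains in the table came from, according to the README.
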
````diff
 
-## Architecture
+## Production Features
 
-```
-transcription/
-pii_masker.py APPI anonymization — before LLM (v3: handles all bracket variants)
-speaker_normalizer.py Cross-script identity resolution
-audio_processor.py Whisper transcription
+**Privacy**
+PII anonymization runs locally before any transcript reaches an LLM. Names, phone numbers, and email addresses are masked on input and restored on output. No personal data is transmitted.
 
-analysis/
-analyzer.py Groq → Ollama → Mock · trilingual detection · tone schema
-english_analyzer.py English NLP — 40+ patterns (hedging, power, escalation)
-hindi_analyzer.py Hindi NLP — 30+ patterns (Roman + Devanagari)
-soft_rejection.py Japanese 16-pattern nemawashi detector
-hallucination_guard.py 100% rule-based claim verification
-japanese_tokenizer.py MeCab morphological analysis
+**Reliability**
+Three-tier LLM fallback — Groq (1–2s, free tier) → Ollama (local, zero cost) → Mock (always available). MD5 result caching with 24-hour TTL means repeat queries return in under one second.
````

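The three-tier fallback can be sketched as an ordered list of providers, where the first one that returns without raising wins. The caller also learns which tier answered; the removed README mentions explicit UX feedback per provider. The provider functions below are stubs that simulate failures. They do not call the real Groq or Ollama APIs, and all names are hypothetical.

```python
from typing import Callable

Provider = Callable[[str], dict]


def analyze_with_fallback(
    transcript: str, providers: list[tuple[str, Provider]]
) -> tuple[str, dict]:
    """Try providers in order; return (provider_name, result) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(transcript)
        except Exception as exc:  # e.g. missing API key, local server not running
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stubs that simulate the tiers; none of them calls a real service.
def groq_stub(text: str) -> dict:
    raise ConnectionError("GROQ_API_KEY not set")


def ollama_stub(text: str) -> dict:
    raise ConnectionError("Ollama server not reachable")


def mock_provider(text: str) -> dict:
    return {"summary": f"Mock summary of {len(text)} characters"}


tiers = [("groq", groq_stub), ("ollama", ollama_stub), ("mock", mock_provider)]
name, result = analyze_with_fallback("Alex: Can we ship Friday?", tiers)
print(name, result)  # the UI can tell the user which tier answered
```

Returning the provider name with the result keeps the "which tier served this?" decision in one place, rather than having each provider report it separately.
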
````diff
 
-utils/
-evaluator.py ROUGE + semantic + F1 + optimal assignment matching
-logger.py JSONL logging + trend analysis engine
-cache.py MD5 result caching
+**Observability**
+Every analysis is written to a local JSONL log. A built-in trends dashboard tracks soft rejection rates, hallucination drift, and workload distribution across sessions.
````

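Step 7 of the pipeline writes an MD5-keyed cache entry and appends a JSONL record. The sketch below covers both. It assumes an in-memory cache keyed on the transcript text alone (the real key may also include language or settings) and uses the 24-hour TTL from the README. `ResultCache` and `append_jsonl` are hypothetical names. MD5 is fine here because it serves as a cache key, not a security measure, which is what `usedforsecurity=False` (Python 3.9+) declares.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Callable, Optional

TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL stated in the README


class ResultCache:
    """In-memory sketch keyed by the MD5 digest of the transcript text."""

    def __init__(self, ttl: float = TTL_SECONDS, clock: Callable[[], float] = time.time):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested
        self._store: dict[str, tuple[float, dict]] = {}

    @staticmethod
    def key(transcript: str) -> str:
        return hashlib.md5(transcript.encode("utf-8"), usedforsecurity=False).hexdigest()

    def get(self, transcript: str) -> Optional[dict]:
        k = self.key(transcript)
        entry = self._store.get(k)
        if entry is None:
            return None
        stored_at, result = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[k]  # expired: drop it and report a miss
            return None
        return result

    def put(self, transcript: str, result: dict) -> None:
        self._store[self.key(transcript)] = (self.clock(), result)


def append_jsonl(path: Path, record: dict) -> None:
    """Append one JSON object per line; ensure_ascii=False keeps Hindi/Japanese readable."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


cache = ResultCache()
cache.put("Alex: Can we ship Friday?", {"risk_level": "LOW"})
print(cache.get("Alex: Can we ship Friday?"))  # {'risk_level': 'LOW'}
```

A JSONL file can be appended to without rereading it, which is what makes per-analysis logging and later trend aggregation cheap.
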
````diff
 
-app.py Streamlit UI — translucent navbar, 7 tabs, health score
-api.py FastAPI REST endpoints
-```
-
-**Processing order (sequence is critical):**
-```
-1. PII mask — local, before LLM (APPI compliance)
-2. LLM analysis — Groq / Ollama / Mock
-3. PII restore — local, before normalization (so normalizer sees real names)
-4. Normalize — cross-script speaker dedup
-5. Tone classify — 6-level scale per speaker
-6. NLP layer — language-specific routing
-7. Cache + log — local JSONL
-```
+**Integration**
+FastAPI REST endpoint at `/analyze` for direct integration with CRM systems, Slack bots, or downstream pipelines.
 
 ---
 
````

````diff
@@ -209,21 +165,25 @@ api.py FastAPI REST endpoints
 git clone https://github.com/aiKunalBisht/Transcript-ai.git
 cd Transcript-ai
 pip install -r requirements.txt
+```
 
-# Recommended — Groq (1-2 second analysis, free tier)
-export GROQ_API_KEY=your_key_here # free at console.groq.com
+**Cloud — Groq (recommended, free tier)**
+```bash
+export GROQ_API_KEY=your_key_here # console.groq.com
 python -m streamlit run app.py
+```
 
-# Fully local — zero data leaves your machine
+**Local — fully offline, zero data leaves your machine**
+```bash
 ollama pull qwen3:8b
 python -m streamlit run app.py
 ```
 
-Optional:
+**Optional dependencies**
 ```bash
-pip install fugashi unidic-lite # MeCab Japanese tokenizer
-pip install scikit-learn # TF-IDF semantic similarity
-pip install sentence-transformers # Neural semantic understanding (~500MB)
+pip install fugashi unidic-lite # MeCab Japanese tokenizer
+pip install scikit-learn # TF-IDF semantic similarity
+pip install sentence-transformers # Neural semantic scoring
 ```
 
 ---
````

````diff
@@ -232,50 +192,48 @@ pip install sentence-transformers # Neural semantic understanding (~500MB)
 
 ```bash
 python api.py
-# Interactive docs: http://localhost:8000/docs
+# Interactive docs at http://localhost:8000/docs
 ```
 
 ```python
 import requests
 
-r = requests.post("http://localhost:8000/analyze", json={
-"transcript": "Rahul: Friday tak deliver ho sakta hai? Priya: Dekhte hain.",
-"language": "hi",
+response = requests.post("http://localhost:8000/analyze", json={
+"transcript": "Alex: Can we get this delivered by Friday?\nJordan: We will see what we can do.",
+"language": "en",
 "mask_pii": True
 })
-print(r.json()["result"]["soft_rejections"]["risk_level"]) # HIGH
-print(r.json()["result"]["soft_rejections"]["risk_summary"]) # Commitment unlikely...
+
+result = response.json()["result"]
+print(result["soft_rejections"]["risk_level"]) # HIGH
+print(result["soft_rejections"]["risk_summary"]) # Commitment unlikely to be followed through
 ```
 
 ---
 
 ## Known Limitations
 
-| Limitation | Path Forward |
-|---|---|
-| Speaker diarization ~70% accuracy | pyannote.audio |
-| Audio unavailable on HF Spaces | Groq Whisper API (next) |
-| 3 synthetic test cases | External validation on real transcripts |
-| Confidence scores are heuristic | Labeled dataset + calibration |
-| No feedback loop | User correction collection + fine-tuning |
+| Limitation | Planned Improvement |
+|------------|---------------------|
+| Speaker diarization ~70% accuracy | pyannote.audio integration |
+| Audio upload unavailable on HF Spaces | Groq Whisper API — next release |
+| Confidence scores are heuristic | Labeled dataset and calibration |
+| Demo uses synthetic test cases | Real-world transcript validation ongoing |
 
 ---
 
-## Numbers
+## Project Scale
 
-```
-19 Python files · 6,000+ lines · 90+ functions
-40+ English patterns · 30+ Hindi patterns · 16 Japanese soft rejection patterns
-500+ Japanese surnames · Eval score: 93% EXCELLENT
-Formats: TXT · VTT · JSON · MP4 · MP3 · WAV · M4A
-```
+19 Python files · 6,000+ lines · 90+ functions
+86 linguistic patterns across 3 languages · 500+ Japanese surname entries
+Supported formats: TXT · VTT · JSON · MP4 · MP3 · WAV · M4A
 
 ---
 
 <div align="center">
 
-Built by **[Kunal Bisht](https://github.com/aiKunalBisht)** · Pithoragarh, Uttarakhand, India
+Built by [Kunal Bisht](https://github.com/aiKunalBisht) Pithoragarh, India
 
-[LinkedIn](https://linkedin.com/in/kunalhere) &nbsp;·&nbsp; [Hugging Face](https://huggingface.co/KunalTheBeast)
+[Hugging Face](https://huggingface.co/KunalTheBeast) · [LinkedIn](https://linkedin.com/in/kunalhere) · [GitHub](https://github.com/aiKunalBisht)
 
 </div>
````
