Commit 0b6ddb1

docs: full README with proper HF structure
1 parent 7736b5f commit 0b6ddb1

1 file changed

Lines changed: 79 additions & 2 deletions

README.md
@@ -10,5 +10,82 @@ app_file: app.py
1010
pinned: false
1111
---
1212

13-
# TranscriptAI
14-
Japanese Business Intelligence Platform
13+
# TranscriptAI 🧠
14+
**Japanese Business Intelligence Platform — Speech & Meeting Analyzer**
15+
16+
[![Live Demo](https://img.shields.io/badge/Live%20Demo-HuggingFace-D96080?style=flat-square)](https://huggingface.co/spaces/KunalTheBeast/TranscriptAI)
17+
[![GitHub](https://img.shields.io/badge/GitHub-Repo-3C2416?style=flat-square&logo=github)](https://github.com/aiKunalBisht/Transcript-ai)
18+
[![Python](https://img.shields.io/badge/Python-3.10%2B-blue?style=flat-square)](https://python.org)
19+
[![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)](LICENSE)
20+
21+
> Turns any meeting transcript into structured business intelligence in ~3 seconds.
22+
> Works with Otter.ai, Zoom, Google Meet, and Whisperflow exports out of the box.
23+
24+
---
25+
26+
## How we got here — v1 to v5
27+
28+
| Version | What changed | Accuracy |
29+
|---|---|---|
30+
| v1 | Hard exact matching, no cultural awareness | 22–30% |
31+
| v2 | Fuzzy names, rule-based code-switch, TF-IDF similarity | ~45% |
32+
| v3 | MeCab keigo detection, bilingual ground truth, soft sentiment | ~60% |
33+
| v4 | Hallucination guard, 16 nemawashi patterns, APPI PII masking | 75–85% |
34+
| **v5 (live)** | 2-key Groq rotation, vector cache, health score, trends | **85–95%** |
35+
36+
---
37+
38+
## Why Japanese business is different
39+
40+
| What was said | Generic AI | TranscriptAI |
41+
|---|---|---|
42+
| 検討いたします | "We will consider it" — action item | ⚠ Likely soft rejection (72%) |
43+
| 難しいかもしれません | "May be difficult" — neutral | 🚨 High rejection signal (90%) |
44+
| 承知いたしました | Acknowledged | 🏯 High keigo — senior speaker |
45+
46+
---
47+
48+
## Features
49+
50+
- **Trilingual** — Japanese, Hindi (including Hinglish), and English, plus mixed-language transcripts
51+
- **Keigo detection** via MeCab morphological analysis
52+
- **16 nemawashi soft rejection patterns** with confidence scores
53+
- **8 Hindi indirect patterns** (देखते हैं, थोड़ा मुश्किल है)
54+
- **APPI-compliant PII masking** before any LLM call
55+
- **Hallucination guard** — fully rule-based; the LLM never validates its own output
56+
- **2-key Groq rotation** — auto-failover on 429 rate limit
57+
- **Vector cache** — instant return for similar transcripts
58+
- **Meeting health score** (0–100) with breakdown
59+
- **Trends dashboard** — soft rejection drift, provider usage over time
60+
- **Groq Whisper** — MP4/MP3/WAV transcription (uses the same free Groq API key)
61+
- **FastAPI REST** endpoint for CRM integration
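The key-rotation feature listed above (failover to a second key on a 429 rate limit) could look roughly like the following. This is a sketch under stated assumptions, not the project's code: `RateLimited`, `call_with_key_rotation`, and `fake_send` are hypothetical names, and the real HTTP call is replaced by a stand-in so the example runs without network access.

```python
from typing import Callable


class RateLimited(Exception):
    """Hypothetical error raised when the provider answers HTTP 429."""


def call_with_key_rotation(keys: list[str], send: Callable[[str], str]) -> str:
    """Try each API key in order, moving on only when a key is rate limited."""
    last_error: Exception | None = None
    for key in keys:
        try:
            return send(key)
        except RateLimited as exc:
            last_error = exc  # this key is throttled, so try the next one
    raise RuntimeError("all API keys are rate limited") from last_error


def fake_send(key: str) -> str:
    # Stand-in for a real request: the first key is always throttled.
    if key == "key-a":
        raise RateLimited("429 Too Many Requests")
    return f"ok via {key}"


result = call_with_key_rotation(["key-a", "key-b"], fake_send)
print(result)
```

Only the rate-limit error triggers failover here; any other exception propagates immediately, so an invalid request does not exhaust both keys.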
62+
63+
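Masking PII before any LLM call, as the feature list describes, can be illustrated with a small regex pass. The patterns below are assumptions chosen for the example (email addresses and hyphenated Japanese-style phone numbers); they do not reproduce the project's rules and are not by themselves sufficient for APPI compliance. `mask_pii` is a hypothetical helper name.

```python
import re

# Assumed, deliberately simple patterns. re.ASCII keeps \w and \d from
# matching Japanese characters or full-width digits next to the PII.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+", re.ASCII)
PHONE_RE = re.compile(r"0\d{1,4}-\d{1,4}-\d{4}", re.ASCII)


def mask_pii(text: str) -> str:
    """Replace email addresses and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


masked = mask_pii("田中さん (tanaka@example.co.jp) の電話は 03-1234-5678 です")
print(masked)  # 田中さん ([EMAIL]) の電話は [PHONE] です
```

Running the mask before the prompt is built means the raw values never leave the process, which is the property the feature list emphasizes.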
---
64+
65+
## Quick start
66+
67+
```bash
68+
git clone https://github.com/aiKunalBisht/Transcript-ai.git
69+
cd Transcript-ai
70+
pip install -r requirements.txt
71+
export GROQ_API_KEY=your_key_here # free at console.groq.com
72+
python -m streamlit run app.py
73+
```
74+
75+
**HuggingFace Spaces:** Add `GROQ_API_KEY` in Space → Settings → Repository secrets.
76+
77+
---
78+
79+
## Evaluation
80+
81+
| Test case | v1 baseline | v5 live |
82+
|---|---|---|
83+
| Sales call · JA/EN mixed | 30.8% | **95.2%** |
84+
| Internal meeting · Japanese | 22.2% | **81.6%** |
85+
| Client conflict · EN/JA | 55.9% | **85.8%** |
86+
87+
---
88+
89+
## Built by
90+
91+
[Kunal Bisht](https://github.com/aiKunalBisht) · Bengaluru, Karnataka, India
