@@ -10,5 +10,82 @@ app_file: app.py
10 10 pinned: false
11 11 ---
12 12
13- # TranscriptAI
14- Japanese Business Intelligence Platform
13+ # TranscriptAI 🧠
14+ **Japanese Business Intelligence Platform: Speech & Meeting Analyzer**
15+
16+ [![Live Demo](https://img.shields.io/badge/Live%20Demo-HuggingFace-D96080?style=flat-square)](https://huggingface.co/spaces/KunalTheBeast/TranscriptAI)
17+ [![GitHub](https://img.shields.io/badge/GitHub-Repo-3C2416?style=flat-square&logo=github)](https://github.com/aiKunalBisht/Transcript-ai)
18+ [![Python](https://img.shields.io/badge/Python-3.10%2B-blue?style=flat-square)](https://python.org)
19+ [![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)](LICENSE)
20+
21+ > Turns any meeting transcript into structured business intelligence in about 3 seconds.
22+ > Works with Otter.ai, Zoom, Google Meet, and Whisperflow exports out of the box.
23+
24+ ---
25+
26+ ## How we got here — v1 to v5
27+
28+ | Version | What changed | Accuracy |
29+ | ---| ---| ---|
30+ | v1 | Hard exact matching, no cultural awareness | 22–30% |
31+ | v2 | Fuzzy names, rule-based code-switch, TF-IDF similarity | ~45% |
32+ | v3 | MeCab keigo detection, bilingual ground truth, soft sentiment | ~60% |
33+ | v4 | Hallucination guard, 16 nemawashi patterns, APPI PII masking | 75–85% |
34+ | **v5 (live)** | 2-key Groq rotation, vector cache, health score, trends | **85–95%** |
35+
36+ ---
37+
38+ ## Why Japanese business is different
39+
40+ | What was said | Generic AI | TranscriptAI |
41+ | ---| ---| ---|
42+ | 検討いたします | "We will consider it" — action item | ⚠ Likely soft rejection (72%) |
43+ | 難しいかもしれません | "May be difficult" — neutral | 🚨 High rejection signal (90%) |
44+ | 承知いたしました | Acknowledged | 🏯 High keigo — senior speaker |
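
The first two rows of the table above can be read as a phrase-to-confidence lookup. The following is a minimal sketch of that idea only, not the project's implementation: `SOFT_REJECTION_PATTERNS`, `SoftRejectionHit`, and `detect_soft_rejection` are hypothetical names, and only the two phrases and scores shown in the table are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftRejectionHit:
    phrase: str
    confidence: float


# Illustrative only: both phrases and their scores are taken from the table above.
# The project's full pattern list is not reproduced here.
SOFT_REJECTION_PATTERNS = {
    "検討いたします": 0.72,
    "難しいかもしれません": 0.90,
}


def detect_soft_rejection(utterance: str) -> list[SoftRejectionHit]:
    """Return every known soft-rejection phrase found in the utterance, strongest first."""
    hits = [
        SoftRejectionHit(phrase, confidence)
        for phrase, confidence in SOFT_REJECTION_PATTERNS.items()
        if phrase in utterance
    ]
    return sorted(hits, key=lambda hit: hit.confidence, reverse=True)
```

A plain substring match is the simplest possible choice here; a real detector would likely need morphological analysis to handle inflected forms.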
45+
46+ ---
47+
48+ ## Features
49+
50+ - **Trilingual**: Japanese / Hindi (Hinglish) / English / Mixed
51+ - **Keigo detection** via MeCab morphological analysis
52+ - **16 nemawashi soft rejection patterns** with confidence scores
53+ - **8 Hindi indirect patterns** (देखते हैं, थोड़ा मुश्किल है)
54+ - **APPI-compliant PII masking** before any LLM call
55+ - **Hallucination guard**: 100% rule-based; the LLM never validates its own output
56+ - **2-key Groq rotation**: automatic failover on HTTP 429 rate-limit responses
57+ - **Vector cache**: instant return for similar transcripts
58+ - **Meeting health score** (0–100) with breakdown
59+ - **Trends dashboard**: soft rejection drift and provider usage over time
60+ - **Groq Whisper**: MP4/MP3/WAV transcription (uses the same free API key)
61+ - **FastAPI REST** endpoint for CRM integration
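
The key-rotation item in the list above could be sketched roughly as follows. This is an assumption-laden illustration rather than the project's code: `RateLimited`, `call_with_key_rotation`, and the `send` callable are hypothetical stand-ins, and no real Groq client call is shown.

```python
from typing import Callable, Sequence


class RateLimited(Exception):
    """Raised by the hypothetical `send` callable when the provider answers HTTP 429."""


def call_with_key_rotation(keys: Sequence[str], send: Callable[[str], str]) -> str:
    """Try each API key in order, moving to the next one only on a rate-limit error."""
    if not keys:
        raise ValueError("at least one API key is required")
    last_error = None
    for key in keys:
        try:
            return send(key)
        except RateLimited as exc:
            # This key is throttled; remember the error and try the next key.
            last_error = exc
    raise RuntimeError("all API keys are rate limited") from last_error
```

Passing the request as a callable keeps the failover logic independent of any particular HTTP client, so it can be exercised with a fake `send` in tests.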
62+
63+ ---
64+
65+ ## Quick start
66+
67+ ```bash
68+ git clone https://github.com/aiKunalBisht/Transcript-ai.git
69+ cd Transcript-ai
70+ pip install -r requirements.txt
71+ export GROQ_API_KEY=your_key_here # free at console.groq.com
72+ python -m streamlit run app.py
73+ ```
74+
75+ **Hugging Face Spaces:** Add `GROQ_API_KEY` under Space → Settings → Repository secrets.
76+
77+ ---
78+
79+ ## Evaluation
80+
81+ | Test case | v1 baseline | v5 live |
82+ | ---| ---| ---|
83+ | Sales call · JA/EN mixed | 30.8% | **95.2%** |
84+ | Internal meeting · Japanese | 22.2% | **81.6%** |
85+ | Client conflict · EN/JA | 55.9% | **85.8%** |
86+
87+ ---
88+
89+ ## Built by
90+
91+ [Kunal Bisht](https://github.com/aiKunalBisht) · Bengaluru, Karnataka, India