Commit 1fcaa19

Partha-dev01 and claude committed
Differentiate stages 4/7/9, add dynamic content, fix mobile camera, add criteria gates
Stage 4 → Word Echo: LLM-generated age-appropriate words via new /api/chat/generate-words endpoint, Polly TTS playback, echo matching.

Stage 7 → Action Challenge: pure motor test with 6 fixed actions, live YOLO detection feedback (confidence bar, 5-dot frame counter, color-coded borders, contextual status text).

Stage 9 → Speech & Comprehension: Part A sentence repetition with word-overlap scoring, Part B audio instruction following.

Mobile camera: 3-tier progressive getUserMedia constraint negotiation, HTTPS check, specific error messages, retry/skip buttons.

Criteria gates: minimum thresholds on stages 4/7/9/10 with retry/skip.

31/31 Playwright tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 3825ee1 commit 1fcaa19

11 files changed

Lines changed: 1505 additions & 1084 deletions

DOCS.md

Lines changed: 31 additions & 0 deletions
@@ -409,6 +409,9 @@ npx playwright test # Run all 30 tests
 | R12 | **CI Playwright failures — AWS SDK "Region is missing"** | `next.config.ts` inlines env vars at build time with `?? ""` defaults. On CI (no `.env.local`), `BEDROCK_REGION`/`POLLY_REGION` resolved to `""` (empty string). Nullish coalescing (`??`) doesn't catch empty strings, so `process.env.BEDROCK_REGION ?? "us-east-1"` → `""`. AWS SDK threw `Error: Region is missing` outside try/catch → uncaught 500. Fix: changed `??` to `||` in all 4 API routes (summary, clinical, tts, conversation) and moved client creation inside try/catch blocks. TTS error status changed from 500 → 503. |
 | R13 | **Step 7 auto-advances without verifying motor actions** | Voice agent spoke motor instructions ("touch your nose", "wave") but immediately moved on without checking if the child performed the action. Also, agent text wasn't displayed prominently. Fix: added camera-based motor action verification using existing YOLO pose detection pipeline. Motor turns activate camera → YOLO extracts 17 keypoints → rule-based ActionDetector checks keypoint geometry → ActionTracker requires 5 consecutive positive frames → confirmed. Agent text now displayed in large centered speech bubble with domain emoji headers. |
 | R14 | **Stage 10 worker URL parse error** | `Failed to execute 'fetch' on 'WorkerGlobalScope': Failed to parse URL from /models/yolo26n-pose-int8.onnx`. ONNX model paths were relative URLs (`/models/...`) which fail inside Web Workers because relative paths resolve against the worker script URL (blob: or /_next/static/), not the page origin. Fix: prefixed all 4 model paths with `${self.location.origin}` in PipelineOrchestrator.ts and MultimodalOrchestrator.ts. |
+| R15 | **Stages 4, 7, 9 overlap and lack differentiation** | Stage 4 (Communication) and Stage 9 (Audio) were both simple speech echo tests with hardcoded word lists. Stage 7 (Preparation) mixed motor actions with LLM conversation. Fix: Stage 4 → pure Word Echo with LLM-generated age-appropriate words + Polly TTS. Stage 7 → pure Motor Action Challenge with fixed 6-action sequence + live YOLO detection feedback (confidence bar, 5-dot frame counter, color-coded borders). Stage 9 → Sentence Echo + Comprehension (Part A: sentence repetition with word-overlap scoring, Part B: audio instruction following). |
+| R16 | **Stage 10 camera fails on mobile** | `getUserMedia()` with fixed resolution constraints fails on many mobile browsers. Also: no HTTPS check (required for camera on mobile), generic error messages, no retry mechanism. Fix: 3-tier progressive constraint negotiation (ideal 320×240 → facingMode only → any video). HTTPS early check. Specific error messages per DOMException type (NotAllowedError, NotFoundError, NotReadableError, SecurityError). Retry Camera + Skip buttons on failure. Shared `cameraUtils.ts` reused by Stage 7 and Stage 10. |
+| R17 | **Stages auto-advance without criteria verification** | Some stages allowed proceeding even when insufficient data was collected. Fix: minimum criteria gates on Stages 4 (2/6 words), 7 (3/6 actions), 9 (2/7 items), 10 (5 samples + 30s). Stages show "Let's try again!" card with Try Again/Skip buttons when criteria not met. |
 
 ---
 
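The R12 row turns on the difference between `??` and `||` when the value is an empty string. A minimal sketch of that behaviour follows; the helper names are hypothetical and not taken from the repository, only the operator semantics are.

```typescript
// Hypothetical helpers; only the `??` vs `||` behaviour comes from row R12.
function regionWithNullish(value: string | undefined): string {
  // `??` falls back only for null or undefined, so "" passes through unchanged.
  return value ?? "us-east-1";
}

function regionWithOr(value: string | undefined): string {
  // `||` falls back for every falsy value, including "".
  return value || "us-east-1";
}

// What a build-time `?? ""` default yields when the variable is unset.
const inlinedEmpty = "";

console.log(JSON.stringify(regionWithNullish(inlinedEmpty))); // ""
console.log(JSON.stringify(regionWithOr(inlinedEmpty)));      // "us-east-1"
console.log(regionWithNullish(undefined));                    // us-east-1
```

The trade-off is that `||` also replaces other falsy values, which is harmless for a region string but would matter for numeric settings where `0` is valid.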
@@ -538,3 +541,31 @@ npx playwright test # Run all 30 tests
 - Updated: `app/api/chat/conversation/route.ts` (action field in metadata)
 - Fixed: `app/lib/inference/PipelineOrchestrator.ts` (absolute model URLs)
 - Fixed: `app/lib/inference/MultimodalOrchestrator.ts` (absolute model URLs)
+
+### v1.5.0 — 2026-03-04 (Stage Differentiation, Dynamic Content, Mobile Camera, Criteria Gates)
+
+**Major Changes:**
+- **Stage 4 → Word Echo**: Dynamic LLM-generated (Bedrock Nova Lite) age-appropriate words spoken via Polly TTS. Child echoes back, matched via Web Speech API. 6 words per session from age-stratified pools (18-36mo, 36-60mo, 60+mo). Falls back to curated word pools when Bedrock unavailable.
+- **Stage 7 → Action Challenge**: Pure motor action test — fixed sequence of 6 actions (wave, touch nose, clap, raise arms, touch head, touch ears). Camera + YOLO pose detection with **live feedback**: confidence bar, color-coded camera border (red/blue/green), 5-dot frame counter showing consecutive detection progress, contextual status text ("Step into view", "Getting closer!", "Almost there!"). No LLM/TTS/STT — purely visual.
+- **Stage 9 → Speech & Comprehension**: Two-part test. Part A: 4 LLM-generated sentences with word-overlap matching (threshold 0.4). Part B: 3 audio instructions testing comprehension (any verbal response = engaged). Both spoken via Polly TTS.
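The Stage 9 bullet mentions word-overlap matching with a 0.4 threshold, but the scoring formula is not shown in this diff. The sketch below assumes the score is the fraction of target-sentence words that appear anywhere in the recognised transcript; `tokenize`, `wordOverlap`, and `isMatch` are hypothetical names.

```typescript
// Threshold quoted in the changelog; everything else here is an assumption.
const OVERLAP_THRESHOLD = 0.4;

// Lowercase, replace punctuation with spaces, and split on whitespace.
function tokenize(sentence: string): string[] {
  return sentence
    .toLowerCase()
    .replace(/[^a-z0-9\s']/g, " ")
    .split(/\s+/)
    .filter((word) => word.length > 0);
}

// Assumed scoring: share of target words that were heard, ignoring order.
function wordOverlap(target: string, heard: string): number {
  const targetWords = tokenize(target);
  if (targetWords.length === 0) return 0;
  const heardWords = new Set(tokenize(heard));
  const hits = targetWords.filter((word) => heardWords.has(word)).length;
  return hits / targetWords.length;
}

function isMatch(target: string, heard: string): boolean {
  return wordOverlap(target, heard) >= OVERLAP_THRESHOLD;
}

console.log(wordOverlap("The cat is big", "the cat big")); // 0.75
console.log(isMatch("I like dogs", "hello"));              // false
```

Under this assumption a child who repeats three of four words passes, while a single unrelated word does not.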
+
+**New:**
+- `POST /api/chat/generate-words` — Shared endpoint for dynamic content generation. Modes: `words`, `sentences`, `instructions`. Falls back to curated age-stratified pools (20 words, 6 sentences, 5 instructions per bracket).
+- `app/lib/camera/cameraUtils.ts` — Shared camera utility: 3-tier progressive `getUserMedia` constraint negotiation (ideal 320×240 → facingMode only → any camera). HTTPS early check. Specific error messages per DOMException type.
+- `consecutiveHits` exposed from ActionTracker and useActionCamera hook for real-time frame progress display.
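`cameraUtils.ts` itself is not part of the excerpt shown here, so the following is only a sketch of what 3-tier progressive constraint negotiation could look like. The tier contents (in particular `facingMode: "user"`), the early exit on permission errors, and the names `openCamera` and `TIERS` are assumptions. The media call is passed in as a parameter so the sketch also runs outside a browser.

```typescript
// Structural stand-in for MediaStreamConstraints, so no DOM types are needed.
type VideoConstraints = { video: boolean | Record<string, unknown>; audio: boolean };

// Assumed tiers, following "ideal 320x240 -> facingMode only -> any camera".
const TIERS: VideoConstraints[] = [
  { video: { width: { ideal: 320 }, height: { ideal: 240 }, facingMode: "user" }, audio: false },
  { video: { facingMode: "user" }, audio: false },
  { video: true, audio: false },
];

async function openCamera<T>(
  getMedia: (constraints: VideoConstraints) => Promise<T>,
  tiers: VideoConstraints[] = TIERS,
): Promise<{ stream: T; tier: number }> {
  let lastError: unknown = new Error("No constraint tiers supplied");
  for (let i = 0; i < tiers.length; i++) {
    try {
      // The first tier the device can satisfy wins.
      return { stream: await getMedia(tiers[i]), tier: i + 1 };
    } catch (err) {
      const name = (err as { name?: string } | null)?.name;
      // Assumption: looser constraints cannot fix a denied permission, so stop early.
      if (name === "NotAllowedError" || name === "SecurityError") throw err;
      lastError = err;
    }
  }
  throw lastError;
}
```

In a browser the first argument would typically be `(c) => navigator.mediaDevices.getUserMedia(c)`, and the returned `tier` can be logged to see how often devices fall through to the loosest constraint.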
+
+**Fixed:**
+- **Mobile camera failures**: Progressive constraint fallback handles devices that can't satisfy resolution constraints. HTTPS check prevents silent failures on mobile HTTP. Specific error messages for NotFoundError, NotReadableError, OverconstrainedError, SecurityError. Retry Camera + Skip buttons added to Stage 10.
+- **Stages auto-advance without verification**: Minimum criteria gates added — Stage 4 (2/6 words), Stage 7 (3/6 actions), Stage 9 (2/7 items), Stage 10 (5 samples + 30s). Shows retry/skip menu when criteria not met.
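The thresholds listed for the criteria gates can be expressed as a small lookup table. This is a sketch rather than the code in the stage pages: the names `GATES` and `meetsCriteria` are hypothetical, and it assumes Stage 10 needs both 5 samples and 30 seconds.

```typescript
type StageId = 4 | 7 | 9 | 10;

interface Gate {
  minPassed: number;   // words, actions, items, or samples, depending on the stage
  minSeconds?: number; // only Stage 10 lists a time requirement
}

interface StageProgress {
  passed: number;
  elapsedSeconds?: number;
}

// Thresholds copied from the changelog entry.
const GATES: Record<StageId, Gate> = {
  4: { minPassed: 2 },
  7: { minPassed: 3 },
  9: { minPassed: 2 },
  10: { minPassed: 5, minSeconds: 30 },
};

// Returns true when the stage may advance; false means show the retry/skip card.
function meetsCriteria(stage: StageId, progress: StageProgress): boolean {
  const gate = GATES[stage];
  if (progress.passed < gate.minPassed) return false;
  if (gate.minSeconds !== undefined && (progress.elapsedSeconds ?? 0) < gate.minSeconds) {
    return false; // Assumption: the sample count and the duration must both be met.
  }
  return true;
}

console.log(meetsCriteria(4, { passed: 1 }));                       // false
console.log(meetsCriteria(10, { passed: 5, elapsedSeconds: 30 })); // true
```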
+
+**Files:**
+- Created: `app/api/chat/generate-words/route.ts`
+- Created: `app/lib/camera/cameraUtils.ts`
+- Rewritten: `app/intake/communication/page.tsx` (Word Echo)
+- Rewritten: `app/intake/preparation/page.tsx` (Action Challenge)
+- Rewritten: `app/intake/audio/page.tsx` (Speech & Comprehension)
+- Updated: `app/intake/video-capture/page.tsx` (mobile camera + criteria gate)
+- Updated: `app/hooks/useActionCamera.ts` (consecutiveHits + cameraUtils)
+- Updated: `app/lib/actions/actionDetector.ts` (consecutiveHits in tracker return)
+- Updated: `tests/intake-flow.spec.ts` (Step 4, 7, 9 test assertions)
+- Updated: `tests/app-pages.spec.ts` (generate-words API test)
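The `consecutiveHits` value mentioned for `actionDetector.ts` and `useActionCamera.ts` backs the 5-dot frame counter, but neither file is shown in this excerpt. A minimal sketch of a tracker that confirms an action after 5 consecutive positive frames might look like the following; the class name, the method names, and the reset-on-miss behaviour are assumptions.

```typescript
interface TrackerState {
  consecutiveHits: number; // drives the frame-counter dots
  confirmed: boolean;      // true once enough consecutive frames were positive
}

class ConsecutiveFrameTracker {
  private hits = 0;
  private readonly required: number;

  constructor(required: number = 5) {
    this.required = required;
  }

  // Feed one per-frame detection result and get the updated progress back.
  update(detected: boolean): TrackerState {
    // Assumption: a single missed frame resets the streak to zero.
    this.hits = detected ? this.hits + 1 : 0;
    return { consecutiveHits: this.hits, confirmed: this.hits >= this.required };
  }

  reset(): void {
    this.hits = 0;
  }
}

const tracker = new ConsecutiveFrameTracker();
const frames = [true, true, false, true, true, true, true, true];
const states = frames.map((frame) => tracker.update(frame));
console.log(states[1].consecutiveHits); // 2
console.log(states[2].consecutiveHits); // 0
console.log(states[7].confirmed);       // true
```

Requiring a streak rather than a total count trades some responsiveness for robustness against single-frame false positives from the pose model.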
app/api/chat/generate-words/route.ts

Lines changed: 281 additions & 0 deletions
@@ -0,0 +1,281 @@
+/**
+ * POST /api/chat/generate-words
+ *
+ * Generates age-appropriate words, sentences, or instructions for
+ * speech echo and comprehension stages. Falls back to curated
+ * age-stratified pools when Bedrock is unavailable.
+ *
+ * Request body:
+ *   { ageMonths: number, count?: number, mode: "words"|"sentences"|"instructions" }
+ *
+ * Response:
+ *   { items: Array<{ text: string, emoji: string }>, fallback: boolean }
+ */
+
+import { NextRequest, NextResponse } from "next/server";
+import {
+  BedrockRuntimeClient,
+  InvokeModelCommand,
+} from "@aws-sdk/client-bedrock-runtime";
+
+/* ------------------------------------------------------------------ */
+/* Types */
+/* ------------------------------------------------------------------ */
+
+type Mode = "words" | "sentences" | "instructions";
+
+interface GenerateRequest {
+  ageMonths: number;
+  count?: number;
+  mode: Mode;
+}
+
+interface GeneratedItem {
+  text: string;
+  emoji: string;
+}
+
+/* ------------------------------------------------------------------ */
+/* Bedrock client */
+/* ------------------------------------------------------------------ */
+
+const BEDROCK_REGION = process.env.BEDROCK_REGION || "us-east-1";
+
+function getBedrockClient(): BedrockRuntimeClient {
+  return new BedrockRuntimeClient({ region: BEDROCK_REGION });
+}
+
+/* ------------------------------------------------------------------ */
+/* Fallback pools — curated, age-stratified */
+/* ------------------------------------------------------------------ */
+
+const WORD_EMOJIS: Record<string, string> = {
+  mama: "👩", dada: "👨", ball: "⚽", dog: "🐶", cat: "🐱", milk: "🥛",
+  more: "➕", up: "⬆️", bye: "👋", hi: "🙋", book: "📖", shoe: "👟",
+  hat: "🎩", cup: "🥤", fish: "🐟", duck: "🦆", apple: "🍎", baby: "👶",
+  car: "🚗", bird: "🐦", banana: "🍌", elephant: "🐘", butterfly: "🦋",
+  dinosaur: "🦕", rainbow: "🌈", chocolate: "🍫", hello: "👋", purple: "🟣",
+  circle: "⭕", triangle: "🔺", giraffe: "🦒", penguin: "🐧", rocket: "🚀",
+  princess: "👸", monster: "👹", umbrella: "☂️", pumpkin: "🎃",
+  strawberry: "🍓", airplane: "✈️", crocodile: "🐊", computer: "💻",
+  adventure: "🗺️", incredible: "🌟", beautiful: "🌸", discovery: "🔍",
+  astronaut: "🧑‍🚀", magnificent: "✨", helicopter: "🚁", wonderful: "💫",
+  caterpillar: "🐛", watermelon: "🍉", constellation: "⭐", basketball: "🏀",
+  trampoline: "🤸", hippopotamus: "🦛", refrigerator: "🧊", thermometer: "🌡️",
+  xylophone: "🎵", vocabulary: "📝", harmonica: "🎶",
+};
+
+const FALLBACK_WORDS: Record<string, string[]> = {
+  young: ["mama", "dada", "ball", "dog", "cat", "milk", "more", "up", "bye", "hi",
+    "book", "shoe", "hat", "cup", "fish", "duck", "apple", "baby", "car", "bird"],
+  mid: ["banana", "elephant", "butterfly", "dinosaur", "rainbow", "chocolate", "hello",
+    "purple", "circle", "triangle", "giraffe", "penguin", "rocket", "princess",
+    "monster", "umbrella", "pumpkin", "strawberry", "airplane", "crocodile"],
+  old: ["computer", "adventure", "incredible", "beautiful", "discovery", "astronaut",
+    "magnificent", "helicopter", "wonderful", "caterpillar", "watermelon",
+    "constellation", "basketball", "trampoline", "hippopotamus", "refrigerator",
+    "thermometer", "xylophone", "vocabulary", "harmonica"],
+};
+
+const FALLBACK_SENTENCES: Record<string, GeneratedItem[]> = {
+  young: [
+    { text: "The cat is big", emoji: "🐱" },
+    { text: "I like dogs", emoji: "🐶" },
+    { text: "My ball is red", emoji: "⚽" },
+    { text: "I see a bird", emoji: "🐦" },
+    { text: "The sun is hot", emoji: "☀️" },
+    { text: "I want milk", emoji: "🥛" },
+  ],
+  mid: [
+    { text: "The butterfly is very pretty", emoji: "🦋" },
+    { text: "I want to go outside and play", emoji: "🏃" },
+    { text: "My favorite color is blue", emoji: "🔵" },
+    { text: "The dog is running in the park", emoji: "🐶" },
+    { text: "I can count to ten", emoji: "🔢" },
+    { text: "The moon comes out at night", emoji: "🌙" },
+  ],
+  old: [
+    { text: "The elephant walked through the tall jungle", emoji: "🐘" },
+    { text: "Can you tell me about your favorite game", emoji: "🎮" },
+    { text: "The beautiful rainbow appeared after the rain", emoji: "🌈" },
+    { text: "I like to read books before bedtime", emoji: "📚" },
+    { text: "The spaceship flew high into the sky", emoji: "🚀" },
+    { text: "My friend and I played at the park today", emoji: "🏞️" },
+  ],
+};
+
+const FALLBACK_INSTRUCTIONS: Record<string, GeneratedItem[]> = {
+  young: [
+    { text: "Clap your hands", emoji: "👏" },
+    { text: "Wave bye bye", emoji: "👋" },
+    { text: "Say your name", emoji: "🗣️" },
+    { text: "Touch your nose", emoji: "👃" },
+    { text: "Say mama", emoji: "👩" },
+  ],
+  mid: [
+    { text: "Clap your hands two times", emoji: "👏" },
+    { text: "Say hello and then wave", emoji: "👋" },
+    { text: "Count to three out loud", emoji: "🔢" },
+    { text: "Tell me something that is red", emoji: "🔴" },
+    { text: "Say the word butterfly", emoji: "🦋" },
+  ],
+  old: [
+    { text: "Clap your hands then touch your head", emoji: "👏" },
+    { text: "Say your name and how old you are", emoji: "🗣️" },
+    { text: "Count backwards from five", emoji: "🔢" },
+    { text: "Tell me your favorite animal and why", emoji: "🐾" },
+    { text: "Say a long word like hippopotamus", emoji: "🦛" },
+  ],
+};
+
+/* ------------------------------------------------------------------ */
+/* Helpers */
+/* ------------------------------------------------------------------ */
+
+function getAgeBracket(ageMonths: number): string {
+  if (ageMonths < 36) return "young";
+  if (ageMonths < 60) return "mid";
+  return "old";
+}
+
+function shuffle<T>(arr: T[]): T[] {
+  const a = [...arr];
+  for (let i = a.length - 1; i > 0; i--) {
+    const j = Math.floor(Math.random() * (i + 1));
+    [a[i], a[j]] = [a[j], a[i]];
+  }
+  return a;
+}
+
+function pickFallbackWords(ageMonths: number, count: number): GeneratedItem[] {
+  const bracket = getAgeBracket(ageMonths);
+  const pool = FALLBACK_WORDS[bracket];
+  return shuffle(pool).slice(0, count).map((w) => ({
+    text: w,
+    emoji: WORD_EMOJIS[w] || "🔤",
+  }));
+}
+
+function pickFallbackSentences(ageMonths: number, count: number): GeneratedItem[] {
+  const bracket = getAgeBracket(ageMonths);
+  return shuffle(FALLBACK_SENTENCES[bracket]).slice(0, count);
+}
+
+function pickFallbackInstructions(ageMonths: number, count: number): GeneratedItem[] {
+  const bracket = getAgeBracket(ageMonths);
+  return shuffle(FALLBACK_INSTRUCTIONS[bracket]).slice(0, count);
+}
+
+function pickFallback(mode: Mode, ageMonths: number, count: number): GeneratedItem[] {
+  switch (mode) {
+    case "words": return pickFallbackWords(ageMonths, count);
+    case "sentences": return pickFallbackSentences(ageMonths, count);
+    case "instructions": return pickFallbackInstructions(ageMonths, count);
+  }
+}
+
+/* ------------------------------------------------------------------ */
+/* Bedrock generation */
+/* ------------------------------------------------------------------ */
+
+function buildPrompt(mode: Mode, ageMonths: number, count: number): string {
+  const years = Math.floor(ageMonths / 12);
+  const months = ageMonths % 12;
+  const ageStr = months > 0 ? `${years} years and ${months} months` : `${years} years`;
+
+  switch (mode) {
+    case "words":
+      return `Generate exactly ${count} age-appropriate single words for a ${ageStr}-old child to repeat in a speech echo test. Mix easy and slightly challenging words. Use concrete nouns and simple words the child would know. Each word should have a relevant emoji. Return ONLY valid JSON array (no markdown, no code blocks): [{"text":"banana","emoji":"🍌"},...]`;
+    case "sentences":
+      return `Generate exactly ${count} short sentences (3-8 words each) appropriate for a ${ageStr}-old child to repeat in a speech test. Use familiar objects and actions. Each sentence should have a relevant emoji. Return ONLY valid JSON array (no markdown, no code blocks): [{"text":"The cat is sleeping","emoji":"🐱"},...]`;
+    case "instructions":
+      return `Generate exactly ${count} simple audio instructions for a ${ageStr}-old child. Each should ask them to do or say something simple and fun. Each instruction should have a relevant emoji. Return ONLY valid JSON array (no markdown, no code blocks): [{"text":"Clap your hands two times","emoji":"👏"},...]`;
+  }
+}
+
+function parseItems(raw: string): GeneratedItem[] | null {
+  try {
+    // Try direct parse
+    const parsed = JSON.parse(raw);
+    if (Array.isArray(parsed) && parsed.length > 0 && parsed[0].text) {
+      return parsed;
+    }
+  } catch {
+    // Try extracting JSON array from text
+    const match = raw.match(/\[[\s\S]*\]/);
+    if (match) {
+      try {
+        const parsed = JSON.parse(match[0]);
+        if (Array.isArray(parsed) && parsed.length > 0 && parsed[0].text) {
+          return parsed;
+        }
+      } catch { /* fall through */ }
+    }
+  }
+  return null;
+}
+
+/* ------------------------------------------------------------------ */
+/* POST handler */
+/* ------------------------------------------------------------------ */
+
+export async function POST(request: NextRequest) {
+  try {
+    const body: GenerateRequest = await request.json();
+    const { ageMonths = 36, count = 6, mode = "words" } = body;
+
+    if (!["words", "sentences", "instructions"].includes(mode)) {
+      return NextResponse.json({ error: "Invalid mode" }, { status: 400 });
+    }
+
+    const clampedCount = Math.max(1, Math.min(count, 10));
+
+    // Try Bedrock first
+    try {
+      const client = getBedrockClient();
+      const prompt = buildPrompt(mode, ageMonths, clampedCount);
+
+      const command = new InvokeModelCommand({
+        modelId: "amazon.nova-lite-v1:0",
+        contentType: "application/json",
+        accept: "application/json",
+        body: JSON.stringify({
+          messages: [{ role: "user", content: [{ text: prompt }] }],
+          inferenceConfig: { maxNewTokens: 500, temperature: 0.8 },
+        }),
+      });
+
+      const controller = new AbortController();
+      const timeout = setTimeout(() => controller.abort(), 3000);
+
+      const response = await client.send(command, { abortSignal: controller.signal });
+      clearTimeout(timeout);
+
+      const responseBody = JSON.parse(new TextDecoder().decode(response.body));
+      const text = responseBody?.output?.message?.content?.[0]?.text;
+
+      if (text) {
+        const items = parseItems(text);
+        if (items && items.length >= clampedCount) {
+          return NextResponse.json({
+            items: items.slice(0, clampedCount),
+            fallback: false,
+          });
+        }
+      }
+    } catch {
+      // Bedrock failed or timed out — use fallback
+    }
+
+    // Fallback
+    return NextResponse.json({
+      items: pickFallback(mode, ageMonths, clampedCount),
+      fallback: true,
+    });
+  } catch (err) {
+    return NextResponse.json(
+      { error: err instanceof Error ? err.message : "Unknown error" },
+      { status: 500 },
+    );
+  }
+}
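One thing a reviewer might note: `parseItems` in this route only inspects the first array element before returning the whole array. The sketch below shows a stricter variant that validates every element. It is a hypothetical alternative, not part of the commit, and `isGeneratedItem` and `parseItemsStrict` are illustrative names.

```typescript
interface GeneratedItem {
  text: string;
  emoji: string;
}

// Type guard: accept only objects with a non-empty text and a string emoji.
function isGeneratedItem(value: unknown): value is GeneratedItem {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.text === "string" && v.text.trim().length > 0 && typeof v.emoji === "string";
}

// Hypothetical stricter parser: tries the raw text, then the outermost [...] span,
// and keeps only the well-formed elements.
function parseItemsStrict(raw: string): GeneratedItem[] | null {
  const candidates = [raw];
  const match = raw.match(/\[[\s\S]*\]/);
  if (match) candidates.push(match[0]);

  for (const candidate of candidates) {
    try {
      const parsed: unknown = JSON.parse(candidate);
      if (Array.isArray(parsed)) {
        const items = parsed.filter(isGeneratedItem);
        if (items.length > 0) return items;
      }
    } catch (err) {
      // Not valid JSON; try the next candidate.
    }
  }
  return null;
}

const wrapped = 'Here you go: [{"text":"banana","emoji":"🍌"},{"text":5}] thanks';
console.log(parseItemsStrict(wrapped)?.length); // 1
console.log(parseItemsStrict("no json here"));  // null
```

Because malformed elements are dropped, the handler's existing `items.length >= clampedCount` check would then send short results to the curated fallback pools.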
