implement stream vision agent ai language learning teacher #10

Merged
Retsomm merged 5 commits into main from dev on May 20, 2026

Conversation

@Retsomm (Owner) commented May 19, 2026

Summary by CodeRabbit

  • New Features

    • Real-time audio lessons with an AI voice teacher and a vision-agent-backed session service
    • Server API endpoints to provision audio calls and manage agent sessions
  • Improvements

    • Lesson view now opens the audio-teacher session automatically; enhanced in-call controls (join/end, mic/camera, status)
    • App config and runtime API URL handling updated for WebRTC/streaming and env-driven endpoints
  • Documentation

    • Added vision-agent README and example env vars for local setup

coderabbitai Bot commented May 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4e102ab4-340a-4352-825c-32e9832ae07f

📥 Commits

Reviewing files that changed from the base of the PR and between b03f214 and 583e3b9.

📒 Files selected for processing (1)
  • vision-agent/agent.py
💤 Files with no reviewable changes (1)
  • vision-agent/agent.py

📝 Walkthrough


This PR integrates real-time audio lesson calls into the mobile app using Stream Video SDK and adds a Python Vision Agent backend service. The mobile client provisions calls via a new API endpoint, manages session state through a custom hook, and renders the AudioTeacherSession component with camera/microphone controls. The Vision Agent service connects to Stream Edge and OpenAI Realtime to deliver voice-based language teaching.

Changes

Audio Lesson Call Integration

| Layer | File(s) | Summary |
| --- | --- | --- |
| Expo Config & Dependencies | app.config.js, package.json, skills-lock.json | iOS bundle identifier, web server output mode, and Stream SDK/WebRTC plugins added to Expo config. Package dependencies updated to include @stream-io/video-react-native-sdk, @stream-io/react-native-webrtc, @stream-io/node-sdk, and react-native-svg; scripts changed to use expo run:android/expo run:ios. Skills lock updated with agent and Stream service entries. |
| Backend Audio Call & Vision Agent Proxy | app/api/stream/audio-call+api.ts, app/api/vision-agent/session+api.ts, lib/api.ts | New API endpoint parses and validates AudioCallRequest (lessonId, languageId, user.id), sanitizes identifiers, upserts Stream users, creates/initializes video calls with metadata, generates scoped tokens, and returns credentials. Vision Agent session proxy endpoints forward start/stop requests to the upstream agent service with timeout/error extraction. getApiUrl() resolves API host on native platforms or falls back to configured base URL. |
| Stream Audio Call Hook | hooks/useStreamAudioCall.ts | Hook manages Stream Video SDK lifecycle: startCall() calls backend and connects user, joinCall() configures and joins the call, toggleMute() and toggleCamera() manage device state (auto-joining when needed), endCall() cleans up, and Vision Agent session lifecycle is tracked via agentSessionId. Cleanup effects prevent lingering sessions on unmount. |
| AudioTeacherSession Component & Controls | components/audio-teacher-session.tsx | Component refactored to delegate rendering to AudioTeacherSessionContent, adding autoStartCall and onCallEnded props. Integrates useStreamAudioCall to derive call state and display status copy. Header shows status-color dot and language + status. New learner status card displays user/call status badge. Camera/mic/status/call control buttons are explicitly wired to call methods with enabled/disabled logic tied to call state. Rendering conditionally wraps content in StreamVideoProvider when available. |
| Lesson Navigation & Screen Routing | app/(tabs)/learn.tsx, app/lesson/[lessonId].tsx | LessonCard onPress now always navigates to the lesson route, removing the prior guard that blocked "not-started" lessons. Lesson detail screen replaced entirely with AudioTeacherSession configured to auto-start, label "Learn", and navigate back to /learn when the call ends. |

Vision Agent Service

| Layer | File(s) | Summary |
| --- | --- | --- |
| Vision Agent Setup & Configuration | vision-agent/pyproject.toml, vision-agent/.env.example, vision-agent/.gitignore, vision-agent/.python-version, vision-agent/README.md, .env.example | Python 3.12 project with dependencies on python-dotenv and vision-agents (with getstream and openai extras). Environment variables STREAM_API_KEY, STREAM_API_SECRET, OPENAI_API_KEY, EXPO_PUBLIC_API_BASE_URL, and VISION_AGENT_BASE_URL documented. Gitignore excludes virtual environment, cache, and .env. README describes service architecture, environment loading order, local setup with uv, and call coordination via call_type/call_id. |
| Vision Agent Implementation | vision-agent/agent.py | Agent module loads environment from root and service .env files, validates required secrets, and provides PassthroughAudioFilter for PCM audio. build_teacher_instructions() generates dynamic voice prompts from language/lesson params. create_agent() instantiates an Agent with getstream.Edge, OpenAI Realtime config, and audio filter. join_call() joins the Stream call, sends initial greeting/practice prompt, and finalizes the session. Script entrypoint starts a Runner with concurrency limits and call lifecycle wiring. |

Sequence Diagrams

sequenceDiagram
  participant Client as Mobile Client
  participant API as audio-call API
  participant StreamAPI as Stream Backend
  Client->>API: POST /api/stream/audio-call
  API->>API: Validate request (lessonId, languageId, user)
  API->>API: Sanitize IDs and generate callId/callCid
  API->>StreamAPI: upsertUsers for Stream user
  API->>StreamAPI: call.getOrCreate with metadata
  API->>StreamAPI: generateToken scoped to callCid
  API->>Client: Return apiKey, callCid, token, user info
sequenceDiagram
  participant User as User
  participant Lesson as AudioTeacherSession
  participant Hook as useStreamAudioCall
  participant Stream as Stream SDK
  User->>Lesson: Open lesson (autoStartCall)
  Lesson->>Hook: useStreamAudioCall()
  Hook->>Hook: startCall → POST /api/stream/audio-call
  Hook->>Stream: SDK load, connect user, create call
  Hook->>API: POST /api/vision-agent/session
  Hook->>Hook: agentSessionId returned
  Lesson->>Lesson: Render with call state
  Lesson->>Lesson: Auto-join when autoStartCall enabled
  User->>Lesson: Toggle camera/mic
  Lesson->>Hook: toggleCamera() / toggleMute()
  Hook->>Stream: Update camera/mic
  Lesson->>Lesson: Update UI (disabled state, badges)
  User->>Lesson: End call
  Lesson->>Hook: endCall()
  Hook->>Stream: Leave call, disconnect
  Hook->>API: DELETE /api/vision-agent/session
  Lesson->>User: Navigate to /learn
sequenceDiagram
  participant Mobile as Mobile App
  participant Agent as Vision Agent
  participant Stream as Stream Edge
  participant OpenAI as OpenAI Realtime
  Mobile->>Agent: POST /vision-agent/sessions (create)
  Agent->>Agent: load_environment(), validate secrets
  Agent->>Stream: connect via getstream.Edge
  Agent->>OpenAI: configure realtime (model, voice)
  Mobile->>Agent: POST /vision-agent/sessions/join (join_call)
  Agent->>Stream: join call
  Agent->>OpenAI: send greeting + practice prompt
  Agent->>OpenAI: audio in/out via passthrough filter
  Mobile->>Agent: DELETE /vision-agent/sessions (stop)
  Agent->>Stream: leave call
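
The "Validate request (lessonId, languageId, user)" step in the first diagram could look roughly like the sketch below. The field names come from the walkthrough; the `parseAudioCallRequest` and `isNonEmptyString` helpers are hypothetical and not taken from the PR, and the real endpoint may validate more than this.

```typescript
// Request shape as described in the walkthrough (lessonId, languageId, user.id).
interface AudioCallRequest {
  lessonId: string;
  languageId: string;
  user: { id: string };
}

// Hypothetical helper: true only for strings with visible content.
function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

// Returns the typed request, or null when a required field is missing or empty.
function parseAudioCallRequest(body: unknown): AudioCallRequest | null {
  if (typeof body !== "object" || body === null) return null;
  const candidate = body as { lessonId?: unknown; languageId?: unknown; user?: unknown };
  const lessonId = candidate.lessonId;
  const languageId = candidate.languageId;
  if (!isNonEmptyString(lessonId) || !isNonEmptyString(languageId)) return null;
  const user = candidate.user;
  if (typeof user !== "object" || user === null) return null;
  const userId = (user as { id?: unknown }).id;
  if (!isNonEmptyString(userId)) return null;
  return { lessonId, languageId, user: { id: userId } };
}
```

A route handler would typically answer with a 400 status when this returns null, before any Stream API call is made.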

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Retsomm/react-native-lingua#9: Builds directly on the unit/lesson screens work by integrating the new Stream Video-powered AudioTeacherSession and updating lesson navigation flow.

"🐰 I hopped in code, ears tuned to streams,
A lesson hummed in real-time dreams.
Mic and agent speak, the pupil repeats—
Small paws applaud these realtime feats."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: implementing a Stream Vision Agent-based AI language learning teacher system. It reflects the primary architectural addition across multiple new files and components. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dev

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/api/stream/audio-call+api.ts`:
- Around line 61-73: After calling cleanId and compactId for userId, lessonId,
languageId (producing safeUserId, safeLessonId, safeLanguageId,
safeUserSegment), validate that none are empty/undefined before building
callId/callCid; if any sanitized identifier is empty, return a 400 Bad Request
(or throw a typed HTTPError) with a clear message about invalid identifiers
instead of proceeding to construct callId/callCid, and include the original
input name in the error to aid the client.
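
A minimal sketch of the check described above, assuming a simplified `cleanId` (the real helper's rules are not shown in this PR page, so the one here is a stand-in) and a hypothetical `sanitizeIdentifiers` wrapper:

```typescript
// Hypothetical stand-in for the route's cleanId helper; the real rules may differ.
function cleanId(value: unknown): string {
  return typeof value === "string" ? value.trim().replace(/[^a-zA-Z0-9_-]/g, "") : "";
}

interface SanitizeResult {
  ids: Record<string, string>;
  // Name of the first input field whose sanitized value is empty, or null if all are usable.
  invalidField: string | null;
}

// Sanitizes every identifier and stops at the first one that ends up empty,
// so the caller can reject the request before building callId/callCid.
function sanitizeIdentifiers(input: Record<string, unknown>): SanitizeResult {
  const ids: Record<string, string> = {};
  for (const field of Object.keys(input)) {
    const safe = cleanId(input[field]);
    if (safe.length === 0) {
      return { ids: {}, invalidField: field };
    }
    ids[field] = safe;
  }
  return { ids, invalidField: null };
}
```

When `invalidField` is not null, the handler would respond with status 400 and a message such as `Invalid identifier: lessonId` instead of continuing.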

In `@hooks/useStreamAudioCall.ts`:
- Around line 148-167: If startCall fails after creating/connecting the
StreamVideoClient, ensure you clean up any partially initialized resources: if
streamCall was created call its termination method (e.g., hangUp/leave/stop) and
if streamClient was connected call its disconnect method (e.g.,
disconnectUser/disconnect/close) before setting error state; implement this
inside the catch block of the async startCall routine in useStreamAudioCall.ts
so you call the appropriate teardown on the streamCall and streamClient (created
via new streamSdk.StreamVideoClient and returned from connectUser) prior to
setErrorMessage and setStatus("error").
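
One way to keep this teardown from being forgotten is a small rollback helper: each step registers its own undo action, and a failure runs them in reverse. This is a generic sketch rather than the hook's actual code; `withRollback` and the `Cleanup` type are hypothetical names.

```typescript
type Cleanup = () => Promise<void>;

// Runs `body`, letting it register a cleanup for each resource it creates.
// If `body` rejects, cleanups run in reverse order and the original error is rethrown.
async function withRollback<T>(
  body: (onFailure: (cleanup: Cleanup) => void) => Promise<T>,
): Promise<T> {
  const cleanups: Cleanup[] = [];
  try {
    return await body((cleanup) => {
      cleanups.push(cleanup);
    });
  } catch (error) {
    for (const cleanup of cleanups.reverse()) {
      try {
        await cleanup();
      } catch {
        // Best effort: a failing teardown must not hide the original error.
      }
    }
    throw error;
  }
}
```

Inside `startCall`, the client would register `disconnectUser()` right after connecting and the call would register `leave()` right after it is created, so the surrounding catch block only needs to set the error message and status.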

In `@lib/api.ts`:
- Around line 15-20: The current fallback returns a relative path when host
detection fails (causing fetch("/api/stream/audio-call") to break
native/standalone builds); change the fallback to an absolute URL by reading
EXPO_PUBLIC_API_BASE_URL and using it when host is falsy (so the function that
currently returns `path` should instead return
`${EXPO_PUBLIC_API_BASE_URL}${path}`); add EXPO_PUBLIC_API_BASE_URL to
.env.example and your actual .env with the production API endpoint and wire it
into your native build configuration (update app.config.js or your env-loading
step) so the variable is available at runtime; alternatively document and
implement another fixed absolute-base approach if you don't want an env var.
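
The suggested fallback might be sketched as a pure function like the one below. The name `resolveApiUrl`, the `http://` scheme for a detected dev host, and the final relative-path fallback (for same-origin web builds) are assumptions; the actual `getApiUrl()` in lib/api.ts may differ.

```typescript
// Builds the URL for an API route.
// `host` is the dev-server host when it can be detected (for example "192.168.1.20:8081").
// `baseUrl` is the value of EXPO_PUBLIC_API_BASE_URL, passed in so the function stays pure.
function resolveApiUrl(
  path: string,
  host: string | undefined,
  baseUrl: string | undefined,
): string {
  const normalizedPath = path.startsWith("/") ? path : `/${path}`;
  if (host) {
    // Assumption: the detected dev host is reached over plain http.
    return `http://${host}${normalizedPath}`;
  }
  if (baseUrl) {
    // Strip trailing slashes so the result never contains "//" before the path.
    return `${baseUrl.replace(/\/+$/, "")}${normalizedPath}`;
  }
  // Last resort: a relative path, which only works for same-origin web builds.
  return normalizedPath;
}
```

A caller would pass `process.env.EXPO_PUBLIC_API_BASE_URL` as `baseUrl`, so native and standalone builds get an absolute URL whenever host detection fails.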

In `@vision-agent/.gitignore`:
- Line 4: Update the .gitignore entry that currently only ignores ".env" so it
also covers env variants like ".env.local", ".env.production",
".env.development", etc.; edit the .gitignore (look for the existing ".env"
line) to broaden the pattern (and optionally keep an allowlist for example files
like ".env.example") so all environment-file variants are ignored to prevent
accidental secret commits.

In `@vision-agent/agent.py`:
- Around line 40-43: The code reads raw kwargs into language_name/lesson_title
and assigns language/lesson directly, allowing malicious metadata to inject
instructions; fix by validating and sanitizing these inputs before use (create
and call a helper like sanitize_metadata(value) on
kwargs.get("language_name")/kwargs.get("languageName") and
kwargs.get("lesson_title")/kwargs.get("lessonTitle")), enforcing length limits,
stripping/control-character/newline sequences, removing or escaping
directive-like phrases (e.g., "you are", "ignore previous", system tokens), and
falling back to the safe defaults if validation fails; update references to
language and lesson to use the sanitized values.
- Around line 89-94: The code calls await agent.finish() immediately after
agent.simple_response which ends the run and prevents multi-turn practice;
remove or defer the await agent.finish() call inside the async with
agent.join(call): block, and instead keep the session open to handle
conversation events (e.g., process incoming messages/turns and emit responses)
using the agent's event handlers or a loop, calling agent.finish() only when the
user explicitly exits or the call-end event occurs; locate references to
agent.simple_response and agent.finish in the current async with
agent.join(call): scope and move or gate the finish() invocation behind an
explicit exit/call-end condition.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f8740c20-39f0-4b42-a6fe-7e5a77bd494c

📥 Commits

Reviewing files that changed from the base of the PR and between c948cce and f568969.

⛔ Files ignored due to path filters (2)
  • package-lock.json is excluded by !**/package-lock.json
  • vision-agent/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • app.config.js
  • app/(tabs)/learn.tsx
  • app/api/stream/audio-call+api.ts
  • app/lesson/[lessonId].tsx
  • components/audio-teacher-session.tsx
  • hooks/useStreamAudioCall.ts
  • lib/api.ts
  • package.json
  • skills-lock.json
  • vision-agent/.env.example
  • vision-agent/.gitignore
  • vision-agent/.python-version
  • vision-agent/README.md
  • vision-agent/agent.py
  • vision-agent/pyproject.toml

Comment thread app/api/stream/audio-call+api.ts
Comment thread hooks/useStreamAudioCall.ts Outdated
Comment thread lib/api.ts
Comment thread vision-agent/.gitignore Outdated
Comment thread vision-agent/agent.py Outdated
Comment thread vision-agent/agent.py Outdated
Comment on lines +89 to +94
        await agent.simple_response(
            "Greet the learner in English, introduce yourself as their AI teacher, "
            "and begin the first short speaking practice.",
        )
        await agent.finish()


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In the vision-agents Python SDK, does calling `await agent.finish()` inside `async with agent.join(call):` immediately end the active session? What is the recommended pattern for keeping a voice session open for multi-turn conversation and ending only when the call/session should close?

💡 Result:

Yes. Based on the Vision Agents quickstart and examples, calling await agent.finish() inside the async with agent.join(call): block is used to end the agent’s active run/session; the code shows agent.finish() being called as the last step while still inside the join context, implying it terminates the session before leaving the async with scope [1].

Recommended pattern for multi-turn voice conversations: keep the agent running inside the async with agent.join(call): block without immediately calling agent.finish(). Let the agent handle turns/events while the call/session stays open, and only call await agent.finish() (and then exit the async with) when you intend the call to close (e.g., when the user hangs up or you detect a “should close” condition). The SDK provides call/session lifecycle events (including CallSessionEndedEvent / participant join/leave) that you can subscribe to and use as your cue to end the agent [2].

Concretely, structure your join handler like:

  • async with agent.join(call): start the agent’s call processing
  • respond to events/turns (or run your conversation loop) while the call is active
  • on your chosen close condition (participant leaves, call ends, etc.), call await agent.finish() and then return/let the async with exit

This matches the quickstart pattern (join → greet → await agent.finish() to end) [1], and it’s consistent with using lifecycle events to decide when the call/session has ended [2].

Citations:


🏁 Script executed:

fd -t f agent.py | head -20

Repository: Retsomm/react-native-lingua

Length of output: 1969


Remove or defer await agent.finish() to allow multi-turn practice interaction.

The code on line 93 terminates the agent session immediately after the greeting. Since the message explicitly states "begin the first short speaking practice," the session should remain open for interactive turns. Per the Vision Agents SDK, await agent.finish() ends the active run; keep the session open inside the async with agent.join(call): block by handling conversation events, and call finish() only when the call/session should actually close (e.g., on user exit or call-end event).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vision-agent/agent.py` around lines 89 - 94, The code calls await
agent.finish() immediately after agent.simple_response which ends the run and
prevents multi-turn practice; remove or defer the await agent.finish() call
inside the async with agent.join(call): block, and instead keep the session open
to handle conversation events (e.g., process incoming messages/turns and emit
responses) using the agent's event handlers or a loop, calling agent.finish()
only when the user explicitly exits or the call-end event occurs; locate
references to agent.simple_response and agent.finish in the current async with
agent.join(call): scope and move or gate the finish() invocation behind an
explicit exit/call-end condition.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/api/vision-agent/session+api.ts`:
- Around line 59-73: The fetch calls to the Vision Agent (the POST to
`${getVisionAgentBaseUrl()}/calls/${body.callId}/sessions` and the other fetch
at lines 117-120) lack timeouts and can hang; wrap these fetches with an
AbortController, set a bounded timeout (e.g., configurable constant like
VISION_AGENT_TIMEOUT_MS) that calls controller.abort() via setTimeout, pass
controller.signal into fetch, and clear the timer on success; catch AbortError
and throw a controlled error (with context mentioning getVisionAgentBaseUrl()
and the callId/endpoint) so callers of the session API can fail fast instead of
waiting indefinitely.
- Around line 59-60: The URL is built with raw path IDs (body.callId and
sessionId) which can break routing; wrap both path segments with
encodeURIComponent when interpolating into the fetch URL (e.g., use
`${getVisionAgentBaseUrl()}/calls/${encodeURIComponent(body.callId)}/sessions`
and the analogous occurrence where sessionId is used) so all reserved characters
are percent-encoded; update both the fetch call that uses body.callId and the
later fetch that uses sessionId to use encodeURIComponent.
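
Both points for this file can be sketched together. The `/calls/{callId}/sessions` path comes from the comment above; `buildSessionsUrl`, `fetchWithTimeout` and the `FetchLike` type are hypothetical names, and the fetch implementation is passed in so the sketch does not depend on a particular runtime.

```typescript
// Percent-encode the path segment so reserved characters cannot change the route.
function buildSessionsUrl(baseUrl: string, callId: string): string {
  return `${baseUrl}/calls/${encodeURIComponent(callId)}/sessions`;
}

// Minimal shape of fetch that this sketch needs; the global fetch satisfies it.
type FetchLike<T> = (url: string, init: { signal: AbortSignal }) => Promise<T>;

// Aborts the request after `timeoutMs` and reports a controlled error
// that names the endpoint, so callers fail fast instead of hanging.
async function fetchWithTimeout<T>(
  fetchImpl: FetchLike<T>,
  url: string,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { signal: controller.signal });
  } catch (error) {
    if (controller.signal.aborted) {
      throw new Error(`Vision Agent request to ${url} timed out after ${timeoutMs} ms`);
    }
    throw error;
  } finally {
    // Always clear the timer, whether the request succeeded, failed, or timed out.
    clearTimeout(timer);
  }
}
```

In the route this would be called as `fetchWithTimeout(fetch, buildSessionsUrl(getVisionAgentBaseUrl(), body.callId), VISION_AGENT_TIMEOUT_MS)`. A real version would also need to forward the method, headers and body, which this sketch leaves out.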

In `@hooks/useStreamAudioCall.ts`:
- Around line 183-185: The teardown before starting a new call currently only
awaits call?.leave() and client?.disconnectUser(), but leaves agentSessionRef
active; update the pre-start teardown in useStreamAudioCall (where call?.leave()
and client?.disconnectUser() are invoked) to also stop/terminate the existing
agent session referenced by agentSessionRef (e.g., call
agentSessionRef.current?.stop() or agentSessionRef.current?.endSession() /
equivalent cleanup method) and await it (with safe catch) before proceeding to
create a replacement call so no paid/active agent sessions are orphaned.
- Around line 243-259: When startCall fails and you run the teardown in the
catch block (calling stopAgentSession(nextAgentSession), streamCall.leave(), and
streamClient.disconnectUser()), also clear the retained call/client state so
later joinCall won't reuse stale instances: explicitly reset the in-scope state
holders referenced in this file (e.g., nextAgentSession, streamCall,
streamClient and isStreamClientConnected) via their setters after teardown and
before setting the error (e.g., setNextAgentSession(null), setStreamCall(null),
setStreamClient(null), setIsStreamClientConnected(false)), then setErrorMessage
and setStatus("error").
- Around line 373-387: The current teardown (call.leave(),
client?.disconnectUser(), setCall(null), setClient(null), setIsCameraOn(false),
setIsMuted(false), setStatus("ended")) is inside the try block so it won't run
if call.endCall() throws; move the local cleanup into a finally block so leave()
and disconnectUser() are always attempted and UI state is reset; keep the catch
to setErrorMessage(error.message) and setStatus("error"), then in finally run
call.leave(), client?.disconnectUser(), setCall(null), setClient(null),
setIsCameraOn(false), setIsMuted(false) and only setStatus("ended") in finally
if no error occurred (e.g., check a local errorOccurred flag or status !==
"error") to avoid overwriting the error status.
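
The try/catch/finally shape from the last item could look like the sketch below. The method names `endCall()`, `leave()` and `disconnectUser()` are the ones named in the comment; the `CallLike` and `ClientLike` interfaces and the `endCallSafely` function are hypothetical, and the React state setters are reduced to a returned status and an error callback.

```typescript
interface CallLike {
  endCall(): Promise<void>;
  leave(): Promise<void>;
}

interface ClientLike {
  disconnectUser(): Promise<void>;
}

type EndStatus = "ended" | "error";

// Ends the call and always attempts local cleanup, even when endCall() rejects.
// Returns "error" if endCall() failed, so the caller does not overwrite the error status.
async function endCallSafely(
  call: CallLike,
  client: ClientLike | null,
  onError: (message: string) => void,
): Promise<EndStatus> {
  let status: EndStatus = "ended";
  try {
    await call.endCall();
  } catch (error) {
    status = "error";
    onError(error instanceof Error ? error.message : String(error));
  } finally {
    // Cleanup failures are swallowed so one failing step cannot block the next.
    await call.leave().catch(() => undefined);
    await client?.disconnectUser().catch(() => undefined);
  }
  return status;
}
```

The hook would then reset `call`, `client`, camera and mute state unconditionally, and call `setStatus("ended")` only when the returned status is "ended".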
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f576ff89-365a-4278-bcd3-18891a543470

📥 Commits

Reviewing files that changed from the base of the PR and between f568969 and ac073e7.

📒 Files selected for processing (9)
  • .env.example
  • app.config.js
  • app/api/stream/audio-call+api.ts
  • app/api/vision-agent/session+api.ts
  • components/audio-teacher-session.tsx
  • hooks/useStreamAudioCall.ts
  • lib/api.ts
  • vision-agent/.gitignore
  • vision-agent/agent.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • vision-agent/.gitignore
  • app.config.js
  • app/api/stream/audio-call+api.ts
  • components/audio-teacher-session.tsx

Comment thread app/api/vision-agent/session+api.ts Outdated
Comment thread app/api/vision-agent/session+api.ts Outdated
Comment thread hooks/useStreamAudioCall.ts
Comment thread hooks/useStreamAudioCall.ts
Comment thread hooks/useStreamAudioCall.ts

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
vision-agent/agent.py (1)

229-231: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Session ends too early for multi-turn practice.

Line 231 calls await agent.finish() immediately after the first greeting, which likely closes the run before learner interaction. Defer finish() until an explicit end condition (call-ended/user-exit).

For vision-agents Python SDK v0.2.0, what is the recommended pattern for keeping `async with agent.join(call):` active for multi-turn voice conversation, and when should `await agent.finish()` be called?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vision-agent/agent.py` around lines 229 - 231, The code calls await
agent.finish() right after the initial greeting which closes the session
prematurely; instead keep the async with agent.join(call): block active for
multi-turn conversation by deferring agent.finish() until an explicit end
condition (call end or user exit). Replace the single greeting followed by
finish with a loop inside async with agent.join(call): (use
agent.simple_response/opening_prompt to send the greeting, then repeatedly
receive user input/events and respond) and only call await agent.finish() when
you detect the terminal condition (call ended or a user-exit intent); reference
agent.join(call), agent.simple_response(opening_prompt), and agent.finish() when
making this change.
🧹 Nitpick comments (1)
vision-agent/agent.py (1)

223-227: ⚡ Quick win

Avoid re-injecting full lesson instructions into opening_prompt.

lesson_instructions is already applied at Line 222. Including it again in the first simple_response prompt increases prompt noise and can cause the model to echo internal instruction text instead of starting naturally.

Suggested minimal diff
-        opening_prompt = (
-            f"{lesson_instructions}\n\n"
-            "Now greet the learner in English, introduce yourself as their AI teacher, "
-            "and begin the first short speaking practice from this lesson."
-        )
+        opening_prompt = (
+            "Now greet the learner in English, introduce yourself as their AI teacher, "
+            "and begin the first short speaking practice from this lesson."
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vision-agent/agent.py` around lines 223 - 227, The opening_prompt currently
concatenates lesson_instructions again (duplicating instructions already applied
earlier); update the opening_prompt variable so it does not include
lesson_instructions and instead only contains the short greeting/instruction
text (e.g., the "Now greet the learner..." sentence). Locate opening_prompt in
agent.py and remove the reference to lesson_instructions so the prompt only
instructs the model to greet, introduce itself, and start the first short
speaking practice, thereby avoiding re-injection of full lesson_instructions
into the model input.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@vision-agent/agent.py`:
- Around line 229-231: The code calls await agent.finish() right after the
initial greeting which closes the session prematurely; instead keep the async
with agent.join(call): block active for multi-turn conversation by deferring
agent.finish() until an explicit end condition (call end or user exit). Replace
the single greeting followed by finish with a loop inside async with
agent.join(call): (use agent.simple_response/opening_prompt to send the
greeting, then repeatedly receive user input/events and respond) and only call
await agent.finish() when you detect the terminal condition (call ended or a
user-exit intent); reference agent.join(call),
agent.simple_response(opening_prompt), and agent.finish() when making this
change.

---

Nitpick comments:
In `@vision-agent/agent.py`:
- Around line 223-227: The opening_prompt currently concatenates
lesson_instructions again (duplicating instructions already applied earlier);
update the opening_prompt variable so it does not include lesson_instructions
and instead only contains the short greeting/instruction text (e.g., the "Now
greet the learner..." sentence). Locate opening_prompt in agent.py and remove
the reference to lesson_instructions so the prompt only instructs the model to
greet, introduce itself, and start the first short speaking practice, thereby
avoiding re-injection of full lesson_instructions into the model input.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: a17d8879-901e-4b12-88cf-60fa7dd30b9a

📥 Commits

Reviewing files that changed from the base of the PR and between f74a97e and b03f214.

📒 Files selected for processing (4)
  • app/api/stream/audio-call+api.ts
  • components/audio-teacher-session.tsx
  • hooks/useStreamAudioCall.ts
  • vision-agent/agent.py
