📝 Walkthrough
This PR integrates real-time audio lesson calls into the mobile app using Stream Video SDK and adds a Python Vision Agent backend service. The mobile client provisions calls via a new API endpoint, manages session state through a custom hook, and renders the AudioTeacherSession component with camera/microphone controls. The Vision Agent service connects to Stream Edge and OpenAI Realtime to deliver voice-based language teaching.
Changes
Audio Lesson Call Integration
Vision Agent Service
Sequence Diagrams
sequenceDiagram
participant Client as Mobile Client
participant API as audio-call API
participant StreamAPI as Stream Backend
Client->>API: POST /api/stream/audio-call
API->>API: Validate request (lessonId, languageId, user)
API->>API: Sanitize IDs and generate callId/callCid
API->>StreamAPI: upsertUsers for Stream user
API->>StreamAPI: call.getOrCreate with metadata
API->>StreamAPI: generateToken scoped to callCid
API->>Client: Return apiKey, callCid, token, user info
sequenceDiagram
participant User as User
participant Lesson as AudioTeacherSession
participant Hook as useStreamAudioCall
participant Stream as Stream SDK
User->>Lesson: Open lesson (autoStartCall)
Lesson->>Hook: useStreamAudioCall()
Hook->>Hook: startCall → POST /api/stream/audio-call
Hook->>Stream: SDK load, connect user, create call
Hook->>API: POST /api/vision-agent/session
Hook->>Hook: agentSessionId returned
Lesson->>Lesson: Render with call state
Lesson->>Lesson: Auto-join when autoStartCall enabled
User->>Lesson: Toggle camera/mic
Lesson->>Hook: toggleCamera() / toggleMute()
Hook->>Stream: Update camera/mic
Lesson->>Lesson: Update UI (disabled state, badges)
User->>Lesson: End call
Lesson->>Hook: endCall()
Hook->>Stream: Leave call, disconnect
Hook->>API: DELETE /api/vision-agent/session
Lesson->>User: Navigate to /learn
sequenceDiagram
participant Mobile as Mobile App
participant Agent as Vision Agent
participant Stream as Stream Edge
participant OpenAI as OpenAI Realtime
Mobile->>Agent: POST /vision-agent/sessions (create)
Agent->>Agent: load_environment(), validate secrets
Agent->>Stream: connect via getstream.Edge
Agent->>OpenAI: configure realtime (model, voice)
Mobile->>Agent: POST /vision-agent/sessions/join (join_call)
Agent->>Stream: join call
Agent->>OpenAI: send greeting + practice prompt
Agent->>OpenAI: audio in/out via passthrough filter
Mobile->>Agent: DELETE /vision-agent/sessions (stop)
Agent->>Stream: leave call
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/api/stream/audio-call+api.ts`:
- Around line 61-73: After calling cleanId and compactId for userId, lessonId,
languageId (producing safeUserId, safeLessonId, safeLanguageId,
safeUserSegment), validate that none are empty/undefined before building
callId/callCid; if any sanitized identifier is empty, return a 400 Bad Request
(or throw a typed HTTPError) with a clear message about invalid identifiers
instead of proceeding to construct callId/callCid, and include the original
input name in the error to aid the client.
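A minimal sketch of the identifier check described in this finding. The `cleanId` body here is a hypothetical stand-in (the PR's real helper is not shown in the review), and `validateIds` is an illustrative name for a check the route handler could run before building `callId`/`callCid`:

```typescript
// Hypothetical stand-in for the PR's cleanId helper; the real body is not shown in the review.
function cleanId(value: unknown): string {
  if (typeof value !== "string") return "";
  return value.trim().toLowerCase().replace(/[^a-z0-9_-]/g, "");
}

type IdCheck =
  | { ok: true; ids: Record<string, string> }
  | { ok: false; status: 400; message: string };

// Sanitize every identifier and fail fast, naming the offending input field.
function validateIds(input: Record<string, unknown>): IdCheck {
  const ids: Record<string, string> = {};
  for (const [name, raw] of Object.entries(input)) {
    const safe = cleanId(raw);
    if (safe.length === 0) {
      return { ok: false, status: 400, message: `Invalid identifier: ${name}` };
    }
    ids[name] = safe;
  }
  return { ok: true, ids };
}
```

In the handler, a result with `ok: false` would be turned into a 400 response carrying `message`, and only an `ok: true` result would proceed to call creation.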
In `@hooks/useStreamAudioCall.ts`:
- Around line 148-167: If startCall fails after creating/connecting the
StreamVideoClient, ensure you clean up any partially initialized resources: if
streamCall was created call its termination method (e.g., hangUp/leave/stop) and
if streamClient was connected call its disconnect method (e.g.,
disconnectUser/disconnect/close) before setting error state; implement this
inside the catch block of the async startCall routine in useStreamAudioCall.ts
so you call the appropriate teardown on the streamCall and streamClient (created
via new streamSdk.StreamVideoClient and returned from connectUser) prior to
setErrorMessage and setStatus("error").
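One way to structure the cleanup this finding asks for is an undo stack that records a teardown step as each resource is created. The sketch below is only an illustration: `CallLike`, `ClientLike`, `startCallSafely`, and the injected `connect`/`createCall`/`prepare` callbacks are hypothetical names, and the real hook would set React state instead of returning a result object.

```typescript
// Minimal shapes assumed for illustration; the real Stream SDK objects are richer.
interface CallLike { leave(): Promise<void>; }
interface ClientLike { disconnectUser(): Promise<void>; }

type StartResult =
  | { status: "ready"; client: ClientLike; call: CallLike }
  | { status: "error"; message: string };

// Run one teardown step and ignore its failure so later steps still run.
async function safely(step: () => Promise<void>): Promise<void> {
  try {
    await step();
  } catch {
    // Best-effort cleanup: nothing useful to do if teardown itself fails.
  }
}

async function startCallSafely(
  connect: () => Promise<ClientLike>,
  createCall: (client: ClientLike) => Promise<CallLike>,
  prepare: (call: CallLike) => Promise<void>,
): Promise<StartResult> {
  const undo: Array<() => Promise<void>> = [];
  try {
    const client = await connect();
    undo.push(() => client.disconnectUser());
    const call = await createCall(client);
    undo.push(() => call.leave());
    await prepare(call);
    return { status: "ready", client, call };
  } catch (error) {
    // Undo in reverse order: leave the call first, then disconnect the client.
    for (const step of undo.reverse()) {
      await safely(step);
    }
    const message = error instanceof Error ? error.message : String(error);
    return { status: "error", message };
  }
}
```

Because teardown steps are registered only after the matching resource exists, a failure at any point cleans up exactly what was created so far.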
In `@lib/api.ts`:
- Around line 15-20: The current fallback returns a relative path when host
detection fails (causing fetch("/api/stream/audio-call") to break
native/standalone builds); change the fallback to an absolute URL by reading
EXPO_PUBLIC_API_BASE_URL and using it when host is falsy (so the function that
currently returns `path` should instead return
`${EXPO_PUBLIC_API_BASE_URL}${path}`); add EXPO_PUBLIC_API_BASE_URL to
.env.example and your actual .env with the production API endpoint and wire it
into your native build configuration (update app.config.js or your env-loading
step) so the variable is available at runtime; alternatively document and
implement another fixed absolute-base approach if you don't want an env var.
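A sketch of the absolute-URL fallback, assuming the existing function receives a detected `host` and returns a bare `path` when detection fails. `resolveApiUrl` is a hypothetical name, the `http://${host}` branch is a guess at the dev-server shape, and the env value is passed in as a parameter so the logic can be tested in isolation:

```typescript
// host: whatever the existing host detection produced (undefined when it fails).
// baseUrl: the value of EXPO_PUBLIC_API_BASE_URL, passed in to keep the sketch testable.
function resolveApiUrl(
  path: string,
  host: string | undefined,
  baseUrl: string | undefined,
): string {
  if (host) {
    // Assumed dev-server shape; the real function's host branch is not shown in the review.
    return `http://${host}${path}`;
  }
  if (!baseUrl) {
    throw new Error("EXPO_PUBLIC_API_BASE_URL must be set when the dev host cannot be detected");
  }
  // Trim trailing slashes so "https://api.example.com/" + "/api/x" has no double slash.
  return `${baseUrl.replace(/\/+$/, "")}${path}`;
}
```

Throwing when neither value is available surfaces a misconfigured build immediately, rather than letting a relative `fetch` fail later on a native client.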
In `@vision-agent/.gitignore`:
- Line 4: Update the .gitignore entry that currently only ignores ".env" so it
also covers env variants like ".env.local", ".env.production",
".env.development", etc.; edit the .gitignore (look for the existing ".env"
line) to broaden the pattern (and optionally keep an allowlist for example files
like ".env.example") so all environment-file variants are ignored to prevent
accidental secret commits.
In `@vision-agent/agent.py`:
- Around line 40-43: The code reads raw kwargs into language_name/lesson_title
and assigns language/lesson directly, allowing malicious metadata to inject
instructions; fix by validating and sanitizing these inputs before use (create
and call a helper like sanitize_metadata(value) on
kwargs.get("language_name")/kwargs.get("languageName") and
kwargs.get("lesson_title")/kwargs.get("lessonTitle")), enforcing length limits,
stripping/control-character/newline sequences, removing or escaping
directive-like phrases (e.g., "you are", "ignore previous", system tokens), and
falling back to the safe defaults if validation fails; update references to
language and lesson to use the sanitized values.
- Around line 89-94: The code calls await agent.finish() immediately after
agent.simple_response which ends the run and prevents multi-turn practice;
remove or defer the await agent.finish() call inside the async with
agent.join(call): block, and instead keep the session open to handle
conversation events (e.g., process incoming messages/turns and emit responses)
using the agent's event handlers or a loop, calling agent.finish() only when the
user explicitly exits or the call-end event occurs; locate references to
agent.simple_response and agent.finish in the current async with
agent.join(call): scope and move or gate the finish() invocation behind an
explicit exit/call-end condition.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: f8740c20-39f0-4b42-a6fe-7e5a77bd494c
⛔ Files ignored due to path filters (2)
- package-lock.json is excluded by !**/package-lock.json
- vision-agent/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
- app.config.js
- app/(tabs)/learn.tsx
- app/api/stream/audio-call+api.ts
- app/lesson/[lessonId].tsx
- components/audio-teacher-session.tsx
- hooks/useStreamAudioCall.ts
- lib/api.ts
- package.json
- skills-lock.json
- vision-agent/.env.example
- vision-agent/.gitignore
- vision-agent/.python-version
- vision-agent/README.md
- vision-agent/agent.py
- vision-agent/pyproject.toml
await agent.simple_response(
    "Greet the learner in English, introduce yourself as their AI teacher, "
    "and begin the first short speaking practice.",
)
await agent.finish()
🧩 Analysis chain
🌐 Web query:
In the vision-agents Python SDK, does calling `await agent.finish()` inside `async with agent.join(call):` immediately end the active session? What is the recommended pattern for keeping a voice session open for multi-turn conversation and ending only when the call/session should close?
💡 Result:
Yes. Based on the Vision Agents quickstart and examples, calling `await agent.finish()` inside the `async with agent.join(call):` block is used to end the agent's active run/session; the code shows `agent.finish()` being called as the last step while still inside the join context, implying it terminates the session before leaving the `async with` scope [1].
Recommended pattern for multi-turn voice conversations: keep the agent running inside the `async with agent.join(call):` block without immediately calling `agent.finish()`. Let the agent handle turns/events while the call/session stays open, and only call `await agent.finish()` (and then exit the `async with`) when you intend the call to close (e.g., when the user hangs up or you detect a “should close” condition). The SDK provides call/session lifecycle events (including CallSessionEndedEvent / participant join/leave) that you can subscribe to and use as your cue to end the agent [2].
Concretely, structure your join handler like:
- `async with agent.join(call):` start the agent's call processing
- respond to events/turns (or run your conversation loop) while the call is active
- on your chosen close condition (participant leaves, call ends, etc.), call `await agent.finish()` and then return/let the `async with` exit
This matches the quickstart pattern (join → greet → `await agent.finish()` to end) [1], and it's consistent with using lifecycle events to decide when the call/session has ended [2].
Citations:
- 1: https://visionagents.ai/introduction/quickstart
- 2: https://visionagents.ai/reference/events-reference
🏁 Script executed:
fd -t f agent.py | head -20
Repository: Retsomm/react-native-lingua
Length of output: 1969
Remove or defer await agent.finish() to allow multi-turn practice interaction.
The code on line 93 terminates the agent session immediately after the greeting. Since the message explicitly states "begin the first short speaking practice," the session should remain open for interactive turns. Per the Vision Agents SDK, await agent.finish() ends the active run; keep the session open inside the async with agent.join(call): block by handling conversation events, and call finish() only when the call/session should actually close (e.g., on user exit or call-end event).
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/api/vision-agent/session+api.ts`:
- Around line 59-73: The fetch calls to the Vision Agent (the POST to
`${getVisionAgentBaseUrl()}/calls/${body.callId}/sessions` and the other fetch
at lines 117-120) lack timeouts and can hang; wrap these fetches with an
AbortController, set a bounded timeout (e.g., configurable constant like
VISION_AGENT_TIMEOUT_MS) that calls controller.abort() via setTimeout, pass
controller.signal into fetch, and clear the timer on success; catch AbortError
and throw a controlled error (with context mentioning getVisionAgentBaseUrl()
and the callId/endpoint) so callers of the session API can fail fast instead of
waiting indefinitely.
- Around line 59-60: The URL is built with raw path IDs (body.callId and
sessionId) which can break routing; wrap both path segments with
encodeURIComponent when interpolating into the fetch URL (e.g., use
`${getVisionAgentBaseUrl()}/calls/${encodeURIComponent(body.callId)}/sessions`
and the analogous occurrence where sessionId is used) so all reserved characters
are percent-encoded; update both the fetch call that uses body.callId and the
later fetch that uses sessionId to use encodeURIComponent.
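The two findings for this file (bounded timeouts and encoded path segments) can be sketched together. `joinPath` and `withTimeout` are hypothetical helper names, the 10-second default for `VISION_AGENT_TIMEOUT_MS` is an assumed value, and the task is injected as a signal-aware callback so the same wrapper would work around `fetch`:

```typescript
// Assumed default; the review only suggests "a configurable constant".
const VISION_AGENT_TIMEOUT_MS = 10000;

// Percent-encode each path segment so IDs containing ":" or "/" cannot change the route.
function joinPath(baseUrl: string, ...segments: string[]): string {
  const base = baseUrl.replace(/\/+$/, "");
  return [base, ...segments.map((segment) => encodeURIComponent(segment))].join("/");
}

// Run a signal-aware task and abort it once timeoutMs has elapsed.
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  label: string,
  timeoutMs: number = VISION_AGENT_TIMEOUT_MS,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await run(controller.signal);
  } catch (error) {
    // If our timer fired, replace the low-level abort error with a contextual one.
    if (controller.signal.aborted) {
      throw new Error(`${label} timed out after ${timeoutMs} ms`);
    }
    throw error;
  } finally {
    // Always clear the timer so a finished request does not leave it pending.
    clearTimeout(timer);
  }
}
```

A call site might then pass `(signal) => fetch(joinPath(baseUrl, "calls", callId, "sessions"), { method: "POST", signal })` as `run`, with a `label` that names the endpoint and `callId` for easier debugging.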
In `@hooks/useStreamAudioCall.ts`:
- Around line 183-185: The teardown before starting a new call currently only
awaits call?.leave() and client?.disconnectUser(), but leaves agentSessionRef
active; update the pre-start teardown in useStreamAudioCall (where call?.leave()
and client?.disconnectUser() are invoked) to also stop/terminate the existing
agent session referenced by agentSessionRef (e.g., call
agentSessionRef.current?.stop() or agentSessionRef.current?.endSession() /
equivalent cleanup method) and await it (with safe catch) before proceeding to
create a replacement call so no paid/active agent sessions are orphaned.
- Around line 243-259: When startCall fails and you run the teardown in the
catch block (calling stopAgentSession(nextAgentSession), streamCall.leave(), and
streamClient.disconnectUser()), also clear the retained call/client state so
later joinCall won't reuse stale instances: explicitly reset the in-scope state
holders referenced in this file (e.g., nextAgentSession, streamCall,
streamClient and isStreamClientConnected) via their setters after teardown and
before setting the error (e.g., setNextAgentSession(null), setStreamCall(null),
setStreamClient(null), setIsStreamClientConnected(false)), then setErrorMessage
and setStatus("error").
- Around line 373-387: The current teardown (call.leave(),
client?.disconnectUser(), setCall(null), setClient(null), setIsCameraOn(false),
setIsMuted(false), setStatus("ended")) is inside the try block so it won't run
if call.endCall() throws; move the local cleanup into a finally block so leave()
and disconnectUser() are always attempted and UI state is reset; keep the catch
to setErrorMessage(error.message) and setStatus("error"), then in finally run
call.leave(), client?.disconnectUser(), setCall(null), setClient(null),
setIsCameraOn(false), setIsMuted(false) and only setStatus("ended") in finally
if no error occurred (e.g., check a local errorOccurred flag or status !==
"error") to avoid overwriting the error status.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
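For the `endCall` teardown finding in `hooks/useStreamAudioCall.ts`, a minimal sketch of the try/catch/finally shape follows. The interfaces, `endCallAndCleanUp`, and the `resetLocalState` callback are hypothetical stand-ins for the hook's real SDK objects and state setters:

```typescript
// Minimal shapes assumed for illustration; the real Stream SDK objects are richer.
interface EndableCall {
  endCall(): Promise<void>;
  leave(): Promise<void>;
}
interface Disconnectable {
  disconnectUser(): Promise<void>;
}

type EndResult = { status: "ended" | "error"; errorMessage: string | null };

async function endCallAndCleanUp(
  call: EndableCall,
  client: Disconnectable | null,
  resetLocalState: () => void,
): Promise<EndResult> {
  let errorMessage: string | null = null;
  try {
    await call.endCall();
  } catch (error) {
    errorMessage = error instanceof Error ? error.message : String(error);
  } finally {
    // Always attempted, even when endCall() threw; teardown failures are ignored.
    await call.leave().catch(() => undefined);
    if (client) {
      await client.disconnectUser().catch(() => undefined);
    }
    resetLocalState();
  }
  // Report "ended" only when endCall() succeeded, so an error status is not overwritten.
  return { status: errorMessage === null ? "ended" : "error", errorMessage };
}
```

In the hook, `resetLocalState` would correspond to the `setCall(null)`, `setClient(null)`, `setIsCameraOn(false)`, and `setIsMuted(false)` calls, and the returned status would feed `setStatus`.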
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: f576ff89-365a-4278-bcd3-18891a543470
📒 Files selected for processing (9)
- .env.example
- app.config.js
- app/api/stream/audio-call+api.ts
- app/api/vision-agent/session+api.ts
- components/audio-teacher-session.tsx
- hooks/useStreamAudioCall.ts
- lib/api.ts
- vision-agent/.gitignore
- vision-agent/agent.py
🚧 Files skipped from review as they are similar to previous changes (4)
- vision-agent/.gitignore
- app.config.js
- app/api/stream/audio-call+api.ts
- components/audio-teacher-session.tsx
♻️ Duplicate comments (1)
vision-agent/agent.py (1)
229-231: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Session ends too early for multi-turn practice.
Line 231 calls `await agent.finish()` immediately after the first greeting, which likely closes the run before learner interaction. Defer `finish()` until an explicit end condition (call-ended/user-exit).
For vision-agents Python SDK v0.2.0, what is the recommended pattern for keeping `async with agent.join(call):` active for multi-turn voice conversation, and when should `await agent.finish()` be called?
🧹 Nitpick comments (1)
vision-agent/agent.py (1)
223-227: ⚡ Quick win
Avoid re-injecting full lesson instructions into `opening_prompt`.
`lesson_instructions` is already applied at Line 222. Including it again in the first `simple_response` prompt increases prompt noise and can cause the model to echo internal instruction text instead of starting naturally.
Suggested minimal diff
- opening_prompt = (
-     f"{lesson_instructions}\n\n"
-     "Now greet the learner in English, introduce yourself as their AI teacher, "
-     "and begin the first short speaking practice from this lesson."
- )
+ opening_prompt = (
+     "Now greet the learner in English, introduce yourself as their AI teacher, "
+     "and begin the first short speaking practice from this lesson."
+ )
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@vision-agent/agent.py`:
- Around line 229-231: The code calls await agent.finish() right after the
initial greeting which closes the session prematurely; instead keep the async
with agent.join(call): block active for multi-turn conversation by deferring
agent.finish() until an explicit end condition (call end or user exit). Replace
the single greeting followed by finish with a loop inside async with
agent.join(call): (use agent.simple_response/opening_prompt to send the
greeting, then repeatedly receive user input/events and respond) and only call
await agent.finish() when you detect the terminal condition (call ended or a
user-exit intent); reference agent.join(call),
agent.simple_response(opening_prompt), and agent.finish() when making this
change.
---
Nitpick comments:
In `@vision-agent/agent.py`:
- Around line 223-227: The opening_prompt currently concatenates
lesson_instructions again (duplicating instructions already applied earlier);
update the opening_prompt variable so it does not include lesson_instructions
and instead only contains the short greeting/instruction text (e.g., the "Now
greet the learner..." sentence). Locate opening_prompt in agent.py and remove
the reference to lesson_instructions so the prompt only instructs the model to
greet, introduce itself, and start the first short speaking practice, thereby
avoiding re-injection of full lesson_instructions into the model input.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: a17d8879-901e-4b12-88cf-60fa7dd30b9a
📒 Files selected for processing (4)
- app/api/stream/audio-call+api.ts
- components/audio-teacher-session.tsx
- hooks/useStreamAudioCall.ts
- vision-agent/agent.py