Commit cd7705b
Author: shijiashuai
Message: feat: update dialogue system, audio service and digital human components
Parent: 3754c20

File tree

10 files changed: +724 −228 lines

CLAUDE.md

Lines changed: 124 additions & 0 deletions
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Development commands

- Install dependencies: `npm install`
- Start dev server: `npm run dev`
- Production build: `npm run build`
- Alternate builds:
  - `npm run build:mobile`
  - `npm run build:desktop`
  - `npm run build:ar`
- Preview production build locally: `npm run preview`
- Serve preview on `0.0.0.0:3000`: `npm run serve`
- Lint: `npm run lint`
- Run tests in watch mode: `npm test`
- Run tests once: `npm run test:run`
- Run coverage: `npm run test:coverage`
- Run a single test file: `npx vitest run src/__tests__/digitalHuman.test.tsx`
- Run tests matching a name: `npx vitest run -t "test name"`
## Stack and build setup

- React 18 + TypeScript app built with Vite.
- Path alias `@/*` points to `src/*` in both the Vite and Vitest configs.
- Tailwind CSS is used for UI styling; dark-mode class support is enabled in `tailwind.config.js`.
- Vitest uses the `jsdom` environment with setup from `src/__tests__/setup.ts`.
- Vite build modes `mobile`, `desktop`, and `ar` only change compile-time flags (`__MOBILE__`, `__DESKTOP__`, `__AR__`) and output directories (`dist-mobile`, `dist-desktop`, `dist-ar`).
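A `define`-based setup is one plausible way to wire the mode flags above; the following is only a sketch, not the repository's actual `vite.config.ts`:

```typescript
// Sketch of mode-specific compile-time flags and output directories.
// Assumption: the real config may wire these differently.
import { defineConfig } from "vite";

export default defineConfig(({ mode }) => ({
  define: {
    // These become global constants replaced at build time.
    __MOBILE__: JSON.stringify(mode === "mobile"),
    __DESKTOP__: JSON.stringify(mode === "desktop"),
    __AR__: JSON.stringify(mode === "ar"),
  },
  build: {
    // e.g. `vite build --mode mobile` -> dist-mobile
    outDir: ["mobile", "desktop", "ar"].includes(mode) ? `dist-${mode}` : "dist",
  },
}));
```

With this shape, dead branches guarded by `if (__MOBILE__)` can be eliminated at build time.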
## High-level architecture

### App shell and routing

- Entry point is `src/main.tsx`, which renders `src/App.tsx`.
- `src/App.tsx` sets up React Router with lazy-loaded pages:
  - `/` and `/advanced` -> `AdvancedDigitalHumanPage`
  - `/digital-human` -> `DigitalHumanPage`
- The app is wrapped in a global `ErrorBoundary` and a Suspense fallback UI.
### Two page modes

- `src/pages/AdvancedDigitalHumanPage.tsx` is the main experience and the default route. It combines:
  - full-screen 3D viewer background
  - settings drawer with tabs for basic controls, expressions, behavior, vision, and voice
  - chat/session UI
  - server health checks and reconnect flow
  - keyboard shortcuts and toast-driven status feedback
- `src/pages/DigitalHumanPage.tsx` is a simpler demo page with the viewer plus a basic control panel.
### Central state model

- `src/store/digitalHumanStore.ts` is the central Zustand store for nearly all runtime state.
- It holds playback, recording, mute, speaking, expression/emotion, behavior, connection status, loading/error state, and chat/session history.
- Session IDs are persisted in `localStorage` under `metahuman_session_id`, with SSR-safe storage access.
- Core services and pages commonly read/write state directly via `useDigitalHumanStore.getState()` rather than passing state deeply through props.
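The SSR-safe session-ID persistence can be sketched in plain TypeScript. `safeStorage` and `getOrCreateSessionId` are illustrative names, not the store's actual helpers; only the `metahuman_session_id` key comes from the document:

```typescript
// Minimal sketch of SSR-safe session-ID persistence.
interface KVStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const SESSION_KEY = "metahuman_session_id";

function safeStorage(): KVStorage | null {
  // On the server (SSR) there is no localStorage at all.
  const ls = (globalThis as { localStorage?: KVStorage }).localStorage;
  if (!ls) return null;
  try {
    // Probe for disabled storage (private mode, quota, sandbox).
    ls.setItem("__storage_probe__", "1");
    ls.removeItem("__storage_probe__");
    return ls;
  } catch {
    return null;
  }
}

function getOrCreateSessionId(): string {
  const storage = safeStorage();
  const existing = storage?.getItem(SESSION_KEY);
  if (existing) return existing;
  const id = `session_${Date.now()}_${Math.random().toString(36).slice(2, 10)}`;
  storage?.setItem(SESSION_KEY, id);
  return id;
}
```

Without usable storage the function still returns a fresh ID, so callers never branch on environment.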
### Core runtime layers

The app is organized around `src/core/*` services:

- `src/core/avatar/DigitalHumanEngine.ts`
  - imperative façade over the Zustand store
  - translates high-level actions like `play`, `reset`, `setEmotion`, `setBehavior`, `playAnimation` into store updates
  - contains emotion -> expression mapping and timed auto-reset for animations
- `src/core/audio/audioService.ts`
  - browser-only audio integration using Web Speech APIs
  - `TTSService` drives speech synthesis and updates speaking/behavior store state
  - `ASRService` wraps speech recognition, handles command mode vs dictation mode, and can forward transcripts into dialogue handling
- `src/core/dialogue/dialogueService.ts`
  - HTTP client for backend chat requests
  - sends requests to `${VITE_API_BASE_URL || 'http://localhost:8000'}/v1/chat`
  - checks `${baseUrl}/health` for connectivity
  - includes timeout handling, retry logic for retryable failures, friendly error messages, and a local fallback reply when backend calls fail
- `src/core/dialogue/dialogueOrchestrator.ts`
  - orchestrates a full dialogue turn
  - appends user/assistant messages to store history
  - toggles loading/thinking state
  - applies backend response emotion/action to the avatar engine
  - optionally invokes TTS for spoken replies
- `src/core/vision/visionService.ts`
  - camera + MediaPipe integration for face/pose analysis
  - dynamically imports `@mediapipe/face_mesh` and `@mediapipe/pose`
  - maps face landmarks to emotion and derives motions like nod/shake/raiseHand/waveHand
  - model files are loaded from the jsDelivr CDN at runtime, so vision features depend on camera permission and network access
- `src/core/vision/visionMapper.ts`
  - converts raw face landmarks into the app's higher-level emotion model
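The retry/fallback behavior described for `dialogueService.ts` follows a common pattern; the sketch below uses hypothetical names (`requestChat`, `FALLBACK_REPLY`) and is not the repository's actual implementation:

```typescript
// Hypothetical sketch of the retry-then-local-fallback pattern.
interface ChatResponse {
  replyText: string;
  emotion: string;
  action: string;
}

// Local reply used when every backend attempt fails.
const FALLBACK_REPLY: ChatResponse = {
  replyText: "(offline fallback reply)",
  emotion: "neutral",
  action: "idle",
};

async function requestChat(
  send: () => Promise<ChatResponse>, // e.g. a fetch wrapper that enforces its own timeout
  maxRetries = 2,
): Promise<ChatResponse> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await send();
    } catch {
      // Retryable failure: loop again unless attempts are exhausted.
    }
  }
  return FALLBACK_REPLY;
}
```

Because the fallback is returned rather than thrown, the orchestrator can always apply some reply to the avatar even when the backend is down.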
### UI/component structure

- `src/components/DigitalHumanViewer.tsx` is the 3D rendering boundary.
  - Uses React Three Fiber + Drei.
  - If `modelUrl` loads successfully, it renders the GLTF scene.
  - If loading fails or no model is supplied, it falls back to an internal procedural "CyberAvatar".
  - Viewer behavior is driven from store state (`currentExpression`, `isSpeaking`, `currentAnimation`, `expressionIntensity`).
- Control panels (`ControlPanel`, `ExpressionControlPanel`, `BehaviorControlPanel`, `VoiceInteractionPanel`, `VisionMirrorPanel`) are mostly thin UI layers that call into the engine/services.
- Shared UI primitives live under `src/components/ui`.
## Backend/API assumptions

- The frontend expects a separate backend service at `VITE_API_BASE_URL` or `http://localhost:8000`.
- Chat response shape expected by the frontend:
  - `replyText: string`
  - `emotion: string`
  - `action: string`
- Health endpoint expected: `GET /health`
- Chat endpoint expected: `POST /v1/chat`
## Testing notes

- Current test coverage is centered in `src/__tests__/digitalHuman.test.tsx`.
- Tests heavily mock Three.js, React Three Fiber, and browser speech APIs; follow that pattern when adding UI/runtime tests for viewer or audio behavior.
- Because the app relies on browser APIs (speech synthesis, speech recognition, camera/media devices), new tests usually need mocks rather than real integrations.
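In the spirit of that mocking pattern, a bare-bones `speechSynthesis` stub might look like this; the stub shape is illustrative, and the repository's tests may structure their mocks differently (e.g. with Vitest utilities):

```typescript
// Install a recording stub in place of the browser speechSynthesis API.
interface SpeakCall {
  text: string;
}

function installSpeechStub(): SpeakCall[] {
  const calls: SpeakCall[] = [];
  (globalThis as any).speechSynthesis = {
    // Record each utterance instead of actually speaking.
    speak(utterance: { text: string }): void {
      calls.push({ text: utterance.text });
    },
    cancel(): void {
      calls.length = 0;
    },
  };
  return calls;
}
```

Tests can then assert on the recorded calls rather than on audible output.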
## Practical implementation notes

- Prefer modifying the advanced page flow unless the task is explicitly about the simpler `/digital-human` demo.
- For behavior changes affecting avatar reactions, inspect the interaction between:
  - `src/store/digitalHumanStore.ts`
  - `src/core/avatar/DigitalHumanEngine.ts`
  - `src/core/dialogue/dialogueOrchestrator.ts`
  - `src/components/DigitalHumanViewer.tsx`
- For backend chat issues, check both `dialogueService.ts` retry/fallback behavior and the `AdvancedDigitalHumanPage.tsx` health-check/reconnect UI.
- For speech features, verify whether logic belongs in browser service wrappers (`audioService.ts`) or page-level orchestration.

server/app/services/dialogue.py

Lines changed: 60 additions & 40 deletions
@@ -30,6 +30,33 @@ def __init__(self) -> None:
         except ValueError:
             self.max_session_messages = 10
 
+    def _normalize_user_text(self, user_text: str) -> str:
+        return (user_text or "").strip()
+
+    def _append_session_history(self, session_id: str, role: str, content: str) -> None:
+        if not session_id or not content:
+            return
+        session_histories[session_id].append({
+            "role": role,
+            "content": content,
+            "timestamp": datetime.now().isoformat(),
+        })
+        if len(session_histories[session_id]) > MAX_HISTORY_LENGTH * 2:
+            session_histories[session_id] = session_histories[session_id][-MAX_HISTORY_LENGTH * 2:]
+
+    def _append_turn(self, session_id: Optional[str], user_text: str, reply_text: str) -> None:
+        if not session_id:
+            return
+        self._append_session_history(session_id, "user", user_text)
+        self._append_session_history(session_id, "assistant", reply_text)
+        self._append_session_messages(
+            session_id,
+            [
+                {"role": "user", "content": user_text},
+                {"role": "assistant", "content": reply_text},
+            ],
+        )
+
     def _get_smart_mock_reply(self, user_text: str) -> Dict[str, Any]:
         """Smart local mock reply: generates a reasonable response from the user input."""
         text_lower = user_text.lower()
@@ -69,27 +96,19 @@ async def generate_reply(
 
         Supports session-history management and LLM calls; uses a smart mock reply when the API key is not configured.
         """
-        # Record the user message in the session history
-        if session_id:
-            session_histories[session_id].append({
-                "role": "user",
-                "content": user_text,
-                "timestamp": datetime.now().isoformat(),
-            })
-            # Cap the history length
-            if len(session_histories[session_id]) > MAX_HISTORY_LENGTH * 2:
-                session_histories[session_id] = session_histories[session_id][-MAX_HISTORY_LENGTH * 2:]
+        session_id = (session_id or "").strip() or None
+        user_text = self._normalize_user_text(user_text)
+        if not user_text:
+            return {
+                "replyText": "我在听,请告诉我您想聊什么。",
+                "emotion": "neutral",
+                "action": "idle",
+            }
 
         if not self.api_key:
             logger.info("OPENAI_API_KEY not configured; using smart mock reply")
             result = self._get_smart_mock_reply(user_text)
-            # Record the assistant reply in the history
-            if session_id:
-                session_histories[session_id].append({
-                    "role": "assistant",
-                    "content": result["replyText"],
-                    "timestamp": datetime.now().isoformat(),
-                })
+            self._append_turn(session_id, user_text, result["replyText"])
             return result
 
         system_prompt = (
@@ -141,11 +160,13 @@ async def generate_reply(
             parsed = json.loads(content)
         except json.JSONDecodeError:
             logger.warning("LLM returned invalid JSON; using raw content as replyText: %s", content)
-            return {
+            result = {
                 "replyText": content,
                 "emotion": "neutral",
                 "action": "idle",
             }
+            self._append_turn(session_id, user_text, result["replyText"])
+            return result
 
         reply_text = str(parsed.get("replyText", "")).strip() or f"你刚才说:{user_text}"
         emotion = str(parsed.get("emotion", "neutral")).strip() or "neutral"
@@ -156,20 +177,7 @@ async def generate_reply(
         if action not in {"idle", "wave", "greet", "think", "nod", "shakeHead", "dance", "speak"}:
             action = "idle"
 
-        # Record the turn in the session history
-        if session_id:
-            session_histories[session_id].append({
-                "role": "assistant",
-                "content": reply_text,
-                "timestamp": datetime.now().isoformat(),
-            })
-            self._append_session_messages(
-                session_id,
-                [
-                    {"role": "user", "content": user_text},
-                    {"role": "assistant", "content": reply_text},
-                ],
-            )
+        self._append_turn(session_id, user_text, reply_text)
 
         return {
             "replyText": reply_text,
@@ -181,7 +189,9 @@ async def generate_reply(
                 "LLM request timed out url=%s; falling back to smart mock reply",
                 self._get_openai_chat_completions_url(),
             )
-            return self._get_smart_mock_reply(user_text)
+            result = self._get_smart_mock_reply(user_text)
+            self._append_turn(session_id, user_text, result["replyText"])
+            return result
         except httpx.HTTPStatusError as exc:
             body_preview = (exc.response.text or "")[:500]
             logger.error(
@@ -190,29 +200,39 @@ async def generate_reply(
                 str(exc.request.url),
                 body_preview,
             )
-            return self._get_smart_mock_reply(user_text)
+            result = self._get_smart_mock_reply(user_text)
+            self._append_turn(session_id, user_text, result["replyText"])
+            return result
         except httpx.RequestError as exc:
             req_url = str(exc.request.url) if exc.request else self._get_openai_chat_completions_url()
             logger.error(
                 "LLM request failed url=%s error=%s; falling back to smart mock reply",
                 req_url,
                 exc,
             )
-            return self._get_smart_mock_reply(user_text)
+            result = self._get_smart_mock_reply(user_text)
+            self._append_turn(session_id, user_text, result["replyText"])
+            return result
         except Exception as exc:
             logger.exception("LLM call failed; falling back to smart mock reply: %s", exc)
-            return self._get_smart_mock_reply(user_text)
+            result = self._get_smart_mock_reply(user_text)
+            self._append_turn(session_id, user_text, result["replyText"])
+            return result
 
     def clear_session(self, session_id: str) -> bool:
         """Clear the stored history for the given session."""
+        removed = False
         if session_id in session_histories:
             del session_histories[session_id]
-            return True
-        return False
+            removed = True
+        if session_id in self._session_messages:
+            del self._session_messages[session_id]
+            removed = True
+        return removed
 
     def get_session_history(self, session_id: str) -> List[Dict[str, str]]:
         """Return the stored history for the given session."""
-        return session_histories.get(session_id, [])
+        return list(session_histories.get(session_id, []))
 
     def _get_openai_chat_completions_url(self) -> str:
         base_url = (self.base_url or "").strip()
@@ -268,7 +288,7 @@ async def _call_llm(self, messages: list[dict[str, str]]) -> Dict[str, Any]:
             return resp.json()
 
     def _get_session_messages(self, session_id: str) -> list[dict[str, str]]:
-        return self._session_messages.get(session_id, [])
+        return list(self._session_messages.get(session_id, []))
 
     def _append_session_messages(
         self,
