FireRedTeam
diff --git a/.DS_Store b/.DS_Store
0 Bytes
diff --git a/demos/firered_chat/index.html b/demos/firered_chat/index.html
Lines changed: 15 additions & 27 deletions
diff --git a/demos/firered_chat/pics/arc.png b/demos/firered_chat/pics/arc.png
129 KB
diff --git a/demos/firered_chat/pics/exp_barge_in.png b/demos/firered_chat/pics/exp_barge_in.png
165 KB
diff --git a/demos/firered_chat/pics/exp_eot.png b/demos/firered_chat/pics/exp_eot.png
88.1 KB
diff --git a/demos/firered_chat/pics/exp_latency.png b/demos/firered_chat/pics/exp_latency.png
173 KB
diff --git a/demos/firered_chat/pics/flow.png b/demos/firered_chat/pics/flow.png
84.3 KB
diff --git a/demos/firered_chat/pics/sys_config.png b/demos/firered_chat/pics/sys_config.png
123 KB
diff --git a/demos/firered_chat/video/chat.mp4 b/demos/firered_chat/video/chat.mp4
Lines changed: 3 additions & 0 deletions
@@ -6,7 +6,7 @@
     <meta name="generator" content="Hugo 0.88.1" />
     <meta name="viewport" content="width=device-width, initial-scale=1">
     <link href="https://fonts.googleapis.com/css?family=Roboto:300,400,700" rel="stylesheet" type="text/css">
-    <link rel="stylesheet" href="" https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.4/styles/github.min.css">
+    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.4/styles/github.min.css">
     <link rel="stylesheet" href="css/custom.css">
     <link rel="stylesheet" href="css/normalize.css">
 
@@ -58,21 +58,23 @@
                     <p style="text-align: left;">
                     </p>
                     <div class="text-center">
-                        <h2>FireRedChat: Toward Lifelike Full-Duplex Voice Interaction: A Pluggable System with Cascaded and Semi-Cascaded Implementations</h2>
-                        [<a href="http://arxiv.org/abs/2501.14350">Paper</a>]
-                        [<a href="https://github.com/FireRedTeam/FireRedASR">Code</a>]
+                        <h2>FireRedChat: A Pluggable, Full-Duplex Voice Interaction System with Cascaded and Semi-Cascaded Implementations</h2>
+                        [<a href="http://arxiv.org/abs/2501.14350" target="_blank">Paper</a>]
+                        [<a href="https://github.com/FireRedTeam/FireRedASR" target="_blank">Code</a>]
+                        [<a href="https://firered-chat.xiaohongshu.com" target="_blank">Try FireRedChat Online</a>]
 
                         <p class="fst-italic mb-0">
                             <br>
-                            <b><a href="https://fireredteam.github.io">FireRed Team</a></b>
+                            <b><a href="https://fireredteam.github.io" target="_blank">FireRed Team</a></b>
                         <p></p>
                         </p>
                     </div>
-                    <p><b>Abstract.</b>Full-duplex voice interaction enables users and agents to speak simultaneously, supporting barge-in for lifelike dialogue, and is critical for AI assistants and customer service. Existing approaches are either end-to-end models that handle turn-taking, complex to design and hard to control, or modular pipelines governed by turn-taking controllers that upgrade existing systems and allow per-module optimization. Prior frameworks embrace modularity but depend on non-open components and external model providers, impeding holistic optimization. In this work, we present a complete, practical full-duplex system comprising a turn-taking controller, an interaction module, and a dialogue manager. The controller integrates personalized VAD (pVAD) that suppresses false barge-ins from background noise and non-primary speakers, accurately timestamps primary-speaker segments, and explicitly enables barge-in triggered by the primary speaker; a semantic end-of-turn (EoT) detector improves stop decisions. With this controller, heterogeneous half-duplex pipelines, cascaded, semi-cascaded, or speech-to-speech, are seamlessly upgraded to full duplex. With our internal models, we implement cascaded and semi-cascaded variants: the former benefits from mature deployment; the latter perceives emotional and paralinguistic cues, yields more coherent responses, reduces latency and error propagation, and improves robustness. A dialogue manager extends capabilities via tool invocation and context management. We further propose three system-level metrics, barge-in, end-of-turn detection accuracy, and end-to-end latency, to assess naturalness, control accuracy, and efficiency of full-duplex interaction, with the aim of guiding subsequent improvements.
+                    <p><b>Abstract.</b> Full-duplex voice interaction allows users and agents to speak simultaneously with controllable barge-in, enabling lifelike assistants and customer service. Existing solutions are either end-to-end, difficult to design and hard to control, or modular pipelines governed by turn-taking controllers that ease upgrades and per-module optimization; however, prior modular frameworks depend on non-open components and external providers, limiting holistic optimization. In this work, we present a complete, practical full-duplex voice interaction system comprising a turn-taking controller, an interaction module, and a dialogue manager. The controller integrates streaming personalized VAD (pVAD) to suppress false barge-ins from noise and non-primary speakers, precisely timestamp primary-speaker segments, and explicitly enable primary-speaker barge-ins; a semantic end-of-turn detector improves stop decisions. It upgrades heterogeneous half-duplex pipelines, cascaded, semi-cascaded, and speech-to-speech, to full duplex. Using internal models, we implement cascaded and semi-cascaded variants; the semi-cascaded one captures emotional and paralinguistic cues, yields more coherent responses, lowers latency and error propagation, and improves robustness. A dialogue manager extends capabilities via tool invocation and context management. We also propose three system-level metrics, barge-in, end-of-turn detection accuracy, and end-to-end latency, to assess naturalness, control accuracy, and efficiency. Experiments show fewer false interruptions, more accurate semantic ends, and lower latency approaching industrial systems, enabling robust, natural, real-time full-duplex interaction.
                     </p>
                     <p>
 					<b>Contents</b>
                     <ul>
+                        <li><a href="#Demo">Demo</a></li>
                         <li><a href="#system-overview">System Overview</a></li>
                         <li><a href="#workflow-overview">Workflow Overview</a></li>
                         <li><a href="#config">Configurations between different systems.</a></li>
@@ -86,77 +88,63 @@ <h2>FireRedChat: Toward Lifelike Full-Duplex Voice Interaction: A Pluggable Syst
 
                 <div class="container pt-5 mt-5 shadow-lg p-5 mb-5 bg-white rounded">
                     <h2 id="Demo" style="text-align: center;">Demo</h2>
-                    <body>
                         <p style="text-align: center;">
-                            <video src="" height="1200" width="1200"></video>
+                            <video src="video/chat.mp4" controls style="max-width:100%; height:auto;"></video>
                         </p>
-                    </body>
                     <!-- <p style="text-align: center;">
                         <b>Figure 1.</b> FireRedChat System Modules.
                     </p> -->
                 </div>
                 <div class="container pt-5 mt-5 shadow-lg p-5 mb-5 bg-white rounded">
                     <h2 id="system-overview" style="text-align: center;">System Overview</h2>
-                    <body>
                         <p style="text-align: center;">
-                            <img src="pics/arc.png" height="1200" width="1200">
+                            <img src="pics/arc.png" style="max-width:90%; height:auto;">
                         </p>
-                    </body>
                     <p style="text-align: center;">
                         <b>Figure 1.</b> FireRedChat System Modules.
                     </p>
                 </div>
                 <div class="container pt-5 mt-5 shadow-lg p-5 mb-5 bg-white rounded">
                     <h2 id="workflow-overview" style="text-align: center;">Workflow</h2>
-                    <body>
                         <p style="text-align: center;">
-                            <img src="pics/flow.png" height="1200" width="1200">
+                            <img src="pics/flow.png" style="max-width:60%; height:auto;">
                         </p>
-                    </body>
                     <p style="text-align: center;">
                         <b>Figure 3.</b> FireRedChat Voice Interaction Flow.
                     </p>
                 </div>
                 <div class="container pt-5 mt-5 shadow-lg p-5 mb-5 bg-white rounded">
                     <h2 id="config" style="text-align: center;">Configurations Between Different Systems</h2>
-                    <body>
                         <p style="text-align: center;">
-                            <img src="pics/sys_config.png" height="1200" width="1200">
+                            <img src="pics/sys_config.png" style="max-width:40%; height:auto;">
                         </p>
-                    </body>
                     <p style="text-align: center;">
                         <b>Table 1.</b> Configurations between different systems.
                     </p>
                 </div>
                 <div class="container pt-5 mt-5 shadow-lg p-5 mb-5 bg-white rounded">
                     <h2 id="barge-in" style="text-align: center;">Barge-In Evaluation Results</h2>
-                    <body>
                         <p style="text-align: center;">
-                            <img src="pics/cer2.png" height="1200" width="1200">
+                            <img src="pics/exp_barge_in.png" style="max-width:45%; height:auto;">
                         </p>
-                    </body>
                     <p style="text-align: center;">
                         <b>Table 2.</b> Barge-In evaluation results.
                     </p>
                 </div>
                 <div class="container pt-5 mt-5 shadow-lg p-5 mb-5 bg-white rounded">
                     <h2 id="EoT" style="text-align: center;">End-of-turn Detection Evaluation Results</h2>
-                    <body>
                         <p style="text-align: center;">
-                            <img src="pics/exp_eot.png" height="1200" width="1200">
+                            <img src="pics/exp_eot.png" style="max-width:85%; height:auto;">
                         </p>
-                    </body>
                     <p style="text-align: center;">
                         <b>Table 3.</b> End-of-turn Detection evaluation results.
                     </p>
                 </div>
                 <div class="container pt-5 mt-5 shadow-lg p-5 mb-5 bg-white rounded">
                     <h2 id="Latency" style="text-align: center;">Latency of Different Systems</h2>
-                    <body>
                         <p style="text-align: center;">
-                            <img src="pics/exp_latency.png" height="1200" width="1200">
+                            <img src="pics/exp_latency.png" style="max-width:45%; height:auto;">
                         </p>
-                    </body>
                     <p style="text-align: center;">
                         <b>Table 4.</b> Latency of different systems.
                     </p>
 
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:42968d2b454e1fb803448e1e51ab5def2e2f9d20a2622a969d81e068802fd020
+size 329866207