All eight tests passed. The framework achieved a 100% end-to-end success rate across both robot platforms (60/60 benchmark trials). The invalid-input rejection test correctly rejected all five invalid inputs. The backend swap took approximately 2 minutes and required no changes to any code, including the upstream modules.

| RQ3 — Modularity and switch cost | 3.1 | **PASS** |

---

All six benchmark tasks succeeded in all five trials, with no failed or rejected trials. Every command reached the backend and returned a success response. The Franka MoveIt stack initialised once, at 11:25:26, and remained active throughout all 30 trials without a restart.

| Task | Command | Trials | Outcome |
|---|---|---|---|
| 1 | Go to P1 | 5/5 | Success |
| 2 | Go linear to P2 | 5/5 | Success |
| 3 | Open gripper | 5/5 | Success |
| 4 | Teach current position as P3 | 5/5 | Success |
| 5 | Pick at P1 and place at P2 | 5/5 | Success |
| 6 | Pick at P1 and place at offset x=80, y=50 | 5/5 | Success |

### Test 1.2 — UR10e (criterion: ≥ 24/30)

**Result: 30/30 · PASS**

The identical task set produced identical outcomes after the backend swap. The UR backend registered only the UR and mock controllers, confirming correct vendor isolation. First-connection activation took 22 seconds (hardware cold-start, measured from the robot activation signal to ready); all subsequent responses completed within three seconds.

---

Five paraphrases of the linear-motion command and five paraphrases of the teach-pose command were issued by voice. Nine of the ten produced the correct IR.

| Task | Paraphrase | ASR output | IR correct |
|---|---|---|---|
| A | Move to P2 in a straight line | "Move to P2 in a straight line." | Yes |
| A | Go to P2 using linear motion | "Go to P2 using linear motion." | Yes |
| A | Linear move to P2 | "Linear Move 2P2" | **No** (LLM appended an incorrect gripper close) |
| A | Drive linearly to P2 | "Drive linearly to P2." | Yes |
| A | Reach P2 via a straight path | "Reach P2 via a straight path." | Yes |
| B | Save this position as P3 | "Save this position as P3." | Yes |
| B | Store current pose as P3 | "Store current pose as P3." | Yes |
| B | Remember this position as P3 | "Remember this position as P3." | Yes |
| B | Set P3 to current position | "Set P3 to current position." | Yes |
| B | Name this position P3 | "Name this position P3." | Yes |

**Failure analysis (trial A3):** The ASR garbled "Linear move to P2" into `"Linear Move 2P2"`. The LLM correctly extracted a `moveL to p2` command but also generated an incorrect `gripper close` step, producing a two-command sequence instead of the expected single command. The robot moved to P2 but then executed an unrequested gripper close. No motion-safety issue occurred. This is a parser-level failure triggered by abnormal ASR output.

### Test 2.3 — Invalid Input Rejection (criterion: 5/5 rejected before motion)

**Result: 5/5 · PASS**

| Input | Rejection site | Reason logged |
|---|---|---|
| "Move to P99" | Backend validator | `Unknown pose: 'p99'` |
| "Pick at somewhere" | Parser | `Vague or non-resolvable words such as 'somewhere' are NOT valid targets.` |
| "Hello robot" | Parser | `No valid command detected. Please provide a specific robot command.` |
| "Go to" | Pre-parser (dropped) | No LLM call dispatched; no backend request sent |
| "Move P1 and P2 simultaneously" | Parser | `Impossible command: cannot move to two poses at the same time.` |

No robot motion occurred for any of the five inputs. One caveat: the incomplete command `"Go to"` was silently discarded before the parsing stage, and no rejection log entry was written for it. The pass criterion is still met, but a logged rejection message would provide clearer evidence that the input was handled intentionally.
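
A logged rejection for the dropped input could look roughly like the Python sketch below. It is illustrative only: the logger name, the `preparse` and `validate_pose` functions, the regular expression, and the `KNOWN_POSES` set are hypothetical and are not taken from the framework's code. It shows a pre-parser that writes a reason to the log instead of dropping `"Go to"` silently, and a backend-style pose check that emits the same message as the `"Move to P99"` row.

```python
import logging
import re
from typing import Optional

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("preparser")  # hypothetical logger name

# Hypothetical pose store; the framework's actual pose registry is not shown here.
KNOWN_POSES = {"p1", "p2", "p3"}

# Hypothetical pattern for a motion verb with no target, e.g. "Go to" or "Move to."
INCOMPLETE_MOTION = re.compile(r"^\s*(go|move)\s+to\s*[.!?]?\s*$", re.IGNORECASE)


def preparse(transcript: str) -> Optional[str]:
    """Return the transcript if it should be sent to the LLM parser, else None.

    Unlike a silent drop, every rejection writes a log entry stating the reason.
    """
    if INCOMPLETE_MOTION.match(transcript):
        logger.warning("Rejected before parsing: incomplete command %r", transcript)
        return None
    return transcript


def validate_pose(name: str) -> bool:
    """Backend-style check; logs "Unknown pose: 'p99'" for the P99 input."""
    key = name.lower()
    if key not in KNOWN_POSES:
        logger.warning("Unknown pose: %r", key)
        return False
    return True


if __name__ == "__main__":
    preparse("Go to")     # logs the rejection and returns None
    validate_pose("P99")  # logs the unknown-pose rejection and returns False
```

Rejecting at the pre-parser keeps the "no LLM call dispatched" property from the table while still leaving an auditable trace.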

**Result: documented · PASS** *(descriptive; no threshold)*

| Metric | Value |
|---|---|
| Physical swap time | < 2 minutes |
| Total inter-session gap (log-derived) | 2 min 11 s (11:42:54 → 11:45:05) |
| Modified files | 0 (backend swap only; no code edited) |

The upstream modules (`pipeline.py`, `ASR_module.py`, `parsing_module.py`) are identical across both vendor sessions, as confirmed by the continuous frontend log, and the IR format is identical across both backends. The Franka backend registered three controllers (mock, franka, ur) at startup; the UR backend registered two (mock, ur), confirming correct vendor isolation without code changes.
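
The inter-session gap in the metric table follows directly from its two log timestamps. A short Python check, assuming both are same-day `HH:MM:SS` stamps from the frontend log (the function name is illustrative):

```python
from datetime import datetime
from typing import Tuple


def inter_session_gap(start: str, end: str) -> Tuple[int, int]:
    """Return (minutes, seconds) between two same-day HH:MM:SS log stamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return divmod(int(delta.total_seconds()), 60)


minutes, seconds = inter_session_gap("11:42:54", "11:45:05")
print(f"{minutes} min {seconds} s")  # 2 min 11 s
```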

---

## End-to-End Timing

Timing was measured from the moment the audio recording stopped to the moment the backend confirmed that execution was complete. Two hardware start-up events are excluded from these figures: the Franka MoveIt stack initialisation on the first trial (18 s, one-time) and the UR cold-start activation on the first UR command (37 s, one-time). All 123 remaining trials are included. Timestamps are taken from the frontend log at one-second resolution.
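
The measurement window and the start-up exclusion can be sketched in Python as follows. The `Trial` record and its field names are hypothetical stand-ins for whatever the frontend log actually records. Because both stamps have one-second resolution, each computed total is a whole number of seconds and can be off by up to about one second.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

FMT = "%H:%M:%S"  # frontend log timestamps, one-second resolution


@dataclass
class Trial:
    audio_stopped: str        # hypothetical field: when audio recording stopped
    execution_confirmed: str  # hypothetical field: when the backend confirmed completion
    hardware_startup: bool = False  # True for the one-time Franka/UR start-up trials


def end_to_end_seconds(trials: List[Trial]) -> List[int]:
    """Whole-second end-to-end times, excluding one-time hardware start-up trials.

    Assumes both stamps fall on the same day (no trial crosses midnight).
    """
    return [
        int(
            (
                datetime.strptime(t.execution_confirmed, FMT)
                - datetime.strptime(t.audio_stopped, FMT)
            ).total_seconds()
        )
        for t in trials
        if not t.hardware_startup
    ]
```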

The pipeline portion (ASR transcription plus LLM parsing) averaged 1.0 s and 3.7 s respectively, approximately 4.6 s combined before rounding. Robot execution time is the main source of variation in the total and depends on the number of motion steps and on the platform.

| Command type | N | ASR avg | LLM avg | Exec avg | Total avg | Total range |
|---|---|---|---|---|---|---|
| Move (joint) | 37 | 1.0 s | 3.2 s | 1.5 s | 5.7 s | 4–6 s |
| Move (linear) | 17 | 0.8 s | 3.4 s | 1.9 s | 6.1 s | 5–8 s |
| Gripper | 21 | 0.9 s | 3.1 s | 2.1 s | 6.0 s | 4–8 s |
| Teach pose | 16 | 0.9 s | 3.3 s | 0.1 s | 4.3 s | 4–5 s |
| Two-step sequence | 5 | 0.8 s | 4.0 s | 3.2 s | 8.0 s | 8 s |
| Pick & place | 11 | 1.0 s | 5.1 s | 7.8 s | 13.9 s | 12–17 s |
| Pick + offset | 11 | 1.2 s | 5.2 s | 7.0 s | 13.4 s | 11–16 s |
| Multi-step (5-cmd) | 5 | 1.4 s | 5.0 s | 7.4 s | 13.8 s | 13–14 s |
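
Weighting each row by its N is one way to cross-check the overall per-stage averages against this table. The sketch uses only the rounded table values, so it approximates, rather than reproduces, means computed from the raw log. It recovers 123 trials, an ASR mean of 1.0 s and an LLM mean of 3.7 s, and a combined unrounded mean of about 4.6 s, which is consistent with the 4.6 s pipeline figure even though the rounded stage averages sum to 4.7 s.

```python
# (N, ASR avg in s, LLM avg in s) per command type, copied from the timing table.
ROWS = {
    "Move (joint)": (37, 1.0, 3.2),
    "Move (linear)": (17, 0.8, 3.4),
    "Gripper": (21, 0.9, 3.1),
    "Teach pose": (16, 0.9, 3.3),
    "Two-step sequence": (5, 0.8, 4.0),
    "Pick & place": (11, 1.0, 5.1),
    "Pick + offset": (11, 1.2, 5.2),
    "Multi-step (5-cmd)": (5, 1.4, 5.0),
}


def weighted_means(rows):
    """Trial-weighted mean ASR and LLM times across all command types."""
    n_total = sum(n for n, _, _ in rows.values())
    asr = sum(n * asr_t for n, asr_t, _ in rows.values()) / n_total
    llm = sum(n * llm_t for n, _, llm_t in rows.values()) / n_total
    return n_total, asr, llm


n, asr, llm = weighted_means(ROWS)
print(n, round(asr, 1), round(llm, 1), round(asr + llm, 1))  # 123 1.0 3.7 4.6
```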

Pick-and-place tasks split by platform: Franka averaged 12.1 s (range 11–13 s); UR10e averaged 15.5 s (range 15–17 s). The difference reflects robot motion speed rather than the pipeline, whose contribution is identical across platforms.

The LLM parsing time scales with command complexity: 3.1–3.4 s for single-step commands, rising to 5.0–5.2 s for four- and five-step sequences. ASR time remains stable across all command types at approximately 1 s.