Rename voice feature to dictation

joaopauloschuler · bpsa2 · joaopauloschuler · commit 67e7d79409de · 2026-02-24T22:26:55.000-03:00
- Rename /voice command to /dictation
- Rename BPSA_VOICE_TRANSCRIBER to BPSA_DICTATION_TRANSCRIBER
- Rename BPSA_VOICE_MODEL to BPSA_DICTATION_MODEL
- Rename BPSA_DEFAULT_VOICE_MODEL to BPSA_DEFAULT_DICTATION_MODEL
- Rename pip extra bpsa[voice] to bpsa[dictation]
- Update all user-facing messages and banners
- Update README.md and CLI.md documentation
- Internal _voice_* variable names kept unchanged

Model: claude-sonnet-4.6

Co-Authored-By: bpsa2 <241537330+bpsa2@users.noreply.github.com>
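
Migration note: a minimal sketch, not part of this commit. The hunks below read only the new `BPSA_DICTATION_*` names, so an environment that still exports the old `BPSA_VOICE_*` variables would presumably stop being picked up. Assuming that is the case, the renames listed above could be bridged with a small helper along these lines. The names `RENAMED_ENV_VARS` and `migrate_dictation_env` are hypothetical and do not exist in the repository.

```python
import os

# Old -> new environment variable names, taken from the rename list in the
# commit message. Only environment variables are covered here.
RENAMED_ENV_VARS = {
    "BPSA_VOICE_TRANSCRIBER": "BPSA_DICTATION_TRANSCRIBER",
    "BPSA_VOICE_MODEL": "BPSA_DICTATION_MODEL",
}


def migrate_dictation_env(env):
    """Hypothetical helper: copy values from old names to new names.

    `env` is any mutable mapping, such as os.environ or a plain dict.
    A new-style variable that is already set is never overwritten.
    Returns the list of old names whose values were copied.
    """
    migrated = []
    for old, new in RENAMED_ENV_VARS.items():
        if old in env and new not in env:
            env[new] = env[old]
            migrated.append(old)
    return migrated


if __name__ == "__main__":
    # Apply to the current process environment and report what was copied.
    for name in migrate_dictation_env(os.environ):
        print(f"{name} is deprecated; use {RENAMED_ENV_VARS[name]} instead")
```

Calling the helper before the REPL starts would let old shell profiles keep working, while an explicitly set new variable always takes precedence. The pip extra rename (`bpsa[voice]` to `bpsa[dictation]`) is not covered by this sketch.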
diff --git a/CLI.md b/CLI.md
@@ -79,14 +79,14 @@ All optional. Configure `CompressionConfig` without touching code:
 | `BPSA_COMPRESSION_PRESERVE_FINAL_ANSWER_STEPS` | `1` | Keep final_answer steps uncompressed (`0` or `1`) |
 | `BPSA_COMPRESSION_MIN_CHARS` | `4096` | Min characters of content before an LLM compression call is made |
 
-### Voice Input Variables
+### Dictation Input Variables
 
-Requires `pip install bpsa[voice]`.
+Requires `pip install bpsa[dictation]`.
 
 | Variable | Required | Default | Description |
 |----------|----------|---------|-------------|
-| `BPSA_VOICE_TRANSCRIBER` | Yes (for `/voice`) | - | Transcriber name: `whisper` or `elevenlabs` |
-| `BPSA_VOICE_MODEL` | No | `base.en` (`whisper`) or `scribe_v2` (`elevenlabs`) | Model name passed to the transcriber (whisper only) |
+| `BPSA_DICTATION_TRANSCRIBER` | Yes (for `/dictation`) | - | Transcriber name: `whisper` or `elevenlabs` |
+| `BPSA_DICTATION_MODEL` | No | `base.en` (`whisper`) or `scribe_v2` (`elevenlabs`) | Model name passed to the transcriber (whisper only) |
 | `ELEVENLABS_API_KEY` | Yes (for `elevenlabs`) | - | API key for ElevenLabs Scribe API |
 
 ### Supported Model Classes (`BPSA_SERVER_MODEL`)
@@ -184,7 +184,7 @@ Use `prompt_toolkit` for:
 | `/show-tools` | List all loaded tools |
 | `/undo-steps [N]` | Remove last N steps from memory (default: 1) |
 | `/verbose` | Toggle verbose output |
-| `/voice [on\|off]` | Toggle voice dictation (requires `BPSA_VOICE_TRANSCRIBER`) |
+| `/dictation [on\|off]` | Toggle dictation (requires `BPSA_DICTATION_TRANSCRIBER`) |
 
 ## Configuration Layering
 
diff --git a/README.md b/README.md
@@ -23,7 +23,7 @@ limitations under the License.
 * 🖥️ **GUI interaction:** Launch, screenshot, click, type, and send keys to native GUI applications on X11 via xdotool/ImageMagick (`--gui-x11` flag).
 * 👁️ **Image loading:** Agents can load and visually inspect image files (plots, screenshots, diagrams) via the built-in `load_image` tool — always available, no flags needed.
 * 🎨 **Image tools:** Visual image diffing (`diff_images`), OCR text extraction from images (`screen_ocr`), and a canvas for drawing shapes, text, and annotations (`canvas_create`, `canvas_draw`) — always available.
-* 🎤 **Voice input:** Dictate prompts via microphone using Whisper or ElevenLabs transcription (`/voice` command, requires `BPSA_VOICE_TRANSCRIBER` env var).
+* 🎤 **Dictation input:** Dictate prompts via microphone using Whisper or ElevenLabs transcription (`/dictation` command, requires `BPSA_DICTATION_TRANSCRIBER` env var).
 * ⚡ **Native Python execution:** Execute Python code natively via `exec` for unrestricted processing.
 * 🌍 **Multi-language support:** Code in multiple languages beyond Python (Pascal, PHP, C++, Java and more).
 * 🛠️ **Developer tools:** Lots of new tools that help agents to compile, test, and debug source code in various computing languages.
@@ -33,10 +33,10 @@ limitations under the License.
 
 
 ## Installation
-Install the project, including the voice support, CLIs, OpenAI protocol and LiteLLM dependencies.
+Install the project, including the dictation support, CLIs, OpenAI protocol and LiteLLM dependencies.
 
 ```bash
-$ pip install bpsa[voice,browser,openai,litellm]
+$ pip install bpsa[dictation,browser,openai,litellm]
 ```
 
 This will set up the necessary libraries and the Beyond Python Smolagents framework in your environment.
@@ -62,23 +62,23 @@ BPSA_MAX_TOKENS=64000
 
 Context compression parameters can also be configured via env vars (e.g., `BPSA_COMPRESSION_ENABLED`, `BPSA_COMPRESSION_KEEP_RECENT_STEPS`). See [CLI.md](CLI.md) for the full list.
 
-#### Voice Input
+#### Dictation Input
 
-Dictate prompts via microphone instead of typing. Requires the voice extra and a transcriber environment variable:
+Dictate prompts via microphone instead of typing. Requires the dictation extra and a transcriber environment variable:
 
 ```bash
-pip install bpsa[voice]
+pip install bpsa[dictation]
 
 # Option 1: Whisper (local, offline)
-export BPSA_VOICE_TRANSCRIBER=whisper
-export BPSA_VOICE_MODEL=base.en        # optional (default: base.en)
+export BPSA_DICTATION_TRANSCRIBER=whisper
+export BPSA_DICTATION_MODEL=base.en        # optional (default: base.en)
 
 # Option 2: ElevenLabs (cloud API)
-export BPSA_VOICE_TRANSCRIBER=elevenlabs
+export BPSA_DICTATION_TRANSCRIBER=elevenlabs
 export ELEVENLABS_API_KEY=your_api_key
 ```
 
-Then use `/voice on` in the REPL to start listening and `/voice off` to stop. While active, the prompt shows `[mic] >` and transcribed speech is inserted at the cursor.
+Then use `/dictation on` in the REPL to start listening and `/dictation off` to stop. While active, the prompt shows `[mic] >` and transcribed speech is inserted at the cursor.
 
 ### BPSA CLI Usage
 
diff --git a/pyproject.toml b/pyproject.toml
@@ -101,15 +101,15 @@ vision = [
   "helium",
   "selenium",
 ]
-voice = [
+dictation = [
   "voicelistener>=1.0.3",
 ]
 vllm = [
   "vllm>=0.10.2",
   "torch"
 ]
 all = [
-  "bpsa[audio,blaxel,docker,e2b,gradio,litellm,mcp,mlx-lm,modal,openai,telemetry,toolkit,transformers,vision,voice,bedrock]",
+  "bpsa[audio,blaxel,docker,e2b,gradio,litellm,mcp,mlx-lm,modal,openai,telemetry,toolkit,transformers,vision,dictation,bedrock]",
 ]
 quality = [
   "ruff>=0.9.0",
diff --git a/src/smolagents/bp_cli.py b/src/smolagents/bp_cli.py
@@ -29,9 +29,9 @@
     BPSA_COMPRESSION_PRESERVE_FINAL_ANSWER_STEPS - Keep final_answer steps (default: 1)
     BPSA_COMPRESSION_MIN_CHARS                - Min chars before compressing (default: 4096)
 
-    Voice input (requires `pip install bpsa[voice]`):
-    BPSA_VOICE_TRANSCRIBER    - Transcriber name: 'whisper' or 'elevenlabs' (required for /voice)
-    BPSA_VOICE_MODEL          - Model name passed to transcriber (optional, whisper only)
+    Dictation input (requires `pip install bpsa[dictation]`):
+    BPSA_DICTATION_TRANSCRIBER - Transcriber name: 'whisper' or 'elevenlabs' (required for /dictation)
+    BPSA_DICTATION_MODEL       - Model name passed to transcriber (optional, whisper only)
     ELEVENLABS_API_KEY        - API key for ElevenLabs transcriber (required when using elevenlabs)
 """
 
@@ -99,7 +99,7 @@
     "GoogleColabModel": [],
 }
 
-BPSA_DEFAULT_VOICE_MODEL = None
+BPSA_DEFAULT_DICTATION_MODEL = None
 
 class Spinner:
     """Improved spinner using Rich library for better UX and reliability."""
@@ -535,13 +535,13 @@ def print_turn_summary(turn_num: int, elapsed: float, input_tokens: int, output_
     console.print(line)
 
 
-def print_banner(model_id: str, server_model: str, tool_count: int, voice_transcriber: str = None):
-    voice_line = f"\nVoice: [magenta]{voice_transcriber}[/]" if voice_transcriber else ""
+def print_banner(model_id: str, server_model: str, tool_count: int, dictation_transcriber: str = None):
+    dictation_line = f"\nDictation: [magenta]{dictation_transcriber}[/]" if dictation_transcriber else ""
     console.print(
         Panel.fit(
             f"[bold]BPSA - Beyond Python SmolAgents[/] v{VERSION}\n"
             f"Model: [cyan]{model_id}[/] ({server_model})\n"
-            f"Tools: [green]{tool_count}[/] loaded{voice_line}",
+            f"Tools: [green]{tool_count}[/] loaded{dictation_line}",
             border_style="blue",
         )
     )
@@ -606,7 +606,7 @@ def _save_aliases(aliases: dict):
     "/session-load", "/session-save",
     "/show-compression-stats", "/show-memory-stats", "/show-stats",
     "/save-step", "/set-max-steps", "/show-step", "/show-steps", "/show-tools", "/undo-steps", "/verbose",
-    "/voice",
+    "/dictation",
 ]
 
 
@@ -649,7 +649,7 @@ def print_help():
     table.add_row("/show-tools", "List all loaded tools")
     table.add_row("/undo-steps \[N]", "Remove last N steps from memory (default: 1)")
     table.add_row("/verbose", "Toggle verbose output")
-    table.add_row(r"/voice \[on|off]", "Toggle voice dictation (requires BPSA_VOICE_TRANSCRIBER)")
+    table.add_row(r"/dictation \[on|off]", "Toggle dictation (requires BPSA_DICTATION_TRANSCRIBER)")
     console.print(table)
     console.print()
 
@@ -706,16 +706,16 @@ def _voice_start():
     """Start the voice listener. Returns an error message string on failure, or None on success."""
     global _voice_listener
     if _voice_listener is not None:
-        return "Voice input is already active."
+        return "Dictation is already active."
     try:
         from voicelistener import VoiceListener
     except ImportError:
-        return "Voice input requires the voicelistener package. Install with: pip install bpsa[voice]"
+        return "Dictation requires the voicelistener package. Install with: pip install bpsa[dictation]"
 
-    transcriber_name = get_env("BPSA_VOICE_TRANSCRIBER", default="")
+    transcriber_name = get_env("BPSA_DICTATION_TRANSCRIBER", default="")
     if not transcriber_name:
         return (
-            "Set BPSA_VOICE_TRANSCRIBER environment variable to enable voice input"
+            "Set BPSA_DICTATION_TRANSCRIBER environment variable to enable dictation"
             f" (available transcribers: {', '.join(sorted(_VOICE_TRANSCRIBERS))})"
         )
     transcriber_name = transcriber_name.lower().strip()
@@ -725,7 +725,7 @@ def _voice_start():
             f" Available transcribers: {', '.join(sorted(_VOICE_TRANSCRIBERS))}"
         )
 
-    model = get_env("BPSA_VOICE_MODEL", default=BPSA_DEFAULT_VOICE_MODEL)
+    model = get_env("BPSA_DICTATION_MODEL", default=BPSA_DEFAULT_DICTATION_MODEL)
     kwargs = {}
     if model is not None:
         kwargs["model_id"] = model
@@ -752,7 +752,7 @@ def _voice_stop():
     """Stop the voice listener."""
     global _voice_listener
     if _voice_listener is None:
-        return "Voice input is not active."
+        return "Dictation is not active."
     _voice_listener.stop()
     _voice_listener = None
     # Drain any remaining items
@@ -1670,8 +1670,8 @@ def run_repl(skip_instructions: bool = False, auto_approve: bool = True, browser
     _verbose = verbose
 
     console.clear()
-    voice_transcriber = get_env("BPSA_VOICE_TRANSCRIBER", default=None)
-    print_banner(model_id, server_model, tool_count, voice_transcriber=voice_transcriber)
+    dictation_transcriber = get_env("BPSA_DICTATION_TRANSCRIBER", default=None)
+    print_banner(model_id, server_model, tool_count, dictation_transcriber=dictation_transcriber)
 
     instructions = None
     if not skip_instructions:
@@ -1889,8 +1889,8 @@ def get_input():
                 last_answer = None
                 first_turn = True
                 console.clear()
-                _vt = get_env("BPSA_VOICE_TRANSCRIBER", default="").strip() if _voice_listener is not None else None
-                print_banner(model_id, server_model, count_tools(agent), voice_transcriber=_vt or None)
+                _dt = get_env("BPSA_DICTATION_TRANSCRIBER", default="").strip() if _voice_listener is not None else None
+                print_banner(model_id, server_model, count_tools(agent), dictation_transcriber=_dt or None)
                 continue
             elif cmd == "/show-tools":
                 print_tools(agent)
@@ -2042,33 +2042,33 @@ def get_input():
                     continue
                 console.print(f"[cyan]Auto-approve: {'on' if _auto_approve else 'off'}[/]")
                 continue
-            elif cmd == "/voice":
+            elif cmd == "/dictation":
                 arg = cmd_args.strip().lower()
                 if arg == "on":
-                    console.print("[cyan]Loading voice support.[/]")
+                    console.print("[cyan]Loading dictation support.[/]")
                     if not _has_prompt_toolkit:
-                        console.print("[red]Voice input requires voicelistener. Install with: pip install voicelistener[/]")
+                        console.print("[red]Dictation requires voicelistener. Install with: pip install voicelistener[/]")
                     else:
                         err = _voice_start()
                         if err:
                             console.print(f"[red]{err}[/]")
                         else:
-                            console.print("[cyan][mic] Voice input active[/]")
+                            console.print("[cyan][mic] Dictation active[/]")
                 elif arg == "off":
                     err = _voice_stop()
                     if err:
                         console.print(f"[yellow]{err}[/]")
                     else:
-                        console.print("[cyan]Voice input deactivated[/]")
+                        console.print("[cyan]Dictation deactivated[/]")
                 elif arg == "":
                     if _voice_listener is not None:
-                        transcriber = get_env("BPSA_VOICE_TRANSCRIBER", default="(unknown)")
-                        model = get_env("BPSA_VOICE_MODEL", default=BPSA_DEFAULT_VOICE_MODEL)
-                        console.print(f"[cyan]Voice: on | transcriber: {transcriber} | model: {model}[/]")
+                        transcriber = get_env("BPSA_DICTATION_TRANSCRIBER", default="(unknown)")
+                        model = get_env("BPSA_DICTATION_MODEL", default=BPSA_DEFAULT_DICTATION_MODEL)
+                        console.print(f"[cyan]Dictation: on | transcriber: {transcriber} | model: {model}[/]")
                     else:
-                        console.print("[dim]Voice: off[/]")
+                        console.print("[dim]Dictation: off[/]")
                 else:
-                    console.print("[yellow]Usage: /voice [on|off][/]")
+                    console.print("[yellow]Usage: /dictation [on|off][/]")
                 continue
             else:
                 console.print(f"[yellow]Unknown command: {cmd}. Type /help for available commands.[/]")
