Skip to content

Commit e3a4d74

Browse files
tbitcsoz-agent
andcommitted
fix: cp1252 Unicode decode, English-only hardening, privacy notice
Fixes #62, #63 - tools.py: add encoding='utf-8', errors='replace' to subprocess.run (fixes UnicodeDecodeError on Windows cp1252 when doctor/audit output contains checkmarks U+2713/U+2717) - runner.py: move LANGUAGE DIRECTIVE to very first line of system prompt; inject [LANG:EN] prefix on every user message; add _has_non_english() detector; auto-correction turn when response is in non-English script - PRIVACY.md: no telemetry, data flows only to user-configured LLM provider, patent search queries go to USPTO Open Data Portal Co-Authored-By: Oz <oz-agent@warp.dev>
1 parent bfd297e commit e3a4d74

3 files changed

Lines changed: 115 additions & 9 deletions

File tree

PRIVACY.md

Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,69 @@
1+
# specsmith — Privacy Policy
2+
3+
**Last updated: April 2026**
4+
5+
## Summary
6+
7+
specsmith is a local CLI tool. It collects no telemetry, sends no analytics, and stores no data on BitConcepts servers. The only external network calls it makes are those you explicitly configure.
8+
9+
---
10+
11+
## What data leaves your machine
12+
13+
### LLM providers (only when you run `specsmith run`)
14+
15+
When you start an agent session, specsmith sends your project's governance files (AGENTS.md, LEDGER.md snippets) and your chat messages to the LLM provider you have configured:
16+
17+
| Provider | Data destination | Privacy policy |
18+
|---|---|---|
19+
| Anthropic (Claude) | api.anthropic.com | https://www.anthropic.com/privacy |
20+
| OpenAI (GPT) | api.openai.com | https://openai.com/policies/privacy-policy |
21+
| Google (Gemini) | generativelanguage.googleapis.com | https://policies.google.com/privacy |
22+
| Mistral | api.mistral.ai | https://mistral.ai/privacy |
23+
| Ollama | localhost (no network call) | n/a — runs locally |
24+
25+
You control which provider is used. BitConcepts has no visibility into what is sent to these providers — all requests go directly from your machine to their API.
26+
27+
### GitHub issues (`specsmith` doesn't file issues automatically)
28+
29+
The specsmith CLI itself never creates GitHub issues. The VS Code extension has an optional, consent-gated bug reporter — see the extension's `PRIVACY.md`.
30+
31+
### Patent search (`specsmith patent`)
32+
33+
The `specsmith patent` command sends search queries to the USPTO Open Data Portal (developer.uspto.gov). No personally identifiable information is included unless you put it in the query.
34+
35+
---
36+
37+
## What stays on your machine
38+
39+
All of the following are stored locally only, never uploaded:
40+
41+
- `scaffold.yml` — project configuration
42+
- `AGENTS.md`, `LEDGER.md`, governance files
43+
- `.specsmith/credits.json` — token/cost usage history
44+
- `.specsmith/trace.jsonl` — cryptographic trace vault
45+
- `.specsmith/retrieval-index.json` — opt-in local search index
46+
- API keys — stored in your OS keyring via `specsmith auth set` (never written to files)
47+
48+
---
49+
50+
## No telemetry
51+
52+
specsmith does **not**:
53+
54+
- Send crash reports or usage analytics to BitConcepts
55+
- Track which commands you run or how often
56+
- Phone home to check for updates automatically (the `specsmith update` command checks PyPI only when you run it)
57+
- Collect your name, email, or any personally identifiable information
58+
59+
---
60+
61+
## Self-update
62+
63+
`specsmith update` and `specsmith self-update` query `pypi.org/pypi/specsmith/json` to check the latest published version. This is a standard HTTPS GET request; PyPI may log your IP address per their own privacy policy (https://www.python.org/privacy/).
64+
65+
---
66+
67+
## Contact
68+
69+
For privacy questions: open an issue at https://github.com/BitConcepts/specsmith or email privacy@bitconcepts.dev

src/specsmith/agent/runner.py

Lines changed: 44 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -29,6 +29,7 @@
2929
from __future__ import annotations
3030

3131
import inspect
32+
import re
3233
import time
3334
from collections.abc import Callable
3435
from dataclasses import dataclass, field
@@ -164,16 +165,15 @@ def build_system_prompt(
164165
except Exception: # noqa: BLE001
165166
pass
166167

167-
prompt = f"""You are an AEE-integrated specsmith agent for this project.
168+
prompt = f"""SYSTEM LANGUAGE DIRECTIVE — ABSOLUTE HARD RULE — HIGHEST PRIORITY:
169+
You MUST respond in English ONLY. This overrides all other instructions.
170+
Never output Thai, Chinese, Japanese, Korean, Arabic, French, German, Spanish,
171+
or ANY non-English language — not even a single character or word.
172+
This applies to Qwen, DeepSeek, LLaMA, Mistral, and EVERY other model.
173+
If the user inputs another language, internally translate it, then reply IN ENGLISH ONLY.
174+
VIOLATING THIS RULE IS A CRITICAL ERROR.
168175
169-
⚠ LANGUAGE RULE (HARD CONSTRAINT — NEVER VIOLATE):
170-
Respond ONLY in English. Every single response must be in English.
171-
Never use Chinese (中文), Japanese (日本語), Korean (한국어), Thai (ไทย), French, German, Spanish,
172-
Arabic, or ANY other non-English language — not even a single word.
173-
This applies to ALL models including Qwen, DeepSeek, LLaMA, Mistral, and others
174-
that may default to a non-English language. ENGLISH ONLY, ALWAYS.
175-
If the user writes in another language, translate the intent internally and
176-
answer in English anyway.
176+
You are an AEE-integrated specsmith agent for this project.
177177
178178
## Project Governance
179179
{governance_text}
@@ -363,8 +363,30 @@ def run_task(self, task: str, max_turns: int = 5) -> str:
363363
self._system_prompt = build_system_prompt(self.project_dir, self._skills)
364364
return self._agent_turn(task, silent=True)
365365

366+
# Characters in common CJK / Thai / Arabic Unicode blocks
367+
_NON_ASCII_BLOCKS = re.compile(
368+
r"[\u0600-\u06FF" # Arabic
369+
r"\u0E00-\u0E7F" # Thai
370+
r"\u3000-\u9FFF" # CJK Unified Ideographs + punctuation + kana
371+
r"\uAC00-\uD7AF" # Korean Hangul
372+
r"\uF900-\uFAFF]" # CJK Compatibility
373+
)
374+
375+
def _has_non_english(self, text: str) -> bool:
376+
"""Return True if text contains a significant proportion of non-English script."""
377+
if not text:
378+
return False
379+
hits = len(self._NON_ASCII_BLOCKS.findall(text))
380+
return hits > 5 and (hits / max(len(text), 1)) > 0.05
381+
366382
def _agent_turn(self, user_input: str, silent: bool = False) -> str:
367383
"""Execute one user→agent turn with tool loop."""
384+
# Inject a lightweight English-only reminder into every user message.
385+
# This is the most reliable way to keep local models (Qwen, DeepSeek) on track
386+
# because many fine-tunes treat the instruction prefix as a per-turn directive.
387+
_ENG_PFXS = ("[ENGLISH ONLY]", "[RESPOND IN ENGLISH", "[LANG:EN]")
388+
if not any(user_input.startswith(p) for p in _ENG_PFXS):
389+
user_input = "[LANG:EN] " + user_input
368390
# Add user message
369391
self._state.messages.append(Message(role=Role.USER, content=user_input))
370392

@@ -403,6 +425,19 @@ def _agent_turn(self, user_input: str, silent: bool = False) -> str:
403425
final_response = response.content
404426

405427
if not response.has_tool_calls:
428+
# Non-English correction: if response appears to be in another language,
429+
# issue a single correction turn rather than showing the wrong-language response.
430+
if response.content and self._has_non_english(response.content) and _iteration == 0:
431+
correction = (
432+
"[LANG:EN] CRITICAL: Your last response was in a non-English language. "
433+
"You MUST respond in English ONLY. Please re-answer in English."
434+
)
435+
self._state.messages.append(
436+
Message(role=Role.ASSISTANT, content=response.content)
437+
)
438+
self._state.messages.append(Message(role=Role.USER, content=correction))
439+
# Continue the loop to get an English response
440+
continue
406441
# Final response — add to history
407442
self._state.messages.append(Message(role=Role.ASSISTANT, content=response.content))
408443
break

src/specsmith/agent/tools.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -47,6 +47,8 @@ def _run_specsmith(args: list[str], project_dir: str = ".") -> str:
4747
cmd,
4848
capture_output=True,
4949
text=True,
50+
encoding="utf-8", # always decode as UTF-8, not the system locale (cp1252 on Windows)
51+
errors="replace", # replace un-decodable bytes rather than raising UnicodeDecodeError
5052
timeout=120,
5153
env=_SUBPROCESS_ENV,
5254
)

0 commit comments

Comments
 (0)