Commit a0cfe0a

feat: add human-like smooth typing with optional typo injection (#201)
## Summary

- Adds `smooth` and `typo_chance` fields to `POST /computer/type` for human-like typing via xdotool
- When `smooth=true`, text is typed in word-sized chunks with variable intra-word delays ([50, 120]ms) and inter-word pauses ([80, 200]ms, 1.5x at sentence boundaries)
- When `typo_chance` is set (0.0-0.10), realistic typos are injected using geometric gap sampling (O(typos) random calls, not O(chars)) and corrected with backspace after a "realization" pause

### New package: `server/lib/typinghumanizer`

- `SplitWordChunks` -- word chunking with trailing delimiters attached to preceding word
- `UniformJitter` -- random duration in a range, clamped to minimum
- `IsSentenceEnd` -- sentence boundary detection for longer pauses
- `AdjacentKey` -- QWERTY neighbor lookup from static `[26][]byte` array, O(1)
- `GenerateTypoPositions` -- geometric gap sampling for typo placement

### Typo types (weighted distribution)

| Type | Weight | Mechanism |
|---|---|---|
| Adjacent key | 60% | QWERTY neighbor substitution |
| Doubling | 20% | Character typed twice |
| Transpose | 15% | Swap with next character |
| Extra char | 5% | Random adjacent key inserted |

Related: #169 (plan document)

## Demo

![smooth_typing_demo](https://github.com/user-attachments/assets/ddf5dd15-b692-438f-be2d-a7c2e9d69d9b)

## Test plan

- [x] All 17 typinghumanizer tests pass (word chunking, adjacency, typo generation, distribution)
- [x] Builds clean, no lint issues
- [ ] Manual test with `smooth: true` on a running instance
- [ ] Manual test with `smooth: true, typo_chance: 0.03` to verify typo/correction behavior

Made with [Cursor](https://cursor.com)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes the `POST /computer/type` behavior by defaulting to a new smooth-typing path that introduces randomized delays and optional typo/backspace correction, which could affect automation timing and determinism. Also modifies the generated OpenAPI client/server glue via new post-generation patching, which can subtly impact request/response handling if the codegen output format changes.
>
> **Overview**
> Adds human-like typing to `POST /computer/type` via new request fields `smooth` (defaults to true) and `typo_chance` (0–0.10), enabling variable per-chunk delays, inter-word pauses, and optional typo injection with backspace correction.
>
> Introduces `server/lib/typinghumanizer` (with tests) to chunk text, generate jittered timings, and compute realistic typo positions/adjacent-key substitutions, and wires this into the API implementation.
>
> Updates OpenAPI (`openapi.yaml` + regenerated `lib/oapi/oapi.go`) and the `server/Makefile` to run an additional codegen patch step (`patch_strict_optional_json`) to tolerate empty JSON bodies for optional-body endpoints and restore `omitempty` tags; adds a small demo page + Python script to record side-by-side typing behavior.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit d63ab1d. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
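To make the "geometric gap sampling" claim concrete, here is a minimal Python sketch of the idea, not the Go code in `server/lib/typinghumanizer`. Instead of flipping a coin for every character, it draws the gap to the next typo from a geometric distribution, so the number of random draws is the number of typos plus one. The names `generate_typo_positions`, `pick_typo_kind`, and the kind labels are hypothetical; only the weights (60/20/15/5) come from the table above.

```python
import math
import random

# Weights taken from the "Typo types" table; the labels are illustrative.
TYPO_KINDS = ["adjacent_key", "doubling", "transpose", "extra_char"]
TYPO_WEIGHTS = [60, 20, 15, 5]


def generate_typo_positions(n_chars, typo_chance, rng):
    """Return strictly increasing character indices that should receive a typo.

    Equivalent in distribution to an independent Bernoulli(typo_chance) trial
    per character, but uses one random draw per typo rather than per character.
    """
    if n_chars <= 0 or typo_chance <= 0:
        return []
    if typo_chance >= 1:
        return list(range(n_chars))

    log_q = math.log(1.0 - typo_chance)
    positions = []
    pos = -1
    while True:
        u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
        gap = int(math.log(u) / log_q) + 1  # geometric on {1, 2, ...}
        pos += gap
        if pos >= n_chars:
            return positions
        positions.append(pos)


def pick_typo_kind(rng):
    """Choose a typo kind according to the weighted distribution."""
    return rng.choices(TYPO_KINDS, weights=TYPO_WEIGHTS, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)
    text = "The quick brown fox jumps over the lazy dog. Hello world!"
    for idx in generate_typo_positions(len(text), 0.04, rng):
        print(idx, repr(text[idx]), pick_typo_kind(rng))
```

At the capped rate of 0.10, a 1,000-character string needs roughly 100 draws for placement instead of 1,000.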
1 parent 18e7577 commit a0cfe0a

11 files changed

Lines changed: 1249 additions & 379 deletions

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
+#!/usr/bin/env python3
+"""
+Demo script: smooth typing vs instant typing.
+
+Drives the typing_demo.html page through the kernel-images API to produce
+a side-by-side comparison suitable for recording as a GIF/MP4.
+
+Usage:
+    # 1. Start a kernel-images container
+    # 2. Upload typing_demo.html to the container
+    # 3. Run this script:
+    python demo_smooth_typing.py --base-url http://localhost:8000
+
+Requirements:
+    pip install requests
+"""
+
+import argparse
+import base64
+import json
+import time
+from pathlib import Path
+
+import requests
+
+DEMO_TEXT = "The quick brown fox jumps over the lazy dog. Hello world!"
+
+
+def api(base: str, method: str, path: str, **kwargs):
+    url = f"{base}{path}"
+    resp = getattr(requests, method)(url, **kwargs)
+    resp.raise_for_status()
+    return resp
+
+
+def upload_demo_page(base: str):
+    html_path = Path(__file__).parent / "typing_demo.html"
+    html_bytes = html_path.read_bytes()
+    api(base, "put", "/fs/write_file", params={"path": "/tmp/typing_demo.html"},
+        data=html_bytes, headers={"Content-Type": "application/octet-stream"})
+    print("Uploaded typing_demo.html")
+
+
+def execute_js(base: str, code: str):
+    api(base, "post", "/playwright/execute", json={"code": code, "timeout_sec": 10})
+
+
+def navigate(base: str):
+    execute_js(base, "await page.goto('file:///tmp/typing_demo.html');")
+    time.sleep(1)
+
+
+def click_input(base: str):
+    execute_js(base, "await page.click('#input');")
+    time.sleep(0.3)
+
+
+def clear_input(base: str):
+    execute_js(base, "window.demoApi.clear();")
+    time.sleep(0.3)
+
+
+def set_mode(base: str, label: str, cls: str):
+    execute_js(base, f"window.demoApi.setMode('{label}', '{cls}');")
+
+
+def type_text(base: str, text: str, smooth: bool = False, typo_chance: float = 0):
+    body = {"text": text, "smooth": smooth}
+    if typo_chance > 0:
+        body["typo_chance"] = typo_chance
+    if not smooth:
+        body["delay"] = 0
+    api(base, "post", "/computer/type", json=body)
+
+
+def start_recording(base: str):
+    api(base, "post", "/recording/start", json={"framerate": 15, "id": "typing-demo"})
+    print("Recording started")
+    time.sleep(0.5)
+
+
+def stop_recording(base: str):
+    api(base, "post", "/recording/stop", json={"id": "typing-demo"})
+    time.sleep(1)
+    print("Recording stopped")
+
+
+def download_recording(base: str, output: str):
+    resp = api(base, "get", "/recording/download", params={"id": "typing-demo"})
+    Path(output).write_bytes(resp.content)
+    print(f"Saved recording to {output}")
+
+
+def run_demo(base: str, output: str):
+    upload_demo_page(base)
+    navigate(base)
+    start_recording(base)
+
+    # --- Phase 1: Instant typing (no delay) ---
+    set_mode(base, "INSTANT TYPING — delay: 0", "instant")
+    time.sleep(1)
+    click_input(base)
+    type_text(base, DEMO_TEXT, smooth=False)
+    time.sleep(2)
+
+    # --- Phase 2: Smooth typing (no typos) ---
+    clear_input(base)
+    set_mode(base, "SMOOTH TYPING — HUMAN-LIKE", "smooth")
+    time.sleep(1)
+    click_input(base)
+    type_text(base, DEMO_TEXT, smooth=True)
+    time.sleep(2)
+
+    # --- Phase 3: Smooth typing with typos ---
+    clear_input(base)
+    set_mode(base, "SMOOTH TYPING — WITH TYPOS", "typos")
+    time.sleep(1)
+    click_input(base)
+    type_text(base, DEMO_TEXT, smooth=True, typo_chance=0.04)
+    time.sleep(2)
+
+    stop_recording(base)
+    download_recording(base, output)
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Smooth typing demo recorder")
+    parser.add_argument("--base-url", default="http://localhost:8000",
+                        help="Base URL of the kernel-images API")
+    parser.add_argument("--output", default="smooth_typing_demo.mp4",
+                        help="Output video file path")
+    args = parser.parse_args()
+    run_demo(args.base_url, args.output)
+
+
+if __name__ == "__main__":
+    main()
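Phases 2 and 3 of the script exercise the smooth path, which the commit description characterizes as word-sized chunks with intra-word delays in [50, 120] ms and inter-word pauses in [80, 200] ms, stretched 1.5x at sentence boundaries. The sketch below shows roughly how such a schedule could be planned in Python. It is an illustration under assumptions, not a port of the Go helpers: the function names only loosely mirror `SplitWordChunks`, `UniformJitter`, and `IsSentenceEnd`, and both the whitespace-only delimiter rule and the `.`/`!`/`?` sentence test are guesses.

```python
import random
import re


def split_word_chunks(text):
    """Split text into chunks, each word keeping the whitespace that follows it.

    Assumption: delimiters are whitespace only. Joining the chunks gives back
    the original text.
    """
    return re.findall(r"\S+\s*|\s+", text)


def uniform_jitter(rng, lo_ms, hi_ms, min_ms=0.0):
    """Random duration in [lo_ms, hi_ms], clamped to at least min_ms."""
    return max(min_ms, rng.uniform(lo_ms, hi_ms))


def is_sentence_end(chunk):
    """Assumption: a chunk ends a sentence if its last visible char is . ! or ?"""
    return chunk.rstrip().endswith((".", "!", "?"))


def plan_pauses(text, rng):
    """Return a list of (chunk, intra_word_delay_ms, pause_after_ms) tuples."""
    plan = []
    for chunk in split_word_chunks(text):
        intra = uniform_jitter(rng, 50, 120)   # per-keystroke delay inside the word
        pause = uniform_jitter(rng, 80, 200)   # pause before the next word
        if is_sentence_end(chunk):
            pause *= 1.5                       # longer pause at sentence boundaries
        plan.append((chunk, intra, pause))
    return plan


if __name__ == "__main__":
    rng = random.Random(7)
    for chunk, intra, pause in plan_pauses("The quick brown fox. Hello world!", rng):
        print(f"{chunk!r:10} intra={intra:6.1f}ms pause={pause:6.1f}ms")
```

Keeping the planning step separate from the code that sends keystrokes means the timing distribution can be tested with a seeded generator and no display attached.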
Lines changed: 170 additions & 0 deletions
@@ -0,0 +1,170 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <title>Typing Demo</title>
+  <style>
+    * { margin: 0; padding: 0; box-sizing: border-box; }
+    html, body { width: 100%; height: 100%; overflow: hidden; }
+    body {
+      font-family: 'SF Mono', 'Fira Code', 'Monaco', monospace;
+      background: #0d1117;
+      color: #e6edf3;
+      display: flex;
+      flex-direction: column;
+      align-items: center;
+      justify-content: center;
+      gap: 48px;
+    }
+    #banner {
+      padding: 14px 40px;
+      background: rgba(22, 27, 34, 0.95);
+      border: 2px solid #30363d;
+      border-radius: 12px;
+      font-size: 22px;
+      font-weight: 700;
+      letter-spacing: 6px;
+      text-transform: uppercase;
+      color: #58a6ff;
+      text-align: center;
+      transition: color 0.3s, border-color 0.3s;
+    }
+    #banner.instant { color: #f85149; border-color: #f85149; }
+    #banner.smooth { color: #3fb950; border-color: #3fb950; }
+    #banner.typos { color: #d2a8ff; border-color: #d2a8ff; }
+    .typing-area {
+      width: 700px;
+      min-height: 120px;
+      padding: 24px 28px;
+      background: #161b22;
+      border: 1px solid #30363d;
+      border-radius: 12px;
+      position: relative;
+    }
+    .typing-area .label {
+      position: absolute;
+      top: -12px;
+      left: 16px;
+      background: #0d1117;
+      padding: 2px 10px;
+      font-size: 11px;
+      color: #8b949e;
+      letter-spacing: 2px;
+      text-transform: uppercase;
+    }
+    #input {
+      width: 100%;
+      background: transparent;
+      border: none;
+      outline: none;
+      color: #e6edf3;
+      font-family: inherit;
+      font-size: 20px;
+      line-height: 1.6;
+      caret-color: #58a6ff;
+      resize: none;
+      overflow: hidden;
+    }
+    #input::placeholder { color: #484f58; }
+    #keystroke-viz {
+      width: 700px;
+      height: 80px;
+      position: relative;
+      overflow: hidden;
+    }
+    #keystroke-viz canvas {
+      width: 100%;
+      height: 100%;
+    }
+    .hint {
+      color: #484f58;
+      font-size: 12px;
+      letter-spacing: 1px;
+    }
+  </style>
+</head>
+<body>
+  <div id="banner">Typing Demo</div>
+
+  <div class="typing-area">
+    <div class="label">Input</div>
+    <textarea id="input" rows="3" placeholder="Text will appear here..."></textarea>
+  </div>
+
+  <div id="keystroke-viz">
+    <canvas id="viz-canvas"></canvas>
+  </div>
+
+  <div class="hint">Keystroke timing visualization</div>
+
+  <script>
+    const input = document.getElementById('input');
+    const banner = document.getElementById('banner');
+    const vizCanvas = document.getElementById('viz-canvas');
+    const vizCtx = vizCanvas.getContext('2d');
+
+    let keyTimes = [];
+    let lastKeyTime = 0;
+
+    function resizeCanvas() {
+      vizCanvas.width = vizCanvas.parentElement.clientWidth;
+      vizCanvas.height = vizCanvas.parentElement.clientHeight;
+      drawViz();
+    }
+
+    function drawViz() {
+      const w = vizCanvas.width;
+      const h = vizCanvas.height;
+      vizCtx.clearRect(0, 0, w, h);
+
+      if (keyTimes.length < 2) return;
+
+      const intervals = [];
+      for (let i = 1; i < keyTimes.length; i++) {
+        intervals.push(keyTimes[i] - keyTimes[i - 1]);
+      }
+
+      const maxInterval = Math.min(Math.max(...intervals), 500);
+      const barWidth = Math.max(2, (w - 20) / intervals.length - 1);
+
+      vizCtx.fillStyle = 'rgba(88, 166, 255, 0.7)';
+      const bannerEl = document.getElementById('banner');
+      if (bannerEl.classList.contains('instant')) vizCtx.fillStyle = 'rgba(248, 81, 73, 0.7)';
+      if (bannerEl.classList.contains('smooth')) vizCtx.fillStyle = 'rgba(63, 185, 80, 0.7)';
+      if (bannerEl.classList.contains('typos')) vizCtx.fillStyle = 'rgba(210, 168, 255, 0.7)';
+
+      for (let i = 0; i < intervals.length; i++) {
+        const barH = Math.max(2, (intervals[i] / maxInterval) * (h - 10));
+        const x = 10 + i * (barWidth + 1);
+        vizCtx.fillRect(x, h - barH, barWidth, barH);
+      }
+    }
+
+    input.addEventListener('keydown', () => {
+      const now = performance.now();
+      keyTimes.push(now);
+      if (keyTimes.length > 300) keyTimes.shift();
+      drawViz();
+    });
+
+    window.addEventListener('resize', resizeCanvas);
+    resizeCanvas();
+
+    window.demoApi = {
+      setMode: (label, cls) => {
+        banner.textContent = label;
+        banner.className = cls || '';
+      },
+      clear: () => {
+        input.value = '';
+        keyTimes = [];
+        vizCtx.clearRect(0, 0, vizCanvas.width, vizCanvas.height);
+      },
+      focus: () => {
+        input.focus();
+      }
+    };
+  </script>
+</body>
+</html>

server/Makefile

Lines changed: 3 additions & 1 deletion
@@ -23,7 +23,9 @@ oapi-generate: $(OAPI_CODEGEN)
 	openapi-down-convert --input openapi.yaml --output openapi-3.0.yaml
 	$(OAPI_CODEGEN) -config ./oapi-codegen.yaml ./openapi-3.0.yaml
 	@echo "Fixing oapi-codegen issue https://github.com/oapi-codegen/oapi-codegen/issues/1764..."
-	go run ./scripts/oapi/patch_sse_methods.go -file ./lib/oapi/oapi.go -expected-replacements 3
+	go run ./scripts/oapi/patch_sse_methods -file ./lib/oapi/oapi.go -expected-replacements 3
+	@echo "Patching strict JSON optional bodies (io.EOF) + response omitempty tags..."
+	go run ./scripts/oapi/patch_strict_optional_json -file ./lib/oapi/oapi.go
 	go fmt ./lib/oapi/oapi.go
 	go mod tidy
