refactor: use urllib.parse for WebSocket URL construction #780

Merged
kraenhansen merged 3 commits into main from fix/scribe-url-encoding on Apr 29, 2026
Conversation

kraenhansen (Member) commented Apr 28, 2026

Summary

  • Extract a shared build_ws_url(base_url, path_segments, params) helper in url_utils.py that handles:
    • Scheme conversion (http→ws, https→wss)
    • Percent-encoding each path segment (preventing injection via dynamic values like voice IDs)
    • Query string encoding via urllib.parse.urlencode
  • Replace manual f-string URL construction in all three WebSocket URL builders:
    • ScribeRealtime._build_websocket_url in scribe.py
    • RealtimeTextToSpeechClient.convert_realtime in realtime_tts.py
    • BaseConversation._get_wss_url in conversation.py
  • Fix _get_signed_url to use urlparse/urlunparse instead of string concatenation for appending query params
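
The three responsibilities listed for the helper can be sketched roughly as follows. This is a minimal illustration, not the code merged in url_utils.py: only the name build_ws_url and its (base_url, path_segments, params) signature come from the summary above, while the base-path joining, the `doseq=True` flag, and the example URL are assumptions.

```python
from urllib.parse import quote, urlencode, urlparse, urlunparse


def build_ws_url(base_url, path_segments, params=None):
    """Hypothetical sketch of a WebSocket URL builder (may differ from the real helper)."""
    parsed = urlparse(base_url)

    # Scheme conversion: http -> ws, https -> wss; ws/wss pass through unchanged.
    scheme = {"http": "ws", "https": "wss"}.get(parsed.scheme, parsed.scheme)

    # Encode every segment with safe="" so that "/", "?" or "#" inside a
    # dynamic value (e.g. a voice ID) cannot change the path structure.
    encoded = [quote(str(segment), safe="") for segment in path_segments]

    # Assumption: segments are appended to any path already on the base URL.
    path = "/".join([parsed.path.rstrip("/"), *encoded])

    # doseq=True lets a list of (key, value) pairs carry repeated keys.
    query = urlencode(params or [], doseq=True)

    return urlunparse((scheme, parsed.netloc, path, "", query, ""))


url = build_ws_url(
    "https://api.example.com",  # placeholder host, not the real API base
    ["v1", "text-to-speech", "abc/../x?y", "stream-input"],
    {"model_id": "m 1"},
)
# url == "wss://api.example.com/v1/text-to-speech/abc%2F..%2Fx%3Fy/stream-input?model_id=m+1"
```

With an f-string, the same voice ID would have injected extra path segments and a query delimiter into the URL.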

Closes #779

Test plan

  • All 16 tests in test_url_utils.py pass (scheme conversion, path encoding, special chars, query params)
  • All 13 tests in test_stt_realtime.py pass (including URL construction and scheme conversion)
  • All 19 tests in test_convai.py + test_async_convai.py pass (including URL edge cases)

🤖 Generated with Claude Code


Note

Medium Risk
Touches WebSocket URL generation for conversation, realtime STT, and realtime TTS; subtle changes in path/query encoding or base-path joining could break connectivity in edge cases.

Overview
Introduces a shared build_ws_url helper to construct WebSocket URLs by converting schemes (http(s) → ws(s)), percent-encoding path segments, and encoding query parameters via urllib.parse.urlencode.

Refactors the conversation, realtime STT (ScribeRealtime), and realtime TTS URL builders to use this helper instead of manual string/urljoin construction, and updates conversation signed-URL augmentation to append source/version via urlparse/urlunparse.
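
As a rough illustration of the signed-URL change, the sketch below appends parameters by parsing and rebuilding the URL instead of concatenating strings. The function name append_query_params, the example URL, and the parameter values are hypothetical; only the source/version parameter names and the use of urlparse/urlunparse come from this PR.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlparse, urlunparse


def append_query_params(url, extra):
    """Hypothetical helper: add query parameters to a URL that may already have some."""
    parsed = urlparse(url)
    # Keep existing pairs, including empty-valued ones such as "token=".
    pairs = parse_qsl(parsed.query, keep_blank_values=True)
    pairs.extend(extra.items())
    # quote_via=quote encodes spaces as %20 rather than "+".
    return urlunparse(parsed._replace(query=urlencode(pairs, quote_via=quote)))


signed = append_query_params(
    "wss://api.example.com/v1/convai/conversation?agent_id=a1&token=",
    {"source": "python_sdk", "version": "1.0.0"},  # illustrative values
)
# signed == "wss://api.example.com/v1/convai/conversation?agent_id=a1&token=&source=python_sdk&version=1.0.0"
```

String concatenation has to guess whether to join with `?` or `&`; rebuilding through urlunparse makes that decision from the parsed components.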

Adds unit tests covering scheme conversion, path joining/encoding, query encoding (including repeated keys), and port/base-path handling.

Reviewed by Cursor Bugbot for commit 3b97a1f. Bugbot is set up for automated code reviews on this repo.

cursor[bot]

This comment was marked as resolved.

kraenhansen changed the title from "fix: use urllib.parse.urlencode for Scribe WebSocket query string" to "fix: use urllib.parse.urlencode for WebSocket query strings" (Apr 28, 2026)
kraenhansen force-pushed the fix/scribe-url-encoding branch 5 times, most recently from bb38bd9 to 7a6efd5 (April 28, 2026 09:16)
kraenhansen changed the title from "fix: use urllib.parse.urlencode for WebSocket query strings" to "refactor: use urllib.parse for WebSocket URL construction" (Apr 28, 2026)
kraenhansen force-pushed the fix/scribe-url-encoding branch 2 times, most recently from ff5cd24 to f5eb1af (April 28, 2026 09:20)
Replace manual f-string query parameter concatenation and scheme
swapping with a shared build_ws_url helper that uses urllib.parse
for proper percent-encoding. This fixes free-form values (e.g.
keyterms with spaces) producing malformed URLs.

Affected files:
- realtime/scribe.py — _build_websocket_url
- realtime_tts.py — convert_realtime
- conversational_ai/conversation.py — _get_wss_url and _get_signed_url

Closes #779

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
kraenhansen force-pushed the fix/scribe-url-encoding branch from f5eb1af to 3b97a1f (April 28, 2026 09:23)
kraenhansen requested a review from PaulAsjes (April 28, 2026 09:26)
cursor[bot]

This comment was marked as resolved.

Pass keep_blank_values=True to parse_qsl so empty-valued parameters
(e.g. param=) in the signed URL are not silently dropped.
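
The standard-library behaviour behind this fix can be checked in isolation. The query string below is a made-up example:

```python
from urllib.parse import parse_qsl

query = "agent_id=a1&token="  # illustrative query with an empty-valued parameter

# Default behaviour: the empty-valued "token" pair is dropped.
dropped = parse_qsl(query)
# dropped == [("agent_id", "a1")]

# With keep_blank_values=True the pair survives a parse/re-encode round trip.
kept = parse_qsl(query, keep_blank_values=True)
# kept == [("agent_id", "a1"), ("token", "")]
```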

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cursor[bot]

This comment was marked as resolved.

urlencode defaults to quote_plus which encodes spaces as `+` instead
of `%20`. This can break signed URL signatures when the server
validates against the original encoding.
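
The difference is visible with a single parameter. The keyterms value here is illustrative:

```python
from urllib.parse import quote, urlencode

params = {"keyterms": "hello world"}  # illustrative free-form value

# Default: urlencode uses quote_plus, so the space becomes "+".
plus_encoded = urlencode(params)
# plus_encoded == "keyterms=hello+world"

# quote_via=quote percent-encodes the space instead.
percent_encoded = urlencode(params, quote_via=quote)
# percent_encoded == "keyterms=hello%20world"
```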

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
kraenhansen merged commit 837172c into main on Apr 29, 2026
6 checks passed
kraenhansen deleted the fix/scribe-url-encoding branch (April 29, 2026 11:44)
Linked issue

Use proper URL builder for Scribe realtime WebSocket query string

2 participants