Commit 8aa867a

g-despot and claude committed
feat(grpc-web): add Pyodide/WASM grpc-web transport
Let the async client's gRPC data path run under Pyodide/WebAssembly (marimo, browser), where grpcio has no wheel and raw sockets are unavailable.

Base changes (no-ops on normal platforms):
- setup.cfg: mark grpcio with `; sys_platform != "emscripten"` so micropip skips it under Pyodide while CPython installs it unchanged.
- weaviate/proto/v1/__init__.py: when grpcio distribution metadata is absent, fall back to version 1.72.1 so a working generated-proto variant is selected. Restricted to grpcio; a missing protobuf still surfaces.

New companion package packages/grpc-web (weaviate-python-grpc-web):
- A sys.modules `grpc` shim that satisfies the client's import-time grpc surface and the `grpc.aio.Channel` / awaitability contracts; installs itself only under Emscripten so it never clobbers a real grpcio.
- GrpcWebChannel: frames unary RPCs as grpc-web and POSTs them via pyodide pyfetch (with an httpx sender for CPython tests); folds call metadata into fetch headers; maps grpc-web trailers/status to the client's error types.
- Reuses the client's generated protobuf stubs (no codegen fork).

Async-only; bidirectional BatchStream is intentionally unsupported over fetch.

Tests (25 passing): grpc-web framing, transport round-trips and error mapping, a subprocess test that imports weaviate under the shim with a real-proto unary round trip, and base proto-guard regression tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 3a8d341 commit 8aa867a
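
The proto-guard fallback described in the commit message can be pictured roughly as follows. This is a minimal sketch, not the code in `weaviate/proto/v1/__init__.py`: the helper name `dist_version_or_fallback` is hypothetical, and only the fallback value 1.72.1 and the restriction to `grpcio` come from the message above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def dist_version_or_fallback(dist: str, fallback: Optional[str] = None) -> str:
    """Return the installed version of ``dist``, or ``fallback`` if its metadata is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        if fallback is None:
            # No fallback configured: let the missing distribution surface.
            raise
        return fallback


# Only grpcio gets a fallback; a missing protobuf would still raise.
grpc_version = dist_version_or_fallback("grpcio", fallback="1.72.1")
```

Passing no fallback for other distributions keeps the "a missing protobuf still surfaces" behaviour: the lookup error propagates instead of being masked.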

15 files changed

Lines changed: 1146 additions & 5 deletions

packages/grpc-web/README.md

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
# weaviate-python-grpc-web

A grpc-web / WebAssembly (Pyodide) transport for the
[Weaviate Python client](https://github.com/weaviate/weaviate-python-client), so the
client's **async** gRPC data path can run inside a browser (marimo notebooks, Pyodide,
WASM workers) where there is no socket and no `grpcio` wheel.

It is built from the same repository as `weaviate-client` and reuses its generated
protobuf stubs; it does **not** fork code generation.

## How it works

Under Pyodide there is no `grpcio` Emscripten wheel, and `import weaviate` hard-imports
`grpc` at module load. This package installs a small pure-Python `grpc` shim into
`sys.modules` **before** `import weaviate`, which:

- satisfies every import-time `import grpc` / `from grpc(.aio) import ...` in the base
  client and its generated `*_pb2_grpc` stubs;
- provides `grpc.aio.Channel` as a real base class, so the grpc-web channel
  (`GrpcWebChannel`) subclasses it and the client's `isinstance(..., grpc.aio.Channel)`
  assertions pass;
- satisfies the generated v6300 stub's version gate
  (`grpc.__version__` / `grpc._utilities.first_version_is_lower`).

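The general `sys.modules` technique behind the shim can be sketched as below. This is an illustration only, not the package's `_shim` module: it registers a stand-in under the hypothetical name `demo_grpc` (so it cannot clobber a real `grpc`), and the version string is just an example value.

```python
import sys
import types


def install_stub(name: str = "demo_grpc") -> types.ModuleType:
    """Register a tiny stand-in package (hypothetical name) in sys.modules."""
    if name in sys.modules:  # never replace something that is already imported
        return sys.modules[name]

    pkg = types.ModuleType(name)
    pkg.__version__ = "1.72.1"  # example value for a version gate to read
    pkg.__path__ = []  # marks the module as a package

    aio = types.ModuleType(f"{name}.aio")

    class Channel:
        """Base class a transport can subclass so isinstance checks pass."""

    aio.Channel = Channel
    pkg.aio = aio

    # Both entries are needed so `import demo_grpc.aio` resolves from the cache.
    sys.modules[name] = pkg
    sys.modules[f"{name}.aio"] = aio
    return pkg


install_stub()
from demo_grpc.aio import Channel  # served from sys.modules, no wheel involved


class DemoWebChannel(Channel):
    """Stand-in for a fetch-based channel implementation."""


assert isinstance(DemoWebChannel(), Channel)
```

Because the import system consults `sys.modules` first, the stub has to be registered before the dependent package is imported, which is why import order matters in the usage below.
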
The `GrpcWebChannel` frames unary RPCs as grpc-web (a 5-byte header + protobuf payload)
and POSTs them via `pyodide.http.pyfetch` to a server fronted by a grpc-web transcoder
(e.g. Envoy or [connectrpc/vanguard](https://github.com/connectrpc/vanguard-go)). Call
metadata (API key / OIDC bearer) is folded into `fetch` headers.

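As a rough picture of the framing, here is a minimal sketch assuming the standard grpc-web layout: one flag byte (high bit set for the trailer frame) followed by a 4-byte big-endian length. The helper names `encode_frame` and `split_frames` are illustrative and are not the package's own `_framing` functions.

```python
import struct
from typing import Dict, List, Tuple

DATA_FLAG = 0x00
TRAILER_FLAG = 0x80  # high bit marks the trailers frame


def encode_frame(payload: bytes, flag: int = DATA_FLAG) -> bytes:
    """Prefix a payload with the 5-byte header: flag + big-endian length."""
    return struct.pack(">BI", flag, len(payload)) + payload


def split_frames(body: bytes) -> Tuple[List[bytes], Dict[str, str]]:
    """Split a response body into message payloads and a trailers dict."""
    messages: List[bytes] = []
    trailers: Dict[str, str] = {}
    offset = 0
    while offset + 5 <= len(body):
        flag, length = struct.unpack_from(">BI", body, offset)
        offset += 5
        chunk = body[offset : offset + length]
        offset += length
        if flag & TRAILER_FLAG:
            # Trailers are encoded like HTTP/1 headers: "key: value\r\n".
            for line in chunk.decode("utf-8").split("\r\n"):
                key, sep, value = line.partition(":")
                if sep:
                    trailers[key.strip().lower()] = value.strip()
        else:
            messages.append(chunk)
    return messages, trailers


body = encode_frame(b"abc") + encode_frame(b"grpc-status: 0\r\n", TRAILER_FLAG)
messages, trailers = split_frames(body)
```

The sketch ignores the compression bit and truncated frames; a real transport would need to handle both.
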
## Usage

```python
import weaviate_grpc_web  # installs the grpc shim under Emscripten (no-op elsewhere)
import weaviate

client = weaviate.use_async_with_local(skip_init_checks=True)
await client.connect()
collection = client.collections.get("Article")
await collection.query.near_text("hello", limit=3)
```

## Supported / unsupported

| RPC | Kind | Status |
|-----|------|--------|
| Search, Aggregate, TenantsGet, BatchObjects, BatchDelete | unary | ✅ works over grpc-web |
| Health check (`/grpc.health.v1.Health/Check`) | unary | ✅ (recommend `skip_init_checks=True` + REST `/.well-known/ready`) |
| References (`/batch/references`) | REST | ✅ via httpx-in-Pyodide |
| `batch.stream()` / `batch.experimental()` (BatchStream) | bidi streaming | ❌ not possible over grpc-web/fetch; use `insert_many()` / `batch.dynamic()` / `fixed_size()` / `rate_limit()` |
| Synchronous client | n/a | ❌ async-only under WASM |

## Testing on CPython

`weaviate_grpc_web.install(force=True)` installs the shim on a normal CPython
interpreter (run it in a fresh process, before importing `weaviate`). Inject a sender
with `weaviate_grpc_web.set_sender(...)` (e.g. `make_httpx_sender()`) to exercise the
transport against an Envoy/vanguard transcoder without a browser.
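
For tests that should not touch the network at all, an in-memory sender is one option. The sketch below assumes the sender contract visible in `GrpcWebChannel._unary`: an async callable taking `(url, headers, body, timeout)` and returning `(status, headers, body)`. The names `frame` and `echo_sender` and the URL are hypothetical.

```python
import asyncio
import struct
from typing import Dict, Optional, Tuple


def frame(flag: int, payload: bytes) -> bytes:
    """Prefix a payload with the 5-byte grpc-web header (flag + big-endian length)."""
    return struct.pack(">BI", flag, len(payload)) + payload


async def echo_sender(
    url: str, headers: Dict[str, str], body: bytes, timeout: Optional[float] = None
) -> Tuple[int, Dict[str, str], bytes]:
    """Hypothetical in-memory sender: echoes the request message and reports OK."""
    request_message = body[5:]  # drop the request's own 5-byte frame header
    response = frame(0x00, request_message) + frame(0x80, b"grpc-status: 0\r\n")
    return 200, {"content-type": "application/grpc-web+proto"}, response


# Drive the sender directly; the URL is a placeholder, nothing is contacted.
status, resp_headers, resp_body = asyncio.run(
    echo_sender("http://localhost:8080/demo.Service/Echo", {}, frame(0x00, b"ping"))
)
```

A sender shaped like this could presumably be passed to `set_sender(...)` in place of `make_httpx_sender()`, provided it matches the package's `Sender` type.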

packages/grpc-web/pyproject.toml

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
[build-system]
requires = ["setuptools>=65", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "weaviate-python-grpc-web"
description = "grpc-web / WASM (Pyodide) transport for the Weaviate Python client"
readme = "README.md"
requires-python = ">=3.10"
license = { text = "BSD-3-Clause" }
authors = [{ name = "Weaviate", email = "hello@weaviate.io" }]
keywords = ["weaviate", "grpc-web", "pyodide", "wasm", "emscripten"]
# Version is kept in lockstep with weaviate-client. TODO(lockstep): derive from the same
# git tag via setuptools_scm and assert the built versions match in CI before publishing.
version = "0.0.1.dev0"
# Deliberately depends on weaviate-client WITHOUT grpcio (grpcio is excluded under
# Emscripten by the `sys_platform != "emscripten"` marker in the base package's deps).
dependencies = [
    "weaviate-client",
]

[project.urls]
Source = "https://github.com/weaviate/weaviate-python-client"
Tracker = "https://github.com/weaviate/weaviate-python-client/issues"

[tool.setuptools.packages.find]
where = ["src"]

[tool.setuptools.package-data]
weaviate_grpc_web = ["py.typed"]
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
"""grpc-web / WASM transport for the Weaviate Python client.

Under Pyodide/Emscripten there is no ``grpcio`` wheel. Importing this package installs a
pure-Python ``grpc`` shim into ``sys.modules`` (and forces the pure-Python protobuf
runtime) so that the subsequent ``import weaviate`` succeeds and its async gRPC data path
runs over grpc-web (``fetch``) instead of HTTP/2 sockets.

Usage under Pyodide::

    import weaviate_grpc_web  # installs the grpc shim (no-op off Emscripten)
    import weaviate

    client = weaviate.use_async_with_local(skip_init_checks=True)
    await client.connect()

The shim is installed automatically only under Emscripten, so importing this package on a
normal CPython install never clobbers a real, working ``grpcio``. Async clients only;
the synchronous client is not supported in the browser.
"""

import os
import sys

from ._shim import StatusCode, install, is_installed

__all__ = [
    "install",
    "is_installed",
    "set_sender",
    "make_httpx_sender",
    "GrpcWebChannel",
    "StatusCode",
]


def _bootstrap() -> None:
    if sys.platform == "emscripten":
        # The pure-Python protobuf runtime always works; the upb C-extension may not be
        # present. Set before ``import weaviate`` (which imports protobuf) so it takes
        # effect. ``setdefault`` lets a user override it explicitly.
        os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "python")
        install()


_bootstrap()

# Imported after the bootstrap. These modules pull their grpc base classes directly from
# ``._shim`` (not via ``sys.modules['grpc']``), so importing them is safe regardless of
# whether the shim was installed.
from ._channel import GrpcWebChannel, set_sender  # noqa: E402
from ._sender import make_httpx_sender  # noqa: E402
Lines changed: 233 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,233 @@
"""The grpc-web channel and multicallables.

:class:`GrpcWebChannel` implements the small slice of the ``grpc.aio`` channel interface
that ``weaviate``'s generated stub and ``ConnectionV4`` actually use (``unary_unary``,
``stream_stream`` and ``close``) by framing requests as grpc-web and POSTing them via a
pluggable async sender. It subclasses the shim's ``grpc.aio.Channel`` (:class:`AioChannel`)
so the ``isinstance(..., grpc.aio.Channel)`` assertions in ``connect/v4.py`` hold.

Only unary RPCs are supported (Search, Aggregate, TenantsGet, BatchObjects,
BatchReferences, BatchDelete, and the unary health check). ``stream_stream`` (the bidi
``BatchStream`` used by opt-in server-side batching) cannot work over grpc-web/fetch and
raises a clear error.
"""

import base64
import urllib.parse
from typing import Any, Callable, Dict, Optional

from ._framing import encode_message, split_response
from ._sender import Sender, pyfetch_sender
from ._shim import AioChannel, AioRpcError, StatusCode, status_from_int

# Module-level default sender; overridable for tests / non-browser runtimes.
_default_sender: Sender = pyfetch_sender


def set_sender(sender: Sender) -> None:
    """Override the default async sender used by new channels (tests/integration)."""
    global _default_sender
    _default_sender = sender


def get_sender() -> Sender:
    return _default_sender


def _encode_timeout(seconds: float) -> str:
    """Encode a timeout as a grpc-timeout header value (``<positive int><unit>``)."""
    millis = max(1, int(seconds * 1000))
    if millis < 100_000_000:
        return f"{millis}m"
    return f"{max(1, int(seconds))}S"


def _fold_metadata(headers: Dict[str, str], metadata: Any) -> None:
    """Fold gRPC call metadata (``[(key, value), ...]``) into fetch headers.

    Binary ``-bin`` keys are base64-encoded as grpc-web requires.
    """
    if not metadata:
        return
    for key, value in metadata:
        name = key.lower()
        if name.endswith("-bin"):
            raw = value if isinstance(value, (bytes, bytearray)) else str(value).encode()
            headers[name] = base64.b64encode(raw).decode("ascii")
        else:
            headers[name] = value if isinstance(value, str) else str(value)


def _header_lookup(headers: Dict[str, str], name: str) -> Optional[str]:
    target = name.lower()
    for key, value in headers.items():
        if key.lower() == target:
            return value
    return None


class _UnaryUnaryMultiCallable:
    """Awaitable multicallable bound by ``WeaviateStub.__init__``.

    Called as ``await mc(request, metadata=..., timeout=...)`` (and, for the health
    check, as ``mc(request, timeout=...)`` with no metadata).
    """

    def __init__(
        self,
        channel: "GrpcWebChannel",
        path: str,
        request_serializer: Callable[[Any], bytes],
        response_deserializer: Callable[[bytes], Any],
    ) -> None:
        self._channel = channel
        self._path = path
        self._serialize = request_serializer
        self._deserialize = response_deserializer

    async def __call__(
        self,
        request: Any,
        *,
        metadata: Any = None,
        timeout: Optional[float] = None,
        credentials: Any = None,
        wait_for_ready: Any = None,
        compression: Any = None,
    ) -> Any:
        payload = self._serialize(request)
        return await self._channel._unary(self._path, payload, self._deserialize, metadata, timeout)


class _UnsupportedStreamMultiCallable:
    """Placeholder for ``stream_stream`` (bidirectional streaming).

    Calling it raises immediately, before the ``async for`` in ``connect/v4.py:1243``
    begins iterating.
    """

    def __init__(self, path: str) -> None:
        self._path = path

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        raise RuntimeError(
            f"Bidirectional streaming RPC {self._path!r} (server-side batching / "
            "BatchStream) is not supported over grpc-web/fetch. Use insert_many(), or "
            "batch.dynamic() / fixed_size() / rate_limit(), instead of batch.stream()."
        )


class GrpcWebChannel(AioChannel):
    """grpc-web/fetch implementation of the async grpc channel slice the client uses."""

    def __init__(
        self,
        target: Optional[str],
        secure: bool,
        options: Any = None,
        sender: Optional[Sender] = None,
    ) -> None:
        if not target:
            raise ValueError("GrpcWebChannel requires a target (host:port)")
        scheme = "https" if secure else "http"
        self._base_url = f"{scheme}://{target}"
        self._sender: Sender = sender or get_sender()

    def unary_unary(
        self,
        method: str,
        request_serializer: Callable[[Any], bytes],
        response_deserializer: Callable[[bytes], Any],
        _registered_method: bool = False,
    ) -> _UnaryUnaryMultiCallable:
        return _UnaryUnaryMultiCallable(self, method, request_serializer, response_deserializer)

    def stream_stream(
        self,
        method: str,
        request_serializer: Callable[[Any], bytes],
        response_deserializer: Callable[[bytes], Any],
        _registered_method: bool = False,
    ) -> _UnsupportedStreamMultiCallable:
        return _UnsupportedStreamMultiCallable(method)

    async def close(self, grace: Optional[float] = None) -> None:
        # Nothing to tear down: each call is an independent fetch.
        return None

    async def _unary(
        self,
        path: str,
        payload: bytes,
        deserialize: Callable[[bytes], Any],
        metadata: Any,
        timeout: Optional[float],
    ) -> Any:
        headers: Dict[str, str] = {
            "content-type": "application/grpc-web+proto",
            "accept": "application/grpc-web+proto",
            "x-grpc-web": "1",
            "x-user-agent": "weaviate-python-grpc-web",
        }
        _fold_metadata(headers, metadata)
        if timeout is not None:
            headers["grpc-timeout"] = _encode_timeout(timeout)

        url = self._base_url + path
        status, resp_headers, body = await self._sender(
            url, headers, encode_message(payload), timeout
        )
        return self._handle_response(status, resp_headers, body, deserialize)

    @staticmethod
    def _handle_response(
        http_status: int,
        resp_headers: Dict[str, str],
        body: bytes,
        deserialize: Callable[[bytes], Any],
    ) -> Any:
        messages, trailers = split_response(body) if body else ([], {})

        raw_status = trailers.get("grpc-status")
        if raw_status is None:
            raw_status = _header_lookup(resp_headers, "grpc-status")
        raw_message = (
            trailers.get("grpc-message") or _header_lookup(resp_headers, "grpc-message") or ""
        )
        message = urllib.parse.unquote(raw_message)

        if raw_status is None:
            if http_status != 200:
                raise AioRpcError(
                    code=_status_from_http(http_status),
                    details=f"HTTP {http_status} from grpc-web endpoint",
                )
            code = StatusCode.OK
        else:
            code = status_from_int(int(raw_status))

        if code is not StatusCode.OK:
            raise AioRpcError(code=code, details=message)
        if not messages:
            raise AioRpcError(
                code=StatusCode.INTERNAL,
                details="grpc-web response contained no message frame",
            )
        return deserialize(messages[0])


def _status_from_http(http_status: int) -> StatusCode:
    """Map an HTTP status to a gRPC status when no grpc-status is present.

    Mirrors gRPC's standard HTTP-to-gRPC status code mapping; any status not listed
    below maps to UNKNOWN.
    """
    return {
        400: StatusCode.INTERNAL,
        401: StatusCode.UNAUTHENTICATED,
        403: StatusCode.PERMISSION_DENIED,
        404: StatusCode.UNIMPLEMENTED,
        429: StatusCode.UNAVAILABLE,
        502: StatusCode.UNAVAILABLE,
        503: StatusCode.UNAVAILABLE,
        504: StatusCode.UNAVAILABLE,
    }.get(http_status, StatusCode.UNKNOWN)
