Commit d475bb0

yokofly and claude committed
docs: split Python External Stream into source/sink pages
Refactor the single python-external-stream page into source/sink MDX wrappers backed by three shared partials (basics, read, write), matching the Kafka/Pulsar/NATS pattern. Add read- and write-side worked examples: streaming generator, batch list, init/deinit lifecycle, local API credential callback, materialized-view sink, and webhook POST. Drop internal engine-version gates in favor of a single Timeplus Enterprise 3.2+ product note.

- New: docs/python-external-stream-source.mdx and -sink.mdx, plus shared partials under docs/shared/.
- Sidebar entries repointed; /python-external-stream redirects to /python-external-stream-source.
- Incoming links in docs/external-stream.md and docs/sql-create-external-stream.md updated to reference both pages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 4d335dd commit d475bb0

10 files changed

Lines changed: 303 additions & 114 deletions

docs/external-stream.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ Timeplus supports 6 types of external streams:
 * [Kafka External Stream](/kafka-source)
 * [Pulsar External Stream](/pulsar-source)
 * [NATS JetStream External Stream](/nats-jetstream-source)
-* [Python External Stream](/python-external-stream), only available in Timeplus Enterprise
+* [Python External Stream Source](/python-external-stream-source) and [Sink](/python-external-stream-sink), only available in Timeplus Enterprise
 * [Timeplus External Stream](/timeplus-source), only available in Timeplus Enterprise
 * [Log External Stream](/log-stream) (experimental)

docs/python-external-stream-sink.mdx

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
---
id: python-external-stream-sink
title: Python Sink
---

import ExternalPythonBasics from './shared/python-external-stream.md';
import ExternalPythonWrite from './shared/python-external-stream-write.md';

<ExternalPythonBasics />
<ExternalPythonWrite />

docs/python-external-stream-source.mdx

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
---
id: python-external-stream-source
title: Python Source
---

import ExternalPythonBasics from './shared/python-external-stream.md';
import ExternalPythonRead from './shared/python-external-stream-read.md';

<ExternalPythonBasics />
<ExternalPythonRead />

docs/python-external-stream.md

Lines changed: 0 additions & 106 deletions
This file was deleted.

docs/shared/python-external-stream-read.md

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
## Read Data from a Python External Stream

The read function is the entry point Timeplus calls to pull rows from your Python code. It is **synchronous** (no `async`/`await`) and receives **no implicit arguments**, so any configuration must arrive through the init function. Each value it produces is a row whose columns match the stream's schema in declared order. For a single-column stream you may yield bare scalars.

### Streaming source (generator)

Yield a row or a batch of rows at a time. The query stays alive as long as the generator does, which makes generators the right shape for clocks, polling loops, websocket feeds, message-bus consumers, and other long-lived sources.

```sql
CREATE EXTERNAL STREAM py_clock (tick uint32, label string)
AS $$
import time

def py_clock():
    n = 0
    while True:
        # One row per iteration, columns in declared order: (tick, label).
        yield (n, "tick")
        n += 1
        time.sleep(1)
$$
SETTINGS
    type = 'python',
    mode = 'streaming';
```

`read_function_name` is omitted, so it defaults to the stream name `py_clock`. Setting `mode = 'streaming'` makes the engine reject a non-generator return value, which catches mistakes such as returning a list instead of yielding.
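
Before creating the stream, it can help to confirm locally that the read function really is a generator function. The sketch below uses the standard library's `inspect.isgeneratorfunction` for that. It is a local sanity check only and makes no claim about how the engine validates the function; `returns_list` and `first_rows` are hypothetical names introduced for illustration.

```python
import inspect
import itertools


def py_clock():
    """Same shape as the read function above, minus the sleep."""
    n = 0
    while True:
        yield (n, "tick")
        n += 1


def returns_list():
    """Hypothetical mistake: a batch-style return in a streaming source."""
    return [(0, "tick")]


def first_rows(read_function, limit):
    """Take the first `limit` rows from a generator-based read function."""
    if not inspect.isgeneratorfunction(read_function):
        raise TypeError(f"{read_function.__name__} is not a generator function")
    # islice stops the infinite generator after `limit` rows.
    return list(itertools.islice(read_function(), limit))


sample_rows = first_rows(py_clock, 3)
```

Calling `first_rows(returns_list, 1)` raises `TypeError`, which is the kind of mistake the `mode = 'streaming'` setting is described as catching.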

### Batch source (list)

Return a list of rows once. Use this shape for one-shot pulls: REST snapshots, file scans, or any source where a single call yields the full result.

```sql
CREATE EXTERNAL STREAM py_users (id int32, name string)
AS $$
import json
import urllib.request

def py_users():
    # Fetch the full snapshot in one call and decode the JSON body.
    with urllib.request.urlopen("https://api.example.com/users") as r:
        payload = json.load(r)
    # Emit (id, name) tuples in the stream's declared column order.
    return [(u["id"], u["name"]) for u in payload]
$$
SETTINGS
    type = 'python',
    mode = 'batch';
```
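
The list comprehension above assumes every element carries both keys. A minimal sketch of keeping the row-shaping step separate from the fetch, so it can be exercised offline, might look like this. `rows_from_payload` is a hypothetical helper, and skipping malformed entries is an assumed policy rather than anything the engine requires.

```python
import json


def rows_from_payload(payload):
    """Convert decoded JSON objects into (id, name) tuples in schema order.

    Entries missing either key are skipped; that policy is an assumption of
    this sketch, and raising instead may suit stricter pipelines.
    """
    rows = []
    for user in payload:
        if "id" not in user or "name" not in user:
            continue
        rows.append((int(user["id"]), str(user["name"])))
    return rows


# Made-up sample data standing in for the HTTP response body.
sample = json.loads(
    '[{"id": 1, "name": "ada"}, {"name": "no-id"}, {"id": "2", "name": "lin"}]'
)
rows = rows_from_payload(sample)
```

Coercing with `int(...)` and `str(...)` keeps the tuples aligned with the `int32` and `string` columns even when the upstream API returns ids as strings.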

### Long-lived setup with init / deinit

Open a client once, stash it on `builtins`, and tear it down at the end of the query. Init parameters arrive as a single string, so JSON is convenient when you have more than one value to pass.

```sql
CREATE EXTERNAL STREAM py_cookie_counter
(
    previous_cleanup_count int32,
    secret_flavor string
)
AS $$
import builtins, json

def open_bakery(config):
    # Init: parse the JSON parameter string and stash the value on builtins.
    builtins._tp_cookie_secret_flavor = json.loads(config)["flavor"]

def close_bakery():
    # Deinit: remove the attribute so later sessions do not see stale state.
    if hasattr(builtins, "_tp_cookie_secret_flavor"):
        del builtins._tp_cookie_secret_flavor

def serve_cookie_report():
    # Read: return a single row built from the state that init prepared.
    return [(0, getattr(builtins, "_tp_cookie_secret_flavor", ""))]
$$
SETTINGS
    type = 'python',
    read_function_name = 'serve_cookie_report',
    init_function_name = 'open_bakery',
    init_function_parameters = '{"flavor":"double-chocolate"}',
    deinit_function_name = 'close_bakery';
```

Remember that init and deinit run **per query**, not once per stream creation: the `builtins` state above is set up and torn down each time a query reads from `py_cookie_counter`. Use stream-specific `builtins` attribute names and delete them in deinit so later Python sessions do not see stale state.
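
The per-query lifecycle can be rehearsed outside Timeplus with a small harness. The sketch below reuses the three functions from the example and adds `run_query_once`, a hypothetical stand-in for one query. Whether the engine runs deinit after a failed read is not stated here, so the `try`/`finally` is an assumption of the harness.

```python
import builtins
import json


def open_bakery(config):
    builtins._tp_cookie_secret_flavor = json.loads(config)["flavor"]


def close_bakery():
    if hasattr(builtins, "_tp_cookie_secret_flavor"):
        del builtins._tp_cookie_secret_flavor


def serve_cookie_report():
    return [(0, getattr(builtins, "_tp_cookie_secret_flavor", ""))]


def run_query_once(init_parameters):
    """Hypothetical stand-in for one query: init, read, then deinit."""
    open_bakery(init_parameters)
    try:
        return serve_cookie_report()
    finally:
        close_bakery()  # assumed: cleanup runs even if the read raises


first = run_query_once('{"flavor":"double-chocolate"}')
second = run_query_once('{"flavor":"oatmeal"}')
leftover = hasattr(builtins, "_tp_cookie_secret_flavor")
```

Each simulated query sees only its own init parameters, and nothing remains on `builtins` afterwards, which is the property the deinit function exists to guarantee.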

### Calling back to `timeplusd`

The injected `__timeplus_local_api_user` and `__timeplus_local_api_password` globals let the read function authenticate to the same server without hard-coded credentials. The example below queries an internal stream over the REST interface and turns the result into a row.

```sql
CREATE EXTERNAL STREAM py_user_count (total int64)
AS $$
import base64, urllib.request

def py_user_count():
    # Build an HTTP Basic credential from the injected globals.
    creds = base64.b64encode(
        f"{__timeplus_local_api_user}:{__timeplus_local_api_password}".encode()
    ).decode()
    req = urllib.request.Request(
        "http://localhost:8123/?query=SELECT+count()+FROM+table(users)",
        headers={"Authorization": f"Basic {creds}"},
    )
    # The response body is a single number; return it as a one-column row.
    with urllib.request.urlopen(req) as r:
        return [(int(r.read().strip()),)]
$$
SETTINGS
    type = 'python',
    mode = 'batch';
```

Treat `__timeplus_local_api_password` as a secret: do not log it, do not echo it back into output rows, and do not pass it into subprocesses.
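
One way to keep the secret out of logs is to build the header in a single place and log only a masked copy. The following is a minimal sketch with hypothetical helper names (`basic_auth_header`, `redact`) and placeholder credentials. Inside a stream body the injected globals would be passed in instead.

```python
import base64


def basic_auth_header(user, password):
    """Build an HTTP Basic Authorization header from a user/password pair."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


def redact(headers):
    """Return a copy that is safe to log: the credential is masked."""
    return {
        key: ("<redacted>" if key.lower() == "authorization" else value)
        for key, value in headers.items()
    }


headers = basic_auth_header("demo_user", "demo_password")  # placeholder values
safe_to_log = redact(headers)
```

Base64 is an encoding, not encryption, so the unredacted header is as sensitive as the password itself.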

### Cancellation and errors

When a query is cancelled (for example by `KILL QUERY` or by closing the client), the running Python code receives a `KeyboardInterrupt`. Streaming generators stop at the next yield point; long-blocking calls inside C extensions may delay the interrupt until they return.
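
A generator that holds a resource can release it in a `finally` block, which runs whether the generator ends because of an interrupt, an error, or normal closure. The sketch below simulates the interrupt with `gen.throw(KeyboardInterrupt)`, which raises at the paused yield point. `EVENTS` is a hypothetical module-level list used only to observe the order of operations.

```python
EVENTS = []


def ticking_source():
    """Generator read function that records setup and cleanup in EVENTS."""
    EVENTS.append("opened")  # stands in for opening a socket or client
    try:
        n = 0
        while True:
            yield (n, "tick")
            n += 1
    finally:
        # Runs on KeyboardInterrupt, on generator close, and on errors.
        EVENTS.append("closed")


gen = ticking_source()
first_row = next(gen)
try:
    gen.throw(KeyboardInterrupt)  # simulate the interrupt arriving at the yield
except KeyboardInterrupt:
    pass  # the interrupt propagates after the finally block has run
```

Because the cleanup lives in `finally` rather than in an `except KeyboardInterrupt` handler, the interrupt is not swallowed and the query can still stop promptly.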

If the read function raises, the query fails and the Python traceback is included in the error response. Wrap recoverable errors inside the function and decide explicitly whether to re-raise or continue.
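
The distinction between recoverable and fatal errors can be made explicit in code. This sketch parses hypothetical `host,value` lines: a single malformed line is skipped and counted, while an input in which nothing parses raises so that the query would fail. The input format and the `parse_rows` name are assumptions made for illustration.

```python
def parse_rows(lines):
    """Parse 'host,value' lines; skip bad lines, fail if none parse."""
    rows, skipped = [], 0
    for line in lines:
        try:
            host, value = line.split(",")
            rows.append((host.strip(), float(value)))
        except ValueError:
            # Recoverable: one bad line should not fail the whole query.
            skipped += 1
    if lines and not rows:
        # Unrecoverable: nothing parsed, so raise and let the query fail.
        raise RuntimeError(f"all {skipped} input lines were malformed")
    return rows, skipped


parsed, skipped_count = parse_rows(["a,1.5", "bad", "b,2"])
```

Returning the skip count alongside the rows leaves the caller free to log it or to treat a high count as a reason to raise.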

docs/shared/python-external-stream-write.md

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
## Write Data to a Python External Stream

The write function is invoked once per chunk, not once per row. Its arguments are **column-oriented**: one Python list per output column, in declared order, all of equal length. Iterate with `zip` to recover row tuples.

### Sink basics

```sql
CREATE EXTERNAL STREAM py_metric_sink (host string, value float32)
AS $$
def py_metric_sink(host, value):
    # host and value are parallel lists; zip pairs them back into rows.
    for h, v in zip(host, value):
        print(f"{h}={v}")
$$
SETTINGS type = 'python';
```

Insert a few rows:

```sql
INSERT INTO py_metric_sink (host, value) VALUES ('a', 1.0), ('b', 2.0);
```

Behind the scenes Timeplus calls `py_metric_sink(['a', 'b'], [1.0, 2.0])`, a single call carrying both rows. A larger INSERT or a downstream query that delivers many chunks results in one call per chunk.
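
The chunk-per-call behavior can be imitated locally to test a write function before deploying it. In this sketch `deliver_in_chunks` is a hypothetical driver and `chunk_size` is a demo knob; real chunk boundaries are decided by the engine. The sink returns its formatted lines instead of printing so the result can be inspected.

```python
def py_metric_sink(host, value):
    """Same shape as the sink above; returns lines instead of printing."""
    return [f"{h}={v}" for h, v in zip(host, value)]


def deliver_in_chunks(rows, write_function, chunk_size):
    """Transpose rows into per-column lists and call the sink once per chunk.

    `chunk_size` is a hypothetical knob for this demo; real chunk boundaries
    are decided by the engine.
    """
    results = []
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        # zip(*chunk) turns row tuples into one sequence per column.
        columns = [list(col) for col in zip(*chunk)]
        results.append(write_function(*columns))
    return results


calls = deliver_in_chunks(
    [("a", 1.0), ("b", 2.0), ("c", 3.0)], py_metric_sink, chunk_size=2
)
```

Three rows with a chunk size of two produce two calls, the first carrying two rows and the second carrying one.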

If `write_function_name` is omitted, Timeplus uses `read_function_name` (which itself defaults to the stream name), so the Python function above only needs to be named once.
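
The fallback order can be written down as a short function. This is an illustration of the documented rule only, not the engine's implementation; `resolve_write_function` and the `settings` dictionary are hypothetical.

```python
def resolve_write_function(stream_name, settings):
    """Mirror the documented fallback: write name, then read name, then stream name."""
    return (
        settings.get("write_function_name")
        or settings.get("read_function_name")
        or stream_name
    )


default_name = resolve_write_function("py_metric_sink", {})
from_read = resolve_write_function("s", {"read_function_name": "pull"})
explicit = resolve_write_function(
    "s", {"read_function_name": "pull", "write_function_name": "push"}
)
```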

### Materialized view → external stream

Routing a continuous query into a sink is the most common production pattern. Define the sink once, then point a materialized view at it:

```sql
CREATE EXTERNAL STREAM py_alert_sink (host string, value float32)
AS $$
def py_alert_sink(host, value):
    for h, v in zip(host, value):
        notify(h, v)  # your notifier
$$
SETTINGS type = 'python';

CREATE MATERIALIZED VIEW high_value_alerts INTO py_alert_sink AS
SELECT host, value FROM metrics WHERE value > 100;
```

The materialized view feeds chunks into the sink as they are produced; each chunk becomes one call to `py_alert_sink`.

### Custom protocol example: webhook POST

Load the destination URL in init, reuse that configuration for every chunk, and clear it in deinit. Init parameters carry the URL so the Python body is reusable across environments. To pool an actual HTTP connection, swap `urllib` for a session-aware client (for example `requests.Session()`) and stash the session itself on `builtins`.

```sql
CREATE EXTERNAL STREAM py_webhook (event_id string, body string)
AS $$
import builtins, json, urllib.request

def open_client(config):
    # Init: remember the destination URL for the lifetime of the query.
    builtins._tp_webhook = json.loads(config)["url"]

def close_client():
    # Deinit: drop the stream-specific attribute.
    if hasattr(builtins, "_tp_webhook"):
        del builtins._tp_webhook

def post_event(event_id, body):
    # Write: one POST per row in the chunk.
    for eid, b in zip(event_id, body):
        req = urllib.request.Request(
            builtins._tp_webhook,
            data=json.dumps({"id": eid, "body": b}).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        # Read and close the response so the socket is released.
        with urllib.request.urlopen(req) as resp:
            resp.read()
$$
SETTINGS
    type = 'python',
    init_function_name = 'open_client',
    init_function_parameters = '{"url":"https://hooks.example.com/notify"}',
    deinit_function_name = 'close_client',
    write_function_name = 'post_event';
```

Replace `urllib` with any HTTP, S3, queue, or proprietary client your environment ships with. Manage Python dependencies through the [Python UDF](/py-udf) library configuration, since the same runtime backs both features.
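
The connection-reuse variant mentioned earlier in this section can be sketched with the standard library alone. Here `http.client.HTTPSConnection` stands in for `requests.Session()` so the example needs no third-party package. The attribute names `_tp_webhook_conn` and `_tp_webhook_path` are hypothetical, and the 10-second timeout is an arbitrary choice.

```python
import builtins
import http.client
import json
from urllib.parse import urlsplit


def open_client(config):
    """Init: parse the URL and keep one HTTPS connection object for the query."""
    url = urlsplit(json.loads(config)["url"])
    builtins._tp_webhook_path = url.path or "/"
    # No network traffic yet: http.client connects lazily on the first request.
    builtins._tp_webhook_conn = http.client.HTTPSConnection(url.netloc, timeout=10)


def close_client():
    """Deinit: close the connection and remove the stream-specific attributes."""
    conn = getattr(builtins, "_tp_webhook_conn", None)
    if conn is not None:
        conn.close()
        del builtins._tp_webhook_conn
    if hasattr(builtins, "_tp_webhook_path"):
        del builtins._tp_webhook_path


def post_event(event_id, body):
    """Write: send one POST per row over the shared connection."""
    conn = builtins._tp_webhook_conn
    for eid, b in zip(event_id, body):
        payload = json.dumps({"id": eid, "body": b}).encode()
        conn.request(
            "POST",
            builtins._tp_webhook_path,
            body=payload,
            headers={"Content-Type": "application/json"},
        )
        conn.getresponse().read()  # drain the response so the socket can be reused
```

Draining each response before the next request matters: `http.client` refuses to send a new request on a connection whose previous response has not been read.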

### Failure behavior

If the write function raises, the INSERT fails and the Python traceback is included in the error response. Side effects already performed by your Python code (HTTP requests sent, files written, queue messages published) are **not** rolled back by Timeplus. Design idempotent writes, or batch your side effect inside a single transactional call your downstream system controls.
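
One common route to idempotent writes is a deterministic key per row, so that a retried chunk maps to the same downstream record. The sketch below derives such a key with SHA-256. It assumes the receiving system deduplicates on the key, which Timeplus does not provide, and the helper names are hypothetical.

```python
import hashlib
import json


def idempotency_key(event_id, body):
    """Derive a stable key so a retried row maps to the same downstream record."""
    canonical = json.dumps({"id": event_id, "body": body}, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


def build_requests(event_id, body):
    """Pair each row's payload with its key; sending is left to the caller."""
    return [
        {"key": idempotency_key(eid, b), "payload": {"id": eid, "body": b}}
        for eid, b in zip(event_id, body)
    ]


first_attempt = build_requests(["e1", "e2"], ["hello", "world"])
retry = build_requests(["e1", "e2"], ["hello", "world"])
```

Because the key depends only on the row's content, a retry after a partial failure produces the same keys and the downstream system can discard the duplicates.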
