Commit cc88662

feat: Add time series end-to-end example and derived analytics
- Implemented `17_timeseries_end_to_end.py` to demonstrate a SQL-first time series workflow.
- Created a new TimeSeries type with multiple tags and numeric fields.
- Added functionality to generate synthetic telemetry data for sensors and bulk insert samples.
- Included raw window queries, hourly aggregations, and derived alert-style views.
- Updated README to document the new example and its capabilities.
- Introduced `run_examples_minimal.py` script to facilitate running multiple examples with minimal inputs.
- Enhanced `13_stackoverflow_hybrid_queries.py` with derived time-series analytics in phase 5.
- Updated `things-to-do.md` to reflect progress on time series and future tasks.
1 parent 582b5e1 commit cc88662

9 files changed

Lines changed: 904 additions & 24 deletions

.github/workflows/test-python-examples.yml

Lines changed: 9 additions & 3 deletions
@@ -25,7 +25,7 @@ on:
 description: "Glob pattern(s) for examples to run (space-separated, relative to bindings/python/examples)."
 required: false
 type: string
-default: "0[1-9]_*.py 1[0-6]_*.py"
+default: "0[1-9]_*.py 1[0-7]_*.py"
 build-version:
 description: "Override package version (PEP 440) for build.sh"
 required: false
@@ -37,10 +37,10 @@ on:
 examples:
 description: "Glob pattern(s) for examples to run (space-separated, relative to bindings/python/examples)."
 required: false
-default: "0[1-9]_*.py 1[0-6]_*.py"
+default: "0[1-9]_*.py 1[0-7]_*.py"
 
 env:
-EXAMPLES: ${{ inputs.examples || '0[1-9]_*.py 1[0-6]_*.py' }}
+EXAMPLES: ${{ inputs.examples || '0[1-9]_*.py 1[0-7]_*.py' }}
 
 permissions:
 contents: read
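Review note: the only functional change in these two hunks is widening the second glob from `1[0-6]` to `1[0-7]` so that example 17 is selected by default. A minimal sketch of that effect follows, using Python's `fnmatch` as a stand-in for the workflow's shell globbing (an assumption; the workflow itself expands these patterns in bash, and the `selected` helper is hypothetical).

```python
from fnmatch import fnmatch

# Default pattern lists before and after this commit (copied from the diff).
OLD_PATTERNS = "0[1-9]_*.py 1[0-6]_*.py".split()
NEW_PATTERNS = "0[1-9]_*.py 1[0-7]_*.py".split()


def selected(filename, patterns):
    """Return True if the file name matches at least one glob pattern."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


new_example = "17_timeseries_end_to_end.py"
print(selected(new_example, OLD_PATTERNS))  # False
print(selected(new_example, NEW_PATTERNS))  # True
```

Helper scripts such as `run_examples_minimal.py` do not start with a two-digit prefix, so neither pattern list picks them up.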
@@ -396,6 +396,12 @@ jobs:
 timeout_duration=1200
 example_jvm_args=""
 ;;
+"17_timeseries_end_to_end.py")
+example_args="--hours 2 --interval-minutes 10"
+example_name="$example (time-series end-to-end, minimal)"
+timeout_duration=900
+example_jvm_args=""
+;;
 *)
 example_args=""
 example_name="$example"

bindings/python/docs/examples/13_stackoverflow_hybrid_queries.md

Lines changed: 30 additions & 3 deletions
@@ -3,16 +3,18 @@
 [View source code]({{ config.repo_url }}/blob/{{ config.extra.version_tag }}/bindings/python/examples/13_stackoverflow_hybrid_queries.py){ .md-button }
 
 This example builds a standalone Stack Overflow pipeline combining document tables,
-property graph data, embeddings, vector indexes, and hybrid queries.
+property graph data, embeddings, vector indexes, hybrid queries, and a derived
+time-series activity layer.
 
 ## Overview
 
-Example 13 runs four phases:
+Example 13 runs five phases:
 
 1. XML to document tables
 2. XML to graph
 3. Embeddings plus vector indexes on Question, Answer, and Comment
 4. Hybrid queries combining SQL, OpenCypher, and vector search
+5. Derived time-series analytics from existing timestamped events
 
 ## Current Repository Guidance
 
@@ -24,6 +26,8 @@ Example 13 runs four phases:
 - `GraphBatch` is the repository's recommended bulk graph ingest path from Python
 - Graph edge creation uses RID-based directed endpoints
 - Traversal semantics are directional
+- Phase 5 does not replace the document or graph models; it builds a compact
+`ActivitySeries` projection from existing `CreationDate` / `Date` fields
 
 ## Run
 
@@ -36,7 +40,8 @@ python 13_stackoverflow_hybrid_queries.py \
 --encode-batch-size 256 \
 --model all-MiniLM-L6-v2 \
 --heap-size 4g \
---top-k 10
+--top-k 10 \
+--timeseries-top-tags 5
 ```
 
 ## Key Options
@@ -48,3 +53,25 @@ python 13_stackoverflow_hybrid_queries.py \
 - `--heap-size`: JVM heap size
 - `--top-k`: hybrid/vector result count
 - `--candidate-limit`: candidate pool size for hybrid ranking
+- `--timeseries-top-tags`: number of top tags projected into the derived
+time-series activity layer
+
+## Time-Series Layer
+
+The script now adds a compact derived TimeSeries type after the main hybrid phases:
+
+- `ActivitySeries`
+- timestamp: daily bucket timestamp
+- tags: `event_type`, `scope`, `tag`
+- fields: `event_count`, `total_score`, `avg_score`
+
+The source data comes from the timestamps already present on:
+
+- `Question.CreationDate`
+- `Answer.CreationDate`
+- `Comment.CreationDate`
+- `Badge.Date`
+
+That means the example stays faithful to the Stack Overflow domain model while also
+demonstrating how ArcadeDB can project graph/document events into a time-series view
+for trend analysis.
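Review note: the `ActivitySeries` shape described in this hunk (daily bucket timestamp, three tags, three numeric fields) can be pictured with a small plain-Python projection. This is only a sketch of the bucketing arithmetic, not the code in `13_stackoverflow_hybrid_queries.py`. The sample events are made up, the `daily_bucket` and `project` helpers are hypothetical, and the meaning of `scope` is not spelled out in the doc, so the sketch simply assumes a constant `"tag"` value.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Made-up events for illustration: (event_type, tag, ISO timestamp, score).
EVENTS = [
    ("question", "python", "2024-01-01T09:15:00+00:00", 4),
    ("question", "python", "2024-01-01T17:40:00+00:00", 2),
    ("answer", "python", "2024-01-01T18:05:00+00:00", 7),
    ("question", "python", "2024-01-02T08:00:00+00:00", 1),
]


def daily_bucket(iso_timestamp):
    """Floor an ISO-8601 timestamp to midnight UTC of the same day."""
    ts = datetime.fromisoformat(iso_timestamp).astimezone(timezone.utc)
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def project(events, scope="tag"):
    """Group events per (day, event_type, scope, tag) and aggregate scores."""
    totals = defaultdict(lambda: [0, 0])  # key -> [event_count, total_score]
    for event_type, tag, iso_timestamp, score in events:
        key = (daily_bucket(iso_timestamp), event_type, scope, tag)
        totals[key][0] += 1
        totals[key][1] += score
    rows = []
    for (bucket, event_type, scope_value, tag), (count, total) in sorted(totals.items()):
        rows.append({
            "timestamp": bucket,
            "event_type": event_type,
            "scope": scope_value,
            "tag": tag,
            "event_count": count,
            "total_score": total,
            "avg_score": total / count,
        })
    return rows


for row in project(EVENTS):
    print(row["timestamp"].date(), row["event_type"], row["event_count"], row["avg_score"])
```

Each output row corresponds to one sample that would be written into the derived type, which is why the projection stays compact even when the source tables are large.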
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+# 17 - Time Series End-to-End
+
+[View source code]({{ config.repo_url }}/blob/{{ config.extra.version_tag }}/bindings/python/examples/17_timeseries_end_to_end.py){ .md-button }
+
+This example demonstrates the current Python-bindings posture for time series:
+use plain ArcadeDB SQL from Python rather than a dedicated Python object API.
+
+It covers:
+
+- creating a `TIMESERIES TYPE` with multiple tags and numeric fields
+- generating deterministic telemetry for six building sensors
+- inserting hundreds of samples transactionally
+- running raw window queries with multiple tag filters
+- grouping into hourly buckets with `ts.timeBucket()`
+- aggregating at sensor, building, and region levels
+- deriving alert-style views from SQL aggregates
+- reading back the latest sample per sensor
+
+## Run
+
+From `bindings/python/examples`:
+
+```bash
+python3 17_timeseries_end_to_end.py
+```
+
+With a longer synthetic run:
+
+```bash
+python3 17_timeseries_end_to_end.py --hours 12 --interval-minutes 5
+```
+
+## Notes
+
+- The example is intentionally SQL-first.
+- If the packaged ArcadeDB runtime does not include TimeSeries SQL support,
+the script prints a short explanation and exits.
+- The database is created under `./my_test_databases/timeseries_demo_db` and is kept for inspection.
+- The generated data models smart-building telemetry with tags for region, building,
+zone, and sensor id plus fields for temperature, humidity, power, CO2, and occupancy.
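Review note: to make the relationship between `--hours`, `--interval-minutes`, and the "hundreds of samples" claim concrete, here is a minimal sketch of a deterministic generator with the tag and field names listed in the notes. It is not the code from `17_timeseries_end_to_end.py`. The sensor identities, the value formulas, the `generate_samples` name, and the half-open time window are all assumptions made for illustration.

```python
import math
from datetime import datetime, timedelta, timezone

# Hypothetical sensor identities: (region, building, zone, sensor_id).
SENSORS = [
    ("eu", "b1", "lobby", "s1"),
    ("eu", "b1", "office", "s2"),
    ("eu", "b2", "lab", "s3"),
    ("us", "b3", "lobby", "s4"),
    ("us", "b3", "office", "s5"),
    ("us", "b4", "lab", "s6"),
]


def generate_samples(hours, interval_minutes, start):
    """Yield one sample per sensor per time step, with no randomness involved."""
    steps = hours * 60 // interval_minutes  # assumed half-open window [start, end)
    for step in range(steps):
        ts = start + timedelta(minutes=step * interval_minutes)
        for index, (region, building, zone, sensor_id) in enumerate(SENSORS):
            # The phase depends only on step and sensor index, so reruns are identical.
            phase = 2 * math.pi * step / steps + index
            occupancy = (step + index) % 5
            yield {
                "ts": ts,
                "region": region,
                "building": building,
                "zone": zone,
                "sensor_id": sensor_id,
                "temperature": round(21.0 + 2.0 * math.sin(phase), 2),
                "humidity": round(45.0 + 5.0 * math.cos(phase), 2),
                "power": round(1.5 + 0.1 * index + 0.5 * math.cos(phase), 3),
                "co2": 400 + 25 * occupancy,
                "occupancy": occupancy,
            }


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(len(list(generate_samples(2, 10, start))))   # 72
print(len(list(generate_samples(12, 5, start))))   # 864
```

Under these assumptions the CI arguments `--hours 2 --interval-minutes 10` give 12 time steps across six sensors, while the longer run shown above gives 144 time steps.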
+
+## Why SQL-First?
+
+The bindings already expose a stable generic interface through `db.command()` and
+`db.query()`. For time series, that keeps Python maintenance low while avoiding a
+premature public object API around upstream-owned semantics.
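Review note: the SQL-first argument is that Python code only needs to assemble statement text and hand it to the generic entry points. A minimal sketch of that division of labor follows. `RecordingDatabase` is a stand-in written for this note, not the bindings' class, and its `(language, statement)` calling convention is an assumption. The type name `SensorReading`, the `ts` column, and the timestamp literal format are likewise hypothetical.

```python
class RecordingDatabase:
    """Stand-in for a database handle that only records what it is asked to run."""

    def __init__(self):
        self.calls = []

    def command(self, language, statement):
        self.calls.append(("command", language, statement))

    def query(self, language, statement):
        self.calls.append(("query", language, statement))
        return []  # a real handle would return a result set


def window_query(type_name, start, end, **tag_filters):
    """Build a raw window query over [start, end) with optional tag filters."""
    clauses = [f"ts >= '{start}'", f"ts < '{end}'"]
    # Sort the filters so the generated statement is stable across runs.
    clauses += [f"{name} = '{value}'" for name, value in sorted(tag_filters.items())]
    return f"SELECT FROM {type_name} WHERE " + " AND ".join(clauses)


db = RecordingDatabase()
sql = window_query(
    "SensorReading",
    "2024-01-01T00:00:00Z",
    "2024-01-01T02:00:00Z",
    region="eu",
    building="b1",
)
db.query("sql", sql)
print(sql)
```

String interpolation keeps the sketch short; code that accepts untrusted input would want parameter binding instead, if the bindings offer it.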

bindings/python/docs/examples/index.md

Lines changed: 4 additions & 0 deletions
@@ -61,6 +61,9 @@ Four-way table-ingest comparison. Repository guidance from these experiments is
 **[16 - Import Database vs Transactional Graph Ingest](16_import_database_vs_transactional_graph_ingest.md)**
 Four-way graph-ingest comparison. Repository guidance from these experiments is to prefer `GraphBatch` for bulk graph ingest.
 
+**[17 - Time Series End-to-End](17_timeseries_end_to_end.md)**
+SQL-first time-series workflow covering type creation, tagged inserts, range queries, and hourly bucket aggregation.
+
 ## Quick Start
 
 **⚠️ Important: Always run examples from the `examples/` directory.**
@@ -83,6 +86,7 @@ python 01_simple_document_store.py
 9. **Vector Benchmarks** (11/12) - Index build and search benchmarking across vector backends
 10. **Hybrid Queries** (13) - Combined SQL, graph, and vector workflow
 11. **Lifecycle And Ingest Benchmarks** (14/15/16) - Embedded lifecycle timing and ingest comparisons
+12. **Time Series SQL Workflow** (17) - Tagged samples, range queries, and bucket aggregation from Python
 
 ---
 