Skip to content

Commit 922b7db

Browse files
Merge pull request #15 from SolidLabResearch/codex/hybrid-baseline-static-data
Add JanusQL baseline bootstrap support
2 parents 9fb621e + be9a26d commit 922b7db

14 files changed

Lines changed: 1531 additions & 238 deletions

README.md

Lines changed: 112 additions & 41 deletions
Original file line numberDiff line numberDiff line change
@@ -1,79 +1,150 @@
11
# Janus
22

3-
Janus is a hybrid engine for unified Live and Historical RDF Stream Processing, implemented in Rust.
3+
Janus is a Rust engine for unified historical and live RDF stream processing.
4+
5+
It combines:
6+
7+
- historical window evaluation over segmented RDF storage
8+
- live window evaluation over incoming streams
9+
- a single Janus-QL query model for hybrid queries
10+
- an HTTP/WebSocket API for query lifecycle management and result delivery
11+
12+
The name comes from the Roman deity Janus, associated with transitions and with looking both backward and forward. That dual perspective matches Janus's goal: querying past and live RDF data together.
13+
14+
## What Janus Supports
15+
16+
- Historical windows with `START` / `END`
17+
- Sliding live windows with `RANGE` / `STEP`
18+
- Hybrid queries that mix historical and live windows
19+
- Extension functions for anomaly-style predicates such as thresholds, relative change, z-score, outlier checks, and trend divergence
20+
- Optional baseline bootstrapping for hybrid anomaly queries with `USING BASELINE <window> LAST|AGGREGATE`
21+
- HTTP endpoints for registering, starting, stopping, listing, and deleting queries
22+
- WebSocket result streaming for running queries
23+
24+
## Query Model
25+
26+
Janus uses Janus-QL, a hybrid query language for querying historical and live RDF data in one query.
27+
28+
Example:
29+
30+
```sparql
31+
PREFIX ex: <http://example.org/>
32+
PREFIX janus: <https://janus.rs/fn#>
33+
PREFIX baseline: <https://janus.rs/baseline#>
34+
35+
REGISTER RStream ex:out AS
36+
SELECT ?sensor ?reading
37+
FROM NAMED WINDOW ex:hist ON LOG ex:store [START 1700000000000 END 1700003600000]
38+
FROM NAMED WINDOW ex:live ON STREAM ex:stream1 [RANGE 5000 STEP 1000]
39+
USING BASELINE ex:hist AGGREGATE
40+
WHERE {
41+
WINDOW ex:hist {
42+
?sensor ex:mean ?mean .
43+
?sensor ex:sigma ?sigma .
44+
}
45+
WINDOW ex:live {
46+
?sensor ex:hasReading ?reading .
47+
}
48+
?sensor baseline:mean ?mean .
49+
?sensor baseline:sigma ?sigma .
50+
FILTER(janus:is_outlier(?reading, ?mean, ?sigma, 3))
51+
}
52+
```
53+
54+
`USING BASELINE` is optional. If present, Janus bootstraps baseline values from the named historical window before or during live execution:
455

5-
The name "Janus" is inspired by the Roman deity Janus who is the guardian of doorways and transitions, and looks towards both the past and the future simultaneously. This dual perspective reflects Janus's capability to process both Historical and Live RDF streams in a unified manner utilizing a single query language and engine.
56+
- `LAST`: use the final historical window snapshot as baseline
57+
- `AGGREGATE`: merge the historical window outputs into one compact baseline
658

759
## Performance
860

9-
Janus achieves high-throughput RDF stream processing with dictionary encoding and streaming segmented storage:
61+
Janus uses dictionary encoding and segmented storage for high-throughput ingestion and historical reads.
1062

11-
- Write Throughput: 2.6-3.14 Million quads/sec
12-
- Read Throughput: 2.7-2.77 Million quads/sec
13-
- Point Query Latency: Sub-millisecond (0.235 ms at 1M quads)
14-
- Space Efficiency: 40% reduction through dictionary encoding (24 bytes vs 40 bytes per event)
63+
- Write throughput: 2.6-3.14 million quads/sec
64+
- Read throughput: 2.7-2.77 million quads/sec
65+
- Point query latency: 0.235 ms at 1M quads
66+
- Space efficiency: about 40% smaller encoded events
1567

16-
For detailed benchmark results, see [BENCHMARK_RESULTS.md](./BENCHMARK_RESULTS.md).
68+
Detailed benchmark data is in [BENCHMARK_RESULTS.md](./BENCHMARK_RESULTS.md).
1769

18-
## Development
70+
## Quick Start
1971

2072
### Prerequisites
2173

22-
- Rust (stable toolchain)
74+
- Rust stable
2375
- Cargo
2476

25-
### Building
77+
### Build
2678

2779
```bash
28-
# Debug build
2980
make build
30-
31-
# Release build (optimized)
3281
make release
3382
```
3483

35-
### Testing
84+
### Run the HTTP API
3685

3786
```bash
38-
# Run all tests
39-
make test
40-
41-
# Run tests with verbose output
42-
make test-verbose
87+
cargo run --bin http_server -- --host 127.0.0.1 --port 8080 --storage-dir ./data/storage
4388
```
4489

45-
### Code Quality
90+
Then check the server:
4691

47-
Before pushing to the repository, run the CI/CD checks locally:
92+
```bash
93+
curl http://127.0.0.1:8080/health
94+
```
95+
96+
### Try the HTTP client example
4897

4998
```bash
50-
# Run all CI/CD checks (formatting, linting, tests, build)
51-
make ci-check
99+
cargo run --example http_client_example
100+
```
52101

53-
# Or use the script directly
54-
./scripts/ci-check.sh```
102+
This example demonstrates:
55103

56-
This will run:
57-
- **rustfmt** - Code formatting check
58-
- **clippy** - Lint checks with warnings as errors
59-
- **tests** - Full test suite
60-
- **build** - Compilation check
104+
- query registration
105+
- query start and stop
106+
- query inspection
107+
- replay control
108+
- WebSocket result consumption
61109

62-
Individual checks can also be run:
110+
## Development
111+
112+
### Common Commands
63113

64114
```bash
65-
make fmt # Format code
66-
make fmt-check # Check formatting
67-
make lint # Run Clippy
68-
make check # Run formatting and linting checks
115+
make build # debug build
116+
make release # optimized build
117+
make test # full test suite
118+
make test-verbose # verbose tests
119+
make fmt # format code
120+
make fmt-check # check formatting
121+
make lint # clippy with warnings as errors
122+
make check # formatting + linting
123+
make ci-check # local CI script
69124
```
70125

71-
## Licence
126+
### Examples
72127

73-
This code is copyrighted by Ghent University - imec and released under the MIT Licence
128+
The repository includes runnable examples under [`examples/`](./examples), including:
74129

75-
## Contact
130+
- [`examples/http_client_example.rs`](./examples/http_client_example.rs)
131+
- [`examples/comparator_demo.rs`](./examples/comparator_demo.rs)
132+
- [`examples/demo_dashboard.html`](./examples/demo_dashboard.html)
133+
134+
## Project Layout
76135

77-
For any questions, please contact [Kush](mailto:mailkushbisen@gmail.com) or create an issue in the repository.
136+
- [`src/api`](./src/api): query lifecycle and orchestration
137+
- [`src/parsing`](./src/parsing): Janus-QL parsing
138+
- [`src/stream`](./src/stream): live stream processing
139+
- [`src/execution`](./src/execution): historical execution
140+
- [`src/storage`](./src/storage): segmented RDF storage
141+
- [`src/http`](./src/http): REST and WebSocket API
142+
- [`tests`](./tests): integration and parser coverage
143+
144+
## License
145+
146+
This project is released under the MIT License.
147+
148+
## Contact
78149

79-
---
150+
For questions, open an issue or contact [Kush](mailto:mailkushbisen@gmail.com).

docs/ANOMALY_DETECTION.md

Lines changed: 84 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,84 @@
1+
# Anomaly Detection
2+
3+
Janus already supports anomaly-oriented extension functions, but they are stateless functions evaluated within one query execution context.
4+
5+
That distinction matters.
6+
7+
## What Extension Functions Are Good At
8+
9+
Current extension functions are sufficient for:
10+
11+
- fixed thresholds
12+
- relative change checks
13+
- z-score style checks when mean and sigma are already present
14+
- simple outlier or divergence predicates over current bindings
15+
16+
This works well when the query already has everything it needs in one evaluation context.
17+
18+
## Where Baselines Help
19+
20+
Baselines help when live anomaly scoring depends on historical context such as:
21+
22+
- deviation from normal behavior
23+
- per-sensor baselines
24+
- volatility comparison
25+
- recent historical trend
26+
27+
In those cases, Janus can bootstrap compact historical values into live static data and let the live query compare current readings against them.
28+
29+
## What Janus Does Not Do
30+
31+
Janus does not currently maintain a full continuously updated hybrid historical/live relation.
32+
33+
So if you need:
34+
35+
- long-running stateful models
36+
- full seasonal context
37+
- large retained historical buffers inside the engine
38+
39+
you will need either:
40+
41+
- external model state
42+
- future dedicated baseline refresh logic
43+
- more specialized stateful operators
44+
45+
## Recommended Pattern
46+
47+
For a first anomaly-detection pipeline in Janus:
48+
49+
1. Use a historical query that emits one compact row per anchor.
50+
2. Materialize baseline values such as `mean` and `sigma`.
51+
3. Join those values in the live query using `baseline:*` predicates.
52+
4. Apply extension functions on the live side.
53+
54+
Example:
55+
56+
```sparql
57+
PREFIX ex: <http://example.org/>
58+
PREFIX janus: <https://janus.rs/fn#>
59+
PREFIX baseline: <https://janus.rs/baseline#>
60+
61+
REGISTER RStream ex:out AS
62+
SELECT ?sensor ?reading
63+
FROM NAMED WINDOW ex:hist ON LOG ex:store [START 1700000000000 END 1700003600000]
64+
FROM NAMED WINDOW ex:live ON STREAM ex:stream1 [RANGE 5000 STEP 1000]
65+
USING BASELINE ex:hist LAST
66+
WHERE {
67+
WINDOW ex:hist {
68+
?sensor ex:mean ?mean .
69+
?sensor ex:sigma ?sigma .
70+
}
71+
WINDOW ex:live {
72+
?sensor ex:hasReading ?reading .
73+
}
74+
?sensor baseline:mean ?mean .
75+
?sensor baseline:sigma ?sigma .
76+
FILTER(janus:is_outlier(?reading, ?mean, ?sigma, 3))
77+
}
78+
```
79+
80+
## Choosing LAST vs AGGREGATE
81+
82+
- Use `LAST` when you care about the most recent historical regime before live execution.
83+
- Use `AGGREGATE` when you want a more stable summary across multiple historical sliding windows.
84+
- Prefer fixed historical windows unless you have a clear reason to derive a baseline from many historical subwindows.

0 commit comments

Comments
 (0)