
Commit 7249746

Merge pull request #21 from SolidLabResearch/codex/benchmark-combined
Fix sparse index max timestamps for point lookups
2 parents 9292577 + 1c12a34 commit 7249746

5 files changed: 1787 additions & 0 deletions

Lines changed: 25 additions & 0 deletions

# Hybrid Scaling Combined Benchmark Design

Date: 2026-06-17

## Status

Accepted

## Context

Janus needs a combined benchmark for hybrid historical and live queries. The benchmark scales the size of the historical store and, at each size, runs both the Janus unified execution and the decomposed baseline execution against the same deterministic live stream trace, replayed in realtime.

## Decision

1. Create a new binary, `hybrid_scaling_combined`, in `src/bin/hybrid_scaling_combined.rs`.
2. Keep the queried historical window static at 1,000 events, i.e. exactly 1,000 quads.
3. Scale the total store size H by writing H - 1,000 older events before the queried window. This keeps the query bounds and query strings identical for every H, which eliminates query compiler and query plan variance and isolates whether index size affects bounded lookup performance.
4. Implement a realtime replay loop that paces live events by their inter-event intervals, so the stream is delivered at its recorded rate.
5. Record both `target_historical_quads` and `actual_historical_quads`.
6. Generate JSON, CSV, and Markdown reports.
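
A minimal sketch of decision 3, assuming historical events are evenly spaced and the queried window is pinned to a fixed start timestamp. `WINDOW_START_MS`, `EVENT_INTERVAL_MS`, `historical_timestamps`, `query_window_bounds`, and `count_in_window` are hypothetical names, not the benchmark's actual code. It shows that growing H only adds events before the window, so the query bounds, and therefore the query string, never change.

```rust
/// Events in the fixed queried historical window (from the design record).
const QUERIED_WINDOW_EVENTS: u64 = 1_000;

/// Hypothetical fixed start of the queried window, in epoch milliseconds.
const WINDOW_START_MS: u64 = 1_700_000_000_000;

/// Hypothetical spacing between consecutive historical events.
const EVENT_INTERVAL_MS: u64 = 1_000;

/// Timestamps for a store of `total_events` (H) events: H - 1,000 older
/// events followed by the 1,000 events of the queried window.
fn historical_timestamps(total_events: u64) -> Vec<u64> {
    assert!(
        total_events >= QUERIED_WINDOW_EVENTS,
        "H must be at least the queried window size"
    );
    let older = total_events - QUERIED_WINDOW_EVENTS;
    // The oldest event sits `older` intervals before the window start, so the
    // event at index `older` lands exactly on WINDOW_START_MS.
    let first_ms = WINDOW_START_MS - older * EVENT_INTERVAL_MS;
    (0..total_events)
        .map(|i| first_ms + i * EVENT_INTERVAL_MS)
        .collect()
}

/// Inclusive query bounds. They depend only on the window constants, never on H.
fn query_window_bounds() -> (u64, u64) {
    let end_ms = WINDOW_START_MS + (QUERIED_WINDOW_EVENTS - 1) * EVENT_INTERVAL_MS;
    (WINDOW_START_MS, end_ms)
}

/// Counts events whose timestamp falls inside the inclusive bounds.
fn count_in_window(timestamps: &[u64], (start, end): (u64, u64)) -> usize {
    timestamps.iter().filter(|&&t| t >= start && t <= end).count()
}
```

With these constants, `query_window_bounds()` is identical for H = 1,000 and H = 100,000, while `count_in_window` stays at 1,000 for both.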

## Alternatives Considered

- **Dynamic Query Bounds**: Growing the historical database forward and moving the query window's start/end timestamps with H. Rejected because the query string would change for every H, introducing variable query registration overhead and caching behavior.
- **Accelerated Live Processing**: Running the live stream as fast as possible without sleeps. Rejected because the timing metrics (such as the delay of the first hybrid result relative to start, and window overheads) are only meaningful under realtime pacing, which matches how realistic RDF stream processing engines receive data.
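
The realtime pacing from decision 4 could look roughly like the sketch below. It assumes the live trace stores each event's gap since the previous event (`interval_ms`); `LiveEvent` and `replay_realtime` are hypothetical. Each event's due time is computed from the replay start rather than by sleeping for each interval in turn, so oversleeping on one event does not push every later event back.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// A hypothetical live event: its gap since the previous event and a payload.
#[derive(Debug, Clone)]
struct LiveEvent {
    interval_ms: u64,
    payload: String,
}

/// Replays `trace` in realtime, calling `emit` once per event when it falls due,
/// and returns each event's actual emission offset from the replay start.
fn replay_realtime<F>(trace: &[LiveEvent], mut emit: F) -> Vec<Duration>
where
    F: FnMut(&LiveEvent),
{
    let start = Instant::now();
    let mut due = Duration::ZERO;
    let mut emitted_at = Vec::with_capacity(trace.len());
    for event in trace {
        // Due times accumulate from the replay start, so oversleeping on one
        // event is absorbed instead of shifting every later event.
        due += Duration::from_millis(event.interval_ms);
        let elapsed = start.elapsed();
        if due > elapsed {
            // thread::sleep never returns early, so emission is never ahead of schedule.
            thread::sleep(due - elapsed);
        }
        emit(event);
        emitted_at.push(start.elapsed());
    }
    emitted_at
}
```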

Lines changed: 24 additions & 0 deletions

# Hybrid Scaling Query Types Benchmark Design

Date: 2026-06-17

## Status

Accepted

## Context

The hybrid scaling combined benchmark currently uses a static queried historical window of 1,000 events. To evaluate how the hybrid live-plus-historical query scales under different selectivities, the benchmark needs to support multiple historical access patterns: `point_lookup`, `fixed_60s`, `range_10_percent`, `range_50_percent`, and `range_100_percent`.
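
One way to model the five access patterns and parse the comma-separated value of the `--historical-query-types` flag (Decision 1) is a `FromStr` enum, sketched below. `HistoricalQueryType`, `label`, and `parse_query_types` are hypothetical names, and how the raw flag string is read from the command line is not shown.

```rust
use std::str::FromStr;

/// The five historical access patterns named in the design record.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HistoricalQueryType {
    PointLookup,
    Fixed60s,
    Range10Percent,
    Range50Percent,
    Range100Percent,
}

impl HistoricalQueryType {
    /// Spelling used on the CLI; it could also serve as the
    /// `historical_query_type` report value (an assumption).
    fn label(self) -> &'static str {
        match self {
            Self::PointLookup => "point_lookup",
            Self::Fixed60s => "fixed_60s",
            Self::Range10Percent => "range_10_percent",
            Self::Range50Percent => "range_50_percent",
            Self::Range100Percent => "range_100_percent",
        }
    }
}

impl FromStr for HistoricalQueryType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim() {
            "point_lookup" => Ok(Self::PointLookup),
            "fixed_60s" => Ok(Self::Fixed60s),
            "range_10_percent" => Ok(Self::Range10Percent),
            "range_50_percent" => Ok(Self::Range50Percent),
            "range_100_percent" => Ok(Self::Range100Percent),
            other => Err(format!("unknown historical query type: {:?}", other)),
        }
    }
}

/// Parses a flag value such as "point_lookup,fixed_60s".
/// Empty entries (e.g. from a trailing comma) are skipped.
fn parse_query_types(value: &str) -> Result<Vec<HistoricalQueryType>, String> {
    value
        .split(',')
        .filter(|part| !part.trim().is_empty())
        .map(|part| part.parse::<HistoricalQueryType>())
        .collect()
}
```

Rejecting unknown names with an error, rather than silently dropping them, keeps a typo in the flag from quietly shrinking the set of benchmarked query types.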

## Decision

1. Extend the CLI parser to support a `--historical-query-types` flag that accepts a comma-separated list of query types.
2. Store sequential, unique quads in the historical `SegmentedStorage` to prevent deduplication and to ensure that query selectivity matches the physical range size.
3. Use a small fractional offset in the object value (`40.0 + (index % 17) + index * 0.000001`) to guarantee quad uniqueness while keeping the baseline flow value within the range needed to join with, and match, the live events.
4. Modify the decomposed baseline's `join_live_with_baseline_with_filter` to parse flow values as `f64` instead of `i32`, so that fractional baseline averages are handled correctly.
5. Output detailed query metrics in `HybridScalingRow`: `historical_query_start_ms`, `historical_query_end_ms`, `historical_query_span_ms`, `historical_result_count`, `target_historical_quads`, and `historical_query_type`. Rename `window_processing_overhead_ms` to `post_trigger_result_observation_delay_ms` so the field name accurately reflects what it measures.
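
Decisions 3 and 4 can be checked with a small sketch: the object formula yields distinct values for distinct indices, and the resulting lexical forms (for example `41.000001`) only survive parsing as `f64`. `baseline_flow_value` and `parse_flow` are hypothetical helpers, not the actual `join_live_with_baseline_with_filter` code.

```rust
/// Object value of the historical quad at `index`, following decision 3:
/// a base of 40.0, a cycling integer part (index % 17), and a tiny
/// index-proportional offset that keeps values distinct.
fn baseline_flow_value(index: u64) -> f64 {
    40.0 + (index % 17) as f64 + index as f64 * 0.000_001
}

/// Parses a flow literal's lexical form. Parsing as `f64` (decision 4)
/// accepts fractional values that an `i32` parse would reject.
fn parse_flow(lexical: &str) -> Option<f64> {
    lexical.trim().parse::<f64>().ok()
}
```

Rust's `f64` `Display` output round-trips through `parse`, so a value written with `to_string()` is read back exactly.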

## Alternatives Considered

- **Using distinct subjects for all events**: Rejected because the live stream only contains junctions with IRIs in `junction/{0..63}`. If the query window selected subjects outside this range (e.g., older history in larger stores), the hybrid query would return zero results, making it harder to verify correct end-to-end execution.
- **Using unique predicates**: Rejected because the query's graph patterns use the static predicate `ex:baselineFlow`. Changing the predicate would require rewriting the query patterns dynamically, which introduces parsing and registration noise.
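
The rejected alternatives imply a quad layout in which the subject cycles through the live junctions and the predicate stays fixed, leaving uniqueness to the object value. The sketch assumes subjects are assigned as `index % 64` and uses a hypothetical `http://example.org/` namespace, graph IRI, and `HistoricalQuad` type; the object value is passed in so any uniqueness scheme, such as the fractional offset in decision 3, can be plugged in.

```rust
/// Hypothetical namespace; the benchmark's real IRIs may use a different prefix.
const EX: &str = "http://example.org/";

/// Junction IRIs present in the live stream: junction/0 .. junction/63.
const LIVE_JUNCTIONS: u64 = 64;

/// A hypothetical in-memory form of one historical quad plus its timestamp.
#[derive(Debug, Clone, PartialEq)]
struct HistoricalQuad {
    subject: String,
    predicate: String,
    object: String,
    graph: String,
    timestamp_ms: u64,
}

/// Builds the quad for historical event `index`.
fn historical_quad(index: u64, object: f64, timestamp_ms: u64) -> HistoricalQuad {
    HistoricalQuad {
        // Reuse the live stream's junctions so any queried window can join with live events.
        subject: format!("{}junction/{}", EX, index % LIVE_JUNCTIONS),
        // One static predicate keeps the registered query text unchanged.
        predicate: format!("{}baselineFlow", EX),
        // Uniqueness comes from the caller-supplied object value.
        object: object.to_string(),
        graph: format!("{}historical", EX),
        timestamp_ms,
    }
}
```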
