Data Model

This document describes the persistence model implemented by entropy-processor, based on:

  1. src/main/resources/db/migration/V1__initial_schema.sql
  2. Entities under src/main/java/com/ammann/entropy/model

1. Persistence Approach

  • Database: PostgreSQL with TimescaleDB extension.
  • Schema lifecycle: Flyway migration at startup.
  • ORM: Hibernate ORM + Panache entities.
  • Hibernate schema mode: validate (migration SQL is the schema source of truth).

2. Table Inventory

| Table | Role | Time-Series Type |
|---|---|---|
| entropy_data | Primary event storage for ingested entropy events | Hypertable |
| nist_test_results | SP 800-22 per-test outcomes | Hypertable |
| nist_90b_results | SP 800-90B aggregate assessment outcomes | Hypertable |
| nist_90b_estimator_results | SP 800-90B estimator-level detail rows | Regular table |
| data_quality_reports | Persisted quality reports | Regular table |
| nist_validation_jobs | Async validation job tracking | Regular table |
| entropy_comparison_run | Comparison run metadata | Regular table |
| entropy_comparison_result | Source-level comparison outcomes | Hypertable |

3. Structural Relationships

%%{init: {"flowchart": {"curve": "linear"}}}%%
flowchart LR
    ED[entropy_data]
    NTR[nist_test_results]
    N90[nist_90b_results]
    NE[nist_90b_estimator_results]
    JV[nist_validation_jobs]
    CR[entropy_comparison_run]
    RS[entropy_comparison_result]

    ED -->|windowed bitstream source| NTR
    ED -->|windowed bitstream source| N90
    N90 -->|assessment_run_id| NE
    JV -->|test_suite_run_id| NTR
    JV -->|assessment_run_id| N90
    CR -->|comparison_run_id| RS

Note: nist_90b_estimator_results intentionally has no database foreign key to nist_90b_results because of TimescaleDB partitioning constraints documented in the migration comments. The two tables are linked logically by assessment_run_id.

4. Table-Level Design Notes

4.1 entropy_data

Purpose:

  1. Stores one row per ingested entropy event.
  2. Supports time-window analytics and interval computations.

Key columns:

  • hw_timestamp_ns, server_received, sequence, whitened_entropy, source_address.

Key behavior:

  • Converted to hypertable partitioned by server_received with 1-day chunks.
  • Composite primary key (id, server_received), because TimescaleDB requires every unique index on a hypertable to include the partitioning column.
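
As an illustration of the interval computations this table supports, the following is a minimal in-memory sketch of a LAG-style gap calculation between consecutive events. The EntropyEvent record, the IntervalSketch class, and the choice to order by sequence and subtract hw_timestamp_ns values are assumptions made for this example; the service itself may compute intervals differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory stand-in for an entropy_data row (two columns only).
record EntropyEvent(long sequence, long hwTimestampNs) {}

class IntervalSketch {
    // Returns the gap in nanoseconds between each event and its predecessor.
    // Events are ordered by sequence (an assumed ordering). The first event
    // has no predecessor and contributes no interval, similar to a LAG()
    // window function yielding NULL on the first row.
    static List<Long> intervalsNs(List<EntropyEvent> events) {
        List<EntropyEvent> ordered = new ArrayList<>(events);
        ordered.sort(Comparator.comparingLong(EntropyEvent::sequence));
        List<Long> gaps = new ArrayList<>();
        for (int i = 1; i < ordered.size(); i++) {
            gaps.add(ordered.get(i).hwTimestampNs() - ordered.get(i - 1).hwTimestampNs());
        }
        return gaps;
    }
}
```

For n events the sketch yields n - 1 intervals, and an empty or single-element input yields an empty list.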

4.2 nist_test_results

Purpose:

  1. Stores SP 800-22 individual test results.
  2. Supports chunked validation runs via chunk_index and chunk_count.

Key columns:

  • test_suite_run_id, test_name, passed, p_value, executed_at, details.

Time-series behavior:

  • Hypertable partitioned by executed_at (7-day chunk interval).

4.3 nist_90b_results

Purpose:

  1. Stores SP 800-90B assessment outcomes at two granularities, distinguished by the is_run_summary flag. A run summary row (is_run_summary = TRUE) is the canonical result of a completed run; a per-chunk row (is_run_summary = FALSE) records a single chunk processed during the assessment.
  2. Links all rows of a run via assessment_run_id.

Key columns:

  • assessment_run_id, min_entropy, passed, assessment_details, executed_at, is_run_summary, chunk_index, chunk_count.

Row discrimination:

  • Run summary row (is_run_summary = TRUE): Written once after all chunks have been processed. Contains min_entropy = MIN(all chunks), passed = AND(all chunks), and chunk_index = NULL. The assessment_details JSON includes aggregation metadata such as chunk count, aggregation rule, and the index of the estimator source chunk.
  • Per-chunk row (is_run_summary = FALSE): Written during chunk processing. Contains the chunk_index (zero-based) and the assessment outcome for that individual chunk. These rows are retained for forensic diagnosis but are not treated as canonical results.
  • Incomplete run: If the chunk loop fails before completion, no summary row is written. The absence of a summary row for a given assessment_run_id indicates that the run did not complete successfully. Partial per-chunk rows may remain.
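
The aggregation rule for the run summary row can be sketched as follows. This is an illustrative sketch, not the project's code: ChunkResult, RunSummary, and RunSummarySketch are hypothetical names, and returning an empty result for a run with no chunks is an assumption.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical view of a per-chunk row (is_run_summary = FALSE).
record ChunkResult(int chunkIndex, double minEntropy, boolean passed) {}

// Hypothetical view of the run summary row (is_run_summary = TRUE, chunk_index = NULL).
record RunSummary(double minEntropy, boolean passed, int chunkCount) {}

class RunSummarySketch {
    // Applies the rule described above: min_entropy = MIN over all chunks,
    // passed = AND over all chunks. An empty chunk list produces no summary
    // (assumption for this sketch).
    static Optional<RunSummary> summarize(List<ChunkResult> chunks) {
        if (chunks.isEmpty()) {
            return Optional.empty();
        }
        double min = Double.POSITIVE_INFINITY;
        boolean allPassed = true;
        for (ChunkResult c : chunks) {
            min = Math.min(min, c.minEntropy());
            allPassed &= c.passed();
        }
        return Optional.of(new RunSummary(min, allPassed, chunks.size()));
    }
}
```

A single failing chunk is enough to make the summary fail, and the summary's min_entropy is the most pessimistic per-chunk value.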

Partial unique index:

  • uq_nist_90b_run_summary ON nist_90b_results (assessment_run_id) WHERE is_run_summary = TRUE enforces that at most one summary row exists per assessment run.

Time-series behavior:

  • Hypertable partitioned by executed_at (7-day chunk interval).

4.4 nist_90b_estimator_results

Purpose:

  1. Stores detailed estimator outputs (IID and NON_IID categories).
  2. Keeps entropy_estimate nullable so that estimators which do not produce an entropy estimate store NULL.

Key constraints:

  • Unique key on (assessment_run_id, test_type, estimator_name).
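
The effect of the unique key together with the nullable entropy_estimate can be illustrated with a small in-memory sketch. EstimatorKey and EstimatorStoreSketch are hypothetical names, and representing the run ID as a UUID and test_type as a String are assumptions; the actual column types are defined by the migration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Mirrors the unique key (assessment_run_id, test_type, estimator_name).
// The field types are assumptions for this sketch.
record EstimatorKey(UUID assessmentRunId, String testType, String estimatorName) {}

class EstimatorStoreSketch {
    // The value is the entropy estimate; null stands for SQL NULL.
    private final Map<EstimatorKey, Double> rows = new HashMap<>();

    // Returns false and stores nothing if the key already exists,
    // analogous to a unique-constraint violation.
    boolean insert(EstimatorKey key, Double entropyEstimate) {
        if (rows.containsKey(key)) {
            return false;
        }
        rows.put(key, entropyEstimate); // HashMap permits null values
        return true;
    }

    Double entropyEstimate(EstimatorKey key) {
        return rows.get(key);
    }

    int size() {
        return rows.size();
    }
}
```

Because records derive equals and hashCode from their components, two keys with the same run ID, test type, and estimator name collide, while the same estimator name under a different test type does not.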

4.5 data_quality_reports

Purpose:

  • Stores quality assessment summaries generated by quality analysis logic.

Key columns:

  • report_timestamp, window_start, window_end, total_events, overall_quality_score, recommendations.

4.6 nist_validation_jobs

Purpose:

  1. Tracks asynchronous validation workflow state.
  2. Supports API polling for progress and completion.

Lifecycle states, encoded by a database constraint and by enums in the entity code:

  • QUEUED, RUNNING, COMPLETED, FAILED.

Key columns:

  • validation_type, status, progress_percent, current_chunk, total_chunks, test_suite_run_id, assessment_run_id.
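
A minimal sketch of how the four lifecycle states and the progress columns might fit together is shown below. The transition rules and the derivation of progress_percent from current_chunk and total_chunks are assumptions for illustration; the source only names the states and columns. JobStatus and JobProgressSketch are hypothetical names.

```java
import java.util.EnumSet;
import java.util.Set;

enum JobStatus {
    QUEUED, RUNNING, COMPLETED, FAILED;

    // Assumed transitions (not stated in the source):
    // QUEUED -> RUNNING | FAILED, RUNNING -> COMPLETED | FAILED,
    // COMPLETED and FAILED are terminal.
    Set<JobStatus> allowedNext() {
        switch (this) {
            case QUEUED:
                return EnumSet.of(RUNNING, FAILED);
            case RUNNING:
                return EnumSet.of(COMPLETED, FAILED);
            default:
                return EnumSet.noneOf(JobStatus.class);
        }
    }

    boolean canTransitionTo(JobStatus next) {
        return allowedNext().contains(next);
    }
}

class JobProgressSketch {
    // Assumed derivation of progress_percent: completed chunks over total
    // chunks, rounded down and clamped to the range 0..100.
    static int progressPercent(int currentChunk, int totalChunks) {
        if (totalChunks <= 0) {
            return 0;
        }
        int pct = (int) Math.floor(100.0 * currentChunk / totalChunks);
        return Math.max(0, Math.min(100, pct));
    }
}
```

A polling client could treat COMPLETED and FAILED as terminal and stop polling once either is observed.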

4.7 Comparison Tables

entropy_comparison_run:

  • One row per comparison execution, including sample sizes and mixed-source traceability metadata.

entropy_comparison_result:

  • One or more rows per run, one per source type (BASELINE, HARDWARE, MIXED), with NIST and entropy metric outputs.
  • Implemented as hypertable partitioned by created_at.

5. Data Flow Through Persistence

graph TD
    Ingest[gRPC ingest] --> entropy_data
    entropy_data --> N22[SP800-22 execution]
    entropy_data --> N90[SP800-90B execution]

    N22 --> nist_test_results
    N90 --> nist_90b_results
    N90 --> nist_90b_estimator_results

    AsyncJobs[Validation job orchestration] --> nist_validation_jobs
    Quality[Quality analysis] --> data_quality_reports
    Compare[Comparison workflow] --> entropy_comparison_run
    Compare --> entropy_comparison_result

6. Query Boundary Observations

From entity and service code:

  1. Time-window access is the dominant access pattern (server_received and executed_at).
  2. Interval analytics are performed using native SQL window functions over entropy_data.
  3. Validation retrieval uses run identifiers (test_suite_run_id, assessment_run_id) for aggregation and API responses. SP 800-90B result retrieval filters on is_run_summary to distinguish canonical run-summary rows from forensic per-chunk rows.
  4. Job tracking is independent of the test-result tables and is linked to them only by run IDs.
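
The is_run_summary filter from observation 3 can be sketched in memory as follows. Nist90bRow and CanonicalResultSketch are hypothetical names, the UUID type for assessment_run_id is an assumption, and the actual service presumably expresses the same filter as a database query.

```java
import java.util.List;
import java.util.Optional;
import java.util.UUID;

// Hypothetical in-memory view of a nist_90b_results row.
// chunkIndex is null for summary rows, matching chunk_index = NULL.
record Nist90bRow(UUID assessmentRunId, boolean isRunSummary,
                  Integer chunkIndex, double minEntropy, boolean passed) {}

class CanonicalResultSketch {
    // Returns the canonical (summary) row for a run, if one exists.
    // An empty result means the run has no summary row, which the data
    // model treats as a run that did not complete successfully.
    static Optional<Nist90bRow> canonicalResult(List<Nist90bRow> rows, UUID runId) {
        return rows.stream()
                .filter(r -> r.assessmentRunId().equals(runId))
                .filter(Nist90bRow::isRunSummary)
                .findFirst();
    }
}
```

Per-chunk rows for the same run are ignored by this lookup, so leftover rows from an incomplete run never surface as a canonical result.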