- Instrumented Gen-AI Application: Your service must tag distinct workloads (routes, nodes, agent steps) and log every prompt/completion pair.
- Log Store: Elasticsearch (or equivalent) captures production traffic so datasets can be built automatically.
- Dataset & Model Ops Infra: The blueprint spins up NeMo Datastore, Evaluator, Customizer, plus local API & workers to orchestrate jobs.
- Post-Eval Human Review: Engineers/researchers validate promising models before promotion; no user feedback collection.
Think of this flywheel as a discovery and promotion service that surfaces promising smaller models rather than a fully autonomous replacement engine.
The following diagram illustrates the high-level architecture of the Data Flywheel Blueprint:
Note
Version 1 of the Data Flywheel Foundational Blueprint optimizes cost & latency via model distillation. Future versions will target absolute accuracy gains and agentic observability (prompt / template suggestions).
📖 For complete data logging implementation: See Data Logging Guide
Use a continuous log exportation flow for your production environments:
- Application emits JSON: Every prompt/response is captured by your service (language-agnostic; any HTTP middleware, logger, or side-car works).
- Exporter ships records: A lightweight log exporter forwards those records to Elasticsearch in near real-time.
- Flywheel API pulls data: Workers query Elasticsearch to build evaluation and fine-tune splits automatically.
sequenceDiagram
participant App as Application
box Flywheel
participant ES as Log store
participant API as Flywheel API
participant Worker as Worker
end
box NMP
participant datastore as Datastore
participant dms as DMS
participant customizer as Customizer
participant eval as Evaluator
end
App->>ES: Log usage data
API->>Worker: Start evaluation job
Worker <<->> ES: Pull data
Worker ->> datastore: Store eval and<br>FT datasets
loop For each NIM
Worker ->> dms: Spin up NIM
Worker ->> customizer: Fine tune NIM
Worker->> eval: Base evaluation
Worker->> eval: ICL evaluation
Worker->> eval: FT eval
Worker->>API: Work
end
API->>App: Notify of new model
flowchart TD
subgraph ex["Example Application<br>e.g. AIVA"]
subgraph AIVA
agent["Agent Node"]
LLM
Exporter
agent --> LLM
agent --> Exporter
end
subgraph loader_script["load_test_data.py"]
script_es["ES client"]
end
end
style ex fill:#ddddff
script_es --> log_store
Exporter --> log_store
subgraph Blueprint["docker compose"]
api["API"]
workers["Workers"]
log_store["Elasticsearch"]
queue["Queue"]
database["Database"]
end
subgraph k8s["K8s cluster"]
nmp["NeMo microservices"]
end
workers --> nmp
style Blueprint fill:#efe
admin["Admin app<br>(e.g. notebook)"] --> api
The Data Flywheel Blueprint includes an automatic cleanup system that ensures proper resource management when the system is shut down unexpectedly or when workers are terminated. This prevents resource leaks and ensures clean system state.
The CleanupManager automatically activates during worker shutdown and performs the following operations:
- Detects running resources: Finds all flywheel runs with
PENDINGorRUNNINGstatus - Identifies active NIMs: Locates all NVIDIA Inference Microservices with
RUNNINGdeployment status - Cancels running jobs:
- Cancels active customization jobs through NeMo Customizer
- Shuts down deployments:
- Stops all running NIM deployments via NeMo Deployment Manager
- Shuts down local LLM judge deployments (remote judges are unaffected)
- Updates database state: Marks all resources as
CANCELLEDwith appropriate timestamps and error messages
The cleanup manager activates automatically in these scenarios:
- Worker shutdown: When Celery workers receive shutdown signals (SIGTERM, SIGINT)
- Container termination: When Docker containers are stopped, triggering Celery worker shutdown
- System restart: During planned or unplanned system restarts
- Database-driven: Only cleans up resources marked as running in the database
- Error resilience: Continues cleanup even if individual operations fail
- Comprehensive logging: Records all cleanup actions and errors for debugging
