This document defines what the research platform must do, what components it contains, and what contracts must remain stable during implementation.
It is the primary functional specification for:
- application structure
- state and data contracts
- API behavior
- session and round lifecycle
- replay and reproducibility requirements
Related documents:
- motivation.md
- theoretical_background.md
- system_test_specification.md
- pre_implementation_blueprint.md
This specification covers the research prototype used to study interactive prompt-embedding steering for diffusion models.
It includes:
- frontend behavior
- backend services
- experiment, session, round, and candidate state
- persistence and replay
- strategy interfaces and constraints
- logging and reproducibility
- tracing and debugging surfaces
It does not define:
- production deployment architecture
- large-scale multi-tenant operations
- enterprise-grade security
- model training or fine-tuning pipelines
The system must:
- support repeatable interactive steering sessions
- support multiple sampling, feedback, and update strategies
- preserve enough state for replay and analysis
- isolate randomness as much as practical
- stay simple enough for rapid research iteration
The system consists of six major parts:
- Frontend: interface for prompt entry, image display, feedback collection, replay, and export
- Experiment Controller: orchestrates session lifecycle and round progression
- Generation Engine: encodes prompts, applies steering, and renders images
- Sampling Module: proposes steering candidates under a configured policy
- Preference / Update Module: learns from feedback and computes the next incumbent
- Storage and Evaluation Layer: persists state, computes metrics, and reconstructs replay data
The canonical workflow is:
- create or load an experiment
- create a session with prompt and configuration
- encode the base prompt and initialize steering state
- propose round candidates
- render candidate images
- collect user feedback
- update preference state and incumbent steering vector
- repeat until the user stops
- export or replay the session
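The loop above can be sketched in plain Python. The component names (`sampler`, `renderer`, `updater`) and their call shapes are illustrative assumptions; only the ordering of steps is normative:

```python
# Illustrative sketch of the canonical session loop. Component names and
# call signatures are assumptions; only the step ordering is normative.
def run_session(sampler, renderer, updater, collect_feedback, state, config):
    rounds = []
    while True:
        candidates = sampler.propose(state, config)        # propose round candidates
        images = [renderer.render(c) for c in candidates]  # render candidate images
        feedback = collect_feedback(candidates, images)    # collect user feedback
        if feedback is None:                               # user stopped the session
            break
        # update preference state and the incumbent steering vector
        state, summary = updater.update(state, candidates, feedback)
        rounds.append({"candidates": candidates,
                       "feedback": feedback,
                       "update_summary": summary})
    return state, rounds
```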
The following must remain true:
- each session uses an immutable configuration snapshot
- each round belongs to exactly one session
- each candidate belongs to exactly one round
- feedback is attached to exactly one round
- feedback for a round is accepted at most once
- seed information is persisted for every rendered candidate
- replay data is sufficient to reconstruct decision history
- session state is durable after each completed round and feedback submission
- a new round cannot be generated while the session is still awaiting feedback for the current round
An experiment defines a reusable research configuration.
Required fields:
- experiment ID
- title
- description
- created timestamp
- model checkpoint
- steering mode
- sampler strategy
- feedback strategy
- update strategy
- seed policy
- candidate count
- trust-region settings
- anchoring settings
- status
- researcher notes
A session is one interactive run of one experiment with one prompt.
Required fields:
- session ID
- experiment ID
- prompt text
- negative prompt text
- model name
- base embedding cache key
- steering basis configuration
- current state z_t
- current round index
- incumbent candidate ID if available
- session status
- final selected candidate if available
A round is one propose-render-feedback-update cycle.
Required fields:
- round ID
- session ID
- round index
- incumbent z_t
- sampled candidate list
- seed policy used
- render status
- user feedback summary
- update summary
- latency metrics
A candidate is one proposed point in steering space.
Required fields:
- candidate ID
- round ID
- candidate index within round
- steering vector z
- embedding offset metadata
- sampler role
- predicted score if available
- predicted uncertainty if available
- seed
- generation parameters
- image path or URL
- render status
A feedback event records one user action on a round.
Required fields:
- feedback ID
- round ID
- candidate IDs involved
- feedback type
- payload
- optional critique text
- timestamp
- normalized internal representation
Recommended experiment states:
`draft`, `active`, `paused`, `completed`, `archived`
Recommended session states:
`created`, `ready`, `awaiting_feedback`, `updating`, `paused`, `completed`, `failed`
Recommended render states:
`pending`, `rendering`, `succeeded`, `failed`
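Invalid lifecycle transitions must be rejected explicitly (see the error-handling requirements later in this document). A minimal sketch of transition enforcement for the session states above; which transitions are legal is itself an assumption, not a prescribed map:

```python
# Sketch of session-state transition validation. The state names come from
# the recommended set above; the allowed-transition map is an assumption.
SESSION_TRANSITIONS = {
    "created": {"ready", "failed"},
    "ready": {"awaiting_feedback", "paused", "failed"},
    "awaiting_feedback": {"updating", "paused", "failed"},
    "updating": {"ready", "completed", "failed"},
    "paused": {"ready", "awaiting_feedback"},
    "completed": set(),
    "failed": set(),
}

def transition(current: str, target: str) -> str:
    """Return the new state, or raise on an invalid lifecycle transition."""
    if target not in SESSION_TRANSITIONS.get(current, set()):
        raise ValueError(f"invalid transition: {current} -> {target}")
    return target
```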
The frontend should be intentionally simple:
- plain HTML with minimal JavaScript
- minimal hidden state
- easy DOM inspection and debugging
- accessible controls
- predictable interaction patterns across rounds
- a visible trace surface during interactive use
Purpose:
- create a new experiment
- list existing experiments
- resume a session
- compare results
- export logs
Required elements:
- experiment list
- summary columns
- filters by model, strategy, and date
- create experiment action
- resume session action
- export links
Required inputs:
- prompt text
- negative prompt text
- model checkpoint selector
- image size
- number of candidates per round
- seed policy selector
- sampler selector
- feedback selector
- updater selector
- trust-region parameters
- anchor strength
Required actions:
- start session
- save preset
- load preset
Required layout:
- header with experiment and round metadata
- control panel
- candidate image grid
- state summary panel
- trace panel
Required actions:
- next round
- regenerate current round
- pause session
- revert to previous round if supported
- pin candidate as incumbent
- mark candidate as favorite
- export round data
Required grid behavior:
- show 4 to 12 images per round
- use consistent candidate labeling
- display metadata on demand
- support image zoom
- preserve stable ordering within a round
Required feedback widgets:
- scalar rating
- rating-driven pairwise derivation
- rating-driven top-k derivation
- shortlist selection
- text critique entry
Purpose:
- replay completed rounds in order
- inspect candidates and feedback
- inspect trajectory summaries
- inspect metrics and exports
The frontend should maintain:
- active experiment config snapshot
- active session ID
- current round number
- current candidate set
- local unsaved feedback state
- request status
- recoverable error messages
The interface must support:
- keyboard navigation
- visible labels without hover dependence
- non-color-only distinctions
- screen-readable control labels
- focus visibility for active controls
The UI must:
- surface per-candidate failures without hiding successful candidates
- preserve unsaved feedback where possible after recoverable errors
- prevent double submission when a request is already in flight
- show the current round status clearly
- make trace and error information inspectable during debugging
Baseline stack:
- Python 3.11+
- FastAPI
- Diffusers
- SQLite for local research
- filesystem image storage
- Pydantic models
Responsibilities:
- experiment management
- session creation and retrieval
- round generation
- feedback submission
- replay and export delivery
- trace event intake for the frontend
Responsibilities:
- initialize session state
- call sampler
- call generation manager
- persist round data
- call updater after feedback
- enforce lifecycle transitions
Responsibilities:
- encode prompts
- cache text embeddings
- construct steering basis
- apply steering vector z
- support pooled and token-level modes
Responsibilities:
- produce candidates from current state
- apply trust-region constraints
- enforce diversity constraints
- label candidates by role
Responsibilities:
- render images from embeddings
- manage seed policy
- collect latency and failure metadata
- expose deterministic test hooks
Responsibilities:
- normalize feedback
- update preference model
- compute next incumbent state
- compute update summary
- apply stabilization controls
Responsibilities:
- compute online metrics
- compute aggregate session metrics
- prepare exports and plots
Responsibilities:
- persist structured state
- persist artifacts
- provide repository interfaces
- support replay queries
Experiments table: `id`, `created_at`, `updated_at`, `name`, `description`, `status`, `config_json`
Sessions table: `id`, `experiment_id`, `prompt`, `negative_prompt`, `model_name`, `status`, `basis_type`, `current_round`, `current_z_json`, `incumbent_candidate_id`, `created_at`, `updated_at`
Rounds table: `id`, `session_id`, `round_index`, `incumbent_z_json`, `trust_radius`, `seed_policy`, `render_status`, `update_summary_json`, `created_at`
Candidates table: `id`, `round_id`, `candidate_index`, `z_json`, `sampler_role`, `predicted_score`, `predicted_uncertainty`, `seed`, `render_status`, `image_path`, `generation_params_json`
Feedback table: `id`, `round_id`, `type`, `payload_json`, `normalized_payload_json`, `critique_text`, `created_at`
Artifacts table: `id`, `session_id`, `type`, `path`, `metadata_json`, `created_at`
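As one concrete reading of the schema, the candidates table could map to SQLite DDL roughly as follows, using the stdlib `sqlite3` module. Column types and constraints are assumptions; only the column names follow the field list above:

```python
import sqlite3

# Sketch of the candidates table as SQLite DDL. Column types, nullability,
# and the uniqueness constraint are assumptions; names follow the schema above.
DDL = """
CREATE TABLE candidates (
    id TEXT PRIMARY KEY,
    round_id TEXT NOT NULL,
    candidate_index INTEGER NOT NULL,
    z_json TEXT NOT NULL,
    sampler_role TEXT,
    predicted_score REAL,
    predicted_uncertainty REAL,
    seed INTEGER NOT NULL,
    render_status TEXT NOT NULL DEFAULT 'pending',
    image_path TEXT,
    generation_params_json TEXT,
    UNIQUE (round_id, candidate_index)
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
```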
Artifacts to store:
- generated images
- round manifests
- configuration snapshots
- exported replay bundles
- evaluation reports
- JSON trace logs
The API should follow these rules:
- all responses are JSON except artifact downloads
- all write operations return persisted identifiers
- every error returns a structured code and human-readable message
- session config becomes immutable once the session is created
- round generation is idempotent only when explicitly requested
Create a new experiment.
Request body:
`name`, `description`, `config`
Response:
`experiment_id`
List experiments.
Return full experiment metadata and configuration.
Create a session from an experiment or full config.
Request body:
`experiment_id` or inline `config`, `prompt`, `negative_prompt`
Response:
`session_id`, `initial_state`
Return session summary and current state.
Generate the next round of candidates.
Response:
`round_id`, `candidate_metadata`, `image_urls`, `state_summary`
Submit feedback for a round.
Request body:
`feedback_type`, `payload`, optional `critique_text`
Response:
`update_summary`, `next_incumbent_state`
Return ordered rounds, artifacts, and summaries for replay.
Persist browser-side trace events for debugging and auditability.
Export logs, metrics, and artifact manifest.
The system must support at least:
- low-dimensional latent code
- token-level offset mode
- pooled embedding mode
For low-dimensional steering:
E(z) = E0 + U z
Where:
- `E0` is the base embedding
- `U` is the steering basis
- `z` is the controllable code
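A numerical sketch of this map for the pooled case, treating the embedding as a flat vector. The shapes (`E0` of size `d`, `U` of shape `d x k` with orthonormal columns, `z` of size `k`) are assumptions consistent with the low-dimensional steering mode:

```python
import numpy as np

# Sketch of E(z) = E0 + U z for pooled low-dimensional steering.
# Shapes are assumptions: E0 is (d,), U is (d, k) orthonormal, z is (k,).
def apply_steering(E0: np.ndarray, U: np.ndarray, z: np.ndarray) -> np.ndarray:
    return E0 + U @ z

d, k = 8, 3
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(d, k)))  # random orthonormal basis
E0 = rng.normal(size=d)
E = apply_steering(E0, U, np.zeros(k))        # z = 0 recovers the base embedding
```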
The system should support:
- random orthonormal basis
- PCA basis from prior accepted moves
- hand-defined semantic basis
- basis from prompt rewrite differences
- hybrid basis
The steering representation must support:
- trust-region clipping
- anchor-to-origin regularization
- optional subspace masks
- candidate diversity measurement
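The first two constraints can be sketched as operations on the steering code `z`. The exact semantics of the trust radius and anchor strength are assumptions here; only the requirement to support clipping and anchoring comes from the specification:

```python
import numpy as np

# Sketch of trust-region clipping and anchor-to-origin regularization on a
# steering code z. Radius and anchor-strength semantics are assumptions.
def clip_to_trust_region(z: np.ndarray, center: np.ndarray, radius: float) -> np.ndarray:
    """Project z back onto the ball of the given radius around center."""
    step = z - center
    norm = np.linalg.norm(step)
    if norm > radius:
        step = step * (radius / norm)
    return center + step

def anchor_to_origin(z: np.ndarray, anchor_strength: float) -> np.ndarray:
    """Shrink z toward the origin; 0 = no pull, 1 = collapse to origin."""
    return (1.0 - anchor_strength) * z
```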
Each sampler must implement:
- `propose(state, config, preference_model) -> list[candidate]`
- candidate role tagging
- reproducible behavior under fixed RNG state
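An interface sketch against this contract, together with a minimal random-local sampler. The candidate dict shape and config keys (`rng_seed`, `candidate_count`, `step`) are assumptions; only `propose()`, role tagging, and RNG reproducibility are required:

```python
import random
from typing import Any, Protocol

# Interface sketch for the sampler contract. The candidate representation
# (a dict with "z" and "role") and config keys are assumptions.
class Sampler(Protocol):
    def propose(self, state: Any, config: dict, preference_model: Any) -> list[dict]:
        """Return role-tagged candidates; deterministic under a fixed RNG state."""
        ...

class RandomLocalSampler:
    """Minimal random-local sampler: Gaussian jitter around the incumbent code."""
    def propose(self, state, config, preference_model):
        rng = random.Random(config["rng_seed"])  # reproducible under a fixed seed
        k = config.get("candidate_count", 4)
        step = config.get("step", 0.1)
        return [
            {"z": [zi + rng.gauss(0, step) for zi in state["z"]], "role": "explore"}
            for _ in range(k)
        ]
```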
The system must include:
- random local sampler
- exploit-plus-orthogonal sampler
- uncertainty-guided sampler
The system may later include:
- Thompson-style sampler
- quality-diversity sampler
- CMA-ES style sampler
- dueling-bandit sampler
- critique-conditioned sampler
- subspace-adaptive sampler
Per round, the system should log and optionally constrain:
- exploit candidate count
- explore candidate count
- validation candidate count
- mirror candidate count
- replay candidate count
All frontend feedback must normalize into one internal event format.
The backend should derive pairwise preferences from richer signals when useful.
The system must support:
- scalar ratings
- pairwise comparison
- partial ranking
- winner plus critique
- select-all-that-fit
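One normalization path from the list above is deriving pairwise preferences from scalar ratings. The sketch below assumes ties are simply skipped; the rating payload shape is an assumption:

```python
from itertools import combinations

# Sketch of deriving (winner, loser) preference pairs from per-candidate
# scalar ratings. Skipping tied ratings is an assumption, not a requirement.
def ratings_to_pairs(ratings: dict[str, float]) -> list[tuple[str, str]]:
    pairs = []
    for a, b in combinations(ratings, 2):
        if ratings[a] > ratings[b]:
            pairs.append((a, b))
        elif ratings[b] > ratings[a]:
            pairs.append((b, a))
    return pairs
```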
The platform should support:
- hidden repeated comparisons
- user confidence reporting
- decision-time logging
- uncertain or skip actions
Each updater must implement:
`update(state, candidates, feedback, model) -> new_state, update_summary`
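A sketch of a winner-average updater against this contract. The state and feedback shapes (`{"z": [...]}`, `{"winner_index": ...}`) and the fixed blend weight are assumptions:

```python
# Sketch of a winner-average updater against the update() contract above.
# State/feedback shapes and the blend weight alpha are assumptions.
def winner_average_update(state, candidates, feedback, model=None):
    winner = candidates[feedback["winner_index"]]
    alpha = 0.5  # blend weight toward the winning candidate
    new_z = [(1 - alpha) * s + alpha * w for s, w in zip(state["z"], winner["z"])]
    new_state = {**state, "z": new_z}
    summary = {
        "rule": "winner_average",
        "alpha": alpha,
        "step_norm": sum((a - b) ** 2 for a, b in zip(new_z, state["z"])) ** 0.5,
    }
    return new_state, summary
```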
The system must include:
- winner-copy updater
- winner-average updater
- linear preference updater
The system may later include:
- Bradley-Terry / pairwise logistic updater
- Bayesian updater
- contextual bandit updater
- critique-conditioned updater
- trust-region policy optimizer
- multi-subspace updater
Each updater should optionally support:
- trust-region clipping
- anchor regularization
- momentum
- rollback on confidence drop
- incumbent preservation under instability
The system must support:
- fixed-per-round
- fixed-per-candidate-role
- multi-seed averaging
For every candidate, persist:
- seed
- scheduler settings
- inference step count
- guidance scale
- image resolution
The platform must support both online and offline evaluation.
- average time per round
- average time per feedback action
- rounds until stop
- images generated per session
- user consistency score
- preference improvement over rounds
- incumbent win rate against earlier incumbents
- regret proxy relative to best observed candidate
- model calibration where applicable
- performance under alternate seeds
- rank stability across seeds
- score estimate variance
- pairwise embedding distance
- perceptual image distance
- mode coverage proxy
- distance from origin
- distance from previous incumbent
- semantic drift notes where available
- perceived controllability
- perceived usefulness of feedback
- fatigue level
- final-image satisfaction
Every experiment must log:
- full config snapshot
- random seeds
- software version
- model checkpoint identifier
- hardware metadata
- request and response manifests for each round
- serialized feedback events
- request-level backend traces
- browser-submitted frontend trace events
A replay must reconstruct:
- prompt and config
- round order
- candidate images
- feedback timeline
- updater summaries
Version independently:
- frontend
- backend
- model wrapper
- sampler
- updater
- schema
The system must handle:
- one-candidate generation failure
- render timeout
- partial round completion
- duplicate feedback submission
- invalid ranking payload
- GPU out-of-memory events
- experiment resume after crash
Behavioral requirements:
- failures are visible in the UI
- one failed candidate does not invalidate the whole round by default
- durable state is written after each completed round and feedback submission
- invalid lifecycle transitions return explicit conflict-style errors
Minimum requirements:
- no arbitrary file path input from the frontend
- input validation on all API endpoints
- critique text treated as user data
- exports must not leak server-local paths
- session isolation may be added later if multi-user support appears
The v1 system should assume:
- local or single-node execution
- manual operator oversight
- limited concurrency
- reproducibility prioritized over throughput
project/
app/
api/
routes_experiments.py
routes_sessions.py
routes_rounds.py
routes_exports.py
core/
config.py
logging.py
schema.py
engine/
prompt_encoder.py
steering_basis.py
generation.py
seeds.py
samplers/
base.py
random_local.py
exploit_orthogonal.py
uncertainty.py
thompson.py
quality_diversity.py
feedback/
normalization.py
validation.py
updaters/
base.py
winner_copy.py
winner_average.py
linear_pref.py
bradley_terry.py
bayesian.py
evaluation/
metrics.py
replay.py
reports.py
storage/
db.py
models.py
repository.py
frontend/
templates/
index.html
setup.html
session.html
replay.html
static/
styles.css
app.js
tests/
unit/
integration/
e2e/
fixtures/
scripts/
run_dev.py
export_session.py
replay_session.py
docs/
system_specification.md
- Stable Diffusion or SDXL backend through Diffusers
- low-dimensional steering mode
- one interactive session page
- one replay page
- at least three samplers
- at least three feedback modes
- at least three updaters
- fixed-per-round seed mode
- export support
- deterministic replay support
- critique-conditioned updates
- Bayesian preference model
- multi-seed validation mode
- quality-diversity archive
- study report generation
An implementation generated from this specification should produce:
- a Python FastAPI backend
- a simple HTML/CSS/JS frontend
- modular sampler interfaces
- modular updater interfaces
- a working diffusion wrapper
- persistence and export support
- a test suite aligned with the test specification
- local setup documentation
This system is a controlled research platform for interactive user-guided image generation through prompt-embedding steering.
Its architectural priorities are:
- modular experimentation
- durable state
- replayability
- controlled randomness
- low implementation complexity consistent with research use