lightcone-cli Reference

Reference for lightcone-cli execution: CLI commands, development workflow, status interpretation, and failure diagnosis. For astra.yaml spec syntax, see astra-reference.md.

CLI Reference

lc init [DIR] [--permissions yolo|recommended|minimal] [--scratch PATH]  # Scaffold a new ASTRA project
lc launch <target>                                                # Enter the sandboxed container (lc launch claude)
lc run [OUTPUTS...] [--universe NAME] [--force] [--verbose] [--rerun-triggers TRIGGERS]  # Materialize outputs
lc build [--force] [--runtime docker]                             # Build container images from specs
lc status [--universe NAME] [--json]                              # Materialization status (text or JSON)
lc verify [--universe NAME]                                       # Recompute hashes and walk the provenance chain
lc export wrroc [--output PATH] [--universe NAME] [--zip] [--metadata-only] [--author "NAME <EMAIL>"]  # Export Workflow Run RO-Crate bundle
lc eval {run,report,compare}                                      # Run/inspect eval suites (requires the 'eval' extra)

lc run is quiet by default — pass --verbose to see worker output. --scratch is only relevant on HPC sites where $HOME doesn't honor flock (NERSC etc.); it redirects Snakemake state and Dask spill onto the named filesystem.

The first lc invocation auto-creates ~/.lightcone/config.yaml:

container:
  runtime: auto    # or: docker | podman | podman-hpc | apptainer | singularity | none

Always run via lc. Recipes must execute through lc run so that container builds, option resolution, resource limits, and result paths are applied. Treat the underlying execution engine as a black box: never invoke schedulers or container runtimes directly, because doing so bypasses the reproducibility guarantees.

Container Environment

lc launch claude is the canonical entry point for all analysis work. After lc init, always launch the container before doing anything else:

lc init my-project && cd my-project
lc launch claude          # builds the container on first run (~5 min), then drops into Claude Code

Inside the container, lc build, lc run, lc status, and lc verify all work normally. Running them outside the container prints a warning — host-side execution is still possible but bypasses the reproducibility sandbox.

The container mounts the project directory at the same absolute path so all output paths, tarball paths, and manifest paths are identical inside and outside.

Creating Sub-Analyses

Sub-analyses are scaffolded by hand, since each one is just another astra.yaml nested in a directory. To add one:

  1. Create analyses/<name>/ with its own astra.yaml (and optionally src/, universes/baseline.yaml, results/).
  2. Add a path: entry to the parent astra.yaml under analyses: (e.g. analyses: { my_sub: { path: ./analyses/my_sub } }).
  3. Add a <name>: { universe: baseline } entry to each existing parent universe file.

Populate the sub-analysis's astra.yaml with inputs, outputs, and decisions. Use from: references to wire inputs and decisions to the parent or siblings — see astra-reference.md under "Composition Mechanics."

Development Workflow

All three phases happen inside the container (entered via lc launch claude):

  1. Write & Debug: Run scripts directly (python src/compute.py) to iterate. Write them recipe-ready from the start: parameterize decisions, write to convention paths, one script per output.
  2. Integrate: Add recipe: blocks to outputs in astra.yaml. Track with lc status (alias / missing / stale / ok). Set container: at analysis level or per-recipe, passing either an image name (e.g., python:3.12-slim) or a path to a Containerfile (e.g., Containerfile).
  3. Materialize: lc run executes recipes inside their declared containers and writes a content-addressed manifest next to each output. You are done when lc status shows every output as ok.
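A recipe-ready script from step 1 might look like the minimal sketch below. The decision name `threshold`, the `--output-dir` flag, and the `summary.json` file name are hypothetical and chosen only for illustration. The conventions taken from this reference are that decisions reach the script as command-line flags and that each script produces one output.

```python
import argparse
import json
from pathlib import Path


def parse_args(argv=None):
    """Expose each decision as a flag so a recipe command can pass it in."""
    parser = argparse.ArgumentParser(description="Compute one output.")
    # Hypothetical decision id; a recipe command would pass
    # --threshold {decisions.threshold}.
    parser.add_argument("--threshold", type=float, default=0.5)
    # Hypothetical flag naming the directory for this script's single output.
    parser.add_argument("--output-dir", type=Path, required=True)
    return parser.parse_args(argv)


def compute(values, threshold):
    """Toy computation: keep the values at or above the threshold."""
    return [v for v in values if v >= threshold]


def main(argv=None):
    """Parse flags, run the computation, and write one JSON result file."""
    args = parse_args(argv)
    args.output_dir.mkdir(parents=True, exist_ok=True)
    kept = compute([0.1, 0.4, 0.7, 0.9], args.threshold)
    out_path = args.output_dir / "summary.json"
    out_path.write_text(json.dumps({"threshold": args.threshold, "kept": kept}))
    return out_path
```

To use this as a script, call `main()` under the usual `if __name__ == "__main__":` guard. Keeping argument parsing separate from the computation makes the same file easy to run by hand while debugging and easy to call from a recipe later.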

Bare lc run materializes every output across every universe in universes/*.yaml; pass OUTPUT_ID... to scope to specific outputs and --universe NAME to scope to one universe. Build iteratively — name one upstream output at a time (lc run <output_id> --universe <name>) so you can inspect each intermediate before chaining further downstream, rather than running the whole DAG and debugging from the bottom of a long failure trace. lc run auto-builds container images on demand, so lc build is only needed for pre-warming or forcing a rebuild with --force.

Outputs land at results/<universe>/<output_id>/, with the per-output manifest at <output_dir>/.lightcone-manifest.json. Path-rooted sub-analyses prefix the sub's path: <sub_path>/results/<universe>/<output_id>/.
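The path convention above can be sketched as a small helper. The function names are hypothetical and are not part of lc; the sketch assumes paths are relative to the project directory and only restates the layout described in this section.

```python
from pathlib import Path

# File name of the per-output manifest, as given in this reference.
MANIFEST_NAME = ".lightcone-manifest.json"


def output_dir(universe, output_id, sub_path=None):
    """Return results/<universe>/<output_id>, prefixed by sub_path if given."""
    root = Path(sub_path) if sub_path else Path()
    return root / "results" / universe / output_id


def manifest_path(universe, output_id, sub_path=None):
    """Return the manifest location inside the output directory."""
    return output_dir(universe, output_id, sub_path) / MANIFEST_NAME
```

For example, `manifest_path("baseline", "spectrum", "analyses/my_sub")` points at the manifest of the hypothetical output `spectrum` inside the sub-analysis rooted at `analyses/my_sub`.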

An output is not done until lc run produces it. Running scripts directly is for debugging only — final results must always come from lc run so they are reproducible.

Spec-Code Invariant

astra.yaml must always reflect the code and vice versa. When you change one, update the other immediately:

  • Add a decision to code? Add it to astra.yaml and all universe files.
  • Add an output or change a script? Update the recipe: block in astra.yaml.
  • Remove or rename something? Update both sides and run astra validate astra.yaml.

Status Interpretation

lc status shows each declared output's materialization state per universe. Pass --json for machine-readable output.

  • ok — Recipe exists, results on disk, manifest matches the current spec. Done.
  • stale — Recipe or decisions changed since the last run. Re-run lc run.
  • missing — Recipe exists but no manifest (never run, or output deleted). Run lc run.
  • alias — Output has no recipe of its own; produced as a side effect of an upstream output (or a from: reference into a sub-analysis). Not independently materializable.
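The triage implied by the four states can be sketched in a few lines. The record shape used here (`output`, `universe`, `status` keys) is an assumption made for illustration; the actual schema of `lc status --json` is not documented in this reference, so adapt the field names to what the command really emits.

```python
# Next step for each state, following the list above.
NEXT_STEP = {
    "ok": None,        # done
    "stale": "run",    # recipe or decisions changed: re-run
    "missing": "run",  # never run, or output deleted: run
    "alias": None,     # not independently materializable
}


def commands_to_run(records):
    """Return the lc run command lines suggested by a list of status records.

    Each record is assumed (hypothetically) to look like
    {"output": ..., "universe": ..., "status": ...}.
    Records with an unknown status are skipped.
    """
    commands = []
    for rec in records:
        if NEXT_STEP.get(rec["status"]) == "run":
            commands.append(f"lc run {rec['output']} --universe {rec['universe']}")
    return commands
```

Only stale and missing outputs produce a command; ok and alias entries need no action of their own.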

Failure Diagnosis

  • Script arg not recognized: The recipe's command template controls how decisions reach the script. Make sure each {decisions.<id>} is paired with a flag the script's argparse defines (e.g. --<id> {decisions.<id>} in the command matches parser.add_argument('--<id>') in the script).
  • Recipe input not found: Materialize upstream outputs first.
  • Undeclared placeholder error: A {decisions.<id>} or {inputs.<id>} in the recipe references something not listed in Output.decisions / Output.inputs. Add it to the Output's declaration, or remove the placeholder.
  • lc verify failure: The report names one of missing_manifest (output dir exists with no .lightcone-manifest.json), tampered_data (bytes on disk no longer hash to the recorded data_version), or broken_chain (an upstream's data_version drifted from what this output's manifest recorded). Re-run the affected output with lc run to repair.

After any failure, fix the cause, then re-run the affected output: lc run <output_id> --universe <name>.
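To catch an undeclared placeholder before running, a command template can be checked by hand. The sketch below is an illustration only, not lc's own validation code; the function name and the identifier pattern (letters, digits, underscores) are assumptions.

```python
import re

# Matches {decisions.<id>} and {inputs.<id>}; the id syntax is an assumption.
PLACEHOLDER = re.compile(r"\{(decisions|inputs)\.([A-Za-z_][A-Za-z0-9_]*)\}")


def undeclared_placeholders(command, decisions, inputs):
    """Return placeholders in command whose id is not among the declared ids."""
    declared = {"decisions": set(decisions), "inputs": set(inputs)}
    missing = []
    for kind, ident in PLACEHOLDER.findall(command):
        if ident not in declared[kind]:
            missing.append(f"{{{kind}.{ident}}}")
    return missing
```

Calling it with a recipe's command string and the ids listed under the output's decisions and inputs returns the placeholders that would need to be declared or removed; an empty list means every placeholder is covered.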

Publishing Analyses

lc export wrroc bundles the project's manifests, workflow definition, decisions, and (optionally) data files into a Workflow Run RO-Crate — a JSON-LD package readable by RO-Crate-aware archives (WorkflowHub, Zenodo's RO-Crate plugin, etc.). The lightcone manifest format on disk is unchanged; the bundle is generated on demand.

lc export wrroc                                # ./wrroc/ directory
lc export wrroc -o run.zip --zip               # zip bundle
lc export wrroc --metadata-only                # provenance graph + manifests only (no data files)
lc export wrroc -u baseline -u alt_method      # restrict to specific universes
lc export wrroc --author "Name <email@host>"   # override git config

The bundle's @graph contains a ComputationalWorkflow (the astra.yaml), one CreateAction per materialized output (with object referencing upstream datasets and external inputs, result referencing the produced dataset, and instrument referencing the recipe SoftwareApplication), PropertyValue entries for decisions and provenance metadata (code_version, data_version), and a Person for the author.
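A quick way to sanity-check an exported bundle is to count the entities in its @graph by type. The sketch below works on an already-parsed JSON-LD document; the `SAMPLE` graph and its ids are made up for illustration and do not show real lc output.

```python
from collections import Counter

# Made-up miniature graph, for illustration only.
SAMPLE = {
    "@graph": [
        {"@id": "astra.yaml", "@type": ["File", "ComputationalWorkflow"]},
        {"@id": "#run-1", "@type": "CreateAction"},
        {"@id": "#run-2", "@type": "CreateAction"},
        {"@id": "#author", "@type": "Person"},
    ]
}


def count_types(crate):
    """Count entities in a JSON-LD @graph by @type (a string or a list)."""
    counts = Counter()
    for entity in crate.get("@graph", []):
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        counts.update(types)
    return counts
```

Assuming the bundle follows the standard RO-Crate layout, the graph lives in `ro-crate-metadata.json` at the crate root, so loading that file with `json.load` and passing the result to `count_types` should report one CreateAction per materialized output.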