Scispot Case Study — Temporal + FastAPI workflow builder (ELISA demo)

This is a demo-ready skeleton implementing the case study described in Scispot Case Study (1).pdf (see prompt). It runs an example 96-well ELISA analysis workflow end-to-end with pluggable backends:

  • S3: real AWS S3 (recommended) or mock filesystem fallback under data/s3/ (app/mocks.py)
  • Elasticsearch: real Docker Elasticsearch (recommended) or mock JSONL fallback under data/es/ (app/mocks.py)
  • Agentic report: multi-step “agent” sequence (Summarizer → Comparator → ReportWriter), with optional Cerebras LLM

Architecture (high level)

flowchart LR
  Client[Scientist / UI / CLI] --> API[FastAPI Service]
  API -->|Start/Query| Temporal[Temporal Cluster]
  Temporal --> Worker[Python Worker]
  Worker --> S3[(S3: AWS or local mock)]
  Worker --> ES[(Elasticsearch: Docker or local mock)]
  Worker --> LLM["Agent (Cerebras or stub)"]

What’s implemented

  • Workflow builder concept: workflow templates are stored as JSON with a list of reusable steps (a tiny DSL).
  • Temporal:
    • WorkflowRunnerWorkflow: executes a template step-by-step.
    • ElisaAnalysisWorkflow: the anchor scenario as a concrete workflow (also runnable directly).
  • FastAPI surface:
    • Create/update template
    • Trigger a run for an experiment
    • Check run status
    • Fetch outputs (analysis object + report)

Quickstart (Docker Temporal + local Python)

  1. Start Temporal + UI:
docker compose up -d

If you’re running this from WSL and see Docker permission errors, run the command with sudo (a shortcut that is fine for a demo).

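  2. Create and activate a venv, then install dependencies:
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt

The activate command above is for Windows. On Linux, macOS, or WSL, use source .venv/bin/activate instead.

  3. Start the worker (in one terminal):
python -m app.worker
  4. Start the API (in another terminal):
python -m app.api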
  • Temporal UI: http://localhost:8081
  • API docs: http://localhost:8000/docs

Enabling a real LLM (optional)

By default, the agent step is a stub (no external LLM calls). To enable a real model call, set:

  • LLM_PROVIDER=cerebras and CEREBRAS_API_KEY=... (uses the Cerebras OpenAI-compatible endpoint)

Optional overrides:

  • CEREBRAS_MODEL (default: gpt-oss-120b)
  • CEREBRAS_BASE_URL (default: https://api.cerebras.ai/v1)
  • CEREBRAS_TIMEOUT_SECONDS (default: 30)
  • LLM_FALLBACK_TO_STUB (default: true) — if false, the workflow will fail (and Temporal will retry) on LLM errors.
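
The settings above could be read into a single config object along these lines. This is a sketch, not the project's actual loader: the `LLMConfig` class, the `load_llm_config` function, the provider name `"stub"` for the default mode, and the set of strings accepted as "true" are all assumptions. Only the variable names and default values come from the list above.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMConfig:
    provider: str            # "stub" is an assumed name for the default no-LLM mode
    api_key: Optional[str]
    model: str
    base_url: str
    timeout_seconds: float
    fallback_to_stub: bool


def load_llm_config(env: Mapping[str, str]) -> LLMConfig:
    """Build an LLMConfig from an environment-like mapping, applying defaults."""
    provider = env.get("LLM_PROVIDER", "stub").strip().lower()
    api_key = env.get("CEREBRAS_API_KEY") or None
    if provider == "cerebras" and api_key is None:
        raise ValueError("LLM_PROVIDER=cerebras requires CEREBRAS_API_KEY")
    # Assumption: "1", "true", and "yes" (case-insensitive) all mean enabled.
    fallback_raw = env.get("LLM_FALLBACK_TO_STUB", "true").strip().lower()
    return LLMConfig(
        provider=provider,
        api_key=api_key,
        model=env.get("CEREBRAS_MODEL", "gpt-oss-120b"),
        base_url=env.get("CEREBRAS_BASE_URL", "https://api.cerebras.ai/v1"),
        timeout_seconds=float(env.get("CEREBRAS_TIMEOUT_SECONDS", "30")),
        fallback_to_stub=fallback_raw in {"1", "true", "yes"},
    )
```

In an application this would typically be called once at startup as `load_llm_config(os.environ)`; taking a mapping instead of reading `os.environ` directly keeps the function easy to test.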

Using real AWS S3 (optional)

By default, this project uses a local filesystem “mock S3” under data/s3/.

If you created a real S3 bucket (single bucket + prefixes), set:

  • S3_BACKEND=aws
  • S3_BUCKET=scispot-case-study (your bucket name)
  • AWS_REGION=us-east-2 (or your bucket’s region)

Optional:

  • S3_KEY_PREFIX=dev (writes to s3://<bucket>/dev/raw-plates/..., etc.)

The code will store objects under prefixes matching the existing demo buckets:

  • raw-plates/<experiment_id>/<run_id>/raw_plate.json
  • analysis-results/<experiment_id>/<run_id>/analysis.json
  • reports/<experiment_id>/<run_id>/report.json
  • runs/<workflow_run_id>/run_manifest.json
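
The key layout above, including the optional `S3_KEY_PREFIX`, can be expressed as a small helper. The function name `build_object_keys` and the dictionary labels are hypothetical; only the key patterns themselves come from the list above.

```python
from typing import Dict


def build_object_keys(
    experiment_id: str,
    run_id: str,
    workflow_run_id: str,
    key_prefix: str = "",
) -> Dict[str, str]:
    """Return the S3 object keys for one run, optionally under a key prefix."""
    base = key_prefix.strip("/")

    def with_prefix(key: str) -> str:
        # An empty prefix leaves the key unchanged; otherwise prepend "<prefix>/".
        return f"{base}/{key}" if base else key

    run_path = f"{experiment_id}/{run_id}"
    return {
        "raw_plate": with_prefix(f"raw-plates/{run_path}/raw_plate.json"),
        "analysis": with_prefix(f"analysis-results/{run_path}/analysis.json"),
        "report": with_prefix(f"reports/{run_path}/report.json"),
        "manifest": with_prefix(f"runs/{workflow_run_id}/run_manifest.json"),
    }


keys = build_object_keys("EXP-123", "run-1", "wf-1", key_prefix="dev")
```

Because the keys depend only on the IDs and the prefix, re-running the same step writes to the same object rather than creating a duplicate.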

Credentials:

  • Use any standard AWS credential method supported by boto3 (env vars, ~/.aws/credentials, IAM role).

Using real Elasticsearch (optional)

By default, this project uses a local JSONL “mock Elasticsearch” under data/es/.

For local development, you can run Elasticsearch via Docker:

docker compose up -d elasticsearch

Then enable it in .env:

  • ES_BACKEND=elasticsearch
  • ES_URL=http://localhost:9200
  • Optional: ES_INDEX_EXPERIMENT_SUMMARIES=experiment-summaries

Verify it’s up:

curl http://localhost:9200
curl "http://localhost:9200/_cat/indices?v"

After running a workflow, you can see indexed documents with:

curl "http://localhost:9200/experiment-summaries/_search?pretty"

Demo flow

  1. Create a workflow template (or use the built-in elisa_v1 template).
  2. Trigger a run for an experiment_id.
  3. Poll status; fetch outputs when complete.

Example (curl):

curl -X POST http://localhost:8000/runs -H "Content-Type: application/json" -d "{\"template_id\":\"elisa_v1\",\"experiment_id\":\"EXP-123\"}"

Then:

curl http://localhost:8000/runs/<workflow_id>/status
curl "http://localhost:8000/runs/<workflow_id>/outputs?experiment_id=EXP-123"
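
The same three calls can be scripted from Python with only the standard library. This is a sketch under assumptions: the helper names are hypothetical, and the README does not document the response bodies, so the `"workflow_id"` field read in `main()` is a guess at the shape of the `POST /runs` response.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"


def build_run_request(
    template_id: str, experiment_id: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST /runs request with the same JSON body as the curl example."""
    body = json.dumps({"template_id": template_id, "experiment_id": experiment_id})
    return urllib.request.Request(
        f"{base_url}/runs",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_url(workflow_id: str, base_url: str = BASE_URL) -> str:
    return f"{base_url}/runs/{quote(workflow_id, safe='')}/status"


def outputs_url(workflow_id: str, experiment_id: str, base_url: str = BASE_URL) -> str:
    query = urlencode({"experiment_id": experiment_id})
    return f"{base_url}/runs/{quote(workflow_id, safe='')}/outputs?{query}"


def fetch_json(request_or_url, timeout: float = 10.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request_or_url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def main() -> None:
    # Requires the API from the Quickstart to be running on localhost:8000.
    started = fetch_json(build_run_request("elisa_v1", "EXP-123"))
    # ASSUMPTION: the response carries the ID under "workflow_id".
    workflow_id = started["workflow_id"]
    print(fetch_json(status_url(workflow_id)))
    print(fetch_json(outputs_url(workflow_id, "EXP-123")))
```

Calling `main()` runs the full trigger, status, and outputs sequence once; a real client would poll the status endpoint until the run completes before fetching outputs.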

Notes / trade-offs / what I’d do next

  • Idempotency: activities write deterministic keys (based on run_id + experiment_id) and overwrite safely.
  • Retries: activities have Temporal retry policies; external calls are isolated to activities.
  • Waiting for data: in production, use a wait_for_s3_object activity with backoff and/or a Temporal signal when an uploader finishes.
  • Workflow builder: next step is validation + versioning, plus a UI that composes reusable steps with typed inputs/outputs.
  • Observability: add OpenTelemetry, structured logs with correlation IDs, and metrics per step (duration, retries).
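
As an illustration of the "waiting for data" note, a polling helper with exponential backoff might look like the sketch below. The name `wait_for_object`, its parameters, and its defaults are assumptions, not the project's code. In production the `exists` callable could wrap an S3 existence check (for example boto3's `head_object`), and the whole function would run inside a Temporal activity; alternatively, the backoff could be delegated to the activity's Temporal retry policy.

```python
import time
from typing import Callable


def wait_for_object(
    exists: Callable[[], bool],
    *,
    initial_delay: float = 1.0,
    backoff: float = 2.0,
    max_delay: float = 30.0,
    max_attempts: int = 8,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll exists() until it returns True; return the number of attempts used.

    Raises TimeoutError if the object is still missing after max_attempts checks.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        if exists():
            return attempt
        if attempt < max_attempts:
            # Wait before the next check, growing the delay up to max_delay.
            sleep(delay)
            delay = min(delay * backoff, max_delay)
    raise TimeoutError(f"object not found after {max_attempts} attempts")
```

Passing `sleep` as a parameter lets tests substitute a recorder for `time.sleep`, so the backoff schedule can be checked without actually waiting.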