tkathawala/scispot-case-study

Scispot Case Study — Temporal + FastAPI workflow builder (ELISA demo)

This is a demo-ready skeleton implementing the case study described in Scispot Case Study (1).pdf (see prompt). It runs an example 96-well ELISA analysis workflow end-to-end with pluggable backends:

  • S3: real AWS S3 (recommended) or mock filesystem fallback under data/s3/ (app/mocks.py)
  • Elasticsearch: real Docker Elasticsearch (recommended) or mock JSONL fallback under data/es/ (app/mocks.py)
  • Agentic report: multi-step “agent” sequence (Summarizer → Comparator → ReportWriter), with optional Cerebras LLM

Architecture (high level)

flowchart LR
  Client[Scientist / UI / CLI] --> API[FastAPI Service]
  API -->|Start/Query| Temporal[Temporal Cluster]
  Temporal --> Worker[Python Worker]
  Worker --> S3[(S3: AWS or local mock)]
  Worker --> ES[(Elasticsearch: Docker or local mock)]
  Worker --> LLM["Agent (Cerebras or stub)"]

What’s implemented

  • Workflow builder concept: workflow templates are stored as JSON with a list of reusable steps (a tiny DSL).
  • Temporal:
    • WorkflowRunnerWorkflow: executes a template step-by-step.
    • ElisaAnalysisWorkflow: the anchor scenario as a concrete workflow (also runnable directly).
  • FastAPI surface:
    • Create/update template
    • Trigger a run for an experiment
    • Check run status
    • Fetch outputs (analysis object + report)
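To make the template idea concrete, here is a minimal sketch of what a JSON template and a step-by-step runner could look like. The step names (`parse_plate`, `fit_curve`, `summarize`), their parameters, the `STEP_REGISTRY` dict, and `run_template` are hypothetical and not taken from the repository. In the real project each step would presumably be a Temporal activity invoked from WorkflowRunnerWorkflow rather than a plain function call.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical template: an ordered list of named steps with parameters.
TEMPLATE_JSON = """
{
  "template_id": "elisa_v1",
  "steps": [
    {"name": "parse_plate", "params": {"wells": 96}},
    {"name": "fit_curve", "params": {"model": "4PL"}},
    {"name": "summarize", "params": {}}
  ]
}
"""

Context = Dict[str, Any]
StepFn = Callable[[Context, Dict[str, Any]], Context]


# Illustrative step implementations; each takes the running context and
# its own params, and returns an updated context.
def parse_plate(ctx: Context, params: Dict[str, Any]) -> Context:
    return {**ctx, "wells": params["wells"]}


def fit_curve(ctx: Context, params: Dict[str, Any]) -> Context:
    return {**ctx, "model": params["model"]}


def summarize(ctx: Context, params: Dict[str, Any]) -> Context:
    return {**ctx, "summary": f"{ctx['wells']} wells fitted with {ctx['model']}"}


# Maps step names used in templates to reusable implementations.
STEP_REGISTRY: Dict[str, StepFn] = {
    "parse_plate": parse_plate,
    "fit_curve": fit_curve,
    "summarize": summarize,
}


def run_template(template: Dict[str, Any], ctx: Context) -> Context:
    """Execute the template's steps in order, threading the context through."""
    for step in template["steps"]:
        fn = STEP_REGISTRY.get(step["name"])
        if fn is None:
            raise ValueError(f"unknown step: {step['name']}")
        ctx = fn(ctx, step.get("params", {}))
    return ctx


result = run_template(json.loads(TEMPLATE_JSON), {"experiment_id": "EXP-123"})
```

Keeping the registry separate from the template is what makes steps reusable: a new template only needs to reference existing step names.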

Quickstart (Docker Temporal + local Python)

  1. Start Temporal + UI:
docker compose up -d

If you’re running this from WSL and see Docker permission errors, run the command with sudo (for demo simplicity).

  2. Create and activate a venv, install deps:
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt

The activate command above is for Windows. On Linux, macOS, or WSL, use source .venv/bin/activate instead.

  3. Start the worker (in one terminal):
python -m app.worker
  4. Start the API (in another terminal):
python -m app.api

Once both are running:

  • Temporal UI: http://localhost:8081
  • API docs: http://localhost:8000/docs

Enabling a real LLM (optional)

By default, the agent step is a stub (no external LLM calls). To enable a real model call, set:

  • LLM_PROVIDER=cerebras + CEREBRAS_API_KEY=... (Cerebras OpenAI-compatible endpoint; model default gpt-oss-120b)

Optional overrides:

  • CEREBRAS_MODEL (default: gpt-oss-120b)
  • CEREBRAS_BASE_URL (default: https://api.cerebras.ai/v1)
  • CEREBRAS_TIMEOUT_SECONDS (default: 30)
  • LLM_FALLBACK_TO_STUB (default: true) — if false, the workflow will fail (and Temporal will retry) on LLM errors.
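The settings above could be read and applied roughly as follows. This is a sketch, not the repository's code: `LLMConfig`, `load_llm_config`, `generate`, and the default provider name `"stub"` are assumptions introduced for illustration. Only the environment variable names and their defaults come from this README.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class LLMConfig:
    provider: str
    api_key: Optional[str]
    model: str
    base_url: str
    timeout_seconds: float
    fallback_to_stub: bool


def load_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Read the documented variables, applying the documented defaults."""
    return LLMConfig(
        provider=env.get("LLM_PROVIDER", "stub"),  # "stub" is an assumed name
        api_key=env.get("CEREBRAS_API_KEY"),
        model=env.get("CEREBRAS_MODEL", "gpt-oss-120b"),
        base_url=env.get("CEREBRAS_BASE_URL", "https://api.cerebras.ai/v1"),
        timeout_seconds=float(env.get("CEREBRAS_TIMEOUT_SECONDS", "30")),
        fallback_to_stub=env.get("LLM_FALLBACK_TO_STUB", "true").strip().lower() == "true",
    )


def generate(
    prompt: str,
    cfg: LLMConfig,
    call_llm: Callable[[str, LLMConfig], str],
    stub: Callable[[str], str],
) -> str:
    """Use the real model when configured; otherwise (or on error) use the stub."""
    if cfg.provider != "cerebras" or not cfg.api_key:
        return stub(prompt)
    try:
        return call_llm(prompt, cfg)
    except Exception:
        if cfg.fallback_to_stub:
            return stub(prompt)
        # Re-raise so the surrounding activity fails and Temporal can retry it.
        raise
```

Passing `call_llm` and `stub` in as callables keeps the fallback logic testable without network access.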

Using real AWS S3 (optional)

By default, this project uses a local filesystem “mock S3” under data/s3/.

If you created a real S3 bucket (single bucket + prefixes), set:

  • S3_BACKEND=aws
  • S3_BUCKET=scispot-case-study (your bucket name)
  • AWS_REGION=us-east-2 (or your bucket’s region)

Optional:

  • S3_KEY_PREFIX=dev (writes to s3://<bucket>/dev/raw-plates/..., etc.)

The code will store objects under prefixes matching the existing demo buckets:

  • raw-plates/<experiment_id>/<run_id>/raw_plate.json
  • analysis-results/<experiment_id>/<run_id>/analysis.json
  • reports/<experiment_id>/<run_id>/report.json
  • runs/<workflow_run_id>/run_manifest.json
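The key layout above, including the optional S3_KEY_PREFIX, can be expressed as a small helper. The function name `build_keys` and the dictionary labels are hypothetical; only the key patterns come from this README.

```python
from typing import Dict, Optional


def build_keys(
    experiment_id: str,
    run_id: str,
    workflow_run_id: str,
    key_prefix: Optional[str] = None,
) -> Dict[str, str]:
    """Return the object keys for one run, optionally under a prefix such as 'dev'."""
    base = {
        "raw_plate": f"raw-plates/{experiment_id}/{run_id}/raw_plate.json",
        "analysis": f"analysis-results/{experiment_id}/{run_id}/analysis.json",
        "report": f"reports/{experiment_id}/{run_id}/report.json",
        "manifest": f"runs/{workflow_run_id}/run_manifest.json",
    }
    # Normalize the prefix so "dev", "dev/" and "/dev" all behave the same.
    prefix = (key_prefix or "").strip("/")
    if not prefix:
        return base
    return {name: f"{prefix}/{key}" for name, key in base.items()}


keys = build_keys("EXP-123", "run-1", "wf-1", key_prefix="dev")
```

Because the keys depend only on the identifiers, re-running the same step writes to the same object, which is what makes overwrites safe.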

Credentials:

  • Use any standard AWS credential method supported by boto3 (env vars, ~/.aws/credentials, IAM role).

Using real Elasticsearch (optional)

By default, this project uses a local JSONL “mock Elasticsearch” under data/es/.

For local development, you can run Elasticsearch via Docker:

docker compose up -d elasticsearch

Then enable it in .env:

  • ES_BACKEND=elasticsearch
  • ES_URL=http://localhost:9200
  • Optional: ES_INDEX_EXPERIMENT_SUMMARIES=experiment-summaries

Verify it’s up:

curl http://localhost:9200
curl "http://localhost:9200/_cat/indices?v"

After running a workflow, you can see indexed documents with:

curl "http://localhost:9200/experiment-summaries/_search?pretty"
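For comparison, the JSONL mock backend mentioned at the start of this section might work along these lines. This is a guess at the general shape, not the contents of app/mocks.py: the `JsonlIndex` class, its methods, and the document fields are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


class JsonlIndex:
    """A file-backed stand-in for one index: one JSON document per line."""

    def __init__(self, root: Path, index: str) -> None:
        root.mkdir(parents=True, exist_ok=True)
        self.path = root / f"{index}.jsonl"

    def index_doc(self, doc: Dict[str, Any]) -> None:
        # Append-only write; each document occupies exactly one line.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(doc) + "\n")

    def search(self, **term_filters: Any) -> List[Dict[str, Any]]:
        """Return documents whose fields equal every given filter value."""
        if not self.path.exists():
            return []
        hits = []
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                doc = json.loads(line)
                if all(doc.get(k) == v for k, v in term_filters.items()):
                    hits.append(doc)
        return hits


# Demo in a throwaway directory (the project itself uses data/es/).
with tempfile.TemporaryDirectory() as tmp:
    idx = JsonlIndex(Path(tmp), "experiment-summaries")
    idx.index_doc({"experiment_id": "EXP-123", "status": "ok"})
    idx.index_doc({"experiment_id": "EXP-456", "status": "ok"})
    hits = idx.search(experiment_id="EXP-123")
```

A linear scan like this is fine for demo-sized data; it offers none of Elasticsearch's full-text or aggregation features.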

Demo flow

  1. Create a workflow template (or use the built-in elisa_v1 template).
  2. Trigger a run for an experiment_id.
  3. Poll status; fetch outputs when complete.

Example (curl):

curl -X POST http://localhost:8000/runs -H "Content-Type: application/json" -d "{\"template_id\":\"elisa_v1\",\"experiment_id\":\"EXP-123\"}"

Then:

curl http://localhost:8000/runs/<workflow_id>/status
curl "http://localhost:8000/runs/<workflow_id>/outputs?experiment_id=EXP-123"
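The polling step can also be scripted. The sketch below uses only the standard library and the status endpoint shown above. The response shape is an assumption: this README does not document it, so the `"status"` field and the terminal values `"COMPLETED"` and `"FAILED"` are placeholders to adjust against the API docs at /docs.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Tuple

BASE_URL = "http://localhost:8000"


def http_get_json(url: str) -> Dict[str, Any]:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def poll_status(
    workflow_id: str,
    get_json: Callable[[str], Dict[str, Any]] = http_get_json,
    interval: float = 2.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    terminal: Tuple[str, ...] = ("COMPLETED", "FAILED"),  # assumed values
) -> Dict[str, Any]:
    """Poll the status endpoint until the run reaches a terminal state."""
    url = f"{BASE_URL}/runs/{workflow_id}/status"
    deadline = time.monotonic() + timeout
    while True:
        body = get_json(url)
        if body.get("status") in terminal:  # assumed field name
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {workflow_id} did not finish within {timeout}s")
        sleep(interval)
```

With the API running, `poll_status("<workflow_id>")` would block until the run finishes, after which the outputs endpoint can be fetched the same way with `http_get_json`.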

Notes / trade-offs / what I’d do next

  • Idempotency: activities write deterministic keys (based on run_id + experiment_id) and overwrite safely.
  • Retries: activities have Temporal retry policies; external calls are isolated to activities.
  • Waiting for data: in production, use a wait_for_s3_object activity with backoff and/or a Temporal signal when an uploader finishes.
  • Workflow builder: next step is validation + versioning, plus a UI that composes reusable steps with typed inputs/outputs.
  • Observability: add OpenTelemetry, structured logs with correlation IDs, and metrics per step (duration, retries).
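As an illustration of the "waiting for data" note, a wait-with-backoff helper might look like the sketch below. The name `wait_for_object` and its parameters are hypothetical. The existence check is passed in as a callable so the example stays independent of any S3 client; with boto3 it could wrap a `head_object` call, and in the real project this logic would live inside a Temporal activity.

```python
import time
from typing import Callable


def wait_for_object(
    exists: Callable[[], bool],
    initial_delay: float = 1.0,
    backoff: float = 2.0,
    max_delay: float = 30.0,
    max_attempts: int = 8,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Check until the object exists, sleeping with exponential backoff.

    Returns the number of attempts used; raises TimeoutError if the object
    never appears within max_attempts checks.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        if exists():
            return attempt
        if attempt < max_attempts:
            sleep(delay)
            # Grow the delay geometrically, capped at max_delay.
            delay = min(delay * backoff, max_delay)
    raise TimeoutError(f"object not found after {max_attempts} attempts")
```

Polling is the simpler option; a Temporal signal sent by the uploader avoids the repeated checks entirely, at the cost of coupling the uploader to the workflow.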
