Scispot Case Study — Temporal + FastAPI workflow builder (ELISA demo)

This is a demo-ready skeleton implementing the case study described in Scispot Case Study (1).pdf (see prompt). It runs an example 96-well ELISA analysis workflow end-to-end with pluggable backends:

S3: real AWS S3 (recommended) or mock filesystem fallback under data/s3/ (app/mocks.py)
Elasticsearch: real Docker Elasticsearch (recommended) or mock JSONL fallback under data/es/ (app/mocks.py)
Agentic report: multi-step “agent” sequence (Summarizer → Comparator → ReportWriter), with optional Cerebras LLM
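
The agent sequence can be pictured as three plain functions chained together. The following is a minimal sketch of the stub path only, not the repository's code: the function names, the dictionary fields, and the baseline comparison are all hypothetical.

```python
from statistics import mean


def summarize(wells: dict) -> dict:
    """Summarizer: reduce raw well readings to a few statistics."""
    values = list(wells.values())
    return {"n_wells": len(values), "mean_od": mean(values), "max_od": max(values)}


def compare(summary: dict, baseline_mean_od: float) -> dict:
    """Comparator: relate this run to a (hypothetical) baseline mean."""
    return {**summary, "delta_vs_baseline": summary["mean_od"] - baseline_mean_od}


def write_report(comparison: dict) -> str:
    """ReportWriter: render the comparison as one line of text."""
    direction = "higher" if comparison["delta_vs_baseline"] > 0 else "not higher"
    return (
        f"{comparison['n_wells']} wells analyzed; "
        f"mean OD {comparison['mean_od']:.2f} is {direction} than the baseline."
    )


def run_agent_sequence(wells: dict, baseline_mean_od: float) -> str:
    # Each stage consumes the previous stage's output, mirroring
    # Summarizer -> Comparator -> ReportWriter.
    return write_report(compare(summarize(wells), baseline_mean_od))
```

With an LLM enabled, any of the three stages could presumably delegate to the model instead of computing its result locally; the chaining would stay the same.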

Architecture (high level)

flowchart LR
  Client[Scientist / UI / CLI] --> API[FastAPI Service]
  API -->|Start/Query| Temporal[Temporal Cluster]
  Temporal --> Worker[Python Worker]
  Worker --> S3[(S3: AWS or local mock)]
  Worker --> ES[(Elasticsearch: Docker or local mock)]
  Worker --> LLM["Agent (Cerebras or stub)"]

What’s implemented

Workflow builder concept: workflow templates are stored as JSON with a list of reusable steps (a tiny DSL).
Temporal:
- WorkflowRunnerWorkflow: executes a template step-by-step.
- ElisaAnalysisWorkflow: the anchor scenario as a concrete workflow (also runnable directly).
FastAPI surface:
- Create/update template
- Trigger a run for an experiment
- Check run status
- Fetch outputs (analysis object + report)
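
To make the "tiny DSL" idea concrete, here is a minimal sketch of a JSON template and a runner that executes its steps in order. The schema (`template_id`, `steps`, `name`, `params`) and the two step functions are assumptions for illustration; the repository's actual template format and step names may differ.

```python
import json

# Hypothetical template; the real JSON schema is not shown in this README.
TEMPLATE_JSON = """
{
  "template_id": "elisa_v1",
  "steps": [
    {"name": "load_plate", "params": {"wells": 96}},
    {"name": "analyze", "params": {"threshold": 0.5}}
  ]
}
"""


def load_plate(ctx: dict, params: dict) -> dict:
    # Stand-in for reading a raw plate: every well gets the same reading.
    return {**ctx, "readings": [0.6] * params["wells"]}


def analyze(ctx: dict, params: dict) -> dict:
    # Count wells whose reading exceeds the configured threshold.
    positives = sum(1 for r in ctx["readings"] if r > params["threshold"])
    return {**ctx, "positives": positives}


# Reusable steps are looked up by name, so templates stay pure data.
STEP_REGISTRY = {"load_plate": load_plate, "analyze": analyze}


def run_template(template: dict, ctx: dict) -> dict:
    for step in template["steps"]:
        step_fn = STEP_REGISTRY[step["name"]]  # KeyError for an unknown step
        ctx = step_fn(ctx, step.get("params", {}))
    return ctx


result = run_template(json.loads(TEMPLATE_JSON), {"experiment_id": "EXP-123"})
```

In the project, this loop is the job of WorkflowRunnerWorkflow. How each step maps to a Temporal activity is not covered by this sketch.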

Quickstart (Docker Temporal + local Python)

Start Temporal + UI:

docker compose up -d

If you’re running this from WSL and see Docker permission errors, prefix the command with sudo (acceptable for a local demo).

Create and activate a venv, install deps:

python -m venv .venv
.\.venv\Scripts\activate (Windows; on macOS, Linux, or WSL use: source .venv/bin/activate)
pip install -r requirements.txt

Start the worker (in one terminal):

python -m app.worker

Start the API (in another terminal):

python -m app.api

Temporal UI: http://localhost:8081
API docs: http://localhost:8000/docs

Enabling a real LLM (optional)

By default, the agent step is a stub (no external LLM calls). To enable a real model call, set:

LLM_PROVIDER=cerebras
CEREBRAS_API_KEY=... (key for the Cerebras OpenAI-compatible endpoint)

Optional overrides:

CEREBRAS_MODEL (default: gpt-oss-120b)
CEREBRAS_BASE_URL (default: https://api.cerebras.ai/v1)
CEREBRAS_TIMEOUT_SECONDS (default: 30)
LLM_FALLBACK_TO_STUB (default: true): if false, LLM errors fail the agent step instead of falling back to the stub, and Temporal retries it.
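
The variables above could be read into one settings object as sketched below. This is only a sketch and not the project's loader: the `LLMSettings` class, the `"stub"` provider name used as the default, and the accepted spellings for a true value are assumptions. The variable names and defaults come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings for "true"


@dataclass(frozen=True)
class LLMSettings:
    provider: str
    api_key: Optional[str]
    model: str
    base_url: str
    timeout_seconds: float
    fallback_to_stub: bool

    @property
    def uses_real_llm(self) -> bool:
        # A real call needs both the provider switch and a key.
        return self.provider == "cerebras" and bool(self.api_key)


def load_llm_settings(env: Optional[Mapping[str, str]] = None) -> LLMSettings:
    env = os.environ if env is None else env
    return LLMSettings(
        provider=env.get("LLM_PROVIDER", "stub").strip().lower(),  # "stub" is an assumed name
        api_key=env.get("CEREBRAS_API_KEY") or None,
        model=env.get("CEREBRAS_MODEL", "gpt-oss-120b"),
        base_url=env.get("CEREBRAS_BASE_URL", "https://api.cerebras.ai/v1"),
        timeout_seconds=float(env.get("CEREBRAS_TIMEOUT_SECONDS", "30")),
        fallback_to_stub=env.get("LLM_FALLBACK_TO_STUB", "true").strip().lower() in _TRUTHY,
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to exercise in tests.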

Using real AWS S3 (optional)

By default, this project uses a local filesystem “mock S3” under data/s3/.

If you created a real S3 bucket (single bucket + prefixes), set:

S3_BACKEND=aws
S3_BUCKET=scispot-case-study (your bucket name)
AWS_REGION=us-east-2 (or your bucket’s region)

Optional:

S3_KEY_PREFIX=dev (writes to s3://<bucket>/dev/raw-plates/..., etc.)

The code will store objects under prefixes matching the existing demo buckets:

raw-plates/<experiment_id>/<run_id>/raw_plate.json
analysis-results/<experiment_id>/<run_id>/analysis.json
reports/<experiment_id>/<run_id>/report.json
runs/<workflow_run_id>/run_manifest.json
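
The key layout above, including the optional S3_KEY_PREFIX, can be expressed as a small pure function. This is a sketch only; `object_keys` and its argument names are hypothetical helpers, not functions from the repository.

```python
def _join(*parts: str) -> str:
    # Join non-empty parts with single slashes, ignoring stray leading/trailing ones.
    return "/".join(p.strip("/") for p in parts if p and p.strip("/"))


def object_keys(experiment_id: str, run_id: str, workflow_run_id: str, key_prefix: str = "") -> dict:
    """Build the object keys for one run; the same inputs always give the same keys."""
    scoped = f"{experiment_id}/{run_id}"
    return {
        "raw_plate": _join(key_prefix, "raw-plates", scoped, "raw_plate.json"),
        "analysis": _join(key_prefix, "analysis-results", scoped, "analysis.json"),
        "report": _join(key_prefix, "reports", scoped, "report.json"),
        "manifest": _join(key_prefix, "runs", workflow_run_id, "run_manifest.json"),
    }
```

Because the keys depend only on the identifiers, a retried write targets the same object as the first attempt.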

Credentials:

Use any standard AWS credential method supported by boto3 (env vars, ~/.aws/credentials, IAM role).

Using real Elasticsearch (optional)

By default, this project uses a local JSONL “mock Elasticsearch” under data/es/.
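
As a rough picture of what a JSONL-backed mock index can look like, here is a minimal sketch. The `JsonlIndex` class, its record shape (`_id`, `_source`), and the one-file-per-index layout are assumptions for illustration; the code in app/mocks.py may be organized differently.

```python
import json
import tempfile
from pathlib import Path


class JsonlIndex:
    """Append-only JSONL file standing in for one Elasticsearch index."""

    def __init__(self, root: Path, index: str):
        self.path = Path(root) / f"{index}.jsonl"
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def index_doc(self, doc_id: str, doc: dict) -> None:
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"_id": doc_id, "_source": doc}) + "\n")

    def search(self, **term_filters) -> list:
        if not self.path.exists():
            return []
        latest = {}
        with self.path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    record = json.loads(line)
                    # Later lines win, so re-indexing an id behaves like an overwrite.
                    latest[record["_id"]] = record["_source"]
        return [d for d in latest.values() if all(d.get(k) == v for k, v in term_filters.items())]


with tempfile.TemporaryDirectory() as tmp:
    idx = JsonlIndex(Path(tmp), "experiment-summaries")
    idx.index_doc("EXP-123:r1", {"experiment_id": "EXP-123", "positives": 3})
    idx.index_doc("EXP-123:r1", {"experiment_id": "EXP-123", "positives": 4})  # simulated retry
    hits = idx.search(experiment_id="EXP-123")
```

Only exact-match filters are supported here; anything resembling full-text search would need the real backend.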

For local development, you can run Elasticsearch via Docker:

docker compose up -d elasticsearch

Then enable it in .env:

ES_BACKEND=elasticsearch
ES_URL=http://localhost:9200
Optional: ES_INDEX_EXPERIMENT_SUMMARIES=experiment-summaries

Verify it’s up:

curl http://localhost:9200
curl "http://localhost:9200/_cat/indices?v"

After running a workflow, you can see indexed documents with:

curl "http://localhost:9200/experiment-summaries/_search?pretty"

Demo flow

Create a workflow template (or use the built-in elisa_v1 template).
Trigger a run for an experiment_id.
Poll status; fetch outputs when complete.

Example (curl):

curl -X POST http://localhost:8000/runs -H "Content-Type: application/json" -d "{\"template_id\":\"elisa_v1\",\"experiment_id\":\"EXP-123\"}"

Then:

curl http://localhost:8000/runs/<workflow_id>/status
curl "http://localhost:8000/runs/<workflow_id>/outputs?experiment_id=EXP-123"
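
The same three calls can be made from Python with the standard library alone. This is a sketch that assumes the endpoints shown in the curl commands and that responses are JSON. The helper names are hypothetical, and the response fields are not documented in this README, so `send` returns the parsed body unchanged.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"


def start_run_request(template_id: str, experiment_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST /runs request."""
    body = json.dumps({"template_id": template_id, "experiment_id": experiment_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/runs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_url(workflow_id: str, base_url: str = BASE_URL) -> str:
    return f"{base_url}/runs/{quote(workflow_id, safe='')}/status"


def outputs_url(workflow_id: str, experiment_id: str, base_url: str = BASE_URL) -> str:
    query = urlencode({"experiment_id": experiment_id})
    return f"{base_url}/runs/{quote(workflow_id, safe='')}/outputs?{query}"


def send(request_or_url, timeout: float = 10.0):
    """Send a request (or GET a URL) and parse the body as JSON."""
    with urllib.request.urlopen(request_or_url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the API running, `send(start_run_request("elisa_v1", "EXP-123"))` starts a run, and `send(status_url(workflow_id))` polls it once the workflow ID is known.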

Notes / trade-offs / what I’d do next

Idempotency: activities write to deterministic keys (based on run_id + experiment_id), so a retried activity overwrites its own earlier output instead of creating duplicates.
Retries: activities have Temporal retry policies; external calls are isolated to activities.
Waiting for data: in production, use a wait_for_s3_object activity with backoff and/or a Temporal signal when an uploader finishes.
Workflow builder: next step is validation + versioning, plus a UI that composes reusable steps with typed inputs/outputs.
Observability: add OpenTelemetry, structured logs with correlation IDs, and metrics per step (duration, retries).
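
The "waiting for data" idea can be sketched as a polling loop with exponential backoff. This is plain Python, not Temporal or boto3 API: `wait_for_object`, its parameters, and the injected `exists` and `sleep` callables are hypothetical. In a real setup, `exists` might wrap an S3 `head_object` call, and the loop would belong inside an activity rather than in workflow code.

```python
import time
from typing import Callable


def wait_for_object(
    exists: Callable[[], bool],
    *,
    initial_delay: float = 1.0,
    factor: float = 2.0,
    max_delay: float = 30.0,
    max_attempts: int = 8,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until exists() is true; return the attempt number that succeeded.

    Raises TimeoutError if the object is still missing after max_attempts checks.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        if exists():
            return attempt
        if attempt < max_attempts:
            sleep(delay)
            delay = min(delay * factor, max_delay)  # grow the wait, but cap it
    raise TimeoutError(f"object not found after {max_attempts} attempts")
```

Injecting `sleep` keeps the function testable without real waiting. A Temporal signal from the uploader would avoid polling altogether, at the cost of coupling the uploader to the workflow.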
