Backend service to sync data between Tator and Voxel51 for quickly editing localizations and downstream model iteration.
See https://docs.mbari.org/internal/ai/videos/voxel51demo.gif for a demo of the community Voxel51 tool.
Supports both Voxel51 Community and Voxel51 Enterprise sync. Syncing is done by version through a simple applet.
Example Voxel51 embedding/grid view. Samples can be refined by lassoing an embedding cluster, or by filtering on metadata (e.g. depth, label, confidence):
```mermaid
flowchart TD
    subgraph Browser["Browser (User) "]
        UI["Tator Dashboard\n(Hosted Template Applet)"]
        FOTab["FiftyOne App\n(New Browser Tab)"]
    end
    subgraph TatorBackend["Tator Backend (Docker)"]
        Gunicorn["Tator / Gunicorn"]
    end
    subgraph FiftyOneSync["fiftyone-sync Service (port 8001)"]
        API["FastAPI\nmain.py"]
        Launcher["Launcher Template\nlauncher_template.py"]
        EmbedSvc["Embedding Service\nembedding_service.py"]
        DBMgr["Database Manager\ndatabase_manager.py"]
        SyncQueue["Sync Queue\nsync_queue.py"]
        SyncWorker["Sync Worker\nsync_worker.py"]
        SyncLogic["Sync Logic\nsync.py"]
    end
    subgraph ExternalServices["External Services"]
        Tator["Tator REST API"]
        FastVSS["Fast-VSS\n(Embedding Service)\nport 8000"]
        Redis["Redis\n(Job Queue)"]
        MongoDB["MongoDB\n(FiftyOne DB)"]
        FOApp["FiftyOne App\n(port 515x per project)"]
        S3["AWS S3\n(optional crop storage)"]
    end

    %% Tator fetches the launcher template
    Gunicorn -->|"GET /render"| API
    API --> Launcher

    %% User interactions
    UI -->|"GET /launch\nGET /versions\nPOST /sync\nPOST /sync-to-tator"| API
    UI -->|"Open FiftyOne"| FOTab
    FOTab -->|"HTTP port 515x"| FOApp

    %% Sync flow
    API -->|"enqueue job"| SyncQueue
    SyncQueue -->|"job"| Redis
    Redis -->|"dequeue job"| SyncWorker
    SyncWorker --> SyncLogic
    SyncLogic -->|"fetch media\n& localizations"| Tator
    SyncLogic -->|"write dataset"| MongoDB
    SyncLogic -->|"launch app"| FOApp
    SyncLogic -->|"upload crops (optional)"| S3

    %% Embeddings flow
    API -->|"POST /embed\nGET /embed/{uuid}"| EmbedSvc
    EmbedSvc -->|"POST /embeddings/{project}/\nWS /ws/predict/job/{id}/{project}"| FastVSS

    %% Database / config
    API --> DBMgr
    DBMgr -->|"URI / port lookup"| MongoDB

    %% Status polling
    UI -->|"GET /sync/status/{job_id}"| API
    API -->|"poll job status"| Redis
```
- Embedding API: Delegates to Fast-VSS (`http://localhost:8000/embeddings/{project}/`)
  - `POST /embed` - Submit images (multipart/form-data) + project, returns UUID
  - `GET /embed/{uuid}` - Poll for results (job status from Fast-VSS via WebSocket `/ws/predict/job/{job_id}/{project}`)
  - Set `FASTVSS_API_URL` env var to override Fast-VSS base URL
- Port isolation: One FiftyOne App instance per Tator project (one port per project)
  - Port = 5151 + (project_id - 1)
- MongoDB isolation: One MongoDB (`containers/fiftyone-sync`); per-project DB `fiftyone_project_{id}` (override via `FIFTYONE_DATABASE_NAME` or `database_name` query param on `/launch` and `/sync`).
- Launcher (HostedTemplate): `/render` (Open FiftyOne + Sync from Tator), `/launch`, `/sync` + `/sync/status/{job_id}`, `/recompute-crops` (+ status/logs), `/sync-to-tator`, `/versions`. Token entered in applet via Verify Token; FiftyOne opens in a new tab (`iframe_host` = app host).
- Sync queue (Redis): Background worker: `python -m src.app.sync_worker`. Env: `REDIS_HOST`, `REDIS_PORT`, `REDIS_PASSWORD`, `REDIS_USE_SSL`, or `REDIS_URL`.
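
The port rule in the list above can be sketched as a small helper. This is only an illustration, assuming the base port is 5151 as stated; the function names `port_for_project` and `project_for_port` are hypothetical and are not taken from `database_manager.py`.

```python
BASE_PORT = 5151  # assumed to mirror database_manager.BASE_PORT


def port_for_project(project_id: int, base_port: int = BASE_PORT) -> int:
    """Return the FiftyOne App port for a Tator project (IDs start at 1)."""
    if project_id < 1:
        raise ValueError("project_id must be >= 1")
    return base_port + (project_id - 1)


def project_for_port(port: int, base_port: int = BASE_PORT) -> int:
    """Inverse mapping: recover the project ID from a FiftyOne App port."""
    if port < base_port:
        raise ValueError("port is below the base port")
    return port - base_port + 1


print(port_for_project(1))  # 5151
print(port_for_project(2))  # 5152
```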
The service is intended to be run via the compose stack, which starts MongoDB (community only) and the API:

```bash
# From repo root
docker compose -f containers/fiftyone-sync/compose.yaml up -d
```

API: http://localhost:8001. Optional env: copy `containers/fiftyone-sync/.env.example` to `containers/fiftyone-sync/.env` to set `FASTVSS_API_URL`, `REDIS_HOST`, etc.
For local iteration (no Docker for the API):

```bash
cd services/fiftyone-sync
export FIFTYONE_DATABASE_URI=mongodb://localhost:27017
uvicorn src.app.main:app --host 0.0.0.0 --port 8001
```

Use a venv and install deps first: `python -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt`. Start MongoDB separately (e.g. `docker compose -f containers/fiftyone-sync/compose.yaml up -d mongo`).
You can optionally sync crop images (localization crops, not full images) from a Tator sync to an S3 bucket and build a FiftyOne dataset. The sync worker uploads the crops directory to S3 (layout: s3://bucket/prefix/media_stem/elemental_id.png), then lists the bucket with FiftyOne storage and creates a second dataset (suffix _raw). Parent folder in S3 is used as the sample label.
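
The S3 layout described above can be illustrated with a short sketch. The helper names `crop_s3_key` and `label_from_key` and the sample values are hypothetical; how the sync worker actually builds keys is not shown here, so treat this only as a reading of the `prefix/media_stem/elemental_id.png` layout.

```python
from pathlib import PurePosixPath


def crop_s3_key(prefix: str, media_stem: str, elemental_id: str) -> str:
    """Build the object key for one crop: prefix/media_stem/elemental_id.png."""
    parts = [p for p in (prefix.strip("/"), media_stem, f"{elemental_id}.png") if p]
    return "/".join(parts)


def label_from_key(key: str) -> str:
    """Return the parent folder name of an object key (used as the sample label)."""
    return PurePosixPath(key).parent.name


key = crop_s3_key("fiftyone/raw", "dive_0042", "a1b2c3")
print(key)                  # fiftyone/raw/dive_0042/a1b2c3.png
print(label_from_key(key))  # dive_0042
```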
Per-project S3 is configured in the same YAML file used for database URIs. Set the path via FIFTYONE_SYNC_CONFIG_PATH (e.g. config.yml or an absolute path). Under each project key, add optional s3_bucket and s3_prefix:
```yaml
# config.yml (or the file pointed to by FIFTYONE_SYNC_CONFIG_PATH)
projects:
  "my-project-name":
    vss_project: "optional-vss-project"
    s3_bucket: "my-bucket"        # required for S3 upload
    s3_prefix: "fiftyone/raw"     # optional; S3 prefix (folder) under the bucket
    databases:
      - uri: "mongodb://localhost:27017"
        port: 5151
```

- `s3_bucket`: When set, sync uploads crops (not full images) and builds a `_raw` dataset from S3. The bucket is created if missing.
- `s3_prefix`: Optional key prefix (e.g. `fiftyone/raw`).
Project keys must match the Tator project name, not the project ID.
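
As a rough sketch of the lookup this implies, the function below reads the S3 settings for a project from an already-parsed config dictionary (for example, the result of `yaml.safe_load`). The name `s3_settings` is hypothetical and the service's own loader may differ.

```python
from typing import Optional, Tuple


def s3_settings(config: dict, project_name: str) -> Optional[Tuple[str, str]]:
    """Return (bucket, prefix) for a project, or None when S3 upload is not configured."""
    project = (config.get("projects") or {}).get(project_name)
    if not project or not project.get("s3_bucket"):
        return None
    return project["s3_bucket"], (project.get("s3_prefix") or "").strip("/")


# Parsed form of a config like the one above
config = {
    "projects": {
        "my-project-name": {"s3_bucket": "my-bucket", "s3_prefix": "fiftyone/raw"}
    }
}

print(s3_settings(config, "my-project-name"))  # ('my-bucket', 'fiftyone/raw')
print(s3_settings(config, "4"))                # None: keys are project names, not IDs
```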
Worker needs s3:PutObject (and list). Use env vars (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION), IAM role on AWS, or AWS CLI default profile. Upload uses aws s3 sync when available, else boto3.
S3 bucket/prefix fields appear after Verify Token when s3_bucket is in config; values are pre-filled and overridable before Load from Tator.
Sync requires Redis and a running sync worker.
Docker: Use REDIS_HOST=redis (and network to Tator’s Redis) or run Redis and set REDIS_HOST in containers/fiftyone-sync/.env. Run the sync worker in a separate container or on the host with the same env.
Development:
1. Start Redis (e.g. `docker run -d --name redis -p 6379:6379 redis:7` or use Tator compose Redis).
2. Start the API with Redis:

   ```bash
   cd services/fiftyone-sync
   export FIFTYONE_DATABASE_URI=mongodb://localhost:27017
   export REDIS_HOST=localhost
   uvicorn src.app.main:app --host 0.0.0.0 --port 8001
   ```
3. Start the sync worker (another terminal, same env):

   ```bash
   cd services/fiftyone-sync
   source .venv/bin/activate
   export REDIS_HOST=localhost
   export FIFTYONE_DATABASE_URI=mongodb://localhost:27017
   python -m src.app.sync_worker
   ```
4. Trigger sync (e.g. from the Tator dashboard Sync from Tator button, or via curl):

   ```bash
   curl -X POST "http://localhost:8001/sync?project_id=1&api_url=https://your-tator.example.com&token=YOUR_TOKEN"
   ```

   Response is `{"job_id":"...", "status":"queued", "port":...}`.
5. Poll status (replace `JOB_ID` with the returned `job_id`):

   ```bash
   curl "http://localhost:8001/sync/status/JOB_ID"
   ```

   Keep polling until `"status":"finished"` (then `result` has `dataset_name`, `sample_count`) or `"status":"failed"` (then `error` is set).
6. In the browser, after clicking Sync from Tator you should see “Sync queued…”, then “Sync in progress…”, then “Sync done. Opening FiftyOne…” without the tab freezing.
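
The polling step above can also be scripted. The sketch below uses only the standard library and assumes the status payload has the shape described in the steps (`status`, plus `result` or `error`); the names `fetch_status` and `wait_for_job`, the polling interval, and the timeout are illustrative choices, not part of the service.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_status(base_url: str, job_id: str) -> dict:
    """GET /sync/status/{job_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/sync/status/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(get_status: Callable[[], dict], interval_s: float = 2.0,
                 timeout_s: float = 600.0) -> dict:
    """Poll until the job is 'finished' or 'failed'; raise on failure or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status.get("status") == "finished":
            return status.get("result", {})
        if status.get("status") == "failed":
            raise RuntimeError(status.get("error", "sync failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError("sync job did not complete in time")
        time.sleep(interval_s)
```

A call would look like `wait_for_job(lambda: fetch_status("http://localhost:8001", job_id))`. Passing the status getter as a callable keeps the loop easy to exercise without a running service.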
This service exposes a Jinja2 template for Tator Hosted Templates. When an applet uses it, Tator fetches the template and renders it with template parameters. The dashboard shows Open FiftyOne (opens the app in a new tab) and sync controls (when sync_service_url and api_url are set). Users enter their Tator API token in the applet and click Verify Token to verify it; the Version dropdown and Sync from Tator / Sync to Tator buttons then become enabled.
1. In Tator, go to Organizations in the main menu and open your organization.
2. In the left sidebar, under Hosted Templates, click + Add new.
3. Fill in:
   - Name: e.g. `FiftyOne Viewer`
   - URL: `http://<fiftyone-sync-host>:<port>/render`. For a minimal test (message only), use `/message` instead and add a template parameter `message` (e.g. `A is for Apple`). Example: `http://localhost:8001/render` (use the URL where this service is running).
   - Headers: Leave empty unless the service requires auth.
   - Template parameters (optional defaults):
     - `base_port`: `5151`
     - `iframe_host`: host for the FiftyOne app URL (same host you use for Tator; not `host.docker.internal`).
     - `message`, `config_yaml`: optional header text / YAML exposed as `window.FIFTYONE_CONFIG_YAML`.
     - Sync: set `sync_service_url` and `api_url` (no `token` template parameter). The user enters the token in the applet and clicks Verify Token; optional `version_id` to preselect.
4. Click Save.
1. Open Project Settings for the project where you want the FiftyOne dashboard.
2. In the left sidebar, click Applets → + Add new.
3. Set Name (e.g. `FiftyOne`) and Description.
4. Leave HTML File blank.
5. Under Hosted Template, select the template you created (e.g. `FiftyOne Viewer`).
6. Under Template parameters, add:
   - name: `project`
   - value: the project ID (same as this project’s ID).

   Set `iframe_host` to the host you use to open Tator.
7. Click Save.
Go to the project, then Analytics → Dashboards. Open the applet. Click Open FiftyOne to open the viewer in a new tab, or Sync from Tator first to sync data and then open the viewer.
| Setting | Value | Who uses it |
|---|---|---|
| URL (Hosted Template) | `http://host.docker.internal:8001/render` | Tator (in Docker) fetches the template from the host. On Docker Desktop (Mac/Windows) use `host.docker.internal`. On Linux use the host IP (e.g. `172.17.0.1:8001`) or run sync in Docker on the same network. |
| `base_port` | `5151` | Must match `database_manager.BASE_PORT`. |
| `iframe_host` | `localhost` | Host for the FiftyOne app URL when opening in a new tab. |
| `sync_service_url` | `http://localhost:8001` | Required for the "Sync from Tator" button; same machine as Tator from the user's perspective. |
| `api_url` | `http://localhost:8080` | Sync service calls Tator's API; from the host, Tator is at `localhost:8080`. |
Hosted Template URL is fetched by gunicorn (must be reachable from Tator’s container). Use http://host.docker.internal:8001/render when sync runs on the host; set iframe_host to localhost (browsers cannot resolve host.docker.internal). If both run in Docker, use the service name on a shared network (e.g. http://fiftyone-sync:8001/render).
Single MongoDB (launched via containers/fiftyone-sync); each Tator project gets its own database for isolation. One port per project: project 1 → 5151, project 2 → 5152, etc.
| Env var | Purpose |
|---|---|
| `FIFTYONE_DATABASE_URI` | MongoDB connection URI (default `mongodb://localhost:27017`). Override for remote/alternative MongoDB. |
| `FIFTYONE_DATABASE_DEFAULT` | Prefix for per-project database names. Default `fiftyone_project` → `fiftyone_project_1`, `fiftyone_project_2`, etc. |
| `FIFTYONE_DATABASE_NAME` | Override: use this single database name for all projects (ignores default pattern). |
Optional query param database_name on GET /launch and POST /sync overrides the database for that project.
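
One way to read these settings together is as a precedence chain. The sketch below assumes the `database_name` query parameter wins over `FIFTYONE_DATABASE_NAME`, which in turn wins over the `FIFTYONE_DATABASE_DEFAULT` prefix; that ordering is an interpretation of the table above, and `resolve_database_name` is a hypothetical name rather than the service's own function.

```python
import os
from typing import Mapping, Optional


def resolve_database_name(project_id: int,
                          query_database_name: Optional[str] = None,
                          env: Mapping[str, str] = os.environ) -> str:
    """Pick the MongoDB database name for a project (assumed precedence)."""
    # 1. Per-request override via the database_name query parameter
    if query_database_name:
        return query_database_name
    # 2. One fixed database name shared by all projects
    fixed = env.get("FIFTYONE_DATABASE_NAME")
    if fixed:
        return fixed
    # 3. Prefix plus project ID, e.g. fiftyone_project_2
    prefix = env.get("FIFTYONE_DATABASE_DEFAULT") or "fiftyone_project"
    return f"{prefix}_{project_id}"


print(resolve_database_name(2, env={}))                                      # fiftyone_project_2
print(resolve_database_name(2, env={"FIFTYONE_DATABASE_NAME": "shared"}))    # shared
print(resolve_database_name(2, "scratch", env={"FIFTYONE_DATABASE_NAME": "shared"}))  # scratch
```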
POST /sync fetches Tator media and localizations, crops bounding boxes, builds a FiftyOne dataset, and launches the FiftyOne app.
| Param | Required | Description |
|---|---|---|
| `project_id` | yes | Tator project ID |
| `api_url` | yes | Tator REST API base URL |
| `token` | yes | Tator API token |
| `version_id` | no | Version ID filter for localizations |
| `database_name` | no | Override MongoDB database name |
| `config_path` | no | Path to YAML/JSON config file for dataset build |
| `launch_app` | no | Launch FiftyOne app after sync (default: true) |
Sync-from-Tator flow (dashboard): The launcher template calls POST /sync, then opens the FiftyOne app in a new tab. The "Open Voxel51" link always points to the base URL from FIFTYONE_APP_PUBLIC_BASE_URL (e.g. https://cortex.shore.mbari.org/fiftyone/) — no port or dataset path is appended. The dataset name is always project_name + "_v" + version_id + "_" + port (e.g. MyProject_v66_5151). It cannot be overridden via config.
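
The naming rule can be written out as a one-line helper, which is handy when a script needs to open the dataset created by a sync. The function name `dataset_name` is illustrative only.

```python
def dataset_name(project_name: str, version_id: int, port: int) -> str:
    """Compose the dataset name as project_name + "_v" + version_id + "_" + port."""
    return f"{project_name}_v{version_id}_{port}"


print(dataset_name("MyProject", 66, 5151))  # MyProject_v66_5151
```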
Use `config_path` to pass a config file path:

```yaml
media_id_batch_size: 200               # chunk size for get_media_list_by_id
localization_batch_size: 5000          # page size for localization list API
include_classes: [Larvacean, Copepod]  # optional: filter labels
image_extensions: ["*.png", "*.jpg"]
max_samples: 500                       # optional: limit for testing
```

The FiftyOne dataset name is always `project_name_v{version_id}_{port}` and cannot be set in config.
You can optionally compute embeddings, a UMAP 2D visualization, and a similarity index after the dataset is built. Embeddings are fetched from the embed service at {service_url}/embed/{project}. By default {project} is the Tator project ID (many services expect this). Set embeddings.project_name in config to use a project name or other key instead. Add an embeddings block to your config and pass it via config_path:
```yaml
embeddings:
  embeddings_field: embeddings   # field to store embedding vectors
  brain_key: umap_viz            # base FiftyOne brain key; UMAP is stored under `${brain_key}_umap`
  umap_seed: 51
  force_embeddings: false        # set true to recompute embeddings
  force_umap: false              # set true to recompute UMAP
  # Similarity search (fob.compute_similarity): omit or set to "" to disable
  similarity_brain_key: similarity_cosine
  similarity_metric: cosine      # e.g. cosine, euclidean
  force_similarity: false        # set true to recompute similarity index
  batch_size: 32                 # batch size for embed service requests
  service_url: null              # optional; default FASTVSS_API_URL or http://localhost:8000
  project_name: null             # optional; override for embed service URL path (default: project_id)
```

To recompute dimensionality reduction without re-embedding, use the launcher Recompute Dimreduce button (or `POST /dimreduce`). UMAP is stored under `${brain_key}_umap`; PCA and t-SNE are stored under `${brain_key}_pca` and `${brain_key}_tsne`.
Requirements: The embed service must be running (e.g. Fast-VSS at the URL above). Set `FASTVSS_API_URL` to override the base URL. For UMAP visualization, install `umap-learn` in the sync service venv:

```bash
pip install umap-learn
```

If the embed service is unavailable or UMAP is not installed, sync still runs; embeddings/UMAP/similarity are skipped and a message is logged. Embeddings, UMAP, and similarity results are cached on the dataset; use `force_embeddings` / `force_umap` / `force_similarity` to recompute.
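
To make the URL and brain-key conventions concrete, the sketch below resolves the embed endpoint and the derived brain keys from an `embeddings` config block. It assumes the fallbacks described above (`service_url`, then `FASTVSS_API_URL`, then `http://localhost:8000`; `project_name`, then the project ID). The function names `embed_url` and `brain_keys` are hypothetical.

```python
import os
from typing import Mapping, Optional


def embed_url(project_id: int, embeddings_cfg: Optional[dict] = None,
              env: Mapping[str, str] = os.environ) -> str:
    """Build {service_url}/embed/{project} using the documented fallbacks."""
    cfg = embeddings_cfg or {}
    base = cfg.get("service_url") or env.get("FASTVSS_API_URL") or "http://localhost:8000"
    project = cfg.get("project_name") or str(project_id)
    return f"{base.rstrip('/')}/embed/{project}"


def brain_keys(brain_key: str = "umap_viz") -> dict:
    """Map each dimensionality-reduction method to its stored brain key."""
    return {method: f"{brain_key}_{method}" for method in ("umap", "pca", "tsne")}


print(embed_url(4, env={}))                              # http://localhost:8000/embed/4
print(embed_url(4, {"project_name": "plankton"}, env={}))  # http://localhost:8000/embed/plankton
print(brain_keys()["umap"])                              # umap_viz_umap
```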
- Media: `/tmp/fiftyone_sync_project_{id}/download/{media_id}_{name}.jpg`
- Localizations: `/tmp/fiftyone_sync_project_{id}/localizations.jsonl` (JSONL)
- Crops: `/tmp/fiftyone_sync_project_{id}/crops/{media_stem}/{elemental_id}.png`
Labels come from attributes.Label (or attributes.label) in localizations.
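
A minimal sketch of that label lookup, applied to JSONL records, is shown below. It assumes each line is one JSON object with an `attributes` mapping; the sample records and the `elemental_id` field in them are made up for illustration, and `label_of` / `read_localizations` are hypothetical names.

```python
import json
from typing import Iterable, Iterator, Optional


def label_of(localization: dict) -> Optional[str]:
    """Return attributes.Label, falling back to attributes.label."""
    attrs = localization.get("attributes") or {}
    return attrs.get("Label") or attrs.get("label")


def read_localizations(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSONL: one JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


sample = [
    '{"elemental_id": "a1", "attributes": {"Label": "Copepod"}}',
    '{"elemental_id": "b2", "attributes": {"label": "Larvacean"}}',
]
print([label_of(loc) for loc in read_localizations(sample)])  # ['Copepod', 'Larvacean']
```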
POST /sync-to-tator pushes FiftyOne dataset edits (labels, confidence) back to Tator localizations. Run after editing in the FiftyOne app.
The endpoint is asynchronous: it enqueues an RQ job and returns immediately so a long bulk PATCH loop against Tator cannot block other HTTP requests. The RQ worker (python -m src.app.sync_worker) executes the push. Requires Redis (set REDIS_HOST or REDIS_URL).
| Param | Required | Description |
|---|---|---|
| `project_id` | yes | Tator project ID |
| `version_id` | yes | Tator version ID (localizations must be in this version) |
| `api_url` | yes | Tator REST API base URL |
| `token` | yes | Tator API token |
| `port` | yes | Port for this project (resolves database) |
| `dataset_name` | no | FiftyOne dataset name (default: `project_name_v{version_id}_{port}`) |
| `label_attr` | no | Tator attribute name for label (default `Label`) |
| `score_attr` | no | Tator attribute name for score/confidence; omit to skip |
| `force_sync` | no | Push all samples regardless of timestamps |
```bash
# 1. Enqueue
curl -X POST "http://localhost:8001/sync-to-tator?project_id=4&version_id=1&port=5151&api_url=https://tator.example.com&token=YOUR_TOKEN"
# {"job_id": "abc...", "status": "queued", "port": 5151}

# 2. Poll status (queued|started|finished|failed)
curl "http://localhost:8001/sync-to-tator/status/abc..."
# when finished: {"status": "finished", "result": {"status": "ok", "updated": N, "skipped": K, "failed": M, "errors": [...]}}

# 3. Tail logs while running
curl "http://localhost:8001/sync-to-tator/logs/abc..."
```

Only one push runs at a time per (database, project, version); a second push for the same target returns `{"status": "busy", ...}` instead of competing for Tator writes.
Backpressure tuning (set on the sync service; restart to apply):
| Env var | Default | Description |
|---|---|---|
| `FIFTYONE_SYNC_TO_TATOR_FETCH_CHUNK` | `100` | Elemental-id resolve chunk size (PUT by ids) |
| `FIFTYONE_SYNC_TO_TATOR_PATCH_CHUNK` | `100` | Bulk PATCH chunk size (`update_localization_list`) |
| `FIFTYONE_SYNC_TO_TATOR_CHUNK_DELAY_MS` | `0` | Sleep between PATCH chunks in milliseconds to smooth write bursts |
If Tator becomes unresponsive during pushes, lower the chunk sizes and/or raise the inter-chunk delay (for example FIFTYONE_SYNC_TO_TATOR_PATCH_CHUNK=50, FIFTYONE_SYNC_TO_TATOR_CHUNK_DELAY_MS=200).
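
The effect of these two knobs can be pictured with a generic chunk-and-sleep loop. This is a sketch of the general technique, not the worker's actual code: `chunked` and `push_in_chunks` are hypothetical helpers, and `patch` stands in for whatever call performs the bulk PATCH.

```python
import time
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence, size: int) -> Iterator[List]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def push_in_chunks(updates: Sequence, patch: Callable[[List], None],
                   chunk_size: int = 100, delay_ms: int = 0) -> int:
    """Send updates chunk by chunk, sleeping between chunks; return the count sent."""
    sent = 0
    for index, chunk in enumerate(chunked(updates, chunk_size)):
        if index and delay_ms > 0:
            time.sleep(delay_ms / 1000.0)  # pause before every chunk after the first
        patch(chunk)
        sent += len(chunk)
    return sent


calls = []
print(push_in_chunks(list(range(250)), calls.append, chunk_size=100))  # 250
print([len(c) for c in calls])  # [100, 100, 50]
```

Smaller chunks shorten each write burst, and the delay spaces bursts apart, at the cost of a longer total push.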
```bash
# Submit batch of images to the sync service (project maps to Fast-VSS /embeddings/{project}/)
curl -X POST http://localhost:8001/embed \
  -F "files=@image1.jpg" \
  -F "files=@image2.jpg" \
  -F "project=testproject"
# Returns: {"uuid": "..."}

# Fetch results
curl http://localhost:8001/embed/{uuid}
# Returns: {"status": "completed", "embeddings": [[...], [...]]}
```


