Run Computations

Execute automated computations with `populate()`, which calls a table's `make()` method once for each key that has not been computed yet.

Basic Usage

# Populate all missing entries
SessionAnalysis.populate()

# With progress display
SessionAnalysis.populate(display_progress=True)

Restrict What to Compute

# Only specific subjects
SessionAnalysis.populate(Subject & "sex = 'M'")

# Only recent sessions
SessionAnalysis.populate(Session & "session_date > '2026-01-01'")

# Specific key
SessionAnalysis.populate({'subject_id': 'M001', 'session_idx': 1})

Limit Number of Jobs

# Process at most 100 entries
SessionAnalysis.populate(limit=100)

Error Handling

# Continue on errors (log but don't stop)
SessionAnalysis.populate(suppress_errors=True)

# Check what failed
failed = SessionAnalysis.jobs & "status = 'error'"
print(failed)

# Clear errors to retry
failed.delete()
SessionAnalysis.populate()
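
To make the `suppress_errors` behaviour concrete, here is a toy loop in plain Python. It only sketches the idea (keep going and record what failed) and is not DataJoint's implementation; `run_all`, `flaky_make`, and the key dictionaries are hypothetical names introduced for illustration.

```python
def run_all(keys, make, suppress_errors=False):
    """Call make(key) for every key; optionally collect errors instead of stopping."""
    errors = []
    for key in keys:
        try:
            make(key)
        except Exception as exc:
            if not suppress_errors:
                raise  # default behaviour: the first failure stops the run
            errors.append((key, repr(exc)))  # record the failure and move on
    return errors


def flaky_make(key):
    # Hypothetical computation that fails for one session
    if key["session_idx"] == 2:
        raise ValueError("bad input")


keys = [{"subject_id": "M001", "session_idx": i} for i in (1, 2, 3)]
errors = run_all(keys, flaky_make, suppress_errors=True)
print(errors)
```

With `suppress_errors=False` the same call would raise on the second key, which is usually what you want while debugging `make()`.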

When to Use Distributed Mode

Choose your populate strategy based on your workload and infrastructure:

Use `populate()` (Default) When:

✅ Single worker - Only one process computing at a time
✅ Very fast computations - Each make() completes in < 1 second
✅ Small job count - Processing < 100 entries
✅ Development/testing - Iterating on make() logic

Advantages:

- Simplest approach (no job management overhead)
- Immediate execution (no reservation delay)
- Easy debugging (errors stop execution)

Example:

# Simple, direct execution
SessionAnalysis.populate()

Use `populate(reserve_jobs=True)` When:

✅ Multiple workers - Running on multiple machines or processes
✅ Computations > 1 second - Job reservation overhead (~100ms) becomes negligible
✅ Production pipelines - Need fault tolerance and monitoring
✅ Worker crashes expected - Jobs can be resumed

Advantages:

- Prevents duplicate work between workers
- Fault tolerance (crashed jobs can be retried)
- Job status tracking (`SessionAnalysis.jobs`)
- Error isolation (one failure doesn't stop others)

Example:

# Distributed mode with job coordination
SessionAnalysis.populate(reserve_jobs=True)

Job reservation overhead: ~100 ms per job.
Worth it when: computations take > 1 second (overhead drops below 10% of compute time).
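
The 10% figure is simple arithmetic, sketched below. The ~100 ms overhead is the estimate quoted above, not a measured value, and `overhead_fraction` is a hypothetical helper.

```python
def overhead_fraction(compute_seconds, overhead_seconds=0.1):
    """Reservation overhead relative to the time spent computing one job."""
    return overhead_seconds / compute_seconds


# Compare a few per-job compute times against the assumed 100 ms overhead
for seconds in (0.2, 1.0, 10.0, 600.0):
    print(f"{seconds:>6} s per job -> {overhead_fraction(seconds):.1%} overhead")
```

Measure your own per-job time before deciding; the break-even point shifts if reservation is slower on your database.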

Use `populate(reserve_jobs=True, processes=N)` When:

✅ Multi-core machine - Want to use all CPU cores
✅ CPU-bound tasks - Computations are CPU-intensive, not I/O
✅ Independent computations - No shared state between jobs

Advantages:

- Parallel execution on single machine
- No network coordination needed
- Combines job safety with parallelism

Example:

# Use 4 CPU cores
SessionAnalysis.populate(reserve_jobs=True, processes=4)

Caution: Don't use more processes than CPU cores; oversubscribing the CPU adds context-switching overhead.
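
One way to respect that limit is to clamp the requested count to what the machine reports. This is a minimal sketch; `pick_process_count` is a hypothetical helper, and `os.cpu_count()` reports logical cores, which may be more than you want to use on a shared machine.

```python
import os


def pick_process_count(requested, available=None):
    """Clamp the requested process count to the number of CPU cores."""
    if available is None:
        available = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, min(requested, available))


n = pick_process_count(8)
print(n)
# Then, with a DataJoint table in scope:
# SessionAnalysis.populate(reserve_jobs=True, processes=n)
```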

Decision Tree

How many workers?
├─ One → populate()
└─ Multiple → Continue...

How long per computation?
├─ < 1 second → populate() (overhead not worth it)
└─ > 1 second → Continue...

Need fault tolerance?
├─ Yes → populate(reserve_jobs=True)
└─ No → populate() (simpler)

Multiple cores on one machine?
└─ Yes → populate(reserve_jobs=True, processes=N)
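
The tree can also be written as a small function. This is one reading of it, not an official rule: `choose_populate_call` and its parameters are hypothetical, a job of exactly 1 second is treated as "long enough", and the last question is interpreted as "all workers are processes on one machine".

```python
def choose_populate_call(workers, seconds_per_job, need_fault_tolerance,
                         single_machine=False):
    """Return the populate() call suggested by the decision tree, as a string."""
    if workers <= 1:
        return "populate()"
    if seconds_per_job < 1:
        return "populate()"  # reservation overhead not worth it
    if not need_fault_tolerance:
        return "populate()"  # simpler
    if single_machine:
        # all workers are processes on the same multi-core machine
        return f"populate(reserve_jobs=True, processes={workers})"
    return "populate(reserve_jobs=True)"


print(choose_populate_call(workers=4, seconds_per_job=30,
                           need_fault_tolerance=True, single_machine=True))
```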

Distributed Computing

For multi-worker coordination:

# Worker 1 (on machine A)
SessionAnalysis.populate(reserve_jobs=True)

# Worker 2 (on machine B)
SessionAnalysis.populate(reserve_jobs=True)

# Workers coordinate automatically via database
# Each reserves different keys, no duplicates
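
To see why coordinating through the database prevents duplicates, consider the toy model below. It uses an in-memory SQLite table with a primary key so that only the first worker to insert a key "wins". This illustrates the general idea only; it is not DataJoint's job table or its reservation logic, and `reserve`, the `jobs` schema, and the key strings are all made up for the example.

```python
import sqlite3


def reserve(conn, job_key, worker):
    """Try to claim job_key; return True only for the first worker to insert it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO jobs (job_key, worker) VALUES (?, ?)",
                (job_key, worker),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # another worker already holds this key


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_key TEXT PRIMARY KEY, worker TEXT)")

keys = ["M001-1", "M001-2", "M002-1"]
claimed = {"A": [], "B": []}
for i, job_key in enumerate(keys):
    # Both workers try every key; alternate who gets there first
    order = ("A", "B") if i % 2 == 0 else ("B", "A")
    for worker in order:
        if reserve(conn, job_key, worker):
            claimed[worker].append(job_key)

print(claimed)
```

Each key ends up with exactly one owner, no matter how many workers attempt it.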

Check Progress

# What's left to compute
remaining = SessionAnalysis.key_source - SessionAnalysis
print(f"{len(remaining)} entries remaining")

# View job status
SessionAnalysis.jobs

The `make()` Method

@schema
class SessionAnalysis(dj.Computed):
    definition = """
    -> Session
    ---
    result : float64
    """

    def make(self, key):
        # 1. Fetch input data
        data = (Session & key).fetch1('data')

        # 2. Compute
        result = process(data)

        # 3. Insert
        self.insert1({**key, 'result': result})
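
`process` is not defined in this guide. To try the class out, a stand-in like the following would do, assuming `data` is a sequence of numbers and the result should be a single float to match `result : float64`. The choice of the mean is arbitrary.

```python
from statistics import fmean


def process(data):
    """Hypothetical stand-in: reduce a sequence of samples to its mean."""
    return fmean(data)


print(process([1.0, 2.0, 3.0, 4.0]))
```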

Three-Part Make for Long Computations

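For computations taking hours or days, split `make()` into three methods so that fetching and computing happen outside the database transaction and only the final insert runs inside it: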

@schema
class LongComputation(dj.Computed):
    definition = """
    -> RawData
    ---
    result : float64
    """

    def make_fetch(self, key, **kwargs):
        """Fetch input data (outside transaction).

        kwargs are passed from populate(make_kwargs={...}).
        Returns a tuple; its items are unpacked into make_compute.
        """
        data = (RawData & key).fetch1('data')
        return (data,)

    def make_compute(self, key, data):
        """Perform computation (outside transaction)"""
        result = expensive_computation(data)
        # Returns a tuple; its items are unpacked into make_insert
        return (result,)

    def make_insert(self, key, result):
        """Insert results (inside brief transaction)"""
        self.insert1({**key, 'result': result})

Custom Key Source

@schema
class FilteredComputation(dj.Computed):
    definition = """
    -> RawData
    ---
    result : float64
    """

    @property
    def key_source(self):
        # Only compute for high-quality data
        return (RawData & 'quality > 0.8') - self

Populate Options

| Option | Default | Description |
|--------|---------|-------------|
| `restriction` | `None` | Filter what to compute |
| `limit` | `None` | Max entries to process |
| `display_progress` | `False` | Show progress bar |
| `reserve_jobs` | `False` | Reserve jobs for distributed computing |
| `suppress_errors` | `False` | Continue on errors |
