
feat(cli): add --mode flag to dora build for parallel/sequential builds #1573

Closed
swar09 wants to merge 4 commits into dora-rs:main from swar09:feat-dora-build

Conversation

@swar09
Contributor

@swar09 swar09 commented Mar 29, 2026

Summary

Closes #1570

As of now, dora build dataflow.yml builds nodes one at a time. This is safe for low-spec hardware but wastes time on capable development machines. This PR adds a --mode flag to dora build so users can opt into parallel building.

Usage

# Sequential (current default behavior, backward compatible)
dora build dataflow.yml

# Explicitly sequential
dora build --mode sequential dataflow.yml

# Parallel (faster on capable machines)
dora build --mode parallel dataflow.yml
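
To make the flag semantics concrete, here is a minimal std-only sketch of how the two `--mode` values could be parsed, with sequential as the default. The `BuildMode` enum and `resolve_mode` helper are hypothetical names for illustration; the PR's CLI is defined through clap, so this is not the actual implementation.

```rust
use std::str::FromStr;

/// Hypothetical build mode; the variants mirror the `--mode` values in the usage above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum BuildMode {
    /// Build nodes one at a time (the backward-compatible default).
    #[default]
    Sequential,
    /// Build nodes concurrently; only safe when the build commands tolerate it.
    Parallel,
}

impl FromStr for BuildMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "sequential" => Ok(BuildMode::Sequential),
            "parallel" => Ok(BuildMode::Parallel),
            other => Err(format!(
                "invalid build mode `{other}` (expected `sequential` or `parallel`)"
            )),
        }
    }
}

/// Resolve the optional `--mode` value, falling back to the default when it is absent.
fn resolve_mode(arg: Option<&str>) -> Result<BuildMode, String> {
    match arg {
        Some(value) => value.parse(),
        None => Ok(BuildMode::default()),
    }
}
```

Under these assumptions, `resolve_mode(None)` yields `BuildMode::Sequential`, which is what keeps a plain `dora build dataflow.yml` backward compatible.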

Note

The feature is most valuable for use cases like dataflows with Python, Rust, and C/C++ nodes from separate projects.
It works for any combination of independent build commands.
For a pure Rust workspace with all nodes inside it, the gain is minimal because Cargo itself serializes via locks.
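
As a rough illustration of the two strategies, the sketch below models each node's build as a closure and runs the set either one at a time or on one thread per node. It uses only the standard library and hypothetical names (`BuildStep`, `build_sequential`, `build_parallel`); a real step would launch the node's build command (for example via `std::process::Command`), and the PR's own scheduler is async rather than thread-based.

```rust
use std::thread;

/// A build step modelled as a closure (hypothetical). A real step would spawn
/// the node's build command and return its name or an error message.
type BuildStep = Box<dyn FnOnce() -> Result<String, String> + Send>;

/// Run the steps one after another, stopping at the first failure.
fn build_sequential(steps: Vec<BuildStep>) -> Result<Vec<String>, String> {
    steps.into_iter().map(|step| step()).collect()
}

/// Start every step on its own thread, then wait for all of them.
/// Results keep the input order; the first error in that order is returned.
fn build_parallel(steps: Vec<BuildStep>) -> Result<Vec<String>, String> {
    let handles: Vec<_> = steps
        .into_iter()
        .map(|step| thread::spawn(step))
        .collect();
    handles
        .into_iter()
        .map(|handle| {
            handle
                .join()
                .unwrap_or_else(|_| Err("build thread panicked".to_string()))
        })
        .collect()
}
```

The parallel variant only helps when the steps really are independent. If they contend for the same lock, as Cargo builds within one workspace do, the threads simply wait on each other.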

@swar09
Contributor Author

swar09 commented Mar 31, 2026

I have figured out the issue that caused the CI failure; I will update the PR soon.

Collaborator

@phil-opp phil-opp left a comment


Thanks for the PR! This seems like a good idea.

We had parallel builds as default in the past, but this led to some issues with some build systems, so we chose the safer sequential build order instead. But making this configurable is even better.

// Run build on local machine
#[clap(long, action)]
local: bool,
/// Build mode sequential or parallel. Defaults to sequential


We should note here that parallel builds should only be used when the build commands support this.

Comment on lines -159 to +173
- tracing::info!("Building locally, as requested through `--force-local`");
+ log::info!("Building locally, as requested through `--force-local`");


Why the change from tracing to log? (same below)

@heyong4725
Collaborator

Thanks for this PR @swar09, and especially for filing the underlying issue #1570 — your diagnosis was exactly right: building nodes one at a time wastes wall-clock time on capable dev machines, and the user needed a way to opt into parallel execution.

Closing this PR because the feature has already shipped in main as part of the dora 1.0 consolidation (commit 145ccce0). The --parallel flag is now available on dora build:

dora build --parallel dataflow.yml

Quick comparison so you can see where the in-tree version differs from your design:

| Aspect | This PR | Main today |
| --- | --- | --- |
| Flag | --mode <sequential\|parallel> | --parallel (bool) |
| Concurrency | futures::stream::FuturesUnordered | tokio::task::JoinSet |
| Single-task short-circuit | always parallel path when chosen | guarded on tasks.len() > 1 |
| Trace log when active | none | tracing::info!("Building N nodes in parallel") |
| Examples threading | 11 example run.rs updated to pass param | not needed (async refactor handled it) |

The main differences are:

  • JoinSet over FuturesUnordered: For build steps that shell out to cargo / maturin / cmake (CPU-bound subprocesses), JoinSet spawns real tokio tasks that a multi-threaded runtime can schedule on independent worker threads. FuturesUnordered polls all of its futures from the single task that owns it, so the child processes still run concurrently, but any in-process work around them (such as handling stdout/stderr) stays on one thread. Same intent, different runtime behavior.
  • Bool over enum: closer to the cargo build --jobs N convention; one flag instead of an enum whose variants the user has to remember.
  • Async-throughout refactor: the 1.0 rewrite made the build path fully async (no block_on calls), which removed the need to thread the parallel param through example run.rs files.
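
For readers who want to picture the `--parallel` bool together with the `tasks.len() > 1` guard, here is a minimal sketch. It is not the code on main: it swaps in std threads for `tokio::task::JoinSet` and `eprintln!` for `tracing::info!`, and the names `Task`, `PathTaken`, and `run_tasks` are hypothetical. Only the log message and the guard condition come from the comparison table.

```rust
use std::thread;

/// A build task modelled as a closure (hypothetical; not dora's actual type).
type Task = Box<dyn FnOnce() -> Result<(), String> + Send>;

/// Which code path `run_tasks` took, returned so callers can observe the guard.
#[derive(Debug, PartialEq, Eq)]
enum PathTaken {
    Sequential,
    Parallel,
}

fn run_tasks(tasks: Vec<Task>, parallel: bool) -> (PathTaken, Result<(), String>) {
    // Guard: with zero or one task, the parallel machinery gains nothing.
    if parallel && tasks.len() > 1 {
        // Stand-in for the tracing::info! call listed in the comparison.
        eprintln!("Building {} nodes in parallel", tasks.len());
        let handles: Vec<_> = tasks.into_iter().map(|task| thread::spawn(task)).collect();
        let mut result = Ok(());
        for handle in handles {
            // Join every thread so no build is left running; keep the first error.
            let outcome = handle
                .join()
                .unwrap_or_else(|_| Err("build task panicked".to_string()));
            if result.is_ok() {
                result = outcome;
            }
        }
        (PathTaken::Parallel, result)
    } else {
        // Sequential path: run in order and stop at the first failure.
        let result = tasks.into_iter().try_for_each(|task| task());
        (PathTaken::Sequential, result)
    }
}
```

With this shape, passing the flag for a dataflow that has a single node falls through to the sequential path, so the flag is harmless when there is nothing to parallelize.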

None of those design choices invalidate your approach — both implementations solve the same problem with the same trade-offs around the user's machine capability. The 1.0 consolidation just happened to land a parallel scheduler concurrent with your work.

What carries forward from this PR:

  • The issue diagnosis (Add Configurable Build Mode Parallel vs Sequential #1570) is the durable artifact. The feature exists because you filed it; the implementation that shipped is functionally identical to what you proposed.
  • The note in your PR description ("most valuable for use cases like dataflows with Python, Rust, and C/C++ nodes from separate projects") is the exact reason this matters. Pure-cargo workspaces gain little (cargo serializes via locks), but cross-language dataflows with independent build commands can keep several cores busy at once. This is worth surfacing in the CLI help text or docs; a follow-up should be filed if it is not already there.

Thanks again — this kind of "obvious-once-you-think-about-it" usability win is great to get reports on. Apologies for the slow turn-around; if you have more proposals like this, please keep them coming, ideally in a fresh PR against the current main since the build subsystem has changed substantially since March.


🤖 Closing comment from Claude on behalf of the maintainers as part of the backlog triage pass.



Development

Successfully merging this pull request may close these issues.

Add Configurable Build Mode Parallel vs Sequential

3 participants