Learn how to optimize your CI/CD workflows with parallel segment execution in Kite.
Kite allows segments to run in parallel, dramatically reducing total ride execution time. Instead of waiting for each segment to finish before starting the next, multiple segments can run simultaneously.
Example time savings:
Sequential: build(2m) + test(2m) + lint(1m) + deploy(1m) = 6 minutes
Parallel: build(2m) + max(test(2m), lint(1m)) + deploy(1m) = 5 minutes
Savings: 17% faster
ride {
name = "Sequential"
flow {
segment("build")
segment("test")
segment("lint")
}
}Execution order:
build → test → lint
Total time: Sum of all segments
ride {
name = "Parallel"
flow {
segment("build")
parallel {
segment("test")
segment("lint")
}
}
}Execution order:
build
↓
test ⎫
lint ⎭ (run simultaneously)
Total time: build + max(test, lint)
Use parallel {} to group segments that should run simultaneously:
flow {
// Sequential: runs first
segment("setup")
// Parallel: all run at the same time
parallel {
segment("unit-test")
segment("integration-test")
segment("lint")
}
// Sequential: runs after all parallel segments complete
segment("deploy")
}Execution visualization:
setup
↓
unit-test ⎫
integration-test ⎬ (parallel)
lint ⎭
↓
deploy
Control how many segments run simultaneously with maxConcurrency:
ride {
name = "Limited Parallelism"
maxConcurrency = 2 // Only 2 segments at a time
flow {
parallel {
segment("test-1") // Runs immediately
segment("test-2") // Runs immediately
segment("test-3") // Waits for slot
segment("test-4") // Waits for slot
}
}
}Execution with maxConcurrency = 2:
Time →
test-1 ████████
test-2 ████████
test-3 ████████
test-4 ████████
Benefits:
- Prevents resource exhaustion
- Controls memory/CPU usage
- Maintains system stability
Default: Number of available CPU cores (if not specified)
segments {
segment("clean") {
execute {
exec("./gradlew", "clean")
}
}
segment("compile") {
dependsOn("clean")
execute {
exec("./gradlew", "compileDebugKotlin")
}
}
segment("unit-tests") {
dependsOn("compile")
execute {
exec("./gradlew", "testDebugUnitTest")
}
}
segment("instrumented-tests") {
dependsOn("compile")
execute {
exec("./gradlew", "connectedAndroidTest")
}
}
segment("lint") {
dependsOn("compile")
execute {
exec("./gradlew", "lintDebug")
}
}
segment("detekt") {
dependsOn("compile")
execute {
exec("./gradlew", "detekt")
}
}
segment("assemble") {
dependsOn("unit-tests", "instrumented-tests", "lint", "detekt")
execute {
exec("./gradlew", "assembleRelease")
}
}
}
ride {
name = "Full CI"
maxConcurrency = 3 // Run up to 3 segments in parallel
flow {
// Phase 1: Clean and compile (sequential)
segment("clean")
segment("compile")
// Phase 2: All quality checks (parallel)
parallel {
segment("unit-tests")
segment("instrumented-tests")
segment("lint")
segment("detekt")
}
// Phase 3: Final assembly (sequential)
segment("assemble")
}
}Execution timeline:
clean ████
compile ████████
unit-tests ██████
instrumented-tests ████████████
lint ████
(only 3 run at a time due to maxConcurrency=3)
assemble ██
Time savings:
- Sequential: 2 + 8 + 6 + 12 + 4 + 4 + 2 = 38 minutes
- Parallel: 2 + 8 + max(6, 12, 4, 4) + 2 = 24 minutes
- Savings: 37% faster (14 minutes saved)
✅ Good - Segments are independent:
parallel {
segment("unit-tests") // Tests code
segment("integration-tests") // Tests integration
segment("lint") // Checks style
segment("security-scan") // Scans for vulns
}❌ Bad - Segments have dependencies:
parallel {
segment("build") // Creates JAR
segment("test") // Needs JAR from build ❌
}segment("test") {
dependsOn("build") // ✅ Explicit dependency
execute { /* test code */ }
}
parallel {
segment("build") // Runs first
segment("test") // Waits for build
}Kite respects dependencies even within parallel {} blocks.
ride {
// 8 core machine, 16GB RAM
maxConcurrency = 3 // Leave resources for OS
environment {
// Each Gradle gets ~4GB
put("GRADLE_OPTS", "-Xmx4g")
}
flow {
parallel {
segment("test-1") // ~4GB RAM, 2 cores
segment("test-2") // ~4GB RAM, 2 cores
segment("test-3") // ~4GB RAM, 2 cores
}
}
}Resource calculation:
- Memory:
maxConcurrency × per-segment RAM ≤ total RAM - OS overhead - CPU:
maxConcurrency × per-segment cores ≤ total cores
flow {
segment("build")
// Group: All tests
parallel {
segment("unit-tests")
segment("integration-tests")
segment("e2e-tests")
}
// Group: All analysis
parallel {
segment("lint")
segment("detekt")
segment("ktlint")
}
segment("deploy")
}Benefits:
- Clear execution phases
- Easier to understand
- Better resource utilization
// ❌ Too high - will exhaust resources
maxConcurrency = 10 // On 4-core, 8GB machine
// ✅ Appropriate for resources
maxConcurrency = 3 // Leaves room for OS
// ✅ No limit - uses all cores
maxConcurrency = null // Default: Runtime.getRuntime().availableProcessors()Guidelines:
- CI runners: 2-3 (typically have limited resources)
- Developer machines: 4-6 (more resources available)
- Powerful servers: 8-16 (lots of resources)
flow {
segment("build")
parallel {
segment("unit-test") {
dependsOn("build")
}
segment("integration-test") {
dependsOn("build")
}
}
segment("deploy") {
dependsOn("unit-test", "integration-test")
}
}Execution:
build
├─→ unit-test ────┐
└─→ integration-test ─┘
↓
deploy
flow {
// Phase 1: Setup
segment("checkout")
segment("dependencies")
// Phase 2: Build variants in parallel
parallel {
segment("build-debug")
segment("build-release")
segment("build-staging")
}
// Phase 3: Test each variant in parallel
parallel {
segment("test-debug") {
dependsOn("build-debug")
}
segment("test-release") {
dependsOn("build-release")
}
segment("test-staging") {
dependsOn("build-staging")
}
}
// Phase 4: Deploy
segment("deploy-release") {
dependsOn("test-release")
}
}flow {
segment("build")
parallel {
// Only run if PR
segment("pr-checks") {
condition = { ctx ->
ctx.env("CI_PULL_REQUEST") != null
}
}
// Only run if main branch
segment("deploy") {
condition = { ctx ->
ctx.env("BRANCH") == "main"
}
}
// Always run
segment("tests")
}
}segments {
segment("prepare-data") {
execute {
// Split data into chunks
workspace.resolve("chunk-1.txt").toFile().writeText("data1")
workspace.resolve("chunk-2.txt").toFile().writeText("data2")
workspace.resolve("chunk-3.txt").toFile().writeText("data3")
}
}
segment("process-chunk-1") {
dependsOn("prepare-data")
execute {
val data = workspace.resolve("chunk-1.txt").toFile().readText()
// Process data1
}
}
segment("process-chunk-2") {
dependsOn("prepare-data")
execute {
val data = workspace.resolve("chunk-2.txt").toFile().readText()
// Process data2
}
}
segment("process-chunk-3") {
dependsOn("prepare-data")
execute {
val data = workspace.resolve("chunk-3.txt").toFile().readText()
// Process data3
}
}
segment("aggregate-results") {
dependsOn("process-chunk-1", "process-chunk-2", "process-chunk-3")
execute {
// Combine results
}
}
}
ride {
name = "Data Processing"
maxConcurrency = 3
flow {
segment("prepare-data")
parallel {
segment("process-chunk-1")
segment("process-chunk-2")
segment("process-chunk-3")
}
segment("aggregate-results")
}
}Formula:
Total RAM ≥ (maxConcurrency × per-segment RAM) + OS overhead
Example (16GB machine):
ride {
maxConcurrency = 3 // 3 parallel segments
environment {
// 16GB / 3 = ~5GB per segment
// Reserve 4GB for segment, 1GB buffer
put("GRADLE_OPTS", "-Xmx4g")
}
}Formula:
Optimal maxConcurrency = total_cores / cores_per_segment
Example (8 core machine):
ride {
// Each Gradle uses ~2 cores
// 8 cores / 2 = 4 segments
maxConcurrency = 4
environment {
put("org.gradle.workers.max", "2")
}
}GitHub Actions (Standard runner: 2 cores, 7GB RAM):
ride {
maxConcurrency = 2 // Conservative
environment {
put("GRADLE_OPTS", "-Xmx2g")
}
}GitLab CI (Configurable, typically 4 cores, 8GB RAM):
ride {
maxConcurrency = 3
environment {
put("GRADLE_OPTS", "-Xmx2g")
}
}Self-hosted (16 cores, 32GB RAM):
ride {
maxConcurrency = 8 // Can be aggressive
environment {
put("GRADLE_OPTS", "-Xmx3g")
}
}Amdahl's Law for parallel workflows:
Speedup = 1 / ((1 - P) + (P / N))
Where:
- P = Portion that can be parallelized (0 to 1)
- N = Number of parallel segments
Example:
- Total time: 10 minutes
- Sequential portion: 2 minutes (20%)
- Parallel portion: 8 minutes (80%)
- Parallel segments: 4
Speedup = 1 / ((1 - 0.8) + (0.8 / 4))
= 1 / (0.2 + 0.2)
= 1 / 0.4
= 2.5×
Result: 10 minutes → 4 minutes (6 minutes saved)
# Run with timing information
kite-cli ride CI --verbose
# Output shows segment durations:
✓ build (2m 30s)
✓ unit-tests (4m 15s) ← Bottleneck
✓ lint (1m 20s)
✓ detekt (0m 45s)Optimization strategies:
- Split the bottleneck into smaller segments
- Increase resources for that segment
- Cache dependencies to speed it up
- Run less frequently (e.g., only on main branch)
// ❌ Before: One long segment (10 minutes)
segment("all-tests") {
execute {
exec("./gradlew", "test") // Tests 20 modules
}
}
// ✅ After: Split into parallel segments
parallel {
segment("test-core") {
execute {
exec("./gradlew", ":core:test") // 2 minutes
}
}
segment("test-api") {
execute {
exec("./gradlew", ":api:test") // 2 minutes
}
}
segment("test-ui") {
execute {
exec("./gradlew", ":ui:test") // 3 minutes
}
}
}
// Total: 3 minutes (max of all) vs 10 minutesSymptom:
OutOfMemoryError: Java heap space
Solution:
Reduce maxConcurrency or increase memory per segment:
ride {
maxConcurrency = 2 // Reduced from 4
environment {
put("GRADLE_OPTS", "-Xmx6g") // Increased from 4g
}
}Cause 1: Dependencies prevent parallel execution
Solution: Check dependencies:
segment("test") {
dependsOn("build") // Must wait for build
}Cause 2: One segment is much longer than others
Solution: Split long segments or optimize them
Cause 3: maxConcurrency is too low
Solution: Increase if resources allow:
maxConcurrency = 4 // Increased from 2Symptom:
System is unresponsive
High CPU/memory usage
Segments taking longer than usual
Solution:
Reduce maxConcurrency:
ride {
maxConcurrency = 2 // Reduced from 4
}Preview parallel execution without running:
$ kite-cli ride CI --dry-run
Ride: CI Pipeline
═════════════════════════════════════════
Phase 1 (Sequential):
→ clean (estimated: 30s)
→ compile (estimated: 2m)
Phase 2 (Parallel, max 3):
⇉ unit-tests (estimated: 3m)
⇉ integration-tests (estimated: 5m)
⇉ lint (estimated: 1m)
⇉ detekt (estimated: 45s)
Phase 3 (Sequential):
→ assemble (estimated: 1m)
Total estimated time: 8m 30s
Sequential: 3m 30s
Parallel: 5m (bottleneck: integration-tests)
Max concurrent segments: 3
Estimated memory usage: ~12GB (3 segments × 4GB)| Feature | Usage | Benefit |
|---|---|---|
parallel {} |
Group independent segments | Run multiple segments simultaneously |
maxConcurrency |
Limit parallel segments | Control resource usage |
dependsOn() |
Declare dependencies | Ensure correct execution order |
| Resource planning | Calculate memory/CPU needs | Prevent system overload |
| Bottleneck analysis | Identify slow segments | Target optimization efforts |
Key takeaways:
- ✅ Parallelize independent segments for dramatic speedup
- ✅ Use
maxConcurrencyto prevent resource exhaustion - ✅ Respect dependencies with
dependsOn() - ✅ Monitor and optimize bottlenecks
- ✅ Plan resources: memory and CPU allocation
Typical speedup: 30-50% faster execution times
- Writing Rides - Complete ride composition guide
- Writing Segments - Segment configuration including dependencies
- Core Concepts - Understanding the execution model