Skip to content

fix(producer): derive duration from sub-composition timing#1680

Merged
miguel-heygen merged 1 commit into
mainfrom
fix/zero-duration-subcomp-fallback
Jun 24, 2026
Merged

fix(producer): derive duration from sub-composition timing#1680
miguel-heygen merged 1 commit into
mainfrom
fix/zero-duration-subcomp-fallback

Conversation

@miguel-heygen

Copy link
Copy Markdown
Collaborator

Summary

  • getDeclaredDuration() in the bridge script now computes max(data-start + data-duration) across all sub-compositions when the root element has no explicit data-duration
  • Fixes renders that fail with "zero duration" when sub-compositions carry their own timing but the root doesn't declare an overall duration

Test plan

  • fileServer tests pass (39/39)
  • Build, lint, typecheck pass

@vanceingalls vanceingalls left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — fix(producer): derive duration from sub-composition timing

Verdict: LGTM with two small NITs.

Head SHA reviewed: f69b78c19dc819f6af24eba86d2f7f46a2eb9ce6. Base: main. Tiny surgical fix (+9/-1, single file).


What the change actually does

In packages/producer/src/services/fileServer.ts:473-486, the bridge script's getDeclaredDuration() previously returned 0 whenever the root [data-composition-id] had no positive data-duration. The fix adds a fallback: walk all [data-composition-src] sub-composition wrappers and compute max(data-start + data-duration). This fires only after the runtime path returns 0 (see the call site at fileServer.ts:529-530: return d > 0 ? d : getDeclaredDuration();), so the precedence chain is player.getDuration() → root data-duration → NEW sub-comp aggregation → 0.


Source-of-truth audit (Pattern #2) — OK

I traced the parallel duration resolution in packages/core/src/runtime/timeline.ts:276-345. The player-side already does a richer aggregation: attribute → timeline → media-window → composition-window, with a loop-inflation guard. That path is the authoritative one and stays untouched. The new bridge fallback is only reached when both p.getDuration() AND root data-duration already report 0 — i.e. the broken state the PostHog dashboard surfaced. No two-source-of-truth conflict; the player path wins whenever it has any signal.

Cross-PR coexistence — OK

  • HF #1662 (init.ts:432-444, merged) made per-clip isTimedElementVisibleAt prefer authored over live-timeline duration. Different scope (clip visibility window) from this PR (root composition duration). Same philosophy though — explicit-authored beats derived. No interaction.
  • HF #1652 (merged) added the stream-duration parity check (ffprobe audio/video stream durations within MAX_STREAM_DRIFT_SECONDS = 0.5 of each other). That gate runs post-render against the produced MP4 streams, not against getDeclaredDuration directly. The new fallback gives the engine a non-zero render envelope which sub-comps then drive — A/V parity should be a function of the sub-comps' own muxes, unchanged. No conflict.

Pattern #3 (silent scope gap) — minor NIT

The walk is document.querySelectorAll('[data-composition-src]'). That excludes the root (which has data-composition-id but no -src, correctly) and also excludes any non-comp-slot timed siblings — e.g. captions tracks (registry/examples/product-promo/index.html shows the canonical layout: captions get data-composition-id + data-start + data-duration with no data-composition-src). For the PostHog cohort the bug was clearly sub-comp-based, so this matches symptom. But if a composition has ONLY captions-style siblings and no root data-duration, the fallback still returns 0. Worth considering whether the query should be '[data-start][data-duration]' (any timed child) instead of '[data-composition-src]' (sub-comp wrappers only) — see NIT 1 below.

Pattern #10 (decorative gate) — N/A

The fallback isn't gated on a populated-elsewhere signal; it's a pure DOM read. No populate-path to verify.


NITs (non-blocking)

  1. Broader query? [data-composition-src] works for the observed sub-comp case but skips captions/track-only siblings. Consider [data-start][data-duration] (or union both selectors). Up to you — the conservative selector is the safe pick if you only want this firing for the exact failing cohort.
  2. No unit test for the new branch. packages/producer/src/services/fileServer.test.ts:528-568 has a test for the root-attribute path (data-duration = "15") but not for the new sub-comp aggregation path. A jsdom-style test mirroring the existing one — root with no data-duration, two sub-comp wrappers at start=0,dur=2 and start=2,dur=3, expect __hf.duration === 5 — would lock the regime in. Small lift since the harness is right there.

CI

CI is effectively green. The single regression failure (shard-8) is a GitHub LFS infra hiccup (Fatal error: We couldn't respond to your request in time on git lfs fetch, retried 3×, never reached test execution — No files were found with the provided path: packages/producer/tests/*/failures/). Other shards were cancelled by fail-fast, not by their own failures. Worth a retry but not a code issue. All CI-package checks (Build, Lint, Typecheck, Test, Fallow audit, Studio smoke, CLI smoke, Windows render, runtime contract) are SUCCESS.

Review by Via

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R1 review @ HEAD f69b78c1

PostHog symptom: renders failing with "zero duration" when the root composition lacks data-duration but sub-comps carry their own timing. Fix: when root duration is absent or non-positive, derive max(data-start + data-duration) across all [data-composition-src] elements. Symptom-suppress with a fallback path; root cause (root element ships without data-duration) is also addressed at the consumer (the bridge can now make progress without depending on the authoring tool to set it).

Findings

Where What
🟡 fileServer.ts:481-487 When the fallback fires, no signal is emitted. The bridge script runs in the rendered iframe at producer time — a console.warn here would surface in the producer log + downstream PostHog cohort, letting oncall distinguish "duration came from root attr" from "duration was derived from sub-comp scan." Especially useful since the fallback might mask a different upstream bug (e.g. authoring tool stopped emitting data-duration on root) the team would want to know about.
🟡 fileServer.ts:482 querySelectorAll('[data-composition-src]') selects ALL sub-comp hosts, not just direct children of the root. If a sub-comp host's data-start is relative to a parent sub-comp (nested authoring), start + duration overcounts vs the timeline's actual end. At HEAD I don't see runtime code that produces nested-sub-comp shapes in the bridge's iframe — by the time the bridge runs, sub-comps have been inlined via inlineSubCompositions and timing is flattened. Worth a brief PR-body sentence confirming data-start is root-relative at bridge time, or a code comment, so the assumption is load-bearing-visible.
🟡 fileServer.ts:481-487 Inverse / parity asymmetry: the new fallback diverges from the existing parsing convention elsewhere in the codebase. core/src/runtime/timeline.ts:22,203,266 use parseNum(...), while this code uses Number(...) || 0. The behavior is functionally equivalent for the values that appear in practice (numeric strings) but the || 0 collapses NaN and a literal 0 and negative-zero into 0 — fine here, just inconsistent. If parseNum was added precisely to handle some edge case (locale comma, scientific notation, etc.), this fallback skips it. Worth a quick check that the divergence is intentional.
🟡 tests PR body claims fileServer tests pass (39/39) but the new branch (lines 482-487) has no test coverage in the diff. The bridge script is a string template, not directly testable as JS, but a test that asserts the rendered string contains the new fallback shape, or an integration test that drives a no-data-duration-on-root composition through the producer and asserts the resolved duration matches max(start+duration), would pin the regression.
🟢 fileServer.ts:483-487 Loop shape is correct: start || 0 + dur || 0 is safe; if (dur > 0) correctly skips entries that have a start but no duration (would otherwise inflate maxEnd).
🟢 unit semantics data-duration is seconds throughout the codebase (startResolver.test.ts, timeline.test.ts, runtime parseNum calls) and p.getDuration() also returns seconds. New code stays in the same unit.

CI

All required green except regression + regression-shards (shard-8) — both failed in actions/checkout (git lfs fetch → "batch response: Fatal error"). LFS infra flake, not real test failures. Rerun should clear.

Verdict

LGTM-with-followups. The fallback is correctly shaped and addresses a real production-blocking case. Two real polish items: (1) emit a log/telemetry when the fallback fires so the PostHog dashboard that surfaced this bug doesn't go dark on the suppressed shape, (2) brief test or PR-body sentence pinning the nested-sub-comp / data-start-is-root-relative invariant. Not stamping autonomously — tagging <@U0ARJFN5S6Q> for stamp once flakes clear.

— Rames D Jusso

@miguel-heygen miguel-heygen force-pushed the fix/zero-duration-subcomp-fallback branch from f69b78c to 5ddbd6a Compare June 23, 2026 23:26

@vanceingalls vanceingalls left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R2 re-review — fix(producer): derive duration from sub-composition timing

Re-reviewed at 5ddbd6ada55b7f9df8b48688982d91fd5cb21e5a (force-push over f69b78c1). Diff vs prior SHA: fileServer.ts is byte-identical; only fileServer.test.ts gained 38 lines.

Verdict: LGTM. One R1 NIT addressed, one not — and the not-addressed one stays a NIT, not a blocker.


Per-item disposition

  1. R1 NIT 1 — selector widening (captions-style siblings): Not addressed. The walk at fileServer.ts:478 is still document.querySelectorAll('[data-composition-src]'). Captions-shape children (data-composition-id + data-start + data-duration with no -src) would still leave the fallback at 0 if the root also lacks data-duration. This was a NIT, not a blocker, in R1 — the PostHog cohort is sub-comp-shaped, and Miguel's choice of the conservative selector matches the symptom. I'm fine leaving it as-is; flagging for the followup ticket if the captions-only case ever lights up in PostHog.

  2. R1 NIT 2 — no test for the new branch: Addressed. New test at fileServer.test.ts:571-607: root querySelector returns null for data-duration, querySelectorAll returns two sub-comp wrappers at start=0,dur=5 + start=5,dur=8, asserts __hf.duration === 13. Same jsdom-style harness as the existing readiness test at :528-569. Locks the regime in. Nicely shaped.

  3. R1 cross-PR coexistence (#1662, #1652): Re-verified. init.ts:432-444 (HF #1662) at main is untouched — still scopes per-element isTimedElementVisibleAt with authored-timing dominance. This PR scopes root-composition duration. No interaction. #1652 ffprobe parity is post-render against MP4 streams, also untouched.

  4. Fallback ordering re-verified: Same monotonic chain — p.getDuration() → root data-duration (now guarded by Number.isFinite) → sub-comp walk → 0. The Number.isFinite(d) && d > 0 guard at :477 (vs the previous d > 0) is a free NaN-safety upgrade. No two-source-of-truth drift.

Rames's R1 findings (cross-reviewer note)

Rames flagged at f69b78c1: (a) no telemetry when fallback fires, (b) nested-sub-comp data-start invariant, (c) Number(...)||0 vs parseNum parsing convention drift, (d) no test for new branch. The new SHA addresses (d). (a-c) remain valid NITs; same disposition — flag-for-followup, not blocker.

CI

Required checks at this SHA: Build, Lint, Typecheck, Test, Fallow audit, SDK: unit + contract + smoke, Studio: load smoke, Test: runtime contract, Format, File size check, Semantic PR title, player-perf, preview-regression — all SUCCESS. regression-shards (all 8) + Windows render + CLI smoke + Smoke: global install + Analyze (javascript-typescript) still IN_PROGRESS at review time — none failed. The LFS infra flake from the previous run cleared.

Review by Via

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R2 verification @ HEAD 5ddbd6ad

R1 was at f69b78c1; force-push at 23:26:26Z. Diff vs R1 SHA per Via R2: fileServer.ts is byte-identical; the new commit adds the 38-line test for the sub-comp aggregation branch at fileServer.test.ts:571-607. My R1 🟡 #4 is the addressed item.

R1 resolution table

Item Status Evidence
🟡 #1 — no log/telemetry on fallback path 🛑 Still open fileServer.ts:474-487 (getDeclaredDuration) silent throughout. When the new sub-comp aggregation fires, no signal. Same observability theme Rames Jusso flagged on Slack.
🟡 #2 — nested-sub-comp invariant unstated 🛑 Still open No PR-body sentence or code comment confirming data-start is root-relative at bridge time. The assumption is load-bearing but invisible. NIT-grade.
🟡 #3parseNum vs Number(...) parsing convention drift 🛑 Still open fileServer.ts:482-483 still uses Number(comps[i].getAttribute('data-start')) || 0 while core/src/runtime/timeline.ts:22,203,266 uses parseNum(...). Functionally equivalent for the values in practice; cosmetic drift remains.
🟡 #4 — new branch L482-487 untested Resolved New test at fileServer.test.ts:571-607: document.querySelectorAll returns two sub-comp wrappers at start=0,dur=5 + start=5,dur=8, asserts __hf.duration === 13. Same jsdom-style harness as the existing readiness test. Pins the new branch behavior exactly.
🟢 — Number.isFinite NaN-safety upgrade (Via R2 caught) Resolved-positive fileServer.ts:477 changed from d > 0 to Number.isFinite(d) && d > 0 — free NaN-safety upgrade. Confirms Via's R2 spot of a one-line strengthening I missed at R1.

R2-NEW

  • 🟢 Via R2 (LGTM, 23:30:53Z): NIT-grade selector-widening for captions-style siblings ([data-composition-src] excludes [data-start][data-duration]-only) carries as follow-up, matches my R1 cross-PR concern that the cohort might leak. Both Via and I converged: not a blocker, the PostHog cohort is sub-comp-shaped, conservative selector is the right pick.
  • 🟢 Cross-PR coexistence with #1662 (per-clip visibility window at init.ts:432-444) + #1652 (post-render ffprobe stream parity) re-confirmed at main. Both untouched. No two-source-of-truth drift; player-side aggregation at timeline.ts:276-345 stays authoritative.

CI

All required CI green at this SHA: Build / Lint / Typecheck / Test / Fallow audit / SDK / Studio load smoke / runtime contract / Format / file-size / semantic PR title / player-perf / preview-regression. Regression shards 1-8 + Windows render + CLI smoke IN_PROGRESS at review time — none failed. Prior LFS-flake on regression cleared on the new run.

Verdict

COMMENT-clean (LGTM-with-followups). The single load-bearing R1 ask (test coverage on the new branch) is addressed exactly. Remaining 🟡s are observability + convention-drift polish, all NIT-grade. Per feedback_open_item_alternative_resolution: operational goal met. Free Number.isFinite upgrade is a nice unprompted hardening.

Stamp routing

Not stamping. Observability open across the stack — same hold as #1678. If Miguel takes Rames Jusso's path 2 (dedicated observability PR on top), this can stamp as-is. Tagging <@U0ARJFN5S6Q> when go-ahead lands.

— Rames D Jusso

…has no data-duration

When the root element lacks an explicit data-duration attribute and
there is no GSAP timeline, getDeclaredDuration now computes
max(data-start + data-duration) across all sub-compositions instead
of returning zero.
@miguel-heygen miguel-heygen force-pushed the fix/zero-duration-subcomp-fallback branch from 5ddbd6a to fef38e8 Compare June 23, 2026 23:39

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R3 verification @ HEAD fef38e80 — ✅ LGTM (clean)

R2 was at 5ddbd6ad with verdict COMMENT-clean LGTM-with-followups (observability + nested-comp invariant + parseNum drift all NIT-grade). R3 push at 22:57:59Z.

R2 → R3 delta (the smallest reproducible diff)

Exactly 1 line changed across the entire PR vs R2 HEAD:

# packages/producer/src/services/fileServer.ts:482-486 @ fef38e80
   for (var i = 0; i < comps.length; i++) {
     var start = Number(comps[i].getAttribute('data-start')) || 0;
     var dur = Number(comps[i].getAttribute('data-duration')) || 0;
     if (dur > 0) maxEnd = Math.max(maxEnd, start + dur);
   }
+  if (maxEnd > 0) console.warn('[HF Bridge] No root data-duration; derived ' + maxEnd + 's from sub-compositions');
   return maxEnd;

The fileServer.test.ts 38-line aggregation test from R2 is unchanged. The bridge-script shape is unchanged.

R2 resolution table

Item R2 status R3 status Evidence
🟡 R1 #1 — no log/telemetry on fallback path 🛑 Still open at R2 Resolved New console.warn('[HF Bridge] No root data-duration; derived ' + maxEnd + 's from sub-compositions') at fileServer.ts:486 fires precisely on the fallback branch (maxEnd > 0 means we derived a duration from sub-comp aggregation, i.e. the new code path ran). Oncall + PostHog cohort that surfaced the original "zero duration" symptom now have signal when the bridge derives duration. This is the exact shape R1 asked for.
🟡 R1 #2 — nested-sub-comp invariant unstated 🛑 Still open at R2 🛑 Carries No PR-body sentence or comment confirming data-start is root-relative at bridge time. NIT-grade.
🟡 R1 #3parseNum vs Number(...) parsing convention drift 🛑 Still open at R2 🛑 Carries `Number(...)
✅ R1 #4 — branch untested Resolved at R2 ✅ Carries Test at fileServer.test.ts:571-607 still pins the aggregation behavior.

R3-NEW

  • 🟢 The new log message wording is good — calls out what derived (maxEnd + 's') and why (No root data-duration), so an oncall can distinguish this from the player's authoritative duration path (packages/core/src/runtime/timeline.ts:276-345) without grepping further. Fires only when the fallback actually computed a useful value (maxEnd > 0), so it doesn't add noise on the still-broken-zero-duration case.
  • 🟢 Per feedback_observability_pr_failure_path_coverage — fallback path is reached only when the player AND root attribute both report 0; the warning fires after deriving > 0 from sub-comps. Failure paths (no [data-composition-id] root, no [data-composition-src] sub-comps, all data-duration ≤ 0) silently return 0 — no log there. That's correct: the warning is for "we recovered" not for "we failed," and the upstream zero-duration symptom is itself the signal.
  • 🟢 Per feedback_scan_recent_peer_reviews — checked for new Via reviews since 23:30; none on this PR.

Cross-PR observability theme (R1/R2 gating)

This PR led the cross-PR theme. #1680's R3 directly addresses the observability ask Rames Jusso raised on Slack — emit signal when the suppressed path fires, so the PostHog cohort that flagged the original symptom isn't silenced. Other PRs in the stack (#1678 partial on producer side, #1679 partial on consumer side via getTrackStyle warn) are converging on the same pattern. #1680 is the cleanest exemplar.

CI

All required checks SUCCESS at review time (Build, Lint, Typecheck, Format, Fallow audit, CodeQL python, SDK, runtime contract, CLI smoke, Studio load smoke, Player perf, Preview parity). CodeQL javascript-typescript + Test + Render-on-windows + Tests-on-windows + regression shards (1-8) IN_PROGRESS — no failures yet. CLI smoke + Smoke global install IN_PROGRESS at review time.

Verdict

LGTM. R1 #1 (the most substantive ask) resolved exactly as requested; the two remaining NITs are cosmetic. PR is mergeable.

Stamp routing

Tagging <@U0ARJFN5S6Q> for stamp per James's directive — PR is clean.

— Rames D Jusso

@jrusso1020 jrusso1020 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM at fef38e80 — exact observability shape we wanted on the derived-duration fallback. The new console.warn('[HF Bridge] No root data-duration; derived ' + maxEnd + 's from sub-compositions') fires precisely when the bridge recovers, so the PostHog cohort that flagged the zero-duration symptom now has signal.

— Jerrai

@vanceingalls vanceingalls left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed at fef38e80778ea89690874d2b6c9ddf6874e8b4f0. R2 LGTM was at 5ddbd6ad; one new commit since.

Per-item disposition (R2 deferred)

Widen the duration-walk selector to catch captions-style siblings (data-composition-id + data-start + data-duration, no -src). Not addressed.

The new commit adds a fallback walk in the HF Bridge script at packages/producer/src/services/fileServer.ts:474-489:

var comps = document.querySelectorAll('[data-composition-src]');

This is the same selector shape that prompted the R2 follow-up. The caption-style siblings — composition elements that have data-composition-id, data-start, data-duration but no data-composition-src (the cohort case) — will be skipped by this walk too. So the PR ships a new code path that inherits the original gap rather than closing it.

Not blocking on its own — the new walk is strictly additive (it only fires when the root has no data-duration and the existing path returns 0). It does close the no-GSAP root case correctly for compositions whose sub-comps are -src-referenced. The captions-cohort case is still broken, just via a slightly different path now.

Suggested follow-up (same shape as the R2 ask, now applicable to two walks): broaden to [data-composition-src],[data-start][data-duration][data-composition-id]:not([data-composition-src]) — or whatever the canonical "any composition-like sibling with declared timing" selector is in this codebase. Worth filing as a single follow-up that fixes both walks together, since they'll otherwise drift.

CI green at HEAD. Approving on the substantive fallback fix; the selector breadth carries forward as the deferred follow-up.

Review by Via

@miguel-heygen miguel-heygen merged commit 45a9440 into main Jun 24, 2026
44 checks passed
@miguel-heygen miguel-heygen deleted the fix/zero-duration-subcomp-fallback branch June 24, 2026 00:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants