Skip to content

Hotfix/jrc 8103.delete.timing.window.machinery#2

Closed
jrodiz wants to merge 4 commits into
hotfix/jrc--8103.Shadow.capture.process.start.causefrom
hotfix/jrc--8103.Delete.timing.window.machinery
Closed

Hotfix/jrc 8103.delete.timing.window.machinery#2
jrodiz wants to merge 4 commits into
hotfix/jrc--8103.Shadow.capture.process.start.causefrom
hotfix/jrc--8103.Delete.timing.window.machinery

Conversation

@jrodiz

@jrodiz jrodiz commented May 21, 2026

Copy link
Copy Markdown
Owner

firebase-perf: delete API 34+ timing-window machinery and adopt causal signal as default (firebase#8103 follow-up, PR 3)

AppStartTrace's background-start detection on API 34+ has been a timing-window
heuristic since firebase#5920 / PR firebase#7281 (50 ms threshold) and firebase#8103 (1000 ms threshold PR firebase#8195). PR 1 added the OS-provided process start cause (ProcessStartCause) as observation-only telemetry; PR 2 wired it as an opt-in behind a kill switch.

This PR 3 completes the refactor:

  • Removes the legacy machinery for API 34+ paths.
  • Removes the kill switch, making the new behavior the only behavior.

Pre-API-34 path is untouched throughout — the pre-bug main-thread ordering still holds and the existing runnable-fired-before-onActivityCreated check still works there.

Changes:

  • StartFromBackgroundRunnable is now scheduled only on Build.VERSION.SDK_INT < 34. The
    runnable's output (mainThreadRunnableTime) is unused on API 34+ after PR 3, so the
    post is skipped — one fewer main-thread message during cold launch on modern devices.
  • Removed the four timing-window tests
    (testStartFromBackground_within1000ms / _moreThan1000ms / _largeAppGap_isForegroundStart / _warmStart_stillSuppressed).
  • Removed the kill-switch-off "no detection" test
  • Renamed and simplified the five PR 2 causalSignal_* tests to api34Plus_*,
    dropping the when(configResolver.getIsAppStartCausalSignalEnabled()).thenReturn(true)
    mock setup that's no longer relevant.
  • Updated experimentTrace_includesProcessStartCauseAttributes to drop the
    timingWindowDecision and decisionAgreed assertions.
  • Added two @Config(sdk = 33) regression tests for the still-active pre-API-34
    path (preApi34_runnableFiredBeforeActivity_marksAsBackground,
    preApi34_runnableNotFired_traceLogs).

Also deleted:

  • MAX_BACKGROUND_RUNNABLE_DELAY constant.
  • addProcessStartCauseExperimentAttributes's timingWindowDecision and decisionAgreed telemetry attributes — they were the rollout-time comparison signal (cause vs heuristic). With the heuristic gone, they no longer carry independent information; the cause IS the decision.

Behavior matrix

API range Before this PR After this PR
< 34 mainThreadRunnableTime != null ⇒ suppress Same
34+ Timing-window heuristic with 1000 ms threshold ProcessStartCause decides. FOREGROUND ⇒ log; BACKGROUND / UNKNOWN ⇒ suppress; null ⇒ suppress (defensive).

Test plan

  • ./gradlew :firebase-perf:spotlessApply
  • ./gradlew :firebase-perf:testReleaseUnitTest — 0 failures / 0 errors.
    AppStartTraceTest: 11/11.
    ProcessStartCauseTest: 7/7.
  • Compile under compileSdkVersion = 34 confirmed.
  • End-to-end on API 33/34/35/36 emulators. Published the branch SDK to mavenLocal, built a tiny test app (TODO: UPLOAD TEST APP) that reflects on AppStartTrace.processStartCause and AppStartTrace.isStartedFromBackground at each lifecycle checkpoint, and ran three scenarios on each emulator: foreground launcher tap, background broadcast, warm start (broadcast → activity in the same process). All 12 (scenario × API) combinations classified correctly. Detailed evidence in TODO: UPLOAD TEST APP

jrodiz added 4 commits May 20, 2026 19:21
…se#8103 follow-up, Phase 2)

Phase 2 of the architectural refactor sketched in PLAN_APPSTART_CAUSAL_SIGNAL.md.
Builds on Phase 1's shadow capture (commits 7f96ee3 and a9d1822) by letting
the captured ProcessStartCause own the isStartedFromBackground decision when a
new Remote Config kill switch is enabled. **Default off**, so this PR ships
dark; behavior is identical to Phase 1 until the flag is flipped.

Decision logic in resolveIsStartedFromBackground() now:

  1. If mainThreadRunnableTime == null, return (unchanged early exit).
  2. If kill switch ON AND processStartCause != null:
       - FOREGROUND → clear timer, return (trace will log).
       - BACKGROUND → mark isStartedFromBackground = true, return (suppress).
       - UNKNOWN    → fall through.
  3. Existing timing-window check unchanged: pre-API-34 or
     duration > MAX_BACKGROUND_RUNNABLE_DELAY → mark background.

Kill switch follows the same pattern as SdkEnabled: manifest metadata →
Remote Config → device cache → default (false). RC flag is
"fpr_experiment_app_start_causal_signal".

Why the kill switch matters
---------------------------
The causal signal was validated on emulator (API 33/34/35/36) per the
out-of-tree experiment app. Physical-device validation for API 34 is still
pending — that's the original failure mode reported in firebase#5920 / firebase#8103, and we
can't reproduce it on emulator. Phase 1 telemetry from production tells us
whether to flip the kill switch on. If the signal mis-classifies on some OEM,
flip it back without a code change.

Files
-----
- ConfigurationConstants.java: new ExperimentAppStartCausalSignal flag class
  with RC, device cache, and metadata names.
- ConfigResolver.java: getIsAppStartCausalSignalEnabled() with the standard
  resolution order.
- AppStartTrace.java: causal-signal branch added at the top of
  resolveIsStartedFromBackground; TODO comment refreshed (Phase 2 done, Phase
  3 — deleting the timing-window machinery — is the new pending step).
- AppStartTraceTest.java: 6 new tests covering each (flag-state × cause ×
  timing) combination plus a defensive null-cause case. Existing 9 tests stay
  green unchanged (flag defaults to false → identical to Phase 1).
- CHANGELOG.md: one [changed] line under # Unreleased noting the opt-in
  causal-signal pathway, default off.

Tests
-----
./gradlew :firebase-perf:spotlessApply — clean
./gradlew :firebase-perf:testReleaseUnitTest — 0 failures / 0 errors
AppStartTraceTest: 15/15. ProcessStartCauseTest: 7/7.

Rollout
-------
1. Merge (no behavior change).
2. Wait for Phase 1 telemetry to accumulate.
3. Flip RC flag to true for 1% → 10% → 50% → 100% over a few days each.
4. Once stable, open Phase 3 PR to delete the timing-window machinery for
   API 34+.
…follow-up, Phase 3)

Phase 3 of the architectural refactor sketched in PLAN_APPSTART_CAUSAL_SIGNAL.md.
Removes the legacy timing-window heuristic from the API 34+ code path now that
Phase 2 has wired the causal signal as an alternative. Pre-API-34 path is
untouched — the pre-bug main-thread ordering still works there.

Kill switch (fpr_experiment_app_start_causal_signal) remains defaulted to
false; Phase 4 (separate PR) will flip the default.

** Documented behavior regression for API 34+ during the Phase 3 → Phase 4
gap **
With this PR shipped and the kill switch in its default (off) state, API 34+
no longer performs any background-start detection in AppStartTrace. Warm
starts (process forked for a service/broadcast, user launches activity later)
will generate _app_start traces with inflated durations, bounded above by
MAX_LATENCY_BEFORE_UI_INIT (60s). This regression is closed as soon as Phase
4 flips the kill switch default to true.

Changes
-------
AppStartTrace.java
  - Removed MAX_BACKGROUND_RUNNABLE_DELAY constant (the Phase 0 1000 ms
    threshold).
  - Rewrote resolveIsStartedFromBackground() with a clean pre-34 / API 34+
    split:
      * API <34: mainThreadRunnableTime != null ⇒ background (pre-bug
        ordering check, unchanged).
      * API 34+ kill switch off: no detection. Trace logs.
      * API 34+ kill switch on: ProcessStartCause owns the decision.
        FOREGROUND logs; BACKGROUND and UNKNOWN suppress; null cause is
        treated as background (defensive).
  - Refreshed TODO comment to reflect Phase 3 done, Phase 4 (default flip)
    pending.
  - mainThreadRunnableTime, StartFromBackgroundRunnable,
    setMainThreadRunnableTime remain in place for the pre-API-34 path.

FirebasePerfEarly.java
  - Schedules StartFromBackgroundRunnable only when Build.VERSION.SDK_INT
    < 34. On API 34+ the runnable's output (mainThreadRunnableTime) is
    unused, so the post saves one main-thread message during cold launch.

AppStartTraceTest.java
  - Deleted the four timing-window-specific tests
    (testStartFromBackground_within1000ms, _moreThan1000ms,
    _largeAppGap_isForegroundStart, _warmStart_stillSuppressed). Their
    behavior is gone in Phase 3.
  - Renamed and updated the Phase 2 causalSignal_* tests to reflect the
    new semantics (UNKNOWN suppresses; no timing fallback).
  - Added two @config(sdk = 33) regression tests for the still-active
    pre-API-34 path (runnable-fired-before-activity → suppress;
    runnable-not-fired → log).

CHANGELOG.md
  - One [changed] line under # Unreleased documenting both the removal and
    the Phase 4 default flip that follows.

Tests
-----
./gradlew :firebase-perf:spotlessApply — clean
./gradlew :firebase-perf:testReleaseUnitTest — 0 failures / 0 errors
AppStartTraceTest: 12/12 (5 reframed + 2 new pre-34 regression + 5 existing).
ProcessStartCauseTest: 7/7 (unchanged from Phase 1).
…rebase#8103 follow-up, Phase 4)

Phase 4 of the architectural refactor. Removes the
ExperimentAppStartCausalSignal kill switch added in Phase 2 (commit ca53202)
and makes the OS-provided process start cause the unconditional decision on
API 34+. Pre-API-34 behavior is unchanged. Closes the regression gap that
Phase 3 (0270114) opened between landing and Phase 4.

Combined with Phase 3 in the same PR per atomic-rollout preference: Phase 3
deletes the legacy timing-window machinery, Phase 4 removes the gating flag,
and the net effect is a clean architectural refactor of AppStartTrace's
background-start detection on API 34+.

Changes
-------
ConfigurationConstants.java
  - Deleted ExperimentAppStartCausalSignal flag class.

ConfigResolver.java
  - Deleted getIsAppStartCausalSignalEnabled() method.
  - Removed the corresponding import.

AppStartTrace.java
  - resolveIsStartedFromBackground() no longer consults the kill switch; on
    API 34+ it always uses ProcessStartCause. Same behavior as Phase 2 with
    the switch turned on:
      * FOREGROUND → trace logs.
      * BACKGROUND or UNKNOWN → suppress.
      * null cause → suppress (defensive).
  - Trimmed the addProcessStartCauseExperimentAttributes helper to drop
    timingWindowDecision and decisionAgreed attributes — the rollout
    comparison telemetry no longer carries independent signal now that the
    cause IS the decision.
  - Class-level TODO comment cleaned up (Phase 3 and Phase 4 done; the
    multi-phase work is now historical, not pending).

AppStartTraceTest.java
  - Renamed Phase-2 causalSignal_on_* tests to api34Plus_* — no kill-switch
    state to capture in the name anymore.
  - Removed the kill-switch mock setup from those tests.
  - Updated experimentTrace_includesProcessStartCauseAttributes to drop the
    timingWindowDecision and decisionAgreed assertions.

CHANGELOG.md
  - Collapsed the two intermediate-phase entries into a single user-facing
    line describing the end state. (firebase#8103 entry preserved.)

Tests
-----
./gradlew :firebase-perf:spotlessApply — clean
./gradlew :firebase-perf:testReleaseUnitTest — 0 failures / 0 errors
AppStartTraceTest: 11/11 (was 12: -1 test that exercised the kill-switch-off
"no detection" regression gap, since the gap no longer exists).
ProcessStartCauseTest: 7/7 (unchanged from Phase 1).
…to importance (firebase#8103)

Fix uncovered during end-to-end testing on the Phase 3+4 branch.

Problem
-------
`ActivityManager.getHistoricalProcessStartReasons(1)` is documented to return
HISTORICAL process start info, not necessarily the current one. The most-
recent recorded entry can be from a previous run of the app — the OS hasn't
always recorded the current process's start by the time
FirebasePerfEarly captures the signal.

Observed on API 35 emulator during rapid-restart testing: after running a
foreground launcher tap, force-stopping, and triggering a background
broadcast, the API returned the prior launcher tap's record. We classified
the background-triggered process as FOREGROUND/LAUNCHER and failed to
suppress the resulting _app_start trace.

Fix
---
1. Query a handful of historical records (5 instead of 1) and scan for one
   whose getPid() matches Process.myPid(). Stale entries from previous
   runs have different PIDs and get filtered out.
2. If no matching record is found, fall back to importance-based
   classification (same heuristic as the API 34 path). importance reflects
   the current process's state, not historical state.

Behavior matrix
---------------
API 33: unchanged (pre-bug ordering path).
API 34: unchanged.
API 35+ foreground launch, ApplicationStartInfo recorded:
  cause=FOREGROUND, reason=LAUNCHER. Same as before.
API 35+ foreground launch, ApplicationStartInfo NOT yet recorded (observed
even on fresh emulators — the OS appears to record asynchronously):
  cause falls back to importance. importance=100 ⇒ FOREGROUND. Correct.
API 35+ background broadcast → activity (warm start), ApplicationStartInfo
NOT yet recorded:
  importance=400 ⇒ UNKNOWN ⇒ suppress. **Previously misclassified as
  FOREGROUND if the previous run was a launcher tap.**

End-to-end verification
-----------------------
Ran 3 scenarios (foreground / background broadcast / warm start) on four
emulators with the fix in place. All 12 combinations classified correctly.
Tested fresh-boot one-by-one to avoid emulator memory pressure.
Detailed log evidence in `TEST_RESULTS_PHASE_3_4_FIX.md`.

Tests
-----
./gradlew :firebase-perf:spotlessApply — clean
./gradlew :firebase-perf:testReleaseUnitTest — 0 failures / 0 errors
AppStartTraceTest: 11/11. ProcessStartCauseTest: 7/7.

The existing ProcessStartCauseTest cases don't exercise the reflective
API-35+ path (Robolectric 4.12 doesn't support @config(sdk = 35)), so the
new behavior is validated end-to-end on real emulators rather than via
unit tests.
@jrodiz jrodiz closed this May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant