Hotfix/jrc 8103.delete.timing.window.machinery#2
Closed
jrodiz wants to merge 4 commits into
Closed
Conversation
…se#8103 follow-up, Phase 2) Phase 2 of the architectural refactor sketched in PLAN_APPSTART_CAUSAL_SIGNAL.md. Builds on Phase 1's shadow capture (commits 7f96ee3 and a9d1822) by letting the captured ProcessStartCause own the isStartedFromBackground decision when a new Remote Config kill switch is enabled. **Default off**, so this PR ships dark; behavior is identical to Phase 1 until the flag is flipped. Decision logic in resolveIsStartedFromBackground() now: 1. If mainThreadRunnableTime == null, return (unchanged early exit). 2. If kill switch ON AND processStartCause != null: - FOREGROUND → clear timer, return (trace will log). - BACKGROUND → mark isStartedFromBackground = true, return (suppress). - UNKNOWN → fall through. 3. Existing timing-window check unchanged: pre-API-34 or duration > MAX_BACKGROUND_RUNNABLE_DELAY → mark background. Kill switch follows the same pattern as SdkEnabled: manifest metadata → Remote Config → device cache → default (false). RC flag is "fpr_experiment_app_start_causal_signal". Why the kill switch matters --------------------------- The causal signal was validated on emulator (API 33/34/35/36) per the out-of-tree experiment app. Physical-device validation for API 34 is still pending — that's the original failure mode reported in firebase#5920 / firebase#8103, and we can't reproduce it on emulator. Phase 1 telemetry from production tells us whether to flip the kill switch on. If the signal mis-classifies on some OEM, flip it back without a code change. Files ----- - ConfigurationConstants.java: new ExperimentAppStartCausalSignal flag class with RC, device cache, and metadata names. - ConfigResolver.java: getIsAppStartCausalSignalEnabled() with the standard resolution order. - AppStartTrace.java: causal-signal branch added at the top of resolveIsStartedFromBackground; TODO comment refreshed (Phase 2 done, Phase 3 — deleting the timing-window machinery — is the new pending step). - AppStartTraceTest.java: 6 new tests covering each (flag-state × cause × timing) combination plus a defensive null-cause case. Existing 9 tests stay green unchanged (flag defaults to false → identical to Phase 1). - CHANGELOG.md: one [changed] line under # Unreleased noting the opt-in causal-signal pathway, default off. Tests ----- ./gradlew :firebase-perf:spotlessApply — clean ./gradlew :firebase-perf:testReleaseUnitTest — 0 failures / 0 errors AppStartTraceTest: 15/15. ProcessStartCauseTest: 7/7. Rollout ------- 1. Merge (no behavior change). 2. Wait for Phase 1 telemetry to accumulate. 3. Flip RC flag to true for 1% → 10% → 50% → 100% over a few days each. 4. Once stable, open Phase 3 PR to delete the timing-window machinery for API 34+.
…follow-up, Phase 3)
Phase 3 of the architectural refactor sketched in PLAN_APPSTART_CAUSAL_SIGNAL.md.
Removes the legacy timing-window heuristic from the API 34+ code path now that
Phase 2 has wired the causal signal as an alternative. Pre-API-34 path is
untouched — the pre-bug main-thread ordering still works there.
Kill switch (fpr_experiment_app_start_causal_signal) remains defaulted to
false; Phase 4 (separate PR) will flip the default.
** Documented behavior regression for API 34+ during the Phase 3 → Phase 4
gap **
With this PR shipped and the kill switch in its default (off) state, API 34+
no longer performs any background-start detection in AppStartTrace. Warm
starts (process forked for a service/broadcast, user launches activity later)
will generate _app_start traces with inflated durations, bounded above by
MAX_LATENCY_BEFORE_UI_INIT (60s). This regression is closed as soon as Phase
4 flips the kill switch default to true.
Changes
-------
AppStartTrace.java
- Removed MAX_BACKGROUND_RUNNABLE_DELAY constant (the Phase 0 1000 ms
threshold).
- Rewrote resolveIsStartedFromBackground() with a clean pre-34 / API 34+
split:
* API <34: mainThreadRunnableTime != null ⇒ background (pre-bug
ordering check, unchanged).
* API 34+ kill switch off: no detection. Trace logs.
* API 34+ kill switch on: ProcessStartCause owns the decision.
FOREGROUND logs; BACKGROUND and UNKNOWN suppress; null cause is
treated as background (defensive).
- Refreshed TODO comment to reflect Phase 3 done, Phase 4 (default flip)
pending.
- mainThreadRunnableTime, StartFromBackgroundRunnable,
setMainThreadRunnableTime remain in place for the pre-API-34 path.
FirebasePerfEarly.java
- Schedules StartFromBackgroundRunnable only when Build.VERSION.SDK_INT
< 34. On API 34+ the runnable's output (mainThreadRunnableTime) is
unused, so the post saves one main-thread message during cold launch.
AppStartTraceTest.java
- Deleted the four timing-window-specific tests
(testStartFromBackground_within1000ms, _moreThan1000ms,
_largeAppGap_isForegroundStart, _warmStart_stillSuppressed). Their
behavior is gone in Phase 3.
- Renamed and updated the Phase 2 causalSignal_* tests to reflect the
new semantics (UNKNOWN suppresses; no timing fallback).
- Added two @config(sdk = 33) regression tests for the still-active
pre-API-34 path (runnable-fired-before-activity → suppress;
runnable-not-fired → log).
CHANGELOG.md
- One [changed] line under # Unreleased documenting both the removal and
the Phase 4 default flip that follows.
Tests
-----
./gradlew :firebase-perf:spotlessApply — clean
./gradlew :firebase-perf:testReleaseUnitTest — 0 failures / 0 errors
AppStartTraceTest: 12/12 (5 reframed + 2 new pre-34 regression + 5 existing).
ProcessStartCauseTest: 7/7 (unchanged from Phase 1).
…rebase#8103 follow-up, Phase 4) Phase 4 of the architectural refactor. Removes the ExperimentAppStartCausalSignal kill switch added in Phase 2 (commit ca53202) and makes the OS-provided process start cause the unconditional decision on API 34+. Pre-API-34 behavior is unchanged. Closes the regression gap that Phase 3 (0270114) opened between landing and Phase 4. Combined with Phase 3 in the same PR per atomic-rollout preference: Phase 3 deletes the legacy timing-window machinery, Phase 4 removes the gating flag, and the net effect is a clean architectural refactor of AppStartTrace's background-start detection on API 34+. Changes ------- ConfigurationConstants.java - Deleted ExperimentAppStartCausalSignal flag class. ConfigResolver.java - Deleted getIsAppStartCausalSignalEnabled() method. - Removed the corresponding import. AppStartTrace.java - resolveIsStartedFromBackground() no longer consults the kill switch; on API 34+ it always uses ProcessStartCause. Same behavior as Phase 2 with the switch turned on: * FOREGROUND → trace logs. * BACKGROUND or UNKNOWN → suppress. * null cause → suppress (defensive). - Trimmed the addProcessStartCauseExperimentAttributes helper to drop timingWindowDecision and decisionAgreed attributes — the rollout comparison telemetry no longer carries independent signal now that the cause IS the decision. - Class-level TODO comment cleaned up (Phase 3 and Phase 4 done; the multi-phase work is now historical, not pending). AppStartTraceTest.java - Renamed Phase-2 causalSignal_on_* tests to api34Plus_* — no kill-switch state to capture in the name anymore. - Removed the kill-switch mock setup from those tests. - Updated experimentTrace_includesProcessStartCauseAttributes to drop the timingWindowDecision and decisionAgreed assertions. CHANGELOG.md - Collapsed the two intermediate-phase entries into a single user-facing line describing the end state. (firebase#8103 entry preserved.) Tests ----- ./gradlew :firebase-perf:spotlessApply — clean ./gradlew :firebase-perf:testReleaseUnitTest — 0 failures / 0 errors AppStartTraceTest: 11/11 (was 12: -1 test that exercised the kill-switch-off "no detection" regression gap, since the gap no longer exists). ProcessStartCauseTest: 7/7 (unchanged from Phase 1).
…to importance (firebase#8103) Fix uncovered during end-to-end testing on the Phase 3+4 branch. Problem ------- `ActivityManager.getHistoricalProcessStartReasons(1)` is documented to return HISTORICAL process start info, not necessarily the current one. The most- recent recorded entry can be from a previous run of the app — the OS hasn't always recorded the current process's start by the time FirebasePerfEarly captures the signal. Observed on API 35 emulator during rapid-restart testing: after running a foreground launcher tap, force-stopping, and triggering a background broadcast, the API returned the prior launcher tap's record. We classified the background-triggered process as FOREGROUND/LAUNCHER and failed to suppress the resulting _app_start trace. Fix --- 1. Query a handful of historical records (5 instead of 1) and scan for one whose getPid() matches Process.myPid(). Stale entries from previous runs have different PIDs and get filtered out. 2. If no matching record is found, fall back to importance-based classification (same heuristic as the API 34 path). importance reflects the current process's state, not historical state. Behavior matrix --------------- API 33: unchanged (pre-bug ordering path). API 34: unchanged. API 35+ foreground launch, ApplicationStartInfo recorded: cause=FOREGROUND, reason=LAUNCHER. Same as before. API 35+ foreground launch, ApplicationStartInfo NOT yet recorded (observed even on fresh emulators — the OS appears to record asynchronously): cause falls back to importance. importance=100 ⇒ FOREGROUND. Correct. API 35+ background broadcast → activity (warm start), ApplicationStartInfo NOT yet recorded: importance=400 ⇒ UNKNOWN ⇒ suppress. **Previously misclassified as FOREGROUND if the previous run was a launcher tap.** End-to-end verification ----------------------- Ran 3 scenarios (foreground / background broadcast / warm start) on four emulators with the fix in place. All 12 combinations classified correctly. Tested fresh-boot one-by-one to avoid emulator memory pressure. Detailed log evidence in `TEST_RESULTS_PHASE_3_4_FIX.md`. Tests ----- ./gradlew :firebase-perf:spotlessApply — clean ./gradlew :firebase-perf:testReleaseUnitTest — 0 failures / 0 errors AppStartTraceTest: 11/11. ProcessStartCauseTest: 7/7. The existing ProcessStartCauseTest cases don't exercise the reflective API-35+ path (Robolectric 4.12 doesn't support @config(sdk = 35)), so the new behavior is validated end-to-end on real emulators rather than via unit tests.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
firebase-perf: delete API 34+ timing-window machinery and adopt causal signal as default (firebase#8103 follow-up, PR 3)
AppStartTrace's background-start detection on API 34+ has been a timing-windowheuristic since firebase#5920 / PR firebase#7281 (50 ms threshold) and firebase#8103 (1000 ms threshold PR firebase#8195). PR 1 added the OS-provided process start cause (
ProcessStartCause) as observation-only telemetry; PR 2 wired it as an opt-in behind a kill switch.This PR 3 completes the refactor:
Pre-API-34 path is untouched throughout — the pre-bug main-thread ordering still holds and the existing runnable-fired-before-
onActivityCreatedcheck still works there.Changes:
StartFromBackgroundRunnableis now scheduled only onBuild.VERSION.SDK_INT < 34. Therunnable's output (
mainThreadRunnableTime) is unused on API 34+ after PR 3, so thepost is skipped — one fewer main-thread message during cold launch on modern devices.
(
testStartFromBackground_within1000ms/_moreThan1000ms/_largeAppGap_isForegroundStart/_warmStart_stillSuppressed).causalSignal_*tests toapi34Plus_*,dropping the
when(configResolver.getIsAppStartCausalSignalEnabled()).thenReturn(true)mock setup that's no longer relevant.
experimentTrace_includesProcessStartCauseAttributesto drop thetimingWindowDecisionanddecisionAgreedassertions.@Config(sdk = 33)regression tests for the still-active pre-API-34path (
preApi34_runnableFiredBeforeActivity_marksAsBackground,preApi34_runnableNotFired_traceLogs).Also deleted:
MAX_BACKGROUND_RUNNABLE_DELAYconstant.addProcessStartCauseExperimentAttributes'stimingWindowDecisionanddecisionAgreedtelemetry attributes — they were the rollout-time comparison signal (cause vs heuristic). With the heuristic gone, they no longer carry independent information; the cause IS the decision.Behavior matrix
mainThreadRunnableTime != null⇒ suppressProcessStartCausedecides. FOREGROUND ⇒ log; BACKGROUND / UNKNOWN ⇒ suppress; null ⇒ suppress (defensive).Test plan
./gradlew :firebase-perf:spotlessApply./gradlew :firebase-perf:testReleaseUnitTest— 0 failures / 0 errors.AppStartTraceTest: 11/11.
ProcessStartCauseTest: 7/7.
compileSdkVersion = 34confirmed.mavenLocal, built a tiny test app (TODO: UPLOAD TEST APP) that reflects onAppStartTrace.processStartCauseandAppStartTrace.isStartedFromBackgroundat each lifecycle checkpoint, and ran three scenarios on each emulator: foreground launcher tap, background broadcast, warm start (broadcast → activity in the same process). All 12 (scenario × API) combinations classified correctly. Detailed evidence in TODO: UPLOAD TEST APP