feat(nimbus): add Fenix end-to-end enrollment integration test #15345
Merged
jaredlockhart merged 39 commits into main, Apr 21, 2026
Because
* The existing Fenix Nimbus integration test jobs in CircleCI are gated behind dead branch filters (update_firefox_versions, fix-13658), and moz-central.test-container.Dockerfile no longer matches the post-Oct-2025 monorepo layout (product flavors removed, gradle tasks renamed).
* Without live CI coverage, changes to the recipe JSON contract on either side (Experimenter emission or Fenix consumption) can regress silently.
This commit
* Adds .github/workflows/fenix-integration-test.yml, which downloads a publicly indexed signed debug APK from TaskCluster, brings up the Experimenter stack, mints a recipe via the v6 draft-experiments endpoint, boots an Android emulator with KVM, and exercises the full JEXL + bucketing + enrollment path via nimbus-cli with --preserve-targeting --preserve-bucketing.
* Adds experimenter/tests/integration/nimbus/android/mint_fenix_recipe.py, a standalone Python script that reuses helpers.create_experiment() to create a draft and fetches its recipe from /api/v6/draft-experiments/.
* Adds experimenter/tests/integration/nimbus/android/run_fenix_test.sh, the emulator-side script that installs the APK, invokes nimbus-cli enroll, dumps enrollment state via log-state, and asserts the expected line appears in logcat.
Fixes #15340
…kflow
Because
* nimbus-cli does not define a --version flag; clap rejects it with "unexpected argument '--version' found" and exits 1, failing the Install step.
This commit
* Replaces nimbus-cli --version with command -v + nimbus-cli --help | head, which verifies the binary is on PATH and executable without requiring a flag the CLI does not implement.
Fixes #15340
Because
* make up_prod_detached returns as soon as containers start, but the experimenter gunicorn process needs ~30-60s to finish collectstatic and start serving; during that window nginx answers 502.
* helpers._post_form only retries on ConnectionError, not on 5xx responses, so the first POST /nimbus/create/ fails immediately with "POST /nimbus/create/ failed (502)".
This commit
* Adds a poll loop against /__lbheartbeat__ with 60 attempts at 5s spacing (5 min budget) before the mint step runs. On timeout it dumps docker compose ps + logs for diagnosis.
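A minimal Python sketch of the poll loop described above, for illustration only. The workflow itself implements this in shell; the function name, the injectable `opener`/`sleep` parameters, and the example URL are assumptions, not taken from the PR. A stack serving a self-signed certificate would also need an SSL context, which is omitted here.

```python
import time
import urllib.error
import urllib.request


def wait_for_heartbeat(url, attempts=60, delay=5.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll `url` until it answers 2xx. Return True on success, False on timeout."""
    for _ in range(attempts):
        try:
            with opener(url) as response:
                if 200 <= response.status < 300:
                    return True
        except (urllib.error.URLError, ConnectionError):
            # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx,
            # so this covers both "stack still booting" and an nginx 502.
            pass
        sleep(delay)
    return False
```

With the defaults this gives the same 60 x 5s budget; a caller would dump `docker compose ps` and logs when it returns False.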
… boot
Because
* setup-cached-build deletes /usr/local/lib/android with sudo to free ~8GB of disk space, but reactivecircus/android-emulator-runner then tries to mkdir that path as the unprivileged runner user and fails with EACCES.
This commit
* Adds a step between "Enable KVM" and the emulator-runner invocation that recreates /usr/local/lib/android and chowns it to $USER so the Android SDK installer can write to it.
Because
* /api/v6/draft-experiments/ returns recipes with bucketConfig=null
because allocate_bucket_range() only runs on state transitions
(draft-to-preview, draft-to-review, review-to-approve).
* Fenix's Nimbus SDK rejects such recipes:
"nimbus::schema: Malformed experiment found! invalid type: null,
expected struct BucketConfig" — so enrollment never happens.
This commit
* Adds a POST to /nimbus/{slug}/draft-to-preview/ after creating the
draft. That transition calls allocate_bucket_range() which assigns
a concrete NimbusBucketRange to the experiment.
* Switches the recipe fetch to /api/v6/experiments/{slug}/ since the
experiment is no longer in Draft status.
* wait_for_recipe also now validates bucketConfig is populated before
returning the recipe.
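The bucketConfig check can be sketched roughly as follows. This is an illustrative version, not the helper from the PR: `fetch_recipe` is a hypothetical callable that returns the parsed recipe dict (or None while the recipe is not yet published), and the attempt count and delay are made-up defaults.

```python
import time


def wait_for_recipe(fetch_recipe, attempts=30, delay=2.0, sleep=time.sleep):
    """Call fetch_recipe() until the recipe carries a populated bucketConfig."""
    for _ in range(attempts):
        recipe = fetch_recipe()
        # A draft recipe serializes bucketConfig as null (None here),
        # which the Nimbus SDK rejects, so keep polling until it is set.
        if recipe and recipe.get("bucketConfig"):
            return recipe
        sleep(delay)
    raise TimeoutError("recipe never reported a populated bucketConfig")
```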
… targeting
Because
* helpers.create_basic_experiment hardcodes firefox_min_version="120.!",
producing a targeting JEXL of "(app_version|versionCompare('120.!') >= 0)".
* Fenix debug builds from mozilla-central report app_version="1.0.2616"
(a synthetic dev build number), not a real Firefox release version.
* With "1.0.2616" < "120.!", targeting evaluates to false, Nimbus logs
an empty "Slug | Features | Branch" table, and the assertion fails
because the experiment was never enrolled.
This commit
* Overrides firefox_min_version to "0.0" so the targeting expression
becomes "(app_version|versionCompare('0.0') >= 0)", which is always
true for any app_version the emulator reports.
Because
* Fenix debug builds report app_version in the form "1.0.yyww" (e.g. "1.0.2616" for year 26 week 16) via Config.generateDebugVersionName in android-components/plugins/config, which will never satisfy any real "FIREFOX_1xx" version filter like the default "120.!".
* Setting firefox_min_version to "0.0" is rejected because the form is a ChoiceField backed by get_version_choices, and the invalid value causes the whole audience form to fail validation: channel gets cleared, bucketConfig.count ends up 0, and application/appId serialize to empty strings.
This commit
* Passes firefox_min_version="", which is a valid empty choice for the required=False ChoiceField, causing the audience form to accept the override. _get_targeting_min_version then skips the versionCompare clause entirely, so the emitted targeting has no version filter to fail against the debug build.
Because
* Nimbus SDK's dumpStateToLog writes the "slug | features | branch"
table via Rust's log::info! calls which appear in logcat under the
"nimbus::stateful::nimbus_client" tag, not the Kotlin-side
"app-services-Nimbus.kt" tag the assertion was grepping for.
* The previous run proved enrollment actually succeeded — logcat
contained "nimbus_client: fenix-integration-test-experiment |
messaging | treatment-a" — but the assertion regex missed it and
bailed out with "FAIL: found slug references but no nimbus
log-state line".
This commit
* Greps for the Rust log tag and the table-row pipe separator
("nimbus_client:\s*<slug>\s+\|") which is the actual shape of the
log-state output.
* Accepts either branch as the SDK's bucketing hash decides — with
--preserve-bucketing, the --branch flag is advisory and the real
allocation runs.
* On failure dumps nimbus_client lines for easier debugging.
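As a rough illustration of the assertion shape, the sketch below matches a log-state row by the Rust log tag and the pipe separator. The function name and the named groups are assumptions for this example; the sample line is the one quoted in the commit message, and real logcat lines carry extra prefix fields that `search` simply skips over.

```python
import re


def find_enrollment_row(logcat_text, slug):
    """Return a match for the log-state row of `slug`, or None if absent."""
    pattern = re.compile(
        r"nimbus_client:\s*" + re.escape(slug)
        + r"\s+\|\s*(?P<feature>[^|]+?)\s*\|\s*(?P<branch>\S+)"
    )
    return pattern.search(logcat_text)


sample = "nimbus_client: fenix-integration-test-experiment | messaging | treatment-a"
row = find_enrollment_row(sample, "fenix-integration-test-experiment")
print(row.group("feature"), row.group("branch"))
```

Because the branch is captured rather than hard-coded, the check accepts whichever branch the bucketing hash selects.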
Because
* The fenix integration test pulls an APK from the mozilla-central TC index. With no pinned task id we always fetch .latest, but that gives no visibility into which specific build the last green run used and nothing triggers the test when mozilla-central changes.
* We already have a daily cron (update-firefox.yml) that bumps pinned TC task ids for desktop-release and desktop-beta and opens a PR; the same machinery should track fenix-debug.
This commit
* Adds fenix-debug to update-firefox.yml's workflow_dispatch variant options and matrix, mirroring the desktop entries.
* Adds a fenix_debug case to external_integration_updater_script.sh pointing at gecko.v2.mozilla-central.latest.mobile.fenix-debug and writing the resolved task id to firefox_fenix_debug_build.env.
* Creates an empty firefox_fenix_debug_build.env so the download step's guard on FIREFOX_FENIX_DEBUG_TASK_ID=.+ starts out falling through to .latest; the bumper fills it in on its first run.
* Updates the fenix-integration-test.yml triggers: drops the iteration-only "push to 15340" branch; adds a daily cron, push to main, and push to the bumper's update_firefox_fenix_debug branch.
…nnels
Because
* Per the desktop pattern, we test beta and release channels (nightly churns too much to pin reliably). Fenix should follow the same convention: one test job per channel, one update-bot variant per channel, so Experimenter regressions that only bite a specific channel are caught.
* Release builds in particular test the shipping code path and the real-version targeting JEXL, which is closer to what users actually see.
This commit
* Matrixes .github/workflows/fenix-integration-test.yml over channel: [beta, release], with per-channel package id, APK namespace, pinned-task-id env file, and env-var name derived from matrix.
* Adds fenix-beta and fenix-release variants to update-firefox.yml mirroring the existing desktop entries; removes the debug variant.
* Updates external_integration_updater_script.sh: restores fenix_release and fenix_beta cases (they were already there), removes the short-lived fenix_debug case.
* Adds --channel arg to mint_fenix_recipe.py so the minted recipe's application/appId match the APK's package id.
* Accepts FENIX_PACKAGE and FENIX_CHANNEL in run_fenix_test.sh so the adb install target and nimbus-cli --channel are per-channel.
* Triggers on push to main plus update_firefox_fenix_beta and update_firefox_fenix_release. Drops the temporary 15340 iteration trigger now that the flow is stable.
Because
* fenix-release is only built during shipping-phase promote (run-on-projects: [] in build-apk/kind.yml:146), so the pinned task id YmbxEtM6QqGQLzrwS532aw is from an old promotion whose artifacts have aged out of TaskCluster retention. The download returned 404.
This commit
* Clears FIREFOX_FENIX_RELEASE_TASK_ID so the workflow falls through to gecko.v2.mozilla-release.latest.mobile.fenix-release, which resolves to whatever the most recent indexed release promotion task is. The updater bot will repopulate the pinned id on its next cron run.
… paths
Because
* On PRs that don't touch experimenter/ at all (e.g. unrelated cirrus, schemas, or docs changes), this matrix is ~30 min of wasted CI time per run. Matching the desktop-enrollment pattern, we gate every expensive step behind check-changed-paths.
* The build-tracking env files (firefox_fenix_beta_build.env, firefox_fenix_release_build.env) are what the daily bump bot updates. When the bot's PR is the only change on the branch, the test MUST run, so those paths are listed explicitly alongside experimenter/ even though the broader prefix already matches them. Explicit listing survives future narrowing of the experimenter/ prefix and makes intent obvious to reviewers.
This commit
* Adds pull_request and merge_group triggers so the gate has something to gate against (parity with desktop-enrollment).
* Adds check-changed-paths step with paths covering experimenter/ plus the two build env files by name.
* Adds `if: steps.check-paths.outputs.should-run == 'true'` to every expensive subsequent step (setup-cached-build, APK download, nimbus-cli install, experimenter stack, mint, emulator runner, teardown).
Because
* The previous gate used the repo-root experimenter/ prefix, which
matches everything under it (docs, legacy frontend, unrelated
tests). That made the explicit env-file paths redundant and the
gate overly broad — docs-only PRs would still trigger the 30-min
matrix.
This commit
* Narrows paths to:
- experimenter/experimenter/ : Django app (recipe serializer, models,
feature configs, nimbus_ui forms)
- experimenter/tests/integration/ : the mint + run scripts and
helpers this test actually uses
- experimenter/tests/firefox_fenix_{beta,release}_build.env : the
bump-bot's target files, which are outside the two dir prefixes
and MUST trigger the test on bumper-only PRs
Because
* The previous iteration lived entirely outside the pytest harness: a standalone mint_fenix_recipe.py with sys.path hacks to reach helpers, plus a shell run_fenix_test.sh invoked directly by the emulator runner. Every other integration test in this repo brokers through pytest (reporting, markers, splits, reruns, fixtures). There was no reason Fenix should be the exception.
* The existing test_fenix_integration.py was rotted (gradlewbuild-based Kotlin instrumentation path, relied on a ping_server autouse fixture) and its imports would have broken collection of the new test.
This commit
* Deletes mint_fenix_recipe.py, run_fenix_test.sh, test_fenix_integration.py, and gradlewbuild.py.
* Adds experimenter/tests/integration/nimbus/android/test_fenix_enrollment.py as a real pytest test using helpers.create_experiment, marked with @pytest.mark.fenix_enrollment. Channel and APK path come from env vars (FENIX_CHANNEL / FENIX_APK_PATH) so the workflow matrix can scope each job to one channel.
* Adds the fenix_enrollment marker to experimenter/tests/pytest.ini.
* Repairs the integration_test_nimbus_fenix Makefile target, which was pointing poetry at a non-existent pyproject; it now installs from experimenter/tests/pyproject.toml and runs pytest with the fenix_enrollment marker.
* Workflow swaps the bespoke run_fenix_test.sh for `make integration_test_nimbus_fenix` inside the emulator-runner script block, drops the separate mint step (now in the test), and uploads test-reports/ on failure instead of raw logcat (pytest surfaces the last 30 nimbus_client lines itself on assertion fail).
Because
* The Makefile target crashed with pytest exit 4 (usage error) right after the "plugins: ..." line, with no collected-items summary printed.
* The most likely cause is pytest.ini's addopts trying to write --junitxml=experimenter/tests/integration/test-reports/... to a non-existent directory when pytest is invoked on the host (desktop tests run inside docker where the dir is pre-created).
This commit
* Runs mkdir -p experimenter/tests/integration/test-reports before pytest so the junit xml write target exists.
* Runs a preliminary `pytest --co -q -m fenix_enrollment ...` collect-only pass so we see collection output before the real run; if something's still wrong with the marker or import, we'll see it clearly.
Because
* poetry -C <dir> changes the subprocess cwd. With -C experimenter/tests,
pytest ran from experimenter/tests and resolved
"experimenter/tests/integration/nimbus/android" against that cwd,
yielding "experimenter/tests/experimenter/tests/..." — not found,
exit 4, no tests collected.
* pytest.ini's addopts has --junitxml=experimenter/tests/integration/...
which hits the same double-prefix problem from that cwd.
This commit
* Replaces poetry -C with an explicit "cd experimenter/tests &&
poetry run pytest", and passes the pytest path relative to that
cwd ("integration/nimbus/android").
* Overrides pytest.ini addopts with "-o addopts=" and re-adds the
warnings plugin opt-out + an explicit --junitxml relative to the
new cwd, so the junit report lands at the expected path under
integration/test-reports/.
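The double-prefix failure is easy to reproduce with plain path arithmetic. The sketch below only models how a relative path resolves against a working directory; `resolve_from` is an illustrative helper, not something in the repository.

```python
from pathlib import PurePosixPath


def resolve_from(cwd, relative):
    """Return the path that `relative` resolves to when the process runs in `cwd`."""
    return str(PurePosixPath(cwd) / relative)


# Repo-root-relative path, but evaluated from experimenter/tests (poetry -C):
broken = resolve_from("experimenter/tests", "experimenter/tests/integration/nimbus/android")
# Path written relative to the new cwd, as the fix does:
fixed = resolve_from("experimenter/tests", "integration/nimbus/android")

print(broken)
print(fixed)
```

The same reasoning applies to the --junitxml value in addopts, which is why it is overridden with a path relative to the new cwd.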
Because
* FENIX_APP = "fenix" duplicated BaseExperimentApplications.FIREFOX_FENIX.value from nimbus.models.base_dataclass, which is imported once and reused by the rest of the nimbus integration test suite.
* FEATURE_SLUG = "messaging" reached around the application_feature_ids fixture already defined in conftest, which maps each app to its canonical no-feature-* feature id. Using that fixture keeps the Fenix test aligned with how the rest of the harness configures experiments (no-op feature, enrollment-mechanics-only).
This commit
* Imports BaseExperimentApplications and sets FENIX_APP from the enum.
* Drops the FEATURE_SLUG constant + the inline get_feature_id_as_string call.
* Injects the application_feature_ids fixture and reads the fenix entry from it.
…r prefixes
Because
* All other integration tests keep fixtures in conftest.py, not alongside the test. Defining fenix_channel/apk_path/experiment_slug in test_fenix_enrollment.py broke that pattern.
* Existing module-level helpers in this suite use plain names (test_cirrus_integration.py:7 "def navigate_to(...)"), not a _semi_private prefix. The _mint_preview_experiment / _wait_for_recipe naming was imported from a different convention.
* Reaching into helpers._post_form from a test bypasses the public wrapper pattern that helpers.py already uses for state transitions (e.g. end_experiment).
This commit
* Adds experimenter/tests/integration/nimbus/android/conftest.py housing fenix_channel, fenix_apk_path, and a fenix-specific experiment_slug override, following the existing `@pytest.fixture(name="x")` / `def fixture_x()` convention.
* Renames _mint_preview_experiment → mint_preview_experiment and _wait_for_recipe → wait_for_recipe to match the plain-name convention.
* Adds a public helpers.launch_to_preview(slug) wrapper next to end_experiment; the test now calls that instead of helpers._post_form directly.
Because
* Several guards in this test were written reflexively for cases we
don't actually encounter, either because upstream code already
catches them (curl -f) or because the input shape is known
(hard-coded fixtures). They add reading cost without catching
anything real.
This commit removes:
* Quote-stripping on TASK_ID after sourcing the env file —
sourcing FOO="bar" assigns `bar`, the stripping never fires.
* Eval-based indirection — replaced with modern `${!TASK_ID_VAR}`.
* `file "$FENIX_APK_PATH"` sanity check after curl — `curl -sSfL`
already fails on HTTP errors, so a bad download never reaches this
line.
* `ls -lh` diagnostic on the downloaded APK — debug-only noise.
* `command -v nimbus-cli` + `nimbus-cli --help | head -5` install
sanity checks — install failure already aborts via `set -e`.
* `-p no:warnings` pytest flag — copy-paste from pytest.ini addopts,
no evidence of actual warning noise to suppress.
* `assert feature_id, ...` in the enrollment test —
application_feature_ids is a hard-coded dict with a fenix entry;
other tests in this suite don't guard the lookup either.
* The `if match is None: ... assert match is not None:` block —
nonsense control flow to satisfy pyright; replaced with a single
assertion that computes the debug context unconditionally (only
materialized as a string when the assert fires).
Debug breadcrumb left over from the 30GB-runner disk-space iteration. Not asserting anything, not gating anything — print-only.
Every echo in the APK-resolve step and the backend-wait poll was narration ("Using pinned X", "Downloading Y", "Poll N: HTTP Z, waiting..."), restating what the next line was about to do or confirming something GHA already shows via exit codes and command traces. Dropped.
Also collapsed the backend-wait loop: curl -sfk returns non-zero on non-2xx, so the separate HTTP code extraction + string comparison was redundant.
Kept:
* The ::error:: annotation + docker compose ps/logs dump on timeout, which is real error context.
* curl failure → shell failure (-sSfL, set -e) drives the step result.
The pinned task id in firefox_fenix_{beta,release}_build.env is the
contract with the update-firefox bumper job. If the file is missing
or empty that's a bug in the bumper, not something the test should
silently paper over by falling back to .latest.
This commit
* Drops the if-pinned-else-latest branch in the APK download step.
The env file is sourced unconditionally; missing file or empty
variable fails the step with a clear error.
* Drops the apk_namespace matrix key which only existed to feed the
fallback.
Other tests in this suite (create_experiment in nimbus/conftest.py)
put experiment-setup wrappers behind a factory fixture, not a plain
module-level function. Doing the same for Fenix keeps the pattern
consistent.
This commit
* Moves mint_preview_experiment + wait_for_recipe from the test
module to experimenter/tests/integration/nimbus/android/conftest.py,
packaging them into a single factory fixture that:
- Creates the draft via helpers.create_experiment
- Transitions draft → preview via helpers.launch_to_preview
- Polls /api/v6/experiments/{slug}/ until bucketConfig is populated
- Returns the resolved recipe
* application_feature_ids is pulled from the existing parent conftest
fixture rather than looked up directly.
* test_fenix_enrollment shrinks to the actual behavior it asserts:
call the fixture to get a recipe, push via nimbus-cli, assert the
log-state row shows up in logcat.
…iment
Matches the existing create_experiment factory-fixture naming in nimbus/conftest.py. Descriptive (creates a Fenix experiment) and parallels the generic factory rather than inventing new vocabulary.
curl exit code 22 (HTTP error from empty URL path) is opaque. Catch the empty-var case explicitly and print a GHA ::error:: annotation pointing at the fix (run the update-firefox bumper for that channel).
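The empty-variable guard can be sketched as below. The workflow does this in shell with a GHA ::error:: annotation; this Python version is only an illustration of the same check, and `read_pinned_task_id` and its error message are hypothetical. The variable name and sample task id are the ones mentioned in earlier commits.

```python
def read_pinned_task_id(env_text, var_name):
    """Return the pinned task id for `var_name`, or raise if it is missing or empty."""
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == var_name:
            value = value.strip().strip('"').strip("'")
            if value:
                return value
    # Fail loudly instead of falling back to .latest: an empty pin is a bumper bug.
    raise ValueError(
        f"{var_name} is missing or empty; run the update-firefox bumper for this channel"
    )


task_id = read_pinned_task_id(
    "FIREFOX_FENIX_RELEASE_TASK_ID=YmbxEtM6QqGQLzrwS532aw\n",
    "FIREFOX_FENIX_RELEASE_TASK_ID",
)
print(task_id)
```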
Queried gecko.v2.mozilla-release.latest.mobile.fenix-release via the TaskCluster index API and pinned the current task id so the release integration test passes out of the gate — the daily update-firefox bumper will roll it forward from here.
Full audit: removed everything not required for the test to pass.
Workflow (fenix-integration-test.yml):
* BUILDKIT_PROGRESS + COMPOSE_ANSI env vars (cosmetic log formatting)
* FENIX_RECIPE_PATH env var (unused; test uses tmp_path)
* FENIX_EXPERIMENT_SLUG env var (unused; test uses fixture)
* FENIX_PACKAGE env var + matrix `package:` field (unused since the test doesn't verify package presence)
* "(migrate + load features + up_prod)" step-name parenthetical
* Teardown step (ephemeral runner; desktop workflow doesn't have one)
Makefile:
* -v flag on pytest (one test; default output is fine)
conftest.py:
* fenix_channel / fenix_apk_path skip-if-missing guards: env is always set in CI, and an os.environ[...] KeyError on misconfig is the right signal
* Path(value).exists() existence check: curl would have failed earlier if the path is bad
* reference_branch override in create_fenix_experiment: the default form already produces "control" / "treatment-a" branches
* population_percent="100": default in create_basic_experiment
* total_enrolled_clients="1000000": default 55 is sufficient for 100% bucketing at the namespace level
* firefox_min_version="" override: real Fenix beta/release builds report versions >120, so the default "120.!" targeting passes naturally
test_fenix_enrollment.py:
* -r -t -g flags on adb install: fresh emulator, production-signed APK, no runtime permissions needed by enrollment
Discovered by strip-test-and-observe: removing this flips the default
firefox_min_version back to "120.!", producing a targeting JEXL of
"(app_version|versionCompare('120.!') >= 0)".
* Release fenix APK reports a plain version like "141.0" and passes.
* Beta fenix APK reports a suffixed version like "141.0b5"; whatever
versionCompare does with the beta suffix, it does not satisfy the
"120.!" check on our beta emulator, so targeting fails and Nimbus
does not enroll.
firefox_min_version="" is the smallest override that keeps targeting
channel-agnostic for this enrollment-mechanics test.
pytest-rerunfailures masks the first-attempt failure: the rerun hits a duplicate-slug ValueError because the first attempt already created the experiment in experimenter. The ValueError bubbles up as the only visible failure, hiding whatever assertion actually failed. Disabling rerunfailures shows the real first-attempt error so we can fix the root cause.
curl's built-in retry handles both connection errors (stack booting) and non-2xx responses (nginx up but gunicorn not ready returning 502), so the 13-line bash loop + docker-compose-logs-on-failure collapses to a single curl invocation.
The Fenix release and beta builds run maybeFetchExperiments during normal startup, which pulls live recipes from production Remote Settings. Our nimbus-cli enroll applies our test experiment locally, but the concurrent background fetch returns real production recipes without our test experiment in the list. Nimbus then evolves against that fresh fetch — treating our test experiment as a server-side ended experiment — and unenrolls us. The log-state table ends up showing whichever production experiment claims the same feature slot (no-feature-fenix), not our test. Disabling wifi + mobile data on the emulator before enrolling keeps the fetch from succeeding. Our local apply survives and log-state shows the test experiment as intended. No need to re-enable — the emulator is discarded at the end of the run.
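One way to take the emulator offline is sketched below. The commit message does not say which commands the test actually uses; `adb shell svc wifi disable` and `adb shell svc data disable` are standard Android shell commands chosen here as an assumption, and the function names and `serial` parameter are illustrative.

```python
import subprocess


def offline_commands(serial=None):
    """Build the adb invocations that turn off wifi and mobile data."""
    base = ["adb"] + (["-s", serial] if serial else [])
    return [
        base + ["shell", "svc", "wifi", "disable"],
        base + ["shell", "svc", "data", "disable"],
    ]


def take_emulator_offline(serial=None, run=subprocess.run):
    """Run each command, raising CalledProcessError if adb reports a failure."""
    for command in offline_commands(serial):
        run(command, check=True)
```

Calling `take_emulator_offline()` before `nimbus-cli enroll` would keep the background fetch from succeeding; nothing re-enables networking because the emulator is discarded afterwards.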
yashikakhurana (Contributor) approved these changes on Apr 21, 2026 and left a comment:
awesome, we got our tests back, thank you @jaredlockhart
Because
This commit
* Adds .github/workflows/fenix-integration-test.yml, matrixed over [beta, release], that downloads a signed Fenix APK from the indexed TaskCluster route (gecko.v2.mozilla-{beta,release}.latest.mobile.fenix-{beta,release}), stands up the Experimenter stack, mints a preview-state experiment via the pytest harness, boots an Android emulator with KVM, and runs the full JEXL + bucketing + enrollment path via nimbus-cli enroll --preserve-targeting --preserve-bucketing.
* test_fenix_enrollment.py uses the standard @pytest.mark.fenix_enrollment marker and the create_fenix_experiment factory fixture in android/conftest.py, which reuses helpers.create_experiment + a new helpers.launch_to_preview wrapper and polls /api/v6/experiments/{slug}/ for the allocated bucketConfig.
* Disables wifi + mobile data on the emulator before enrolling so maybeFetchExperiments can't overwrite our local enrollment with production Remote Settings recipes (which would otherwise evolve-unenroll our test experiment in favor of a real production one that claims the same feature slot).
* Adds fenix-beta and fenix-release variants to update-firefox.yml so the existing daily bumper refreshes the pinned TC task ids in experimenter/tests/firefox_fenix_{beta,release}_build.env, mirroring the desktop variants.
Fixes #15340