Add production deployment configuration and CI/CD by cooper667 · Pull Request #194 · fjelltopp/adx_develop

cooper667 · 2026-01-22T17:29:33Z

Add deploy/ folder with Dockerfile.prod, nginx, uwsgi configs
Add production.ini (secrets externalized to secrets.ini)
Add entrypoint that merges production.ini + secrets.ini at startup
Add build-deploy.yml GitHub Actions workflow
Add dependabot.yml
Update supervisor config with nginx and uwsgi programs

Bake production.ini into image so config changes flow through CI/CD. Secrets are still merged at runtime via entrypoint from secrets.ini. After this deploys, run: kubectl patch deployment ckan -n adr-s --type='json' -p='[ {"op": "replace", "path": "/spec/template/spec/volumes/3/projected/sources", "value": [ {"secret": {"name": "jwt-keys"}}, {"secret": {"name": "ckan-ini-secrets"}} ]} ]'
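
The `-p` argument above is a JSON Patch (RFC 6902) with one `replace` op that swaps the `sources` list of the projected volume at index 3 for just the `jwt-keys` and `ckan-ini-secrets` Secrets. The minimal sketch below generates the same document. `build_projected_sources_patch` is a hypothetical helper, and index 3 is only correct if the ckan Deployment keeps its current volume order.

```python
import json


def build_projected_sources_patch(volume_index: int, secret_names: list) -> str:
    """Return a JSON Patch replacing a projected volume's sources with the given Secrets.

    Hypothetical helper: the index and Secret names are copied from the commit
    message and are only valid for the ckan Deployment's current volume layout.
    """
    patch = [
        {
            "op": "replace",
            "path": f"/spec/template/spec/volumes/{volume_index}/projected/sources",
            "value": [{"secret": {"name": name}} for name in secret_names],
        }
    ]
    return json.dumps(patch)


if __name__ == "__main__":
    # Pass the output to: kubectl patch deployment ckan -n adr-s --type='json' -p='<output>'
    print(build_projected_sources_patch(3, ["jwt-keys", "ckan-ini-secrets"]))
```

Because the path is positional, reordering the Deployment's volumes would make the same patch hit a different volume, or fail if that volume has no `projected` key.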

- Dockerfile bakes config as /etc/ckan/base.ini
- Entrypoint merges base.ini + secrets.ini → /etc/ckan/production.ini
- Allows subPath mounts for secrets without overwriting base config

After deploy, apply subPath mount patch (see commit message).

Config merge order at startup: base.ini < env.ini < secrets.ini
- deploy/base.ini: common config (baked into image)
- deploy/staging.ini: staging-specific (CI creates ConfigMap)
- deploy/production.ini: prod-specific (CI creates ConfigMap)
- Entrypoint merges all three into /tmp/production.ini
- CI workflow creates ckan-env-config ConfigMap per environment
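
The PR does not show which tool the entrypoint uses for the merge. Below is a minimal sketch of the "base.ini < env.ini < secrets.ini" order, assuming Python's `configparser` and a "last layer wins per key" rule; `merge_layers` and `merge_ini_files` are hypothetical names. Interpolation is disabled so CKAN's `%(here)s` values are copied through literally.

```python
import configparser
import io
from pathlib import Path


def merge_layers(layers):
    """Merge (name, ini_text) pairs in order; a key in a later layer overrides an earlier one."""
    parser = configparser.ConfigParser(interpolation=None)  # keep %(here)s etc. literal
    parser.optionxform = str  # keep option names exactly as written
    for name, text in layers:
        # Re-reading an existing section updates its keys instead of raising.
        parser.read_string(text, source=name)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


def merge_ini_files(paths, output):
    """Read each path in order (e.g. base.ini, staging.ini, secrets.ini) and write the result."""
    layers = []
    for path in map(Path, paths):
        if not path.is_file():
            # Fail loudly rather than start CKAN without a config layer.
            raise FileNotFoundError(f"config layer missing: {path}")
        layers.append((str(path), path.read_text(encoding="utf-8")))
    Path(output).write_text(merge_layers(layers), encoding="utf-8")
```

One caveat of this approach: `configparser` drops comments when it writes, so the merged file is for CKAN to read, not for humans to edit.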

…d_request_access CKAN 2.11 does not register a site_read auth function. The previously recorded submodule commit (1678265) had a stricter ValueError guard that re-raised in non-testing environments, causing a 500 on any restricted_request_access URL. The new commit (9b530ef) correctly swallows the missing-auth-function ValueError unconditionally.

The nightly sync replaces staging's DB with a prod snapshot, wiping any api_token rows that only existed on staging. This breaks DataPusher: it holds a ckan.datapusher.api_token JWT whose DB row no longer exists, so every push job gets a 401 on callback.

Fix: add refresh_datapusher_token() to the sync, called inside the scale-to-0 window after both DB restores complete and before CKAN scales back up:

1. Generates a fresh HS256 JWT using the same secret CKAN uses (api_token.jwt.encode.secret, passed in as CKAN_JWT_SECRET).
2. Inserts a matching row into staging's api_token table via psql.
3. Patches ckan-ini-secrets with the new token so CKAN starts with a token that already exists in the restored DB.

No extra pod restart needed: CKAN picks up the correct secret on its normal post-restore scale-up.

Also:
- Add secrets: get/patch RBAC rule to the adr-sync Role.
- Document CKAN_JWT_SECRET in secrets.yaml.template.
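
Step 1 can be sketched with the standard library alone. This is a minimal sketch, not the sync's actual code, and it rests on several assumptions: the payload carries `jti` (which must equal the `id` of the row inserted in step 2) and `iat` claims, as CKAN's API tokens are assumed to; CKAN_JWT_SECRET holds the raw secret value; and any unique URL-safe string is acceptable as a token id. `encode_hs256` and `new_datapusher_token` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time


def _b64url(data: bytes) -> str:
    # JWS uses base64url without '=' padding (RFC 7515).
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def encode_hs256(payload: dict, secret: str) -> str:
    """Encode a compact JWT signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def new_datapusher_token(secret: str, now=None):
    """Return (token_id, jwt, issued_at); token_id is the id the api_token row must use."""
    token_id = secrets.token_urlsafe(20)  # assumption: any unique URL-safe id is accepted
    issued_at = int(time.time()) if now is None else int(now)
    token = encode_hs256({"jti": token_id, "iat": issued_at}, secret)
    return token_id, token, issued_at
```

Generating the id and the JWT together keeps the DB row (step 2) and the secret patched into ckan-ini-secrets (step 3) from drifting apart.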

pg_restore --no-privileges strips every GRANT from the restored datastore DB. The datastore_ro role then can't SELECT from _table_metadata, so DataPusher's datastore_search call 500s, push_to_datastore aborts before pushing rows, and the 'complete' callback that creates datatables_view never fires. Uploads appear to succeed but views never show up. Run 'ckan datastore set-permissions' after the restore and pipe the canonical GRANT script into psql as ckan_admin against the datastore DB.
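
The re-grant step is a pipe from `ckan datastore set-permissions` (which prints the GRANT SQL) into `psql`. A minimal Python sketch of the same step follows. The config path, datastore DB name and host are hypothetical placeholders; only `ckan_admin` and the command names come from the commit message. Running the two commands separately, instead of as `a | b`, means a failing `ckan` command cannot be masked by `psql`'s exit status, which is what a plain shell pipe without `pipefail` would report.

```python
import subprocess


def build_commands(ckan_ini, datastore_db, host, admin_user="ckan_admin"):
    """Return (ckan_cmd, psql_cmd); DB name and host are hypothetical placeholders."""
    ckan_cmd = ["ckan", "-c", ckan_ini, "datastore", "set-permissions"]
    # ON_ERROR_STOP makes psql exit non-zero on the first failing GRANT.
    psql_cmd = ["psql", "-h", host, "-U", admin_user, "-d", datastore_db,
                "-v", "ON_ERROR_STOP=1"]
    return ckan_cmd, psql_cmd


def regrant_datastore(ckan_ini, datastore_db, host, admin_user="ckan_admin"):
    """Generate the canonical GRANT script and apply it to the restored datastore DB."""
    ckan_cmd, psql_cmd = build_commands(ckan_ini, datastore_db, host, admin_user)
    # Only stdout is the SQL script; logs on stderr are kept out of psql's input.
    sql = subprocess.run(ckan_cmd, check=True, capture_output=True, text=True).stdout
    subprocess.run(psql_cmd, input=sql, check=True, text=True)
```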

ChasNelson1990 · 2026-05-29T15:47:23Z

+
+## Auth0 layout
+
+There's one Auth0 tenant (canonical `dev-udfgla0l.eu.auth0.com`) with the custom domain `auth-hivtools.unaids.org` promoted on top.


No. There are 2?!

This one is the prod one. Have we dropped the dev one?

ChasNelson1990 · 2026-05-29T15:49:49Z

What about SMTP?

https://github.com/fjelltopp/adx_develop/pull/194/changes/BASE..29f49c6910486b23719d9a19251a22c954524260#diff-612f86aecf87d2bd630ef7b030fa48e67a6c1a15e7def9dac5a3e2040cf8abaaR8

These predate the Azure/AKS + GitHub Actions pipeline and have no live consumer:
- Jenkinsfile: Jenkins pipeline; logged into AWS ECR, drove ci_setup/ci_test
- ci_setup.sh: Jenkins-only stack bringup ($WORKSPACE/CHANGE_ID)
- ci_test.sh: Jenkins daily test job
- build_ckan.yml: published ckan_base to ghcr; prod Dockerfile is self-contained
- container_build_and_push.sh: manual Docker Hub base-image push, superseded by 'adx build'

Frozen since 2022-23 while the rest of the repo moved to CKAN 2.11/Py3.10. Jenkins is decommissioned. Local dev (adx + docker-compose) is unaffected.

* ci: gate staging build/deploy on extension test suite

  Re-adds the CKAN extension tests that died with the Jenkins pipeline, now as a 'test' job in build-deploy.yml that build/deploy depend on. Brings up the docker-compose dev stack (the same flow the old ci_setup.sh/ci_test.sh drove via adx) on pinned submodule commits and runs the 5 Fjelltopp suites: unaids, validation, scheming, dhis2harvester, emailasusername.

  Gate semantics:
  - push: tests must pass before build and deploy run
  - redeploy (workflow_dispatch + image_tag): tests skipped, image already tested when built
  - test failure blocks both build and deploy

  Tests run against the dev image + bind-mounted submodules, matching local 'adx test'. Not yet verified green under CKAN 2.11/Py3.10.

* ci: run tests on PRs + cache pipenv venv and React node_modules
  - Add pull_request trigger so the extension-test job runs as a visible status check; build/deploy stay gated off pull_request events.
  - Cache .adxvenv on Pipfile.lock: the dominant cost is bootstrap's 'pipenv sync --dev' (CKAN + ~19 extensions), not the image build.
  - Cache the unaids React node_modules on its yarn.lock.

  Both caches are safe on miss, so the first run is a clean cold signal.

* ci: run all suites (no fail-fast) and save caches on failure
  - Run all 5 extension suites and aggregate, so one run reports the full picture instead of stopping at the first failing suite.
  - Split cache restore/save so .adxvenv and node_modules are saved even when tests fail, making triage runs warm instead of cold.

* test: fix extension test suite under CKAN 2.11/Py3.10
  - Add 'mock' and 'pyfakefs' to dev-packages: ckanext-validation, -scheming and -emailasusername import the standalone 'mock' package (and pyfakefs in validation), which weren't installed, causing pytest collection errors. Dev-only: prod uses 'pipenv sync' without --dev, so these don't ship.
  - run_tests: blank CKAN_SMTP_SERVER for test runs so suites never reach the dev stack's smtp4dev. Fixes ckanext-unaids' test_send_dataset_transfer_emails_errors, which asserts mail sending fails when no server is configured.

* deps: pin frictionless==5.13.1 and pyfakefs==4.6.* to match extensions

  The validation/unaids extensions pin frictionless[ckan]==5.13.1 and pyfakefs==4.6.* in their own requirements and pass their own CI against those. Our merged Pipfile had frictionless>=5.0.0,<6.0.0 (drifted to a newer 5.x that dropped Resource.__create__) and pyfakefs=* (6.x dropped the CreateFile API), so ckanext-validation's test suite failed in our stack only. Aligning the pins fixes it without touching the submodule; frictionless is prod-facing, but this matches the version the extensions are built and tested against.
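
The "no fail-fast" change amounts to running every suite, recording each exit code, and failing the job once at the end. A minimal sketch of that aggregation, assuming the runner is injected as a callable; `run_all`, `summarize` and the runner itself are hypothetical, and the real job's invocation of each suite is not shown in the PR.

```python
SUITES = ["unaids", "validation", "scheming", "dhis2harvester", "emailasusername"]


def run_all(suites, run_suite):
    """Run every suite even after a failure; return {suite: exit_code} in run order."""
    results = {}
    for suite in suites:
        results[suite] = run_suite(suite)  # hypothetical runner returning an exit code
    return results


def summarize(results):
    """Return (overall_exit_code, failed_suites); non-zero if any suite failed."""
    failed = [suite for suite, code in results.items() if code != 0]
    return (1 if failed else 0), failed
```

Because every suite has run before the job fails, a single red run lists all failing suites, and the cache-save step can still execute afterwards.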

The Jenkins/Py2-era testing section referenced nosetests-2.7 and a ckan-nosetests alias that no longer exist. adx test wraps ckan-pytest (pytest in CKAN's venv); update the core-tests command to the pytest equivalent against the mounted /etc/ckan/test-core.ini. Also drop the dead nosetests.xml entry from .gitignore.

cooper667 deployed to staging January 22, 2026 17:40 — with GitHub Actions

cooper667 marked this pull request as ready for review January 26, 2026 08:14

cooper667 temporarily deployed to staging January 26, 2026 23:56 — with GitHub Actions Inactive

cooper667 temporarily deployed to staging January 27, 2026 00:15 — with GitHub Actions Inactive

cooper667 temporarily deployed to staging January 29, 2026 02:20 — with GitHub Actions Inactive

cooper667 temporarily deployed to staging January 30, 2026 11:53 — with GitHub Actions Inactive

cooper667 temporarily deployed to staging January 30, 2026 21:25 — with GitHub Actions Inactive

cooper667 deployed to staging January 30, 2026 21:41 — with GitHub Actions

cooper667 temporarily deployed to staging January 31, 2026 23:49 — with GitHub Actions Inactive

cooper667 temporarily deployed to staging February 1, 2026 00:21 — with GitHub Actions Inactive

cooper667 temporarily deployed to staging February 1, 2026 00:23 — with GitHub Actions Inactive

cooper667 had a problem deploying to staging February 1, 2026 00:28 — with GitHub Actions Failure

cooper667 temporarily deployed to staging February 1, 2026 00:42 — with GitHub Actions Inactive

cooper667 temporarily deployed to staging February 1, 2026 01:03 — with GitHub Actions Inactive

cooper667 added 14 commits February 3, 2026 01:31

Update submodules to commits that exist in remotes

d212819

Previous commits were force-pushed away from upstream repos.

Update staging domain to dev.adr.fjelltopp.org

ef26689

Change GitHub environment URL for staging deployments to reflect the new domain.

Update staging domain to dev-adr.fjelltopp.org

e320355

fix(submodules): update ckanext-unaids with CSRF token fix for file u…

212cd3d

…ploads Updates ckanext-unaids to 5e557c3 which adds CSRF token to file upload authorization requests, fixing 400 errors when uploading files in CKAN 2.11.

docs(config): update CSRF comments now that FileUploader sends token

537d89e

fix(submodules): update ckanext-blob-storage with datapusher route fix

86c34ec

Support all package types (dataset, dataset-2, etc.) in download routes. DataPusher was failing with 404 for resources using custom package types.
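
The route change itself lives in ckanext-blob-storage and is not shown here. The hypothetical regex sketch below illustrates the idea: the first path segment is captured as a package type instead of being hard-coded to `dataset`, so `/dataset-2/...` download URLs also match. The optional trailing slash reflects the follow-up trailing-slash commit. The URL layout `/<package_type>/<id>/resource/<resource_id>/download[/<filename>]` is an assumption.

```python
import re

# Hypothetical pattern: any package type segment (dataset, dataset-2, ...) is accepted.
DOWNLOAD_RE = re.compile(
    r"^/(?P<package_type>[A-Za-z0-9_-]+)/(?P<id>[^/]+)"
    r"/resource/(?P<resource_id>[^/]+)/download"
    r"(?:/(?P<filename>[^/]+))?/?$"
)


def match_download(path):
    """Return the captured route parts as a dict, or None if the path is not a download URL."""
    m = DOWNLOAD_RE.match(path)
    return m.groupdict() if m else None
```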

fix(submodules): update ckanext-blob-storage with trailing slash fix

55d9136

feat: Enable SAML2 Auth0 login on dev.adr.fjelltopp.org

e01ec91

- Change staging domain from dev-adr to dev.adr.fjelltopp.org
- Enable saml2auth plugin and configure Auth0 IDP
- Re-enable login/register redirect to SAML2 login
- Update ckanext-unaids submodule URL to fork

fix(saml): use correct Auth0 client ID for dev.adr.fjelltopp.org

338f36e

fix(saml): use correct Auth0 tenant (dev-udfgla0l)

92405e9

cooper667 force-pushed the ckan211-prod-deploy-pr branch from 10a5add to 21d7e3b February 2, 2026 17:31

fix(submodules): revert ckanext-unaids to upstream fjelltopp repo

c0ad19a

Point submodule back to fjelltopp/ckanext-unaids instead of fork, using the same commit as the base branch.

cooper667 temporarily deployed to production May 27, 2026 11:58 — with GitHub Actions Inactive

cooper667 had a problem deploying to staging May 27, 2026 12:26 — with GitHub Actions Failure

fix(ci): increase kubectl rollout timeout from 5m to 10m

283fb9c

CKAN startup (60s readiness delay + migrations + uWSGI init) takes ~6-7 minutes. The 5m timeout was causing false-failure CI exits even when the deploy succeeded.

cooper667 temporarily deployed to staging May 27, 2026 12:34 — with GitHub Actions Inactive

cooper667 temporarily deployed to staging May 27, 2026 13:05 — with GitHub Actions Inactive

chore: bump ckanext-restricted to latest + site_read fix

7073593

Move from the downgraded 9b530ef back to the tip of development (1678265) plus one patch commit (058dd78) that fixes the site_read ValueError on CKAN 2.11 without regressing any other changes.

cooper667 temporarily deployed to staging May 27, 2026 13:12 — with GitHub Actions Inactive

ci: add concurrency cancel-in-progress to deploy workflows

fd332a4

Prevents stacked deploys when commits are pushed in quick succession. A new push cancels the in-flight CI run before it reaches kubectl, avoiding the window where both the old and new pods are 0/1.

cooper667 temporarily deployed to staging May 27, 2026 13:21 — with GitHub Actions Inactive

Bump ckanext-restricted to d357aa2

f00c48d

cooper667 temporarily deployed to staging May 27, 2026 13:34 — with GitHub Actions Inactive

cooper667 temporarily deployed to production May 27, 2026 14:54 — with GitHub Actions Inactive

cooper667 temporarily deployed to staging May 28, 2026 07:03 — with GitHub Actions Inactive

enable about

29f49c6

cooper667 temporarily deployed to staging May 28, 2026 17:43 — with GitHub Actions Inactive

cooper667 temporarily deployed to production May 29, 2026 15:47 — with GitHub Actions Inactive

ChasNelson1990 reviewed May 29, 2026

cooper667 added 2 commits May 29, 2026 12:19

chore: stop gitignoring rebuild_solr_index.sh and .vuhitra/

62a3ae8

rebuild_solr_index.sh is tracked, so ignoring it was a no-op that just hid it from git status. Drop the stale ignore lines.

cooper667 temporarily deployed to staging May 29, 2026 19:23 — with GitHub Actions Inactive

cooper667 temporarily deployed to staging May 29, 2026 20:59 — with GitHub Actions Inactive

chore: restore submodule refs to branch tips

6ecce97

A-Souhei temporarily deployed to staging May 30, 2026 09:28 — with GitHub Actions Inactive

cooper667 deployed to production May 30, 2026 13:52 — with GitHub Actions

A-Souhei deployed to staging June 3, 2026 07:40 — with GitHub Actions

cooper667 wants to merge 63 commits into ckan211-python310-migration-staging-1 from ckan211-prod-deploy-pr
