Add production deployment configuration and CI/CD #194
Draft
cooper667 wants to merge 63 commits into
cooper667 commented Jan 22, 2026
- Add deploy/ folder with Dockerfile.prod, nginx, uwsgi configs
- Add production.ini (secrets externalized to secrets.ini)
- Add entrypoint that merges production.ini + secrets.ini at startup
- Add build-deploy.yml GitHub Actions workflow
- Add dependabot.yml
- Update supervisor config with nginx and uwsgi programs
Previous commits were force-pushed away from upstream repos.
Change GitHub environment URL for staging deployments to reflect the new domain.
…ploads
Updates ckanext-unaids to 5e557c3, which adds a CSRF token to file upload authorization requests, fixing 400 errors when uploading files in CKAN 2.11.
Support all package types (dataset, dataset-2, etc.) in download routes. DataPusher was failing with 404 for resources using custom package types.
- Change staging domain from dev-adr to dev.adr.fjelltopp.org
- Enable saml2auth plugin and configure Auth0 IDP
- Re-enable login/register redirect to SAML2 login
- Update ckanext-unaids submodule URL to fork
Bake production.ini into image so config changes flow through CI/CD.
Secrets are still merged at runtime via entrypoint from secrets.ini.
After this deploys, run:
kubectl patch deployment ckan -n adr-s --type='json' -p='[
{"op": "replace", "path": "/spec/template/spec/volumes/3/projected/sources", "value": [
{"secret": {"name": "jwt-keys"}},
{"secret": {"name": "ckan-ini-secrets"}}
]}
]'
- Dockerfile bakes config as /etc/ckan/base.ini
- Entrypoint merges base.ini + secrets.ini → /etc/ckan/production.ini
- Allows subPath mounts for secrets without overwriting the base config
After deploy, apply the subPath mount patch (see commit message).
Config merge order at startup: base.ini < env.ini < secrets.ini
- deploy/base.ini: common config (baked into image)
- deploy/staging.ini: staging-specific (CI creates ConfigMap)
- deploy/production.ini: prod-specific (CI creates ConfigMap)
- Entrypoint merges all three into /tmp/production.ini
- CI workflow creates ckan-env-config ConfigMap per environment
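The layered merge above can be sketched with Python's standard `configparser`. This is a minimal illustration under assumptions, not the PR's actual entrypoint (its implementation isn't shown here): the function name `merge_ini` is hypothetical, and so are the env.ini and secrets.ini paths in the usage line. A key in a later file overrides the same key in an earlier file within each section, which is one way to realize the base.ini < env.ini < secrets.ini order.

```python
import configparser


def merge_ini(paths, out_path):
    """Hypothetical helper: merge INI files in order and write the result.

    A key in a later file overrides the same key in an earlier one,
    section by section. Returns the merged parser.
    """
    # interpolation=None keeps values such as %(here)s verbatim instead of
    # expanding them, so CKAN can still resolve them when it loads the file.
    merged = configparser.ConfigParser(interpolation=None)
    merged.optionxform = str  # keep key case; the default lowercases keys
    read_ok = merged.read(paths)  # files that don't exist are silently skipped
    # Assumption: only the first (base) layer is mandatory.
    base = str(paths[0]) if paths else None
    if base is not None and base not in read_ok:
        raise FileNotFoundError(f"base config missing: {base}")
    with open(out_path, "w") as fh:
        merged.write(fh)
    return merged
```

Usage, with assumed mount paths for the env and secrets layers: `merge_ini(["/etc/ckan/base.ini", "/etc/ckan/env.ini", "/etc/ckan/secrets.ini"], "/tmp/production.ini")`. Treating a missing secrets.ini as a skipped layer rather than an error is a design choice made for this sketch; a production entrypoint may prefer to fail hard.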
Force-pushed from 10a5add to 21d7e3b.
Point submodule back to fjelltopp/ckanext-unaids instead of fork, using the same commit as the base branch.
…d_request_access
CKAN 2.11 does not register a site_read auth function. The previously recorded submodule commit (1678265) had a stricter ValueError guard that re-raised in non-testing environments, causing a 500 on any restricted_request_access URL. The new commit (9b530ef) correctly swallows the missing-auth-function ValueError unconditionally.
CKAN startup (60s readiness delay + migrations + uWSGI init) takes ~6-7 minutes. The 5m timeout was causing false-failure CI exits even when the deploy succeeded.
The nightly sync replaces staging's DB with a prod snapshot, wiping any
api_token rows that only existed on staging. This breaks DataPusher —
it holds a ckan.datapusher.api_token JWT whose DB row no longer exists,
causing every push job to 401 on callback.
Fix: add refresh_datapusher_token() to the sync, called inside the
scale-to-0 window after both DB restores complete and before CKAN
scales back up:
1. Generates a fresh HS256 JWT using the same secret CKAN uses
(api_token.jwt.encode.secret, passed in as CKAN_JWT_SECRET).
2. Inserts a matching row into staging's api_token table via psql.
3. Patches ckan-ini-secrets with the new token so CKAN starts
with a token that already exists in the restored DB.
No extra pod restart needed — CKAN picks up the correct secret on
its normal post-restore scale-up.
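Step 1 above can be sketched with only the Python standard library. This is an illustration under assumptions, not the sync script's actual code: it assumes CKAN's token payload carries the api_token row id as `jti` plus an `iat` timestamp, and that `CKAN_JWT_SECRET` holds the raw secret (without any `string:` prefix the ini value may carry). The helper names and the token id format are hypothetical. The returned id is what the api_token row inserted in step 2 would need to use.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time


def _b64url(data: bytes) -> str:
    # JWTs use base64url encoding with the "=" padding stripped (RFC 7515).
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def encode_hs256(payload: dict, secret: str) -> str:
    """Encode a compact JWT signed with HMAC-SHA256 (HS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def new_datapusher_token(jwt_secret: str) -> tuple:
    """Hypothetical helper returning (token_id, jwt).

    token_id is the value the matching api_token row's id would need.
    """
    token_id = secrets.token_urlsafe(20)  # assumed id format
    payload = {"jti": token_id, "iat": int(time.time())}
    return token_id, encode_hs256(payload, jwt_secret)
```

Usage: `token_id, token = new_datapusher_token(os.environ["CKAN_JWT_SECRET"])`, then insert a row with `token_id` and write `token` into ckan.datapusher.api_token. PyJWT's `jwt.encode(payload, key, algorithm="HS256")` produces an equivalent token; the sketch uses hmac directly only to avoid an extra dependency in the sync job.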
Also:
- Add secrets: get/patch RBAC rule to the adr-sync Role.
- Document CKAN_JWT_SECRET in secrets.yaml.template.
Move from the downgraded 9b530ef back to the tip of development (1678265) plus one patch commit (058dd78) that fixes the site_read ValueError on CKAN 2.11 without regressing any other changes.
Prevents stacked deploys when commits are pushed in quick succession. A new push cancels the in-flight CI run before it reaches kubectl, avoiding the window where both the old and new pods are 0/1.
pg_restore --no-privileges strips every GRANT from the restored datastore DB. The datastore_ro role then can't SELECT from _table_metadata, so DataPusher's datastore_search call 500s, push_to_datastore aborts before pushing rows, and the 'complete' callback that creates datatables_view never fires. Uploads appear to succeed but views never show up. Run 'ckan datastore set-permissions' after the restore and pipe the canonical GRANT script into psql as ckan_admin against the datastore DB.
## Auth0 layout

There's one Auth0 tenant (canonical `dev-udfgla0l.eu.auth0.com`) with the custom domain `auth-hivtools.unaids.org` promoted on top.
Member
This one is the prod one. Have we dropped the dev one?
Author
These predate the Azure/AKS + GitHub Actions pipeline and have no live consumer:
- Jenkinsfile: Jenkins pipeline; logged into AWS ECR, drove ci_setup/ci_test
- ci_setup.sh: Jenkins-only stack bringup ($WORKSPACE/CHANGE_ID)
- ci_test.sh: Jenkins daily test job
- build_ckan.yml: published ckan_base to ghcr; prod Dockerfile is self-contained
- container_build_and_push.sh: manual Docker Hub base-image push, superseded by 'adx build'
Frozen since 2022-23 while the rest of the repo moved to CKAN 2.11/Py3.10. Jenkins is decommissioned. Local dev (adx + docker-compose) is unaffected.
rebuild_solr_index.sh is tracked, so ignoring it was a no-op that just hid it from git status. Drop the stale ignore lines.
* ci: gate staging build/deploy on extension test suite

  Re-adds the CKAN extension tests that died with the Jenkins pipeline, now as a 'test' job in build-deploy.yml that build/deploy depend on. Brings up the docker-compose dev stack (the same flow the old ci_setup.sh/ci_test.sh drove via adx) on pinned submodule commits and runs the 5 Fjelltopp suites: unaids, validation, scheming, dhis2harvester, emailasusername.

  Gate semantics:
  - push: tests must pass before build and deploy run
  - redeploy (workflow_dispatch + image_tag): tests skipped, since the image was already tested when it was built
  - test failure blocks both build and deploy

  Tests run against the dev image + bind-mounted submodules, matching local 'adx test'. Not yet verified green under CKAN 2.11/Py3.10.

* ci: run tests on PRs + cache pipenv venv and React node_modules
  - Add pull_request trigger so the extension-test job runs as a visible status check; build/deploy stay gated off pull_request events.
  - Cache .adxvenv on Pipfile.lock: the dominant cost is bootstrap's 'pipenv sync --dev' (CKAN + ~19 extensions), not the image build.
  - Cache the unaids React node_modules on its yarn.lock.
  Both caches are safe on miss, so the first run is a clean cold signal.

* ci: run all suites (no fail-fast) and save caches on failure
  - Run all 5 extension suites and aggregate, so one run reports the full picture instead of stopping at the first failing suite.
  - Split cache restore/save so .adxvenv and node_modules are saved even when tests fail, making triage runs warm instead of cold.

* test: fix extension test suite under CKAN 2.11/Py3.10
  - Add 'mock' and 'pyfakefs' to dev-packages: ckanext-validation, -scheming and -emailasusername import the standalone 'mock' package (and pyfakefs in validation), which weren't installed, causing pytest collection errors. Dev-only: prod uses 'pipenv sync' without --dev, so these don't ship.
  - run_tests: blank CKAN_SMTP_SERVER for test runs so suites never reach the dev stack's smtp4dev. Fixes ckanext-unaids' test_send_dataset_transfer_emails_errors, which asserts mail sending fails when no server is configured.

* deps: pin frictionless==5.13.1 and pyfakefs==4.6.* to match extensions
  The validation/unaids extensions pin frictionless[ckan]==5.13.1 and pyfakefs==4.6.* in their own requirements and pass their own CI against those. Our merged Pipfile had frictionless>=5.0.0,<6.0.0 (which drifted to a newer 5.x that dropped Resource.__create__) and pyfakefs=* (6.x dropped the CreateFile API), so ckanext-validation's test suite failed only in our stack. Aligning the pins fixes it without touching the submodule; frictionless is prod-facing, but this matches the version the extensions are built and tested against.
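The no-fail-fast aggregation described above can be sketched as a small runner. This is a minimal illustration, not the workflow's actual step: the function names are hypothetical, and the `adx test <suite>` command line in `main()` is an assumption, since the real adx arguments may differ. Every suite runs regardless of earlier failures, and the job fails once at the end if any suite failed.

```python
import subprocess
import sys

# The five Fjelltopp suites named in the commit message.
SUITES = ["unaids", "validation", "scheming", "dhis2harvester", "emailasusername"]


def run_all_suites(suites, make_cmd):
    """Run every suite even when an earlier one fails (no fail-fast).

    make_cmd maps a suite name to an argv list. Returns the failed suite names
    in the order they ran.
    """
    failed = []
    for suite in suites:
        result = subprocess.run(make_cmd(suite))
        ok = result.returncode == 0
        print(f"[{'PASS' if ok else 'FAIL'}] {suite}", flush=True)
        if not ok:
            failed.append(suite)
    return failed


def main():
    # Hypothetical command line: the actual `adx test` arguments may differ.
    failed = run_all_suites(SUITES, lambda suite: ["adx", "test", suite])
    if failed:
        print("Failed suites: " + ", ".join(failed))
    # One non-zero exit after all suites ran lets CI report the full picture.
    sys.exit(1 if failed else 0)
```

A CI step would call `main()`; because the exit status is decided only after the loop, a separate "save cache" step marked to run on failure still sees a complete test log.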
The Jenkins/Py2-era testing section referenced nosetests-2.7 and a ckan-nosetests alias that no longer exist. adx test wraps ckan-pytest (pytest in CKAN's venv); update the core-tests command to the pytest equivalent against the mounted /etc/ckan/test-core.ini. Also drop the dead nosetests.xml entry from .gitignore.