Feature/production containers by XanderVertegaal · Pull Request #87 · CentreForDigitalHumanities/langpro-annotator

XanderVertegaal · 2026-03-27T14:43:42Z

This PR adds production-ready containers to our Docker Compose setup, inspired by ParsePort and Docker's own guide for containerizing an Angular application. As discussed, it also adds an Nginx container through which all communications are funneled.

Clarifications to some of the decisions I made have been added to the code (where they are relevant), and I am happy to provide context and further explanation where desired. There are a few things I would like you to consider with special attention.

Instead of psycopg2, the backend uses psycopg2-binary. See my comment at requirements.in for more details.
settings.py has la-backend in ALLOWED_HOSTS and 'localhost' in CSRF_TRUSTED_ORIGINS. Since this is run as part of a Docker network and the backend should now no longer be contacted directly, I think this should be safe, but I could be wrong.
glue.py is no longer used. I think this is fine, since the backend is no longer serving the frontend static files and this setup intends to use the settings.py file in backend/langpro_annotator.

If you agree, the following needs to be done (as part of separate issues/PRs, not here):

Hide the language select menu in the frontend until (if at all) we implement proper internationalization.
Add Angular and Nginx container log output to /logs folder.

I have tried to deploy this on the test server but I think it requires initializing the langpro-container and LangPro submodules, which I was unable to do. (It asks for my Git credentials, which are not accepted.)


XanderVertegaal · 2026-03-27T14:49:06Z

+# Copy project files.
 COPY . .

-CMD python manage.py check && \


I moved this to the compose file so this Dockerfile can be used for both prod (with Gunicorn) and dev (Django dev server). The dev container does not need Gunicorn, of course, but that is a small price to pay for having one container that serves both profiles.

I'm not a fan of using the same image for development and production. In development, I want all images to derive from buildpack-deps in order to save disk space. In production, it is preferable to go for leaner images. If I have to start using those leaner images in development, I will end up with lots of disjoint images on disk that are small in themselves but that effectively make me pay multiple times for software that I tend to use in nearly every project.

Point taken. I definitely support the idea of splitting them up.

XanderVertegaal · 2026-03-27T14:53:22Z

 django-livereload-server
 django-revproxy>=0.10.0
-psycopg2
+psycopg2-binary


When using psycopg2, the container tries to build the package from source. The slim image does not have the build tools for this, and the error message suggests the psycopg2-binary package instead, which seems to work well and installs much faster. I am not sure what the practical difference is between installing the binary and building the package in the container. Please review this carefully. If we need the non-binary package, we should probably revert to the non-slim image.

https://www.psycopg.org/docs/install.html#psycopg-vs-psycopg-binary

If we can make it work with the source package, that is probably better. The slimmer the image, the more tools you have to install yourself in the Dockerfile, generally.

Right, I'll get it to work with the source package! 👍

XanderVertegaal · 2026-03-27T14:54:58Z

(This file is not really deleted, but renamed to Dockerfile.dev and subsequently modified.)

XanderVertegaal · 2026-03-27T15:52:42Z

 services:
+    nginx:
+        container_name: la-nginx
+        restart: unless-stopped


I use restart: unless-stopped for prod services, so we can actually stop them if we want to and e.g. read the logs. Dev services get restart: no.

XanderVertegaal · 2026-03-27T15:54:46Z

        volumes:
-            - ./:/usr/src/app
+            - ./backend:/usr/src/app
+            - ./logs/django:/usr/src/app/logs


This puts the Gunicorn logs in a directory logs/django outside of the containers so we can read them even on the production server.

Maybe provide the host directory through an environment variable as well so we can respect the $WEBROOT/logs convention.
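The same idea can be applied on the container side. Below is a minimal sketch of a Gunicorn configuration module that takes its log directory from the environment. The file name `gunicorn.conf.py`, the variables `LOG_DIR` and `GUNICORN_LOG_LEVEL`, and the log file names are assumptions for illustration; the PR does not show how Gunicorn is actually configured. Only the default path `/usr/src/app/logs` comes from the volume mapping above.

```python
# gunicorn.conf.py (hypothetical file; Gunicorn reads settings from module-level names)
import os

# LOG_DIR is an assumed variable name. The default is the container path
# that the compose file mounts to ./logs/django on the host.
LOG_DIR = os.environ.get("LOG_DIR", "/usr/src/app/logs")

# accesslog, errorlog and loglevel are standard Gunicorn settings.
accesslog = os.path.join(LOG_DIR, "gunicorn-access.log")
errorlog = os.path.join(LOG_DIR, "gunicorn-error.log")
loglevel = os.environ.get("GUNICORN_LOG_LEVEL", "info")
```

With something like this in place, the host side of the mount could follow $WEBROOT/logs without the container needing to know about it.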

XanderVertegaal · 2026-03-27T15:55:29Z

+            - ./frontend:/usr/src/app
+            - frontend-node-modules:/usr/src/app/node_modules
+            - frontend-angular-cache:/usr/src/app/.angular
+        command: ng serve --host 0.0.0.0 --disable-host-check --poll 200


The poll flag is added to enable live reloading.


jgonggrijp

I'm not very fond of the approach taken and I have some questions and concerns. That being said, having a way to deploy is currently more important than having an approach that I like. I suggest that you stick with your current approach for now and only adjust it enough to make it work. We can improve on aspects later, deciding whether to change something on a case-by-case basis.

settings.py has la-backend in ALLOWED_HOSTS and 'localhost' in CSRF_TRUSTED_ORIGINS. Since this is run as part of a Docker network and the backend should now no longer be contacted directly, I think this should be safe, but I could be wrong.

This bit is a bit tricky. You used proxy_pass in the nginx config, which I think is a forward proxy. If I recall correctly, the target of a forward proxy still sees the hostname of the client, which would mean that your current setup wouldn't work in deployment where the hostname of the client isn't localhost. You could go with a reverse proxy instead, or use the deployment module to prepare an environment file that includes the hostname.
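The environment-file option could look roughly like the sketch below: the deployment writes the public hostname into the environment, and settings.py derives both lists from it. The variable name `DJANGO_PUBLIC_HOSTS` and both helper functions are assumptions made for illustration; only `ALLOWED_HOSTS`, `CSRF_TRUSTED_ORIGINS`, `la-backend` and `localhost` appear in this PR. Note that Django 4.0 and later expect a scheme in each `CSRF_TRUSTED_ORIGINS` entry, which is why the sketch adds one.

```python
import os


def hosts_from_env(name, default=""):
    """Split a comma-separated environment variable into a list of hostnames."""
    raw = os.environ.get(name, default)
    return [host.strip() for host in raw.split(",") if host.strip()]


def trusted_origins(hosts, scheme="https"):
    """Prefix each hostname with a scheme for use in CSRF_TRUSTED_ORIGINS."""
    return [f"{scheme}://{host}" for host in hosts]


# DJANGO_PUBLIC_HOSTS is a hypothetical variable that the deployment would set.
PUBLIC_HOSTS = hosts_from_env("DJANGO_PUBLIC_HOSTS", "localhost")

# la-backend stays in the list so requests that arrive under the
# container's own hostname are still accepted.
ALLOWED_HOSTS = ["la-backend", *PUBLIC_HOSTS]
CSRF_TRUSTED_ORIGINS = trusted_origins(PUBLIC_HOSTS)
```

Without the variable set, this falls back to the values currently hardcoded in the PR, so development would not need an environment file.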

glue.py is no longer used. I think this is fine, since the backend is no longer serving the frontend static files and this setup intends to use the settings.py file in backend/langpro_annotator.

This could also potentially go wrong. Currently, backend/langpro_annotator/index.py:index is injecting the CSRF cookie in the index page. Without that mechanism, the frontend is going to need another mechanism to obtain that cookie.

If you agree, the following needs to be done (as part of separate issues/PRs, not here):

Hide the language select menu in the frontend until (if at all) we implement proper internationalization.

Add Angular and Nginx container log output to /logs folder.

I agree.

I have tried to deploy this on the test server but I think it requires initializing the langpro-container and LangPro submodules, which I was unable to do. (It asks for my Git credentials, which are not accepted.)

You have to add the dhlabdevelopers account to those repos. See our internal documentation on deployment.

jgonggrijp · 2026-03-31T12:40:42Z

I understand your wish to have one settings.py for development and production, but production settings usually differ more from the development settings than you've accounted for here. You can also see this on the langpro-annotator-prod-containers branch that you created in the deployment repo.

You can have the best of both worlds: inject a separate settings.py in deployment. In the injected settings, import * from the settings in the source code. This way, you only have to add or override the keys that differ from the development defaults.
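A minimal sketch of such an injected module follows. It assumes the source settings are importable as `langpro_annotator.settings` and that the injected file lives under a different module name, so the import is not circular. The variable `DJANGO_SECRET_KEY` and the particular keys overridden are illustrative choices, not something this PR prescribes.

```python
# Hypothetical injected settings module, supplied by the deployment.
import os

try:
    # Start from every development default defined in the source tree.
    from langpro_annotator.settings import *  # noqa: F401,F403
except ImportError:
    # Present only so this sketch runs outside the project. A real
    # deployment should drop the try/except and let the import fail loudly.
    pass

# Override only the keys that differ in production.
DEBUG = False
SECRET_KEY = os.environ.get("DJANGO_SECRET_KEY", "")  # assumed variable name
SESSION_COOKIE_SECURE = True
CSRF_COOKIE_SECURE = True
```

Django would then be pointed at this module through `DJANGO_SETTINGS_MODULE`, while development keeps using the settings in the source code unchanged.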

jgonggrijp · 2026-03-31T12:46:50Z

+# Copy project files.
 COPY . .

-CMD python manage.py check && \


I'm not a fan of using the same image for development and production. In development, I want all images to derive from buildpack-deps in order to save disk space. In production, it is preferable to go for leaner images. If I have to start using those leaner images in development, I will end up with lots of disjoint images on disk that are small in themselves but that effectively make me pay multiple times for software that I tend to use in nearly every project.

jgonggrijp · 2026-03-31T12:50:45Z

 django-livereload-server
 django-revproxy>=0.10.0
-psycopg2
+psycopg2-binary


https://www.psycopg.org/docs/install.html#psycopg-vs-psycopg-binary

If we can make it work with the source package, that is probably better. The slimmer the image, the more tools you have to install yourself in the Dockerfile, generally.

jgonggrijp · 2026-04-01T11:53:59Z

Maybe better to supply this information through the deployment module. That can be postponed through an issue ticket, as far as I'm concerned.

jgonggrijp · 2026-04-01T12:49:41Z

+WORKDIR /usr/src/app
+
+# Copy package.json and yarn.lock.
+COPY package.json yarn.lock ./


jgonggrijp · 2026-04-01T12:52:45Z

Well documented.

jgonggrijp · 2026-04-01T12:56:48Z

-        restart: always
        volumes:
            - postgres-data:/var/lib/postgresql/data
-            - ./backend/create_db.sql:/docker-entrypoint-initdb.d/langpro.sql


Why remove this line?

The referenced SQL file creates a new DB, but the postgres image does this for you, so it felt superfluous.

The postgres image creates a root user and a default database. The SQL file creates a non-default database and a lesser-privileged user specifically for the application.

https://hub.docker.com/_/postgres#initialization-scripts

But don't we already create a non-default database and user with the env vars in the service definition? Also, the script has hardcoded (and checked in) username + password + database name, so we'd have to change those or make them dynamic.

        environment:
            - POSTGRES_USER=$POSTGRES_USER
            - POSTGRES_PASSWORD=$POSTGRES_PASSWORD
            - POSTGRES_DB=$POSTGRES_DB
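If the env vars are kept as the single source of truth, the Django side can read the same three variables, so nothing needs to be hardcoded or checked in. The following is only a sketch: the function name and the `POSTGRES_HOST` and `POSTGRES_PORT` variables (with their defaults) are assumptions; the other three names are the ones from the service definition.

```python
import os


def database_settings(env=os.environ):
    """Build Django's DATABASES dict from the variables the postgres service receives."""
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env["POSTGRES_DB"],
            "USER": env["POSTGRES_USER"],
            "PASSWORD": env["POSTGRES_PASSWORD"],
            # POSTGRES_HOST and POSTGRES_PORT are assumed names, not taken from the PR.
            "HOST": env.get("POSTGRES_HOST", "localhost"),
            "PORT": env.get("POSTGRES_PORT", "5432"),
        }
    }


# In settings.py this would be used as: DATABASES = database_settings()
```

Indexing with `env[...]` rather than `env.get(...)` for the three required variables makes a missing value fail at startup instead of at the first query.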

jgonggrijp · 2026-04-01T12:58:23Z

        volumes:
-            - ./:/usr/src/app
+            - ./backend:/usr/src/app
+            - ./logs/django:/usr/src/app/logs


Maybe provide the host directory through an environment variable as well so we can respect the $WEBROOT/logs convention.

jgonggrijp · 2026-04-01T13:02:20Z

I do wonder whether the overlap between dev and prod is sufficient to justify profiles. I have a suspicion it would be simpler if we just had separate compose files for dev and prod, with a third compose file for the common services (nginx and postgres) that the other two include.

I was quite torn between these two options but went for profiles in the end since I am most comfortable with them. I will keep them for now but create an issue for a refactor to separate compose files.

XanderVertegaal added 25 commits March 26, 2026 16:19

Split backend container into prod and dev

f1a5a47

Add container names to remaining containers

57fc4bc

Other containers can use this to refer to whatever container variant (dev or prod) turns out to be running.

Add image names to backend services

7c748d1

Not necessary, but less unwieldy than 'langpro-annotator-langpro-backend-prod'.

Add logs to .gitignore

60be35d

Update proxy.conf.docker.json with new values

952c439

This will be replaced in the near future.

Update psycopg2 to use the binary package

e511b3b

Add Nginx configuration

45c1292

Install dependencies during build time

45c2f61

Only copy frontend folder to frontend container

7007365

Update restart policy to unless-stopped on postgres container

00a810d

Only mount backend folder to backend container

360a576

Move backend dev server start command to compose file

e5b0fc3

Add production frontend container

8673a59

Do not create localized build (for now)

c7f38aa

Remove unused expose statements and env variables

2e82d7b

Use .env variables in settings.py

b015206

Use appVersion in build-pre.js. Fail gracefully if git repo is not initialized.

253986b

Pass git info as args to prebuild script

fabcd30

Add more mime types based on official Docker Angular containerization guide

9a42984

Update CONTRIBUTING.md and provide example file for .env

bc0ec89

Add more ignore files to .dockerignore

99d8546

Add localhost:5000 to CSRF trusted origins

e236164

Expand Angular Dockerfile.prod based on Docker's Angular containerization guide

5203182

Remove unnecessary proxy.conf.json

a2cabe4

Revert to port 4200 for frontend prod container

fc5ccfe

XanderVertegaal commented Mar 27, 2026


XanderVertegaal added 2 commits March 27, 2026 15:53

Use correct line-endings in build-pre.js

35bd17c

Match server version of Python (3.11) in backend Dockerfile

9831d5a

XanderVertegaal commented Mar 27, 2026


XanderVertegaal added 2 commits March 27, 2026 17:14

Remove DB creation script from postgres service.

5622b83

The container will automatically create a DB with the provided config (host, user, password).

Do not mount source code in backend container; use env vars for database access

3c8f01c

XanderVertegaal commented Mar 27, 2026


XanderVertegaal added the enhancement New feature or request label Mar 27, 2026

XanderVertegaal marked this pull request as ready for review March 27, 2026 16:33

XanderVertegaal requested a review from jgonggrijp March 27, 2026 16:58

jgonggrijp reviewed Apr 1, 2026


XanderVertegaal mentioned this pull request Apr 2, 2026

Inject settings.py in deployment #91

Closed

XanderVertegaal added 16 commits April 2, 2026 10:34

Use pg_isready as a healthcheck for Postgres

8013bad

Move frontend container command back to Dockerfile.dev

02726cd

Suspend spa_url from urls.py

189acc7

Use correct user and database in healthcheck

1fbd233

Update langpro-container

3a37f4d

Update langpro-container

39e5516

Update LangPro container

edf371f

Reenable concomitant parse tables

6e65a73

Back to feature/langpro-submodule in langpro-container

145b074

Add CSRF token route

044eee3

Use LANGPRO_CONTAINER env var in common_settings.py

b1f03d6

Increase log level for backend-prod

ec0e9e0

Add z to volumes

6f51348

Fix log file path

11c3925

Hopefully fix postgres healthcheck

8afaf24

Fix log directory

0538d9c

XanderVertegaal mentioned this pull request Apr 17, 2026

Reinstate create_db script for postgres container #95

Closed

XanderVertegaal merged commit c6a54de into develop Apr 20, 2026
1 check passed

XanderVertegaal deleted the feature/production-containers branch April 20, 2026 18:10
