Skip to content

Commit 42f7194

Browse files
committed
Add CI pipeline, access logging, and resolve remaining LOW issues (v0.6.1)
- GitHub Actions CI: lint (ruff), import check, security (pip-audit + bandit), Docker build - Access log middleware with RotatingFileHandler (configurable size/rotation) - Disk-full errors now logged at ERROR level - Startup warns on invalid nuts_version or db_cache_ttl_days < 1 - Friendly SystemExit on malformed settings.json / postal_patterns.json - Estimate revalidation warns on NUTS3 inconsistencies after extra source overwrite - Fix CVEs in pinned certifi and idna
1 parent 09bb592 commit 42f7194

10 files changed

Lines changed: 180 additions & 26 deletions

File tree

.github/workflows/ci.yml

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,53 @@
1+
name: CI
2+
3+
on:
4+
push:
5+
branches: [main]
6+
pull_request:
7+
branches: [main]
8+
9+
jobs:
10+
lint:
11+
runs-on: ubuntu-latest
12+
steps:
13+
- uses: actions/checkout@v4
14+
- uses: actions/setup-python@v5
15+
with:
16+
python-version: "3.12"
17+
- run: pip install ruff
18+
- run: ruff check app/ scripts/
19+
20+
import-check:
21+
runs-on: ubuntu-latest
22+
steps:
23+
- uses: actions/checkout@v4
24+
- uses: actions/setup-python@v5
25+
with:
26+
python-version: "3.12"
27+
- run: pip install -r requirements.txt
28+
- run: python -c "from app.main import app; print('imports OK')"
29+
30+
security:
31+
runs-on: ubuntu-latest
32+
steps:
33+
- uses: actions/checkout@v4
34+
- uses: actions/setup-python@v5
35+
with:
36+
python-version: "3.12"
37+
- run: pip install -r requirements.txt
38+
- name: Dependency audit
39+
run: pip install pip-audit && pip-audit -r requirements.lock
40+
- name: Static security analysis
41+
run: pip install bandit && bandit -r app/ -c pyproject.toml
42+
43+
docker:
44+
runs-on: ubuntu-latest
45+
steps:
46+
- uses: actions/checkout@v4
47+
- name: Build image
48+
run: docker build -t postalcode2nuts .
49+
- name: Verify non-root user
50+
run: |
51+
user=$(docker run --rm postalcode2nuts whoami)
52+
echo "Container user: $user"
53+
[ "$user" = "appuser" ] || exit 1

README.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -271,6 +271,9 @@ All settings are overridable via environment variables prefixed with `PC2NUTS_`:
271271
| `PC2NUTS_STARTUP_TIMEOUT` | `300` | Maximum seconds allowed for initial data loading. If exceeded, the service starts with whatever data was loaded and sets `data_stale: true`. |
272272
| `PC2NUTS_DOCS_ENABLED` | `true` | Set to `false` to disable Swagger UI (`/docs`) and ReDoc (`/redoc`) in production. |
273273
| `PC2NUTS_CORS_ORIGINS` | `*` | Comma-separated list of allowed CORS origins. Set to a specific origin (e.g. `https://example.com`) to restrict cross-origin access. Empty string disables CORS middleware. |
274+
| `PC2NUTS_ACCESS_LOG_FILE` | *(empty — stdout)* | Path to access log file. When set, logs are written to this file with automatic rotation. When empty, access logs go to stderr. |
275+
| `PC2NUTS_ACCESS_LOG_MAX_MB` | `10` | Maximum size of each access log file in MB before rotation. |
276+
| `PC2NUTS_ACCESS_LOG_BACKUP_COUNT` | `5` | Number of rotated access log files to keep (e.g. 5 x 10 MB = 50 MB max disk usage). |
274277

275278
## Three-tier lookup
276279

@@ -408,6 +411,7 @@ Changing the `PC2NUTS_EXTRA_SOURCES` list invalidates the SQLite cache automatic
408411
- **HTTPS:** The service serves plain HTTP. Place it behind a TLS-terminating reverse proxy (nginx, cloud load balancer) in production.
409412
- **Docker:** The container runs as a non-root user (`appuser`). The `/app/data` volume must be writable by this user. Pre-computed estimates are included in the image.
410413
- **Rate limiting:** Limits are per-client IP (`X-Forwarded-For` aware). Behind a reverse proxy, ensure the proxy sets this header correctly.
414+
- **Access logging:** Every request is logged with client IP, method, path, status code, and duration. Set `PC2NUTS_ACCESS_LOG_FILE` to write to a rotating file instead of stderr.
411415

412416
## Project structure
413417

app/__init__.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
__version__ = "0.6.0"
1+
__version__ = "0.6.1"

app/config.py

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,11 @@
44

55
from pydantic_settings import BaseSettings
66

7-
_defaults = json.loads((Path(__file__).parent / "settings.json").read_text())
7+
_settings_path = Path(__file__).parent / "settings.json"
8+
try:
9+
_defaults = json.loads(_settings_path.read_text())
10+
except (json.JSONDecodeError, OSError) as _exc:
11+
raise SystemExit(f"Fatal: failed to load {_settings_path}: {_exc}") from _exc
812

913

1014
class Settings(BaseSettings):
@@ -17,6 +21,9 @@ class Settings(BaseSettings):
1721
startup_timeout: int = 300
1822
docs_enabled: bool = True
1923
cors_origins: str = "*"
24+
access_log_file: str = ""
25+
access_log_max_mb: int = 10
26+
access_log_backup_count: int = 5
2027

2128
# Countries with TERCET flat files available
2229
countries: list[str] = _defaults["countries"]

app/data_loader.py

Lines changed: 36 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -334,7 +334,10 @@ def _download_and_parse_zip(
334334
if not zipfile.is_zipfile(io.BytesIO(content)):
335335
logger.warning("Downloaded file from %s is not a valid ZIP, skipping", url)
336336
return 0
337-
cached.write_bytes(content)
337+
try:
338+
cached.write_bytes(content)
339+
except OSError as exc:
340+
logger.error("Failed to cache %s: %s", cached, exc)
338341

339342
total = 0
340343
try:
@@ -471,12 +474,29 @@ def _load_estimates_from_csv(csv_path: Path) -> bool:
471474

472475

473476
def _revalidate_estimates() -> int:
474-
"""Remove estimates that now have exact matches. Returns count removed."""
475-
to_remove = [key for key in _estimates if key in _lookup]
477+
"""Remove estimates that now have exact matches and warn about inconsistencies.
478+
479+
Returns count removed.
480+
"""
481+
to_remove = []
482+
inconsistent = 0
483+
for key, est in _estimates.items():
484+
exact = _lookup.get(key)
485+
if exact is not None:
486+
to_remove.append(key)
487+
# Warn if the estimate pointed to a different NUTS3 than the exact match
488+
if est["nuts3"] != exact:
489+
inconsistent += 1
476490
for key in to_remove:
477491
del _estimates[key]
478492
if to_remove:
479493
logger.info("Removed %d estimates that now have exact TERCET matches", len(to_remove))
494+
if inconsistent:
495+
logger.warning(
496+
"%d removed estimates had NUTS3 codes inconsistent with current exact data "
497+
"(estimates CSV may need updating)",
498+
inconsistent,
499+
)
480500
return len(to_remove)
481501

482502

@@ -637,7 +657,7 @@ def _save_to_db(db: Path) -> None:
637657
len(_lookup), len(_estimates), db.name,
638658
)
639659
except Exception as exc:
640-
logger.warning("Failed to save DB cache: %s", exc)
660+
logger.error("Failed to save DB cache: %s", exc)
641661
tmp.unlink(missing_ok=True)
642662

643663

@@ -646,6 +666,18 @@ def load_data() -> None:
646666
global _data_stale, _data_loaded_at, _extra_source_count
647667

648668
with _data_lock:
669+
if settings.nuts_version == "unknown":
670+
logger.warning(
671+
"Could not derive NUTS version from base URL '%s'. "
672+
"URL guessing and DB caching may not work correctly.",
673+
settings.tercet_base_url,
674+
)
675+
if settings.db_cache_ttl_days < 1:
676+
logger.warning(
677+
"PC2NUTS_DB_CACHE_TTL_DAYS=%d is less than 1, cache will always be considered expired.",
678+
settings.db_cache_ttl_days,
679+
)
680+
649681
_lookup.clear()
650682
_estimates.clear()
651683
_data_stale = False

app/main.py

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,10 +5,13 @@
55
"""
66

77
import logging
8+
import time
89
from contextlib import asynccontextmanager
10+
from logging.handlers import RotatingFileHandler
911

1012
from fastapi import FastAPI, HTTPException, Query, Request
1113
from fastapi.middleware.cors import CORSMiddleware
14+
from starlette.middleware.base import BaseHTTPMiddleware
1215
from slowapi import Limiter
1316
from slowapi.errors import RateLimitExceeded
1417
from slowapi.util import get_remote_address
@@ -37,6 +40,39 @@
3740

3841
limiter = Limiter(key_func=get_remote_address)
3942

43+
# Access logger — separate from app logger
44+
access_logger = logging.getLogger("app.access")
45+
access_logger.setLevel(logging.INFO)
46+
access_logger.propagate = False
47+
if settings.access_log_file:
48+
_handler = RotatingFileHandler(
49+
settings.access_log_file,
50+
maxBytes=settings.access_log_max_mb * 1024 * 1024,
51+
backupCount=settings.access_log_backup_count,
52+
)
53+
_handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
54+
access_logger.addHandler(_handler)
55+
else:
56+
_handler = logging.StreamHandler()
57+
_handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
58+
access_logger.addHandler(_handler)
59+
60+
61+
class AccessLogMiddleware(BaseHTTPMiddleware):
62+
async def dispatch(self, request: Request, call_next):
63+
start = time.monotonic()
64+
response = await call_next(request)
65+
duration_ms = (time.monotonic() - start) * 1000
66+
access_logger.info(
67+
'%s %s %s %d %.1fms',
68+
request.client.host if request.client else "-",
69+
request.method,
70+
request.url.path,
71+
response.status_code,
72+
duration_ms,
73+
)
74+
return response
75+
4076

4177
@asynccontextmanager
4278
async def lifespan(app: FastAPI):
@@ -92,6 +128,9 @@ def _rate_limit_handler(request: Request, exc: RateLimitExceeded):
92128
)
93129

94130

131+
app.add_middleware(AccessLogMiddleware)
132+
133+
95134
def _available_countries_str() -> str:
96135
"""Sorted comma-separated list of country codes with loaded data."""
97136
return ", ".join(sorted(get_loaded_countries()))

app/postal_patterns.py

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -18,9 +18,11 @@
1818

1919
# Each regex is used verbatim as provided. Patterns may have 0, 1, or 2 capture groups.
2020
# Regexes are applied after .strip().upper() and are case-insensitive.
21-
POSTAL_PATTERNS: dict[str, dict] = json.loads(
22-
(Path(__file__).parent / "postal_patterns.json").read_text()
23-
)
21+
_patterns_path = Path(__file__).parent / "postal_patterns.json"
22+
try:
23+
POSTAL_PATTERNS: dict[str, dict] = json.loads(_patterns_path.read_text())
24+
except (json.JSONDecodeError, OSError) as _exc:
25+
raise SystemExit(f"Fatal: failed to load {_patterns_path}: {_exc}") from _exc
2426

2527
# Pre-compile all patterns for performance
2628
_COMPILED: dict[str, re.Pattern] = {

docs/security_analysis.md

Lines changed: 19 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -78,9 +78,9 @@ An attacker with env var access (CI/CD, shared hosting, orchestration misconfigu
7878

7979
`url.rsplit("/", 1)[-1]` extracts only the last path component. Path traversal via URL crafting is not possible. ZIP contents are read into memory, never extracted to disk. No zip-slip risk.
8080

81-
### Disk full handling — LOW
81+
### ~~Disk full handling~~RESOLVED (v0.6.1)
8282

83-
If the disk is full during `cached.write_bytes()` or SQLite operations, OSError propagates. For ZIPs, this fails silently per-country (other countries still load). For DB save, the `except Exception` handler cleans up the temp file. Adequate but not logged at ERROR level.
83+
**Fixed:** `cached.write_bytes()` is now wrapped in try/except with `logger.error`. DB save exception upgraded from `logger.warning` to `logger.error`.
8484

8585
---
8686

@@ -126,25 +126,25 @@ The app serves plain HTTP. Must be behind a TLS-terminating reverse proxy (nginx
126126

127127
**Fixed:** `requirements.lock` with exact pinned versions. Dockerfile uses `requirements.lock` for reproducible builds. `requirements.txt` retained with ranges for development.
128128

129-
### No CI/CD pipeline visible — LOW
129+
### ~~No CI/CD pipeline visible~~RESOLVED (v0.6.1)
130130

131-
No GitHub Actions, no automated tests, no linting, no security scanning in the repo.
131+
**Fixed:** GitHub Actions CI workflow added with 4 jobs: lint (ruff), import check, security (pip-audit + bandit), and Docker build verification.
132132

133133
---
134134

135135
## 9. Configuration Robustness
136136

137-
### `nuts_version` "unknown" not handled — LOW
137+
### ~~`nuts_version` "unknown" not handled~~RESOLVED (v0.6.1)
138138

139-
If `PC2NUTS_TERCET_BASE_URL` doesn't match `NUTS-\d{4}`, the version is "unknown". URL guessing generates invalid URLs (`NUTS-unknown`), discovery may still work, but the DB file is `postalcode2nuts_NUTS-unknown.db`. No validation or warning at startup.
139+
**Fixed:** Startup now logs a warning when the NUTS version cannot be derived from the base URL.
140140

141-
### No validation on `db_cache_ttl_days`LOW
141+
### ~~No validation on `db_cache_ttl_days`~~RESOLVED (v0.6.1)
142142

143-
A value of 0 or negative causes the cache to always appear expired. No explicit bounds checking.
143+
**Fixed:** Startup now logs a warning when `db_cache_ttl_days` is less than 1.
144144

145-
### JSON file corruption — LOW
145+
### ~~JSON file corruption~~RESOLVED (v0.6.1)
146146

147-
If `settings.json` or `postal_patterns.json` are malformed, the app crashes at import time with a `json.JSONDecodeError`. This is fail-fast (good), but the error message won't be very user-friendly.
147+
**Fixed:** Both JSON files now produce a clear `Fatal: failed to load <path>: <reason>` message and `SystemExit` instead of a raw traceback.
148148

149149
---
150150

@@ -158,9 +158,9 @@ Once loaded, data is static. To refresh, restart the service. This is intentiona
158158

159159
If TERCET is unreachable on first startup and no SQLite cache exists, the app starts with zero data. The health endpoint reports `"no_data"` but the app is "up". All `/lookup` requests return 400. There's no automatic retry after startup.
160160

161-
### Estimates not revalidated after extra sources — LOW
161+
### ~~Estimates not revalidated after extra sources~~RESOLVED (v0.6.1)
162162

163-
`_revalidate_estimates()` removes estimates that overlap with exact matches. But if extra sources overwrite a TERCET entry with a different NUTS3 code, existing estimates for that postal code are not updated to reflect the new mapping.
163+
**Fixed:** `_revalidate_estimates()` now detects and warns when removed estimates had NUTS3 codes inconsistent with current exact data, flagging stale estimates for operator review.
164164

165165
---
166166

@@ -183,7 +183,11 @@ If TERCET is unreachable on first startup and no SQLite cache exists, the app st
183183
| ~~**MEDIUM**~~ | ~~Mutable globals without locking~~ | 6 | **RESOLVED v0.6.0** |
184184
| **LOW** | Version exposed in OpenAPI spec | 4 | Open |
185185
| ~~**LOW**~~ | ~~No Docker HEALTHCHECK~~ | 7 | **RESOLVED v0.6.0** |
186-
| **LOW** | `nuts_version` "unknown" not validated | 9 | Open |
187-
| **LOW** | No CI/CD or security scanning | 8 | Open |
188-
| **LOW** | No access/audit logging | 4 | Open |
186+
| ~~**LOW**~~ | ~~`nuts_version` "unknown" not validated~~ | 9 | **RESOLVED v0.6.1** |
187+
| ~~**LOW**~~ | ~~No CI/CD or security scanning~~ | 8 | **RESOLVED v0.6.1** |
188+
| ~~**LOW**~~ | ~~No access/audit logging~~ | 4 | **RESOLVED v0.6.1** |
189189
| **LOW** | Multi-worker redundant downloads | 6 | Open |
190+
| ~~**LOW**~~ | ~~Disk full handling~~ | 5 | **RESOLVED v0.6.1** |
191+
| ~~**LOW**~~ | ~~No validation on `db_cache_ttl_days`~~ | 9 | **RESOLVED v0.6.1** |
192+
| ~~**LOW**~~ | ~~JSON file corruption~~ | 9 | **RESOLVED v0.6.1** |
193+
| ~~**LOW**~~ | ~~Estimates not revalidated after extra sources~~ | 10 | **RESOLVED v0.6.1** |

pyproject.toml

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
[tool.bandit]
2+
# B113: request without timeout — our httpx calls always specify timeout
3+
skips = ["B113"]
4+
5+
[tool.ruff]
6+
target-version = "py312"
7+
line-length = 110
8+
9+
[tool.ruff.lint]
10+
select = ["E", "F", "W"]
11+
12+
[tool.ruff.lint.per-file-ignores]
13+
"scripts/*" = ["E402"]

requirements.lock

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3,15 +3,15 @@
33
# pip install -r requirements.txt && pip freeze > requirements.lock
44
annotated-types==0.7.0
55
anyio==4.12.1
6-
certifi==2023.11.17
6+
certifi==2026.1.4
77
click==8.1.6
88
Deprecated==1.3.1
99
fastapi==0.129.0
1010
h11==0.16.0
1111
httpcore==1.0.9
1212
httptools==0.7.1
1313
httpx==0.28.1
14-
idna==3.6
14+
idna==3.11
1515
limits==5.8.0
1616
pydantic==2.12.5
1717
pydantic-settings==2.13.0

0 commit comments

Comments
 (0)