Harden DRF API Logger production logging and compliance #117

Draft
vishalanandl177 wants to merge 8 commits into main from feature/opentelemetry

vishalanandl177 (Owner) commented Apr 26, 2026

Summary

Hardens DRF API Logger for production and compliance-sensitive deployments after load testing against the demo project.

Production safety fixes

  • Masks sensitive body, query, and header keys case-insensitively, including Authorization, Cookie, Set-Cookie, proxy auth, API keys, secrets, and custom exclude keys.
  • Treats DRF_LOGGER_QUEUE_MAX_SIZE as a background batch threshold instead of a bounded request-thread queue.
  • Removes request-thread database flushes; reaching the threshold now wakes the background worker.
  • Adds shutdown flushing plus queue writer status for backlog, dropped logs, inserted logs, and failed inserts.
  • Adds bounded default payload capture: request bodies default to 32 KB, responses to 64 KB, with truncation markers instead of silent empty values.
  • Parses content types with parameters such as application/json; charset=utf-8 and supports configured non-JSON text content types.
  • Adds DRF_API_LOGGER_PROFILING_SAMPLE_RATE to avoid profiling every request in production.
  • Adds process-local Prometheus/JSON metrics identity plus queue health metrics.
  • Makes signal listener failures non-fatal and fixes OTel no-dependency testability.
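
The queue change above (a threshold that wakes a background worker, no database writes on the request thread, and a final flush on shutdown) can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: `BatchWriter`, `insert_batch`, `threshold`, and `interval` are hypothetical names, and `threshold` only plays the role that `DRF_LOGGER_QUEUE_MAX_SIZE` is described as having.

```python
import queue
import threading


class BatchWriter:
    """Hypothetical background writer; not the package's actual class."""

    def __init__(self, insert_batch, threshold=50, interval=10.0):
        self._insert_batch = insert_batch  # callable that persists a list of rows
        self._threshold = threshold        # batch threshold, not a hard queue bound
        self._interval = interval          # periodic flush interval in seconds
        self._queue = queue.Queue()        # unbounded, so request threads never block
        self._wake = threading.Event()
        self._stop = threading.Event()
        self.inserted = 0                  # rows written successfully
        self.failed = 0                    # rows lost to failed inserts
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def put(self, row):
        """Runs on the request thread: enqueue only, never write to the database."""
        self._queue.put(row)
        if self._queue.qsize() >= self._threshold:
            self._wake.set()               # wake the worker instead of flushing here

    def _drain(self):
        rows = []
        while True:
            try:
                rows.append(self._queue.get_nowait())
            except queue.Empty:
                break
        if not rows:
            return
        try:
            self._insert_batch(rows)
            self.inserted += len(rows)
        except Exception:
            self.failed += len(rows)       # record the failure, keep the worker alive

    def _run(self):
        while not self._stop.is_set():
            self._wake.wait(timeout=self._interval)
            self._wake.clear()
            self._drain()
        self._drain()                      # final flush on shutdown

    def close(self):
        """Stop the worker and flush whatever is still queued."""
        self._stop.set()
        self._wake.set()
        self._thread.join()
```

In this sketch, `inserted` and `failed` stand in for the queue writer status counters mentioned above; a real deployment would also need to decide when to drop rows if the backlog grows without bound.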

Compliance and docs

  • Added docs/compliance.rst with compliance-readiness guidance, retention example, and operator responsibilities.
  • Verified README/docs setting keys match uppercase source settings exactly.
  • Fixed Apache-2.0 packaging metadata/header consistency.
  • Updated stale docs version and removed misleading zero-impact wording.
  • Updated production/high-traffic/basic examples with safer defaults and profiling sampling.
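
As a rough illustration of profiling sampling, the sketch below decides per request whether to profile. It assumes `DRF_API_LOGGER_PROFILING_SAMPLE_RATE` is a fraction between 0 and 1, which this description does not state; `should_profile` and the value `0.05` are hypothetical.

```python
import random

# Illustrative value only; assumes the setting is a fraction in [0, 1].
DRF_API_LOGGER_PROFILING_SAMPLE_RATE = 0.05


def should_profile(sample_rate, rng=random.random):
    """Return True for roughly `sample_rate` of calls (hypothetical helper)."""
    if sample_rate <= 0:
        return False          # profiling disabled
    if sample_rate >= 1:
        return True           # profile every request
    return rng() < sample_rate
```

Passing `rng` explicitly keeps the decision deterministic in tests.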

Validation

  • python -m django test tests --settings=tests.test_settings --verbosity=1 (run from the local demo virtualenv) -> 198 tests OK
  • README/docs setting-key comparison -> no mismatches
  • git diff --check -> no whitespace errors
  • Demo smoke check with local source confirmed Authorization and Cookie are masked as ***FILTERED***
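
The masking behaviour checked by the smoke test can be approximated with the sketch below. It is an assumption-laden illustration rather than the package's code: `mask_sensitive`, `SENSITIVE_KEYS`, and the exact key list are hypothetical; only the `***FILTERED***` marker and the case-insensitive matching come from this description.

```python
MASK = "***FILTERED***"

# Hypothetical default key list, stored lowercase for case-insensitive matching.
SENSITIVE_KEYS = {
    "authorization", "cookie", "set-cookie", "proxy-authorization",
    "password", "token", "api_key", "secret",
}


def mask_sensitive(data, extra_keys=()):
    """Return a copy of `data` with sensitive keys masked, ignoring key case."""
    blocked = SENSITIVE_KEYS | {key.lower() for key in extra_keys}

    def _mask(value):
        if isinstance(value, dict):
            return {
                key: MASK if str(key).lower() in blocked else _mask(item)
                for key, item in value.items()
            }
        if isinstance(value, list):
            return [_mask(item) for item in value]
        return value  # scalars pass through unchanged

    return _mask(data)
```

The same helper can be applied to headers, query parameters, and parsed bodies, since all three are key/value structures.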

Notes

Generated local load-test artifacts (build/, dist/, profiler output, and the local load-test script folder) were intentionally left out of this PR.

claude added 7 commits April 26, 2026 10:39
Opt-in OTel support that emits spans with HTTP attributes and profiling
data to any OTel-compatible backend (Jaeger, Datadog, Grafana Tempo).

- New setting: DRF_API_LOGGER_ENABLE_OTEL (default False)
- New module: drf_api_logger/otel.py with start_span/finish_span
- Enriches existing OTel spans (from opentelemetry-instrumentation-django)
  or creates own spans when none exist
- Span attributes: http.method, http.url, http.status_code, http.client_ip,
  drf.execution_time_ms, drf.profiling.*, db.query_count, db.total_time_ms
- Sets span status: OK for 2xx/3xx, ERROR for 4xx/5xx
- Optional dependency: pip install drf-api-logger[otel]
- 20 new tests, all passing
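
A dependency-free sketch of the status rule and span attributes listed above. `span_status_for` and `build_span_attributes` are hypothetical helpers, not functions from drf_api_logger/otel.py, and returning "UNSET" for codes outside 2xx-5xx is an assumption.

```python
def span_status_for(status_code):
    """Map an HTTP status code to a span status name."""
    if 200 <= status_code < 400:
        return "OK"       # 2xx/3xx
    if 400 <= status_code < 600:
        return "ERROR"    # 4xx/5xx
    return "UNSET"        # assumption: anything else leaves the status unset


def build_span_attributes(method, url, status_code, client_ip, execution_time_ms):
    """Collect the HTTP attributes named in this commit into one dict."""
    return {
        "http.method": method,
        "http.url": url,
        "http.status_code": status_code,
        "http.client_ip": client_ip,
        "drf.execution_time_ms": execution_time_ms,
    }
```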

https://claude.ai/code/session_01Xm34otsnCyKstNXxvzgABT
- README: replace comparison with aggressive decision-killer table
  covering 15 capabilities (custom logging vs drf-api-logger)
- examples/basic.py: 2-minute setup
- examples/production.py: optimized for real deployments
- examples/high_traffic.py: tuned for 10K+ req/min with dedicated DB
- docs/performance_tuning.rst: queue tuning, indexes, payload limits
- docs/scaling.rst: multi-process, horizontal scaling, storage estimation
- docs/security.rst: masking, compliance, OTel data safety
- docs/index.rst: toctree linking new guide pages

https://claude.ai/code/session_01Xm34otsnCyKstNXxvzgABT
Capture, group, and surface API errors by endpoint and error type
with frequency counts. Competes with lightweight APM tools.

- New model field: error_type (CharField, nullable, indexed)
- Middleware: captures exception class name for unhandled errors,
  extracts error detail from DRF response body for handled errors,
  falls back to status code mapping (NotFound, Unauthorized, etc.)
- Admin changelist: Error Analytics panel showing:
  - Top erroring endpoints grouped by API + status code + error type
  - Error frequency counts
  - Errors by type aggregation
  - Error rate percentage
  - Color-coded status badges (red for 5xx, yellow for 4xx)
- Admin detail: error_type visible in readonly fields
- 24 new tests covering extraction, capture, model, and grouping
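
The capture order described above might look like the sketch below. `resolve_error_type` and `STATUS_ERROR_TYPES` are hypothetical; only the NotFound and Unauthorized names come from this commit, and the remaining map entries and the `HTTP<code>` fallback are assumptions. Extraction from the DRF response body is omitted because its format is not described here.

```python
# Hypothetical status-code fallback map; only 401 and 404 are named in the commit.
STATUS_ERROR_TYPES = {
    401: "Unauthorized",
    403: "Forbidden",
    404: "NotFound",
    405: "MethodNotAllowed",
    429: "TooManyRequests",
}


def resolve_error_type(status_code, exception=None):
    """Prefer the exception class name, then fall back to the status code."""
    if exception is not None:
        return type(exception).__name__   # unhandled error
    if status_code < 400:
        return None                        # not an error response
    return STATUS_ERROR_TYPES.get(status_code, f"HTTP{status_code}")
```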

https://claude.ai/code/session_01Xm34otsnCyKstNXxvzgABT
Error Highlighting & Grouping:
- New model field: error_type (indexed) for fast grouping
- Middleware captures exception class names for unhandled errors,
  extracts DRF error detail from response body for handled errors
- Admin Error Analytics panel: top erroring endpoints grouped by
  API + status code + error type, frequency counts, error rate
- Color-coded status badges (red 5xx, yellow 4xx)
- 24 new tests

Prometheus-Style Metrics:
- In-memory metrics collector (thread-safe, zero DB overhead)
- Tracks: request_total, error_total, error_rate, latency (avg/max),
  per-method counts, per-endpoint stats, per-error-type counts
- /drf-api-logger/metrics/ — Prometheus text format endpoint
- /drf-api-logger/metrics/json/ — JSON format endpoint
- DRF_API_LOGGER_ENABLE_METRICS setting (default False)
- 15 new tests

Total: 172 tests
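
A minimal sketch of a thread-safe in-memory collector in the spirit of the one described above. `MetricsCollector` and its methods are hypothetical, the exact metric names are assumptions based on the `drf_api_logger_` prefix, and counting any status of 400 or above as an error is also an assumption.

```python
import threading
from collections import Counter


class MetricsCollector:
    """Hypothetical in-memory collector; counters live in this process only."""

    def __init__(self):
        self._lock = threading.Lock()
        self._requests = 0
        self._errors = 0
        self._latency_sum = 0.0
        self._latency_max = 0.0
        self._per_method = Counter()

    def record(self, method, status_code, elapsed_ms):
        with self._lock:  # one lock guards every counter
            self._requests += 1
            if status_code >= 400:
                self._errors += 1
            self._latency_sum += elapsed_ms
            self._latency_max = max(self._latency_max, elapsed_ms)
            self._per_method[method] += 1

    def snapshot(self):
        with self._lock:
            total = self._requests
            return {
                "request_total": total,
                "error_total": self._errors,
                "error_rate": self._errors / total if total else 0.0,
                "latency_avg_ms": self._latency_sum / total if total else 0.0,
                "latency_max_ms": self._latency_max,
                "per_method": dict(self._per_method),
            }

    def render_prometheus(self):
        """Render a few counters in Prometheus text exposition style."""
        snap = self.snapshot()
        lines = [
            f"drf_api_logger_request_total {snap['request_total']}",
            f"drf_api_logger_error_total {snap['error_total']}",
        ]
        for method, count in sorted(snap["per_method"].items()):
            lines.append(f'drf_api_logger_method_total{{method="{method}"}} {count}')
        return "\n".join(lines) + "\n"
```

Because the state is per process, each worker process in a multi-process deployment would report its own numbers.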

https://claude.ai/code/session_01Xm34otsnCyKstNXxvzgABT
Renamed all metric names from drf_api_* to drf_api_logger_* to match
the package name and prevent conflicts with custom application metrics.

https://claude.ai/code/session_01Xm34otsnCyKstNXxvzgABT
error_type is useful for signals, OTel spans, and metrics but doesn't
need to be persisted in the DB. Removed the field and migration.

- error_type still captured in middleware and flows through:
  - Signal payload (for custom consumers)
  - OTel span attributes
  - Prometheus metrics (per_error_type)
- error_type excluded from DB payload via d.pop('error_type')
- Admin error analytics now groups by endpoint + status_code
- No migration needed — one fewer DB column
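
The split described above, where error_type reaches signals, spans, and metrics but not the database row, can be sketched as follows. `split_log_payload` is a hypothetical helper; the commit itself only mentions `d.pop('error_type')`.

```python
def split_log_payload(payload):
    """Return (db_row, error_type) without mutating the original payload."""
    db_row = dict(payload)                       # shallow copy for the DB insert
    error_type = db_row.pop("error_type", None)  # tolerate payloads without the key
    return db_row, error_type
```

Copying first leaves the original payload intact for signal listeners that still expect error_type.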

https://claude.ai/code/session_01Xm34otsnCyKstNXxvzgABT
@vishalanandl177 vishalanandl177 marked this pull request as draft May 2, 2026 18:05
@vishalanandl177 vishalanandl177 changed the title from "Add OTel, error grouping, Prometheus metrics - full observability stack" to "Harden DRF API Logger production logging and compliance" May 14, 2026

2 participants