Add proxy cache refresh support with safe key handling and refresh accounting by yaoge123 · Pull Request #60 · nginx-modules/ngx_cache_purge

yaoge123 · 2026-04-24T10:47:19Z

Summary

add a full proxy cache refresh flow on top of upstream purge support, including exact refresh, partial refresh, and refresh_all
split dedicated refresh configuration from purge configuration, add runtime method routing, and document the supported deployment layouts
harden refresh for large batches, concurrent requests, client aborts, transport failures, cache-key safety, and add detailed accounting / observability

Recommended Review Order

Refresh core and batch invalidation foundation
Execution model and directive split
Observability and accounting
Cache-key safety and request-target inference
Hardening and bug fixes
Repository hygiene, tests, and docs

Main Behavior

This stack adds refresh as conditional cache validation for existing cached objects. For each cached entry in scope, the module probes upstream and keeps the cache entry when upstream reports it is unchanged, but purges the cache entry when upstream reports it changed or disappeared.

Current refresh status policy is:

304: keep the cached entry
200: purge the cached entry because upstream content changed
404 / 410: purge the cached entry because the upstream object is gone
other HTTP statuses: keep the cached entry
transport / internal failures: keep the cached entry and count a real error

Configuration Model

This series ends with two supported operator-facing models:

dedicated refresh endpoints via proxy_cache_refresh
migration-friendly single-entry routing where exact / partial requests can switch between purge and refresh by HTTP method

refresh_all remains explicitly configured through proxy_cache_refresh and is not enabled on a plain purge location by accident.

Safety Rules

Refresh only runs for proxy cache paths and only when the evaluated cache key keeps the request-target tail unambiguous. The series includes:

strict cache-key tail validation
rejection of ambiguous partial refresh requests with query args
support for safe prefixed cache keys when the request-target tail can still be reconstructed safely
runtime rejection of refresh on purge entrypoints that do not actually define proxy_cache

Hardening Included In This Stack

The series also fixes and hardens a number of failure cases discovered during development and regression work:

large-batch refresh scanning and invalidation stability
concurrent refresh race handling
stale cache struct and merge-loc-conf crash fixes
client-abort safety for refresh requests
correct transport failure accounting
temp pool / fd buildup cleanup during large invalidations
relaxed scanned-node replacement checks to avoid false positives in refresh flows

Observability

Refresh responses and logs now expose richer accounting:

X-Cache-Action response header for purge vs refresh actions
top-level kept / purged / errors
dynamic status_counts
dynamic status_bytes
total_bytes / kept_bytes / purged_bytes
refresh summary logs at error_log ... notice

Verification

Validation completed on the fork before proposing upstream review, including:

Docker-based Test::Nginx suite via make test
broader regression coverage via repo-level scripts/run_regression.sh
compatibility runs on:
- 1.20.2
- 1.22.1
- 1.24.0
- 1.26.3
- 1.28.3
- 1.29.8

- Compare invalidation metadata directly against in-memory cache struct instead of re-reading the cache file, eliminating a TOCTOU race window - Add inode (uniq) and body_start cross-checks for race detection - Defer temp pool destruction via enqueue/drain pattern to prevent use-after-free when subrequests reference pool-allocated data - Register pool cleanup handler to guarantee resource release on request termination (timeout, client disconnect) - Retire chunk pools instead of immediate destruction so in-flight subrequests retain valid file references across chunk boundaries - Add active count underflow guard in refresh_done normal path - Add args.len underflow guard for malformed cache keys - Use case-sensitive ngx_strncmp for cache key matching (keys are exact byte sequences, not case-insensitive identifiers) - Add bounds checks for config parsing (from keyword position) - Fix memory leak: free key_buf on short read in collect_path - Track dispatched subrequest count separately from queue cursor for accurate timeout skip accounting

Root cause: ngx_http_file_cache_open registers a pool cleanup (ngx_http_file_cache_cleanup) that holds a pointer to the ngx_http_cache_t struct. When this struct was a stack variable in invalidate_item, the pointer became dangling after the function returned. Later, drain_temp_pools destroyed the pool, triggering the cleanup which accessed the stale stack address -> SIGSEGV. Fix: - Allocate ngx_http_cache_t on the pool instead of the stack so it survives until pool destruction - Call ngx_http_file_cache_free after invalidate_opened_cache to properly release the node reference (sets c->updated=1, making the pool cleanup a no-op) - Remove count-- from invalidate_opened_cache to avoid premature node free when concurrent refreshes race on the same cache entry Verified: 3-5 concurrent refresh requests to overlapping cache zones no longer crash. Full regression suite and 100k perf pass.

- Add finalized guard in refresh_done to prevent use-after-free on timeout - Fix item_matches_cache to allow invalidate when all metadata fields are zero - Set pd->handled in fire_subrequest error paths to prevent double error counting - Set file.name in read_item and collect_path for proper error logging - Add ngx_memzero for batch_ctx in purge_all and purge_partial - Add detailed comment on open_temp_cache shallow copy risks - Add comprehensive refresh documentation to README.md

Normal (WAITED) subrequests accumulate in r->postponed and only finalize when they become the active subrequest via the postpone filter. Since our write_event_handler never outputs data, the postpone filter never runs, causing r->main->count to grow unboundedly and overflow at 64535 for file sets larger than ~64k entries. Switch to NGX_HTTP_SUBREQUEST_BACKGROUND which bypasses the postpone list entirely. Each background subrequest finalizes independently via ngx_http_close_request, properly decrementing main->count. Since background subrequests do not post the parent request upon finalization, manually call ngx_http_post_request() in refresh_done to re-trigger the dispatch loop on the next event cycle. Remove the previous count-limit throttle which degraded throughput from ~1900 fps to ~330 fps. Verified: 100k files refresh completes in 8.5s (~11700 fps) with total=100000 kept=50000 purged=50000 errors=0. Full regression suite (10/10) passes.

nginx's cache_convert_head converts HEAD to GET by setting u->method in ngx_http_upstream_cache() before the cache_bypass predicate check. Since proxy_create_request() prioritizes u->method over r->method_name, our HEAD setting was being overridden to GET. Fix by clearing u->method in the $cache_purge_refresh_bypass variable handler, which is evaluated by ngx_http_test_predicates() during the cache_bypass check — after u->method is set but before create_request() reads it. This makes proxy_create_request() fall through to r->method_name (HEAD). Only affects refresh subrequests (checked via parent ctx and r->method == NGX_HTTP_HEAD). Normal requests are unaffected. Verified: upstream access log confirms HEAD requests with 0 body bytes. 304 responses have no body; 200 responses also transfer 0 bytes. Full regression suite (10/10) passes.

… HEAD double-header

Copilot

Pull request overview

This PR updates the module and documentation to support a new refresh capability (conditional cache validation) alongside existing purge behavior, and refreshes the nginx compatibility/testing matrix.

Changes:

Add proxy_cache_refresh support with refresh-time controls (cache_purge_refresh_concurrency, cache_purge_refresh_timeout) and method-driven runtime routing between purge/refresh.
Rework cache invalidation helpers to safely invalidate scanned entries and add refresh subrequest orchestration + summary responses.
Update docs/testing scaffolding: new refresh documentation (EN + zh-CN), expanded nginx version matrix, and add ignore rules for build/test artifacts.

Reviewed changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`ngx_cache_purge_module.c`	Implements refresh feature, new directives, runtime dispatch, and refactors invalidation logic.
`t/Makefile`	Updates default nginx version and expands the test compatibility version list.
`README.md`	Documents refresh directives/rules and updates compatibility/testing guidance.
`README.zh-CN.md`	Adds detailed deployment documentation for purge/refresh behavior in Chinese.
`.gitignore`	Ignores `.worktrees/`.
`.dockerignore`	Excludes test/cache directories from Docker build context.

Copilot · 2026-04-24T10:51:52Z

+    tree.log               = ngx_cycle->log;
+
+    ngx_walk_tree(&tree, &cache->path->name);
+    return ctx.files_deleted;


In ngx_http_cache_purge_partial(), ngx_http_cache_purge_drain_temp_pools(&ctx.temp_pools) is unreachable because the function returns immediately after ngx_walk_tree(). This leaves dead code and (if temp_pools is ever populated) would skip cleanup. Move the drain before the return and keep a single return statement.

Suggested change

return ctx.files_deleted;

Copilot · 2026-04-24T10:51:52Z

+     * Note: nginx's cache_convert_head (on by default) converts HEAD to GET
+     * in ngx_http_upstream_cache() before the cache_bypass check, so the
+     * upstream actually receives a GET request.  However, header_only = 1
+     * ensures nginx finalizes the subrequest after receiving response headers
+     * without reading the body (ngx_http_upstream.c:3280 fast-path when
+     * header_only && !cacheable && !store).  The real bandwidth savings come
+     * from conditional headers (If-None-Match / If-Modified-Since) which
+     * elicit 304 responses with no body.  Forcing true HEAD would require
+     * hooking into the upstream init phase or setting proxy_cache_convert_head
+     * off globally, both unacceptably invasive.


The comment above the subrequest method setup says the upstream "actually receives a GET" due to cache_convert_head, but the $cache_purge_refresh_bypass variable handler explicitly clears r->upstream->method to restore HEAD before proxy_create_request() runs. These two comments contradict; update this block to reflect the current behavior (and/or reference the bypass variable hack) so future readers don’t get misled.

Suggested change

* Note: nginx's cache_convert_head (on by default) converts HEAD to GET

* in ngx_http_upstream_cache() before the cache_bypass check, so the

* upstream actually receives a GET request. However, header_only = 1

* ensures nginx finalizes the subrequest after receiving response headers

* without reading the body (ngx_http_upstream.c:3280 fast-path when

* header_only && !cacheable && !store). The real bandwidth savings come

* from conditional headers (If-None-Match / If-Modified-Since) which

* elicit 304 responses with no body. Forcing true HEAD would require

* hooking into the upstream init phase or setting proxy_cache_convert_head

* off globally, both unacceptably invasive.

* nginx's cache_convert_head (on by default) rewrites HEAD to GET in

* ngx_http_upstream_cache() before the cache_bypass check. For refresh

* subrequests, this module compensates via the $cache_purge_refresh_bypass

* variable handler, which clears r->upstream->method before

* proxy_create_request() runs so the upstream request is restored to

* HEAD. header_only = 1 still matters because nginx finalizes the

* subrequest after receiving response headers without reading the body

* (ngx_http_upstream.c fast-path when header_only && !cacheable &&

* !store). The real bandwidth savings come from conditional headers

* (If-None-Match / If-Modified-Since) which elicit 304 responses with

* no body. Using the bypass-variable hook avoids more invasive changes

* such as upstream-phase hooks or disabling proxy_cache_convert_head

* globally.

Copilot · 2026-04-24T10:51:53Z

+
+```
+Syntax:  cache_purge_refresh_concurrency <number>
+Default: 1


The README states cache_purge_refresh_concurrency has a default of 1, but the module code merges refresh_concurrency with a default of 32 (ngx_cache_purge_module.c:4315-4316). Please align documentation and implementation (either update the documented default or change the code default).

Suggested change

Default: 1

Default: 32

Copilot · 2026-04-24T10:51:53Z

+static ngx_int_t
+ngx_http_cache_purge_refresh(ngx_http_request_t *r,
+    ngx_http_file_cache_t *cache)
+{
+    ngx_http_cache_purge_refresh_ctx_t  *ctx;
+    ngx_http_cache_purge_loc_conf_t     *cplcf;
+    ngx_pool_cleanup_t                  *cln;
+    ngx_str_t                           *keys;
+    ngx_str_t                            targets[4];
+    ngx_str_t                            key;
+    ngx_uint_t                           ntargets;
+
+    ngx_log_debug(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
+                  "cache purge refresh in %s", cache->path->name.data);
+
+    /* Allocate refresh context */
+    ctx = ngx_pcalloc(r->pool, sizeof(ngx_http_cache_purge_refresh_ctx_t));
+    if (ctx == NULL) {
+        return NGX_HTTP_INTERNAL_SERVER_ERROR;
    }
-# endif

-# if (NGX_HTTP_UWSGI)
-    ngx_http_cache_purge_merge_conf(&conf->uwsgi, &prev->uwsgi);
+    ngx_queue_init(&ctx->temp_pools);
+    ngx_queue_init(&ctx->retired_chunk_pools);



This PR introduces substantial new refresh behavior (proxy_cache_refresh directive, conditional subrequests, cache scanning/invalidation), but there are no Test::Nginx cases covering REFRESH requests yet (no matches for "REFRESH" under t/). Add integration tests for core scenarios (304 keeps, 200/404/410 invalidates, capability mismatch 400, partial vs refresh_all) to prevent regressions across nginx versions.

Fixed in the current replay branch.

Added Test::Nginx coverage in t/basic.t for the core refresh paths this PR introduces:

exact refresh keeps cached content on 304

exact refresh invalidates on 200

exact refresh invalidates on 404

exact refresh invalidates on 410

partial refresh purges only changed matches

refresh_all refreshes changed entries across the cache

runtime capability mismatch returns 400 for refresh_all on a purge-only location

Fresh verification: make test-version VERSION=1.29.8 now passes with the new cases included.

Copilot · 2026-04-24T10:51:53Z

+    "<html><head><title>Successful purge</title></head>"
+    "<body bgcolor=\"white\"><center><h1>Successful purge</h1>"


The HTML response template hard-codes "Successful purge" in the <title>/
, but this handler can also return statuses like "queued" (background purge) and potentially other non-success states. This makes the response body misleading; consider using a neutral heading (e.g., "Cache purge") or deriving the heading from the actual status string.

Suggested change

"<html><head><title>Successful purge</title></head>"

"<body bgcolor=\"white\"><center><h1>Successful purge</h1>"

"<html><head><title>Cache purge</title></head>"

"<body bgcolor=\"white\"><center><h1>Cache purge</h1>"

Fixed.

The shared HTML template now uses the neutral heading Cache purge instead of Successful purge, so queued/background responses no longer claim success in the title.

Added TEST 4a in t/background_queue.t to cover the queued HTML response explicitly, and verified it with a fresh make test-version VERSION=1.29.8 run.

The cache_purge_throttle_ms directive uses ngx_conf_set_msec_slot, which treats bare integers as seconds per the standard nginx time-value contract. This caused a value of "10" to be interpreted as 10 seconds instead of the intended 10 milliseconds. This patch updates the documentation, internal comments, and test suites to explicitly use the "ms" suffix (e.g., "10ms"). It also clarifies that while the directive requires a suffix for millisecond precision, the internal default constant is still handled as raw milliseconds. Resolved nginx-modules#61.

The cache_purge_response_type directive had a flawed duplicate detection guard that only checked for NGX_HTTP_LOC_CONF. This silently allowed duplicate directives in http and server contexts to overwrite previous values without throwing a configuration error. Additionally, a defunct guard attempting to check for NGX_HTTP_MODULE was comparing context flags against module-type magic bits, rendering it dead code. Remove the invalid NGX_HTTP_MODULE check and make duplicate detection unconditional. The directive is intentionally valid in http, server, and location contexts, and merge_loc_conf propagates the value downward appropriately. The README context documentation is updated to reflect this correctly. Furthermore, rename the internal configuration variable from resptype to response_type for clarity. Fix a typo in the response type macro definitions (NGX_REPONSE_TYPE_* to NGX_CACHE_PURGE_RESPONSE_TYPE_*) and properly namespace them. Clarify the documentation and internal comments for the cache_purge_throttle_ms directive. Bare integers are evaluated as milliseconds natively to match the directive's name, which differs from the standard nginx time parser, logging a startup warning. Add comprehensive test coverage in config.t to ensure proper context inheritance, duplicate config rejection, and explicit time suffix parsing.

denji · 2026-04-26T04:54:48Z

@yaoge123 Thanks for the massive work here! I'm currently reviewing it. One architectural thought: rather than adding 2000+ lines to ngx_cache_purge_module.c, what do you think about splitting the new refresh mechanics into its own file (e.g., ngx_http_cache_purge_refresh.c or ngx_http_cache_refresh.c) with a shared .h header?

This would isolate the complex subrequest and validation logic from the legacy purge code, reducing the risk of regressions. We'll still compile it down to a single ngx_http_cache_purge_module.so. Let me know if you're open to this refactoring before the final review.

yaoge123 · 2026-04-26T05:46:28Z

I checked the current refresh implementation structure carefully and agree that a split will make the review and long-term maintenance safer.

I’m going to take the narrowest refactor first: move the refresh execution engine (scan/collect/subrequest/timeout/finalize flow) into a dedicated source file, while keeping the existing module commands, config parsing, access routing, and shared purge/invalidate glue in the main module file.

That should reduce the size and complexity of ngx_cache_purge_module.c without turning this into a broad architectural rewrite in the same PR. After the split I’ll rerun the existing Test::Nginx coverage before updating the PR again.

yaoge123 · 2026-04-26T10:05:36Z

Implemented the narrow split.The refresh execution engine now lives in ngx_http_cache_refresh.c, while module commands, config parsing, access routing, and the shared purge/invalidate glue remain in ngx_cache_purge_module.c.Fresh verification after the split:- make test-version VERSION=1.29.8- make test-compatThe full compatibility matrix passed after the refactor (1.20.2, 1.22.1, 1.24.0, 1.26.3, 1.28.3, 1.30.0, 1.29.8).

yaoge123 added 30 commits April 24, 2026 18:01

Route refresh invalidation through unified purge core

65ad945

Remove refresh wildcard chunk-size dependency

e944b85

Stabilize refresh scanning across production batches

0812c11

Document HEAD method restoration and background subrequest scalability

6200b4f

Increase default refresh_timeout from 30s to 300s

8a5911e

Split refresh into a dedicated proxy_cache_refresh directive

b92afdd

Clarify refresh configuration conflicts and docs

ee7999d

Route purge and refresh by request method

ff58e5c

Handle missing upstream objects during refresh

d1b6f24

Document refresh upstream status policy and summary logging

3d8a477

Sync Chinese README with refresh status policy

9000991

Raise refresh summary log level and clarify cache-key docs

3318c6b

Track refresh status codes separately from errors

dc2023e

Fix active count leak, status-0 accounting, timeout double-count, and…

88e3204

… HEAD double-header

Allocate temp request on pool and extract common invalidate helper

2672d42

Use nginx time syntax for refresh timeout

1f4a840

Add byte totals to refresh accounting

debf498

Allow safe prefixed cache keys in refresh

afe26cb

Use target URIs for safe prefixed refresh keys

3cfea73

tighten refresh cache key tail checks

e72b164

reject partial refresh queries and fix key docs

4f5b87b

fix refresh fd buildup during large invalidations

e721928

Guard refresh capture tails against invalid request captures

2e75187

Relax scanned cache node replacement checks

00b48bc

Fix refresh transport failure accounting

81eaaa6

yaoge123 added 6 commits April 24, 2026 18:01

Ignore local worktree directories

af25548

Fix replay merge_loc_conf build breakage

bc21f48

Fix runtime refresh purge entry regressions

ec2e133

Ignore local test cache artifacts in Docker builds

f2f8cc7

docs: align README files with current release line

56e7be6

test: update compatibility matrix in t/Makefile

3ffb561

Copilot AI review requested due to automatic review settings April 24, 2026 10:47

Copilot started reviewing on behalf of yaoge123 April 24, 2026 10:47 View session

yaoge123 changed the title ~~tmp pr probe~~ Add proxy cache refresh support with safe key handling and refresh accounting Apr 24, 2026

Copilot AI reviewed Apr 24, 2026

View reviewed changes

denji force-pushed the replay branch from 3ffb561 to 015c783 Compare April 25, 2026 00:35

test: cover refresh review feedback

f20893a

yaoge123 force-pushed the replay branch from 015c783 to f20893a Compare April 25, 2026 09:38

denji and others added 3 commits April 25, 2026 21:37

Merge upstream master into replay

1a64d41

refactor: split refresh execution into dedicated file

2b27ca3

yaoge123 added 2 commits May 1, 2026 10:57

ci: suppress non-actionable cppcheck info

c2b41de

test: add partial refresh coverage for query-aware cache keys

177fe0a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add proxy cache refresh support with safe key handling and refresh accounting#60

Add proxy cache refresh support with safe key handling and refresh accounting#60
yaoge123 wants to merge 43 commits into
nginx-modules:masterfrom
yaoge123:replay

yaoge123 commented Apr 24, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 24, 2026

Uh oh!

Copilot AI Apr 24, 2026

Uh oh!

Copilot AI Apr 24, 2026

Uh oh!

Copilot AI Apr 24, 2026

Uh oh!

yaoge123 Apr 25, 2026

Uh oh!

Copilot AI Apr 24, 2026

Uh oh!

yaoge123 Apr 25, 2026

Uh oh!

denji commented Apr 26, 2026 •

edited

Loading

Uh oh!

yaoge123 commented Apr 26, 2026

Uh oh!

yaoge123 commented Apr 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

-     * Note: nginx's cache_convert_head (on by default) converts HEAD to GET
-     * in ngx_http_upstream_cache() before the cache_bypass check, so the
-     * upstream actually receives a GET request.  However, header_only = 1
-     * ensures nginx finalizes the subrequest after receiving response headers
-     * without reading the body (ngx_http_upstream.c:3280 fast-path when
-     * header_only && !cacheable && !store).  The real bandwidth savings come
-     * from conditional headers (If-None-Match / If-Modified-Since) which
-     * elicit 304 responses with no body.  Forcing true HEAD would require
-     * hooking into the upstream init phase or setting proxy_cache_convert_head
-     * off globally, both unacceptably invasive.
+     * nginx's cache_convert_head (on by default) rewrites HEAD to GET in
+     * ngx_http_upstream_cache() before the cache_bypass check.  For refresh
+     * subrequests, this module compensates via the $cache_purge_refresh_bypass
+     * variable handler, which clears r->upstream->method before
+     * proxy_create_request() runs so the upstream request is restored to
+     * HEAD.  header_only = 1 still matters because nginx finalizes the
+     * subrequest after receiving response headers without reading the body
+     * (ngx_http_upstream.c fast-path when header_only && !cacheable &&
+     * !store).  The real bandwidth savings come from conditional headers
+     * (If-None-Match / If-Modified-Since) which elicit 304 responses with
+     * no body.  Using the bypass-variable hook avoids more invasive changes
+     * such as upstream-phase hooks or disabling proxy_cache_convert_head
+     * globally.

		"<html><head><title>Successful purge</title></head>"
		"<body bgcolor=\"white\"><center><h1>Successful purge</h1>"

Conversation

yaoge123 commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Recommended Review Order

Main Behavior

Configuration Model

Safety Rules

Hardening Included In This Stack

Observability

Verification

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

yaoge123 Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

yaoge123 Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

denji commented Apr 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yaoge123 commented Apr 26, 2026

Uh oh!

yaoge123 commented Apr 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

yaoge123 commented Apr 24, 2026 •

edited

Loading

denji commented Apr 26, 2026 •

edited

Loading