Add proxy cache refresh support with safe key handling and refresh accounting#60
yaoge123 wants to merge 43 commits into
Conversation
- Compare invalidation metadata directly against the in-memory cache struct instead of re-reading the cache file, eliminating a TOCTOU race window
- Add inode (uniq) and body_start cross-checks for race detection
- Defer temp pool destruction via an enqueue/drain pattern to prevent use-after-free when subrequests reference pool-allocated data
- Register a pool cleanup handler to guarantee resource release on request termination (timeout, client disconnect)
- Retire chunk pools instead of destroying them immediately so in-flight subrequests retain valid file references across chunk boundaries
- Add an active-count underflow guard in the refresh_done normal path
- Add an args.len underflow guard for malformed cache keys
- Use case-sensitive ngx_strncmp for cache key matching (keys are exact byte sequences, not case-insensitive identifiers)
- Add bounds checks for config parsing (from keyword position)
- Fix a memory leak: free key_buf on short read in collect_path
- Track the dispatched subrequest count separately from the queue cursor for accurate timeout skip accounting
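The args.len underflow guard and case-sensitive comparison mentioned above can be sketched in plain C (hypothetical function names; `memcmp` stands in for `ngx_strncmp`):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of the malformed-key guards: reject keys shorter than the
 * prefix before any length subtraction (size_t would wrap around),
 * and compare key bytes case-sensitively, since cache keys are exact
 * byte sequences rather than case-insensitive identifiers. */
static int
key_matches(const char *key, size_t key_len,
            const char *prefix, size_t prefix_len)
{
    if (key_len < prefix_len) {
        /* without this guard, key_len - prefix_len would underflow */
        return 0;
    }

    /* case-sensitive exact-byte comparison */
    return memcmp(key, prefix, prefix_len) == 0;
}
```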
Root cause: ngx_http_file_cache_open registers a pool cleanup (ngx_http_file_cache_cleanup) that holds a pointer to the ngx_http_cache_t struct. When this struct was a stack variable in invalidate_item, the pointer became dangling after the function returned. Later, drain_temp_pools destroyed the pool, triggering the cleanup, which accessed the stale stack address -> SIGSEGV.

Fix:
- Allocate ngx_http_cache_t on the pool instead of the stack so it survives until pool destruction
- Call ngx_http_file_cache_free after invalidate_opened_cache to properly release the node reference (sets c->updated=1, making the pool cleanup a no-op)
- Remove count-- from invalidate_opened_cache to avoid a premature node free when concurrent refreshes race on the same cache entry

Verified: 3-5 concurrent refresh requests to overlapping cache zones no longer crash. Full regression suite and 100k perf pass.
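The root-cause pattern above can be illustrated in plain C (not the nginx API; all names here are hypothetical): a deferred cleanup holds a pointer to its context, so the context must outlive the function that registered it.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int updated;
} cache_ctx;

typedef struct {
    cache_ctx *data;            /* pointer held until cleanup time */
} cleanup_reg;

static int cleanup_ran_as_noop;

static void
run_cleanup(cleanup_reg *cln)
{
    /* mirrors the fix: c->updated == 1 makes the cleanup a no-op */
    if (cln->data->updated) {
        cleanup_ran_as_noop = 1;
    }
    free(cln->data);
}

static void
invalidate_item(cleanup_reg *cln)
{
    /* allocate the context so it outlives this function; a stack
     * variable here would leave cln->data dangling after return */
    cache_ctx *c = malloc(sizeof(cache_ctx));
    c->updated = 1;             /* node reference released properly */
    cln->data = c;
}
```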
- Add finalized guard in refresh_done to prevent use-after-free on timeout - Fix item_matches_cache to allow invalidate when all metadata fields are zero - Set pd->handled in fire_subrequest error paths to prevent double error counting - Set file.name in read_item and collect_path for proper error logging - Add ngx_memzero for batch_ctx in purge_all and purge_partial - Add detailed comment on open_temp_cache shallow copy risks - Add comprehensive refresh documentation to README.md
Normal (WAITED) subrequests accumulate in r->postponed and only finalize when they become the active subrequest via the postpone filter. Since our write_event_handler never outputs data, the postpone filter never runs, so r->main->count grows without bound and overflows at 65535 for file sets larger than ~64k entries.

Switch to NGX_HTTP_SUBREQUEST_BACKGROUND, which bypasses the postpone list entirely. Each background subrequest finalizes independently via ngx_http_close_request, properly decrementing main->count. Since background subrequests do not post the parent request upon finalization, manually call ngx_http_post_request() in refresh_done to re-trigger the dispatch loop on the next event cycle. Remove the previous count-limit throttle, which degraded throughput from ~1900 fps to ~330 fps.

Verified: 100k files refresh completes in 8.5s (~11700 fps) with total=100000 kept=50000 purged=50000 errors=0. Full regression suite (10/10) passes.
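The overflow mechanism is easy to reproduce in isolation: nginx keeps the request reference count in a 16-bit bitfield, so unmatched increments wrap after 65535. A minimal stand-alone illustration (not nginx code):

```c
#include <assert.h>

/* A 16-bit reference count, as in ngx_http_request_t's count field. */
struct request_like {
    unsigned count:16;
};

static unsigned
increment_n_times(struct request_like *r, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        r->count++;             /* no matching decrement, as with
                                   never-finalized WAITED subrequests */
    }
    return r->count;
}
```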
nginx's cache_convert_head converts HEAD to GET by setting u->method in ngx_http_upstream_cache() before the cache_bypass predicate check. Since proxy_create_request() prioritizes u->method over r->method_name, our HEAD setting was being overridden to GET. Fix by clearing u->method in the $cache_purge_refresh_bypass variable handler, which is evaluated by ngx_http_test_predicates() during the cache_bypass check — after u->method is set but before create_request() reads it. This makes proxy_create_request() fall through to r->method_name (HEAD). Only affects refresh subrequests (checked via parent ctx and r->method == NGX_HTTP_HEAD). Normal requests are unaffected. Verified: upstream access log confirms HEAD requests with 0 body bytes. 304 responses have no body; 200 responses also transfer 0 bytes. Full regression suite (10/10) passes.
… HEAD double-header
Pull request overview
This PR updates the module and documentation to support a new refresh capability (conditional cache validation) alongside existing purge behavior, and refreshes the nginx compatibility/testing matrix.
Changes:
- Add `proxy_cache_refresh` support with refresh-time controls (`cache_purge_refresh_concurrency`, `cache_purge_refresh_timeout`) and method-driven runtime routing between purge/refresh.
- Rework cache invalidation helpers to safely invalidate scanned entries and add refresh subrequest orchestration + summary responses.
- Update docs/testing scaffolding: new refresh documentation (EN + zh-CN), expanded nginx version matrix, and add ignore rules for build/test artifacts.
Reviewed changes
Copilot reviewed 4 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| ngx_cache_purge_module.c | Implements refresh feature, new directives, runtime dispatch, and refactors invalidation logic. |
| t/Makefile | Updates default nginx version and expands the test compatibility version list. |
| README.md | Documents refresh directives/rules and updates compatibility/testing guidance. |
| README.zh-CN.md | Adds detailed deployment documentation for purge/refresh behavior in Chinese. |
| .gitignore | Ignores .worktrees/. |
| .dockerignore | Excludes test/cache directories from Docker build context. |
```c
    tree.log = ngx_cycle->log;

    ngx_walk_tree(&tree, &cache->path->name);
    return ctx.files_deleted;
```
In ngx_http_cache_purge_partial(), ngx_http_cache_purge_drain_temp_pools(&ctx.temp_pools) is unreachable because the function returns immediately after ngx_walk_tree(). This leaves dead code and (if temp_pools is ever populated) would skip cleanup. Move the drain before the return and keep a single return statement.
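A minimal sketch of the suggested fix (plain C with hypothetical names, not the module's actual types): capture the result, perform the cleanup, then return once.

```c
#include <assert.h>

static int drained;

static void
drain_temp_pools(void)
{
    drained = 1;                /* stands in for releasing temp pools */
}

static int
purge_partial_sketch(int files_deleted)
{
    int n = files_deleted;      /* capture the result first */

    drain_temp_pools();         /* cleanup is now reachable */

    return n;                   /* single return statement */
}
```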
```c
 * Note: nginx's cache_convert_head (on by default) converts HEAD to GET
 * in ngx_http_upstream_cache() before the cache_bypass check, so the
 * upstream actually receives a GET request. However, header_only = 1
 * ensures nginx finalizes the subrequest after receiving response headers
 * without reading the body (ngx_http_upstream.c:3280 fast-path when
 * header_only && !cacheable && !store). The real bandwidth savings come
 * from conditional headers (If-None-Match / If-Modified-Since) which
 * elicit 304 responses with no body. Forcing true HEAD would require
 * hooking into the upstream init phase or setting proxy_cache_convert_head
 * off globally, both unacceptably invasive.
```
The comment above the subrequest method setup says the upstream "actually receives a GET" due to cache_convert_head, but the $cache_purge_refresh_bypass variable handler explicitly clears r->upstream->method to restore HEAD before proxy_create_request() runs. These two comments contradict; update this block to reflect the current behavior (and/or reference the bypass variable hack) so future readers don’t get misled.
```diff
- * Note: nginx's cache_convert_head (on by default) converts HEAD to GET
- * in ngx_http_upstream_cache() before the cache_bypass check, so the
- * upstream actually receives a GET request. However, header_only = 1
- * ensures nginx finalizes the subrequest after receiving response headers
- * without reading the body (ngx_http_upstream.c:3280 fast-path when
- * header_only && !cacheable && !store). The real bandwidth savings come
- * from conditional headers (If-None-Match / If-Modified-Since) which
- * elicit 304 responses with no body. Forcing true HEAD would require
- * hooking into the upstream init phase or setting proxy_cache_convert_head
- * off globally, both unacceptably invasive.
+ * nginx's cache_convert_head (on by default) rewrites HEAD to GET in
+ * ngx_http_upstream_cache() before the cache_bypass check. For refresh
+ * subrequests, this module compensates via the $cache_purge_refresh_bypass
+ * variable handler, which clears r->upstream->method before
+ * proxy_create_request() runs so the upstream request is restored to
+ * HEAD. header_only = 1 still matters because nginx finalizes the
+ * subrequest after receiving response headers without reading the body
+ * (ngx_http_upstream.c fast-path when header_only && !cacheable &&
+ * !store). The real bandwidth savings come from conditional headers
+ * (If-None-Match / If-Modified-Since) which elicit 304 responses with
+ * no body. Using the bypass-variable hook avoids more invasive changes
+ * such as upstream-phase hooks or disabling proxy_cache_convert_head
+ * globally.
```
```
Syntax:  cache_purge_refresh_concurrency <number>
Default: 1
```
The README states cache_purge_refresh_concurrency has a default of 1, but the module code merges refresh_concurrency with a default of 32 (ngx_cache_purge_module.c:4315-4316). Please align documentation and implementation (either update the documented default or change the code default).
```diff
-Default: 1
+Default: 32
```
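For reference, a location using the refresh directives discussed in this PR might look like the following. This is an illustrative sketch only: the values are arbitrary, and the argument form of `proxy_cache_refresh` is assumed to mirror `proxy_cache_purge` (zone plus key), which the PR does not spell out here.

```nginx
location ~ /refresh(/.*) {
    allow 127.0.0.1;
    deny  all;

    # assumed zone/key form, mirroring proxy_cache_purge
    proxy_cache_refresh tmpcache $1;

    # upstream validation subrequests in flight at once
    cache_purge_refresh_concurrency 32;

    # overall refresh deadline
    cache_purge_refresh_timeout 60s;
}
```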
```c
static ngx_int_t
ngx_http_cache_purge_refresh(ngx_http_request_t *r,
    ngx_http_file_cache_t *cache)
{
    ngx_http_cache_purge_refresh_ctx_t  *ctx;
    ngx_http_cache_purge_loc_conf_t     *cplcf;
    ngx_pool_cleanup_t                  *cln;
    ngx_str_t                           *keys;
    ngx_str_t                            targets[4];
    ngx_str_t                            key;
    ngx_uint_t                           ntargets;

    ngx_log_debug(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                  "cache purge refresh in %s", cache->path->name.data);

    /* Allocate refresh context */
    ctx = ngx_pcalloc(r->pool, sizeof(ngx_http_cache_purge_refresh_ctx_t));
    if (ctx == NULL) {
        return NGX_HTTP_INTERNAL_SERVER_ERROR;
    }
```

```c
#endif

#if (NGX_HTTP_UWSGI)
    ngx_http_cache_purge_merge_conf(&conf->uwsgi, &prev->uwsgi);
```

```c
    ngx_queue_init(&ctx->temp_pools);
    ngx_queue_init(&ctx->retired_chunk_pools);
```
This PR introduces substantial new refresh behavior (proxy_cache_refresh directive, conditional subrequests, cache scanning/invalidation), but there are no Test::Nginx cases covering REFRESH requests yet (no matches for "REFRESH" under t/). Add integration tests for core scenarios (304 keeps, 200/404/410 invalidates, capability mismatch 400, partial vs refresh_all) to prevent regressions across nginx versions.
Fixed in the current replay branch.
Added Test::Nginx coverage in t/basic.t for the core refresh paths this PR introduces:
- exact refresh keeps cached content on `304`
- exact refresh invalidates on `200`
- exact refresh invalidates on `404`
- exact refresh invalidates on `410`
- partial refresh purges only changed matches
- `refresh_all` refreshes changed entries across the cache
- runtime capability mismatch returns `400` for `refresh_all` on a purge-only location
Fresh verification: make test-version VERSION=1.29.8 now passes with the new cases included.
```c
    "<html><head><title>Successful purge</title></head>"
    "<body bgcolor=\"white\"><center><h1>Successful purge</h1>"
```
The HTML response template hard-codes "Successful purge" in the `<title>`/`<h1>`, but this handler can also return statuses like "queued" (background purge) and potentially other non-success states. This makes the response body misleading; consider using a neutral heading (e.g., "Cache purge") or deriving the heading from the actual status string.
```diff
-"<html><head><title>Successful purge</title></head>"
-"<body bgcolor=\"white\"><center><h1>Successful purge</h1>"
+"<html><head><title>Cache purge</title></head>"
+"<body bgcolor=\"white\"><center><h1>Cache purge</h1>"
```
Fixed.
The shared HTML template now uses the neutral heading Cache purge instead of Successful purge, so queued/background responses no longer claim success in the title.
Added TEST 4a in t/background_queue.t to cover the queued HTML response explicitly, and verified it with a fresh make test-version VERSION=1.29.8 run.
The cache_purge_throttle_ms directive uses ngx_conf_set_msec_slot, which treats bare integers as seconds per the standard nginx time-value contract. This caused a value of "10" to be interpreted as 10 seconds instead of the intended 10 milliseconds. This patch updates the documentation, internal comments, and test suites to explicitly use the "ms" suffix (e.g., "10ms"). It also clarifies that while the directive requires a suffix for millisecond precision, the internal default constant is still handled as raw milliseconds. Resolved nginx-modules#61.
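Whatever the parser does with a bare integer, the suffixed form documented by this patch is unambiguous. An illustrative snippet (directive name per this PR's docs):

```nginx
# explicit "ms" suffix makes the unit unambiguous: 10 milliseconds
cache_purge_throttle_ms 10ms;
```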
The cache_purge_response_type directive had a flawed duplicate detection guard that only checked for NGX_HTTP_LOC_CONF. This silently allowed duplicate directives in http and server contexts to overwrite previous values without throwing a configuration error. Additionally, a defunct guard attempting to check for NGX_HTTP_MODULE was comparing context flags against module-type magic bits, rendering it dead code.

Remove the invalid NGX_HTTP_MODULE check and make duplicate detection unconditional. The directive is intentionally valid in http, server, and location contexts, and merge_loc_conf propagates the value downward appropriately. The README context documentation is updated to reflect this correctly.

Furthermore, rename the internal configuration variable from resptype to response_type for clarity. Fix a typo in the response type macro definitions (NGX_REPONSE_TYPE_* to NGX_CACHE_PURGE_RESPONSE_TYPE_*) and properly namespace them.

Clarify the documentation and internal comments for the cache_purge_throttle_ms directive. Bare integers are evaluated as milliseconds natively to match the directive's name, which differs from the standard nginx time parser, logging a startup warning.

Add comprehensive test coverage in config.t to ensure proper context inheritance, duplicate config rejection, and explicit time suffix parsing.
|
@yaoge123 Thanks for the massive work here! I'm currently reviewing it. One architectural thought: rather than adding 2000+ lines to `ngx_cache_purge_module.c`, consider moving the refresh logic into a separate source file. This would isolate the complex subrequest and validation logic from the legacy purge code, reducing the risk of regressions. We'll still compile it down to a single module.
|
I checked the current refresh implementation structure carefully and agree that a split will make the review and long-term maintenance safer. I'm going to take the narrowest refactor first: move the refresh execution engine (scan/collect/subrequest/timeout/finalize flow) into a dedicated source file, while keeping the existing module commands, config parsing, access routing, and shared purge/invalidate glue in the main module file. That should reduce the size and complexity of `ngx_cache_purge_module.c`.
|
Implemented the narrow split. The refresh execution engine now lives in a dedicated source file.
Summary
- `refresh_all`

Recommended Review Order
Main Behavior
This stack adds refresh as conditional cache validation for existing cached objects. For each cached entry in scope, the module probes upstream and keeps the cache entry when upstream reports it is unchanged, but purges the cache entry when upstream reports it changed or disappeared.
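The keep/purge decision described above can be sketched as a small pure function (hypothetical names; the module's actual dispatch differs, and treating other statuses as errors is an assumption based on the `kept / purged / errors` accounting below):

```c
#include <assert.h>

enum refresh_action { REFRESH_KEEP, REFRESH_PURGE, REFRESH_ERROR };

/* Maps an upstream probe status to the refresh policy in this PR:
 * 304 keeps the cached entry; 200, 404 and 410 purge it. */
static enum refresh_action
refresh_action_for_status(int status)
{
    switch (status) {
    case 304:                   /* upstream reports unchanged */
        return REFRESH_KEEP;
    case 200:                   /* upstream content changed */
    case 404:                   /* upstream object is gone */
    case 410:
        return REFRESH_PURGE;
    default:                    /* assumed: counted as an error */
        return REFRESH_ERROR;
    }
}
```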
Current refresh status policy is:

- `304`: keep the cached entry
- `200`: purge the cached entry because upstream content changed
- `404`/`410`: purge the cached entry because the upstream object is gone

Configuration Model
This series ends with two supported operator-facing models:

- `proxy_cache_refresh`
- `refresh_all` remains explicitly configured through `proxy_cache_refresh` and is not enabled on a plain purge location by accident.

Safety Rules
Refresh only runs for proxy cache paths and only when the evaluated cache key keeps the request-target tail unambiguous. The series includes:

- `proxy_cache`

Hardening Included In This Stack
The series also fixes and hardens a number of failure cases discovered during development and regression work:
Observability
Refresh responses and logs now expose richer accounting:
- `X-Cache-Action` response header for purge vs refresh actions
- `kept / purged / errors`
- `status_counts`
- `status_bytes`
- `total_bytes / kept_bytes / purged_bytes`
- `error_log ... notice`
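A minimal sketch of how such per-subrequest accounting might be accumulated (hypothetical struct and field names; the module's real context and byte semantics may differ):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accounting context mirroring the counters listed above. */
typedef struct {
    size_t kept, purged, errors;
    size_t kept_bytes, purged_bytes, total_bytes;
} refresh_stats;

static void
account_result(refresh_stats *s, int status, size_t entry_bytes)
{
    s->total_bytes += entry_bytes;

    if (status == 304) {                /* keep */
        s->kept++;
        s->kept_bytes += entry_bytes;
    } else if (status == 200 || status == 404 || status == 410) {
        s->purged++;                    /* purge */
        s->purged_bytes += entry_bytes;
    } else {
        s->errors++;                    /* anything else */
    }
}
```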
Validation completed on the fork before proposing upstream review, including:

- `make test`
- `scripts/run_regression.sh`
- nginx versions `1.20.2`, `1.22.1`, `1.24.0`, `1.26.3`, `1.28.3`, `1.29.8`