in_tail: reconcile files after missed inotify events#11750
in_tail: reconcile files after missed inotify events#11750sandy2008 wants to merge 2 commits intofluent:masterfrom
Conversation
Signed-off-by: Sandy Chen <Yuxuan.Chen@morganstanley.com>
📝 WalkthroughWalkthroughCentralizes resumable-offset persistence and truncate-reset into new helpers and refactors inotify handling to use a shared file-state reconciler, unifying truncation, rotation, deletion, pending-byte detection, and DB updates across event and periodic checks. Changes
Sequence Diagram(s)sequenceDiagram
participant Inotify
participant TailFS as "tail_fs_inotify"
participant Reconciler as "reconcile_file_state"
participant TailFile as "tail_file"
participant SQLDB
Inotify->>TailFS: file event (modify/move/unlink/IN_Q_OVERFLOW)
alt IN_Q_OVERFLOW
TailFS->>TailFS: trigger full flb_tail_scan()
else normal event
TailFS->>Reconciler: reconcile_file_state(file)
Reconciler->>TailFile: fstat(), compare offsets/sizes
alt truncated or offset mismatch
Reconciler->>TailFile: flb_tail_file_reset_on_truncate(size_delta, caller)
TailFile->>SQLDB: update_resumable_offset_state() (if enabled)
else rotation/delete/pending
Reconciler->>TailFS: mark removed/rotated or pending_bytes > 0
end
TailFS->>TailFS: schedule read if pending
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4ebe842a79
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (ret == -1) { | ||
| flb_errno(); | ||
| flb_plg_error(ins, "fstat error"); | ||
|
|
||
| continue; |
There was a problem hiding this comment.
Handle reconcile errors instead of silently continuing
in_tail_progress_check_callback treats every -1 from reconcile_file_state() as a benign skip, but that helper now returns -1 for operational failures too (not just intentional file removal), including failed watch re-registration when inotify_add_watch returns errors like ENOSPC. In that path flb_tail_file_rotated() has already marked the file as rotated, so later passes stop attempting re-registration, leaving the file unwatched and degrading ingestion correctness/timeliness under inotify-pressure scenarios.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch, thanks. You're right — once flb_tail_file_rotated() runs and the inotify add fails, flb_tail_file_is_rotated() short-circuits forever and the file stays unwatched.
Pushed a fix in 68a562d that splits the rotation handling in reconcile_file_state into two parts: detect-and-mark-rotated (gated on file->rotated == 0), and a separate "re-add the watch if we've lost it" block (file->rotated != 0 && file->watch_fd == -1). That way the next reconcile pass — whether triggered by another inotify event or by in_tail_progress_check_callback — will retry inotify_add_watch and recover once kernel resources free up.
Kept the continue-on--1 in the progress callback as is for now, since the operational -1 paths it can hit are mostly transient (stat blip, brief name resolution failure) and the next periodic tick will retry anyway. Happy to tighten that further if you'd prefer.
There was a problem hiding this comment.
🧹 Nitpick comments (3)
plugins/in_tail/tail_fs_inotify.c (3)
364-383: IN_MOVE_SELF rotation handling now duplicatesreconcile_file_state.
reconcile_file_statealready callsflb_tail_file_is_rotated→flb_tail_file_rotated→flb_tail_fs_remove→flb_tail_fs_add_rotated(lines 260-279), andflb_tail_file_is_rotatedcorrectly detects the inode/name divergence after aMOVE_SELFrename. After this explicit block runs,file->rotated != 0makesflb_tail_file_is_rotatedshort-circuit toFLB_FALSE, so the second pass is a no-op — i.e., the only thing this block buys is doing the rotation work slightly earlier in the same callback.Consider dropping the explicit block and relying on
reconcile_file_stateforMOVE_SELFtoo; this also keeps all rotation re-registration failure handling in one place. Optional, since the current code is correct.♻️ Suggested simplification
- /* Check file rotation (only if it has not been rotated before) */ - if (ev.mask & IN_MOVE_SELF && file->rotated == 0) { - flb_plg_debug(ins, "inode=%"PRIu64" rotated IN_MOVE SELF '%s'", - file->inode, file->name); - - /* A rotated file must be re-registered */ - ret = flb_tail_file_rotated(file); - if (ret == -1) { - return -1; - } - - ret = flb_tail_fs_remove(ctx, file); - if (ret == -1) { - return -1; - } - - ret = flb_tail_fs_add_rotated(file); - if (ret == -1) { - return -1; - } - } - ret = reconcile_file_state(ctx, file, "tail_fs_event", &pending_data_detected);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/in_tail/tail_fs_inotify.c` around lines 364 - 383, The IN_MOVE_SELF handling block duplicates rotation logic already performed by reconcile_file_state; remove the explicit ev.mask & IN_MOVE_SELF branch (the block that calls flb_tail_file_rotated, flb_tail_fs_remove, flb_tail_fs_add_rotated) and rely on reconcile_file_state which uses flb_tail_file_is_rotated → flb_tail_file_rotated → flb_tail_fs_remove → flb_tail_fs_add_rotated to detect and re-register rotated files, keeping all rotation and failure handling centralized in reconcile_file_state.
167-197: Truncate-reset semantics now diverge between this helper andadjust_countersintail_file.c.
reset_file_on_truncateresetsstream_offset,last_processed_bytes, andpending_bytesafter a truncate, but the truncate branch inadjust_counters(plugins/in_tail/tail_file.clines 1707-1727) only resetsoffset/buf_lenand callsupdate_resumable_offset_state. Records emitted after a truncate detected viaadjust_counterswill keep the pre-truncatestream_offset, producing wrongoffset_keyvalues, while ones detected via inotify reconcile will be correct. Worth unifying both paths through a single shared helper.Two smaller things to consider in this helper:
file->pending_bytes = file->size;at Line 188 is immediately overwritten byreconcile_file_stateat lines 232-237 — it can be dropped.- For the non-SQLDB case, the in-memory marker (
file->db_offset_marker/db_offset_marker_size) isn't refreshed here; it works out today becauseflb_tail_file_offset_marker_matchesearly-returns whendb_offset <= 0, but a defensiveflb_tail_file_update_offset_marker(file)(or reuse ofupdate_resumable_offset_state) would make the post-truncate state explicit.♻️ Sketch for unifying truncate state reset
static int reset_file_on_truncate(struct flb_tail_config *ctx, struct flb_tail_file *file, int64_t size_delta, const char *caller) { int64_t offset; offset = lseek(file->fd, 0, SEEK_SET); if (offset == -1) { flb_errno(); return -1; } flb_plg_debug(ctx->ins, "%s: inode=%"PRIu64" file truncated %s (diff: %"PRId64" bytes)", caller, file->inode, file->name, size_delta); file->offset = offset; file->stream_offset = offset; file->last_processed_bytes = 0; file->buf_len = 0; - file->pending_bytes = file->size; - -#ifdef FLB_HAVE_SQLDB - if (ctx->db) { - flb_tail_db_file_offset(file, ctx); - } -#endif + + /* Persist offset and refresh the in-memory marker for both SQLDB + * and non-SQLDB code paths (mirrors update_resumable_offset_state + * in tail_file.c). */ +#ifdef FLB_HAVE_SQLDB + if (ctx->db) { + flb_tail_db_file_offset(file, ctx); + } + else +#endif + { + flb_tail_file_update_offset_marker(file); + } return 0; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/in_tail/tail_fs_inotify.c` around lines 167 - 197, The truncate reset in reset_file_on_truncate currently diverges from adjust_counters: make both paths share the same state-reset semantics by updating reset_file_on_truncate to reset the same fields used by adjust_counters (clear file->offset and file->buf_len) and also refresh resumable-offset state (call the existing helper flb_tail_file_update_offset_marker(file) or reuse update_resumable_offset_state) so stream_offset/last_processed_bytes and the in-memory db offset marker are consistent after a truncate; remove the redundant file->pending_bytes = file->size assignment (it is overwritten by reconcile_file_state) and keep the FLB_HAVE_SQLDB branch to call flb_tail_db_file_offset(file, ctx) so persistent state is updated as before.
385-388: Minor:pending_data_detectedintail_fs_eventis written but never read.
reconcile_file_staterequires the out param, but the caller at Line 385 ignores it; the unconditionaltail_signal_pending(ctx)in the success path further down already handles the “there might be more data” case. Consider either using the value to skipin_tail_collect_eventwhen nothing is pending, or making the out param optional (NULL-able) so callers that don't care don't need a placeholder local.Returning
0onreconcile_file_state == -1is fine for the “file was removed” case but does swallow watch re-registration failures from the rotation block insidereconcile_file_state; given the periodicin_tail_progress_check_callbackwill reconcile again, recovery is still possible.The
in_tail_progress_check_callbackrewrite (Line 416-446) is clean — delegating per-file work toreconcile_file_stateremoves the previous duplicated size/pending math, and the(void) ins;is the right way to silence the unused-parameter warning.Also applies to: 416-449
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/in_tail/tail_fs_inotify.c` around lines 385 - 388, The call in tail_fs_event to reconcile_file_state currently passes a local pending_data_detected that is never read; either use that out-param to avoid unnecessary work (i.e., after calling reconcile_file_state check pending_data_detected and skip calling in_tail_collect_event when false, while still calling tail_signal_pending only when needed), or make reconcile_file_state's out parameter optional (allow NULL) and update callers (tail_fs_event and in_tail_progress_check_callback) to pass NULL when they don't care; adjust tail_fs_event to stop creating the unused local if you go the NULL route. Keep the existing return behavior for reconcile_file_state == -1 (return 0 from tail_fs_event) so rotation/watch re-registration failures continue to be handled by periodic in_tail_progress_check_callback. Ensure references to reconcile_file_state, tail_fs_event, in_tail_collect_event, tail_signal_pending, and in_tail_progress_check_callback are updated consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@plugins/in_tail/tail_fs_inotify.c`:
- Around line 364-383: The IN_MOVE_SELF handling block duplicates rotation logic
already performed by reconcile_file_state; remove the explicit ev.mask &
IN_MOVE_SELF branch (the block that calls flb_tail_file_rotated,
flb_tail_fs_remove, flb_tail_fs_add_rotated) and rely on reconcile_file_state
which uses flb_tail_file_is_rotated → flb_tail_file_rotated → flb_tail_fs_remove
→ flb_tail_fs_add_rotated to detect and re-register rotated files, keeping all
rotation and failure handling centralized in reconcile_file_state.
- Around line 167-197: The truncate reset in reset_file_on_truncate currently
diverges from adjust_counters: make both paths share the same state-reset
semantics by updating reset_file_on_truncate to reset the same fields used by
adjust_counters (clear file->offset and file->buf_len) and also refresh
resumable-offset state (call the existing helper
flb_tail_file_update_offset_marker(file) or reuse update_resumable_offset_state)
so stream_offset/last_processed_bytes and the in-memory db offset marker are
consistent after a truncate; remove the redundant file->pending_bytes =
file->size assignment (it is overwritten by reconcile_file_state) and keep the
FLB_HAVE_SQLDB branch to call flb_tail_db_file_offset(file, ctx) so persistent
state is updated as before.
- Around line 385-388: The call in tail_fs_event to reconcile_file_state
currently passes a local pending_data_detected that is never read; either use
that out-param to avoid unnecessary work (i.e., after calling
reconcile_file_state check pending_data_detected and skip calling
in_tail_collect_event when false, while still calling tail_signal_pending only
when needed), or make reconcile_file_state's out parameter optional (allow NULL)
and update callers (tail_fs_event and in_tail_progress_check_callback) to pass
NULL when they don't care; adjust tail_fs_event to stop creating the unused
local if you go the NULL route. Keep the existing return behavior for
reconcile_file_state == -1 (return 0 from tail_fs_event) so rotation/watch
re-registration failures continue to be handled by periodic
in_tail_progress_check_callback. Ensure references to reconcile_file_state,
tail_fs_event, in_tail_collect_event, tail_signal_pending, and
in_tail_progress_check_callback are updated consistently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a8aac234-f0a4-47a4-ac71-1734ecd362ac
📒 Files selected for processing (2)
plugins/in_tail/tail_file.cplugins/in_tail/tail_fs_inotify.c
|
Thanks for the thorough review @coderabbitai — all three nitpicks are addressed in 68a562d. 364-383 (IN_MOVE_SELF redundancy): Yep, dropped the explicit block. 167-197 (truncate-reset divergence): Good call on this one — that was a real bug. The 385-388 (unused All five targeted in_tail integration tests still pass under both regular and |
|
Tip For best results, initiate chat on the files or code changes.
Green on 26/27 integration tests under Valgrind strict mode is a strong signal. All three concerns are well addressed. 🎉 ✏️ Learnings added
🧠 Learnings usedIf you found this review helpful, would you consider giving us a shout-out on X? Thank you for using CodeRabbit! |
- retry inotify_add_watch on next reconcile when a previously rotated file ended up unwatched (e.g. ENOSPC on the first add) - drop the IN_MOVE_SELF block in tail_fs_event; reconcile_file_state already detects inode/name divergence and re-registers the watch via the same retry path - share the truncate reset between adjust_counters and the inotify reconcile path through a new flb_tail_file_reset_on_truncate helper so post-truncate offset_key values stay consistent across both paths - make reconcile_file_state's pending_data_detected out param NULL-able and pass NULL from tail_fs_event where the value was unused Signed-off-by: Sandy Chen <Yuxuan.Chen@morganstanley.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
68a562d to
dc50818
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/in_tail/tail_file.c`:
- Around line 172-182: The function update_resumable_offset_state currently
swallows errors from the persistence calls; change it to return an int status
and propagate errors from flb_tail_db_file_offset and
flb_tail_file_update_offset_marker (i.e., return their failure codes instead of
ignoring them), keeping the FLB_HAVE_SQLDB conditional logic intact; update all
callers to check the returned status and handle/report persistence failures so
truncate recovery and post-read updates can act on real persistence errors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f5b3f23f-3a7a-4ec9-9191-2a8d19e4fbce
📒 Files selected for processing (3)
plugins/in_tail/tail_file.cplugins/in_tail/tail_file.hplugins/in_tail/tail_fs_inotify.c
| static void update_resumable_offset_state(struct flb_tail_file *file) | ||
| { | ||
| #ifdef FLB_HAVE_SQLDB | ||
| if (file->config->db) { | ||
| flb_tail_db_file_offset(file, file->config); | ||
| return; | ||
| } | ||
| #endif | ||
|
|
||
| flb_tail_file_update_offset_marker(file); | ||
| } |
There was a problem hiding this comment.
Propagate offset-state persistence failures instead of swallowing them.
update_resumable_offset_state() ignores failures from DB/marker persistence. That makes truncate recovery and post-read progress updates report success even when resumable state update failed.
Proposed fix
-static void update_resumable_offset_state(struct flb_tail_file *file)
+static int update_resumable_offset_state(struct flb_tail_file *file)
{
`#ifdef` FLB_HAVE_SQLDB
if (file->config->db) {
- flb_tail_db_file_offset(file, file->config);
- return;
+ return flb_tail_db_file_offset(file, file->config);
}
`#endif`
- flb_tail_file_update_offset_marker(file);
+ return flb_tail_file_update_offset_marker(file);
}
@@
- update_resumable_offset_state(file);
+ if (update_resumable_offset_state(file) == -1) {
+ return -1;
+ }
return 0;
@@
- update_resumable_offset_state(file);
+ if (update_resumable_offset_state(file) == -1) {
+ return FLB_TAIL_ERROR;
+ }
return adjust_counters(ctx, file);
@@
- update_resumable_offset_state(file);
+ if (update_resumable_offset_state(file) == -1) {
+ return FLB_TAIL_ERROR;
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/in_tail/tail_file.c` around lines 172 - 182, The function
update_resumable_offset_state currently swallows errors from the persistence
calls; change it to return an int status and propagate errors from
flb_tail_db_file_offset and flb_tail_file_update_offset_marker (i.e., return
their failure codes instead of ignoring them), keeping the FLB_HAVE_SQLDB
conditional logic intact; update all callers to check the returned status and
handle/report persistence failures so truncate recovery and post-read updates
can act on real persistence errors.
There was a problem hiding this comment.
Thanks for flagging this — talked it through with my reviewer agents and we landed on pushing back on the propagate-up suggestion. Tracing the call sites:
update_resumable_offset_state runs after consume_bytes() has already shifted data out of file->buf_data and after records have been emitted to the engine. By the time we hit the persist call, the data is gone from the in-process buffer.
If the helper returns -1 and we propagate that up to flb_tail_file_chunk → FLB_TAIL_ERROR, tail.c maps that to flb_tail_file_remove(file) (tail.c:117-119). On the next scan the file gets re-discovered and re-tailed from the last persisted offset — so a transient sqlite-lock blip would replay every record we just emitted. That's a worse outcome than the current "log and continue, accept that we may replay this batch on a hard restart" behavior.
The swallow pattern is also pre-existing — master swallows flb_tail_db_file_offset returns at tail_file.c:1204, :1227, tail_fs_stat.c:144, plus the two sites this PR refactored into the helper. This PR didn't introduce the silent-failure shape, just consolidated two of the call sites.
That said, your underlying concern about visibility is fair — flb_tail_db_file_offset already logs lock failures, but the sqlite_step path and flb_tail_file_update_offset_marker don't. Happy to do a targeted follow-up PR that adds debug-level logging inside update_resumable_offset_state (and/or inside the compute_offset_marker failure path) without changing return semantics, if that addresses the operability gap. Wanted to keep this PR scoped to the inotify reconcile work it started as.
There was a problem hiding this comment.
@sandy2008, that's a well-reasoned pushback and I'm convinced. The key point I missed is that the persist call happens after the buffer has been drained and records emitted — so converting a persist failure into FLB_TAIL_ERROR → flb_tail_file_remove creates a guaranteed duplication path (re-tail from last persisted offset) for what is likely a transient SQLite contention blip. "Accept potential replay only on a hard restart" is clearly the lesser evil compared to "guarantee replay on every transient lock failure." The pre-existing pattern on master further confirms this is a deliberate choice, not an oversight.
The follow-up PR adding flb_plg_debug/flb_plg_warn inside update_resumable_offset_state and the compute_offset_marker failure path would address the operability gap without touching return semantics — that sounds like the right scope. Happy to review it when it lands.
Withdrawing the original suggestion.
✏️ Learnings added
Learnt from: sandy2008
Repo: fluent/fluent-bit PR: 11750
File: plugins/in_tail/tail_file.c:172-182
Timestamp: 2026-04-28T17:54:10.813Z
Learning: In plugins/in_tail/tail_file.c, `update_resumable_offset_state()` intentionally swallows errors from `flb_tail_db_file_offset` and `flb_tail_file_update_offset_marker`. By the time it is called, `consume_bytes()` has already drained the in-process buffer and records have been emitted to the engine. Propagating failure up to `FLB_TAIL_ERROR` would trigger `flb_tail_file_remove(file)` in tail.c, causing re-discovery and re-tailing from the last persisted offset — duplicating already-emitted records on every transient SQLite lock blip. The accepted tradeoff is "log and continue; accept potential replay only on a hard restart." This swallow pattern is pre-existing in master (tail_file.c:1204, :1227, tail_fs_stat.c:144) and was intentionally preserved/consolidated in PR `#11750`. Improved visibility should come from adding debug-level logging inside the helper and compute_offset_marker failure path in a follow-up PR, not by changing return semantics.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: sandy2008
Repo: fluent/fluent-bit PR: 0
File: :0-0
Timestamp: 2026-04-28T17:39:59.910Z
Learning: In plugins/in_tail/tail_file.c, `flb_tail_file_reset_on_truncate` is a shared helper that performs lseek + offset/counter reset + `update_resumable_offset_state`. Both `adjust_counters` (in tail_fs_inotify.c) and `reconcile_file_state` call it so that `stream_offset` and `last_processed_bytes` are never left stale after a truncate, and non-SQLDB paths also get the in-memory offset marker refreshed.
Learnt from: sandy2008
Repo: fluent/fluent-bit PR: 0
File: :0-0
Timestamp: 2026-04-28T17:39:59.910Z
Learning: In plugins/in_tail/tail_fs_inotify.c, `reconcile_file_state` has a NULL-able `pending_data_detected` out-parameter. When the caller passes NULL (e.g. `tail_fs_event`), the internal check uses `file->pending_bytes > 0` directly instead of dereferencing the optional pointer. This avoids tying `in_tail_collect_event` invocation solely to pending-byte detection, since that function also drains buffered data and advances multiline state.
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11711
File: tests/runtime/http_client_chunked.c:172-175
Timestamp: 2026-04-14T20:30:06.356Z
Learning: In this project (fluent/fluent-bit), use K&R brace style for C/C++: for control statements (`if`, `else`, `while`, `do`, `for`), place the opening brace on the same line as the control statement (e.g., `if (cond) {`). Do not use Allman style (opening brace on the next line).
There was a problem hiding this comment.
♻️ Duplicate comments (1)
plugins/in_tail/tail_file.c (1)
172-182:⚠️ Potential issue | 🟠 MajorPropagate resumable-offset persistence failures from the shared helper.
flb_tail_db_file_offset()andflb_tail_file_update_offset_marker()can fail, butupdate_resumable_offset_state()drops that status. As a result, truncate recovery and the post-read updates at Line 1795 and Line 1979 can still report success with stale resumable state.Also applies to: 206-207, 1795-1795, 1979-1979
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/in_tail/tail_file.c` around lines 172 - 182, update_resumable_offset_state currently ignores failures from flb_tail_db_file_offset and flb_tail_file_update_offset_marker; change update_resumable_offset_state to return an int status (0 on success, non-zero on failure), propagate and return the error from flb_tail_db_file_offset and flb_tail_file_update_offset_marker instead of dropping it, and update all callers (including the truncate-recovery and post-read update sites that invoke update_resumable_offset_state) to check the returned status and handle/propagate failures accordingly so resumable-offset persistence errors are not treated as success.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@plugins/in_tail/tail_file.c`:
- Around line 172-182: update_resumable_offset_state currently ignores failures
from flb_tail_db_file_offset and flb_tail_file_update_offset_marker; change
update_resumable_offset_state to return an int status (0 on success, non-zero on
failure), propagate and return the error from flb_tail_db_file_offset and
flb_tail_file_update_offset_marker instead of dropping it, and update all
callers (including the truncate-recovery and post-read update sites that invoke
update_resumable_offset_state) to check the returned status and handle/propagate
failures accordingly so resumable-offset persistence errors are not treated as
success.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ff41faaf-e067-4b74-8019-d5f63343acf2
📒 Files selected for processing (3)
plugins/in_tail/tail_file.cplugins/in_tail/tail_file.hplugins/in_tail/tail_fs_inotify.c
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/in_tail/tail_file.h
This makes the inotify tail backend recover from missed events instead of waiting for the next modify event to happen to the same file. That matters when the inotify queue overflows, or when a truncate/copytruncate/delete/rotation event is missed while a file still has readable data.
What changed:
IN_Q_OVERFLOWby reconciling tracked files and rescanning pathsIN_MODIFYTesting:
cmake -S . -B /build -DFLB_TESTS_RUNTIME=On -DFLB_TESTS_INTERNAL=On -DFLB_EXAMPLES=Off && cmake --build /build --target fluent-bit-bin -j"$(nproc)"FLUENT_BIT_BINARY=/build/bin/fluent-bit /tmp/flb-it-venv/bin/python -m pytest tests/integration/scenarios/in_tail/tests/test_in_tail_001.py -q -k "copytruncate_with_stale_writer or follows_rename_rotation or handles_multiple_rename_rotations or multi_file_rapid_rotation or rotate_wait_keeps_old_inode_then_purges_it"FLUENT_BIT_BINARY=/build/bin/fluent-bit VALGRIND=1 VALGRIND_STRICT=1 /tmp/flb-it-venv/bin/python -m pytest tests/integration/scenarios/in_tail/tests/test_in_tail_001.py -q -k "copytruncate_with_stale_writer or follows_rename_rotation or handles_multiple_rename_rotations or multi_file_rapid_rotation or rotate_wait_keeps_old_inode_then_purges_it"Summary by CodeRabbit
Refactor
Bug Fixes