Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
70 changes: 44 additions & 26 deletions plugins/in_tail/tail_file.c
Original file line number Diff line number Diff line change
Expand Up @@ -169,6 +169,44 @@ int flb_tail_file_offset_marker_matches(struct flb_tail_file *file)
return FLB_TRUE;
}

static void update_resumable_offset_state(struct flb_tail_file *file)
{
#ifdef FLB_HAVE_SQLDB
if (file->config->db) {
flb_tail_db_file_offset(file, file->config);
return;
}
#endif

flb_tail_file_update_offset_marker(file);
}
Comment on lines +172 to +182
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Propagate offset-state persistence failures instead of swallowing them.

update_resumable_offset_state() ignores failures from DB/marker persistence. That makes truncate recovery and post-read progress updates report success even when resumable state update failed.

Proposed fix
-static void update_resumable_offset_state(struct flb_tail_file *file)
+static int update_resumable_offset_state(struct flb_tail_file *file)
 {
 `#ifdef` FLB_HAVE_SQLDB
     if (file->config->db) {
-        flb_tail_db_file_offset(file, file->config);
-        return;
+        return flb_tail_db_file_offset(file, file->config);
     }
 `#endif`
 
-    flb_tail_file_update_offset_marker(file);
+    return flb_tail_file_update_offset_marker(file);
 }
@@
-    update_resumable_offset_state(file);
+    if (update_resumable_offset_state(file) == -1) {
+        return -1;
+    }
     return 0;
@@
-                update_resumable_offset_state(file);
+                if (update_resumable_offset_state(file) == -1) {
+                    return FLB_TAIL_ERROR;
+                }
                 return adjust_counters(ctx, file);
@@
-        update_resumable_offset_state(file);
+        if (update_resumable_offset_state(file) == -1) {
+            return FLB_TAIL_ERROR;
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/in_tail/tail_file.c` around lines 172 - 182, The function
update_resumable_offset_state currently swallows errors from the persistence
calls; change it to return an int status and propagate errors from
flb_tail_db_file_offset and flb_tail_file_update_offset_marker (i.e., return
their failure codes instead of ignoring them), keeping the FLB_HAVE_SQLDB
conditional logic intact; update all callers to check the returned status and
handle/report persistence failures so truncate recovery and post-read updates
can act on real persistence errors.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for flagging this — talked it through with my reviewer agents and we landed on pushing back on the propagate-up suggestion. Tracing the call sites:

update_resumable_offset_state runs after consume_bytes() has already shifted data out of file->buf_data and after records have been emitted to the engine. By the time we hit the persist call, the data is gone from the in-process buffer.

If the helper returns -1 and we propagate that up to flb_tail_file_chunkFLB_TAIL_ERROR, tail.c maps that to flb_tail_file_remove(file) (tail.c:117-119). On the next scan the file gets re-discovered and re-tailed from the last persisted offset — so a transient sqlite-lock blip would replay every record we just emitted. That's a worse outcome than the current "log and continue, accept that we may replay this batch on a hard restart" behavior.

The swallow pattern is also pre-existing — master swallows flb_tail_db_file_offset returns at tail_file.c:1204, :1227, tail_fs_stat.c:144, plus the two sites this PR refactored into the helper. This PR didn't introduce the silent-failure shape, just consolidated two of the call sites.

That said, your underlying concern about visibility is fair — flb_tail_db_file_offset already logs lock failures, but the sqlite_step path and flb_tail_file_update_offset_marker don't. Happy to do a targeted follow-up PR that adds debug-level logging inside update_resumable_offset_state (and/or inside the compute_offset_marker failure path) without changing return semantics, if that addresses the operability gap. Wanted to keep this PR scoped to the inotify reconcile work it started as.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sandy2008, that's a well-reasoned pushback and I'm convinced. The key point I missed is that the persist call happens after the buffer has been drained and records emitted — so converting a persist failure into FLB_TAIL_ERRORflb_tail_file_remove creates a guaranteed duplication path (re-tail from last persisted offset) for what is likely a transient SQLite contention blip. "Accept potential replay only on a hard restart" is clearly the lesser evil compared to "guarantee replay on every transient lock failure." The pre-existing pattern on master further confirms this is a deliberate choice, not an oversight.

The follow-up PR adding flb_plg_debug/flb_plg_warn inside update_resumable_offset_state and the compute_offset_marker failure path would address the operability gap without touching return semantics — that sounds like the right scope. Happy to review it when it lands.

Withdrawing the original suggestion.


✏️ Learnings added
Learnt from: sandy2008
Repo: fluent/fluent-bit PR: 11750
File: plugins/in_tail/tail_file.c:172-182
Timestamp: 2026-04-28T17:54:10.813Z
Learning: In plugins/in_tail/tail_file.c, `update_resumable_offset_state()` intentionally swallows errors from `flb_tail_db_file_offset` and `flb_tail_file_update_offset_marker`. By the time it is called, `consume_bytes()` has already drained the in-process buffer and records have been emitted to the engine. Propagating failure up to `FLB_TAIL_ERROR` would trigger `flb_tail_file_remove(file)` in tail.c, causing re-discovery and re-tailing from the last persisted offset — duplicating already-emitted records on every transient SQLite lock blip. The accepted tradeoff is "log and continue; accept potential replay only on a hard restart." This swallow pattern is pre-existing in master (tail_file.c:1204, :1227, tail_fs_stat.c:144) and was intentionally preserved/consolidated in PR `#11750`. Improved visibility should come from adding debug-level logging inside the helper and compute_offset_marker failure path in a follow-up PR, not by changing return semantics.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: sandy2008
Repo: fluent/fluent-bit PR: 0
File: :0-0
Timestamp: 2026-04-28T17:39:59.910Z
Learning: In plugins/in_tail/tail_file.c, `flb_tail_file_reset_on_truncate` is a shared helper that performs lseek + offset/counter reset + `update_resumable_offset_state`. Both `adjust_counters` (in tail_fs_inotify.c) and `reconcile_file_state` call it so that `stream_offset` and `last_processed_bytes` are never left stale after a truncate, and non-SQLDB paths also get the in-memory offset marker refreshed.

Learnt from: sandy2008
Repo: fluent/fluent-bit PR: 0
File: :0-0
Timestamp: 2026-04-28T17:39:59.910Z
Learning: In plugins/in_tail/tail_fs_inotify.c, `reconcile_file_state` has a NULL-able `pending_data_detected` out-parameter. When the caller passes NULL (e.g. `tail_fs_event`), the internal check uses `file->pending_bytes > 0` directly instead of dereferencing the optional pointer. This avoids tying `in_tail_collect_event` invocation solely to pending-byte detection, since that function also drains buffered data and advances multiline state.

Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11711
File: tests/runtime/http_client_chunked.c:172-175
Timestamp: 2026-04-14T20:30:06.356Z
Learning: In this project (fluent/fluent-bit), use K&R brace style for C/C++: for control statements (`if`, `else`, `while`, `do`, `for`), place the opening brace on the same line as the control statement (e.g., `if (cond) {`). Do not use Allman style (opening brace on the next line).


int flb_tail_file_reset_on_truncate(struct flb_tail_file *file,
int64_t size_delta,
const char *caller)
{
int64_t offset;
struct flb_tail_config *ctx = file->config;

offset = lseek(file->fd, 0, SEEK_SET);
if (offset == -1) {
flb_errno();
return -1;
}

flb_plg_debug(ctx->ins,
"%s: inode=%"PRIu64" file truncated %s (diff: %"PRId64" bytes)",
caller, file->inode, file->name, size_delta);

file->offset = offset;
file->stream_offset = offset;
file->last_processed_bytes = 0;
file->buf_len = 0;

update_resumable_offset_state(file);
return 0;
}

static uint64_t stat_get_st_dev(struct stat *st)
{
#ifdef FLB_SYSTEM_WINDOWS
Expand Down Expand Up @@ -1678,9 +1716,10 @@ int flb_tail_file_remove_all(struct flb_tail_config *ctx)
static int adjust_counters(struct flb_tail_config *ctx, struct flb_tail_file *file)
{
int ret;
int64_t offset;
struct stat st;

(void) ctx;

ret = fstat(file->fd, &st);
if (ret == -1) {
flb_errno();
Expand All @@ -1694,23 +1733,10 @@ static int adjust_counters(struct flb_tail_config *ctx, struct flb_tail_file *fi

/* Check if the file was truncated by comparing current size with previous size */
if (size_delta < 0) {
offset = lseek(file->fd, 0, SEEK_SET);
if (offset == -1) {
flb_errno();
if (flb_tail_file_reset_on_truncate(file, size_delta,
"adjust_counters") == -1) {
return FLB_TAIL_ERROR;
}

flb_plg_debug(ctx->ins, "adjust_counters: inode=%"PRIu64" file truncated %s (diff: %"PRId64" bytes)",
file->inode, file->name, size_delta);
file->offset = offset;
file->buf_len = 0;

/* Update offset in the database file */
#ifdef FLB_HAVE_SQLDB
if (ctx->db) {
flb_tail_db_file_offset(file, ctx);
}
#endif
}
else {
// Avoid negative pending_bytes when fstat() has stale data and size < offset
Expand Down Expand Up @@ -1766,11 +1792,7 @@ int flb_tail_file_chunk(struct flb_tail_file *file)
file->buf_len -= processed_bytes;
file->buf_data[file->buf_len] = '\0';

#ifdef FLB_HAVE_SQLDB
if (file->config->db) {
flb_tail_db_file_offset(file, file->config);
}
#endif
update_resumable_offset_state(file);
return adjust_counters(ctx, file);
}
}
Expand Down Expand Up @@ -1954,11 +1976,7 @@ int flb_tail_file_chunk(struct flb_tail_file *file)
file->buf_len -= processed_bytes;
file->buf_data[file->buf_len] = '\0';

#ifdef FLB_HAVE_SQLDB
if (file->config->db) {
flb_tail_db_file_offset(file, file->config);
}
#endif
update_resumable_offset_state(file);

/* adjust file counters, returns FLB_TAIL_OK or FLB_TAIL_ERROR */
ret = adjust_counters(ctx, file);
Expand Down
3 changes: 3 additions & 0 deletions plugins/in_tail/tail_file.h
Original file line number Diff line number Diff line change
Expand Up @@ -155,5 +155,8 @@ static inline off_t flb_tail_file_db_offset(struct flb_tail_file *file)

int flb_tail_file_update_offset_marker(struct flb_tail_file *file);
int flb_tail_file_offset_marker_matches(struct flb_tail_file *file);
int flb_tail_file_reset_on_truncate(struct flb_tail_file *file,
int64_t size_delta,
const char *caller);

#endif
Loading