File source identity issue: checksum collisions and inode reuse can cause skipped data

### A note for the community

The `file` source currently has two fingerprint modes:

- `checksum` (first N lines)
- `device_and_inode`

Both can fail in real production cases:

Checksum mode issue

- Different files can share the same first N lines, so they get the same fingerprint.
- Vector may switch the watched path based on mtime but keep the old offset/checkpoint, which can skip data in the new file.
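
To make the collision concrete, here is a minimal sketch in Python. It is not Vector's implementation: `header_fingerprint`, the CRC32 choice, and the in-memory `checkpoints` dict are all assumptions made for illustration. It only shows how two different files with an identical first line end up sharing one checkpoint entry.

```python
import zlib


def header_fingerprint(content: bytes, lines: int = 1) -> int:
    """Hypothetical fingerprint: CRC32 over the first `lines` lines only."""
    header = b"".join(content.splitlines(keepends=True)[:lines])
    return zlib.crc32(header)


# Two distinct log files that happen to start with the same banner line.
old_file = b"# service started\nrequest 1\nrequest 2\n"
new_file = b"# service started\nrequest A\n"

# The old file was read to the end, so its checkpoint is its full length.
checkpoints = {header_fingerprint(old_file): len(old_file)}

# The new file maps to the same key, so the stale offset is picked up.
resume_offset = checkpoints.get(header_fingerprint(new_file), 0)

# Everything before the stale offset is never read from the new file.
unread = new_file[:resume_offset]
remaining = new_file[resume_offset:]
```

In this sketch the stale offset is larger than the new file, so `remaining` is empty and every line of the new file is skipped.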

Inode mode issue

- An inode can be reused after file deletion or rotation under high churn.
- A new file may appear with the same (device, inode) pair but hold a different generation of content.
- Reusing the old checkpoint offset for this new generation can also skip data.
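
The following Python sketch illustrates why a (device, inode) key alone cannot distinguish generations. It is an illustration under assumptions, not Vector's code: `file_identity`, `resume_offset`, and the `checkpoints` dict are hypothetical names, and the inode reuse itself is simulated because real reuse depends on the filesystem.

```python
import os
import tempfile


def file_identity(path: str) -> tuple:
    """Hypothetical identity key: (device, inode) taken from os.stat."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def resume_offset(checkpoints: dict, identity: tuple) -> int:
    """Return the stored offset for this identity, or 0 if unknown."""
    return checkpoints.get(identity, 0)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "app.log")
    with open(path, "wb") as f:
        f.write(b"line 1\nline 2\n")

    ident = file_identity(path)
    checkpoints = {ident: os.path.getsize(path)}  # file fully read

    # Rotation by rename keeps the identity, which is why this key is useful.
    rotated = os.path.join(d, "app.log.1")
    os.rename(path, rotated)
    same_after_rename = file_identity(rotated) == ident

# Simulated reuse: after the directory is gone, assume an unrelated new file
# is later assigned the same (device, inode). The key carries no information
# about content, so the lookup returns the stale offset.
reused_identity = ident
stale_offset = resume_offset(checkpoints, reused_identity)
```

Nothing in the key tells the reader that the content behind it has changed, so `stale_offset` is applied to the new file as if it were the old one.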

So checksum-only is unsafe for “same file” identity, and inode-only is also unsafe under inode reuse.

Feature request
Please add a safer composite file identity mode, for example:

- primary key: device + inode
- plus content generation validation (e.g. a header checksum or the first bytes stored in the checkpoint)
- if validation fails, treat the file as a new generation and safely reset the resume offset

This would reduce the risk of lost or skipped data while keeping backward compatibility (an opt-in mode is fine).
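
One possible shape for the requested mode, sketched in Python. All of it is an assumption for discussion rather than an existing Vector API: the `Checkpoint` fields, the `HEADER_BYTES` limit, the SHA-256 digest, and the function names are hypothetical, and file contents are passed as bytes to keep the example self-contained.

```python
import hashlib
from dataclasses import dataclass

HEADER_BYTES = 256  # hypothetical cap on how many leading bytes are validated


@dataclass
class Checkpoint:
    offset: int         # resume position in the file
    header_len: int     # number of leading bytes covered by the digest
    header_digest: str  # digest of those bytes when the checkpoint was taken


def make_checkpoint(content: bytes, offset: int) -> Checkpoint:
    header = content[:HEADER_BYTES]
    return Checkpoint(offset, len(header), hashlib.sha256(header).hexdigest())


def resume_offset(checkpoints: dict, identity: tuple, content: bytes) -> int:
    """Look up by (device, inode), then validate the content generation."""
    cp = checkpoints.get(identity)
    if cp is None:
        return 0  # never seen: start from the beginning
    header = content[:cp.header_len]
    if len(header) < cp.header_len:
        return 0  # file is shorter than the recorded header: new generation
    if hashlib.sha256(header).hexdigest() != cp.header_digest:
        return 0  # same inode, different content: new generation
    return cp.offset


identity = (2049, 131073)  # made-up (device, inode) pair
gen1 = b"2024-01-01 boot\nrequest 1\nrequest 2\n"
checkpoints = {identity: make_checkpoint(gen1, len(gen1))}

gen1_grown = gen1 + b"request 3\n"  # same file, appended to
gen2 = b"2024-01-02 boot\nrequest A\nrequest B\nrequest C\n"  # inode reused

same_generation = resume_offset(checkpoints, identity, gen1_grown)
new_generation = resume_offset(checkpoints, identity, gen2)
```

Storing `header_len` alongside the digest lets a file that was shorter than `HEADER_BYTES` at checkpoint time still validate after it grows. In this sketch the appended file resumes at the old offset, while the reused inode restarts from 0.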

### Use Cases

_No response_

### Attempted Solutions

_No response_

### Proposal

_No response_

### References

_No response_

### Version

_No response_

File source identity issue: checksum collisions and inode reuse can cause skipped data #25079