
Add possibility to show repeat errors only once #11641

@giodueck

Description

Is your feature request related to a problem? Please describe.

If the same error occurs on many consecutive attempts to push data to an output, the error shows up in the logs over and over. This can drown out useful system logs, especially when the system logs are themselves monitored by Fluent Bit.

For example, among others, we have a Systemd input and an InfluxDB output. If for some reason the InfluxDB endpoint is not reachable, an error with the HTTP status is written to the log. The Systemd input then picks up that log line, and the InfluxDB output tries to ship it as well, generating yet another error. Eventually this fills up the Storage.Backlog, producing even more errors for Systemd to log.

An example of the journalctl output after some time with InfluxDB down:

2026-03-23T13:07:57.940884Z - fluent-bit.service[269] [2026/03/23 13:07:57.940553400] [error] [input chunk] fail to drop enough chunks in order to place new data coming from input plugin systemd.0
2026-03-23T13:07:57.950398Z - fluent-bit.service[269] [2026/03/23 13:07:57.943235194] [error] [input chunk] no available chunk
2026-03-23T13:07:58.019220Z - fluent-bit.service[269] [2026/03/23 13:07:58.18009674] [error] [output:influxdb:influxdb.1] http_status=500
2026-03-23T13:07:58.019220Z - fluent-bit.service[269] {"code":"internal error","message":"unexpected error writing points to database: timeout"}
2026-03-23T13:07:58.020409Z - fluent-bit.service[269] [2026/03/23 13:07:58.18712622] [error] [output:influxdb:influxdb.1] http_status=500
2026-03-23T13:07:58.020409Z - fluent-bit.service[269] {"code":"internal error","message":"unexpected error writing points to database: timeout"}

In our case, the host is an embedded system which has a relatively small backlog buffer configured, with Storage.Backlog.Mem_Limit 5M.
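For context, the relevant storage settings in our service section look roughly like this (the path is illustrative; only the `storage.backlog.mem_limit` value is the one quoted above):

```ini
[SERVICE]
    # On-disk buffer location (illustrative path)
    storage.path              /var/log/flb-storage/
    # Cap on memory used when loading backlog chunks
    storage.backlog.mem_limit 5M
```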

Describe the solution you'd like

A limit or timeout on consecutive equivalent error logs could solve this issue.

Basically: if InfluxDB is unreachable for a period, then comes back, then becomes unreachable again, both incidents should generate error logs, but not constantly.

The timeout would cover a scenario where, for example, a series of 500 errors is interrupted once by a 503 error: the first and second runs of 500s are really the same incident and should be subject to the same suppression. On the other hand, if the connection comes back for one attempt and drops again on the next, that is arguably a special case: is it the same incident or a new one?

Describe alternatives you've considered

I could not find alternatives to this yet. Expanding the storage backlog is only a partial fix.

We considered creating our own patches, but so far these have been very rudimentary.

Additional context

Version: 4.1.1
