Conversation
📝 WalkthroughWalkthroughThis PR adds complete DNS query/response tracing to the eBPF input plugin. It introduces kernel-space syscall probes (connect, sendto, recvfrom) to capture and correlate DNS transactions, user-space event decoding that extracts query names and response metadata into Fluent Bit log records. ChangesDNS eBPF Trace Support
Sequence DiagramsequenceDiagram
participant App as Application
participant Kernel as eBPF Kernel
participant Maps as BPF Maps
participant Handler as Fluent Bit DNS Handler
participant FLB as Fluent Bit Input
App->>Kernel: connect/sendto/recvfrom syscalls
Kernel->>Maps: store pending connect / dns_queries / dns_recv
Kernel->>Handler: emit EVENT_TYPE_DNS (query/response)
Handler->>Handler: decode query name, assemble record
Handler->>FLB: flb_input_log_append(record)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
2cc2b16 to
56d1e77
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 56d1e77459
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
plugins/in_ebpf/traces/dns/bpf.c (2)
58-63:dns_queriesentries are never expired for unanswered or response-missed queries.Entries are added in
trace_dns_sendto_enterand removed intrace_dns_recvfrom_exit. If a DNS query receives no response (UDP loss, timeout, or the response is received via a non-recvfromsyscall such asread/recv), the entry persists indefinitely. Withmax_entries: 16384, a workload with bursts of unanswered queries can saturate the map and silently drop new DNS query tracking.Consider implementing a periodic cleanup mechanism, or supplementing with a
recvmsg/readtracepoint to broaden response capture.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/in_ebpf/traces/dns/bpf.c` around lines 58 - 63, dns_queries map entries (map dns_queries) are never removed when a DNS query never gets a recvfrom response, which can saturate the map added in trace_dns_sendto_enter and only removed in trace_dns_recvfrom_exit; change the strategy by either (A) switching the map type from BPF_MAP_TYPE_HASH to BPF_MAP_TYPE_LRU_HASH (adjust dns_queries declaration and max_entries remains) so old/unanswered entries are automatically evicted, or (B) implement expiry logic by storing a timestamp in struct dns_query_state and adding a periodic cleanup path (or extend probes) to remove stale entries, and/or add additional probes (trace_dns_recvmsg_exit / trace_dns_read_exit) to remove entries when responses arrive via other syscalls; update dns_queries usage in trace_dns_sendto_enter and trace_dns_recvfrom_exit accordingly.
157-174:is_dns_destinationonly matches IPv4 (AF_INET); IPv6 DNS traffic is silently ignored.Applications using
AF_INET6resolvers (including dual-stack hosts querying::1or a v6 DNS server) will not have their sockets registered indns_sockets, and theirsendtocalls won't be intercepted unless an explicitaddris passed. This means DNS traces will silently produce incomplete records on IPv6-capable systems.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/in_ebpf/traces/dns/bpf.c` around lines 157 - 174, is_dns_destination currently only handles AF_INET; modify it to detect and handle AF_INET6 as well by first reading the sa_family from user addr (using bpf_probe_read_user on the family/first bytes) and then branching: for AF_INET, read a struct sockaddr_in and check dst.sin_family == AF_INET && bpf_ntohs(dst.sin_port) == DNS_PORT; for AF_INET6, read a struct sockaddr_in6 and check dst6.sin6_family == AF_INET6 && bpf_ntohs(dst6.sin6_port) == DNS_PORT. Ensure you validate addrlen before each full read (>= sizeof(struct sockaddr_in) or >= sizeof(struct sockaddr_in6)) and keep existing AF_INET behavior intact; use the same DNS_PORT, AF_INET6, struct sockaddr_in6, is_dns_destination symbols to locate and update the logic.plugins/in_ebpf/traces/includes/common/events.h (1)
136-146: ⚡ Quick win
char query[DNS_NAME_MAX]is never written or read — dead field.The BPF programs (
bpf.c) only populatequery_rawandquery_raw_len;queryis never written. The handler (handler.c) decodes into a localquery_namebuffer and never readsev->details.dns.query. This 128-byte field is dead weight in every DNS ring-buffer slot.🔧 Proposed fix
struct dns_event { __u16 txid; __u16 query_type; __u8 rcode; __u8 response; __u16 query_raw_len; __u64 latency_ns; int error_raw; - char query[DNS_NAME_MAX]; __u8 query_raw[DNS_QUERY_RAW_MAX]; };🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/in_ebpf/traces/includes/common/events.h` around lines 136 - 146, The struct dns_event contains a dead field char query[DNS_NAME_MAX] that is never populated by bpf.c nor read by handler.c; remove that unused field from the struct dns_event definition in events.h to reduce ring-buffer slot size, and then rebuild to ensure no remaining references rely on dns.query; verify bpf.c continues to populate query_raw and query_raw_len and handler.c continues to decode into its local query_name buffer (update any tests or comments mentioning dns.query if present).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/in_ebpf/traces/dns/handler.c`:
- Around line 202-217: The encoder is only reset on the success path, so ensure
flb_log_event_encoder_reset(encoder) is always called before returning from the
handler: after a failing encode_dns_event (ret == -1) and after a failing
flb_input_log_append (ret == -1). Update the control flow in the function that
calls encode_dns_event and flb_input_log_append (references: encode_dns_event,
flb_input_log_append, flb_log_event_encoder_reset, event_ctx->ins, encoder) to
perform a single cleanup/exit path (e.g., a local label or common return block)
that calls flb_log_event_encoder_reset(encoder) and then returns the appropriate
error code so the encoder cannot retain stale state across invocations.
---
Nitpick comments:
In `@plugins/in_ebpf/traces/dns/bpf.c`:
- Around line 58-63: dns_queries map entries (map dns_queries) are never removed
when a DNS query never gets a recvfrom response, which can saturate the map
added in trace_dns_sendto_enter and only removed in trace_dns_recvfrom_exit;
change the strategy by either (A) switching the map type from BPF_MAP_TYPE_HASH
to BPF_MAP_TYPE_LRU_HASH (adjust dns_queries declaration and max_entries
remains) so old/unanswered entries are automatically evicted, or (B) implement
expiry logic by storing a timestamp in struct dns_query_state and adding a
periodic cleanup path (or extend probes) to remove stale entries, and/or add
additional probes (trace_dns_recvmsg_exit / trace_dns_read_exit) to remove
entries when responses arrive via other syscalls; update dns_queries usage in
trace_dns_sendto_enter and trace_dns_recvfrom_exit accordingly.
- Around line 157-174: is_dns_destination currently only handles AF_INET; modify
it to detect and handle AF_INET6 as well by first reading the sa_family from
user addr (using bpf_probe_read_user on the family/first bytes) and then
branching: for AF_INET, read a struct sockaddr_in and check dst.sin_family ==
AF_INET && bpf_ntohs(dst.sin_port) == DNS_PORT; for AF_INET6, read a struct
sockaddr_in6 and check dst6.sin6_family == AF_INET6 && bpf_ntohs(dst6.sin6_port)
== DNS_PORT. Ensure you validate addrlen before each full read (>= sizeof(struct
sockaddr_in) or >= sizeof(struct sockaddr_in6)) and keep existing AF_INET
behavior intact; use the same DNS_PORT, AF_INET6, struct sockaddr_in6,
is_dns_destination symbols to locate and update the logic.
In `@plugins/in_ebpf/traces/includes/common/events.h`:
- Around line 136-146: The struct dns_event contains a dead field char
query[DNS_NAME_MAX] that is never populated by bpf.c nor read by handler.c;
remove that unused field from the struct dns_event definition in events.h to
reduce ring-buffer slot size, and then rebuild to ensure no remaining references
rely on dns.query; verify bpf.c continues to populate query_raw and
query_raw_len and handler.c continues to decode into its local query_name buffer
(update any tests or comments mentioning dns.query if present).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f49fe630-76c2-49e6-958f-ec90c541ae9a
📒 Files selected for processing (7)
plugins/in_ebpf/in_ebpf.cplugins/in_ebpf/traces/dns/bpf.cplugins/in_ebpf/traces/dns/handler.cplugins/in_ebpf/traces/dns/handler.hplugins/in_ebpf/traces/includes/common/encoder.hplugins/in_ebpf/traces/includes/common/events.hplugins/in_ebpf/traces/traces.h
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/in_ebpf/traces/dns/bpf.c`:
- Around line 55-59: The code currently stores the user pointer in
dns_connect_pending at sys_enter_connect and reads user memory at
sys_exit_connect, leading to TOCTOU; instead, in sys_enter_connect (where struct
dns_connect_args and dns_connect_pending are used) perform a single safe copy of
the needed destination fields (socket family and port) from userspace into
kernel/BPF-managed storage and store only those scalar values (e.g., family,
port, and fd) in dns_connect_pending; then in sys_exit_connect consume only
those stored scalar values in sys_exit_connect (do not dereference the original
userspace addr pointer). Update references to dns_connect_pending,
sys_enter_connect, and sys_exit_connect so exit logic uses the copied
family/port values rather than reading user memory.
- Around line 61-67: The dns_queries BPF map is a plain BPF_MAP_TYPE_HASH and
can retain stale entries when responses are lost, leading to saturation; change
the map to use an eviction-capable type (e.g., BPF_MAP_TYPE_LRU_HASH) or similar
LRU/TTL-backed map and keep the same key/value types and max_entries so stale
entries are evicted automatically; update the dns_queries definition (symbol:
dns_queries) accordingly and ensure the existing explicit removal logic (used
around the response handling code paths) remains compatible with the new map
type.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8f24cbc2-d99c-4dee-b951-666d551e3bd5
📒 Files selected for processing (2)
plugins/in_ebpf/traces/dns/bpf.cplugins/in_ebpf/traces/dns/handler.c
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
2dbc4d5 to
ba73369
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (2)
plugins/in_ebpf/traces/dns/bpf.c (2)
367-374: 💤 Low valueOptional: make
raw_lenupper-bound explicit for BPF-verifier clarity.
raw_lenis clamped toDNS_QUERY_RAW_MAXvia the ternary on Line 367, but the subsequent cast to__u16can obscure that bound from some older BPF verifiers. An explicit clamp after the cast makes the scalar range unambiguous without changing behaviour.🔧 Proposed guard
raw_len = (__u16) (len > DNS_QUERY_RAW_MAX ? DNS_QUERY_RAW_MAX : len); + if (raw_len > DNS_QUERY_RAW_MAX) { + raw_len = DNS_QUERY_RAW_MAX; + } state.query_raw_len = raw_len;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/in_ebpf/traces/dns/bpf.c` around lines 367 - 374, The ternary cast to __u16 for raw_len may not make the upper bound obvious to older BPF verifiers; explicitly clamp raw_len to DNS_QUERY_RAW_MAX after the cast so its scalar range is unambiguous. Update the code around raw_len and state.query_raw_len (the computation that currently uses DNS_QUERY_RAW_MAX and the subsequent assignment to state.query_raw_len) to add an explicit check like "if (raw_len > DNS_QUERY_RAW_MAX) raw_len = DNS_QUERY_RAW_MAX" before using raw_len in bpf_probe_read_user(state.query_raw, raw_len, payload), leaving behavior unchanged but making the verifier happy. Ensure references remain to raw_len, DNS_QUERY_RAW_MAX, state.query_raw_len, and bpf_probe_read_user.
287-303: 💤 Low value
trace_dns_close_enter– redundant double cast on Line 298.
(__s32) ((int) ctx->args[0])applies two equivalent 32-bit signed casts.intand__s32are the same width; the inner(int)cast is superfluous.🔧 Proposed simplification
- key.fd = (__s32) ((int) ctx->args[0]); + key.fd = (int) ctx->args[0];🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/in_ebpf/traces/dns/bpf.c` around lines 287 - 303, In trace_dns_close_enter, the fd assignment does an unnecessary double cast; replace key.fd = (__s32) ((int) ctx->args[0]); with a single 32-bit cast (e.g., key.fd = (__s32) ctx->args[0]; or cast to (int32_t)) to remove the redundant (int) cast while keeping the intended conversion for dns_socket_key.key.fd and the bpf_map_delete_elem(&dns_sockets, &key) call intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@plugins/in_ebpf/traces/dns/bpf.c`:
- Around line 367-374: The ternary cast to __u16 for raw_len may not make the
upper bound obvious to older BPF verifiers; explicitly clamp raw_len to
DNS_QUERY_RAW_MAX after the cast so its scalar range is unambiguous. Update the
code around raw_len and state.query_raw_len (the computation that currently uses
DNS_QUERY_RAW_MAX and the subsequent assignment to state.query_raw_len) to add
an explicit check like "if (raw_len > DNS_QUERY_RAW_MAX) raw_len =
DNS_QUERY_RAW_MAX" before using raw_len in bpf_probe_read_user(state.query_raw,
raw_len, payload), leaving behavior unchanged but making the verifier happy.
Ensure references remain to raw_len, DNS_QUERY_RAW_MAX, state.query_raw_len, and
bpf_probe_read_user.
- Around line 287-303: In trace_dns_close_enter, the fd assignment does an
unnecessary double cast; replace key.fd = (__s32) ((int) ctx->args[0]); with a
single 32-bit cast (e.g., key.fd = (__s32) ctx->args[0]; or cast to (int32_t))
to remove the redundant (int) cast while keeping the intended conversion for
dns_socket_key.key.fd and the bpf_map_delete_elem(&dns_sockets, &key) call
intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1b540f4c-4fc1-40b3-8b06-f0a440df756b
📒 Files selected for processing (2)
plugins/in_ebpf/traces/dns/bpf.cplugins/in_ebpf/traces/dns/handler.c
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/in_ebpf/traces/dns/handler.c
In this PR, I implemented DNS query eBPF trace.
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit