Commit b4f366a

[OCSF] Zeek pipeline (DataDog#23712)

* [OCSF] Zeek/Corelight pipeline

Add OCSF v1.5.0 normalization for Zeek/Corelight logs, covering 7 log types across 5 OCSF classes (Detection Finding, Network Activity, HTTP Activity, DNS Activity, File Hosting Activity).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix validate-logs errors in zeek.yaml

Resolve 36 validation errors flagged by the datadog-assets validator:
- Add missing `overrideOnConflict: false` to 3 attribute-remappers
- Fix 2 schema-remapper names to backtick individual fields
- Rename 25 facets to match the validator's canonical names and add `type: integer`/`facetType: range` where required
- Remove 6 facets with unresolvable path conflicts (the validator demanded unique paths with no canonical definition available)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix severity mapping for Detection Finding [2004] Notice

Notice events emit `severity.name` capitalized ("High", "Medium", etc.), so the lowercase `@severity.name:informational` filters never matched and the fallback assigned `ocsf.severity_id: 99` while preserving the capitalized name as `ocsf.severity`.

Switch the schema-category-mapper to filter on the numeric `severity.id` (1-5), which Corelight reliably emits, and update the notice fixture's expected `severity_id` from 99 to 4 to reflect the corrected mapping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add catch-all category to schema-category-mappers with fallback

Each schema-category-mapper that defines a fallback must also have a catch-all filter category at the end matching the fallback's values. Six mappers were missing the trailing catch-all: notice/alert severity_id (2004), http activity_id/status_id (4002), dns rcode_id, and dns status_id (4003). Append `query: "*"` -> Other/99 to each.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Apply PR review feedback for Zeek/Corelight OCSF pipeline

Direct mappings, dead-code removal, correctness fixes, and OCSF validator cleanups across notice, suricata, conn, ssl, weird, http, dns, and file hosting sub-pipelines:
- Map directly to OCSF targets where intermediates were unnecessary (ocsf.time, ocsf.duration, ocsf.traffic.packets, JA3/JA3S algorithm_id, weird protocol_name).
- Drop dead/auto-generated mappers: notice/suricata category_uid (set by schema-processor), self-maps of finding_info.uid, event_code, file.hashes (when unbuilt upstream), suricata community_id correlation_uid, HTTP version-as-protocol_ver, DNS direction derivation, and the DNS rcode_id catch-all/fallback (recommended-not-required).
- Convert suricata alert.signature_id event_code from string-builder to schema-remapper.
- Combine domain/query into single ocsf.query.hostname schema-remapper.
- Fix DNS Activity filters: use rcode_name presence to discriminate Response/Query instead of dns.answer.name (handles NXDOMAIN responses).
- DNS status_id catch-all renamed Other/99 -> Unknown/0 to satisfy the OCSF validator's suspicious-Other check.
- File Hosting tx_hosts/rx_hosts: drop the second intermediate field; grok targets ocsf.{src,dst}_endpoint.ip directly off a single stringify.
- Switch fallback source fields per Jonah's suggestions: severity -> severity.name, alert.severity -> alert_severity, http status -> status_msg, dns rcode/status -> rcode_name.
- Notice fixture: use id.orig_h/id.resp_h connection fields instead of the suricata-style src.

Regenerated zeek_tests.yaml with the OCSF validator (--check-all --write). All 14 logs pass validation with no errors or warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Map Zeek DNS answers to ocsf.answers as dns_answer objects

Use two array-processors to wrap each Zeek `answers` string into a dns_answer object and append to ocsf.answers: the first selects the first array element into ocsf.answer.rdata; the second appends ocsf.answer onto ocsf.answers. Only the first answer is captured (the pipeline DSL has no per-element iteration), but that covers the common single-A-record case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add catch-all for activity_id

* Fix validate-logs failure for DNS answers wrapper

The previous array-processor type:select required operation.filter and operation.valueToExtract per the asset validator, but those only apply to object arrays; Zeek's `answers` is a primitive string array. Switch to string-builder + grok-parser to extract the first answer string into ocsf.answer.rdata, then keep the array-processor append to wrap it into ocsf.answers as a dns_answer object.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address codex review feedback for file pipeline

- Include `files_red` in the File Hosting [6006] sub-pipeline filter so redacted file events get OCSF class_uid/activity_id/file fields, not just the pre-transform metadata.
- Prefer `filename` over `fuid` when populating `ocsf.file.name`; fall back to `fuid` only when `filename` is absent. The `fuid` mapping to `ocsf.file.uid` is unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Drop pipeline intermediates, fix multi-IP grok, restore file.hashes

- is_alert (notice 2004, suricata 2004): string-builder writes directly to `ocsf.is_alert`; grok-parser converts in place. Drops the `_is_alert_str` intermediate.
- DNS answers: stringify directly into `ocsf.answer`; grok extracts `ocsf.answer.rdata` via `a %{data:ocsf.answer.rdata}(,%{data})?` so the comma-separated multi-IP form parses correctly. Drops the `_answers_str` intermediate.
- File Hosting tx/rx hosts: stringify directly into `ocsf.{src,dst}_endpoint`; grok extracts `.ip` via `g %{ip:ocsf.{src,dst}_endpoint.ip}(,%{data})?` for multi-IP. Drops the `_tx_hosts_str`/`_rx_hosts_str` intermediates.
- Connection 4001: arithmetic-processor writes total bytes directly to `ocsf.traffic.bytes`; the schema-processor remapper becomes a self-map. Drops the `_total_bytes` intermediate (matches the earlier _total_packets/_duration_ms cleanup).
- Restore `ocsf.file.hashes`: build `tmp_md5`/`tmp_sha1`/`tmp_sha256` fingerprint objects (algorithm name, integer algorithm_id, value), array-processor append each into `ocsf.file.hashes`, and self-map the array inside the 6006 schema-processor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
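
The severity and catch-all fixes in the commit message both depend on how a schema-category-mapper picks a category. The following is a minimal Python sketch, not the Datadog pipeline API, and it rests on two assumptions:
- Categories are checked in order, and the first match wins.
- Corelight `severity.id` 1-5 lines up one-to-one with OCSF `severity_id` 1-5.

`map_category`, `name_is`, and `id_is` are hypothetical helpers. The sketch shows why lowercase name filters fell through to 99 for a "High" notice, while numeric filters with a trailing catch-all yield 4.

```python
from typing import Callable

Event = dict
Predicate = Callable[[Event], bool]

# Assumption: Corelight severity.id 1-5 lines up with OCSF severity_id 1-5;
# OCSF uses 99 for "Other".
OCSF_SEVERITIES = ["Informational", "Low", "Medium", "High", "Critical"]
FALLBACK = ("Other", 99)


def name_is(value: str) -> Predicate:
    # Exact, case-sensitive comparison on severity.name
    return lambda e: e.get("severity", {}).get("name") == value


def id_is(value: int) -> Predicate:
    # Comparison on the numeric severity.id
    return lambda e: e.get("severity", {}).get("id") == value


def match_all(_: Event) -> bool:
    # Stand-in for a trailing `query: "*"` category
    return True


def map_category(event: Event, categories: list, fallback: tuple) -> tuple:
    """Return (name, id) from the first category whose predicate matches."""
    for name, cid, matches in categories:
        if matches(event):
            return (name, cid)
    return fallback


# Old filters: lowercase names, which never equal "High", "Medium", ...
OLD = [(n, i, name_is(n.lower())) for i, n in enumerate(OCSF_SEVERITIES, start=1)]

# New filters: numeric id, closed by a catch-all carrying the fallback's values
NEW = [(n, i, id_is(i)) for i, n in enumerate(OCSF_SEVERITIES, start=1)]
NEW.append((*FALLBACK, match_all))

if __name__ == "__main__":
    notice = {"severity": {"name": "High", "id": 4}}
    print(map_category(notice, OLD, FALLBACK))  # ('Other', 99)
    print(map_category(notice, NEW, FALLBACK))  # ('High', 4)
```

With the catch-all in place, an unexpected `severity.id` still lands on Other/99 through an explicit category rather than only through the fallback.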
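
The Connection 4001 changes write totals and duration directly to `ocsf.traffic.*` and `ocsf.duration`. The commit does not say which Zeek counters feed the arithmetic-processor, so this Python sketch makes three assumptions:
- Totals sum the conn.log `orig_*` and `resp_*` counters.
- Unset counters count as 0.
- Duration is converted from Zeek's seconds to OCSF milliseconds by rounding.

`ocsf_conn_metrics` is a hypothetical helper.

```python
def ocsf_conn_metrics(conn: dict) -> dict:
    """Derive traffic totals and duration from a Zeek conn.log record."""
    # Assumption: totals sum originator and responder counters; Zeek leaves
    # these unset for some connections, so missing values count as 0.
    total_bytes = (conn.get("orig_bytes") or 0) + (conn.get("resp_bytes") or 0)
    total_pkts = (conn.get("orig_pkts") or 0) + (conn.get("resp_pkts") or 0)
    # Zeek reports duration in seconds; OCSF duration is in milliseconds.
    duration_ms = round((conn.get("duration") or 0) * 1000)
    return {
        "traffic": {"bytes": total_bytes, "packets": total_pkts},
        "duration": duration_ms,
    }


if __name__ == "__main__":
    record = {"orig_bytes": 100, "resp_bytes": 250,
              "orig_pkts": 3, "resp_pkts": 4, "duration": 1.5}
    # {'traffic': {'bytes': 350, 'packets': 7}, 'duration': 1500}
    print(ocsf_conn_metrics(record))
```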
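
The DNS Activity filter fix switches the Response/Query discriminator from answer presence to `rcode_name` presence. A small Python sketch illustrates why: an NXDOMAIN reply carries an rcode but no answers. It makes two assumptions:
- OCSF DNS Activity uses `activity_id` 1 = Query and 2 = Response.
- Zeek's `answers` field stands in for the pipeline's `dns.answer.name`.

Both helper functions are hypothetical.

```python
# Assumed OCSF DNS Activity activity_id values.
QUERY, RESPONSE = 1, 2


def activity_by_answers(record: dict) -> int:
    # Old approach: only records carrying an answer count as responses.
    # Zeek's `answers` stands in here for the pipeline's dns.answer.name.
    return RESPONSE if record.get("answers") else QUERY


def activity_by_rcode(record: dict) -> int:
    # New approach: the presence of rcode_name marks a response.
    return RESPONSE if "rcode_name" in record else QUERY


if __name__ == "__main__":
    nxdomain = {"query": "missing.example", "rcode": 3, "rcode_name": "NXDOMAIN"}
    unanswered = {"query": "example.com"}
    print(activity_by_answers(nxdomain))  # 1: misread as a Query
    print(activity_by_rcode(nxdomain))    # 2: Response
    print(activity_by_rcode(unanswered))  # 1: Query
```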
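
The two grok rules `a %{data:ocsf.answer.rdata}(,%{data})?` and `g %{ip:ocsf.{src,dst}_endpoint.ip}(,%{data})?` keep only the first item of a stringified array. The following is a rough Python regex analogue, not the Datadog grok engine, and it makes three assumptions:
- Stringify joins array items with commas.
- The grok `data` matcher behaves like a lazy `.*?`.
- The rule must match the whole string.

`first_answer`, `first_ip`, and `to_ocsf_answers` are hypothetical helpers. The last one mirrors wrapping the extracted value as a single dns_answer object.

```python
import ipaddress
import re
from typing import List, Optional

# Analogue of `a %{data:...}(,%{data})?`: lazy capture, optional ",rest" tail,
# anchored at both ends via fullmatch (assumes `data` behaves like `.*?`).
FIRST_ITEM = re.compile(r"(.*?)(?:,.*)?", re.DOTALL)


def first_answer(stringified: str) -> str:
    # fullmatch always succeeds here because the lazy group can grow to the end
    return FIRST_ITEM.fullmatch(stringified).group(1)


def first_ip(stringified: str) -> Optional[str]:
    # Analogue of `g %{ip:...}(,%{data})?`: the leading item must be an IP.
    head = stringified.split(",", 1)[0]
    try:
        ipaddress.ip_address(head)
    except ValueError:
        return None  # the grok rule would not match
    return head


def to_ocsf_answers(answers: List[str]) -> List[dict]:
    # Only the first answer is kept, wrapped as one dns_answer-style object.
    if not answers:
        return []
    return [{"rdata": first_answer(",".join(answers))}]


if __name__ == "__main__":
    # [{'rdata': '93.184.216.34'}]
    print(to_ocsf_answers(["93.184.216.34", "93.184.216.35"]))
    print(first_ip("10.0.0.5,10.0.0.6"))  # 10.0.0.5
```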
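
The file pipeline changes have two parts:
- Prefer `filename` over `fuid` for `ocsf.file.name`.
- Rebuild `ocsf.file.hashes` from per-algorithm fingerprint objects.

The Python sketch below shows the intended shape of the output. It assumes OCSF fingerprint `algorithm_id` values 1 = MD5, 2 = SHA-1, and 3 = SHA-256, and reads the Zeek files.log fields `fuid`, `filename`, `md5`, `sha1`, and `sha256`. `ocsf_file` is a hypothetical helper standing in for the `tmp_*` objects plus array-processor appends.

```python
from typing import Optional

# Assumed OCSF fingerprint algorithm_id values: 1 = MD5, 2 = SHA-1, 3 = SHA-256.
HASH_FIELDS = [("md5", "MD5", 1), ("sha1", "SHA-1", 2), ("sha256", "SHA-256", 3)]


def ocsf_file(record: dict) -> dict:
    """Build an OCSF-style file object from a Zeek files.log record."""
    fuid: Optional[str] = record.get("fuid")
    file = {
        "uid": fuid,
        # Prefer filename; fall back to fuid only when filename is absent.
        "name": record.get("filename") or fuid,
    }
    # One fingerprint object per hash Zeek actually computed.
    hashes = [
        {"algorithm": name, "algorithm_id": alg_id, "value": record[field]}
        for field, name, alg_id in HASH_FIELDS
        if record.get(field)
    ]
    if hashes:
        file["hashes"] = hashes
    return file


if __name__ == "__main__":
    # Digests below are those of an empty file.
    print(ocsf_file({
        "fuid": "FhX1a2",
        "filename": "empty.txt",
        "md5": "d41d8cd98f00b204e9800998ecf8427e",
        "sha1": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
    }))
```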
1 parent 4c7e8bb commit b4f366a

2 files changed

Lines changed: 3583 additions & 91 deletions


0 commit comments
