Commit b4f366a
[OCSF] Zeek pipeline (DataDog#23712)
* [OCSF] Zeek/Corelight pipeline
Add OCSF v1.5.0 normalization for Zeek/Corelight logs, covering 7 log
types across 5 OCSF classes (Detection Finding, Network Activity, HTTP
Activity, DNS Activity, File Hosting Activity).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix validate-logs errors in zeek.yaml
Resolve 36 validation errors flagged by the datadog-assets validator:
- Add missing `overrideOnConflict: false` to 3 attribute-remappers
- Fix 2 schema-remapper names so that each field is individually backticked
- Rename 25 facets to match validator's canonical names and add
`type: integer`/`facetType: range` where required
- Remove 6 facets with unresolvable path conflicts (validator demanded
unique paths with no canonical definition available)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Fix severity mapping for Detection Finding [2004] Notice
Notice events emit `severity.name` capitalized ("High", "Medium", etc.),
so the lowercase `@severity.name:informational` filters never matched
and the fallback assigned `ocsf.severity_id: 99` while preserving the
capitalized name as `ocsf.severity`. Switch the schema-category-mapper
to filter on the numeric `severity.id` (1-5) which Corelight reliably
emits, and update the notice fixture's expected `severity_id` from 99
to 4 to reflect the corrected mapping.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
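The severity fix above can be illustrated with a minimal Python sketch. It is not the pipeline's YAML: the function names and lookup tables are hypothetical, and it assumes Corelight's `severity.id` values 1-5 line up one-to-one with OCSF `severity_id`.

```python
# Hypothetical illustration of the severity fix; not the actual pipeline config.
# Assumption: Corelight severity.id 1-5 corresponds directly to OCSF severity_id.
OCSF_SEVERITY_BY_ID = {
    1: "Informational",
    2: "Low",
    3: "Medium",
    4: "High",
    5: "Critical",
}
OTHER = 99  # OCSF "Other" severity_id, used as the fallback


def severity_by_name(event):
    """Old behaviour: case-sensitive match against lowercase names."""
    lowercase = {"informational": 1, "low": 2, "medium": 3, "high": 4, "critical": 5}
    name = event.get("severity", {}).get("name")
    return lowercase.get(name, OTHER)


def severity_by_id(event):
    """New behaviour: match on the numeric severity.id instead."""
    severity_id = event.get("severity", {}).get("id")
    return severity_id if severity_id in OCSF_SEVERITY_BY_ID else OTHER


notice = {"severity": {"name": "High", "id": 4}}
print(severity_by_name(notice))  # capitalized "High" misses the lowercase table
print(severity_by_id(notice))    # numeric id maps directly
```

With the capitalized name, the name-based lookup falls through to 99, while the id-based lookup yields 4, matching the updated fixture.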
* Add catch-all category to schema-category-mappers with fallback
Each schema-category-mapper that defines a fallback must also have a
catch-all filter category at the end matching the fallback's values.
Six mappers were missing the trailing catch-all: notice/alert
severity_id (2004), http activity_id/status_id (4002), dns rcode_id,
and dns status_id (4003). Append `query: "*"` -> Other/99 to each.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
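The catch-all rule described above can be sketched as a small check in Python. The dictionary shape (`categories`, `fallback`, `query`, `id`, `name`) is a hypothetical stand-in for the mapper definition, not the validator's real data model.

```python
# Hypothetical sketch of the "fallback needs a trailing catch-all" rule.
# The mapper shape below is an assumption made for illustration only.

def has_matching_catch_all(mapper):
    """Return True if the mapper has no fallback, or if its last category
    is a `*` catch-all carrying the same id/name as the fallback."""
    fallback = mapper.get("fallback")
    if fallback is None:
        return True
    categories = mapper.get("categories", [])
    if not categories:
        return False
    last = categories[-1]
    return (
        last.get("query") == "*"
        and last.get("id") == fallback.get("id")
        and last.get("name") == fallback.get("name")
    )


missing = {
    "categories": [{"query": "@status_code:[200 TO 399]", "id": 1, "name": "Success"}],
    "fallback": {"id": 99, "name": "Other"},
}
fixed = {
    "categories": missing["categories"] + [{"query": "*", "id": 99, "name": "Other"}],
    "fallback": {"id": 99, "name": "Other"},
}
print(has_matching_catch_all(missing))
print(has_matching_catch_all(fixed))
```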
* Apply PR review feedback for Zeek/Corelight OCSF pipeline
Direct mappings, dead-code removal, correctness fixes, and OCSF validator
cleanups across notice, suricata, conn, ssl, weird, http, dns, and file
hosting sub-pipelines:
- Map directly to OCSF targets where intermediates were unnecessary
(ocsf.time, ocsf.duration, ocsf.traffic.packets, JA3/JA3S algorithm_id,
weird protocol_name).
- Drop dead/auto-generated mappers: notice/suricata category_uid (set by
schema-processor), self-maps of finding_info.uid, event_code, file.hashes
(when unbuilt upstream), suricata community_id correlation_uid, HTTP
version-as-protocol_ver, DNS direction derivation, and the DNS rcode_id
catch-all/fallback (recommended-not-required).
- Convert suricata alert.signature_id event_code from string-builder to
schema-remapper.
- Combine domain/query into single ocsf.query.hostname schema-remapper.
- Fix DNS Activity filters: use rcode_name presence to discriminate
Response/Query instead of dns.answer.name (handles NXDOMAIN responses).
- DNS status_id catch-all renamed Other/99 -> Unknown/0 to satisfy the
OCSF validator's suspicious-Other check.
- File Hosting tx_hosts/rx_hosts: drop the second intermediate field;
grok targets ocsf.{src,dst}_endpoint.ip directly off a single stringify.
- Switch fallback source fields per Jonah's suggestions:
severity -> severity.name, alert.severity -> alert_severity,
http status -> status_msg, dns rcode/status -> rcode_name.
- Notice fixture: use id.orig_h/id.resp_h connection fields instead of
the suricata-style src.
Regenerated zeek_tests.yaml with the OCSF validator (--check-all --write).
All 14 logs pass validation with no errors or warnings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
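The DNS filter change in the list above is easiest to see on an NXDOMAIN response, which carries an `rcode_name` but no answer. The following Python sketch is illustrative only: the function names are hypothetical, and the event layout is simplified. The values 1 (Query) and 2 (Response) are the OCSF DNS Activity `activity_id` values.

```python
# Hypothetical comparison of the two ways to tell a DNS response from a query.
QUERY, RESPONSE = 1, 2  # OCSF DNS Activity activity_id values


def activity_by_answer(event):
    """Earlier approach: treat the event as a response only if an answer exists."""
    answer_name = event.get("dns", {}).get("answer", {}).get("name")
    return RESPONSE if answer_name else QUERY


def activity_by_rcode(event):
    """Revised approach: any event with an rcode_name is a response."""
    return RESPONSE if event.get("rcode_name") else QUERY


nxdomain = {"query": "does-not-exist.example", "rcode_name": "NXDOMAIN"}
print(activity_by_answer(nxdomain))  # misclassified as a query
print(activity_by_rcode(nxdomain))   # classified as a response
```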
* Map Zeek DNS answers to ocsf.answers as dns_answer objects
Use two array-processors to wrap each Zeek `answers` string into a
dns_answer object and append to ocsf.answers: the first selects the
first array element into ocsf.answer.rdata, the second appends
ocsf.answer onto ocsf.answers. Only the first answer is captured (the
pipeline DSL has no per-element iteration), but that covers the common
single-A-record case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add catch-all for activity_id
* Fix validate-logs failure for DNS answers wrapper
The previous array-processor type:select required operation.filter and
operation.valueToExtract per the asset validator, but those only apply
to object arrays - Zeek's `answers` is a primitive string array. Switch
to string-builder + grok-parser to extract the first answer string into
ocsf.answer.rdata, then keep the array-processor append to wrap it into
ocsf.answers as a dns_answer object.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
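As a rough Python analogue of the string-builder, grok-parser, and append steps described above, the sketch below flattens a primitive string array, extracts the first element, and wraps it in a `dns_answer`-style object. It assumes the stringified array is comma-separated; the function names are hypothetical.

```python
import re

# Hypothetical analogue of the pipeline steps; not the actual processors.
# Assumption: stringifying the array yields a comma-separated string.


def first_answer(answers):
    """Flatten the array to a string, then capture everything before the first comma."""
    joined = ",".join(answers)          # stand-in for the string-builder step
    match = re.match(r"([^,]+)", joined)  # stand-in for the grok-parser step
    return match.group(1) if match else None


def to_ocsf_answers(event):
    """Wrap the first answer as an object with an rdata field, inside a list."""
    rdata = first_answer(event.get("answers", []))
    return [{"rdata": rdata}] if rdata else []


print(to_ocsf_answers({"answers": ["93.184.216.34"]}))
print(to_ocsf_answers({"answers": []}))
```

Only the first answer survives, which mirrors the limitation noted for the pipeline.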
* Address codex review feedback for file pipeline
- Include `files_red` in the File Hosting [6006] sub-pipeline filter so
redacted file events get OCSF class_uid/activity_id/file fields, not
just the pre-transform metadata.
- Prefer `filename` over `fuid` when populating `ocsf.file.name`; fall
back to `fuid` only when `filename` is absent. The `fuid` mapping to
`ocsf.file.uid` is unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
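The `filename`/`fuid` precedence above amounts to a simple fallback. A minimal Python sketch, with a hypothetical function name and a simplified event layout:

```python
# Hypothetical sketch of the name/uid precedence; not the pipeline's remappers.

def ocsf_file(event):
    """Build the name and uid parts of ocsf.file from a Zeek files event."""
    result = {}
    fuid = event.get("fuid")
    filename = event.get("filename")
    if fuid:
        result["uid"] = fuid       # fuid always feeds file.uid
    name = filename or fuid        # filename wins; fuid is only a fallback
    if name:
        result["name"] = name
    return result


print(ocsf_file({"fuid": "FabC12", "filename": "report.pdf"}))
print(ocsf_file({"fuid": "FabC12"}))
```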
* Drop pipeline intermediates, fix multi-IP grok, restore file.hashes
- is_alert (notice 2004, suricata 2004): string-builder writes directly
to `ocsf.is_alert`; grok-parser converts in place. Drops the
`_is_alert_str` intermediate.
- DNS answers: stringify directly into `ocsf.answer`; grok extracts
`ocsf.answer.rdata` via `a %{data:ocsf.answer.rdata}(,%{data})?` so
the comma-separated multi-IP form parses correctly. Drops the
`_answers_str` intermediate.
- File Hosting tx/rx hosts: stringify directly into
`ocsf.{src,dst}_endpoint`; grok extracts `.ip` via
`g %{ip:ocsf.{src,dst}_endpoint.ip}(,%{data})?` for multi-IP. Drops
the `_tx_hosts_str`/`_rx_hosts_str` intermediates.
- Connection 4001: arithmetic-processor writes total bytes directly to
`ocsf.traffic.bytes`; the schema-processor remapper becomes a
self-map. Drops the `_total_bytes` intermediate (matches the
earlier _total_packets/_duration_ms cleanup).
- Restore `ocsf.file.hashes`: build `tmp_md5`/`tmp_sha1`/`tmp_sha256`
fingerprint objects (algorithm name, integer algorithm_id, value),
array-processor append each into `ocsf.file.hashes`, and self-map
the array inside the 6006 schema-processor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
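The restored `ocsf.file.hashes` array described above holds one fingerprint object per available digest. The Python sketch below shows the intended shape under stated assumptions: the input field names `md5`, `sha1`, and `sha256` and the function name are illustrative, while the `algorithm_id` values 1, 2, and 3 are the OCSF fingerprint identifiers for MD5, SHA-1, and SHA-256.

```python
# Hypothetical sketch of building the fingerprint array; not the pipeline's processors.
# Each tuple: (assumed source field, OCSF algorithm name, OCSF algorithm_id)
ALGORITHMS = (
    ("md5", "MD5", 1),
    ("sha1", "SHA-1", 2),
    ("sha256", "SHA-256", 3),
)


def build_hashes(event):
    """Append one fingerprint object per digest present on the event."""
    hashes = []
    for field, name, algorithm_id in ALGORITHMS:
        value = event.get(field)
        if value:
            hashes.append(
                {"algorithm": name, "algorithm_id": algorithm_id, "value": value}
            )
    return hashes


event = {"md5": "d41d8cd98f00b204e9800998ecf8427e"}
print(build_hashes(event))
```

Digests that are absent are skipped, so an event with no hashes produces an empty list rather than placeholder entries.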
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 4c7e8bb commit b4f366a
2 files changed
Lines changed: 3583 additions & 91 deletions