Commit 3305ecd

Zeek OCSF: add MAC addresses, MITRE ATT&CK, alert category, and severity fixes (DataDog#23864)

* [Zeek] Extend OCSF v1.5 normalization with additional field mappings

  - Notice pipeline: map proto to ocsf.evidence.connection_info.protocol_name; fix severity filter to use severity.level (not severity.id) per real Zeek logs
  - Suricata pipeline: map service to evidence connection_info.protocol_name; extract MITRE ATT&CK tactic/technique from alert.metadata into finding_info.attacks; map alert.category to finding_info.types; add risk_level mapping from signature_severity metadata
  - Conn pipeline: map orig_l2_addr/resp_l2_addr to src/dst endpoint MAC addresses

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add changelog entry for PR DataDog#23864

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix grok pattern performance: replace data captures with word, drop changelog

  - Replace 4x %{data} capture groups with %{word} for MITRE IDs/names (MITRE tactic/technique values are alphanumeric+underscore only)
  - Drop trailing %{data} (not needed; grok matches without consuming tail)
  - Reduces expensive quantifier count from 6 to 1, under the 3-pattern limit
  - Remove changelog entry (log asset changes don't require one)

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Refactor OCSF intermediary fields to ocsf.* namespace per style guide

  - Replace tmp_md5/sha1/sha256.* with ocsf.file.hash.* (style guide §7.2/8.3)
  - Replace tmp_attack_str/tmp_attack.* with ocsf.finding_info.attack_raw/attack.*
  - Remove grok-parser integer coercions for algorithm_id; schema-processor handles type coercion for ocsf.file.hashes elements per OCSF schema
  - Update string-builder names to follow Add <field> convention (§6.1)

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Remove attack_raw intermediary: grok alert.metadata directly

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix OCSF validator errors for algorithm_id type and MITRE extraction

  - Restructure hash building to use unique named objects (ocsf.file.hash_md5, ocsf.file.hash_sha1, ocsf.file.hash_sha256) with an intermediate schema-processor to coerce algorithm_id from string to integer via targetFormat: integer before appending to ocsf.file.hashes array
  - Store alert.metadata stringification in alert.metadata_str (outside the ocsf namespace) to avoid unknown-attribute validation errors; grok reads from this field to extract MITRE tactic/technique into ocsf.finding_info.attack

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix MITRE grok pattern: add trailing %{data} anchor and correct sample

  Datadog's grok engine uses m.matches() which requires the pattern to consume the entire input string. The pattern was missing a trailing %{data} to absorb remaining content after the last MITRE capture (e.g. ,performance_impact:Low,...). Also update the sample to match the actual string-builder output format: comma-joined without brackets or spaces, as TemplateEvaluator produces.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Restore original grok sample format with brackets

  The pattern already handles both bracket/space and comma-only formats via the leading and trailing %{data} captures; the sample documents the original alert.metadata stringified representation.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Use lazy quantifier at start of MITRE grok rule

  Replace leading %{data} with .*? to avoid validate-logs Greedy At Start warning; trailing %{data} is kept to consume remaining content and satisfy the full-string m.matches() requirement.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Replace grok type cast for direction_id with schema-remapper

  Use schema-remapper targetFormat: integer inside an intermediate schema-processor instead of a grok parser for coercing evidence connection_info.direction_id from string to integer in both Detection Finding sub-pipelines.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix intermediate schema-processor names to follow naming convention

  Rename to bare 'Apply OCSF schema for <class_uid>' per style guide [NAMING-7].

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Update pipeline

* Replace intermediate schema-processors with grok type coercion

  Use grok parsers to coerce string values to integer instead of intermediate schema-processors: direction_id in Detection Finding evidence and algorithm_id in File Hosting Activity file hashes.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Update type casting

* Fix attribute-remapper targetType and support MITRE sub-techniques

  - Add targetType: attribute to three attribute-remappers that perform integer coercion (direction_id, hash_sha1/sha256 algorithm_id); validate-logs requires the field.
  - Replace %{word} captures in the Suricata MITRE grok rule with regex captures that accept dotted sub-technique IDs (e.g. T1059.001), so tactic/technique fields populate for sub-techniques as well.
  - Add a sub-technique sample to exercise the new pattern.

* Map MITRE sub-technique to ocsf.finding_info.attacks[].sub_technique

  Split dotted technique IDs (e.g. T1059.001) into a base technique uid (T1059 → technique.uid) and a sub-technique object (T1059.001 → sub_technique.uid). sub_technique is a sibling of technique within the attack object, not nested inside it.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Only populate sub_technique for dotted ATT&CK technique IDs

  Restructure the MITRE grok into two rules: sub_technique (dotted IDs like T1059.001) captures mitre_subtechnique_name separately, while base_technique (plain IDs like T1071) does not. The sub_technique.name remapper now sources from mitre_subtechnique_name, which only exists for sub-techniques, so base techniques never get a spurious sub_technique object. Adds an end-to-end test for T1059.001.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Remove sub-technique handling from Suricata MITRE extraction

  Suricata metadata only emits base ATT&CK technique IDs, so the sub_technique split logic was unnecessary. Revert to a single grok rule mapping technique.uid and technique.name directly.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Address PR review comments on Zeek OCSF pipeline

  - Replace Suricata direction_id grok-parser with attribute-remapper to match Notice sub-pipeline pattern
  - Map MITRE grok captures directly to ocsf.finding_info.attack.* fields, removing 4 intermediate attribute-remappers
  - Use ocsf.metadata.event_code as temp field for alert.metadata stringification (overwritten by schema-processor), eliminating alert.metadata_str
  - Add severity.id OR conditions to Notice severity_id category mapper

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix type-cast remapper names and convert md5 grok coercion to attribute-remapper

  Rename all type-cast self-map processors to follow Map `source` to `target` style guide convention. Convert remaining grok-parser type coercion for hash_md5.algorithm_id to attribute-remapper with targetFormat: integer.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
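The full-string matching requirement described in the commit message can be illustrated outside Datadog. The following is a minimal Python sketch, not the pipeline's implementation: it assumes `%{data}` behaves roughly like `.*`, uses `re.fullmatch` as a stand-in for Java's `m.matches()`, and the names `WITH_TAIL`, `WITHOUT_TAIL`, and `extract_attack` are hypothetical. The sample string is the one from the grok parser in this commit.

```python
import re

# Python rendering of the final grok rule. Treating %{data} as ".*" is an
# assumption made for this sketch; the capture classes come from the diff.
MITRE_BODY = (
    r".*?mitre_tactic_id:(?P<tactic_uid>[^,\]]+),\s*"
    r"mitre_tactic_name:(?P<tactic_name>[^,\]]+),\s*"
    r"mitre_technique_id:(?P<technique_uid>[^,\]]+),\s*"
    r"mitre_technique_name:(?P<technique_name>[^,\]]+)"
)
WITH_TAIL = re.compile(MITRE_BODY + r".*", re.DOTALL)  # trailing %{data}
WITHOUT_TAIL = re.compile(MITRE_BODY, re.DOTALL)       # tail dropped

SAMPLE = (
    "[attack_target:Client_Endpoint, confidence:High, created_at:2026_04_01, "
    "deployment:Internal, deployment:Perimeter, mitre_tactic_id:TA0011, "
    "mitre_tactic_name:Command_And_Control, mitre_technique_id:T1071, "
    "mitre_technique_name:Application_Layer_Protocol, performance_impact:Low, "
    "signature_severity:Major, updated_at:2026_04_09]"
)


def extract_attack(metadata):
    """Return a tactic/technique dict, or None when the rule does not match."""
    # fullmatch requires the whole input to be consumed, like m.matches().
    match = WITH_TAIL.fullmatch(metadata)
    if match is None:
        return None
    groups = match.groupdict()
    return {
        "tactic": {"uid": groups["tactic_uid"], "name": groups["tactic_name"]},
        "technique": {
            "uid": groups["technique_uid"],
            "name": groups["technique_name"],
        },
    }
```

Under these assumptions, `WITHOUT_TAIL.fullmatch(SAMPLE)` returns `None` because the text after the technique name is left unconsumed, which is the failure the trailing `%{data}` was restored to fix.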
1 parent 8ea556c commit 3305ecd

2 files changed

Lines changed: 504 additions & 333 deletions

File tree

zeek/assets/logs/zeek.yaml

Lines changed: 209 additions & 68 deletions
@@ -1714,6 +1714,33 @@ pipeline:
     grok:
       supportRules: ""
       matchRules: "to_bool %{boolean(\"true\",\"false\"):ocsf.is_alert}"
+  - type: attribute-remapper
+    name: Map `proto` to `ocsf.evidence.connection_info.protocol_name`
+    enabled: true
+    sources:
+      - proto
+    sourceType: attribute
+    target: ocsf.evidence.connection_info.protocol_name
+    targetType: attribute
+    preserveSource: true
+    overrideOnConflict: false
+  - type: string-builder-processor
+    name: Set evidence connection_info.direction_id to 0 (Unknown)
+    enabled: true
+    template: "0"
+    target: ocsf.evidence.connection_info.direction_id
+    replaceMissing: false
+  - type: attribute-remapper
+    name: Map `ocsf.evidence.connection_info.direction_id` to `ocsf.evidence.connection_info.direction_id`
+    enabled: true
+    sources:
+      - ocsf.evidence.connection_info.direction_id
+    sourceType: attribute
+    target: ocsf.evidence.connection_info.direction_id
+    targetType: attribute
+    preserveSource: false
+    overrideOnConflict: false
+    targetFormat: integer
   - type: attribute-remapper
     name: Map `id.orig_h` to `ocsf.evidence.src_endpoint.ip`
     enabled: true
@@ -1836,23 +1863,23 @@ pipeline:
     name: ocsf.severity_id
     categories:
       - filter:
-          query: "@severity.id:1"
+          query: "@severity.level:1 OR @severity.id:1"
         name: Informational
         id: 1
       - filter:
-          query: "@severity.id:2"
+          query: "@severity.level:2 OR @severity.id:2"
         name: Low
         id: 2
       - filter:
-          query: "@severity.id:3"
+          query: "@severity.level:3 OR @severity.id:3"
         name: Medium
         id: 3
       - filter:
-          query: "@severity.id:4"
+          query: "@severity.level:4 OR @severity.id:4"
         name: High
         id: 4
       - filter:
-          query: "@severity.id:5"
+          query: "@severity.level:5 OR @severity.id:5"
         name: Critical
         id: 5
       - filter:
@@ -1929,6 +1956,33 @@ pipeline:
     grok:
       supportRules: ""
       matchRules: "to_bool %{boolean(\"true\",\"false\"):ocsf.is_alert}"
+  - type: attribute-remapper
+    name: Map `service` to `ocsf.evidence.connection_info.protocol_name`
+    enabled: true
+    sources:
+      - service
+    sourceType: attribute
+    target: ocsf.evidence.connection_info.protocol_name
+    targetType: attribute
+    preserveSource: true
+    overrideOnConflict: false
+  - type: string-builder-processor
+    name: Set evidence connection_info.direction_id to 0 (Unknown)
+    enabled: true
+    template: "0"
+    target: ocsf.evidence.connection_info.direction_id
+    replaceMissing: false
+  - type: attribute-remapper
+    name: Map `ocsf.evidence.connection_info.direction_id` to `ocsf.evidence.connection_info.direction_id`
+    enabled: true
+    sources:
+      - ocsf.evidence.connection_info.direction_id
+    sourceType: attribute
+    target: ocsf.evidence.connection_info.direction_id
+    targetType: attribute
+    preserveSource: false
+    overrideOnConflict: false
+    targetFormat: integer
   - type: attribute-remapper
     name: Map `id.orig_h` to `ocsf.evidence.src_endpoint.ip`
     enabled: true
@@ -1977,6 +2031,37 @@ pipeline:
       target: ocsf.evidences
       preserveSource: false
       type: append
+  - type: array-processor
+    name: Append alert.category to ocsf.finding_info.types
+    enabled: true
+    operation:
+      source: alert.category
+      target: ocsf.finding_info.types
+      preserveSource: true
+      type: append
+  - type: string-builder-processor
+    name: Add ocsf.metadata.event_code from alert.metadata
+    enabled: true
+    template: "%{alert.metadata}"
+    target: ocsf.metadata.event_code
+    replaceMissing: false
+  - type: grok-parser
+    name: Parse ocsf.metadata.event_code to MITRE tactic and technique fields
+    enabled: true
+    source: ocsf.metadata.event_code
+    samples:
+      - "[attack_target:Client_Endpoint, confidence:High, created_at:2026_04_01, deployment:Internal, deployment:Perimeter, mitre_tactic_id:TA0011, mitre_tactic_name:Command_And_Control, mitre_technique_id:T1071, mitre_technique_name:Application_Layer_Protocol, performance_impact:Low, signature_severity:Major, updated_at:2026_04_09]"
+    grok:
+      supportRules: ""
+      matchRules: 'rule .*?mitre_tactic_id:%{regex("[^,\\]]+"):ocsf.finding_info.attack.tactic.uid},\s*mitre_tactic_name:%{regex("[^,\\]]+"):ocsf.finding_info.attack.tactic.name},\s*mitre_technique_id:%{regex("[^,\\]]+"):ocsf.finding_info.attack.technique.uid},\s*mitre_technique_name:%{regex("[^,\\]]+"):ocsf.finding_info.attack.technique.name}%{data}'
+  - type: array-processor
+    name: Move ocsf.finding_info.attack into ocsf.finding_info.attacks array
+    enabled: true
+    operation:
+      source: ocsf.finding_info.attack
+      target: ocsf.finding_info.attacks
+      preserveSource: false
+      type: append
   - type: schema-processor
     name: Apply OCSF schema for 2004
     enabled: true
@@ -2009,6 +2094,42 @@ pipeline:
     targets:
       name: ocsf.confidence
       id: ocsf.confidence_id
+  - type: schema-category-mapper
+    name: ocsf.risk_level_id
+    categories:
+      - filter:
+          query: "@alert.metadata:\"signature_severity:Informational\""
+        name: Info
+        id: 0
+      - filter:
+          query: "@alert.metadata:\"signature_severity:Minor\""
+        name: Low
+        id: 1
+      - filter:
+          query: "@alert.metadata:\"signature_severity:Major\""
+        name: High
+        id: 3
+      - filter:
+          query: "@alert.metadata:\"signature_severity:Critical\""
+        name: Critical
+        id: 4
+    targets:
+      name: ocsf.risk_level
+      id: ocsf.risk_level_id
+  - type: schema-remapper
+    name: Map `ocsf.finding_info.attacks` to `ocsf.finding_info.attacks`
+    sources:
+      - ocsf.finding_info.attacks
+    target: ocsf.finding_info.attacks
+    preserveSource: true
+    overrideOnConflict: true
+  - type: schema-remapper
+    name: Map `ocsf.finding_info.types` to `ocsf.finding_info.types`
+    sources:
+      - ocsf.finding_info.types
+    target: ocsf.finding_info.types
+    preserveSource: true
+    overrideOnConflict: true
   - type: schema-remapper
     name: Map `ocsf.evidences` to `ocsf.evidences`
     sources:
@@ -2348,6 +2469,13 @@ pipeline:
     target: ocsf.src_endpoint.ip
     preserveSource: true
     overrideOnConflict: true
+  - type: schema-remapper
+    name: Map `orig_l2_addr` to `ocsf.src_endpoint.mac`
+    sources:
+      - orig_l2_addr
+    target: ocsf.src_endpoint.mac
+    preserveSource: true
+    overrideOnConflict: true
   - type: schema-remapper
     name: Map `id.orig_p` to `ocsf.src_endpoint.port`
     sources:
@@ -2356,6 +2484,13 @@ pipeline:
     preserveSource: true
     overrideOnConflict: true
     targetFormat: integer
+  - type: schema-remapper
+    name: Map `resp_l2_addr` to `ocsf.dst_endpoint.mac`
+    sources:
+      - resp_l2_addr
+    target: ocsf.dst_endpoint.mac
+    preserveSource: true
+    overrideOnConflict: true
   - type: schema-remapper
     name: Map `conn_state` to `ocsf.status_detail`
     sources:
@@ -3450,119 +3585,125 @@ pipeline:
       supportRules: ""
       matchRules: 'g %{ip:ocsf.dst_endpoint.ip}(,%{data})?'
   - type: string-builder-processor
-    name: Set MD5 algorithm name
+    name: Add MD5 algorithm name
     enabled: true
     template: MD5
-    target: tmp_md5.algorithm
+    target: ocsf.file.hash_md5.algorithm
     replaceMissing: false
   - type: string-builder-processor
-    name: Set MD5 algorithm id
+    name: Add MD5 algorithm id
     enabled: true
     template: "1"
-    target: tmp_md5.algorithm_id
+    target: ocsf.file.hash_md5.algorithm_id
     replaceMissing: false
-  - type: grok-parser
-    name: Coerce tmp_md5.algorithm_id to integer
-    enabled: true
-    source: tmp_md5.algorithm_id
-    samples:
-      - "1"
-    grok:
-      supportRules: ""
-      matchRules: "to_int %{integer:tmp_md5.algorithm_id}"
   - type: attribute-remapper
-    name: Map `md5` to `tmp_md5.value`
+    name: Map `md5` to `ocsf.file.hash_md5.value`
     enabled: true
     sources:
       - md5
     sourceType: attribute
-    target: tmp_md5.value
+    target: ocsf.file.hash_md5.value
     targetType: attribute
     preserveSource: true
     overrideOnConflict: false
-  - type: array-processor
-    name: Append tmp_md5 to ocsf.file.hashes
-    enabled: true
-    operation:
-      source: tmp_md5
-      target: ocsf.file.hashes
-      preserveSource: false
-      type: append
   - type: string-builder-processor
-    name: Set SHA1 algorithm name
+    name: Add SHA1 algorithm name
     enabled: true
     template: SHA-1
-    target: tmp_sha1.algorithm
+    target: ocsf.file.hash_sha1.algorithm
     replaceMissing: false
   - type: string-builder-processor
-    name: Set SHA1 algorithm id
+    name: Add SHA1 algorithm id
     enabled: true
     template: "2"
-    target: tmp_sha1.algorithm_id
+    target: ocsf.file.hash_sha1.algorithm_id
     replaceMissing: false
-  - type: grok-parser
-    name: Coerce tmp_sha1.algorithm_id to integer
-    enabled: true
-    source: tmp_sha1.algorithm_id
-    samples:
-      - "2"
-    grok:
-      supportRules: ""
-      matchRules: "to_int %{integer:tmp_sha1.algorithm_id}"
   - type: attribute-remapper
-    name: Map `sha1` to `tmp_sha1.value`
+    name: Map `sha1` to `ocsf.file.hash_sha1.value`
     enabled: true
     sources:
       - sha1
     sourceType: attribute
-    target: tmp_sha1.value
+    target: ocsf.file.hash_sha1.value
     targetType: attribute
     preserveSource: true
     overrideOnConflict: false
-  - type: array-processor
-    name: Append tmp_sha1 to ocsf.file.hashes
-    enabled: true
-    operation:
-      source: tmp_sha1
-      target: ocsf.file.hashes
-      preserveSource: false
-      type: append
   - type: string-builder-processor
-    name: Set SHA256 algorithm name
+    name: Add SHA256 algorithm name
     enabled: true
     template: SHA-256
-    target: tmp_sha256.algorithm
+    target: ocsf.file.hash_sha256.algorithm
     replaceMissing: false
   - type: string-builder-processor
-    name: Set SHA256 algorithm id
+    name: Add SHA256 algorithm id
     enabled: true
     template: "3"
-    target: tmp_sha256.algorithm_id
+    target: ocsf.file.hash_sha256.algorithm_id
     replaceMissing: false
-  - type: grok-parser
-    name: Coerce tmp_sha256.algorithm_id to integer
-    enabled: true
-    source: tmp_sha256.algorithm_id
-    samples:
-      - "3"
-    grok:
-      supportRules: ""
-      matchRules: "to_int %{integer:tmp_sha256.algorithm_id}"
   - type: attribute-remapper
-    name: Map `sha256` to `tmp_sha256.value`
+    name: Map `sha256` to `ocsf.file.hash_sha256.value`
     enabled: true
     sources:
       - sha256
     sourceType: attribute
-    target: tmp_sha256.value
+    target: ocsf.file.hash_sha256.value
     targetType: attribute
     preserveSource: true
     overrideOnConflict: false
+  - type: attribute-remapper
+    name: Map `ocsf.file.hash_md5.algorithm_id` to `ocsf.file.hash_md5.algorithm_id`
+    enabled: true
+    sources:
+      - ocsf.file.hash_md5.algorithm_id
+    sourceType: attribute
+    target: ocsf.file.hash_md5.algorithm_id
+    targetType: attribute
+    preserveSource: false
+    overrideOnConflict: false
+    targetFormat: integer
+  - type: attribute-remapper
+    name: Map `ocsf.file.hash_sha1.algorithm_id` to `ocsf.file.hash_sha1.algorithm_id`
+    enabled: true
+    sources:
+      - ocsf.file.hash_sha1.algorithm_id
+    sourceType: attribute
+    target: ocsf.file.hash_sha1.algorithm_id
+    targetType: attribute
+    preserveSource: false
+    overrideOnConflict: false
+    targetFormat: integer
+  - type: attribute-remapper
+    name: Map `ocsf.file.hash_sha256.algorithm_id` to `ocsf.file.hash_sha256.algorithm_id`
+    enabled: true
+    sources:
+      - ocsf.file.hash_sha256.algorithm_id
+    sourceType: attribute
+    target: ocsf.file.hash_sha256.algorithm_id
+    targetType: attribute
+    preserveSource: false
+    overrideOnConflict: false
+    targetFormat: integer
+  - type: array-processor
+    name: Append ocsf.file.hash_md5 to ocsf.file.hashes
+    enabled: true
+    operation:
+      source: ocsf.file.hash_md5
+      target: ocsf.file.hashes
+      preserveSource: false
+      type: append
+  - type: array-processor
+    name: Append ocsf.file.hash_sha1 to ocsf.file.hashes
+    enabled: true
+    operation:
+      source: ocsf.file.hash_sha1
+      target: ocsf.file.hashes
+      preserveSource: false
+      type: append
   - type: array-processor
-    name: Append tmp_sha256 to ocsf.file.hashes
+    name: Append ocsf.file.hash_sha256 to ocsf.file.hashes
     enabled: true
     operation:
-      source: tmp_sha256
+      source: ocsf.file.hash_sha256
       target: ocsf.file.hashes
       preserveSource: false
       type: append
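
The reordered hash processors above follow a build, coerce, append sequence: each hash gets a named object with a string `algorithm_id`, that id is cast to an integer, and the object is appended to `ocsf.file.hashes`. A rough Python sketch of that sequence follows. The function name `build_file_hashes` and the `ALGORITHMS` table are hypothetical, and skipping absent hash fields is a simplification assumed here, not something this diff shows the pipeline doing.

```python
# Algorithm names and ids mirror the string-builder templates in the diff.
# Everything else in this sketch is illustrative, not Datadog's implementation.
ALGORITHMS = {
    "md5": ("MD5", "1"),
    "sha1": ("SHA-1", "2"),
    "sha256": ("SHA-256", "3"),
}


def build_file_hashes(log):
    """Build a list shaped like ocsf.file.hashes from a flat log dict."""
    hashes = []
    for field, (algorithm, id_text) in ALGORITHMS.items():
        value = log.get(field)
        if value is None:
            continue  # assumption: omit hashes the log does not carry
        # Step 1: named object with a string id, as string-builder produces.
        entry = {"algorithm": algorithm, "algorithm_id": id_text, "value": value}
        # Step 2: self-map with an integer target format (the type cast).
        entry["algorithm_id"] = int(entry["algorithm_id"])
        # Step 3: append the coerced object to the hashes array.
        hashes.append(entry)
    return hashes
```

Casting before appending matters in this sketch for the same reason the commit gives: once an object is inside the array, its `algorithm_id` can no longer be addressed by a simple attribute path.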
