[ti_recordedfuture] Add identity detection data stream#19826
[ti_recordedfuture] Add identity detection data stream#19826brijesh-elastic wants to merge 5 commits into
Conversation
Elastic Docs Style Checker (Vale)Summary: 1 warning, 2 suggestions found
|
| File | Line | Rule | Message |
|---|---|---|---|
| packages/ti_recordedfuture/data_stream/identity_detection/manifest.yml | 41 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'for example' instead of 'e.g'. |
💡 Suggestions (2): Optional style improvements. Apply when helpful.
| File | Line | Rule | Message |
|---|---|---|---|
| packages/ti_recordedfuture/data_stream/identity_detection/manifest.yml | 88 | Elastic.WordChoice | Consider using 'deactivated, deselected, hidden, turned off, unavailable' instead of 'disabled', unless the term is in the UI. |
| packages/ti_recordedfuture/data_stream/identity_detection/manifest.yml | 97 | Elastic.WordChoice | Consider using 'deactivated, deselected, hidden, turned off, unavailable' instead of 'disabled', unless the term is in the UI. |
The Vale linter checks documentation changes against the Elastic Docs style guide. To use Vale locally or report issues, refer to Elastic style guide for Vale.
🚀 Benchmarks reportTo see the full report comment with |
|
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations) |
|
👀 I have started reviewing the PR |
1 similar comment
|
👀 I have started reviewing the PR |
Vera Review BotFor the current commit state, I did not find any issues. 🤖 AI-Generated Review | Vera Review Bot | 📚 Knowledge base: integration-skills
|
| {{#if preserve_duplicate_custom_fields}} | ||
| - preserve_duplicate_custom_fields | ||
| {{/if}} |
There was a problem hiding this comment.
Can remove these references since we are not defining preserve_duplicate_custom_fields inside manifest
| secret: false | ||
| default: false | ||
| description: >- | ||
| When enabled, the leaked password cleartext value is replaced with a keyed hash (HMAC-SHA256) by the agent before the data reaches Elasticsearch. When disabled, the cleartext value is ingested as-is. Requires `Hashing Key` to be set. |
There was a problem hiding this comment.
Since by default (hash_password: false, hash_cookies: false) the leaked password cleartext and cookie values are written to ES, we should call this out in README as well and what users need to do to avoid it.
|
👀 I have started reviewing the PR |
| - append: | ||
| field: event.category | ||
| tag: append_malware_into_event_category | ||
| value: malware |
There was a problem hiding this comment.
🟡 MEDIUM data_stream/identity_detection/.../default.yml:269
event.category hardcoded to malware for non-malware detections
The pipeline unconditionally appends malware to event.category for every identity detection. Detections sourced from database breaches (source_type: DatabaseDumps and DatabaseCombolists) are not malware-derived, so this assigns an inaccurate ECS category to those records. This is visible in the pipeline test data: the record with id a1b2c3d4e5f6789012345678abcdef01 has source_type: DatabaseDumps yet receives event.category: [malware] in the expected output. Mixing accurate categories (stealer-malware logs) with inaccurate ones (database dumps) under a single hardcoded value reduces the precision of category-based filtering and detection rules.
Recommendation:
Categorize based on the detection's source_type so database-breach records are not labelled malware. For example, only set malware for malware-sourced detections and fall back to a threat-intel-appropriate category otherwise:
- append:
field: event.category
tag: append_malware_into_event_category
value: malware
allow_duplicates: false
if: ctx.recordedfuture?.identity_detection?.source_type != null && ctx.recordedfuture.identity_detection.source_type.toLowerCase().contains('malware')
- append:
field: event.category
tag: append_threat_into_event_category
value: threat
allow_duplicates: false
if: ctx.recordedfuture?.identity_detection?.source_type != null && !ctx.recordedfuture.identity_detection.source_type.toLowerCase().contains('malware')🤖 AI-Generated Review | Vera Review Bot | 📚 Knowledge base: integration-skills
⚠️ Automated review — verify suggestions before applying.
…ent categorizations according to vera comments
|
👀 I have started reviewing the PR |
|
✅ All changelog entries have the correct PR link. |
Vera Review BotFor the current commit state, I did not find any issues. 🤖 AI-Generated Review | Vera Review Bot | 📚 Knowledge base: integration-skills
|
💚 Build Succeeded
History
|
| state.?want_more.orValue(false) ? | ||
| state | ||
| : | ||
| state.drop(["offset"]).with( |
There was a problem hiding this comment.
| state.drop(["offset"]).with( | |
| state.drop("offset").with( |
| "application/json", | ||
| { | ||
| "limit": int(state.batch_size), | ||
| ?"offset": (state.?offset.orValue("") != "") ? optional.of(string(state.offset)) : optional.none(), |
There was a problem hiding this comment.
Can we use absence instead of "" to indicate absence.
| { | ||
| "limit": int(state.batch_size), | ||
| ?"offset": (state.?offset.orValue("") != "") ? optional.of(string(state.offset)) : optional.none(), | ||
| ?"organization_id": state.?organization_id[0].hasValue() ? optional.of(state.organization_id) : optional.none(), |
There was a problem hiding this comment.
| ?"organization_id": state.?organization_id[0].hasValue() ? optional.of(state.organization_id) : optional.none(), | |
| ?"organization_id": state.?organization_id[0], |
| "limit": int(state.batch_size), | ||
| ?"offset": (state.?offset.orValue("") != "") ? optional.of(string(state.offset)) : optional.none(), | ||
| ?"organization_id": state.?organization_id[0].hasValue() ? optional.of(state.organization_id) : optional.none(), | ||
| ?"include_enterprise_level": state.?include_enterprise_level.orValue(false) ? optional.of(true) : optional.none(), |
There was a problem hiding this comment.
| ?"include_enterprise_level": state.?include_enterprise_level.orValue(false) ? optional.of(true) : optional.none(), | |
| ?"include_enterprise_level": state.?include_enterprise_level, |
| ?"include_enterprise_level": state.?include_enterprise_level.orValue(false) ? optional.of(true) : optional.none(), | ||
| "filter": { | ||
| "created": {"gte": string(state.start_time)}, | ||
| ?"novel_only": state.?novel_only.orValue(false) ? optional.of(true) : optional.none(), |
There was a problem hiding this comment.
| ?"novel_only": state.?novel_only.orValue(false) ? optional.of(true) : optional.none(), | |
| ?"novel_only": state.?novel_only, |
| "" | ||
| ).as(page_max, | ||
| ( | ||
| (page_max > state.?cursor.max_created.orValue("")) ? |
There was a problem hiding this comment.
Avoid using string ordering when you are dealing with timestamps. In this case, you should have page_max be a timestamp (possibly optional, since you have a "" as a base case; either use timestamp(0) or optional.none() to signal absent max, there are trade-offs in both cases).
| "events": { | ||
| "error": { | ||
| "code": string(resp.StatusCode), | ||
| "id": string(resp.Status), |
There was a problem hiding this comment.
| "id": string(resp.Status), | |
| "id": resp.Status, |
| (size(resp.Body) != 0) ? | ||
| string(resp.Body) | ||
| : | ||
| string(resp.Status) + " (" + string(resp.StatusCode) + ")" |
There was a problem hiding this comment.
| string(resp.Status) + " (" + string(resp.StatusCode) + ")" | |
| resp.Status + " (" + string(resp.StatusCode) + ")" |
| : | ||
| [], | ||
| "want_more": next_offset != "", | ||
| ?"offset": (next_offset != "") ? optional.of(next_offset) : optional.none(), |
There was a problem hiding this comment.
| ?"offset": (next_offset != "") ? optional.of(next_offset) : optional.none(), | |
| "next": { | |
| ?"offset": next_offset, | |
| }, |
with the change that at L97 you have
body.?next_offset.as(next_offset,
and changes to make that work.
There was a problem hiding this comment.
Note that there are existing title mismatches (presumably this integration was made as a derivation of the AbuseCH integration originally since the titles refer to that data source); also in ti_recordedfuture-554321f4-a649-49da-a5ce-b3dfef1a179b.json, and others.
Proposed commit message
Checklist
changelog.ymlfile.How to test this PR locally
Related issues
Screenshots