
[ti_recordedfuture] Add identity detection data stream #19826

Open
brijesh-elastic wants to merge 5 commits into elastic:main from brijesh-elastic:ti_recordedfuture-identity_detection

@brijesh-elastic

Proposed commit message

ti_recordedfuture: Add support for identity detection data stream.

This data stream retrieves identity exposure detections using the [Detections API][1].

Test samples were derived from the documentation and subsequently sanitized.

[1] https://docs.recordedfuture.com/reference/identity-detections

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

How to test this PR locally

  • Clone the integrations repo.
  • Install elastic-package locally.
  • Start the Elastic Stack using elastic-package.
  • Change to the integrations/packages/ti_recordedfuture directory.
  • Run the following command to run the tests:

elastic-package test -v

Related issues

Screenshots

[Screenshot: rf-identity_detection]

brijesh-elastic self-assigned this on Jun 29, 2026
brijesh-elastic added the following labels on Jun 29, 2026:
  • documentation: Improvements or additions to documentation. Applied to PRs that modify *.md files.
  • enhancement: New feature or request
  • dashboard: Relates to a Kibana dashboard bug, enhancement, or modification.
  • Integration:ti_recordedfuture: Recorded Future
  • Team:Security-Service Integrations: Security Service Integrations team [elastic/security-service-integrations]
  • Team:SDE-Crest: Crest developers on the Security Integrations team [elastic/sit-crest-contractors]
github-actions (bot) commented on Jun 29, 2026

Elastic Docs Style Checker (Vale)

Summary: 1 warning, 2 suggestions found

⚠️ Warnings (1): Fix when the suggestion improves clarity or correctness.

| File | Line | Rule | Message |
|---|---|---|---|
| packages/ti_recordedfuture/data_stream/identity_detection/manifest.yml | 41 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'for example' instead of 'e.g'. |

💡 Suggestions (2): Optional style improvements. Apply when helpful.

| File | Line | Rule | Message |
|---|---|---|---|
| packages/ti_recordedfuture/data_stream/identity_detection/manifest.yml | 88 | Elastic.WordChoice | Consider using 'deactivated, deselected, hidden, turned off, unavailable' instead of 'disabled', unless the term is in the UI. |
| packages/ti_recordedfuture/data_stream/identity_detection/manifest.yml | 97 | Elastic.WordChoice | Consider using 'deactivated, deselected, hidden, turned off, unavailable' instead of 'disabled', unless the term is in the UI. |

The Vale linter checks documentation changes against the Elastic Docs style guide. To use Vale locally or report issues, refer to the Elastic style guide for Vale.

@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report, comment with /test benchmark fullreport

brijesh-elastic marked this pull request as ready for review on June 30, 2026 10:24
brijesh-elastic requested review from a team as code owners on June 30, 2026 10:24
@infra-vault-gh-plugin-prod

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@vera-review-bot

👀 I have started reviewing the PR

@vera-review-bot

Vera Review Bot

For the current commit state, I did not find any issues.


🤖 AI-Generated Review | Vera Review Bot | 📚 Knowledge base: integration-skills

⚠️ Automated review — verify suggestions before applying.

Comment on lines +173 to +175
{{#if preserve_duplicate_custom_fields}}
- preserve_duplicate_custom_fields
{{/if}}

These references can be removed, since preserve_duplicate_custom_fields is not defined in the manifest.

secret: false
default: false
description: >-
  When enabled, the leaked password cleartext value is replaced with a keyed hash (HMAC-SHA256) by the agent before the data reaches Elasticsearch. When disabled, the cleartext value is ingested as-is. Requires `Hashing Key` to be set.

Because the defaults (hash_password: false, hash_cookies: false) write the leaked password cleartext and cookie values to Elasticsearch, we should call this out in the README as well, along with what users need to do to avoid it.
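A minimal Python sketch of what "replaced with a keyed hash (HMAC-SHA256)" means, which the README note could draw on when explaining the option. It is not the agent's processor: the UTF-8 encoding, the lowercase hex output, and the `keyed_hash` helper name are all assumptions, and the example key and password are made up.

```python
import hashlib
import hmac


def keyed_hash(hashing_key: str, cleartext: str) -> str:
    """Return HMAC-SHA256(hashing_key, cleartext) as lowercase hex.

    Hypothetical helper: the UTF-8 encoding and hex rendering are assumptions,
    not the integration's documented output format.
    """
    return hmac.new(
        hashing_key.encode("utf-8"),
        cleartext.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()


# The same key and password always give the same digest, so hashed values can
# still be correlated across detections without storing the cleartext.
digest = keyed_hash("example-hashing-key", "hunter2")
print(digest)
```

HMAC needs a key by construction, which matches the description's requirement that `Hashing Key` be set. Keeping that key secret is what stops someone from hashing a list of common passwords and matching the results against the ingested digests.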

brijesh-elastic requested a review from kcreddy on June 30, 2026 13:19

- append:
    field: event.category
    tag: append_malware_into_event_category
    value: malware

🟡 MEDIUM data_stream/identity_detection/.../default.yml:269

event.category hardcoded to malware for non-malware detections

The pipeline unconditionally appends malware to event.category for every identity detection. Detections sourced from database breaches (source_type: DatabaseDumps and DatabaseCombolists) are not malware-derived, so this assigns an inaccurate ECS category to those records. This is visible in the pipeline test data: the record with id a1b2c3d4e5f6789012345678abcdef01 has source_type: DatabaseDumps yet receives event.category: [malware] in the expected output. Mixing accurate categories (stealer-malware logs) with inaccurate ones (database dumps) under a single hardcoded value reduces the precision of category-based filtering and detection rules.

Recommendation:

Categorize based on the detection's source_type so database-breach records are not labelled malware. For example, only set malware for malware-sourced detections and fall back to a threat-intel-appropriate category otherwise:

- append:
    field: event.category
    tag: append_malware_into_event_category
    value: malware
    allow_duplicates: false
    if: ctx.recordedfuture?.identity_detection?.source_type != null && ctx.recordedfuture.identity_detection.source_type.toLowerCase().contains('malware')
- append:
    field: event.category
    tag: append_threat_into_event_category
    value: threat
    allow_duplicates: false
    if: ctx.recordedfuture?.identity_detection?.source_type != null && !ctx.recordedfuture.identity_detection.source_type.toLowerCase().contains('malware')

🤖 AI-Generated Review | Vera Review Bot | 📚 Knowledge base: integration-skills

⚠️ Automated review — verify suggestions before applying.
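To make the recommendation's effect on the test record concrete, here is a Python sketch that mirrors the two `if` conditions above. It is an illustration, not the pipeline: like the recommendation, it assumes malware-derived detections have a `source_type` containing the substring "malware". The review names DatabaseDumps and DatabaseCombolists as breach sources but does not list the malware-derived values, so that heuristic is worth checking against real data. `categories_for` is a hypothetical name.

```python
from typing import List, Optional


def categories_for(source_type: Optional[str], existing: Optional[List[str]] = None) -> List[str]:
    """Mirror the two conditional `append` processors in the recommendation.

    `existing` stands in for event.category values set earlier in the pipeline;
    a value is appended only if absent, like `allow_duplicates: false`.
    """
    categories = list(existing or [])
    if source_type is None:
        # Neither processor's `if` condition matches a null source_type.
        return categories
    # Same test as the Painless condition: lowercase, then substring match.
    value = "malware" if "malware" in source_type.lower() else "threat"
    if value not in categories:
        categories.append(value)
    return categories


print(categories_for("DatabaseDumps"))  # ['threat']
```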

…ent categorizations according to vera comments

@elastic-vault-github-plugin-prod

✅ All changelog entries have the correct PR link.

@vera-review-bot

Vera Review Bot

For the current commit state, I did not find any issues.


🤖 AI-Generated Review | Vera Review Bot | 📚 Knowledge base: integration-skills

⚠️ Automated review — verify suggestions before applying.

@infra-vault-gh-plugin-prod

💚 Build Succeeded

cc @brijesh-elastic

state.?want_more.orValue(false) ?
state
:
state.drop(["offset"]).with(

Suggested change
- state.drop(["offset"]).with(
+ state.drop("offset").with(

"application/json",
{
"limit": int(state.batch_size),
?"offset": (state.?offset.orValue("") != "") ? optional.of(string(state.offset)) : optional.none(),

Can we use absence, rather than "", to indicate that there is no offset?
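A rough Python analogue of the reviewer's suggestion, assuming the cursor state is a plain dict: rather than storing "" to mean "no offset", the key is removed (compare `state.drop("offset")` above) and the request body includes `offset` only when the key exists. `build_body` and the offset value are hypothetical.

```python
def build_body(state: dict) -> dict:
    """Build a request body, sending `offset` only when the state has one."""
    body = {"limit": int(state["batch_size"])}
    # Absence of the key, not an empty string, means "start from the beginning".
    if "offset" in state:
        body["offset"] = str(state["offset"])
    return body


first_page = build_body({"batch_size": 100})
next_page = build_body({"batch_size": 100, "offset": "abc123"})
print(first_page)  # {'limit': 100}
print(next_page)   # {'limit': 100, 'offset': 'abc123'}
```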

{
"limit": int(state.batch_size),
?"offset": (state.?offset.orValue("") != "") ? optional.of(string(state.offset)) : optional.none(),
?"organization_id": state.?organization_id[0].hasValue() ? optional.of(state.organization_id) : optional.none(),

Suggested change
- ?"organization_id": state.?organization_id[0].hasValue() ? optional.of(state.organization_id) : optional.none(),
+ ?"organization_id": state.?organization_id[0],

"limit": int(state.batch_size),
?"offset": (state.?offset.orValue("") != "") ? optional.of(string(state.offset)) : optional.none(),
?"organization_id": state.?organization_id[0].hasValue() ? optional.of(state.organization_id) : optional.none(),
?"include_enterprise_level": state.?include_enterprise_level.orValue(false) ? optional.of(true) : optional.none(),

Suggested change
- ?"include_enterprise_level": state.?include_enterprise_level.orValue(false) ? optional.of(true) : optional.none(),
+ ?"include_enterprise_level": state.?include_enterprise_level,

?"include_enterprise_level": state.?include_enterprise_level.orValue(false) ? optional.of(true) : optional.none(),
"filter": {
"created": {"gte": string(state.start_time)},
?"novel_only": state.?novel_only.orValue(false) ? optional.of(true) : optional.none(),

Suggested change
- ?"novel_only": state.?novel_only.orValue(false) ? optional.of(true) : optional.none(),
+ ?"novel_only": state.?novel_only,
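One behavioral detail to check before applying these three suggestions: the current expressions send `include_enterprise_level` and `novel_only` only when they are `true`, while the `?"name": state.?name` form sends whatever value is present, including an explicit `false`. If the policy always populates these variables, the suggested form would send `false` on every request; this review does not say whether the Detections API treats that the same as an omitted field. Similarly, `state.?organization_id[0]` appears to select the first element rather than pass through the whole list as the current expression does. The Python sketch below contrasts the two boolean behaviors, assuming state is a dict; the function names are hypothetical.

```python
FLAGS = ("include_enterprise_level", "novel_only")


def flags_true_only(state: dict) -> dict:
    """Current behavior: send a flag only when it is set to true."""
    return {name: True for name in FLAGS if state.get(name) is True}


def flags_passthrough(state: dict) -> dict:
    """Suggested behavior: send a flag whenever it is present, whatever its value."""
    return {name: state[name] for name in FLAGS if name in state}


state = {"include_enterprise_level": True, "novel_only": False}
print(flags_true_only(state))    # {'include_enterprise_level': True}
print(flags_passthrough(state))  # {'include_enterprise_level': True, 'novel_only': False}
```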

""
).as(page_max,
(
(page_max > state.?cursor.max_created.orValue("")) ?

Avoid using string ordering when dealing with timestamps. In this case, page_max should be a timestamp (possibly optional, since you have "" as a base case; use either timestamp(0) or optional.none() to signal an absent max, and there are trade-offs in both cases).
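A Python illustration of the reviewer's point, using made-up timestamps: RFC 3339 strings sort correctly only when every value uses the same UTC offset and the same fractional-second precision. The sketch shows a lexical `max` picking the wrong value, then tracks the maximum as a parsed `datetime` with `None` as the absent base case (the counterpart of the `optional.none()` option). Reusing the name `page_max` is for readability only; it is not the integration's code.

```python
from datetime import datetime
from typing import Iterable, Optional


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 timestamp into a timezone-aware datetime.

    A trailing "Z" is rewritten because fromisoformat() only accepts it
    from Python 3.11 onward.
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def page_max(created: Iterable[str], current: Optional[datetime] = None) -> Optional[datetime]:
    """Return the latest timestamp seen, or None if there is none at all."""
    latest = current
    for value in created:
        ts = parse_rfc3339(value)
        if latest is None or ts > latest:
            latest = ts
    return latest


# These differ only in fractional-second precision; "Z" sorts after ".",
# so the string comparison picks the earlier instant.
a, b = "2026-06-29T10:00:00Z", "2026-06-29T10:00:00.500Z"
print(max(a, b))         # 2026-06-29T10:00:00Z
print(page_max([a, b]))  # 2026-06-29 10:00:00.500000+00:00
```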

"events": {
"error": {
"code": string(resp.StatusCode),
"id": string(resp.Status),

Suggested change
- "id": string(resp.Status),
+ "id": resp.Status,

(size(resp.Body) != 0) ?
string(resp.Body)
:
string(resp.Status) + " (" + string(resp.StatusCode) + ")"

Suggested change
- string(resp.Status) + " (" + string(resp.StatusCode) + ")"
+ resp.Status + " (" + string(resp.StatusCode) + ")"

:
[],
"want_more": next_offset != "",
?"offset": (next_offset != "") ? optional.of(next_offset) : optional.none(),

Suggested change
- ?"offset": (next_offset != "") ? optional.of(next_offset) : optional.none(),
+ "next": {
+     ?"offset": next_offset,
+ },

with the corresponding change at L97 to

body.?next_offset.as(next_offset,

plus whatever other changes are needed to make that work.
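The suggested shape derives `want_more` from whether the response carries `next_offset` and stores the offset under `next` only when it exists. The reviewer leaves the remaining changes open, so the following is only one possible reading, sketched in Python with a hypothetical in-memory `fetch_page` standing in for the API and made-up page contents.

```python
from typing import Dict, List, Optional

# Hypothetical responses keyed by the offset that requests them (None means
# the first request). "next_offset" is present only when another page exists.
PAGES: Dict[Optional[str], dict] = {
    None: {"detections": ["d1", "d2"], "next_offset": "p2"},
    "p2": {"detections": ["d3"], "next_offset": "p3"},
    "p3": {"detections": ["d4"]},
}


def fetch_page(offset: Optional[str]) -> dict:
    """Stand-in for one Detections API request."""
    return PAGES[offset]


def step(state: dict) -> dict:
    """One program run: request a page and derive the next state from it."""
    body = fetch_page(state.get("next", {}).get("offset"))
    next_offset = body.get("next_offset")  # None when absent; no "" sentinel
    new_state = {
        "events": body["detections"],
        "want_more": next_offset is not None,
        "next": {},
    }
    if next_offset is not None:
        new_state["next"]["offset"] = next_offset
    return new_state


def run() -> List[str]:
    """Keep stepping while want_more is true, collecting every event."""
    events: List[str] = []
    state: dict = {}
    while True:
        state = step(state)
        events.extend(state["events"])
        if not state["want_more"]:
            return events


print(run())  # ['d1', 'd2', 'd3', 'd4']
```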

Note that there are existing title mismatches: the titles refer to the AbuseCH data source, presumably because this integration was originally derived from the AbuseCH integration. The same issue appears in ti_recordedfuture-554321f4-a649-49da-a5ce-b3dfef1a179b.json, among others.

Development

Successfully merging this pull request may close these issues.

[Recorded Future] Add Identity Detections Support

3 participants