pipelines: migrate 91 transform_ocsf entries into push/pull/ structure#64

Merged
nate-smalls-s1 merged 3 commits into Sentinel-One:main from natesmalley:transform-ocsf-migration
Apr 27, 2026

@natesmalley
Contributor

Summary

Populates the empty pipelines/push/{syslog,hec}/ and pipelines/pull/{api,object_store}/ scaffolding introduced in #59. Moves 91 community pipelines from pipelines/community/transform_ocsf/<name>/ into the ingest-mode-first taxonomy with vendor and product directories. Git history is preserved on every entry (git mv).

What changed

| New location | Entries moved |
| --- | --- |
| pipelines/push/syslog/<vendor>/<product>/ | 57 |
| pipelines/pull/api/<vendor>/<product>/ | 29 |
| pipelines/pull/object_store/<vendor>/<product>/ | 5 |
| Total moves | 91 |

The bucket assignment is driven by each entry's ingest_mode field (backfilled in #61):

  • Syslog -> push/syslog/
  • API Call -> pull/api/
  • Other - {object store ...} -> pull/object_store/
  • Other - {Event Hub / Kafka stream ...} -> pull/api/ (per project decision)
  • Other - {agent-based file ingestion ...} -> push/syslog/ (per project decision)
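The mapping above can be sketched as a small lookup. This is an illustration only, not the script used for the migration: the function name `bucket_for` and the keyword rules are assumptions, and because the full `Other - {...}` strings are elided in this description, the sketch matches only on the fragments that are visible here.

```python
from typing import Optional

# Hypothetical keyword rules for the "Other - {...}" values. The complete
# ingest_mode strings are elided in the PR text, so only the visible
# fragments are matched.
_OTHER_RULES = [
    ("object store", "pull/object_store"),
    ("event hub", "pull/api"),                    # per project decision
    ("kafka", "pull/api"),                        # per project decision
    ("agent-based file ingestion", "push/syslog"),  # per project decision
]


def bucket_for(ingest_mode: str) -> Optional[str]:
    """Return the ingest-mode bucket for an entry, or None if unrecognized."""
    mode = ingest_mode.strip().lower()
    if mode == "syslog":
        return "push/syslog"
    if mode == "api call":
        return "pull/api"
    if mode.startswith("other"):
        for fragment, bucket in _OTHER_RULES:
            if fragment in mode:
                return bucket
    # Unknown values are surfaced to the caller instead of being guessed.
    return None
```

Returning `None` for an unrecognized value keeps an entry out of every bucket until someone classifies it by hand, which fits a migration where two of the "Other" cases were settled by explicit project decision.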

The vendor / product split is derived per entry from its upstream parser binding and the vendor's natural taxonomy. Collisions across the cluster (Cisco Meraki, Fortinet, Cloudflare, Zscaler, Microsoft, etc.) are disambiguated with explicit product-name overrides documented in .reorg-prep/inventory/transform_ocsf_migration_plan.tsv.
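One way to spot the collisions mentioned above is to group planned destinations and report any path claimed by more than one entry. The sketch below is hypothetical: the row layout `(old_name, bucket, vendor, product)` is an assumption and does not reflect the actual columns of the migration plan TSV, and the example rows show a naive derivation, not the real plan.

```python
from collections import defaultdict


def find_collisions(plan):
    """Group planned moves by destination path.

    `plan` is an iterable of (old_name, bucket, vendor, product) tuples.
    Returns {destination: [old_name, ...]} for every destination that is
    claimed by more than one entry.
    """
    by_dest = defaultdict(list)
    for old_name, bucket, vendor, product in plan:
        by_dest[f"pipelines/{bucket}/{vendor}/{product}/"].append(old_name)
    return {dest: names for dest, names in by_dest.items() if len(names) > 1}


# Illustrative rows only: both entries naively derive the same product name.
example_plan = [
    ("cloudflare_waf_logs", "pull/api", "cloudflare", "waf"),
    ("cloudflare_inc_waf_lastest", "pull/api", "cloudflare", "waf"),
    ("aws_vpc_flow", "pull/object_store", "aws", "vpc_flow"),
]

collisions = find_collisions(example_plan)
```

Any destination reported here would need an explicit product-name override so that each entry lands in its own directory.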

What stays in pipelines/community/transform_ocsf/

15 entries remain as platform-agnostic OCSF overlays for generic, template, or unknown-vendor data:

agent_metrics_logs, generic_access_logs, inngate_gateway_logs, json_generic_logs, json_nested_kv_logs, jruby_application_logs, leef_template_logs, log4shell_detection_logs, mail_server_logs, microservice_tracing_logs, sample_test_logs, spam_detection_logs, sql_database_logs, syslog_space_delimited_logs, vpc_logs.

Interaction with #62 and #63 (both open)

This PR has no overlap with #62 (drops 7 broken-legacy entries) or #63 (drops 16 entries with first-party ingestion paths). The 23 entries those PRs delete are NOT moved here; they remain in transform_ocsf/ until those PRs merge. Merge order does not matter. Once all three land, transform_ocsf/ will contain only the 15 generic / overlay entries listed above.
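The order-independence claim follows from the three PRs touching disjoint sets of directories. A minimal sketch with placeholder entry names, assuming the 91 moved, 7 + 16 dropped, and 15 retained entries account for everything in transform_ocsf/ (which implies 129 entries in total; that total is derived here and is not stated in the PR):

```python
from itertools import permutations

# Placeholder names; only the counts come from the PR description.
moved = {f"moved_{i}" for i in range(91)}              # this PR
dropped_62 = {f"legacy_{i}" for i in range(7)}         # #62
dropped_63 = {f"first_party_{i}" for i in range(16)}   # #63
kept = {f"generic_{i}" for i in range(15)}             # overlays that stay

start = moved | dropped_62 | dropped_63 | kept


def remaining_after(order):
    """Apply the three removals in the given order and return what is left."""
    remaining = set(start)
    for removed in order:
        remaining -= removed
    return remaining


# Every merge order leaves the same set behind.
results = {
    frozenset(remaining_after(order))
    for order in permutations([moved, dropped_62, dropped_63])
}
```

Because the sets do not intersect, all six merge orders collapse to a single result: the 15 retained entries.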

What is NOT in this PR (intentional)

  • No serializer logic, metadata content, or pipeline JSON content was modified. Every change is a directory rename.
  • No naming-consistency cleanup (e.g., paloalto_* -> palo_alto/*, fixing the doubled-name singletons like forcepoint_forcepoint_logs, fixing the typo cloudflare_inc_waf_lastest). That is a follow-up rename PR.
  • No coordinated rename of parsers/community/<source_name>/ to match the new vendor/product layout. Each pipeline still binds to its parser via the existing source_name field, which is path-independent. A coordinated parser rename is a separate follow-up.
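Because the binding goes through source_name and not through a directory path, it can be checked mechanically after the move. The sketch below is an assumption-laden illustration: it presumes each pipeline JSON is a JSON object with a top-level `source_name` key, which this description does not confirm, and `unresolved_bindings` is a hypothetical helper name.

```python
import json
from pathlib import Path


def unresolved_bindings(repo_root):
    """List (pipeline JSON path, source_name) pairs with no matching parser dir.

    Assumes a top-level "source_name" key in each pipeline JSON; files that
    are unreadable, not valid JSON, or lack the key are skipped.
    """
    root = Path(repo_root)
    parsers = root / "parsers" / "community"
    missing = []
    for mode in ("push", "pull"):
        mode_dir = root / "pipelines" / mode
        if not mode_dir.is_dir():
            continue
        for json_path in sorted(mode_dir.rglob("*.json")):
            try:
                data = json.loads(json_path.read_text(encoding="utf-8"))
            except (OSError, json.JSONDecodeError):
                continue
            source_name = data.get("source_name") if isinstance(data, dict) else None
            if source_name and not (parsers / source_name).is_dir():
                missing.append((json_path.relative_to(root).as_posix(), source_name))
    return missing
```

An empty result means every moved pipeline that declares a source_name still points at an existing parsers/community/<source_name>/ directory.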

Test plan

  • CI passes (CodeQL, secret scanning, contributor automation)
  • git diff --stat origin/main -M shows 91 directories renamed (354 files via R rename markers), zero content changes
  • pipelines/push/syslog/, pipelines/pull/api/, pipelines/pull/object_store/ are now populated with vendor subdirectories
  • pipelines/community/transform_ocsf/ retains only the 15 generic / overlay entries listed above (after #62 and #63 also merge)
  • Spot-check 5 representative moved entries on github.com to confirm the new path renders the same metadata.yaml / serializer.lua / pipeline JSON content as before:
    • pipelines/push/syslog/palo_alto/panos/ (was transform_ocsf/paloalto_logs/)
    • pipelines/push/syslog/cisco/duo/: removed in #63 (verify absent)
    • pipelines/pull/api/cloudflare/waf/ (was transform_ocsf/cloudflare_waf_logs/)
    • pipelines/pull/object_store/aws/vpc_flow/ (was transform_ocsf/aws_vpc_flow/)
    • pipelines/pull/api/microsoft/m365_mgmt_api/ (was transform_ocsf/microsoft_365_mgmt_api_logs/)
  • Bound parsers in parsers/community/<source_name>/ are unchanged; the source_name field in each moved pipeline JSON still resolves correctly
  • No broken cross-references in any surviving metadata.yaml purpose field (e.g., the PAN-OS purpose cross-references documented in #60, "pipelines: drop F-graded PAN-OS firewall transform; document PAN-OS variants")
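The "zero content changes" item in the test plan can be checked by parsing rename status lines. With rename detection on, `git diff --name-status -M origin/main` prints one tab-separated line per file, and a pure rename appears as `R100` followed by the old and new paths. A minimal sketch, with `summarize_renames` as a hypothetical helper name:

```python
def summarize_renames(name_status_output):
    """Split `git diff --name-status -M` output into three lists.

    Returns (pure, impure, other):
      pure   - (old, new) pairs with status R100 (renamed, content identical)
      impure - (old, new) pairs renamed with a similarity below 100%
      other  - raw lines for any non-rename change (A, M, D, ...)
    """
    pure, impure, other = [], [], []
    for line in name_status_output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        status = fields[0]
        if status.startswith("R") and len(fields) == 3:
            target = pure if status == "R100" else impure
            target.append((fields[1], fields[2]))
        else:
            other.append(line)
    return pure, impure, other
```

Feeding it the command's output (for example via `subprocess.run(..., capture_output=True, text=True).stdout`) and finding `impure` and `other` both empty would be consistent with a rename-only PR; the length of `pure` can then be compared against the 354 files quoted above.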

Nate Smalley and others added 2 commits April 26, 2026 22:21
Moves 91 community pipeline directories from
pipelines/community/transform_ocsf/<name>/ into the ingest-mode-first
taxonomy introduced in Sentinel-One#59:

  pipelines/push/syslog/<vendor>/<product>/      57 entries
  pipelines/pull/api/<vendor>/<product>/         29 entries
  pipelines/pull/object_store/<vendor>/<product>/  5 entries

The mode bucket is determined by each entry's ingest_mode field (backfilled
in Sentinel-One#61). The vendor and product split is derived per entry from the
upstream parser binding and vendor/product convention; collisions across
the cluster (Cisco Meraki, Fortinet, Cloudflare, Zscaler, Microsoft, etc.)
are disambiguated with explicit product-name overrides documented in
.reorg-prep/inventory/transform_ocsf_migration_plan.tsv.

History is preserved on every entry (git mv).

What stays in pipelines/community/transform_ocsf/ (15 entries):
  - Generic / template / unknown-vendor entries: agent_metrics_logs,
    generic_access_logs, inngate_gateway_logs, json_generic_logs,
    json_nested_kv_logs, leef_template_logs, log4shell_detection_logs,
    mail_server_logs, microservice_tracing_logs, sample_test_logs,
    spam_detection_logs, sql_database_logs, syslog_space_delimited_logs,
    vpc_logs, jruby_application_logs.

What is NOT in this PR (intentional):
  - 23 entries scheduled for removal in Sentinel-One#62 (broken-legacy, 7) and Sentinel-One#63
    (first-party ingestion paths, 16) are NOT moved; they remain in
    transform_ocsf/ until those PRs merge. This PR has no overlap or
    conflict with Sentinel-One#62/Sentinel-One#63 -- merge order does not matter.
  - No serializer logic, no metadata.yaml content, and no pipeline JSON
    content was modified. Every change is a directory rename.
  - No naming-consistency cleanup (e.g., paloalto_* -> palo_alto/*) is
    applied yet; that is a separate follow-up.

The pipelines/push/{syslog,hec}/ and pipelines/pull/{api,object_store}/
directories are now populated -- the empty scaffolding from Sentinel-One#59 finally
has content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nate-smalls-s1 merged commit 1550197 into Sentinel-One:main on Apr 27, 2026
1 check passed