pipelines: migrate 91 transform_ocsf entries into push/pull/ structure #64
Merged
nate-smalls-s1 merged 3 commits into Sentinel-One:main on Apr 27, 2026
Moves 91 community pipeline directories from pipelines/community/transform_ocsf/<name>/ into the ingest-mode-first taxonomy introduced in Sentinel-One#59:

  pipelines/push/syslog/<vendor>/<product>/          57 entries
  pipelines/pull/api/<vendor>/<product>/             29 entries
  pipelines/pull/object_store/<vendor>/<product>/     5 entries

The mode bucket is determined by each entry's ingest_mode field (backfilled in Sentinel-One#61). The vendor and product split is derived per entry from the upstream parser binding and vendor/product convention; collisions across the cluster (Cisco Meraki, Fortinet, Cloudflare, Zscaler, Microsoft, etc.) are disambiguated with explicit product-name overrides documented in .reorg-prep/inventory/transform_ocsf_migration_plan.tsv. History is preserved on every entry (git mv).

What stays in pipelines/community/transform_ocsf/ (15 entries):
- Generic / template / unknown-vendor entries: agent_metrics_logs, generic_access_logs, inngate_gateway_logs, json_generic_logs, json_nested_kv_logs, leef_template_logs, log4shell_detection_logs, mail_server_logs, microservice_tracing_logs, sample_test_logs, spam_detection_logs, sql_database_logs, syslog_space_delimited_logs, vpc_logs, jruby_application_logs.

What is NOT in this PR (intentional):
- 23 entries scheduled for removal in Sentinel-One#62 (broken-legacy, 7) and Sentinel-One#63 (first-party ingestion paths, 16) are NOT moved; they remain in transform_ocsf/ until those PRs merge. This PR has no overlap or conflict with Sentinel-One#62/Sentinel-One#63 -- merge order does not matter.
- No serializer logic, no metadata.yaml content, and no pipeline JSON content was modified. Every change is a directory rename.
- No naming-consistency cleanup (e.g., paloalto_* -> palo_alto/*) is applied yet; that is a separate follow-up.

The pipelines/push/{syslog,hec}/ and pipelines/pull/{api,object_store}/ directories are now populated -- the empty scaffolding from Sentinel-One#59 finally has content.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
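The ingest_mode-driven bucket choice described in the commit message above can be sketched in Python. This is only an illustration, not the tooling used for the migration: the function names bucket_for and destination are hypothetical, the labels come from the PR summary, and the substring matching on the "Other - {...}" values is an assumption because the full strings are elided there.

```python
# Hypothetical sketch of the ingest_mode -> bucket rule from the PR summary.
# Assumption: "Other - ..." values can be told apart by a keyword in the text.

def bucket_for(ingest_mode: str) -> str:
    """Return the mode bucket (relative to pipelines/) for an ingest_mode value."""
    mode = ingest_mode.strip().lower()
    if mode == "syslog":
        return "push/syslog"
    if mode == "api call":
        return "pull/api"
    if mode.startswith("other -"):
        detail = mode[len("other -"):]
        if "object store" in detail:
            return "pull/object_store"
        if "event hub" in detail or "kafka" in detail:
            return "pull/api"      # per project decision
        if "agent-based file ingestion" in detail:
            return "push/syslog"   # per project decision
    # Anything else is left for a human to classify.
    raise ValueError(f"unmapped ingest_mode: {ingest_mode!r}")


def destination(ingest_mode: str, vendor: str, product: str) -> str:
    """Build the target directory for one entry."""
    return f"pipelines/{bucket_for(ingest_mode)}/{vendor}/{product}/"
```

Under these assumptions, an entry with ingest_mode "Syslog", vendor palo_alto, and product panos would resolve to pipelines/push/syslog/palo_alto/panos/. Vendor and product are passed in explicitly because the PR derives them per entry, with overrides recorded in the migration plan TSV.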
nate-smalls-s1 approved these changes on Apr 27, 2026
Summary
Populates the empty pipelines/push/{syslog,hec}/ and pipelines/pull/{api,object_store}/ scaffolding introduced in #59. Moves 91 community pipelines from pipelines/community/transform_ocsf/<name>/ into the ingest-mode-first taxonomy with vendor and product directories. Git history is preserved on every entry (git mv).

What changed

Entries move into one of three destinations:
- pipelines/push/syslog/<vendor>/<product>/
- pipelines/pull/api/<vendor>/<product>/
- pipelines/pull/object_store/<vendor>/<product>/

The bucket assignment is driven by each entry's ingest_mode field (backfilled in #61):
- Syslog → push/syslog/
- API Call → pull/api/
- Other - {object store ...} → pull/object_store/
- Other - {Event Hub / Kafka stream ...} → pull/api/ (per project decision)
- Other - {agent-based file ingestion ...} → push/syslog/ (per project decision)

The vendor / product split is derived per entry from its upstream parser binding and the vendor's natural taxonomy. Collisions across the cluster (Cisco Meraki, Fortinet, Cloudflare, Zscaler, Microsoft, etc.) are disambiguated with explicit product-name overrides documented in .reorg-prep/inventory/transform_ocsf_migration_plan.tsv.

What stays in pipelines/community/transform_ocsf/

15 entries remain as platform-agnostic OCSF overlays for generic, template, or unknown-vendor data: agent_metrics_logs, generic_access_logs, inngate_gateway_logs, json_generic_logs, json_nested_kv_logs, jruby_application_logs, leef_template_logs, log4shell_detection_logs, mail_server_logs, microservice_tracing_logs, sample_test_logs, spam_detection_logs, sql_database_logs, syslog_space_delimited_logs, vpc_logs.

Interaction with #62 and #63 (both open)

This PR has no overlap with #62 (drops 7 broken-legacy entries) or #63 (drops 16 entries with first-party ingestion paths). The 23 entries those PRs delete are NOT moved here; they remain in transform_ocsf/ until those PRs merge. Merge order does not matter. Once all three land, transform_ocsf/ will contain only the 15 generic / overlay entries listed above.

What is NOT in this PR (intentional)

- No naming-consistency cleanup (e.g., paloalto_* → palo_alto/*, fixing the doubled-name singletons like forcepoint_forcepoint_logs, fixing the typo cloudflare_inc_waf_lastest). That is a follow-up rename PR.
- No rename of parsers/community/<source_name>/ to match the new vendor/product layout. Each pipeline still binds to its parser via the existing source_name field, which is path-independent. A coordinated parser rename is a separate follow-up.

Test plan

- git diff --stat origin/main -M shows 91 directories renamed (354 files via R rename markers), zero content changes
- pipelines/push/syslog/, pipelines/pull/api/, pipelines/pull/object_store/ are now populated with vendor subdirectories
- pipelines/community/transform_ocsf/ retains only the 15 generic / overlay entries listed above (after #62, "pipelines: drop 7 broken-legacy transform_ocsf entries", and #63, "pipelines: drop 16 transform_ocsf entries with first-party ingestion paths", also merge)
- Moved entries carry the same metadata.yaml / serializer.lua / pipeline JSON content as before:
  - pipelines/push/syslog/palo_alto/panos/ (was transform_ocsf/paloalto_logs/)
  - pipelines/push/syslog/cisco/duo/ removed in #63, "pipelines: drop 16 transform_ocsf entries with first-party ingestion paths" (verify absent)
  - pipelines/pull/api/cloudflare/waf/ (was transform_ocsf/cloudflare_waf_logs/)
  - pipelines/pull/object_store/aws/vpc_flow/ (was transform_ocsf/aws_vpc_flow/)
  - pipelines/pull/api/microsoft/m365_mgmt_api/ (was transform_ocsf/microsoft_365_mgmt_api_logs/)
- Parser directories under parsers/community/<source_name>/ are unchanged; the source_name field in each moved pipeline JSON still resolves correctly
- metadata.yaml purpose field (e.g., the PAN-OS purpose cross-references documented in #60, "pipelines: drop F-graded PAN-OS firewall transform; document PAN-OS variants")
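The rename-only claim in the test plan can also be checked mechanically. The following is a minimal sketch, assuming the output of git diff --name-status -M origin/main as input, where a pure rename appears as a tab-separated R100 line (100% similarity, so no content change). The function names check_renames and git_name_status are hypothetical and not part of this repository.

```python
import subprocess

SRC_PREFIX = "pipelines/community/transform_ocsf/"
DST_PREFIXES = (
    "pipelines/push/syslog/",
    "pipelines/pull/api/",
    "pipelines/pull/object_store/",
)


def check_renames(name_status: str):
    """Return (moved_entry_dirs, problems) for `git diff --name-status -M` output."""
    moved_dirs = set()
    problems = []
    for line in name_status.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # A pure rename is "R100<TAB>old<TAB>new"; anything else is flagged.
        if fields[0] != "R100" or len(fields) != 3:
            problems.append(f"not a pure rename: {line}")
            continue
        old, new = fields[1], fields[2]
        if not old.startswith(SRC_PREFIX):
            problems.append(f"unexpected source: {old}")
            continue
        if not new.startswith(DST_PREFIXES):
            problems.append(f"unexpected destination: {new}")
            continue
        # The first path component under transform_ocsf/ is the entry directory.
        moved_dirs.add(old[len(SRC_PREFIX):].split("/", 1)[0])
    return moved_dirs, problems


def git_name_status(base: str = "origin/main") -> str:
    """Run git in the current checkout and return the raw name-status text."""
    result = subprocess.run(
        ["git", "diff", "--name-status", "-M", base],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling check_renames(git_name_status()) on the PR branch should, if the description holds, yield 91 entry directories and an empty problem list; the sketch does not verify the 354-file count itself.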