pipelines: drop 16 transform_ocsf entries with first-party ingestion paths#63
Merged
nate-smalls-s1 merged 3 commits into Sentinel-One:main from Apr 27, 2026
Conversation
…paths

Removes 16 directories from pipelines/community/transform_ocsf/ for vendors whose log streams are typically delivered to AI SIEM via first-party or vendor-native ingestion paths in supported deployments, rather than via community-contributed Observo transforms.

Removed:
  aws_guardduty_logs/
  aws_waf/
  azure_ad/
  azure_platform/
  cisco_duo/
  darktrace_darktrace_logs/
  microsoft_defender_for_cloud/
  microsoft_entra_logs/
  microsoft_eventhub_azure_signin_logs/
  microsoft_eventhub_defender_email_logs/
  microsoft_eventhub_defender_emailforcloud_logs/
  netskope/
  proofpoint/
  snyk/
  tenable_vulnerability_management_audit_logging/
  wiz_cloud_security_logs/

(azure_ad/ is the legacy name for Microsoft Entra ID and is removed alongside microsoft_entra_logs/ to avoid leaving the same product under two paths.)

Each removed entry was previously signed_off and functional, so removal is a scope refinement rather than a quality fix. The community pipelines directory is intended for vendors that require contributor-authored parsing and OCSF mapping; entries where users typically rely on a vendor-native or first-party ingestion path are out of scope.

Anyone who specifically needs a community transform for one of these vendors can recover it from git history.

No serializer logic, no other metadata, and no surviving entries are modified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cope) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nate-smalls-s1
approved these changes
Apr 27, 2026
7 tasks
nate-smalls-s1
approved these changes
Apr 27, 2026
nate-smalls-s1 pushed a commit that referenced this pull request Apr 27, 2026
Moves 91 community pipeline directories from pipelines/community/transform_ocsf/<name>/ into the ingest-mode-first taxonomy introduced in #59:

  pipelines/push/syslog/<vendor>/<product>/          57 entries
  pipelines/pull/api/<vendor>/<product>/             29 entries
  pipelines/pull/object_store/<vendor>/<product>/     5 entries

The mode bucket is determined by each entry's ingest_mode field (backfilled in #61). The vendor and product split is derived per entry from the upstream parser binding and vendor/product convention; collisions across the cluster (Cisco Meraki, Fortinet, Cloudflare, Zscaler, Microsoft, etc.) are disambiguated with explicit product-name overrides documented in .reorg-prep/inventory/transform_ocsf_migration_plan.tsv. History is preserved on every entry (git mv).

What stays in pipelines/community/transform_ocsf/ (15 entries):
- Generic / template / unknown-vendor entries: agent_metrics_logs, generic_access_logs, inngate_gateway_logs, json_generic_logs, json_nested_kv_logs, leef_template_logs, log4shell_detection_logs, mail_server_logs, microservice_tracing_logs, sample_test_logs, spam_detection_logs, sql_database_logs, syslog_space_delimited_logs, vpc_logs, jruby_application_logs.

What is NOT in this PR (intentional):
- 23 entries scheduled for removal in #62 (broken-legacy, 7) and #63 (first-party ingestion paths, 16) are NOT moved; they remain in transform_ocsf/ until those PRs merge. This PR has no overlap or conflict with #62/#63 -- merge order does not matter.
- No serializer logic, no metadata.yaml content, and no pipeline JSON content was modified. Every change is a directory rename.
- No naming-consistency cleanup (e.g., paloalto_* -> palo_alto/*) is applied yet; that is a separate follow-up.

The pipelines/push/{syslog,hec}/ and pipelines/pull/{api,object_store}/ directories are now populated -- the empty scaffolding from #59 finally has content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
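The mode-to-bucket mapping in the commit message above can be illustrated with a minimal shell sketch. It is not the script used for the migration. The function names `target_dir` and `move_entry` are hypothetical, and the mode strings (`syslog`, `hec`, `api`, `object_store`) are assumed values for the ingest_mode field; the real metadata may spell them differently. The `cisco`/`meraki` split in the usage line is also only an illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: derive the new directory for an entry from its
# ingest mode, vendor, and product. Mode names are assumptions.
target_dir() {
  local mode="$1" vendor="$2" product="$3"
  case "$mode" in
    syslog|hec)
      printf 'pipelines/push/%s/%s/%s\n' "$mode" "$vendor" "$product" ;;
    api|object_store)
      printf 'pipelines/pull/%s/%s/%s\n' "$mode" "$vendor" "$product" ;;
    *)
      echo "unknown ingest_mode: $mode" >&2
      return 1 ;;
  esac
}

# Hypothetical helper: move one entry with git mv so history is preserved.
# Must be run from the repository root.
move_entry() {
  local name="$1" mode="$2" vendor="$3" product="$4" dest
  dest="$(target_dir "$mode" "$vendor" "$product")" || return 1
  # git mv needs the parent directory of the destination to exist.
  mkdir -p "$(dirname "$dest")"
  git mv "pipelines/community/transform_ocsf/$name" "$dest"
}

# Usage: print the destination for an illustrative vendor/product pair.
target_dir syslog cisco meraki   # prints pipelines/push/syslog/cisco/meraki
```

Keeping the path derivation separate from the `git mv` call means the full mapping can be printed and reviewed before any directory is renamed.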
Summary
Scope refinement for pipelines/community/transform_ocsf/. Removes 16 directories whose vendors are typically ingested into AI SIEM via first-party or vendor-native paths in supported deployments rather than via community-contributed Observo transforms. The community pipelines directory is intended for vendors that require contributor-authored parsing and OCSF mapping.

Removed entries
- aws_guardduty_logs/
- aws_waf/
- azure_ad/ (legacy name for Microsoft Entra ID; removed alongside microsoft_entra_logs/ to avoid leaving the same product under two paths)
- azure_platform/
- cisco_duo/
- darktrace_darktrace_logs/
- microsoft_defender_for_cloud/
- microsoft_entra_logs/
- microsoft_eventhub_azure_signin_logs/
- microsoft_eventhub_defender_email_logs/
- microsoft_eventhub_defender_emailforcloud_logs/
- netskope/
- proofpoint/
- snyk/
- tenable_vulnerability_management_audit_logging/
- wiz_cloud_security_logs/

Why these specifically
Each removed entry was previously signed_off and functional; this is a scope refinement, not a quality fix. The criterion is "is there a typical first-party / vendor-native ingestion path users rely on for this vendor's data?" rather than any defect in the transforms themselves.

This is distinct from the prior cleanup PRs (#60 and #62), which removed entries that were broken (F-grade, analyzer_limit, no OCSF class produced). Those entries were dropped on quality grounds; the entries in this PR are dropped on scope grounds.

What is NOT in this PR (intentional)
- transform_ocsf/ entries into the push/pull/<mode>/<vendor>/<product>/ taxonomy).
- m365_audit_logs/ and microsoft_365_mgmt_api_logs/ are retained: they cover Microsoft 365 audit/management API surfaces that are not first-party ingestion paths in supported deployments.
- azure_logs/, azure_nsg_flow_logs/, iis_w3c/, microsoft_activedirectory_logs/, windows_event_log_logs/ are retained: these are general Azure Monitor, NSG flow exports to storage, on-prem IIS, on-prem Active Directory, and Windows Event Log ingestion paths that customers configure manually.

Recovery
Each removed entry remains accessible via git log --diff-filter=D --name-only and can be restored from git history if a deployment specifically requires the community transform.

Test plan
- git log --stat shows exactly 64 file deletions across 16 directories (metadata.yaml + <name>.json + sample.json + serializer.lua per directory)
- CHANGELOG.md is modified
- transform_ocsf/ entries continue to render cleanly on github.com
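
The deletion count in the test plan (16 directories x 4 files = 64) and the recovery path described under Recovery can be checked with a short shell sketch. It is one possible approach, not the procedure used for this PR. The function names are hypothetical, and the base and head arguments stand for whatever commits bracket the change in a given clone.

```shell
#!/usr/bin/env bash

# Directory the removed entries lived in (taken from the PR description).
OCSF_DIR="pipelines/community/transform_ocsf"

# Count files deleted under OCSF_DIR between two commits.
# --no-renames keeps moved files from being reported as renames.
count_deleted_files() {
  local base="$1" head="$2"
  git diff --no-renames --diff-filter=D --name-only "$base" "$head" -- "$OCSF_DIR/" \
    | grep -c .
}

# Count distinct entry directories that lost files between two commits.
# Field 4 of pipelines/community/transform_ocsf/<name>/<file> is <name>.
count_deleted_dirs() {
  local base="$1" head="$2"
  git diff --no-renames --diff-filter=D --name-only "$base" "$head" -- "$OCSF_DIR/" \
    | cut -d/ -f4 | sort -u | grep -c .
}

# Restore one removed entry into the working tree from git history.
restore_entry() {
  local name="$1" path commit
  path="$OCSF_DIR/$name"
  # Most recent commit that deleted files under this path.
  commit="$(git log -n 1 --diff-filter=D --format=%H -- "$path")"
  if [ -z "$commit" ]; then
    echo "no deletion found for $path" >&2
    return 1
  fi
  # Check the directory out from the parent of the deleting commit.
  git checkout "${commit}^" -- "$path"
}
```

For example, running `restore_entry netskope` from the repository root would stage the netskope/ directory as it existed just before it was deleted, ready to be committed in a fork or deployment branch.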