pipelines: drop 7 broken-legacy transform_ocsf entries #62
Merged
nate-smalls-s1 merged 2 commits into Sentinel-One:main on Apr 27, 2026
Removes the following directories from pipelines/community/transform_ocsf/:

  aws_cloudtrail/
  aws_guardduty/
  darktrace/
  gcp_audit_logs/
  microsoft_365/
  okta/
  wiz_issue/

Each entry shares the same broken-legacy fingerprint (matching palo_alto_networks_firewall/ from Sentinel-One#60):

- Sub-passing grade (D or F).
- verdict: analyzer_limit (the automated grader could not validate the serializer's OCSF output).
- class_uid: null (no valid OCSF class is produced).
- required_field_coverage_pct: 0.
- source_name lacks the -latest versioning suffix used by every working entry in the directory.
- No matching upstream parser in parsers/community/.
- Long-form Python-port style code (632 to 1720 lines), imported from the Observo platform UI rather than via the standard contributor path.

Each removed entry has at least one working alternative covering the same vendor cluster:

  aws_cloudtrail/ -> aws_*/transform_ocsf/ entries that bind to parsers/community/<name>-latest/ (signed_off, B+ grade)
  aws_guardduty/ -> aws_guardduty_logs/ (B/85, signed_off, class_uid=2004)
  darktrace/ -> darktrace_darktrace_logs/ (B/85, signed_off, class_uid=2004)
  gcp_audit_logs/ -> use the bound-parser alternatives in the same vendor cluster
  microsoft_365/ -> microsoft_365_mgmt_api_logs/ (B/82, signed_off, class_uid=6003)
  okta/ -> okta_logs/ (B/85, signed_off, class_uid=3002) and okta_ocsf_logs/ (B/85, signed_off, class_uid=3002)
  wiz_issue/ -> wiz_cloud_security_logs/ (B/85, signed_off, class_uid=2004)

No serializer logic, no other metadata, no pipeline JSON in the surviving entries was modified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
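The fingerprint described in this commit message can be expressed as a small check over an entry's metadata. The following is only a sketch, not the grader used in this repository: it assumes the metadata has already been loaded into a mapping, that the grade lives under a key named `grade` (a hypothetical name), and that "matching upstream parser" means `source_name` equals a directory name under parsers/community/ (also an assumption). The keys `verdict`, `class_uid`, `required_field_coverage_pct`, and `source_name` are taken from the commit message.

```python
from typing import Any, List, Mapping, Set

# Grades treated as sub-passing in the commit message (D or F).
SUB_PASSING = {"D", "F"}


def broken_legacy_signals(meta: Mapping[str, Any], parser_names: Set[str]) -> List[str]:
    """Return the names of the fingerprint signals that `meta` exhibits.

    `meta` is a metadata mapping for one pipeline entry; `parser_names` is the
    set of directory names assumed to exist under parsers/community/.
    """
    signals: List[str] = []

    # "grade" is a hypothetical key; only the first letter is compared.
    grade = str(meta.get("grade", "")).strip().upper()[:1]
    if grade in SUB_PASSING:
        signals.append("sub_passing_grade")

    if meta.get("verdict") == "analyzer_limit":
        signals.append("analyzer_limit")

    # A missing key is treated the same as an explicit null.
    if meta.get("class_uid") is None:
        signals.append("null_class_uid")

    if meta.get("required_field_coverage_pct") == 0:
        signals.append("zero_required_coverage")

    source_name = str(meta.get("source_name", ""))
    if not source_name.endswith("-latest"):
        signals.append("missing_latest_suffix")

    # Assumption: a parser "matches" when its directory name equals source_name.
    if source_name not in parser_names:
        signals.append("no_upstream_parser")

    return signals
```

An entry that returns all six signals would match the fingerprint as described; the code-length criterion (632 to 1720 lines) is left out because it depends on how lines are counted.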
nate-smalls-s1 approved these changes on Apr 27, 2026
nate-smalls-s1 pushed a commit that referenced this pull request on Apr 27, 2026
Moves 91 community pipeline directories from pipelines/community/transform_ocsf/<name>/ into the ingest-mode-first taxonomy introduced in #59:

  pipelines/push/syslog/<vendor>/<product>/          57 entries
  pipelines/pull/api/<vendor>/<product>/             29 entries
  pipelines/pull/object_store/<vendor>/<product>/     5 entries

The mode bucket is determined by each entry's ingest_mode field (backfilled in #61). The vendor and product split is derived per entry from the upstream parser binding and vendor/product convention; collisions across the cluster (Cisco Meraki, Fortinet, Cloudflare, Zscaler, Microsoft, etc.) are disambiguated with explicit product-name overrides documented in .reorg-prep/inventory/transform_ocsf_migration_plan.tsv. History is preserved on every entry (git mv).

What stays in pipelines/community/transform_ocsf/ (15 entries):

- Generic / template / unknown-vendor entries: agent_metrics_logs, generic_access_logs, inngate_gateway_logs, json_generic_logs, json_nested_kv_logs, leef_template_logs, log4shell_detection_logs, mail_server_logs, microservice_tracing_logs, sample_test_logs, spam_detection_logs, sql_database_logs, syslog_space_delimited_logs, vpc_logs, jruby_application_logs.

What is NOT in this PR (intentional):

- 23 entries scheduled for removal in #62 (broken-legacy, 7) and #63 (first-party ingestion paths, 16) are NOT moved; they remain in transform_ocsf/ until those PRs merge. This PR has no overlap or conflict with #62/#63 -- merge order does not matter.
- No serializer logic, no metadata.yaml content, and no pipeline JSON content was modified. Every change is a directory rename.
- No naming-consistency cleanup (e.g., paloalto_* -> palo_alto/*) is applied yet; that is a separate follow-up.

The pipelines/push/{syslog,hec}/ and pipelines/pull/{api,object_store}/ directories are now populated -- the empty scaffolding from #59 finally has content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
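The bucket selection described in this commit message can be sketched as a lookup from ingest_mode to a target directory. This is an illustration under stated assumptions, not the migration script itself: the ingest_mode values used as keys (`syslog`, `api`, `object_store`) are hypothetical, the vendor and product names in the usage note are placeholders, and only the three bucket paths and their counts come from the commit message.

```python
from pathlib import PurePosixPath

# Hypothetical ingest_mode values mapped to the buckets named in the commit message.
MODE_BUCKETS = {
    "syslog": "push/syslog",
    "api": "pull/api",
    "object_store": "pull/object_store",
}

# Entry counts per bucket, as stated in the commit message.
MOVED_COUNTS = {"push/syslog": 57, "pull/api": 29, "pull/object_store": 5}


def target_dir(ingest_mode: str, vendor: str, product: str) -> PurePosixPath:
    """Build pipelines/<bucket>/<vendor>/<product> for one entry."""
    try:
        bucket = MODE_BUCKETS[ingest_mode]
    except KeyError:
        # Fail loudly instead of guessing a bucket for an unknown mode.
        raise ValueError(f"unknown ingest_mode: {ingest_mode!r}") from None
    return PurePosixPath("pipelines") / bucket / vendor / product
```

For example, `target_dir("api", "example_vendor", "example_product")` yields `pipelines/pull/api/example_vendor/example_product`, and the per-bucket counts in `MOVED_COUNTS` sum to the 91 directories the commit reports moving.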
Summary
Follow-up to #59, #60, and #61 (all merged). Drops 7 broken-legacy entries from pipelines/community/transform_ocsf/ that share the exact fingerprint already established by palo_alto_networks_firewall/ in #60: sub-passing grade, grader couldn't validate, no OCSF class produced, no matching upstream parser. Each has at least one working alternative covering the same vendor cluster.

Removed entries
  aws_cloudtrail/
  aws_guardduty/ -> aws_guardduty_logs/ (B/85)
  darktrace/ -> darktrace_darktrace_logs/ (B/85)
  gcp_audit_logs/
  microsoft_365/ -> microsoft_365_mgmt_api_logs/ (B/82)
  okta/ -> okta_logs/ (B/85), okta_ocsf_logs/ (B/85)
  wiz_issue/ -> wiz_cloud_security_logs/ (B/85)

Why these specifically
All seven entries share the same fingerprint as palo_alto_networks_firewall/, removed in #60:

- verdict: analyzer_limit (the automated grader could not validate the serializer's OCSF output).
- class_uid: null (no valid OCSF class is produced).
- required_field_coverage_pct: 0.
- source_name lacks the -latest versioning suffix used by every working entry in the directory.
- No matching upstream parser in parsers/community/.
- author: Community (imported from Observo platform UI), rather than the contributor-import style used by the rest of the directory.

What is NOT in this PR (intentional)
- forcepoint_forcepoint_logs, incapsula_incapsula_logs, mimecast_mimecast_logs, singularityidentity_singularityidentity_logs, and tailscale_tailscale_logs are all signed_off with B/82–85 grades and 100% required-field coverage. They are just badly named; that is to be fixed in the rename PR (#6, "Correction Fortigagte typo to Fortigate, dir and metadata").
- m365_audit_logs/ and crowdstrike_detections/ are intentionally retained. Both are signed_off with a valid class_uid and ≥75% required-field coverage. They overlap functionally with cleaner alternatives (microsoft_365_mgmt_api_logs/, crowdstrike_logs/), but they are not broken, and removal would be a regression for any user currently importing them.

Test plan
- git log --stat shows exactly 28 file deletions across 7 directories (metadata.yaml + <name>.json + sample.json + serializer.lua per directory)
- pipelines/community/transform_ocsf/{aws_cloudtrail,aws_guardduty,darktrace,gcp_audit_logs,microsoft_365,okta,wiz_issue}/ and CHANGELOG.md was modified
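The 28-deletion figure in the test plan follows from four files in each of seven directories. A minimal sketch that enumerates the expected paths is shown below; it assumes the per-directory layout is exactly the four files listed above and that `<name>.json` is named after its directory. The helper name `expected_deletions` is hypothetical.

```python
from typing import Iterable, List

# The seven directories removed by this PR.
REMOVED = [
    "aws_cloudtrail",
    "aws_guardduty",
    "darktrace",
    "gcp_audit_logs",
    "microsoft_365",
    "okta",
    "wiz_issue",
]


def expected_deletions(
    names: Iterable[str],
    root: str = "pipelines/community/transform_ocsf",
) -> List[str]:
    """List the file paths expected to be deleted, four per directory."""
    paths: List[str] = []
    for name in names:
        # Assumed layout: metadata.yaml, <name>.json, sample.json, serializer.lua.
        for filename in ("metadata.yaml", f"{name}.json", "sample.json", "serializer.lua"):
            paths.append(f"{root}/{name}/{filename}")
    return sorted(paths)
```

One way to use this would be to compare `expected_deletions(REMOVED)` with the sorted output of `git diff --name-only --diff-filter=D <base>...<head>`, where `<base>` and `<head>` are placeholders for the refs being compared.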