
pipelines: drop 7 broken-legacy transform_ocsf entries#62

Merged
nate-smalls-s1 merged 2 commits into Sentinel-One:main from natesmalley:transform-ocsf-dedupe on Apr 27, 2026

@natesmalley
Contributor

Summary

Follow-up to #59, #60, and #61 (all merged). Drops 7 broken-legacy entries from pipelines/community/transform_ocsf/ that share the exact fingerprint already established by palo_alto_networks_firewall/ in #60: sub-passing grade, grader couldn't validate, no OCSF class produced, no matching upstream parser. Each has at least one working alternative covering the same vendor cluster.

Removed entries

| Directory | Grade | Verdict | class_uid | Required-field coverage | Working alternative |
|---|---|---|---|---|---|
| aws_cloudtrail/ | D/60 | analyzer_limit | null | 0% | bound-parser AWS entries |
| aws_guardduty/ | F/45 | analyzer_limit | null | 0% | aws_guardduty_logs/ (B/85) |
| darktrace/ | D/60 | analyzer_limit | null | 0% | darktrace_darktrace_logs/ (B/85) |
| gcp_audit_logs/ | F/25 | analyzer_limit | null | 0% | bound-parser GCP entries |
| microsoft_365/ | D/60 | analyzer_limit | null | 0% | microsoft_365_mgmt_api_logs/ (B/82) |
| okta/ | F/45 | analyzer_limit | null | 0% | okta_logs/ (B/85), okta_ocsf_logs/ (B/85) |
| wiz_issue/ | D/60 | analyzer_limit | null | 0% | wiz_cloud_security_logs/ (B/85) |

Why these specifically

All seven entries share the same fingerprint as palo_alto_networks_firewall/ removed in #60:

  • Sub-passing grade (D or F).
  • verdict: analyzer_limit — the automated grader could not validate the serializer's OCSF output.
  • class_uid: null — no valid OCSF class is produced.
  • required_field_coverage_pct: 0.
  • source_name lacks the -latest versioning suffix used by every working entry in the directory.
  • No matching upstream parser in parsers/community/.
  • Long-form, Python-port-style code (632–1720 lines) attributed to author: Community (imported from Observo platform UI), rather than the contributor-import style used by the rest of the directory.
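Most of this fingerprint can be checked mechanically from an entry's metadata. The sketch below is a minimal illustration, assuming metadata.yaml has already been parsed into a dict. The `verdict`, `class_uid`, `required_field_coverage_pct`, and `source_name` keys are the fields named in this PR; the `grade` key and its `D/60` format are a hypothetical stand-in, since the PR does not name the grade field. The upstream-parser and code-style criteria need repository context and are left out.

```python
from typing import Any, Mapping

# Letter grades treated as sub-passing, per "Sub-passing grade (D or F)".
SUB_PASSING = {"D", "F"}


def matches_broken_legacy_fingerprint(meta: Mapping[str, Any]) -> bool:
    """Return True if parsed metadata matches the broken-legacy fingerprint.

    Assumes a hypothetical `grade` key formatted like "D/60"; the other
    keys are the metadata fields named in the PR description.
    """
    letter = str(meta.get("grade", "")).split("/", 1)[0].strip().upper()
    return (
        letter in SUB_PASSING
        and meta.get("verdict") == "analyzer_limit"
        and meta.get("class_uid") is None
        and meta.get("required_field_coverage_pct") == 0
        # Working entries carry a "-latest" suffix; legacy ones do not.
        and not str(meta.get("source_name", "")).endswith("-latest")
    )


if __name__ == "__main__":
    # Hypothetical metadata values, loosely modelled on the table above.
    legacy = {"grade": "F/45", "verdict": "analyzer_limit", "class_uid": None,
              "required_field_coverage_pct": 0, "source_name": "okta"}
    working = {"grade": "B/85", "verdict": "signed_off", "class_uid": 3002,
               "required_field_coverage_pct": 100, "source_name": "okta_logs-latest"}
    print(matches_broken_legacy_fingerprint(legacy))   # True
    print(matches_broken_legacy_fingerprint(working))  # False
```

All five conditions must hold together, so a sub-passing entry that already carries the -latest suffix, or one with a valid class_uid, is not flagged.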

What is NOT in this PR (intentional)

  • Multi-entry vendor clusters whose members are all signed_off are kept. Cisco, Fortinet, Cloudflare, Zscaler, Akamai, and others each have multiple entries that bind to genuinely distinct upstream parsers or cover different OCSF classes. They will be documented per entry in the upcoming migration PR (#5), following the pattern established for PAN-OS in #60.
  • Doubled-name singletons (forcepoint_forcepoint_logs, incapsula_incapsula_logs, mimecast_mimecast_logs, singularityidentity_singularityidentity_logs, tailscale_tailscale_logs) are all signed_off, graded B/82–85, with 100% required-field coverage. Only their names are wrong; they will be fixed in the rename PR (#6).
  • m365_audit_logs/ and crowdstrike_detections/ are intentionally retained. Both are signed_off with valid class_uid and ≥75% required-field coverage; while they overlap functionally with cleaner alternatives (microsoft_365_mgmt_api_logs/, crowdstrike_logs/), they are not broken and removal would be a regression for any user currently importing them.
  • No serializer logic, no other metadata fields, and no pipeline JSONs in surviving entries are modified.

Test plan

  • CI passes (CodeQL, secret scanning, contributor automation)
  • git log --stat shows exactly 28 file deletions across 7 directories (metadata.yaml + <name>.json + sample.json + serializer.lua per directory)
  • No content outside pipelines/community/transform_ocsf/{aws_cloudtrail,aws_guardduty,darktrace,gcp_audit_logs,microsoft_365,okta,wiz_issue}/ and CHANGELOG.md was modified
  • No other repo content references the removed paths (search for the directory names across the tree)
  • The working alternatives listed above remain unchanged and continue to render cleanly on github.com
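The second and fourth checks above can be scripted. This is a minimal sketch, not part of the PR's CI: `deletions_match` compares the `D` rows of `git diff --name-status` output (tab-separated status and path) against the 28 expected paths, assuming `<name>.json` uses the directory name, and `find_stale_references` scans a checkout for leftover mentions of the removed paths. Matching on `transform_ocsf/<name>` with a lookahead, rather than the bare name, keeps `okta` from matching the retained `okta_logs/`. Skipping CHANGELOG.md is a choice made here because it legitimately records the removal.

```python
import re
from pathlib import Path
from typing import Iterable

BASE = "pipelines/community/transform_ocsf"
# Directories removed by this PR.
REMOVED = ("aws_cloudtrail", "aws_guardduty", "darktrace", "gcp_audit_logs",
           "microsoft_365", "okta", "wiz_issue")


def expected_deletions(names: Iterable[str] = REMOVED) -> set[str]:
    """Four files per directory; assumes <name>.json uses the directory name."""
    return {
        f"{BASE}/{n}/{fname}"
        for n in names
        for fname in ("metadata.yaml", f"{n}.json", "sample.json", "serializer.lua")
    }


def deletions_match(name_status: str, names: Iterable[str] = REMOVED) -> bool:
    """Compare the 'D' rows of `git diff --name-status` output to the expected set."""
    deleted = set()
    for line in name_status.splitlines():
        status, _, path = line.partition("\t")
        if status == "D":
            deleted.add(path)
    return deleted == expected_deletions(names)


def find_stale_references(root: Path, names: Iterable[str] = REMOVED,
                          skip_files: Iterable[str] = ("CHANGELOG.md",)) -> list[tuple[str, str]]:
    """Return (file, name) pairs where a removed transform_ocsf path is still mentioned."""
    # The lookahead stops "okta" from matching "okta_logs" or "okta_ocsf_logs".
    pattern = re.compile(
        r"transform_ocsf/(" + "|".join(re.escape(n) for n in names) + r")(?![\w-])"
    )
    skip = set(skip_files)
    hits = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or ".git" in rel.parts or path.name in skip:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        hits.extend((rel.as_posix(), m.group(1)) for m in pattern.finditer(text))
    return hits
```

For example, `deletions_match` can be fed the output of `git diff --name-status main...transform-ocsf-dedupe`. A modified CHANGELOG.md shows up as an `M` row and is ignored.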

Nate Smalley and others added 2 commits April 26, 2026 21:41
Removes the following directories from pipelines/community/transform_ocsf/:

  aws_cloudtrail/
  aws_guardduty/
  darktrace/
  gcp_audit_logs/
  microsoft_365/
  okta/
  wiz_issue/

Each entry shares the same broken-legacy fingerprint (matching
palo_alto_networks_firewall/ from Sentinel-One#60):

- Sub-passing grade (D or F).
- verdict: analyzer_limit (the automated grader could not validate the
  serializer's OCSF output).
- class_uid: null (no valid OCSF class is produced).
- required_field_coverage_pct: 0.
- source_name lacks the -latest versioning suffix used by every working
  entry in the directory.
- No matching upstream parser in parsers/community/.
- Long-form Python-port style code (632 to 1720 lines), imported from the
  Observo platform UI rather than via the standard contributor path.

Each removed entry has at least one working alternative covering the same
vendor cluster:

  aws_cloudtrail/    -> aws_*/transform_ocsf/ entries that bind to
                       parsers/community/<name>-latest/ (signed_off, B+ grade)
  aws_guardduty/     -> aws_guardduty_logs/ (B/85, signed_off, class_uid=2004)
  darktrace/         -> darktrace_darktrace_logs/ (B/85, signed_off,
                       class_uid=2004)
  gcp_audit_logs/    -> use the bound-parser alternatives in the same vendor
                       cluster
  microsoft_365/     -> microsoft_365_mgmt_api_logs/ (B/82, signed_off,
                       class_uid=6003)
  okta/              -> okta_logs/ (B/85, signed_off, class_uid=3002) and
                       okta_ocsf_logs/ (B/85, signed_off, class_uid=3002)
  wiz_issue/         -> wiz_cloud_security_logs/ (B/85, signed_off,
                       class_uid=2004)

No serializer logic, no other metadata, no pipeline JSON in the surviving
entries was modified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nate-smalls-s1 merged commit fea8d5b into Sentinel-One:main on Apr 27, 2026
2 checks passed
nate-smalls-s1 pushed a commit that referenced this pull request on Apr 27, 2026
Moves 91 community pipeline directories from
pipelines/community/transform_ocsf/<name>/ into the ingest-mode-first
taxonomy introduced in #59:

  pipelines/push/syslog/<vendor>/<product>/      57 entries
  pipelines/pull/api/<vendor>/<product>/         29 entries
  pipelines/pull/object_store/<vendor>/<product>/  5 entries

The mode bucket is determined by each entry's ingest_mode field (backfilled
in #61). The vendor and product split is derived per entry from the
upstream parser binding and vendor/product convention; collisions across
the cluster (Cisco Meraki, Fortinet, Cloudflare, Zscaler, Microsoft, etc.)
are disambiguated with explicit product-name overrides documented in
.reorg-prep/inventory/transform_ocsf_migration_plan.tsv.
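The bucket rule reduces to a lookup from ingest_mode to a (direction, transport) pair. This is a minimal sketch under loud assumptions: the ingest_mode values below ("syslog", "hec", "api", "object_store") are hypothetical stand-ins for whatever #61 actually backfilled, and vendor/product are passed in directly rather than derived from the parser binding or the override table in transform_ocsf_migration_plan.tsv.

```python
from pathlib import PurePosixPath

# Hypothetical ingest_mode values mapped to the (direction, transport)
# buckets scaffolded in #59. The real values backfilled in #61 may differ.
MODE_BUCKETS = {
    "syslog": ("push", "syslog"),
    "hec": ("push", "hec"),
    "api": ("pull", "api"),
    "object_store": ("pull", "object_store"),
}


def target_dir(ingest_mode: str, vendor: str, product: str) -> str:
    """Return the taxonomy directory for one entry, e.g. pipelines/pull/api/<vendor>/<product>/."""
    try:
        direction, transport = MODE_BUCKETS[ingest_mode]
    except KeyError:
        raise ValueError(f"unknown ingest_mode: {ingest_mode!r}") from None
    return f"{PurePosixPath('pipelines', direction, transport, vendor, product)}/"


# Sanity check against the counts in this commit: 57 + 29 + 5 = 91 moved entries.
MOVED = {"push/syslog": 57, "pull/api": 29, "pull/object_store": 5}
assert sum(MOVED.values()) == 91

if __name__ == "__main__":
    # Hypothetical vendor/product names, for illustration only.
    print(target_dir("api", "acme", "widget"))  # pipelines/pull/api/acme/widget/
```

Raising on an unknown mode, rather than defaulting to a bucket, surfaces any entry whose ingest_mode was not backfilled instead of silently misfiling it.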

History is preserved on every entry (git mv).

What stays in pipelines/community/transform_ocsf/ (15 entries):
  - Generic / template / unknown-vendor entries: agent_metrics_logs,
    generic_access_logs, inngate_gateway_logs, json_generic_logs,
    json_nested_kv_logs, leef_template_logs, log4shell_detection_logs,
    mail_server_logs, microservice_tracing_logs, sample_test_logs,
    spam_detection_logs, sql_database_logs, syslog_space_delimited_logs,
    vpc_logs, jruby_application_logs.

What is NOT in this PR (intentional):
  - 23 entries scheduled for removal in #62 (broken-legacy, 7) and #63
    (first-party ingestion paths, 16) are NOT moved; they remain in
    transform_ocsf/ until those PRs merge. This PR has no overlap or
    conflict with #62/#63 -- merge order does not matter.
  - No serializer logic, no metadata.yaml content, and no pipeline JSON
    content was modified. Every change is a directory rename.
  - No naming-consistency cleanup (e.g., paloalto_* -> palo_alto/*) is
    applied yet; that is a separate follow-up.

The pipelines/push/syslog/, pipelines/pull/api/, and pipelines/pull/object_store/
directories are now populated, so the empty scaffolding from #59 finally
has content (no entries land in pipelines/push/hec/ in this move).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>