pipelines: drop F-graded PAN-OS firewall transform; document PAN-OS variants #60

Merged
nate-smalls-s1 merged 3 commits into Sentinel-One:main from natesmalley:pan-os-cleanup
Apr 27, 2026

@natesmalley natesmalley commented Apr 27, 2026

Summary

Follow-up to #59 (merged). Drops one broken PAN-OS OCSF transform and documents the upstream-parser binding for the three remaining PAN-OS transforms so users can choose between them without reading the Lua.

What changed

  • Removed pipelines/community/transform_ocsf/palo_alto_networks_firewall/:

    • Graded F (analyzer_limit, 0% required-field coverage).
    • Used non-standard class_uid=99602001 (SentinelOne Security Alert Extended), distinct from the rest of the PAN-OS cluster which uses class_uid=4001 (Network Activity).
    • No matching upstream parser in parsers/community/. Its source_name (palo_alto_networks_firewall) lacks the -latest versioning suffix used by every other PAN-OS entry, indicating it does not bind to a tracked parser.
    • Marked as imported "from Observo platform UI" in metadata, rather than via the standard import path used by the other PAN-OS entries.
  • Documented the upstream-parser binding in each remaining PAN-OS transform's metadata.yaml purpose field:

    • paloalto_logs/ — bound to parsers/community/paloalto_logs-latest/. Standard PAN-OS field-name convention (sourceip / source_port / type); action-derived activity_id.
    • paloalto_alternate_logs/ — bound to parsers/community/paloalto_alternate_logs-latest/. Alternate field-name convention (srcip / srcport / logtype, plus verdict / threat_severity / threat_name / subtype); logType-aware activity_id (THREAT → Malicious Activity 6, URL → URL Activity 5); normalizes numeric protocol identifiers; accepts numeric severity strings.
    • paloalto_vpn_logs/ — bound to parsers/community/paloalto_vpn_logs-latest/. PAN-OS GlobalProtect VPN traffic.
  • CHANGELOG entries appended to the [Unreleased] section introduced in #59 ("pipelines: reorganize around ingest mode").
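
The binding rule described above (a transform's source_name carries the -latest suffix and matches a directory under parsers/community/) can be sketched as a small check. This is an illustrative sketch only: the function name `binding_status` and its three return labels are hypothetical, and it assumes the caller has already read source_name out of the pipeline JSON.

```python
from pathlib import Path

# Suffix that, per the PR description, every tracked PAN-OS entry carries.
SUFFIX = "-latest"


def binding_status(source_name: str, parsers_root: Path) -> str:
    """Classify how a transform's source_name relates to the parser tree.

    Returns one of three hypothetical labels:
      "unversioned"    - source_name lacks the -latest suffix
      "missing-parser" - suffix present, but no matching parser directory
      "bound"          - a directory named source_name exists under parsers_root
    """
    if not source_name.endswith(SUFFIX):
        return "unversioned"
    if not (parsers_root / source_name).is_dir():
        return "missing-parser"
    return "bound"
```

Under this sketch, the removed entry's source_name (palo_alto_networks_firewall) would come back as "unversioned", while paloalto_logs-latest would come back as "bound" whenever parsers/community/paloalto_logs-latest/ exists.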

What is NOT in this PR (intentional)

  • No directory renames. The paloalto_* cluster keeps its current naming because each entry is hard-bound to a parser in parsers/community/ with the same name. A naming-consistency rename should coordinate moves across both parsers/community/ and pipelines/community/transform_ocsf/ and is deferred to a follow-up.
  • No serializer logic changes. Each transform's behavior is unchanged.
  • No backfill of the new ingest_mode / auth_type fields onto existing transform_ocsf/ entries — that is a separate follow-up PR.

Test plan

  • CI passes (CodeQL, secret scanning, contributor automation)
  • git log --stat shows one directory deletion (4 files, 841 lines) plus three single-block metadata.yaml purpose-field edits
  • The three remaining paloalto_*/metadata.yaml files render cleanly on github.com
  • No other content references the removed palo_alto_networks_firewall/ path
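
The last item in the test plan can be checked mechanically. The following is a minimal sketch, not the procedure actually used for this PR: `find_references` is a hypothetical helper that walks a checkout and reports every line mentioning the removed directory name.

```python
from pathlib import Path

# Directory name removed by this PR.
REMOVED = "palo_alto_networks_firewall"


def find_references(repo_root: Path, needle: str = REMOVED) -> list[tuple[str, int]]:
    """Return (relative path, line number) for every line containing needle."""
    hits = []
    for path in sorted(repo_root.rglob("*")):
        # Skip directories and anything inside the .git metadata folder.
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path.relative_to(repo_root)), lineno))
    return hits
```

A CHANGELOG entry that records the removal would be reported as a hit, so the result needs a quick manual read rather than a strict "must be empty" assertion.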

Nate Smalley and others added 3 commits April 26, 2026 20:01
Removes pipelines/community/transform_ocsf/palo_alto_networks_firewall/.

Reasons:
- Graded F (analyzer_limit, 0% required_field_coverage_pct).
- Uses non-standard class_uid=99602001 (SentinelOne Security Alert Extended),
  distinct from the rest of the PAN-OS cluster which uses class_uid=4001
  (Network Activity).
- No matching upstream parser in parsers/community/. Its source_name
  (palo_alto_networks_firewall) lacks the -latest versioning suffix used by
  every other PAN-OS entry, indicating it does not bind to a tracked parser.
- Marked as imported "from Observo platform UI" in metadata, rather than via
  the standard contributor import path used by the rest of the PAN-OS entries.

The three remaining PAN-OS transforms (paloalto_logs/, paloalto_alternate_logs/,
paloalto_vpn_logs/) each bind cleanly to a corresponding parser in
parsers/community/ and are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ding

Each PAN-OS OCSF transform in pipelines/community/transform_ocsf/ binds to a
specific parser in parsers/community/ via the source_name field in the
pipeline JSON. Until now the metadata.yaml purpose fields described all three
variants identically, leaving users no way to choose between them.

This commit rewrites each purpose field to declare:
- The parser directory it is bound to (parsers/community/<name>-latest/)
- The field-name convention it expects from that parser's output
- Its activity_id derivation strategy
- Cross-references to sibling variants

Specifically:

- paloalto_logs/: bound to paloalto_logs-latest. Expects the standard PAN-OS
  field-name convention (sourceip / source_port / type). activity_id is
  action-derived.

- paloalto_alternate_logs/: bound to paloalto_alternate_logs-latest. Expects
  an alternate field-name convention (srcip / srcport / logtype, plus verdict,
  threat_severity, threat_name, subtype). activity_id is logType-aware
  (TRAFFIC -> traffic activity, THREAT -> Malicious Activity 6, URL -> URL
  Activity 5). Also normalizes numeric protocol identifiers and accepts
  numeric severity strings.

- paloalto_vpn_logs/: bound to paloalto_vpn_logs-latest. PAN-OS GlobalProtect
  VPN traffic logs.

No serializer logic changes; this is documentation only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
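
For readers comparing the variants, the logType-aware derivation described for paloalto_alternate_logs/ can be sketched as follows. This is a Python illustration, not a port of the Lua serializer. The THREAT and URL values come from the commit message; the commit message does not give a number for TRAFFIC, so the sketch leaves that to a caller-supplied `default`. The protocol table uses the IANA protocol numbers for ICMP, TCP, and UDP; the lowercase output names and both function names are assumptions.

```python
from typing import Optional

# Values stated in the commit message for the alternate-logs transform.
LOGTYPE_ACTIVITY = {"THREAT": 6, "URL": 5}

# IANA-assigned protocol numbers; lowercase names are an assumed output style.
IANA_PROTOCOLS = {"1": "icmp", "6": "tcp", "17": "udp"}


def activity_id_for(logtype: Optional[str], default: int) -> int:
    """Map a log type to an activity_id, falling back to default.

    TRAFFIC and any unknown type use default, because the source does not
    state the numeric value for those cases.
    """
    key = (logtype or "").strip().upper()
    return LOGTYPE_ACTIVITY.get(key, default)


def normalize_protocol(value) -> str:
    """Turn a numeric protocol identifier into a name; pass names through."""
    text = str(value).strip()
    return IANA_PROTOCOLS.get(text, text.lower())
```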
…binding

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nate-smalls-s1 nate-smalls-s1 merged commit 13a987a into Sentinel-One:main Apr 27, 2026
2 checks passed
nate-smalls-s1 pushed a commit that referenced this pull request Apr 27, 2026
Adds the new metadata fields introduced by #59 to all 129 existing
transform_ocsf/ pipeline metadata.yaml files. The fields are inserted
immediately after the existing ingestion_method line in each file. No
serializer logic, no pipeline JSON, no other metadata changed.

Values were derived per entry by combining:

1. Bound parser metadata (parsers/community/<source_name>/metadata.yaml)
   when the parser declares format=syslog/CEF/RFC/w3c/custom-syslog or
   ingestion_method containing "Syslog" or "HEC" -- the parser is
   authoritative when its declaration is unambiguous.

2. Vendor and product knowledge for the ~90 entries where the parser
   metadata is unclear (gron format with "streaming" or "unknown"
   ingestion_method, or no parser binding at all). Examples:
   - Cisco network kit (firewalls, ASA, Meraki, ISE, etc.) -> Syslog
   - Microsoft 365 / Entra / Defender management surfaces -> API Call (OAuth)
   - AWS managed services delivering to S3 (CloudTrail, ELB, Route53
     Resolver, GuardDuty export, VPC flow) -> Other - {object store with
     SQS notifications} (IAM Role)
   - Azure Event Hub-delivered streams (signin, defender email) ->
     Other - {Azure Event Hub stream (AMQP/Kafka protocol)} (OAuth)
   - SaaS REST APIs (Okta, Snyk, Wiz, Tenable, Mimecast, Netskope,
     Proofpoint, GitHub, Google Workspace, Cloudflare, etc.) -> API Call
     with the vendor's typical auth (Bearer Token, API Key & Secret,
     or OAuth)
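
The precedence between the two sources above can be sketched as a single predicate. This is an illustration under stated assumptions, not the script that produced the values: the function names are hypothetical, the format list is read as a set of accepted values, format matching is assumed to be case-insensitive, and the "Syslog" / "HEC" substrings are matched exactly as quoted.

```python
from typing import Optional

# Parser formats treated as unambiguous, per item 1 above.
AUTHORITATIVE_FORMATS = {"syslog", "cef", "rfc", "w3c", "custom-syslog"}

# Substrings of ingestion_method that also make the parser authoritative.
AUTHORITATIVE_METHOD_HINTS = ("Syslog", "HEC")


def parser_is_authoritative(parser_meta: Optional[dict]) -> bool:
    """True when the bound parser's metadata settles the classification."""
    if not parser_meta:
        return False  # no parser binding at all
    fmt = str(parser_meta.get("format", "")).lower()
    method = str(parser_meta.get("ingestion_method", ""))
    return fmt in AUTHORITATIVE_FORMATS or any(
        hint in method for hint in AUTHORITATIVE_METHOD_HINTS
    )


def classification_source(parser_meta: Optional[dict]) -> str:
    """Name which of the two sources decides the value for an entry."""
    return "parser" if parser_is_authoritative(parser_meta) else "vendor-knowledge"
```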

Confidence per entry is recorded in
.reorg-prep/inventory/transform_ocsf_classifications.tsv as one of
high (103), medium (17), or low (9). Low-confidence entries are
genuinely generic placeholders (json_generic_logs, sample_test_logs,
microservice_tracing_logs, etc.) where a more specific value is not
derivable; they use Other - {Explain: ...} with the reason inline.

palo_alto_networks_firewall/ is intentionally not modified because it is
being removed in PR #60 (open).

Resulting distribution:
  Syslog                                              56
  API Call                                            39
  Other - {object store / Event Hub / agent / etc.}   34

Auth distribution:
  N/A (syslog / file-based / generic)                 75
  API Key & Secret                                    20
  OAuth                                               18
  IAM Role                                             8
  Bearer Token                                         7
  Other (Kafka SASL)                                   1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
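
The three breakdowns in this commit message (confidence, ingest mode, auth) should each account for all 129 files. A short consistency check, using only the counts quoted above (the dictionary and function names are illustrative, and the "Other" and "N/A" labels are abbreviated):

```python
EXPECTED_TOTAL = 129  # transform_ocsf/ metadata.yaml files touched

BREAKDOWNS = {
    "confidence": {"high": 103, "medium": 17, "low": 9},
    "ingest_mode": {"Syslog": 56, "API Call": 39, "Other": 34},
    "auth_type": {
        "N/A": 75,
        "API Key & Secret": 20,
        "OAuth": 18,
        "IAM Role": 8,
        "Bearer Token": 7,
        "Other (Kafka SASL)": 1,
    },
}


def breakdown_totals(breakdowns: dict) -> dict:
    """Sum each breakdown so it can be compared against the file count."""
    return {name: sum(counts.values()) for name, counts in breakdowns.items()}


if __name__ == "__main__":
    for name, total in breakdown_totals(BREAKDOWNS).items():
        status = "ok" if total == EXPECTED_TOTAL else "MISMATCH"
        print(f"{name}: {total} ({status})")
```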
nate-smalls-s1 pushed a commit that referenced this pull request Apr 27, 2026
Removes the following directories from pipelines/community/transform_ocsf/:

  aws_cloudtrail/
  aws_guardduty/
  darktrace/
  gcp_audit_logs/
  microsoft_365/
  okta/
  wiz_issue/

Each entry shares the same broken-legacy fingerprint (matching
palo_alto_networks_firewall/ from #60):

- Sub-passing grade (D or F).
- verdict: analyzer_limit (the automated grader could not validate the
  serializer's OCSF output).
- class_uid: null (no valid OCSF class is produced).
- required_field_coverage_pct: 0.
- source_name lacks the -latest versioning suffix used by every working
  entry in the directory.
- No matching upstream parser in parsers/community/.
- Long-form Python-port style code (632 to 1720 lines), imported from the
  Observo platform UI rather than via the standard contributor path.
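
The fingerprint above can be expressed as one predicate over an entry's grading record. This is a sketch under assumptions: `matches_broken_legacy` is a hypothetical name, the `grade` key is assumed (the other keys appear in the list above), and the line-count and import-origin criteria are omitted because they are not single metadata fields.

```python
def matches_broken_legacy(entry: dict, parser_names: set[str]) -> bool:
    """True when an entry meets every metadata criterion in the fingerprint.

    entry        - grading/metadata record for one transform (assumed shape)
    parser_names - directory names present under parsers/community/
    """
    source_name = str(entry.get("source_name", ""))
    return (
        entry.get("grade") in {"D", "F"}                    # sub-passing grade
        and entry.get("verdict") == "analyzer_limit"        # grader could not validate
        and entry.get("class_uid") is None                  # no valid OCSF class
        and entry.get("required_field_coverage_pct") == 0   # no required fields covered
        and not source_name.endswith("-latest")             # unversioned source_name
        and source_name not in parser_names                 # no upstream parser
    )
```

Requiring every criterion at once keeps the check conservative: an entry that fails on grade alone but binds to a tracked parser is not flagged.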

Each removed entry has at least one working alternative covering the same
vendor cluster:

  aws_cloudtrail/    -> aws_*/transform_ocsf/ entries that bind to
                       parsers/community/<name>-latest/ (signed_off, B+ grade)
  aws_guardduty/     -> aws_guardduty_logs/ (B/85, signed_off, class_uid=2004)
  darktrace/         -> darktrace_darktrace_logs/ (B/85, signed_off,
                       class_uid=2004)
  gcp_audit_logs/    -> use the bound-parser alternatives in the same vendor
                       cluster
  microsoft_365/     -> microsoft_365_mgmt_api_logs/ (B/82, signed_off,
                       class_uid=6003)
  okta/              -> okta_logs/ (B/85, signed_off, class_uid=3002) and
                       okta_ocsf_logs/ (B/85, signed_off, class_uid=3002)
  wiz_issue/         -> wiz_cloud_security_logs/ (B/85, signed_off,
                       class_uid=2004)

No serializer logic, no other metadata, no pipeline JSON in the surviving
entries was modified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 participants