pipelines: drop F-graded PAN-OS firewall transform; document PAN-OS variants #60
Merged
nate-smalls-s1 merged 3 commits into Sentinel-One:main on Apr 27, 2026
Removes pipelines/community/transform_ocsf/palo_alto_networks_firewall/.

Reasons:
- Graded F (analyzer_limit, 0% required_field_coverage_pct).
- Uses non-standard class_uid=99602001 (SentinelOne Security Alert Extended), distinct from the rest of the PAN-OS cluster, which uses class_uid=4001 (Network Activity).
- No matching upstream parser in parsers/community/. Its source_name (palo_alto_networks_firewall) lacks the -latest versioning suffix used by every other PAN-OS entry, indicating it does not bind to a tracked parser.
- Marked as imported "from Observo platform UI" in metadata, rather than via the standard contributor import path used by the rest of the PAN-OS entries.

The three remaining PAN-OS transforms (paloalto_logs/, paloalto_alternate_logs/, paloalto_vpn_logs/) each bind cleanly to a corresponding parser in parsers/community/ and are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
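The parser-binding reason in the commit message above can be illustrated with a small check. This is a hypothetical sketch, not a script from the repository: the function name binding_status and the idea of passing in the set of parser directory names are assumptions; only the -latest suffix convention and the parsers/community/ layout come from the commit message.

```python
def binding_status(source_name: str, parser_dirs: set[str]) -> str:
    """Classify how a transform's source_name relates to parsers/community/.

    parser_dirs holds the directory names found under parsers/community/.
    It is passed in explicitly so the sketch needs no filesystem access.
    """
    if not source_name.endswith("-latest"):
        return "unversioned"      # lacks the -latest versioning suffix
    if source_name not in parser_dirs:
        return "missing-parser"   # suffix present, but no such parser directory
    return "bound"


# Parser names taken from the three surviving PAN-OS transforms.
parsers = {
    "paloalto_logs-latest",
    "paloalto_alternate_logs-latest",
    "paloalto_vpn_logs-latest",
}

for name in ("paloalto_logs-latest", "palo_alto_networks_firewall"):
    print(name, "->", binding_status(name, parsers))
```

Under these assumptions the removed entry is reported as unversioned, while the surviving entries are reported as bound.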
…ding

Each PAN-OS OCSF transform in pipelines/community/transform_ocsf/ binds to a specific parser in parsers/community/ via the source_name field in the pipeline JSON. Until now the metadata.yaml purpose fields described all three variants identically, leaving users no way to choose between them.

This commit rewrites each purpose field to declare:
- The parser directory it is bound to (parsers/community/<name>-latest/)
- The field-name convention it expects from that parser's output
- Its activity_id derivation strategy
- Cross-references to sibling variants

Specifically:
- paloalto_logs/: bound to paloalto_logs-latest. Expects the standard PAN-OS field-name convention (sourceip / source_port / type). activity_id is action-derived.
- paloalto_alternate_logs/: bound to paloalto_alternate_logs-latest. Expects an alternate field-name convention (srcip / srcport / logtype, plus verdict, threat_severity, threat_name, subtype). activity_id is logType-aware (TRAFFIC -> traffic activity, THREAT -> Malicious Activity 6, URL -> URL Activity 5). Also normalizes numeric protocol identifiers and accepts numeric severity strings.
- paloalto_vpn_logs/: bound to paloalto_vpn_logs-latest. PAN-OS GlobalProtect VPN traffic logs.

No serializer logic changes; this is documentation only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
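The logType-aware activity_id derivation described for paloalto_alternate_logs/ might look roughly like the sketch below. It is written in Python for illustration only and is not the transform's actual code. The names LOGTYPE_ACTIVITY and activity_id_for are hypothetical, matching log types case-insensitively is an assumption, and because the commit message gives no numeric ID for TRAFFIC the caller has to supply one.

```python
from typing import Optional

# IDs quoted in the commit message. TRAFFIC has no number there,
# so it is deliberately left out of this table.
LOGTYPE_ACTIVITY = {"THREAT": 6, "URL": 5}


def activity_id_for(logtype: str, traffic_activity_id: int) -> Optional[int]:
    """Map a PAN-OS log type to an activity_id.

    traffic_activity_id is whatever ID the transform uses for TRAFFIC;
    the commit message does not state it, so it is a parameter here.
    Returns None for log types the commit message does not mention.
    """
    key = logtype.strip().upper()  # case-insensitive matching is an assumption
    if key == "TRAFFIC":
        return traffic_activity_id
    return LOGTYPE_ACTIVITY.get(key)


print(activity_id_for("THREAT", traffic_activity_id=0))
print(activity_id_for("url", traffic_activity_id=0))
```

Returning None for unlisted log types keeps the sketch from guessing at a fallback the commit message does not describe.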
…binding

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5 tasks
nate-smalls-s1 approved these changes on Apr 27, 2026
nate-smalls-s1 pushed a commit that referenced this pull request on Apr 27, 2026
Adds the new metadata fields introduced by #59 to all 129 existing transform_ocsf/ pipeline metadata.yaml files. The fields are inserted immediately after the existing ingestion_method line in each file. No serializer logic, no pipeline JSON, no other metadata changed.

Values were derived per entry by combining:
1. Bound parser metadata (parsers/community/<source_name>/metadata.yaml) when the parser declares format=syslog/CEF/RFC/w3c/custom-syslog or ingestion_method containing "Syslog" or "HEC" -- the parser is authoritative when its declaration is unambiguous.
2. Vendor and product knowledge for the ~90 entries where the parser metadata is unclear (gron format with "streaming" or "unknown" ingestion_method, or no parser binding at all).

Examples:
- Cisco network kit (firewalls, ASA, Meraki, ISE, etc.) -> Syslog
- Microsoft 365 / Entra / Defender management surfaces -> API Call (OAuth)
- AWS managed services delivering to S3 (CloudTrail, ELB, Route53 Resolver, GuardDuty export, VPC flow) -> Other - {object store with SQS notifications} (IAM Role)
- Azure Event Hub-delivered streams (signin, defender email) -> Other - {Azure Event Hub stream (AMQP/Kafka protocol)} (OAuth)
- SaaS REST APIs (Okta, Snyk, Wiz, Tenable, Mimecast, Netskope, Proofpoint, GitHub, Google Workspace, Cloudflare, etc.) -> API Call with the vendor's typical auth (Bearer Token, API Key & Secret, or OAuth)

Confidence per entry is recorded in .reorg-prep/inventory/transform_ocsf_classifications.tsv as one of high (103), medium (17), or low (9). Low-confidence entries are genuinely generic placeholders (json_generic_logs, sample_test_logs, microservice_tracing_logs, etc.) where a more specific value is not derivable; they use Other - {Explain: ...} with the reason inline.

palo_alto_networks_firewall/ is intentionally not modified because it is being removed in PR #60 (open).

Resulting distribution:
  Syslog                                             56
  API Call                                           39
  Other - {object store / Event Hub / agent / etc.}  34

Auth distribution:
  N/A (syslog / file-based / generic)  75
  API Key & Secret                     20
  OAuth                                18
  IAM Role                              8
  Bearer Token                          7
  Other (Kafka SASL)                    1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
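Rule 1 in the commit message above (when the bound parser's declaration counts as authoritative) can be sketched as a predicate. This is an illustrative assumption about how the rule could be expressed, not the tooling actually used. The function name parser_is_authoritative is hypothetical, and the sketch assumes the parser's metadata.yaml has already been loaded into a dict with format and ingestion_method keys. Matching is kept case-sensitive, exactly as the values are written in the commit message.

```python
# Formats the commit message lists as unambiguous parser declarations.
AUTHORITATIVE_FORMATS = {"syslog", "CEF", "RFC", "w3c", "custom-syslog"}

# Substrings of ingestion_method that also make the parser authoritative.
AUTHORITATIVE_METHOD_MARKERS = ("Syslog", "HEC")


def parser_is_authoritative(meta: dict) -> bool:
    """Return True when the parser metadata alone decides the classification.

    meta is a parser's metadata.yaml loaded as a dict (loading is out of
    scope for this sketch). Missing keys are treated as empty strings.
    """
    fmt = str(meta.get("format", ""))
    method = str(meta.get("ingestion_method", ""))
    if fmt in AUTHORITATIVE_FORMATS:
        return True
    return any(marker in method for marker in AUTHORITATIVE_METHOD_MARKERS)


print(parser_is_authoritative({"format": "CEF"}))
print(parser_is_authoritative({"format": "gron", "ingestion_method": "streaming"}))
```

Entries for which the predicate is false would fall through to rule 2, where vendor and product knowledge decides the value.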
This was referenced Apr 27, 2026
nate-smalls-s1 pushed a commit that referenced this pull request on Apr 27, 2026
Removes the following directories from pipelines/community/transform_ocsf/:
- aws_cloudtrail/
- aws_guardduty/
- darktrace/
- gcp_audit_logs/
- microsoft_365/
- okta/
- wiz_issue/

Each entry shares the same broken-legacy fingerprint (matching palo_alto_networks_firewall/ from #60):
- Sub-passing grade (D or F).
- verdict: analyzer_limit (the automated grader could not validate the serializer's OCSF output).
- class_uid: null (no valid OCSF class is produced).
- required_field_coverage_pct: 0.
- source_name lacks the -latest versioning suffix used by every working entry in the directory.
- No matching upstream parser in parsers/community/.
- Long-form Python-port style code (632 to 1720 lines), imported from the Observo platform UI rather than via the standard contributor path.

Each removed entry has at least one working alternative covering the same vendor cluster:
- aws_cloudtrail/ -> aws_*/transform_ocsf/ entries that bind to parsers/community/<name>-latest/ (signed_off, B+ grade)
- aws_guardduty/ -> aws_guardduty_logs/ (B/85, signed_off, class_uid=2004)
- darktrace/ -> darktrace_darktrace_logs/ (B/85, signed_off, class_uid=2004)
- gcp_audit_logs/ -> use the bound-parser alternatives in the same vendor cluster
- microsoft_365/ -> microsoft_365_mgmt_api_logs/ (B/82, signed_off, class_uid=6003)
- okta/ -> okta_logs/ (B/85, signed_off, class_uid=3002) and okta_ocsf_logs/ (B/85, signed_off, class_uid=3002)
- wiz_issue/ -> wiz_cloud_security_logs/ (B/85, signed_off, class_uid=2004)

No serializer logic, no other metadata, no pipeline JSON in the surviving entries was modified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
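The machine-checkable part of the broken-legacy fingerprint above could be expressed as a function that lists which signals an entry matches. This is a sketch under stated assumptions: the function name fingerprint_hits and the grade key are hypothetical, the input dict stands in for wherever the grader results and pipeline metadata are actually stored, and both example records are invented for illustration rather than copied from the repository.

```python
def fingerprint_hits(entry: dict) -> list[str]:
    """Return the broken-legacy signals that a transform entry matches."""
    hits = []
    if entry.get("grade") in ("D", "F"):
        hits.append("sub-passing grade")
    if entry.get("verdict") == "analyzer_limit":
        hits.append("verdict analyzer_limit")
    if entry.get("class_uid") is None:
        hits.append("class_uid null")
    if entry.get("required_field_coverage_pct") == 0:
        hits.append("0% required-field coverage")
    if not str(entry.get("source_name", "")).endswith("-latest"):
        hits.append("no -latest suffix")
    return hits


# Invented records, for illustration only.
example_legacy = {
    "grade": "F",
    "verdict": "analyzer_limit",
    "class_uid": None,
    "required_field_coverage_pct": 0,
    "source_name": "example_vendor",
}
example_working = {
    "grade": "B",
    "verdict": "signed_off",
    "class_uid": 2004,
    "required_field_coverage_pct": 100,
    "source_name": "example_vendor_logs-latest",
}

print(fingerprint_hits(example_legacy))
print(fingerprint_hits(example_working))
```

The remaining signals (no matching upstream parser, long-form imported code) need repository access or human review and are left out of the sketch.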
7 tasks
Summary
Follow-up to #59 (merged). Drops one broken PAN-OS OCSF transform and documents the upstream-parser binding for the three remaining PAN-OS transforms so users can choose between them without reading the Lua.
What changed

Removed pipelines/community/transform_ocsf/palo_alto_networks_firewall/:
- Graded F (analyzer_limit, 0% required-field coverage).
- Uses non-standard class_uid=99602001 (SentinelOne Security Alert Extended), distinct from the rest of the PAN-OS cluster, which uses class_uid=4001 (Network Activity).
- No matching upstream parser in parsers/community/. Its source_name (palo_alto_networks_firewall) lacks the -latest versioning suffix used by every other PAN-OS entry, indicating it does not bind to a tracked parser.

Documented the upstream-parser binding in each remaining PAN-OS transform's metadata.yaml purpose field:
- paloalto_logs/: bound to parsers/community/paloalto_logs-latest/. Standard PAN-OS field-name convention (sourceip / source_port / type); action-derived activity_id.
- paloalto_alternate_logs/: bound to parsers/community/paloalto_alternate_logs-latest/. Alternate field-name convention (srcip / srcport / logtype, plus verdict / threat_severity / threat_name / subtype); logType-aware activity_id (THREAT -> Malicious Activity 6, URL -> URL Activity 5); normalizes numeric protocol identifiers; accepts numeric severity strings.
- paloalto_vpn_logs/: bound to parsers/community/paloalto_vpn_logs-latest/. PAN-OS GlobalProtect VPN traffic.

CHANGELOG entries appended to the [Unreleased] section introduced in pipelines: reorganize around ingest mode #59.

What is NOT in this PR (intentional)
- The paloalto_* cluster keeps its current naming because each entry is hard-bound to a parser in parsers/community/ with the same name. A naming-consistency rename should coordinate moves across both parsers/community/ and pipelines/community/transform_ocsf/ and is deferred to a follow-up.
- … ingest_mode / auth_type fields onto existing transform_ocsf/ entries; that is a separate follow-up PR.

Test plan
- git log --stat shows one directory deletion (4 files, 841 lines) plus three single-block metadata.yaml purpose-field edits
- paloalto_*/metadata.yaml files render cleanly on github.com
- … palo_alto_networks_firewall/ path
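The field-name conventions documented under "What changed" suggest a simple way to tell which PAN-OS transform a parsed event fits. The sketch below is hypothetical and not part of the repository: CONVENTIONS and suggest_variant are made-up names, only the three core field names per variant are checked, and paloalto_vpn_logs/ is omitted because the PR description lists no field names for it.

```python
from typing import Optional

# Core field names per variant, as listed in the PR description.
CONVENTIONS = {
    "paloalto_logs": {"sourceip", "source_port", "type"},
    "paloalto_alternate_logs": {"srcip", "srcport", "logtype"},
}


def suggest_variant(event: dict) -> Optional[str]:
    """Suggest a transform whose expected field names all appear in the event.

    Returns None when no convention, or more than one, matches, so an
    ambiguous event is never silently assigned to a variant.
    """
    keys = set(event)
    matches = [name for name, fields in CONVENTIONS.items() if fields <= keys]
    return matches[0] if len(matches) == 1 else None


print(suggest_variant({"srcip": "10.0.0.1", "srcport": 443, "logtype": "THREAT"}))
print(suggest_variant({"sourceip": "10.0.0.1", "source_port": 443, "type": "TRAFFIC"}))
```

In practice the binding is fixed by source_name in the pipeline JSON; a check like this is only a convenience for deciding which parser and transform pair to deploy.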