pipelines: reorganize around ingest mode #59
Merged
nate-smalls-s1 merged 5 commits into Sentinel-One:main on Apr 27, 2026
…ructure Adds scaffolding for ingest-mode-first organization of community pipelines. Each leaf README documents what belongs there and the required metadata.yaml fields (ingest_mode, auth_type). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…f/paloalto_logs)

The serializer at pipelines/community/serializers/Palo Alto Networks/serializer.lua
covered only TRAFFIC and THREAT log types and produced OCSF Network Activity
(class_uid=4001) -- a strict subset of the existing community transform at
pipelines/community/transform_ocsf/paloalto_logs/, which is signed off, has 100%
required-field coverage, and handles the same OCSF class plus a broader range of
log types. Removing the orphan and the now-empty serializers/ umbrella.

Out-of-scope follow-ups:
- pipelines/community/transform_ocsf/palo_alto_networks_firewall/ is graded F
  (analyzer_limit, 0% required_field_coverage_pct) -- needs a fix or removal.
- paloalto_logs/ vs paloalto_alternate_logs/ may be consolidatable; the latter
  appears to differ only in accepting variant field names (logtype/log_type/type).
- Naming consistency across the paloalto_* cluster.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds pipelines/community/README.md establishing:
- The directory tree (push/{syslog,hec}, pull/{api,object_store},
community/transform_ocsf/)
- Required metadata.yaml fields including ingest_mode and auth_type enums
- Naming conventions (lowercase, underscored, no spaces)
The new schema applies to new pipelines added after this PR; existing
transform_ocsf/ entries will be backfilled in a follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
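The naming convention and required fields described in this commit can be checked mechanically. The following is a minimal sketch, not the repository's own tooling: the regular expression is one plausible reading of "lowercase, underscored, no spaces", and the helper names (is_valid_name, to_convention, missing_fields) are hypothetical.

```python
from __future__ import annotations

import re

# Assumption: a conforming name is lowercase letters and digits, with single
# underscores as separators. The README's exact rule may be stricter or looser.
_NAME_RE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")

# The two fields this PR adds to the metadata.yaml schema.
REQUIRED_FIELDS = ("ingest_mode", "auth_type")


def is_valid_name(name: str) -> bool:
    """Return True when the directory name follows the assumed convention."""
    return _NAME_RE.fullmatch(name) is not None


def to_convention(raw: str) -> str:
    """Lowercase the name and collapse each non-alphanumeric run to one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", raw.lower()).strip("_")


def missing_fields(metadata: dict) -> list[str]:
    """List required fields that are absent or empty in a parsed metadata.yaml."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]
```

For example, to_convention("Palo Alto Networks") yields "palo_alto_networks", and is_valid_name rejects the original spaced form.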
Updates the Repository layout block to reflect push/{syslog,hec} and
pull/{api,object_store}, replaces the Pipelines Installation Guide with a
shorter Pipelines section pointing at the new structure, and updates the
Metadata requirements appendix with the new ingest_mode and auth_type fields.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nate-smalls-s1 approved these changes on Apr 27, 2026
This was referenced Apr 27, 2026
nate-smalls-s1 pushed a commit that referenced this pull request on Apr 27, 2026
Adds the new metadata fields introduced by #59 to all 129 existing transform_ocsf/
pipeline metadata.yaml files. The fields are inserted immediately after the
existing ingestion_method line in each file. No serializer logic, no pipeline
JSON, no other metadata changed.

Values were derived per entry by combining:
1. Bound parser metadata (parsers/community/<source_name>/metadata.yaml) when the
   parser declares format=syslog/CEF/RFC/w3c/custom-syslog or ingestion_method
   containing "Syslog" or "HEC" -- the parser is authoritative when its
   declaration is unambiguous.
2. Vendor and product knowledge for the ~90 entries where the parser metadata is
   unclear (gron format with "streaming" or "unknown" ingestion_method, or no
   parser binding at all). Examples:
   - Cisco network kit (firewalls, ASA, Meraki, ISE, etc.) -> Syslog
   - Microsoft 365 / Entra / Defender management surfaces -> API Call (OAuth)
   - AWS managed services delivering to S3 (CloudTrail, ELB, Route53 Resolver,
     GuardDuty export, VPC flow) -> Other - {object store with SQS notifications}
     (IAM Role)
   - Azure Event Hub-delivered streams (signin, defender email) -> Other -
     {Azure Event Hub stream (AMQP/Kafka protocol)} (OAuth)
   - SaaS REST APIs (Okta, Snyk, Wiz, Tenable, Mimecast, Netskope, Proofpoint,
     GitHub, Google Workspace, Cloudflare, etc.) -> API Call with the vendor's
     typical auth (Bearer Token, API Key & Secret, or OAuth)

Confidence per entry is recorded in
.reorg-prep/inventory/transform_ocsf_classifications.tsv as one of high (103),
medium (17), or low (9). Low-confidence entries are genuinely generic
placeholders (json_generic_logs, sample_test_logs, microservice_tracing_logs,
etc.) where a more specific value is not derivable; they use
Other - {Explain: ...} with the reason inline.

palo_alto_networks_firewall/ is intentionally not modified because it is being
removed in PR #60 (open).

Resulting distribution:
  Syslog                                              56
  API Call                                            39
  Other - {object store / Event Hub / agent / etc.}   34

Auth distribution:
  N/A (syslog / file-based / generic)                 75
  API Key & Secret                                    20
  OAuth                                               18
  IAM Role                                             8
  Bearer Token                                         7
  Other (Kafka SASL)                                   1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
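The "inserted immediately after the existing ingestion_method line" step could look roughly like the sketch below. It is an illustration only and not the script used for this commit: it assumes ingestion_method is a top-level, single-line key, and the function name backfill_fields is hypothetical. Values are written as double-quoted scalars via json.dumps, since a JSON string is also a valid YAML double-quoted string and values such as "Other - {...}" contain braces and colons.

```python
import json


def backfill_fields(text: str, ingest_mode: str, auth_type: str) -> str:
    """Insert ingest_mode and auth_type right after the ingestion_method line.

    Assumption: ingestion_method is a top-level, single-line key. Files that
    already carry ingest_mode are returned unchanged, so reruns are harmless.
    """
    lines = text.splitlines(keepends=True)
    if any(line.startswith("ingest_mode:") for line in lines):
        return text

    out = []
    inserted = False
    for line in lines:
        # Make sure the anchor line ends with a newline before appending after it.
        out.append(line if line.endswith("\n") else line + "\n")
        if not inserted and line.startswith("ingestion_method:"):
            out.append(f"ingest_mode: {json.dumps(ingest_mode)}\n")
            out.append(f"auth_type: {json.dumps(auth_type)}\n")
            inserted = True

    if not inserted:
        raise ValueError("no top-level ingestion_method line found")
    return "".join(out)
```

Because nothing else in the file is rewritten, the resulting diff per metadata.yaml is two added lines, which matches the "no other metadata changed" claim above.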
This was referenced Apr 27, 2026
nate-smalls-s1 pushed a commit that referenced this pull request on Apr 27, 2026
Moves 91 community pipeline directories from
pipelines/community/transform_ocsf/<name>/ into the ingest-mode-first taxonomy
introduced in #59:

  pipelines/push/syslog/<vendor>/<product>/         57 entries
  pipelines/pull/api/<vendor>/<product>/            29 entries
  pipelines/pull/object_store/<vendor>/<product>/    5 entries

The mode bucket is determined by each entry's ingest_mode field (backfilled in
#61). The vendor and product split is derived per entry from the upstream parser
binding and vendor/product convention; collisions across the cluster (Cisco
Meraki, Fortinet, Cloudflare, Zscaler, Microsoft, etc.) are disambiguated with
explicit product-name overrides documented in
.reorg-prep/inventory/transform_ocsf_migration_plan.tsv. History is preserved on
every entry (git mv).

What stays in pipelines/community/transform_ocsf/ (15 entries):
- Generic / template / unknown-vendor entries: agent_metrics_logs,
  generic_access_logs, inngate_gateway_logs, json_generic_logs,
  json_nested_kv_logs, leef_template_logs, log4shell_detection_logs,
  mail_server_logs, microservice_tracing_logs, sample_test_logs,
  spam_detection_logs, sql_database_logs, syslog_space_delimited_logs, vpc_logs,
  jruby_application_logs.

What is NOT in this PR (intentional):
- 23 entries scheduled for removal in #62 (broken-legacy, 7) and #63 (first-party
  ingestion paths, 16) are NOT moved; they remain in transform_ocsf/ until those
  PRs merge. This PR has no overlap or conflict with #62/#63 -- merge order does
  not matter.
- No serializer logic, no metadata.yaml content, and no pipeline JSON content was
  modified. Every change is a directory rename.
- No naming-consistency cleanup (e.g., paloalto_* -> palo_alto/*) is applied yet;
  that is a separate follow-up.

The pipelines/push/{syslog,hec}/ and pipelines/pull/{api,object_store}/
directories are now populated -- the empty scaffolding from #59 finally has
content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
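As a rough illustration of how an ingest_mode value could select the destination bucket, consider the sketch below. The mapping is an assumption inferred from the values quoted in the backfill commit ("Syslog", "API Call", "Other - {object store ...}"); the actual per-entry plan lives in the migration TSV, and push/hec is left out because this commit lists no entries for it. The function names and the cisco/asa and aws/cloudtrail splits are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def bucket_for(ingest_mode: str) -> Optional[str]:
    """Map an ingest_mode value to a mode bucket, or None when no bucket applies.

    Assumption: these string matches mirror the values quoted in the backfill
    commit; the real migration used explicit per-entry decisions.
    """
    mode = ingest_mode.strip().lower()
    if mode == "syslog":
        return "push/syslog"
    if mode == "api call":
        return "pull/api"
    if mode.startswith("other") and "object store" in mode:
        return "pull/object_store"
    return None


def destination(ingest_mode: str, vendor: str, product: str) -> Optional[PurePosixPath]:
    """Build pipelines/<bucket>/<vendor>/<product>, or None for entries that stay put."""
    bucket = bucket_for(ingest_mode)
    if bucket is None:
        return None  # entry remains under community/transform_ocsf/
    return PurePosixPath("pipelines") / bucket / vendor / product
```

Under these assumptions, destination("Syslog", "cisco", "asa") resolves to pipelines/push/syslog/cisco/asa, while a generic "Other - {Explain: ...}" entry returns None and stays where it is.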
Summary
Reorganizes pipelines/ so contributors immediately see ingest mode (push vs pull), introduces ingest_mode and auth_type fields in pipeline metadata.yaml, and removes an orphan PAN-OS serializer that is functionally subsumed by an existing community transform.
What changed
- New tree shape under pipelines/:
  - push/syslog/<vendor>/<product>/
  - push/hec/<vendor>/<product>/
  - pull/api/<vendor>/<product>/
  - pull/object_store/<vendor>/<product>/
  - community/transform_ocsf/<vendor>/<product>/ (already existed; layout retained)
  Each new leaf has a .README.md documenting what belongs there and the required metadata.yaml fields.
- Removed the orphan PAN-OS serializer at pipelines/community/serializers/Palo Alto Networks/serializer.lua. It is functionally subsumed by pipelines/community/transform_ocsf/paloalto_logs/, which targets the same OCSF class (Network Activity, class_uid=4001), is signed off with 100% required-field coverage, and handles a broader range of log types. The now-empty serializers/ umbrella is removed alongside it.
- New metadata fields added to the pipeline metadata.yaml schema: ingest_mode and auth_type. Documented in pipelines/community/README.md and the top-level README.md. Schema applies to new pipelines added after this PR; existing entries in transform_ocsf/ will be backfilled in a follow-up.

What is NOT in this PR (intentional)
- Backfill of existing transform_ocsf/ entries.
- Naming consistency across the paloalto_* cluster.
- Fix or removal of transform_ocsf/palo_alto_networks_firewall/ (graded F / analyzer_limit / 0% required_field_coverage_pct); flagged as a follow-up.
Test plan
- git log --stat shows new push/ and pull/ directories created with leaf .README.md files, removal of the orphan PAN-OS serializer, and removal of the now-empty serializers/ umbrella
- pipelines/community/transform_ocsf/paloalto_logs/serializer.lua is unchanged and remains the canonical PAN-OS Network Activity transform (still graded signed_off, 100% required-field coverage)
- pipelines/community/README.md renders cleanly on github.com
- README.md
- Run transform_ocsf/paloalto_logs/serializer.lua against a PAN-OS event sample and confirm OCSF Network Activity output (class_uid=4001) matches the prior baseline
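
The first test-plan item can also be checked against a working tree rather than by reading git log --stat. A minimal sketch, assuming the four new leaves sit directly under pipelines/ and that each carries a README named as written in this description (.README.md); the function name missing_leaves is hypothetical and the file name is a parameter in case the actual name differs.

```python
from pathlib import Path

# The four new leaf directories introduced by this PR, relative to pipelines/.
LEAVES = ("push/syslog", "push/hec", "pull/api", "pull/object_store")


def missing_leaves(pipelines_root: str, readme_name: str = ".README.md") -> list:
    """Return the leaves under pipelines_root that lack the expected README file."""
    root = Path(pipelines_root)
    return [leaf for leaf in LEAVES if not (root / leaf / readme_name).is_file()]
```

Calling missing_leaves("pipelines") from the repository root returns an empty list when every leaf README is present.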