Merged
27 changes: 27 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,33 @@ All notable changes to the AI-SIEM repository will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Changed - pipelines/ reorganization

The `pipelines/` directory has been restructured around ingestion mode rather
than contributor provenance. New layout:

- `pipelines/push/syslog/<vendor>/<product>/`
- `pipelines/push/hec/<vendor>/<product>/`
- `pipelines/pull/api/<vendor>/<product>/`
- `pipelines/pull/object_store/<vendor>/<product>/`
- `pipelines/community/transform_ocsf/<vendor>/<product>/`

`metadata.yaml` for pipelines now includes `ingest_mode` and `auth_type` fields.
The new schema applies to new pipelines added after this release; existing
entries in `transform_ocsf/` will be backfilled in a follow-up. See
`pipelines/community/README.md` for the full schema and naming conventions.

### Removed - orphan PAN-OS serializer

`pipelines/community/serializers/Palo Alto Networks/serializer.lua` has been
removed. It is functionally subsumed by
`pipelines/community/transform_ocsf/paloalto_logs/`, which is signed off with
100% required-field coverage and produces the same OCSF class (Network
Activity, `class_uid=4001`) for a broader range of log types. The now-empty
`pipelines/community/serializers/` umbrella has been removed alongside it.

## [1.3.0] - 2025-10-28

### Added
106 changes: 60 additions & 46 deletions README.md
@@ -23,8 +23,15 @@ ai-siem/ # AI SIEM core structure (260+ components)
├── detections/ # Detection rules (8 detections with metadata)
│ └── community/ # Community-contributed detection rules
├── monitors/ # Python monitoring scripts for Dataset Agent (log_gen, maxmind, powerquery)
├── pipelines/ # Observo Pipeline Templates for data transformation (5 pipelines)
│ └── community/ # AWS S3, Cisco Duo, Netskope, Okta, ProofPoint
├── pipelines/ # Observo pipeline templates
│ ├── push/ # Vendor pushes to us (syslog/CEF/LEEF/KV or direct HEC)
│ │ ├── syslog/<vendor>/<product>/
│ │ └── hec/<vendor>/<product>/
│ ├── pull/ # We fetch from the vendor (REST API or object store)
│ │ ├── api/<vendor>/<product>/
│ │ └── object_store/<vendor>/<product>/
│ └── community/
│ └── transform_ocsf/<vendor>/<product>/ # OCSF normalization overlays
├── parsers/ # Parsing logic and configurations (165 parsers)
│ ├── community/ # 148 community parsers (*.conf + metadata)
│ └── sentinelone/ # 17 official marketplace parsers (*.conf + metadata)
@@ -113,50 +120,32 @@ The monitors directory contains Python scripts for use with the Dataset Agent:

---

## Pipelines Installation Guide

### Observo Pipeline Integration
The pipelines directory contains pre-configured Observo pipeline templates for ingesting and transforming data from various sources:

#### Available Pipeline Templates
1. **AWS S3 CloudTrail** (`aws_s3_cloudtrail/`)
- Ingests CloudTrail logs from S3 buckets via SQS/SNS
- Transforms to OCSF format with extensive field mapping
- **Required credentials:**
- `auth.assume_role`: `arn:aws:iam::<your_accountid>:role/<role you created>`
- `auth.external_id`: Your external ID for role assumption

2. **Cisco Duo Logs** (`cisco_duo_logs/`)
- Collects authentication, administrator, and telephony logs
- Supports checkpointing for incremental data collection
- **Required credentials:**
- `DUO_API_HOST`: `<your_host>.duosecurity.com`
- `DUO_INTEGRATION_KEY`: Your integration key
- `DUO_SECRET_KEY`: Your secret key

3. **Netskope Alerts** (`netskope_alerts/`)
- Ingests Netskope security alerts
- Transforms to OCSF format

4. **Okta Log Collector** (`okta_log_collector/`)
- Collects Okta identity and access management logs
- Supports incremental log collection

5. **ProofPoint Logs** (`proofpoint_log/`)
- Ingests ProofPoint email security logs
- OCSF transformation included

### Pipeline Installation Steps
1. Import the JSON configuration file into your Observo instance
2. Update authentication credentials with your specific values
3. Configure the SentinelOne AI SIEM destination endpoint
4. Deploy and activate the pipeline

### Configuration Requirements
All pipelines require:
- **SentinelOne HEC Token**: Replace `********` with your actual token
- **Endpoint URL**: Verify the correct region endpoint (default: `https://ingest.us1.sentinelone.net`)
- **Source-specific credentials**: See individual pipeline requirements above
## Pipelines

The `pipelines/` directory holds Observo pipeline templates for SentinelOne
AI SIEM, organized by ingest mode:

- `pipelines/push/{syslog,hec}/<vendor>/<product>/` — vendor pushes events to us
- `pipelines/pull/{api,object_store}/<vendor>/<product>/` — we fetch from the vendor
- `pipelines/community/transform_ocsf/<vendor>/<product>/` — OCSF normalization
overlays that run on top of upstream-ingested data

The full directory taxonomy, required `metadata.yaml` fields, and naming
conventions are documented in [`pipelines/community/README.md`](pipelines/community/README.md).

### Installing a community pipeline

1. Navigate to the relevant `pipelines/{push,pull}/<mode>/<vendor>/<product>/`
   or `pipelines/community/transform_ocsf/<vendor>/<product>/` directory.
2. Import the JSON template into your Observo instance, or apply the Lua
   serializer to the appropriate transform stage.
3. Update authentication credentials per the `metadata.yaml` `dependencies`
   block.
4. Configure the SentinelOne AI SIEM HEC destination:
   - **HEC token**: replace the placeholder in the import.
   - **Endpoint URL**: verify the regional endpoint
     (default `https://ingest.us1.sentinelone.net`).
5. Deploy and activate.
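
The token substitution in step 4 can be scripted before import. The following is a minimal sketch, not part of the repository: it assumes the exported template masks the HEC token with the literal string `********`, and the `template` dictionary is a hypothetical fragment, since the real Observo export shape is not documented here. The function walks any JSON-like structure, so it makes no assumption about where the token sits.

```python
PLACEHOLDER = "********"  # assumed mask string; confirm against your template


def fill_placeholder(node, token, placeholder=PLACEHOLDER):
    """Return a copy of `node` with every string equal to `placeholder` replaced by `token`."""
    if isinstance(node, dict):
        return {key: fill_placeholder(value, token, placeholder) for key, value in node.items()}
    if isinstance(node, list):
        return [fill_placeholder(item, token, placeholder) for item in node]
    if node == placeholder:
        return token
    return node


# Hypothetical template fragment; a real export will look different.
template = {
    "destination": {
        "type": "SPLUNK_HEC_LOGS",
        "endpoint": "https://ingest.us1.sentinelone.net",
        "token": "********",
    },
    "sources": [{"name": "example"}],
}

filled = fill_placeholder(template, "my-hec-token")
```

If a template masks several different secrets with the same placeholder, this blanket replacement would write the HEC token into all of them; in that case, set the value by its key path instead.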

---

@@ -242,4 +231,29 @@ metadata_details:
  expected_behavior: "Describe the action or alert that should result"
  tags: "Optional tagging"
  version: "v1.0"

# Pipelines
# File: metadata.yaml
# Schema applies to new pipelines; existing entries will be backfilled in a follow-up.
# Top-level `grade:` block is produced by the automated grader — do not hand-author.
metadata_details:
  vendor: "<canonical_vendor_key>" # lowercase, underscored
  product: "<canonical_product_key>" # lowercase, underscored
  ingest_mode: "HEC | Syslog | API Call | Other - {Explain, e.g. websocket, object store}"
  auth_type: "N/A | HEC Token | OAuth | API Key & Secret | Bearer Token | Basic | mTLS | IAM Role | Other - {Explain}"
  syslog_format: "CEF | LEEF | RFC5424 | RFC3164 | Vendor KV" # optional, push/syslog/ only
  purpose: "What the pipeline ingests/transforms and into which OCSF classes"
  source_template: "Source template name as it appears in the pipeline manager"
  source_vendor: "Vendor display name"
  destination_template: "SentinelOne AI SIEM"
  destination_type: "SPLUNK_HEC_LOGS"
  transform_templates: "Description of OCSF / Lua serializer logic"
  input_schema: "Expected input record fields"
  output_schema: "Resulting OCSF event shape"
  scheduling: "Polling interval / event-driven / N/A"
  retry_behavior: "Backoff and failure handling"
  dependencies: "Auth credentials, IAM, queues, etc."
  performance_impact: "Throughput and tuning notes"
  tags: "Optional tagging"
  version: "v1.0"
```
118 changes: 118 additions & 0 deletions pipelines/community/README.md
@@ -0,0 +1,118 @@
# pipelines/community/

Community-contributed Observo pipeline templates for SentinelOne AI SIEM.

This directory holds parser/transform pipelines that bridge a vendor's log
format to OCSF and the AI SIEM HEC endpoint.

---

## Layout

```
pipelines/
├── push/                                    # vendor pushes events to us
│   ├── syslog/<vendor>/<product>/           # vendor-specific syslog/CEF/LEEF/KV
│   └── hec/<vendor>/<product>/              # vendors that POST direct to HEC
├── pull/                                    # we fetch events from the vendor
│   ├── api/<vendor>/<product>/              # REST/HTTP API polling
│   └── object_store/<vendor>/<product>/     # S3 / GCS / Azure Blob
└── community/
    └── transform_ocsf/<vendor>/<product>/   # OCSF normalization overlays
```

Each leaf (`<product>/`) contains a `metadata.yaml` plus one artifact: an
Observo pipeline export JSON for ingestion templates, or the Lua serializer
for `transform_ocsf/` overlays.

---

## What this directory accepts

1. **Ingestion templates**: pipelines that get a vendor's events into the
   AI SIEM. These belong under `push/` or `pull/`, depending on which side
   initiates the connection.

2. **OCSF transform overlays**: Lua serializers that normalize
   already-ingested data into OCSF. These belong under
   `community/transform_ocsf/`.

3. **Vendor-specific HEC shaping**: pipelines for vendors POSTing to HEC
   that need vendor-specific batch/retry/field-handling logic. These belong
   under `push/hec/`.

---

## `metadata.yaml` schema

> **New fields (`ingest_mode`, `auth_type`) apply to new pipelines added
> after this PR.** Existing entries in `transform_ocsf/` will be backfilled
> in a follow-up sweep — they should not be considered out of compliance
> until then.

In addition to the existing top-level `grade:` block (produced by the
automated grader; do not author by hand), each pipeline declares:

```yaml
metadata_details:
  vendor: "<canonical_vendor_key>" # lowercase, underscored
  product: "<canonical_product_key>" # lowercase, underscored

  ingest_mode: "..." # see enum below
  auth_type: "..." # see enum below

  # Optional, only when relevant
  syslog_format: "CEF | LEEF | RFC5424 | RFC3164 | Vendor KV"

  # Plus the standard pipeline narrative fields
  purpose: ...
  source_template: ...
  source_vendor: ...
  destination_template: "SentinelOne AI SIEM"
  destination_type: "SPLUNK_HEC_LOGS"
  transform_templates: ...
  input_schema: ...
  output_schema: ...
  scheduling: ...
  retry_behavior: ...
  dependencies: ...
  performance_impact: ...
  tags: [...]
  version: "v1.0"
```
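
A reviewer could check the new fields mechanically. The following is a minimal sketch, not the repository's grader: it validates an already-parsed `metadata_details` mapping against the `ingest_mode`, `auth_type`, and `syslog_format` values listed in this README. The function name `validate_metadata` and the choice to accept any non-empty text inside the braces of an `Other - {...}` value are assumptions.

```python
import re

# Fixed enum values as documented in this README.
INGEST_MODES = {"HEC", "Syslog", "API Call"}
AUTH_TYPES = {
    "N/A", "HEC Token", "OAuth", "API Key & Secret",
    "Bearer Token", "Basic", "mTLS", "IAM Role",
}
SYSLOG_FORMATS = {"CEF", "LEEF", "RFC5424", "RFC3164", "Vendor KV"}

# Assumption: an "Other" value keeps the braces and has some text inside them.
OTHER = re.compile(r"^Other - \{.+\}$")


def validate_metadata(details):
    """Return a list of error strings; an empty list means the fields conform."""
    errors = []
    for field, allowed in (("ingest_mode", INGEST_MODES), ("auth_type", AUTH_TYPES)):
        value = details.get(field)
        if value is None:
            errors.append(f"{field}: missing")
        elif value not in allowed and not OTHER.match(str(value)):
            errors.append(f"{field}: {value!r} is not a documented value")
    # syslog_format is optional, so it is only checked when present.
    syslog_format = details.get("syslog_format")
    if syslog_format is not None and syslog_format not in SYSLOG_FORMATS:
        errors.append(f"syslog_format: {syslog_format!r} is not a documented value")
    return errors


example = {
    "vendor": "palo_alto",
    "product": "panos",
    "ingest_mode": "Syslog",
    "auth_type": "N/A",
    "syslog_format": "CEF",
}
problems = validate_metadata(example)
```

The check takes a plain dictionary so it stays independent of any particular YAML loader; entries in `transform_ocsf/` that have not been backfilled yet would report the two fields as missing.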

### `ingest_mode` enum

The directory the pipeline lives in encodes push-vs-pull; `ingest_mode`
records the protocol/mechanism.

| Value | Meaning |
|--------------------------------|------------------------------------------------------|
| `HEC` | HTTP Event Collector |
| `Syslog` | Vendor syslog (RFC5424/3164, CEF, LEEF, vendor KV) |
| `API Call` | REST/HTTP API |
| `Other - {Explain: ...}` | Anything else — e.g. websocket, object store (S3/GCS/Azure Blob), gRPC. Spell out the mechanism in the braces. |

### `auth_type` enum

| Value | Meaning |
|--------------------------------|------------------------------------------------------|
| `N/A` | No auth on the wire (raw syslog over UDP, etc.) |
| `HEC Token` | Splunk-style HEC bearer |
| `OAuth` | OAuth 2.0 client credentials / authorization code |
| `API Key & Secret` | Two-part credential (key + shared secret) |
| `Bearer Token` | Static bearer token (non-HEC) |
| `Basic` | HTTP Basic auth |
| `mTLS` | Mutual TLS (client cert) |
| `IAM Role` | AWS-style assume-role (typical for object stores) |
| `Other - {Explain: ...}` | Anything else — spell out the mechanism in braces |

---

## Naming conventions

- Vendor and product directories: lowercase, underscored, no spaces
(`palo_alto/panos/`, never `Palo Alto Networks/PANOS/`)
- File names: snake_case
- One vendor's pipelines may live under multiple subtrees (e.g.,
`push/syslog/palo_alto/panos/` for firewall syslog and
`pull/api/palo_alto/cortex_xdr/` for the Cortex XDR API)
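
The layout and naming rules can be combined into a simple path lint. The following is a minimal sketch under stated assumptions: `check_pipeline_path` is a hypothetical helper, and treating digits as valid in vendor and product keys is an assumption, since the conventions above only say "lowercase, underscored, no spaces".

```python
import re

# Lowercase alphanumeric words joined by single underscores (digits are an assumption).
KEY = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)*$")

# Subtrees from the Layout section, relative to pipelines/.
SUBTREES = (
    "push/syslog",
    "push/hec",
    "pull/api",
    "pull/object_store",
    "community/transform_ocsf",
)


def check_pipeline_path(path):
    """Return a list of problems with a leaf directory path; empty means it conforms."""
    parts = path.strip("/").split("/")
    if len(parts) != 5 or parts[0] != "pipelines":
        return ["expected pipelines/<group>/<mode>/<vendor>/<product>/"]
    problems = []
    subtree = "/".join(parts[1:3])
    if subtree not in SUBTREES:
        problems.append(f"unknown subtree: {subtree}")
    for label, segment in (("vendor", parts[3]), ("product", parts[4])):
        if not KEY.match(segment):
            problems.append(f"{label} {segment!r} is not lowercase and underscored")
    return problems


ok = check_pipeline_path("pipelines/push/syslog/palo_alto/panos/")
bad = check_pipeline_path("pipelines/push/syslog/Palo Alto Networks/PANOS/")
```

Here `ok` is empty and `bad` reports both the vendor and the product segment. A path with only one directory below the subtree is reported as not matching the `<vendor>/<product>/` shape.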