Commit 9b4bbd9

docs: add 15 new documentation files — Sigma/YARA, MCP, detection rules, extensions, editions, integrations
Adds 15 new documentation files across how-to guides and reference docs. All checks passed. See PR for full file list.
1 parent d119c95 commit 9b4bbd9

16 files changed

Lines changed: 4433 additions & 9 deletions

docs/how-to/build-extensions.md

Lines changed: 199 additions & 0 deletions
@@ -0,0 +1,199 @@
---
title: "Build an Extension Package"
description: "Create an optional extension package for ZettelForge that provides an alternative backend (TypeDB), an integration (OpenCTI), or an operational feature (multi-tenant auth)."
diataxis_type: "how-to"
audience: "Python developers extending ZettelForge with optional packages"
tags: [extensions, enterprise, packages, development, optional-features]
last_updated: "2026-04-27"
version: "2.6.0"
---

# Build an Extension Package

ZettelForge discovers installed extension packages at startup via `zettelforge.extensions.load_extensions()`. An extension is any Python package that registers an entry point in the `zettelforge.extensions` group or is importable as `zettelforge_enterprise`.

## Prerequisites

- ZettelForge installed (`pip install zettelforge`)
- Python 3.12+
- For enterprise features: the separate `zettelforge-enterprise` package (not distributed on PyPI)

## How Extensions Are Loaded

The extension loader in `zettelforge.extensions` runs a two-check discovery:

1. **Try importing `zettelforge_enterprise`** -- if the package is installed, it is loaded as the `"enterprise"` extension.
2. **Legacy env var fallback** -- if no package was found, check `THREATENGRAM_LICENSE_KEY`. If it matches the `TG-XXXX-XXXX-XXXX-XXXX` pattern, a marker is stored so `has_extension("enterprise")` returns `True`.

```python
from zettelforge.extensions import load_extensions, has_extension, get_extension

load_extensions()
print(has_extension("enterprise"))  # True or False
```

The loader is idempotent -- subsequent calls return the cached result without re-scanning the environment.
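
The two checks and the caching behaviour can be sketched in plain Python. This is an illustration only, not ZettelForge's actual loader: `load_extensions_sketch`, the module-level cache, and the regular expression are hypothetical, and the regex assumes each `X` in the key pattern is a single alphanumeric character.

```python
import importlib
import os
import re

# Assumption: each "X" in TG-XXXX-XXXX-XXXX-XXXX is one alphanumeric character.
_LICENSE_RE = re.compile(r"TG(-[0-9A-Za-z]{4}){4}")

_extensions: dict[str, object] = {}
_loaded = False


def load_extensions_sketch() -> dict[str, object]:
    """Hypothetical stand-in for load_extensions(): import first, env var second."""
    global _loaded
    if _loaded:  # idempotent: later calls reuse the cached result
        return _extensions
    try:
        _extensions["enterprise"] = importlib.import_module("zettelforge_enterprise")
    except ImportError:
        # Fallback only runs when the package could not be imported.
        key = os.environ.get("THREATENGRAM_LICENSE_KEY", "")
        if _LICENSE_RE.fullmatch(key):
            _extensions["enterprise"] = True  # marker only, no module object
    _loaded = True
    return _extensions
```

Because the result is cached after the first call, changing the environment variable afterwards has no effect until the cache is cleared, which is why the test section relies on `reset_extensions()`.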

## Steps

### 1. Name your package

Use the `zettelforge_` prefix to keep naming consistent and avoid collisions:

- `zettelforge_enterprise` -- enterprise features (TypeDB, OpenCTI, telemetry)
- `zettelforge_myfeature` -- your custom feature

### 2. Create the package structure

```
zettelforge-myfeature/
  pyproject.toml
  src/
    zettelforge_myfeature/
      __init__.py
      feature.py
```

The `__init__.py` can be empty -- the extension loader only needs the package to be importable.

### 3. Register as a ZettelForge extension (optional)

If you want your extension to be discoverable beyond the `zettelforge_enterprise` naming convention, register via a plugin entry point in `pyproject.toml`:

```toml
[project.entry-points."zettelforge.extensions"]
myfeature = "zettelforge_myfeature"
```

Then consumers can check for it by name:

```python
from zettelforge.extensions import has_extension

if has_extension("myfeature"):
    ...  # activate custom behaviour
```

### 4. Respect the edition API

Use the `zettelforge.edition` module to gate features behind the active edition:

```python
from zettelforge.edition import is_enterprise, EditionError

if not is_enterprise():
    raise EditionError("This feature requires ZettelForge Enterprise")
```

Available edition functions:

| Function | Returns | Description |
|:---------|:--------|:------------|
| `is_enterprise()` | `bool` | True if enterprise extensions are loaded |
| `is_community()` | `bool` | True if no enterprise extensions |
| `get_edition()` | `Edition` | `Edition.ENTERPRISE` or `Edition.COMMUNITY` |
| `edition_name()` | `str` | `"ZettelForge + Extensions"` or `"ZettelForge"` |

### 5. Expose extension features

Your extension package should provide the actual feature implementations. The `get_extension()` function lets core code access your extension module:

```python
from zettelforge.extensions import get_extension

enterprise = get_extension("enterprise")
if enterprise is not None:
    # Access TypeDB backend, OpenCTI sync, telemetry, etc.
    enterprise.register_backends()
```

### 6. Test your extension

Use the `reset_extensions()` function in setup/teardown to clear cached state between tests. Patch `os.environ` with `patch.dict` rather than assigning to it, so the license key does not leak into other tests:

```python
import os
from unittest.mock import patch

from zettelforge.extensions import (
    get_extension,
    has_extension,
    load_extensions,
    reset_extensions,
)


def test_extension_loaded():
    reset_extensions()
    # Simulate having the enterprise package
    with patch.dict("sys.modules", {"zettelforge_enterprise": __import__("types")}):
        load_extensions()
        assert has_extension("enterprise") is True


def test_extension_not_loaded():
    reset_extensions()
    # Simulate missing package and no legacy license key
    with patch.dict("sys.modules", {"zettelforge_enterprise": None}), patch.dict(os.environ):
        os.environ.pop("THREATENGRAM_LICENSE_KEY", None)
        load_extensions()
        assert has_extension("enterprise") is False


def test_legacy_env_var_activates():
    reset_extensions()
    env = {"THREATENGRAM_LICENSE_KEY": "TG-1234-5678-9abc-def0"}
    with patch.dict("sys.modules", {"zettelforge_enterprise": None}), patch.dict(os.environ, env):
        load_extensions()
        assert has_extension("enterprise") is True


def test_invalid_env_var_does_not_activate():
    reset_extensions()
    env = {"THREATENGRAM_LICENSE_KEY": "invalid-key"}
    with patch.dict("sys.modules", {"zettelforge_enterprise": None}), patch.dict(os.environ, env):
        load_extensions()
        assert has_extension("enterprise") is False


def test_get_missing_returns_none():
    reset_extensions()
    with patch.dict("sys.modules", {"zettelforge_enterprise": None}):
        assert get_extension("enterprise") is None
```
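
The tests above rely on two standard-library behaviours: a `None` entry in `sys.modules` makes an import of that name fail, and `patch.dict` restores the patched mapping on exit. The following sketch demonstrates both without ZettelForge installed; the helper names are illustrative only.

```python
import importlib
import os
import sys
from unittest.mock import patch


def import_fails_when_blocked(name: str) -> bool:
    """Return True if importing `name` raises ImportError while sys.modules[name] is None."""
    with patch.dict(sys.modules, {name: None}):
        try:
            importlib.import_module(name)
        except ImportError:
            return True
        return False


def env_restored_after_patch(var: str, value: str) -> bool:
    """Return True if `var` is back to its previous state once patch.dict exits."""
    before = os.environ.get(var)
    with patch.dict(os.environ, {var: value}):
        assert os.environ[var] == value  # visible only inside the block
    return os.environ.get(var) == before
```

This is why mapping `zettelforge_enterprise` to `None` simulates a missing package even on a machine where the real package is installed.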

### 7. Use the optional-feature pattern for SDK dependencies

If your extension depends on an optional SDK (e.g., `typedb-client`, `pycti`), follow the optional-feature pattern:

```python
import threading


class MyFeature:
    def __init__(self):
        self._sdk = None
        self._lock = threading.Lock()

    def _ensure_loaded(self):
        # Fast path: SDK already imported
        if self._sdk is not None:
            return
        with self._lock:
            # Re-check under the lock in case another thread loaded it first
            if self._sdk is not None:
                return
            try:
                import typedb  # lazy import
            except ImportError as exc:
                raise ImportError(
                    "TypeDB feature requires typedb-client. "
                    "Install with: pip install zettelforge-enterprise"
                ) from exc
            self._sdk = typedb
```

This ensures core ZettelForge never depends on your SDK, and the error surfaces only at the point of use.

## LLM Quick Reference

**Task**: Create a ZettelForge extension package.

**Key functions**: `load_extensions()` (idempotent discovery), `has_extension(name)` (boolean check), `get_extension(name)` (module or None), `reset_extensions()` (test cleanup).

**Edition module**: `is_enterprise()`, `is_community()`, `get_edition()`, `edition_name()` let core code gate features behind edition.

**Activation paths**: Package import (`zettelforge_enterprise`) takes priority. Legacy env var (`THREATENGRAM_LICENSE_KEY=TG-XXXX-XXXX-XXXX-XXXX`) is the fallback for backward compatibility.

**Test pattern**: `reset_extensions()` in setup, `patch.dict("sys.modules", ...)` to control whether the package exists, `patch.dict(os.environ, ...)` for env var tests.

**Optional SDK pattern**: Lazy-import the SDK in a private `_ensure_loaded()` method. Never import at module level. Surface a clear `ImportError` with install instructions.

**Entry point registration**: Add `[project.entry-points."zettelforge.extensions"]` in pyproject.toml for discovery by name beyond the `zettelforge_enterprise` convention.
Lines changed: 183 additions & 0 deletions
@@ -0,0 +1,183 @@
---
title: "Configure Sigma Rule Ingestion"
description: "Ingest Sigma detection rules into ZettelForge memory with automatic entity extraction, tag resolution, and knowledge graph population."
diataxis_type: "how-to"
audience: "Detection engineers, SOC analysts integrating Sigma rules into ZettelForge"
tags: [sigma, ingestion, detection-rules, knowledge-graph, sigmahq]
last_updated: "2026-04-27"
version: "2.7.0"
---

# Configure Sigma Rule Ingestion

Ingest Sigma detection rules (SigmaHQ format) into ZettelForge memory. Each rule is parsed, validated against the vendored SigmaHQ JSON schema, mapped to a `SigmaRule` entity with typed knowledge graph relations, and persisted as a memory note.

## Prerequisites

- ZettelForge installed (`pip install zettelforge`)
- Sigma rule files in `.yml` or `.yaml` format (SigmaHQ specification V2.0.0)
- Embedding and LLM models available (downloaded automatically on first use)

## Steps

### 1. Use the CLI (quick start)

Dry-run a directory to validate rules before ingesting:

```bash
python -m zettelforge.sigma.ingest /path/to/sigma/rules/ --dry-run
```

Output shows each parsed rule with its id, rule type, tag count, and relation count:

```
OK /path/to/sigma/rules/proc_creation_win_whoami.yml id=sigma_a1b2c3d4e5f67890 type=detection tags=3 edges=6
OK /path/to/sigma/rules/susp_ps_execution.yml id=55043c5f-3c72-4fb6-aa22-70b6f7e98d4a type=detection tags=5 edges=9

Dry-run summary: 2/2 parsed, 0 failed.
```

Live ingestion into a MemoryManager:

```bash
python -m zettelforge.sigma.ingest /path/to/sigma/rules/ --domain detection
```

### 2. Use the Python API

```python
from zettelforge import MemoryManager
from zettelforge.sigma import ingest_rule, ingest_rules_dir

mm = MemoryManager()

# Ingest a single file
note, relations = ingest_rule(
    "/path/to/rule.yml",
    mm,
    domain="detection",
)
print(f"Ingested: {note.id}, relations: {len(relations)}")

# Ingest a directory (walks recursively)
ingested, skipped = ingest_rules_dir(
    "/path/to/sigma/rules/",
    mm,
    glob="**/*.yml",
    domain="detection",
)
print(f"Ingested: {ingested}, skipped: {skipped}")
```
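
The quick reference at the end of this page lists three safety rules for the directory walk: a 1 MB file size cap, no symlink following, and skipping paths that resolve outside the rules root. A minimal sketch of such a walk follows. It is not the library's implementation: `iter_rule_files` and `MAX_RULE_BYTES` are hypothetical names, and "1 MB" is read here as 1,000,000 bytes.

```python
from pathlib import Path

MAX_RULE_BYTES = 1_000_000  # assumption: "1 MB" read as 10**6 bytes


def iter_rule_files(root: Path, pattern: str = "**/*.yml"):
    """Yield regular, non-symlinked rule files under `root` that pass the size cap."""
    root = root.resolve()
    for path in sorted(root.glob(pattern)):
        if path.is_symlink() or not path.is_file():
            continue  # skip symlinked entries and anything that is not a regular file
        resolved = path.resolve()
        if not resolved.is_relative_to(root):
            continue  # resolves outside the rules root, e.g. via a symlinked directory
        if resolved.stat().st_size > MAX_RULE_BYTES:
            continue  # over the size cap
        yield resolved
```

Checking the resolved path against the resolved root, rather than comparing raw strings, is what catches a symlinked parent directory that points elsewhere on disk.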

### 3. Parse without persisting

For validation pipelines or custom workflows, parse rules without memory storage:

```python
from zettelforge.sigma import parse_file, parse_yaml, from_rule_dict

# Parse from file
rule_dict = parse_file("rule.yml")

# Parse from YAML string
yaml_text = """
title: Suspicious Whoami Execution
logsource:
    category: process_creation
    product: windows
detection:
    selection:
        Image|endswith: '\\whoami.exe'
    condition: selection
"""
rule_dict = parse_yaml(yaml_text)

# Map to entity and KG relations
entity, relations = from_rule_dict(rule_dict)
print(f"Rule: {entity.title}")
print(f"Logsource: product={entity.logsource_product}, category={entity.logsource_category}")
print(f"Relation types: {set(r['rel'] for r in relations)}")
```

### 4. Accept multiple input types

`ingest_rule()` accepts a parsed dict, raw YAML string, or `Path`:

```python
# Continues from the earlier snippets: mm, rule_dict, and yaml_text are defined there
from pathlib import Path

# Dict (pre-parsed)
note, rels = ingest_rule(rule_dict, mm)

# YAML string
note, rels = ingest_rule(yaml_text, mm)

# Path object
note, rels = ingest_rule(Path("rule.yml"), mm)

# File path as string (auto-detected if it looks like a path)
note, rels = ingest_rule("rule.yml", mm)
```
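
The text does not specify how a string is judged to "look like a path". The sketch below shows one plausible dispatch, purely as an assumption: `classify_source` is a hypothetical helper, and the rule it applies (a single-line string ending in `.yml` or `.yaml` is a path) may differ from what `ingest_rule()` actually does.

```python
from pathlib import Path


def classify_source(source) -> str:
    """Return "dict", "path", or "yaml" for a rule source (hypothetical heuristic)."""
    if isinstance(source, dict):
        return "dict"
    if isinstance(source, Path):
        return "path"
    if isinstance(source, str):
        # Assumption: a single-line string ending in .yml/.yaml is a file path;
        # anything else is treated as YAML text.
        looks_like_path = "\n" not in source and source.strip().endswith((".yml", ".yaml"))
        return "path" if looks_like_path else "yaml"
    raise TypeError(f"unsupported rule source: {type(source).__name__}")
```

Passing a `Path` object explicitly avoids any ambiguity and is the safer choice when the input comes from user-controlled strings.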

### 5. Validate against the Sigma schema

Run standalone validation without entity mapping:

```python
from zettelforge.sigma import validate, parse_yaml, SigmaValidationError

# yaml_text is the rule string defined in step 3
rule = parse_yaml(yaml_text)
result = validate(rule)
if not result.valid:
    for error in result.errors:
        print(f" Validation error: {error}")
```

### 6. Understand idempotency

Re-ingesting an unchanged rule returns the original note. The `source_ref` follows the pattern `sigma:<rule_id>:<content_sha256_prefix>` and is checked before any write:

```python
first, _ = ingest_rule("rule.yml", mm)
second, _ = ingest_rule("rule.yml", mm)
assert first.id == second.id  # same note, no duplicate
```
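
The lookup-before-write idea can be illustrated with a content hash and an in-memory dictionary. This is a sketch under assumptions: `make_source_ref`, `ingest_once`, and the dictionary store are hypothetical, the 12-character prefix follows the quick reference on this page, and hashing the raw rule bytes (rather than a normalized form) is a guess.

```python
import hashlib

_store: dict[str, str] = {}  # hypothetical stand-in for the note store


def make_source_ref(rule_id: str, content: bytes) -> str:
    """Build `sigma:<rule_id>:<first 12 hex chars of SHA-256(content)>`."""
    digest = hashlib.sha256(content).hexdigest()
    return f"sigma:{rule_id}:{digest[:12]}"


def ingest_once(rule_id: str, content: bytes) -> str:
    """Return the existing note id for unchanged content, else create a new one."""
    ref = make_source_ref(rule_id, content)
    if ref in _store:  # lookup precedes every write
        return _store[ref]
    note_id = f"note-{len(_store) + 1}"
    _store[ref] = note_id
    return note_id
```

Because the hash is part of the reference, editing a rule produces a new `source_ref` and therefore a new note, while re-running ingestion on an untouched rule is a no-op.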

### 7. CLI flags reference

```
usage: python -m zettelforge.sigma.ingest [-h] [--domain DOMAIN] [--dry-run] [--glob GLOB] path

positional arguments:
  path             Sigma rule file or directory

options:
  --domain DOMAIN  Memory domain for ingested notes (default: detection)
  --dry-run        Parse + validate + map without persisting to memory
  --glob GLOB      Glob used when path is a directory (default: **/*.yml)
```

## LLM Quick Reference

**Task**: Ingest Sigma rules into ZettelForge memory with automatic knowledge graph population.

**Primary CLI**: `python -m zettelforge.sigma.ingest <path>` with optional `--dry-run`, `--domain`, `--glob`.

**Primary Python API**: `ingest_rule(source, mm, domain="detection")` returns `(MemoryNote, relations_list)`. Accepts dict, string, or Path. `ingest_rules_dir(path, mm)` walks a directory tree and returns `(ingested_count, skipped_count)`.

**Pipeline**: Input -> `parse_file()` or `parse_yaml()` (YAML load + JSON-schema validation against vendored SigmaHQ schemas) -> `from_rule_dict()` (map to `SigmaRule` entity + relations) -> `mm.remember()` (persist as memory note) -> KG edge persistence.

**Validation**: `validate(rule_dict)` returns `ValidationResult(valid, errors)`. Two error types: `SigmaParseError` (bad YAML or I/O) and `SigmaValidationError` (schema violation). Both bubble through `ingest_rule()`.

**Schema dispatch**: Detection rules validate against `sigma-detection-rule-schema.json`. Rules with a `correlation:` key use the correlation schema. Rules with a `filter:` key use the filters schema.

**Tag resolution**: Sigma tags (`attack.t1059`, `cve.2021-44228`) upgrade to typed KG edges (`detects` -> `AttackPattern`, `references_cve` -> `Vulnerability`, `attributed_to` -> `IntrusionSet`/`Malware`). Raw `tagged_with` -> `SigmaTag` edges are always preserved alongside upgrade edges. `tlp.*` and `detection.*` tags are metadata-only (no upgrade).

**Logsource edges**: Every populated logsource facet (product, service, category) generates an `applies_to` -> `LogSource` edge with the facet type and value in properties.

**Related rule edges**: The `related:` block maps to `superseded_by` (type: `obsolete`) or `related_to` (all other types).

**Idempotency**: Source ref pattern `sigma:<rule_id>:<content_sha256[:12]>`. A store lookup precedes every write. Unchanged rules return the original note.

**Security**: File size capped at 1 MB. Symlinks are never followed during directory walk. Paths that resolve outside the rules root are skipped.

**Edge tagging**: All KG edges emitted during ingest carry `edge_type: detection` and `source: sigma_ingest` properties for downstream filtering.
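
The tag resolution rules above can be sketched as a small lookup. This is an illustration, not the library's resolver: `resolve_tag` and `_UPGRADES` are hypothetical, the `attack.tNNNN` and `cve.*` rows follow the examples in the text, and the mapping of `attack.gNNNN` to `IntrusionSet` and `attack.sNNNN` to `Malware` is an assumption about which tags trigger `attributed_to`.

```python
import re

# (pattern, edge relation, target entity type). The group/software rows are an
# assumption; the text only states that attributed_to targets IntrusionSet/Malware.
_UPGRADES = [
    (re.compile(r"attack\.t\d{4}(\.\d{3})?"), "detects", "AttackPattern"),
    (re.compile(r"cve\.\d{4}-\d+"), "references_cve", "Vulnerability"),
    (re.compile(r"attack\.g\d{4}"), "attributed_to", "IntrusionSet"),
    (re.compile(r"attack\.s\d{4}"), "attributed_to", "Malware"),
]


def resolve_tag(tag: str) -> list[tuple[str, str]]:
    """Return (relation, target_type) edges for one Sigma tag."""
    tag = tag.lower()
    edges = [("tagged_with", "SigmaTag")]  # raw edge is always kept
    if tag.startswith(("tlp.", "detection.")):
        return edges  # metadata-only namespaces: no upgrade
    for pattern, rel, target in _UPGRADES:
        if pattern.fullmatch(tag):
            edges.append((rel, target))
    return edges
```

Tags that match no upgrade pattern, such as tactic tags like `attack.execution`, keep only the raw `tagged_with` edge in this sketch.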
