Commit 611607d

Revise and expand CAPE Sandbox developer guide
Updated SKILLS.md with expanded architecture, configuration, and development guidance for CAPE Sandbox v2. Added references to documentation, clarified directory structure, core workflows, and best practices. Included new sections on configuration management, adding modules (signatures, processing, reporting, machinery, packages), performance, security, and advanced debugging. Improved troubleshooting steps and command references for developers.
1 parent 252b64e commit 611607d

1 file changed

Lines changed: 141 additions & 45 deletions

SKILLS.md

@@ -2,54 +2,75 @@
22

33
This document outlines the architectural structure, core concepts, and development patterns for the CAPE Sandbox (v2). It serves as a guide for extending functionality, debugging, and maintaining the codebase.
44

5+
> **Agent Hint:** Use the referenced documentation files (`docs/book/src/...`) to dive deeper into specific topics.
6+
57
## 1. Project Overview
68
CAPE (Config And Payload Extraction) is a malware analysis sandbox derived from Cuckoo Sandbox. It focuses on automated malware analysis with a specific emphasis on extracting payloads and configuration from malware.
79

10+
* **Ref:** `docs/book/src/introduction/what.rst`
11+
812
**Core Tech Stack:**
913
- **Language:** Python 3
1014
- **Web Framework:** Django
1115
- **Database:** PostgreSQL (SQLAlchemy) for task management, MongoDB/Elasticsearch for results storage.
12-
- **Virtualization:** KVM/QEMU (preferred), VirtualBox, VMware.
16+
- **Virtualization:** KVM/QEMU (preferred), VirtualBox, VMware, Azure, Google Cloud.
1317
- **Frontend:** HTML5, Bootstrap, Jinja2 Templates.
18+
- **Dependency Management:** Poetry.
1419

1520
## 2. Directory Structure Key
1621
| Directory | Purpose |
1722
| :--- | :--- |
1823
| `agent/` | Python script (`agent.py`) running *inside* the Guest VM to handle communication. |
19-
| `analyzer/` | Core analysis components running *inside* the Guest VM (monitor, analyzers). |
20-
| `conf/` | Configuration files (`cuckoo.conf`, `reporting.conf`, `web.conf`, etc.). |
24+
| `analyzer/` | Core analysis components running *inside* the Guest VM (monitor, analyzers, packages). |
25+
| `conf/` | Default configuration files. **Do not edit directly**; use `custom/conf/`. |
26+
| `custom/conf/` | User overrides for configuration files. |
2127
| `data/` | Static assets, yara rules, monitor binaries, and HTML templates (`data/html`). |
2228
| `lib/cuckoo/` | Core logic (Scheduler, Database, Guest Manager, Result Processor). |
23-
| `modules/` | Pluggable components (Signatures, Processing, Reporting, Auxiliary). |
29+
| `modules/` | Pluggable components (Signatures, Processing, Reporting, Auxiliary, Machinery). |
2430
| `web/` | Django-based web interface (Views, URLs, Templates). |
25-
| `utils/` | Standalone CLI utilities (`process.py`, `cleaners.py`, `rooter.py`). |
31+
| `utils/` | Standalone CLI utilities (`process.py`, `submit.py`, `rooter.py`, `community.py`). |
2632

2733
## 3. Core Workflows
2834

2935
### A. The Analysis Lifecycle
3036
1. **Submission:** User submits file/URL via WebUI (`web/submission/`) or API (`web/api/`).
37+
    * **Ref:** `docs/book/src/usage/submit.rst`, `docs/book/src/usage/api.rst`
3138
2. **Scheduling:** Task is added to SQL DB. `lib/cuckoo/core/scheduler.py` picks it up.
32-
3. **Execution:**
39+
3. **Infrastructure:**
40+
    * `modules/machinery` starts the VM.
41+
    * `utils/rooter.py` configures network routing (if applicable).
42+
    * **Ref:** `docs/book/src/usage/rooter.rst`
43+
4. **Execution:**
3344
    * VM is restored/started.
3445
    * `analyzer` is uploaded to VM.
35-
    * Sample is injected/executed.
46+
    * Sample is injected/executed using specific **Analysis Packages** (`analyzer/windows/modules/packages/`).
47+
    * **Ref:** `docs/book/src/usage/packages.rst`
3648
    * Behavior is monitored via API hooking (CAPE Monitor).
37-
4. **Result Collection:** Logs, PCAP, and dropped files are transferred back to Host.
38-
5. **Processing:** `modules/processing/` parses raw logs into a structured dictionary.
39-
6. **Signatures:** `modules/signatures/` runs logic against the processed data.
40-
7. **Reporting:** `modules/reporting/` exports data (JSON, HTML, MongoDB, MAEC).
41-
42-
### B. Web Interface Architecture
43-
The Web UI is split into two distinct rendering logic paths:
44-
1. **Django Views (`web/analysis/views.py`):** Handles URL routing, authentication, and context generation. It fetches data from MongoDB/Elasticsearch.
45-
2. **Jinja2 Templates:**
46-
    * **Web Templates (`web/templates/`):** Standard Django templates for the UI.
47-
    * **Report Templates (`data/html/`):** Standalone Jinja2 templates used by the `reporthtml` module to generate static HTML reports. *Note: Changes here affect the downloadable HTML report, not necessarily the Web UI.*
48-
49-
## 4. Development Guides
49+
    * **Auxiliary Modules** (`modules/auxiliary/`) run in parallel on the Host (e.g., Sniffer).
50+
5. **Result Collection:** Logs, PCAP, and dropped files are transferred back to Host.
51+
6. **Processing:** `modules/processing/` parses raw logs into a structured dictionary (Global Container).
52+
7. **Signatures:** `modules/signatures/` runs logic against the processed data.
53+
8. **Reporting:** `modules/reporting/` exports data (JSON, HTML, MongoDB, MAEC).
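
The last three stages of the lifecycle (processing, signatures, reporting) can be pictured as functions that share one results dictionary. The sketch below is a simplified illustration, not CAPE's actual code: every function name, the `api:argument` log format, and the `SIGNATURES` table are hypothetical stand-ins.

```python
import json


def process_behavior(raw_logs):
    """Processing stand-in: parse 'api:argument' log lines into structured calls."""
    calls = []
    for line in raw_logs:
        api, argument = line.split(":", 1)
        calls.append({"api": api, "argument": argument})
    return "behavior", {"calls": calls}


def process_static(sample_name):
    """Processing stand-in: record basic facts about the submitted file."""
    return "static", {"name": sample_name, "extension": sample_name.rsplit(".", 1)[-1]}


def signature_drops_exe(results):
    """Signature stand-in: fires when any CreateFileW call targets an .exe."""
    return any(
        call["api"] == "CreateFileW" and call["argument"].endswith(".exe")
        for call in results["behavior"]["calls"]
    )


# Hypothetical registry of (name, check) pairs evaluated after processing.
SIGNATURES = [("drops_exe", signature_drops_exe)]


def report_json(results):
    """Reporting stand-in: serialize the shared dictionary."""
    return json.dumps(results, sort_keys=True)


def run_pipeline(sample_name, raw_logs):
    results = {}
    # Each processing step contributes one key to the shared container.
    for key, data in (process_behavior(raw_logs), process_static(sample_name)):
        results[key] = data
    # Signatures only read the processed data.
    results["signatures"] = [name for name, check in SIGNATURES if check(results)]
    return results, report_json(results)
```

The point of the layout is the ordering: signatures never touch raw logs, and reporting only sees what processing and signatures have already placed in the dictionary.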
54+
55+
## 4. Configuration Management
56+
* **Overrides:** Never edit files in `conf/` directly. Create a copy in `custom/conf/` with the same name.
57+
* **Environment Variables:** You can use env vars in configs: `%(ENV:VARIABLE_NAME)s`.
58+
* **Conf.d:** You can create directories like `custom/conf/reporting.conf.d/` and add `.conf` files there for granular overrides.
59+
* **Ref:** `docs/book/src/installation/host/configuration.rst`
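
To illustrate the override idea, here is a minimal sketch using Python's standard `configparser`. It is not CAPE's own loader: the section name, keys, and values are made up, and the precedence shown (defaults, then the `custom/conf/` copy, then a `.conf.d` fragment) is an assumption for the example.

```python
import configparser

# Stand-in for a default file shipped in conf/ (values are illustrative).
DEFAULT_CONF = """
[mongodb]
enabled = no
host = 127.0.0.1
"""

# Stand-in for a same-named copy placed in custom/conf/.
CUSTOM_CONF = """
[mongodb]
enabled = yes
"""

# Stand-in for a fragment under custom/conf/reporting.conf.d/.
CONF_D_FRAGMENT = """
[mongodb]
host = 10.0.0.5
"""


def load_layered(*sources):
    """Read sources in order; later sources override earlier keys."""
    # Interpolation is disabled so '%' sequences are kept verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    for text in sources:
        parser.read_string(text)
    return parser


config = load_layered(DEFAULT_CONF, CUSTOM_CONF, CONF_D_FRAGMENT)
```

Only the keys that differ need to appear in an override; everything else falls through from the default file.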
60+
61+
## 5. Development Guides
62+
* **Coding Style:** See `docs/book/src/development/code_style.rst`
63+
64+
### Coding Standards (PEP 8+)
65+
* **Imports:** Explicit imports only (`from lib import a, b`). No `from lib import *`. Group standard library, 3rd party, and local imports.
66+
* **Strings:** Use double quotes (`"`) for strings.
67+
* **Logging:** Use `import logging; log = logging.getLogger(__name__)`. Do not use `print()`.
68+
* **Exceptions:** Use custom exceptions from `lib/cuckoo/common/exceptions.py` (e.g., `CuckooOperationalError`).
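
A short sketch of the logging and exception conventions together. So that it runs on its own, the exception class below is a local stand-in for the one in `lib/cuckoo/common/exceptions.py`; `read_sample` is a hypothetical helper, not part of the codebase.

```python
import logging

log = logging.getLogger(__name__)


class CuckooOperationalError(Exception):
    """Local stand-in; in the real tree, import this from lib.cuckoo.common.exceptions."""


def read_sample(path):
    """Hypothetical helper: read a file, logging and re-raising on failure."""
    try:
        with open(path, "rb") as handle:
            return handle.read()
    except OSError as exc:
        # Log through the module logger instead of print().
        log.error("Unable to read sample %s: %s", path, exc)
        raise CuckooOperationalError(f"Unable to read sample: {path}") from exc
```

Chaining with `from exc` keeps the original `OSError` available in the traceback while callers only need to handle the project-level exception.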
5069

5170
### How to Add a Detection Signature
5271
Signatures live in `modules/signatures/`.
72+
* **Ref:** `docs/book/src/customization/signatures.rst`
73+
5374
```python
5475
from lib.cuckoo.common.abstracts import Signature
5576

@@ -59,54 +80,129 @@ class MyMalware(Signature):
5980
    severity = 3
6081
    categories = ["trojan"]
6182
    authors = ["You"]
83+
    minimum = "2.0"
6284

63-
    def on_call(self, call, process):
64-
        # Inspect individual API calls
65-
        if call["api"] == "CreateFileW" and "evil.exe" in call["arguments"]["filepath"]:
66-
            return True
85+
    def run(self):
86+
        # Helper methods: check_file, check_key, check_mutex, check_api, check_ip, check_domain
87+
        return self.check_file(pattern=".*evil\\.exe$", regex=True)
88+
89+
    # For performance, use evented signatures (on_call) for high-volume API checks
90+
    # evented = True
91+
    # def on_call(self, call, process): ...
6792
```
6893

6994
### How to Add a Processing Module
70-
Processing modules (`modules/processing/`) run after analysis to extract specific data (e.g., Static analysis of a file).
95+
Processing modules (`modules/processing/`) run after analysis to extract specific data.
96+
* **Ref:** `docs/book/src/customization/processing.rst`
97+
7198
```python
7299
from lib.cuckoo.common.abstracts import Processing
73100

74101
class MyExtractor(Processing):
75102
    def run(self):
76103
        self.key = "my_data"  # Key in the final report JSON
77104
        result = {}
78-
        # ... logic ...
105+
        # Access raw data via self.analysis_path, self.log_path, etc.
79106
        return result
80107
```
81108

82-
### How to Modify the Web Report
83-
1. **Locate the Template:** Look in `web/templates/analysis/`.
84-
    * `overview/index.html`: Main dashboard.
85-
    * `overview/_info.html`: General details.
86-
    * `overview/_summary.html`: Behavioral summary.
87-
2. **Edit:** Use Django template language (`{% if %}`, `{{ variable }}`).
88-
3. **Context:** Data is usually passed as `analysis` object. Access fields like `analysis.info.id`, `analysis.network`, `analysis.behavior`.
109+
### How to Add a Reporting Module
110+
Reporting modules (`modules/reporting/`) consume the processed data (Global Container).
111+
* **Ref:** `docs/book/src/customization/reporting.rst`
112+
113+
```python
114+
from lib.cuckoo.common.abstracts import Report
115+
from lib.cuckoo.common.exceptions import CuckooReportError
116+
117+
class MyReport(Report):
118+
    def run(self, results):
119+
        # 'results' is the big dictionary containing all processed data
120+
        try:
121+
            # Write to file or database
122+
            pass
123+
        except Exception as e:
124+
            raise CuckooReportError(f"Failed to report: {e}")
125+
```
126+
127+
### How to Add a Machinery Module
128+
Machinery modules (`modules/machinery/`) control the virtualization layer.
129+
* **Ref:** `docs/book/src/customization/machinery.rst`
130+
131+
```python
132+
from lib.cuckoo.common.abstracts import Machinery
133+
from lib.cuckoo.common.exceptions import CuckooMachineError
134+
135+
class MyHypervisor(Machinery):
136+
    def start(self, label):
137+
        # Start the VM
138+
        pass
139+
140+
    def stop(self, label):
141+
        # Stop the VM
142+
        pass
143+
```
144+
145+
### How to Add an Analysis Package
146+
Packages (`analyzer/windows/modules/packages/`) define how to execute the sample inside the VM.
147+
* **Ref:** `docs/book/src/customization/packages.rst`
148+
149+
```python
150+
from lib.common.abstracts import Package
151+
152+
class MyPackage(Package):
153+
    def start(self, path):
154+
        args = self.options.get("arguments")
155+
        # 'execute' handles injection and monitoring
156+
        return self.execute(path, args, suspended=False)
157+
```
158+
159+
## 6. Best Practices
160+
161+
### Web & UI
162+
1. **Conditionally Render:** Always check if a dictionary key exists in templates (`{% if analysis.key %}`) before rendering to avoid UI breaks on different analysis types (Static vs Dynamic).
163+
2. **Keep Views Light:** Perform heavy data crunching in `modules/processing`, not in Django views.
164+
3. **Modular CSS/JS:** Keep custom styles in `web/static/` rather than inline in templates when possible.
165+
166+
### Performance
167+
1. **Evented Signatures:** Use `evented = True` and `on_call()` in signatures to process API calls in a single loop instead of iterating the whole log multiple times.
168+
2. **Ram-boost:** Enable `ram_boost` in `processing.conf` behavior section to keep API logs in memory if the Host has >20GB RAM.
169+
3. **Disable Unused Reports:** Disable heavy reporting modules (e.g., HTML, MAEC) in `reporting.conf` if not strictly needed for automation.
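
The benefit of evented signatures can be sketched without any CAPE code: the call log is walked once, and each call is handed to every check that is still undecided. The class and function below are hypothetical stand-ins (real signatures subclass `Signature` and receive `call` and `process`); only the single-loop idea is the point.

```python
class EventedCheck:
    """Hypothetical stand-in for an evented signature with an on_call hook."""

    def __init__(self, name, api, needle):
        self.name = name
        self.api = api
        self.needle = needle
        self.matched = False

    def on_call(self, call):
        # Called once per API call; remembers whether the pattern was seen.
        if call["api"] == self.api and self.needle in call["argument"]:
            self.matched = True


def run_evented(calls, checks):
    """Single pass over the log, dispatching each call to undecided checks."""
    for call in calls:
        for check in checks:
            if not check.matched:
                check.on_call(call)
    return [check.name for check in checks if check.matched]
```

With N checks, a non-evented design re-reads the log N times; here the log is iterated once regardless of how many checks are registered.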
170+
171+
### Security
172+
1. **Guest Isolation:** Always use static IPs and consider isolated/host-only networks. Disable noisy services (LLMNR, Teredo) in Guest to reduce PCAP noise.
173+
2. **Stealth:** Use the `no-stealth` option sparingly. CAPE's anti-anti-VM features are enabled by default and are critical for modern malware.
89174

90-
## 5. Troubleshooting & Debugging
175+
## 7. Troubleshooting & Debugging
176+
* **Ref:** `docs/book/src/Issues/Debugging_VM_issues.rst` (VM hangs, High CPU)
177+
* **Ref:** `docs/book/src/installation/guest/troubleshooting.rst` (Network, Agent issues)
91178

92179
### Common Issues
93-
* **"Waiting for container":** Usually a network configuration issue in `conf/cuckoo.conf` or `conf/auxiliary.conf`.
94-
* **Report Empty:** Check `reporting.conf`. If using MongoDB, ensure `mongodb` is enabled.
95-
* **Template Errors:** Use `{% if variable %}` guards aggressively. Missing keys in MongoDB documents cause Jinja2 crashes.
180+
* **"Waiting for container":** Check `conf/cuckoo.conf` (IPs) or network configuration. Ensure `cape-rooter` is running if routing is enabled.
181+
* **VM Stuck/Hanging:**
182+
    * Check the process state with `ps aux | grep qemu` (or `ps aux | grep python` for the controller).
183+
    * **100% CPU:** Livelock.
184+
    * **0% CPU:** Waiting for I/O (likely network or agent).
185+
    * Check `lib/cuckoo/core/guest.py` timeouts.
186+
* **Permissions:** Ensure `cape` user owns the directories and files.
187+
* **Database Migrations:** If DB errors occur, run `cd utils/db_migration && poetry run alembic upgrade head`.
188+
189+
### Advanced Debugging (py-spy)
190+
If the Python controller is unresponsive, use `py-spy` to inspect the stack trace without stopping the process:
191+
1. **Install:** `pip install py-spy`
192+
2. **Dump:** `sudo py-spy dump --pid <PYTHON_PID>`
193+
3. **Analyze:** Look for `wait_for_completion` (waiting for Guest/Agent) or network calls like `select`, `poll`, `recv` that may be blocked.
96194

97195
### Important Commands
98-
* `poetry run python cuckoo.py -d`: Run CAPE in debug mode (verbose logs).
99-
* `poetry run python utils/process.py -r <task_id>`: Re-run processing and reporting for a specific task without restarting the VM.
100-
* `poetry run python utils/cleaners.py --clean`: Wipe all tasks and reset the DB.
196+
* **Start CAPE:** `sudo -u cape poetry run python cuckoo.py`
197+
* **Debug Mode:** `sudo -u cape poetry run python cuckoo.py -d`
198+
* **Reprocess Task:** `sudo -u cape poetry run python utils/process.py -r <task_id>`
199+
* **Clean All:** `sudo -u cape poetry run python utils/cleaners.py --clean` (Destructive!)
200+
* **Download Signatures:** `sudo -u cape poetry run python utils/community.py -waf`
201+
* **Test Rooter:** `sudo python3 utils/rooter.py -g cape -v`
101202

102203
### Database Querying (MongoDB)
103204
CAPE stores unstructured analysis results in the `analysis` collection.
104205
```bash
105206
mongo cuckoo
106207
db.analysis.find({"info.id": 123}, {"behavior.summary": 1}).pretty()
107208
```
108-
109-
## 6. Best Practices
110-
1. **Conditionally Render:** Always check if a dictionary key exists in templates before rendering to avoid UI breaks on different analysis types (Static vs Dynamic).
111-
2. **Keep Views Light:** Perform heavy data crunching in `modules/processing`, not in Django views.
112-
3. **Modular CSS/JS:** Keep custom styles in `web/static/` rather than inline in templates when possible.
