Commit 25b3f2f

waf: add body size handling helpers (#1084)
1 parent 9d436c6 commit 25b3f2f

23 files changed

Lines changed: 1269 additions & 801 deletions

crowdsec-docs/docs/appsec/configuration.md

Lines changed: 4 additions & 0 deletions
@@ -168,6 +168,10 @@ inband_options:
168168
request_body_in_memory_limit: 1048576
169169
```
170170

171+
:::note
172+
`request_body_in_memory_limit` is a Coraza-level setting. It is distinct from the engine's overall maximum body size, which bounds how much of the body CrowdSec buffers before any rule runs (defaults to 10MB). See [Request body size handling](hooks.md#request-body-size-handling) to tune it.
173+
:::
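
As a rough illustration of how the two limits sit side by side, the fragment below sets both in one AppSec configuration. It is only a sketch: the values are arbitrary, and it assumes the `on_load` helper `SetMaxBodySize` described in the hooks documentation.

```yaml
# Sketch only: values are illustrative, not recommendations.
inband_options:
  # Coraza-level setting (1MB here)
  request_body_in_memory_limit: 1048576
on_load:
  - apply:
      # Engine-level cap on how much of the body is buffered before any rule runs (20MB here)
      - SetMaxBodySize(20971520)
```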
174+
171175
### outofband_options
172176

173177
> object

crowdsec-docs/docs/appsec/hooks.md

Lines changed: 76 additions & 31 deletions
Large diffs are not rendered by default.

crowdsec-docs/docs/appsec/rules_examples.md

Lines changed: 42 additions & 0 deletions
@@ -507,6 +507,29 @@ label=aaa\u0027%2b#request.get(\u0027.KEY_velocity.struts2.context\u0027).intern
507507

508508
Hooks allow you to customize WAF behavior at different execution phases. This section demonstrates key hook capabilities organized by execution phase.
509509

510+
## Load Phase (on_load)
511+
512+
Load hooks run once when the configuration is loaded, and are typically used to apply global settings.
513+
514+
### 1. Tune Request Body Size Handling
515+
516+
#### Description
517+
518+
Change the maximum request body size buffered and inspected by the engine, and what happens when a request exceeds it.
519+
520+
#### Hook Example
521+
522+
```yaml
on_load:
  - apply:
      - SetMaxBodySize(20971520) # 20MB
      - SetBodySizeExceededAction("partial")
```
528+
529+
#### Use Case
530+
531+
Allow larger uploads on this configuration while still bounding memory usage, and inspect the first 20MB of oversized bodies instead of dropping the request outright. See [Request body size handling](hooks.md#request-body-size-handling) for the available actions (`drop`, `partial`, `allow`).
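
For contrast, a stricter variant could lower the limit and reject oversized requests outright. This is a sketch that reuses the same two helpers with the `drop` action; the 1MB figure is an arbitrary assumption.

```yaml
# Sketch only: the 1MB limit is an illustrative value.
on_load:
  - apply:
      - SetMaxBodySize(1048576) # 1MB
      - SetBodySizeExceededAction("drop")
```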
532+
510533
## Pre-Evaluation Phase (pre_eval)
511534

512535
Pre-evaluation hooks run before rules are evaluated, allowing you to modify rule behavior dynamically per request.
@@ -617,6 +640,25 @@ pre_eval:
617640

618641
Automatically block traffic from unwanted countries.
619642

643+
### 6. Disable Body Inspection for Specific Requests
644+
645+
#### Description
646+
647+
Skip request body inspection for the current request, for example on endpoints that legitimately receive large uploads.
648+
649+
#### Hook Example
650+
651+
```yaml
pre_eval:
  - filter: req.URL.Path startsWith "/upload"
    apply:
      - DisableBodyInspection()
```
657+
658+
#### Use Case
659+
660+
Avoid buffering and inspecting large file uploads on trusted endpoints. This also bypasses the [maximum body size check](hooks.md#request-body-size-handling), so requests exceeding the limit are allowed through instead of being dropped.
661+
620662
## Post-Evaluation Phase (post_eval)
621663

622664
Post-evaluation hooks run after rule evaluation is complete, primarily used for debugging and logging.

crowdsec-docs/docs/log_processor/data_sources/appsec.md

Lines changed: 6 additions & 0 deletions
@@ -57,6 +57,12 @@ Number of routines to use to process the requests. Defaults to 1.
5757
How long to cache the auth token for. Accepts any value supported by [time.ParseDuration](https://golang.org/pkg/time/#ParseDuration).
5858
Defaults to 1m.
5959

60+
### `body_read_timeout`
61+
62+
How long to wait for the remediation component to finish sending the request body before giving up and processing whatever was received. Accepts any value supported by [time.ParseDuration](https://golang.org/pkg/time/#ParseDuration).
63+
Set to `0` to disable the timeout.
64+
Defaults to 1s.
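
A minimal acquisition sketch showing where this option might go. The listen address, the AppSec configuration name, and the `5s` value are placeholder assumptions, not recommendations.

```yaml
# Sketch only: address, config name and timeout are placeholders.
source: appsec
listen_addr: 127.0.0.1:7422
appsec_config: crowdsecurity/appsec-default
labels:
  type: appsec
# Wait up to 5 seconds for the request body instead of the default 1s
body_read_timeout: 5s
```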
65+
6066
### `cert_file`
6167

6268
Path to the cert file to allow HTTPS communication between the remediation component and the appsec component.
Lines changed: 233 additions & 0 deletions
@@ -0,0 +1,233 @@
1+
---
2+
id: api_validation
3+
title: OpenAPI Schema Validation
4+
sidebar_position: 5
5+
---
6+
7+
The Application Security Component can validate incoming HTTP requests against an [OpenAPI 3](https://swagger.io/specification/) schema you provide. Requests that do not conform to the schema (unknown route, unexpected method, missing or malformed parameters, invalid request body, missing/invalid authentication credentials, …) can be rejected before they ever reach the protected application.
8+
9+
This is a positive-security model layered on top of the negative-security model implemented by the WAF rules: instead of describing what an attacker looks like, you describe what a valid client looks like and reject everything else.
10+
11+
## How it works
12+
13+
Schema validation is exposed through the [hooks](hooks.md) system:
14+
15+
- An `on_load` hook loads one or more OpenAPI schemas at startup, each under a short string `ref`.
16+
- A `pre_eval` hook calls `ValidateRequestWithSchema(ref)` to validate the current request. The function returns `true` when the request is valid, `false` otherwise.
17+
- When validation fails, structured details about the failure are published to `hook_vars` so the same hook (or a later one) can build a meaningful drop reason, enrich an event, etc.
18+
19+
## Storing schemas
20+
21+
Schemas are loaded from the `schemas/` subdirectory of the CrowdSec [`data_dir`](/configuration/crowdsec_configuration.md#data_dir) (typically `/var/lib/crowdsec/data/schemas/`).
22+
23+
Filenames passed to the loader **must be relative** to that directory.
24+
25+
```
/var/lib/crowdsec/data/schemas/
├── users-api.yaml
└── billing-api.yaml
```
30+
31+
OpenAPI 3.0 and Swagger schemas in YAML or JSON are both accepted.
32+
33+
## Loading schemas (`on_load`)
34+
35+
The following helpers are available from an `on_load` hook. The first two load schemas; the third registers body decoders:
36+
37+
| Helper                                                        | Description                                                                              |
| ------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| `LoadAPISchemaWithName(ref str, filename str)`                | Load `<data_dir>/schemas/<filename>` and register it under `ref`, with default policies. |
| `LoadAPISchemaWithOptions(ref str, filename str, opts map)`   | Same as above, but lets you override per-schema policies (see below).                    |
| `RegisterAPISchemaBodyDecoder(content_type str, decoder str)` | Enable a non-default body decoder for a given Content-Type (see below).                  |
42+
43+
`ref` is an arbitrary string you choose; you will use it later in `pre_eval` to refer to this schema. The same `ref` cannot be registered twice.
44+
45+
```yaml
name: custom/my-appsec-config
inband_rules:
  - crowdsecurity/base-config
on_load:
  - apply:
      - LoadAPISchemaWithName("users_api", "users-api.yaml")
      - LoadAPISchemaWithName("billing_api", "billing-api.yaml")
```
54+
55+
If the schema file is missing, malformed, or not a valid OpenAPI 3 document, the datasource will fail to start and log the underlying error.
56+
57+
### Schema options
58+
59+
`LoadAPISchemaWithOptions` accepts the following keys, all strings:
60+
61+
| Key                              | Values            | Default | Effect                                                                                                                                                                |
| -------------------------------- | ----------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `on_route_not_found`             | `drop` / `ignore` | `drop`  | What to do when no path in the schema matches the request URL.                                                                                                        |
| `on_method_not_allowed`          | `drop` / `ignore` | `drop`  | What to do when a path matches but the method does not (e.g. schema only declares `GET`, request is `POST`).                                                          |
| `on_unsupported_security_scheme` | `drop` / `ignore` | `drop`  | What to do when an unsupported security scheme is encountered (`oauth2`, `openIdConnect`). With `ignore`, the security scheme is not validated when checking a request. |
66+
67+
For `on_route_not_found` and `on_method_not_allowed`, `drop` (the default) treats the mismatch as a validation failure: `ValidateRequestWithSchema` returns `false` and the validation error is surfaced via `hook_vars`. `ignore` lets the request through the validator without inspection (the function returns `true`), which is useful when your schema only covers a subset of your API.
68+
69+
```yaml
on_load:
  - apply:
      - >
        LoadAPISchemaWithOptions("public_api", "public-api.yaml", {
          "on_route_not_found": "ignore",
          "on_method_not_allowed": "drop",
        })
```
78+
79+
### Body decoders
80+
81+
The validator uses the request `Content-Type` to pick a decoder for the body. By default, only the following Content-Types are decoded:
82+
83+
- `application/json` and the JSON variants `application/json-patch+json`, `application/merge-patch+json`, `application/ld+json`, `application/hal+json`, `application/vnd.api+json`, `application/problem+json`
84+
- `application/x-www-form-urlencoded`
85+
- `multipart/form-data`
86+
87+
A request whose Content-Type is not in this list will fail validation if the matching operation in the schema declares a request body.
88+
89+
To enable validation of additional Content-Types, register a decoder from `on_load`:
90+
91+
```yaml
on_load:
  - apply:
      - RegisterAPISchemaBodyDecoder("application/yaml", "yaml")
      - RegisterAPISchemaBodyDecoder("text/csv", "csv")
```
97+
98+
Available decoder names:
99+
100+
| Decoder      | Use for                                               |
| ------------ | ----------------------------------------------------- |
| `json`       | JSON payloads                                         |
| `urlencoded` | `application/x-www-form-urlencoded`                   |
| `multipart`  | `multipart/form-data`                                 |
| `yaml`       | YAML payloads                                         |
| `csv`        | CSV payloads                                          |
| `plain`      | `text/plain`                                          |
| `file`       | Raw binary uploads (`application/octet-stream`, etc.) |
109+
110+
:::warning
111+
Body decoders are registered process-wide. If you run several AppSec datasources in the same CrowdSec process, they share the same set of registered decoders.
112+
:::
113+
114+
## Validating requests (`pre_eval`)
115+
116+
In a `pre_eval` hook, call `ValidateRequestWithSchema(ref)` with the `ref` you used at load time. It returns `true` if the request matches the schema, `false` otherwise.
117+
118+
| Helper                      | Type                 | Description                                                                                        |
| --------------------------- | -------------------- | -------------------------------------------------------------------------------------------------- |
| `ValidateRequestWithSchema` | `func(ref str) bool` | Validate the current request against the schema registered under `ref`. Returns `true` on success. |
121+
122+
A typical pattern is to fail closed — on validation failure, drop the request and use the failure details to build a human-readable reason:
123+
124+
```yaml
name: custom/my-appsec-config
on_load:
  - apply:
      - LoadAPISchemaWithName("users_api", "users-api.yaml")
inband:
  pre_eval:
    - filter: req.URL.Path startsWith "/users" && !ValidateRequestWithSchema("users_api")
      apply:
        - |
          DropRequest("schema validation failed: " + hook_vars.validation_error_message)
```
136+
137+
You can also use the result to pick a softer remediation, send a custom event, etc.
138+
139+
### Validation result variables
140+
141+
When `ValidateRequestWithSchema` returns `false`, the following keys are set on `hook_vars`. They are available to the `apply` block of the same hook, to later hooks in the same request, and to `on_match` / `post_eval` hooks. The same keys are also propagated to the resulting CrowdSec event.
142+
143+
| `hook_vars` key             | Description                                                                                                      |
| --------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| `validation_error`          | Full human-readable error string (combination of reason, field and message).                                     |
| `validation_error_reason`   | Failure category: `parameter`, `request_body`, `security`, `route_not_found`, `method_not_allowed`, `internal`.  |
| `validation_error_field`    | Name of the offending field (e.g. query parameter, header, body property) when applicable.                       |
| `validation_error_message`  | The underlying error message from the validator.                                                                 |
| `validation_error_value`    | The offending value, truncated to 100 characters.                                                                |
| `validation_error_expected` | Short description of what the schema expected (e.g. `type: integer, min: 18`).                                   |
151+
152+
On success these keys are absent.
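
As a sketch of how these keys might be combined, the hook below puts the failure category and the full error string into the drop reason. It uses only the helpers and keys described on this page; the `users_api` ref and the `/users` prefix are assumptions carried over from the other examples, and the fragment should be placed wherever the `pre_eval` hooks live in your configuration.

```yaml
# Sketch only: "users_api" and "/users" are illustrative values.
pre_eval:
  - filter: req.URL.Path startsWith "/users" && !ValidateRequestWithSchema("users_api")
    apply:
      - |
        DropRequest("schema violation (" + hook_vars.validation_error_reason + "): " + hook_vars.validation_error)
```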
153+
154+
## Authentication
155+
156+
If your OpenAPI schema declares a `security` requirement on an operation, the validator enforces it as part of validation. Failure to satisfy the security requirement is reported as a `security` reason in `hook_vars`.
157+
158+
| Security scheme           | Supported | Notes                                                                                          |
| ------------------------- | --------- | ---------------------------------------------------------------------------------------------- |
| `http` `basic`            | Yes       | Checks that an `Authorization: Basic …` header is present and non-empty.                       |
| `http` `bearer`           | Yes       | Checks that an `Authorization: Bearer …` header is present and non-empty.                      |
| `apiKey` (`header`)       | Yes       | Checks that the named header is present and non-empty.                                         |
| `apiKey` (`query`)        | Yes       | Checks that the named query parameter is present and non-empty.                                |
| `apiKey` (`cookie`)       | Yes       | Checks that the named cookie is present and non-empty.                                         |
| `oauth2`, `openIdConnect` | No        | A warning is logged at schema load. Any request guarded by such a scheme will fail validation. |
166+
167+
The validator only verifies that the credential **is present and well-formed** — it does not verify the credential against any backing store.
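
For reference, an OpenAPI 3 schema typically declares an `apiKey` header requirement as in the fragment below. The scheme name `ApiKeyAuth` and the header name `X-API-Key` are hypothetical; with such a declaration, the table above suggests the validator would check only that the header is present and non-empty.

```yaml
# Sketch only: "ApiKeyAuth" and "X-API-Key" are hypothetical names.
components:
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: X-API-Key
# Apply the requirement to every operation in the schema
security:
  - ApiKeyAuth: []
```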
168+
169+
## End-to-end example
170+
171+
`/var/lib/crowdsec/data/schemas/users-api.yaml`:
172+
173+
```yaml
openapi: 3.0.0
info:
  title: Users API
  version: "1.0.0"
paths:
  /users:
    post:
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [username, email]
              additionalProperties: false
              properties:
                username:
                  type: string
                  minLength: 3
                  maxLength: 20
                email:
                  type: string
                  format: email
      responses:
        "201":
          description: created
```
201+
202+
AppSec configuration:
203+
204+
```yaml
name: custom/my-appsec-config
on_load:
  - apply:
      - LoadAPISchemaWithName("users_api", "users-api.yaml")
inband:
  pre_eval:
    - filter: req.URL.Path startsWith "/users" && !ValidateRequestWithSchema("users_api")
      apply:
        - |
          DropRequest("API schema violation on '" + hook_vars.validation_error_field + "': " + hook_vars.validation_error_message)
```
216+
217+
With this configuration:
218+
219+
- `POST /users` with `{"username": "ab", "email": "x"}` is dropped (`username` too short, `email` malformed).
220+
- `POST /users` with a valid body passes validation and is then evaluated by the WAF rules as usual.
221+
- `GET /users` is dropped with reason `method_not_allowed` (default policy).
222+
- `POST /admin` is dropped with reason `route_not_found` (default policy).
223+
224+
## Metrics
225+
226+
Two Prometheus counters are exposed:
227+
228+
| Metric                              | Labels                                            | Description                                                    |
| ----------------------------------- | ------------------------------------------------- | -------------------------------------------------------------- |
| `cs_appsec_validation_ok_total`     | `source`, `appsec_engine`, `schema_ref`           | Requests that passed schema validation.                        |
| `cs_appsec_validation_failed_total` | `source`, `appsec_engine`, `schema_ref`, `reason` | Requests that failed schema validation, broken down by reason. |
232+
233+
`reason` values match `validation_error_reason`: `parameter`, `request_body`, `security`, `route_not_found`, `method_not_allowed`, `internal`.

crowdsec-docs/versioned_docs/version-v1.7/appsec/configuration.md

Lines changed: 4 additions & 0 deletions
@@ -168,6 +168,10 @@ inband_options:
168168
request_body_in_memory_limit: 1048576
169169
```
170170

171+
:::note
172+
`request_body_in_memory_limit` is a Coraza-level setting. It is distinct from the engine's overall maximum body size, which bounds how much of the body CrowdSec buffers before any rule runs (defaults to 10MB). See [Request body size handling](hooks.md#request-body-size-handling) to tune it.
173+
:::
174+
171175
### outofband_options
172176

173177
> object
