17 changes: 1 addition & 16 deletions docs/assets/css/custom.css
@@ -73,28 +73,13 @@ html.dark .hextra-card:hover {

/* ── Nav ─────────────────────────────────────────────────────── */

/* Probe Results — primary nav item */
/* Leaderboards — primary nav item */
a[href$="/probe-results"],
a[href$="/probe-results/"] {
font-weight: 700 !important;
font-size: 1.1em !important;
}

/* Separator before Glossary */
a[href$="/docs"],
a[href$="/docs/"] {
margin-left: 12px !important;
padding-left: 14px !important;
border-left: 1px solid rgba(128, 128, 128, 0.3) !important;
}

/* Separator before Add a Framework */
a[href$="/add-a-framework"],
a[href$="/add-a-framework/"] {
margin-left: 12px !important;
padding-left: 14px !important;
border-left: 1px solid rgba(128, 128, 128, 0.3) !important;
}

/* ── Code blocks: fix dark-mode readability ─────────────────── */

1 change: 1 addition & 0 deletions docs/content/_index.md
@@ -40,6 +40,7 @@ Http11Probe sends a suite of crafted HTTP requests to each server and checks whe
{{< card link="compliance" title="Compliance" subtitle="RFC 9110/9112 protocol requirements — line endings, request-line format, header syntax, Host validation, Content-Length parsing." icon="check-circle" >}}
{{< card link="smuggling" title="Smuggling" subtitle="CL/TE ambiguity, duplicate Content-Length, obfuscated Transfer-Encoding, pipeline injection vectors." icon="shield-exclamation" >}}
{{< card link="malformed-input" title="Robustness" subtitle="Binary garbage, oversized fields, too many headers, control characters, integer overflow, incomplete requests." icon="lightning-bolt" >}}
{{< card link="normalization" title="Normalization" subtitle="Header normalization behavior — underscore-to-hyphen, space before colon, tab in name, case folding on Transfer-Encoding." icon="adjustments" >}}
{{< /cards >}}

<div style="height:60px"></div>
70 changes: 64 additions & 6 deletions docs/content/add-a-framework/_index.md
@@ -1,13 +1,39 @@
---
title: Add a Framework
toc: false
toc: true
---

Http11Probe is designed so anyone can contribute their HTTP server and get compliance results without touching the test infrastructure.

## Required Endpoints

Your server must listen on **port 8080** and implement three endpoints:

| Endpoint | Method | Behavior |
|----------|--------|----------|
| `/` | `GET` | Return `200 OK`. This is the baseline reachability check. |
| `/` | `POST` | Read the full request body and return it in the response. Used by body handling and smuggling tests. |
| `/echo` | `POST` | Return all received request headers in the response body, one per line as `Name: Value`. Used by normalization tests. |

### Why `/echo`?

Normalization tests need to see how the server internally represents headers after parsing. For example, if the test sends `Content_Length: 99`, the `/echo` endpoint reveals whether the server normalized the underscore to a hyphen, preserved it as-is, or dropped it entirely. Without this endpoint, normalization tests cannot run.

### Response format for `/echo`

The response body should contain one header per line in `Name: Value` format:

```
Host: localhost:8080
Content-Length: 11
Content-Type: text/plain
```

The order does not matter. Include all headers the server received (framework-added headers like `Connection` are fine).
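
As an illustration of how a client might consume this format, here is a minimal Python sketch that parses an `/echo` body into a mapping. The function name `parse_echo_body` is hypothetical and is not taken from the probe's own code. Header names are kept exactly as the server reported them, since the server's spelling is what a normalization test inspects.

```python
def parse_echo_body(body: str) -> dict[str, list[str]]:
    """Parse an /echo response body of `Name: Value` lines into a dict.

    Names are not case-folded or rewritten; repeated names keep every value.
    """
    headers: dict[str, list[str]] = {}
    for line in body.splitlines():
        if not line.strip():
            continue  # tolerate a trailing blank line
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'Name: Value' line: {line!r}")
        headers.setdefault(name, []).append(value.strip())
    return headers


sample = "Host: localhost:8080\nContent-Length: 11\nContent-Type: text/plain\n"
parsed = parse_echo_body(sample)
```

With this shape, checking whether `Content_Length` was normalized comes down to testing which spelling appears as a key in `parsed`.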

## Steps

**1. Write a minimal server** — Create a directory under `src/Servers/YourServer/` with a simple HTTP server that listens on **port 8080** and returns `200 OK` on `GET /`. Any language, any framework.
**1. Create a server directory** — Add a directory under `src/Servers/YourServer/` with your server source code implementing the three endpoints above.

**2. Add a Dockerfile** — Build and run your server. It will run with `--network host`.

@@ -17,7 +43,7 @@ Http11Probe is designed so anyone can contribute their HTTP server and get compl
{"name": "Your Server"}
```

That's it. Open a PR and the probe runs automatically.
Open a PR and the probe runs automatically.

## How It Works

@@ -26,14 +52,14 @@ The CI pipeline scans `src/Servers/*/probe.json` to discover servers. For each o
1. Builds the Docker image from the Dockerfile in that directory
2. Runs the container on port 8080 with `--network host`
3. Waits for the server to become ready
4. Runs the full compliance probe suite
4. Runs the full probe suite (compliance, smuggling, malformed input, normalization)
5. Stops the container and moves to the next server

No workflow edits, no port allocation, no config files.
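
The discovery step can be pictured with a short Python sketch. This is not the actual CI workflow, which is not shown here; `discover_servers` is a hypothetical helper, and the only assumptions taken from the text are the `src/Servers/*/probe.json` glob and the `name` key. The demo builds a throwaway directory tree rather than touching a real checkout.

```python
import json
import tempfile
from pathlib import Path


def discover_servers(repo_root):
    """Return (directory, display name) pairs for every src/Servers/*/probe.json."""
    servers = []
    for probe_file in sorted(Path(repo_root).glob("src/Servers/*/probe.json")):
        config = json.loads(probe_file.read_text(encoding="utf-8"))
        servers.append((probe_file.parent, config["name"]))
    return servers


# Demo against a temporary tree that mimics the repository layout.
with tempfile.TemporaryDirectory() as root:
    server_dir = Path(root) / "src" / "Servers" / "YourServer"
    server_dir.mkdir(parents=True)
    (server_dir / "probe.json").write_text('{"name": "Your Server"}', encoding="utf-8")
    found = [(directory.name, name) for directory, name in discover_servers(root)]
```

Because discovery is driven purely by the presence of `probe.json`, adding a server directory is enough for it to be picked up.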

## Example

Here's the full Flask server as a reference:
Here's the Flask server as a reference:

**`src/Servers/FlaskServer/probe.json`**
```json
@@ -49,4 +75,36 @@ COPY src/Servers/FlaskServer/app.py .
ENTRYPOINT ["python3", "app.py", "8080"]
```

**`src/Servers/FlaskServer/app.py`** — a minimal Flask app that reads the port from `sys.argv` and returns `200 OK` on `GET /`.
**`src/Servers/FlaskServer/app.py`**
```python
import sys
from flask import Flask, request
from werkzeug.routing import Rule

app = Flask(__name__)

@app.route('/echo', methods=['GET','POST','PUT','DELETE','PATCH','OPTIONS','HEAD'])
def echo():
    lines = []
    for name, value in request.headers:
        lines.append(f"{name}: {value}")
    return '\n'.join(lines) + '\n', 200, {'Content-Type': 'text/plain'}

app.url_map.add(Rule('/', defaults={"path": ""}, endpoint='catch_all'))
app.url_map.add(Rule('/<path:path>', endpoint='catch_all'))

@app.endpoint('catch_all')
def catch_all(path):
    if request.method == 'POST':
        return request.get_data(as_text=True)
    return "OK"

if __name__ == "__main__":
    port = int(sys.argv[1]) if len(sys.argv) > 1 else 8080
    app.run(host="0.0.0.0", port=port)
```

The key parts:
- **`/echo`** — echoes all received headers back as plain text.
- **`POST /`** — reads and returns the request body (needed for body and smuggling tests).
- **`GET /`** (catch-all) — returns `"OK"` with `200`.
2 changes: 1 addition & 1 deletion docs/content/docs/body/_index.md
@@ -1,7 +1,7 @@
---
title: Body Handling
description: "Body Handling — Http11Probe documentation"
weight: 6
weight: 9
sidebar:
open: false
---
2 changes: 1 addition & 1 deletion docs/content/docs/content-length/_index.md
@@ -1,7 +1,7 @@
---
title: Content-Length
description: "Content-Length — Http11Probe documentation"
weight: 6
weight: 8
sidebar:
open: false
---
2 changes: 1 addition & 1 deletion docs/content/docs/headers/_index.md
@@ -1,7 +1,7 @@
---
title: Header Syntax
description: "Header Syntax — Http11Probe documentation"
weight: 4
weight: 6
sidebar:
open: false
---
2 changes: 1 addition & 1 deletion docs/content/docs/host-header/_index.md
@@ -1,7 +1,7 @@
---
title: Host Header
description: "Host Header — Http11Probe documentation"
weight: 5
weight: 7
sidebar:
open: false
---
19 changes: 19 additions & 0 deletions docs/content/docs/http-overview/_index.md
@@ -0,0 +1,19 @@
---
title: Understanding HTTP
description: "What HTTP is, how HTTP/1.1 works in depth, its history from 0.9 to 3, and alternatives."
weight: 2
sidebar:
open: false
---

A comprehensive guide to HTTP — what it is, why it was designed the way it was, and how HTTP/1.1 works at the wire level. Start here before diving into the individual test categories.

{{< cards >}}
{{< card link="what-is-http" title="What is HTTP?" subtitle="Application-layer request/response protocol, client-server model, stateless design, and core design goals." icon="question-mark-circle" >}}
{{< card link="message-syntax" title="Message Syntax" subtitle="Request and response structure, methods (GET, POST, PUT...), status codes (1xx–5xx), and the request-line grammar." icon="code" >}}
{{< card link="headers" title="Headers" subtitle="Header structure, common request and response headers, the Host header, and why it's the only required header." icon="document-text" >}}
{{< card link="connections" title="Connections" subtitle="Persistent connections, keep-alive, pipelining, head-of-line blocking, Upgrade, and 100 Continue." icon="switch-horizontal" >}}
{{< card link="body-and-framing" title="Body and Framing" subtitle="Content-Length, chunked transfer encoding, trailers, and why CL+TE conflicts cause request smuggling." icon="document-download" >}}
{{< card link="caching-and-negotiation" title="Caching and Negotiation" subtitle="Content negotiation with Accept headers, Cache-Control, ETags, conditional requests, and Vary." icon="refresh" >}}
{{< card link="history-and-future" title="History and Future" subtitle="HTTP/0.9 to HTTP/3, the current IETF work, alternatives to HTTP, and learning resources." icon="clock" >}}
{{< /cards >}}
179 changes: 179 additions & 0 deletions docs/content/docs/http-overview/body-and-framing.md
@@ -0,0 +1,179 @@
---
title: Body and Framing
description: "Content-Length, chunked transfer encoding, trailers, and why CL+TE conflicts cause request smuggling."
weight: 5
---

HTTP/1.1 messages optionally carry a **message body** after the header section. The critical question for any parser is: **where does the body end?** Getting this wrong is the root cause of HTTP request smuggling.

## When Is a Body Present?

- **Requests** — a body is present if `Content-Length` or `Transfer-Encoding` is set. `GET`, `HEAD`, `DELETE`, and `OPTIONS` typically have no body (though the spec doesn't forbid it).
- **Responses** — all responses to `HEAD` requests and all `1xx`, `204`, and `304` responses have no body. Everything else may have a body.

## Content-Length

The `Content-Length` header declares the exact size of the body in bytes as a decimal integer:

```http
POST /data HTTP/1.1
Host: example.com
Content-Type: text/plain
Content-Length: 13

Hello, World!
```

The parser reads exactly 13 bytes after the empty line. Whatever follows is either the start of the next message (on a persistent connection) or nothing at all, because the connection ends.

### Rules

- The value **MUST** be a non-negative decimal integer.
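- **Leading zeros** are permitted by the grammar (`1*DIGIT` in RFC 9110 §8.6), so `Content-Length: 007` means 7. A parser must never read it as octal, and a strict parser may choose to reject it outright.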
- **No signs** — `Content-Length: +13` or `Content-Length: -1` are invalid.
- **No whitespace** within the value — `Content-Length: 1 3` is invalid.
- If `Content-Length` **doesn't match** the actual body size, the message is malformed. The server SHOULD close the connection.
- **Multiple `Content-Length` headers** are allowed only if all values are identical. If they differ, the message is malformed and MUST be rejected.
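
The rules above can be sketched as a strict parser in Python. This is an illustration, not code from Http11Probe; `parse_content_length` and its `reject_leading_zeros` flag are hypothetical names. Two choices are policy decisions of this sketch rather than grammar requirements: it refuses leading zeros by default, and it refuses any comma-separated list inside a single field value.

```python
def parse_content_length(values, reject_leading_zeros=True):
    """Return the body length from the raw Content-Length field values.

    `values` holds one string per Content-Length header line, in arrival order.
    Raises ValueError for anything this strict sketch refuses.
    """
    if not values:
        raise ValueError("no Content-Length header")
    parsed = set()
    for raw in values:
        text = raw.strip(" \t")  # optional whitespace around the field value
        # ASCII digits only: refuses "+13", "-1", "1 3", "0x0d", "13, 14", "".
        if not text or any(ch not in "0123456789" for ch in text):
            raise ValueError(f"invalid Content-Length: {raw!r}")
        if reject_leading_zeros and len(text) > 1 and text[0] == "0":
            raise ValueError(f"leading zeros in Content-Length: {raw!r}")
        parsed.add(int(text))
    if len(parsed) > 1:
        raise ValueError(f"conflicting Content-Length values: {sorted(parsed)}")
    return parsed.pop()
```

Checking characters explicitly, instead of calling `int()` directly, matters because `int()` on its own accepts signs, surrounding whitespace, and underscores.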

### Why Strictness Matters

Lenient parsing of `Content-Length` is a common source of vulnerabilities:

- `Content-Length: 0x0d` — if parsed as hex, this is 13 bytes. If parsed as decimal, it's invalid. A parser mismatch between front-end and back-end enables smuggling.
- `Content-Length: 13, 14` — a list of two differing values. One parser might take the first, another the last.

## Chunked Transfer Encoding

When the total body size is unknown at the time headers are sent (streaming, server-generated content, compression), HTTP/1.1 uses **chunked transfer encoding**.

### Format

```
chunk-size (hex) CRLF
chunk-data CRLF
...
0 CRLF
[ trailer-section ]
CRLF
```

Each chunk starts with the chunk size in hexadecimal, followed by CRLF, then exactly that many bytes of data, followed by CRLF. A zero-length chunk signals the end of the body.

### Full Example

```http
HTTP/1.1 200 OK
Transfer-Encoding: chunked

4\r\n
Wiki\r\n
7\r\n
pedia i\r\n
9\r\n
n chunks.\r\n
0\r\n
\r\n
```

Decoded body: `Wikipedia in chunks.`

### Chunk Extensions

A chunk-size may be followed by semicolon-separated extensions:

```
a;ext-name=ext-value\r\n
0123456789\r\n
```

Most servers and proxies **ignore** chunk extensions. They exist for potential use cases like per-chunk checksums or metadata, but are rarely used in practice. Some security tools test whether servers handle unexpected extensions safely.

### Trailers

After the final zero-length chunk, **trailer fields** may appear — headers sent after the body:

```http
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Trailer: Checksum

4\r\n
data\r\n
0\r\n
Checksum: abc123\r\n
\r\n
```

Trailers are useful for:
- **Checksums/signatures** — computed as the body streams.
- **Processing status** — whether the server completed successfully.
- **Metadata** — anything that can't be determined until after the body is generated.

The `Trailer` header in the response declares which trailer fields to expect (though this is advisory, not enforced).

### Rules

- Chunk sizes **MUST** be hexadecimal, case-insensitive (`a` and `A` are both valid).
- A zero-length chunk **MUST** be present to terminate the body.
- After the zero-length chunk, the trailer section and final CRLF complete the message.
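
To make the format concrete, here is a minimal chunked-body decoder in Python. It is a sketch under stated assumptions, not the probe's implementation: `decode_chunked` is a hypothetical name, the whole message is assumed to be in memory already, chunk extensions are discarded, and trailer lines are returned unparsed.

```python
def decode_chunked(data: bytes):
    """Decode a chunked body; return (body, trailer_lines, bytes_consumed)."""
    body = bytearray()
    pos = 0

    def read_line():
        # Return the bytes up to the next CRLF and advance past it.
        nonlocal pos
        end = data.find(b"\r\n", pos)
        if end == -1:
            raise ValueError("incomplete message: missing CRLF")
        line = data[pos:end]
        pos = end + 2
        return line

    while True:
        size_field = read_line().split(b";", 1)[0]  # drop chunk extensions
        # Strict hex only: no sign, no "0x" prefix, no surrounding whitespace.
        if not size_field or any(b not in b"0123456789abcdefABCDEF" for b in size_field):
            raise ValueError(f"invalid chunk size: {size_field!r}")
        size = int(size_field, 16)
        if size == 0:
            break  # the zero-length chunk terminates the body
        if pos + size + 2 > len(data):
            raise ValueError("incomplete chunk data")
        body += data[pos:pos + size]
        if data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += size + 2

    trailers = []
    while True:
        line = read_line()
        if line == b"":
            break  # the empty line ends the trailer section and the message
        trailers.append(line.decode("latin-1"))
    return bytes(body), trailers, pos


wire = b"7;note=hi\r\nHello, \r\n6\r\nWorld!\r\n0\r\nChecksum: abc123\r\n\r\nNEXT"
body, trailers, used = decode_chunked(wire)
```

Returning `bytes_consumed` lets the caller see exactly where the next message starts (`wire[used:]`), which is the boundary that smuggling attacks try to shift.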

## Content-Length vs Transfer-Encoding

A sender **MUST NOT** send `Content-Length` in a message that contains `Transfer-Encoding` (RFC 9112 §6.2).

For recipients, RFC 9112 §6.3 is explicit:

> If a message is received with both a Transfer-Encoding and a Content-Length header field, the Transfer-Encoding overrides the Content-Length. Such a message might indicate an attempt to perform request smuggling or response splitting and **ought to be handled as an error**.

### The Request Smuggling Problem

This ambiguity is the **root cause of HTTP request smuggling**. Consider a message with both headers:

```http
POST / HTTP/1.1
Host: example.com
Content-Length: 6
Transfer-Encoding: chunked

0\r\n
\r\n
GPOST
```

- A parser that uses **Transfer-Encoding** sees a zero-length chunk → body ends immediately. The remaining bytes (`GPOST`) are the start of the next request.
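- A parser that uses **Content-Length** reads 6 bytes (`0\r\n\r\nG`) as the body. The remaining `POST` is then treated as the start of the next request, so the two parsers disagree about where that request begins and what its method is (`POST` versus `GPOST`).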

If a front-end proxy uses one interpretation and a back-end server uses another, the attacker controls where one request ends and the next begins. This can:
- **Bypass access controls** — smuggle a request to an internal endpoint.
- **Poison caches** — make the cache store an attacker-controlled response for a victim's URL.
- **Hijack connections** — capture another user's request.
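
The disagreement can be reproduced with a few lines of Python that split the same bytes both ways. This is only a demonstration of the two readings of the example message, not a real parser; the variable names are illustrative.

```python
raw = (
    b"POST / HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Length: 6\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"0\r\n"
    b"\r\n"
    b"GPOST"
)

# Everything after the first empty line belongs to the body or to later requests.
head, _, after_headers = raw.partition(b"\r\n\r\n")

# Parser A trusts Content-Length: the body is the next 6 bytes.
cl_body = after_headers[:6]
cl_leftover = after_headers[6:]

# Parser B trusts Transfer-Encoding: the body ends at the zero-length chunk.
terminator = b"0\r\n\r\n"
assert after_headers.startswith(terminator)
te_leftover = after_headers[len(terminator):]

print("Content-Length view leaves:   ", cl_leftover)
print("Transfer-Encoding view leaves:", te_leftover)
```

The two leftovers differ by one byte, and that single byte is enough to change how the next request on the connection is parsed.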

### How Servers Should Handle It

Strict servers should:
1. **Reject** messages with both `Content-Length` and `Transfer-Encoding` with a 400 response.
2. If not rejecting, **always prioritize `Transfer-Encoding`** and ignore `Content-Length`.
3. **Never trust `Content-Length`** when `Transfer-Encoding` is present.

This is one of the most critical compliance checks that Http11Probe performs.
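
A minimal sketch of that policy in Python follows. `choose_framing` is a hypothetical helper, not part of any server named in this document, and it assumes for simplicity that the only transfer coding in use is `chunked`.

```python
def choose_framing(headers, strict=True):
    """Decide how to frame a request body from (name, value) header pairs.

    Returns "chunked", "content-length", or "none". Raises ValueError when
    the request should be answered with 400.
    """
    names = {name.lower() for name, _ in headers}
    has_te = "transfer-encoding" in names
    has_cl = "content-length" in names
    if has_te and has_cl:
        if strict:
            raise ValueError("both Content-Length and Transfer-Encoding present")
        return "chunked"  # Transfer-Encoding overrides Content-Length
    if has_te:
        return "chunked"  # assumption of this sketch: the coding is chunked
    if has_cl:
        return "content-length"
    return "none"
```

In the lenient branch the caller would still be expected to close the connection after responding, so that any leftover bytes cannot be read as a second request.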

## Transfer-Encoding Obfuscation

Attackers may try to hide `Transfer-Encoding` from one parser while making another recognize it:

```http
Transfer-Encoding: chunked
Transfer-Encoding : chunked
Transfer-Encoding: xchunked
Transfer-Encoding: chunked\r\n (extra space)
Transfer-Encoding:
 chunked
```

Each of these variants exploits differences in how parsers handle:
- Whitespace before the colon (forbidden by RFC 9112 §5.1).
- Unknown transfer coding names.
- Obs-fold (deprecated line folding).
- Leading/trailing whitespace in the value.

Strict, RFC-compliant parsing eliminates these attack surfaces.
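
As a rough illustration of what strict handling means for these variants, the Python sketch below validates a single raw header line. `check_transfer_encoding_line` is a hypothetical function, and treating `chunked` as the only supported coding is an assumption of the sketch.

```python
def check_transfer_encoding_line(line: str) -> str:
    """Validate one raw header line (without CRLF); return the coding or raise."""
    if line[:1] in (" ", "\t"):
        raise ValueError("obs-fold continuation line")
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError("missing colon")
    if name != name.strip(" \t"):
        raise ValueError("whitespace between field name and colon")
    if name.lower() != "transfer-encoding":
        raise ValueError("not a Transfer-Encoding header")
    coding = value.strip(" \t")  # optional whitespace around the value is legal
    if coding.lower() != "chunked":
        raise ValueError(f"unsupported transfer coding: {coding!r}")
    return coding.lower()
```

The plain form and a value padded with spaces or tabs are accepted, while the space-before-colon, `xchunked`, empty-value, and folded variants all raise, so a front end and a back end that both apply these checks cannot disagree about whether the header is present.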