
Commit ab102dc

fix: Correct BigQuery example to use bq load instead of bq insert

- Replace incorrect 'bq insert' with 'bq load' command
- Use newline-delimited JSON format as required by BigQuery
- Implement batching for efficient loading (avoids per-request overhead)
- Add trap handler for graceful shutdown
- Include requester_ip field in logged data
- Use proper --source_format and --autodetect flags
1 parent 651595d

1 file changed

Lines changed: 36 additions & 13 deletions

docs/guide/request-logging.md

````diff
@@ -11,6 +11,7 @@ httpjail --request-log requests.log --js "true" -- npm install
 ## Log Format
 
 Each request is logged on a single line:
+
 ```
 <timestamp> <+/-> <METHOD> <URL>
 ```
````
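The one-line format above can be split into fields with bash's `read`. This is a minimal sketch, not part of httpjail: `parse_log_line` is a hypothetical helper, and it assumes that `+` marks an allowed request and `-` a blocked one (the guide itself shows only the `<+/->` placeholder and a `-` sample line).

```shell
#!/bin/bash
# Hypothetical helper: split one request-log line of the form
#   <timestamp> <+/-> <METHOD> <URL>
# into tab-separated fields. Assumes "+" = allowed and "-" = blocked.
parse_log_line() {
  local ts verdict method url
  # read assigns the remainder of the line to the last variable (url).
  read -r ts verdict method url <<< "$1"
  case "$verdict" in
    +) verdict="allowed" ;;
    -) verdict="blocked" ;;
    *) echo "unrecognized verdict: '$verdict'" >&2; return 1 ;;
  esac
  printf '%s\t%s\t%s\t%s\n' "$ts" "$verdict" "$method" "$url"
}
```

Called on the guide's sample line, `parse_log_line '2025-09-22T14:23:45.345Z - POST https://analytics.example.com/track'` prints the timestamp, `blocked`, `POST`, and the URL, separated by tabs. Because the URL is the last field, `read` hands it any remaining text, so no further splitting is needed.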
````diff
@@ -26,9 +27,11 @@ Each request is logged on a single line:
 2025-09-22T14:23:45.345Z - POST https://analytics.example.com/track
 ```
 
-## BigQuery Integration
+## Example: BigQuery Integration
 
-Stream request logs to BigQuery using Line Processor mode:
+Achieve more advanced logging with the line processor
+rule engine (`--proc`). Here's an example of how to log
+every request to BigQuery:
 
 ```bash
 #!/bin/bash
````
````diff
@@ -38,34 +41,54 @@ Stream request logs to BigQuery using Line Processor mode:
 PROJECT="my-project"
 DATASET="httpjail_logs"
 TABLE="requests"
+BATCH_FILE="/tmp/requests-$$.ndjson"
+
+# Process requests in batches
+batch_count=0
+max_batch=100
 
-# Process requests and log to BigQuery
 while read -r line; do
-  # Parse the request JSON
-  request=$(echo "$line" | jq -c '{
+  # Parse and enrich the request
+  echo "$line" | jq -c '{
     timestamp: now | todate,
     url: .url,
     method: .method,
     host: .host,
-    path: .path
-  }')
+    path: .path,
+    requester_ip: .requester_ip
+  }' >> "$BATCH_FILE"
+
+  batch_count=$((batch_count + 1))
 
-  # Log to BigQuery (streaming insert)
-  echo "$request" | bq insert --project_id="$PROJECT" \
-    "$DATASET.$TABLE"
+  # Load batch when threshold reached
+  if [ $batch_count -ge $max_batch ]; then
+    bq load --source_format=NEWLINE_DELIMITED_JSON \
+      --autodetect \
+      "$PROJECT:$DATASET.$TABLE" \
+      "$BATCH_FILE"
+
+    > "$BATCH_FILE"  # Clear batch file
+    batch_count=0
+  fi
 
   # Allow all requests
   echo "true"
 done
+
+# Load any remaining records on exit
+trap 'bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect "$PROJECT:$DATASET.$TABLE" "$BATCH_FILE"' EXIT
 ```
 
 Usage:
+
 ```bash
 httpjail --proc ./log-to-bigquery.sh --request-log local-backup.log -- your-app
 ```
 
 This approach:
-- Streams requests to BigQuery in real-time
+
+- Batches requests for efficient BigQuery loading
 - Maintains a local backup in `local-backup.log`
-- Allows custom processing and enrichment
-- Scales to high-volume applications
+- Uses newline-delimited JSON format (required by BigQuery)
+- Handles graceful shutdown with trap to load remaining data
+- Avoids per-request overhead of streaming inserts
````
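The committed script registers its EXIT `trap` only after `done`, so the final batch is loaded on a clean end of input but not if the processor is terminated by a signal while still inside the loop. In addition, the `echo "true"` for the request that fills a batch waits until `bq load` returns, and `bq load` is not redirected, so any output it writes to stdout would mix with the decisions. The sketch below reorders those steps. It is a hedged illustration, not the guide's code: `flush_batch` and `process_requests` are hypothetical names, `mktemp` replaces the fixed `/tmp/requests-$$.ndjson` path, the `HUP INT TERM` trap assumes the processor is stopped by one of those signals or by closing its stdin, and the BigQuery names are the guide's placeholders.

```shell
#!/bin/bash
# Sketch of log-to-bigquery.sh with the EXIT trap registered before the loop.
# PROJECT/DATASET/TABLE are the guide's placeholders; flush_batch and
# process_requests are illustrative names.
PROJECT="my-project"
DATASET="httpjail_logs"
TABLE="requests"
max_batch=100

# Load the pending batch, if any. bq's output goes to stderr so stdout
# carries nothing but rule decisions. The file is cleared only after a
# successful load, so a failed batch is retried with the next one.
flush_batch() {
  if [ -s "$BATCH_FILE" ] &&
     bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect \
       "$PROJECT:$DATASET.$TABLE" "$BATCH_FILE" >&2; then
    : > "$BATCH_FILE"
  fi
  batch_count=0
}

process_requests() {
  BATCH_FILE=$(mktemp "${TMPDIR:-/tmp}/requests-XXXXXX") || exit 1
  batch_count=0
  # Turn termination signals into a normal exit, then flush on any exit.
  # The batch file is deleted only if it is empty (i.e. fully loaded).
  trap 'exit 1' HUP INT TERM
  trap 'flush_batch; [ -s "$BATCH_FILE" ] || rm -f "$BATCH_FILE"' EXIT

  while IFS= read -r line; do
    # Answer first so a batch load never delays this request.
    echo "true"

    # Parse and enrich the request, as in the guide.
    echo "$line" | jq -c '{
      timestamp: now | todate,
      url: .url,
      method: .method,
      host: .host,
      path: .path,
      requester_ip: .requester_ip
    }' >> "$BATCH_FILE"

    batch_count=$((batch_count + 1))
    if [ "$batch_count" -ge "$max_batch" ]; then
      flush_batch
    fi
  done
}
```

To use it, end the script with a call to `process_requests` and start it with the same command shown in the Usage section. Clearing the batch file only after a successful `bq load` means a failed load is retried together with the next batch instead of being discarded, and the EXIT handler leaves a non-empty file in place if the final load fails.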
