Commit 4f410af

author
bgagent
committed
chore(review): update doc
1 parent e5328ca commit 4f410af

2 files changed

Lines changed: 24 additions & 36 deletions

docs/design/ATTACHMENTS.md

Lines changed: 12 additions & 18 deletions
Original file line number | Diff line number | Diff line change
@@ -423,6 +423,8 @@ flowchart TD
423423
MB -->|Invalid| R[REJECTED: not a valid image]
424424
MB -->|Valid| D[Dimension check: parse PNG IHDR / JPEG SOF]
425425
D -->|> 8000px| OV[REJECTED: oversized]
426+
D -->|Unparseable + > 5 MB| FC[REJECTED: fail-closed, dimensions unverifiable]
427+
D -->|Unparseable + <= 5 MB| G[Bedrock Guardrail: rely on Bedrock validation]
426428
D -->|OK| G[Bedrock Guardrail: ApplyGuardrail with image content block, retries]
427429
G -->|INTERVENED| B[BLOCKED: content policy violation]
428430
G -->|NONE| P[PASSED: original bytes stored as-is]
@@ -433,7 +435,7 @@ flowchart TD
433435

434436
**Magic bytes validation:** Verify the first bytes against known image signatures before any further processing. A file claiming to be `image/png` must start with `\x89PNG\r\n\x1a\n`. This prevents polyglot files (e.g., an image header followed by executable code) from reaching the screening pipeline.
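The signature check described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `sniffImageType`, `matchesDeclaredType`, and the constant names are hypothetical, and the JPEG prefix (`FF D8 FF`) is an assumption about which signature the pipeline accepts, since the text only spells out the PNG one.

```typescript
// Hypothetical helpers (names are not from the design doc): sniff the
// container format from the leading bytes instead of trusting the declared type.
type SniffedImageType = 'png' | 'jpeg';

// "\x89PNG\r\n\x1a\n", the 8-byte PNG signature quoted in the text.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
// JPEG SOI marker (FF D8) followed by the FF that opens the first segment.
const JPEG_SOI_PREFIX = [0xff, 0xd8, 0xff];

function startsWith(bytes: Uint8Array, prefix: number[]): boolean {
  if (bytes.length < prefix.length) return false;
  return prefix.every((value, i) => bytes[i] === value);
}

function sniffImageType(bytes: Uint8Array): SniffedImageType | undefined {
  if (startsWith(bytes, PNG_SIGNATURE)) return 'png';
  if (startsWith(bytes, JPEG_SOI_PREFIX)) return 'jpeg';
  return undefined;
}

// True only when the declared MIME type agrees with the sniffed bytes.
function matchesDeclaredType(bytes: Uint8Array, declared: string): boolean {
  const sniffed = sniffImageType(bytes);
  return sniffed !== undefined && declared === `image/${sniffed}`;
}
```

A prefix check like this only establishes that the file begins as the declared format. Bytes appended after a valid header are not inspected at this stage, so later steps still have to treat the payload as untrusted.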
435437

436-
**Dimension checks:** Image dimensions are read from PNG IHDR chunks and JPEG SOF markers using pure buffer parsing (no native dependencies). Images exceeding 8000px on either side are rejected before the Bedrock call.
438+
**Dimension checks:** Image dimensions are read from PNG IHDR chunks and JPEG SOF markers using pure buffer parsing (no native dependencies). Images exceeding 8000px on either side are rejected before the Bedrock call. For PNGs, a missing IHDR chunk is a hard failure (the file is corrupt or incomplete). For JPEGs, if the SOF marker cannot be found: files > 5 MB are rejected (fail-closed — an unparseable large JPEG is too risky to forward without dimension verification); smaller files are allowed through with a logged warning, relying on Bedrock's own validation to reject oversized images.
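A rough sketch of the parsing and fail-closed policy in this paragraph, using only `Uint8Array` and `DataView`. All names, the verdict strings, and the reading of "5 MB" as 5 MiB are assumptions made for illustration; the sketch also accepts any JPEG SOF variant (not only baseline), which the text does not specify.

```typescript
// Sketch only: names, verdict strings, and the 5 MiB threshold are assumptions.
interface Dimensions { width: number; height: number }
type DimensionVerdict =
  | 'ok' | 'oversized' | 'corrupt'
  | 'unverifiable-rejected' | 'unverifiable-forwarded';

const MAX_DIMENSION_PX = 8000;
const UNPARSEABLE_JPEG_LIMIT_BYTES = 5 * 1024 * 1024;
const IHDR = 0x49484452; // ASCII "IHDR" as a big-endian uint32

function readPngDimensions(bytes: Uint8Array): Dimensions | undefined {
  // Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type, width, height.
  if (bytes.length < 24) return undefined;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.getUint32(12) !== IHDR) return undefined;
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

function readJpegDimensions(bytes: Uint8Array): Dimensions | undefined {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let offset = 2; // skip the SOI marker (FF D8)
  while (offset + 4 <= bytes.length) {
    if (view.getUint8(offset) !== 0xff) return undefined; // lost marker sync
    const marker = view.getUint8(offset + 1);
    if (marker === 0xff) { offset += 1; continue; } // fill byte
    // Standalone markers (TEM, RST0..RST7) carry no length field.
    if (marker === 0x01 || (marker >= 0xd0 && marker <= 0xd7)) { offset += 2; continue; }
    if (marker === 0xd9 || marker === 0xda) return undefined; // EOI or scan data before any SOF
    const length = view.getUint16(offset + 2);
    if (length < 2) return undefined;
    // SOF markers are C0..CF except C4 (DHT), C8 (JPG), and CC (DAC).
    const isSof = marker >= 0xc0 && marker <= 0xcf &&
      marker !== 0xc4 && marker !== 0xc8 && marker !== 0xcc;
    if (isSof) {
      if (offset + 9 > bytes.length) return undefined;
      // Segment body: 1-byte precision, 2-byte height, 2-byte width.
      return { height: view.getUint16(offset + 5), width: view.getUint16(offset + 7) };
    }
    offset += 2 + length;
  }
  return undefined;
}

function checkDimensions(bytes: Uint8Array, type: 'png' | 'jpeg'): DimensionVerdict {
  const dims = type === 'png' ? readPngDimensions(bytes) : readJpegDimensions(bytes);
  if (dims === undefined) {
    if (type === 'png') return 'corrupt'; // missing IHDR is a hard failure
    return bytes.length > UNPARSEABLE_JPEG_LIMIT_BYTES
      ? 'unverifiable-rejected'   // fail closed for large unparseable JPEGs
      : 'unverifiable-forwarded'; // caller logs a warning and relies on Bedrock
  }
  return dims.width > MAX_DIMENSION_PX || dims.height > MAX_DIMENSION_PX ? 'oversized' : 'ok';
}
```

Returning a verdict rather than throwing keeps the warning-and-forward path explicit at the call site; whether the real code throws or returns is not stated in the text.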
437439

438440
**Bedrock image screening:** The `ApplyGuardrailCommand` supports `image` content blocks with `png` and `jpeg` formats. Raw image bytes are passed directly — no re-encoding or format conversion needed.
439441

@@ -728,21 +730,13 @@ async function resolveAttachments(attachments, ...) {
728730

729731
for (const att of attachments) {
730732
if (att.type === 'image') {
731-
// getImageDimensions parses PNG IHDR / JPEG SOF markers from the buffer.
732-
// If dimensions cannot be determined (corrupt image, unsupported format variant),
733-
// throw AttachmentResolutionError — never default to (0,0) or skip the estimate.
734-
let width: number, height: number;
735-
try {
736-
({ width, height } = await getImageDimensions(att));
737-
} catch (err) {
738-
throw new AttachmentResolutionError(
739-
`Cannot determine dimensions for image "${att.filename}". ` +
740-
`The image may be corrupt or in an unsupported format variant. ` +
741-
`Re-export the image and try again.`,
742-
{ cause: err },
743-
);
744-
}
745-
const tokenCost = estimateImageTokens(width, height);
733+
// estimateImageTokensFromBuffer parses PNG IHDR / JPEG SOF markers.
734+
// Returns undefined when dimensions cannot be determined (unusual JPEG
735+
// encoder, corrupt tail). This is non-fatal — use MAX_IMAGE_TOKENS as a
736+
// conservative fallback so budget enforcement still works (overestimates
737+
// rather than underestimates).
738+
const tokenCost = estimateImageTokensFromBuffer(att.content, att.content_type)
739+
?? MAX_IMAGE_TOKENS;
746740
att.token_estimate = tokenCost;
747741
attachmentTokenBudget += tokenCost;
748742
}
@@ -763,7 +757,7 @@ async function resolveAttachments(attachments, ...) {
763757
}
764758
```
765759

766-
**Policy:** If image attachments consume more than `USER_PROMPT_TOKEN_BUDGET - MIN_TEXT_TOKEN_BUDGET` tokens (i.e., they would leave fewer than 20K tokens for text context), the task fails with a clear error. The user can reduce image count or downscale images before resubmitting.
760+
**Policy:** If image attachments consume more than `USER_PROMPT_TOKEN_BUDGET - MIN_TEXT_TOKEN_BUDGET` tokens (i.e., they would leave fewer than 20K tokens for text context), the task fails with a clear error. The user can reduce image count or downscale images before resubmitting. When dimensions are unparseable, `MAX_IMAGE_TOKENS` (1568) is used as a conservative budget estimate. Because this is the per-image token cap, it can overcount substantially for small images, but it ensures the budget check does not underestimate token cost due to parsing limitations.
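The budget policy with the fallback can be sketched as below. The constant values are inferred from figures quoted in this document (a 100K prompt budget, 20K reserved for text, a 1568-token cap) and should be treated as assumptions; `sumImageTokenBudget` is a hypothetical helper, and the error class here extends `Error` directly rather than the `AttachmentError` base class the plan describes.

```typescript
// Values inferred from the surrounding text; treat them as assumptions.
const USER_PROMPT_TOKEN_BUDGET = 100000;
const MIN_TEXT_TOKEN_BUDGET = 20000;
const MAX_IMAGE_TOKENS = 1568;

// Simplified stand-in for the error class named in the implementation plan.
class AttachmentBudgetExceededError extends Error {}

// Each entry is a per-image estimate, or undefined when dimensions were unparseable.
function sumImageTokenBudget(estimates: Array<number | undefined>): number {
  const limit = USER_PROMPT_TOKEN_BUDGET - MIN_TEXT_TOKEN_BUDGET;
  let total = 0;
  for (const estimate of estimates) {
    // Unparseable images are charged the per-image cap, so the total can
    // overestimate but does not drop an image from the accounting.
    total += estimate ?? MAX_IMAGE_TOKENS;
  }
  if (total > limit) {
    throw new AttachmentBudgetExceededError(
      `Image attachments need ${total} tokens; at most ${limit} are available.`,
    );
  }
  return total;
}
```

With these assumed values, 51 unparseable images (79,968 tokens) still fit under the 80,000-token limit, while 52 (81,536 tokens) would fail the task.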
767761

768762
**Token budget vs. payload size:** The token budget above measures **vision tokens** (based on pixel dimensions). This is separate from the **API payload size**, which is affected by base64 encoding overhead (~33% expansion). Image attachments are sent as multimodal content blocks with base64-encoded data, so a 10 MB image becomes ~13.3 MB in the API request. The Anthropic API has its own request size limits (separate from our Lambda payload limits). The `MAX_ATTACHMENT_SIZE_BYTES` (10 MB) is chosen to ensure that even after base64 expansion, individual images stay within the Anthropic API's per-image limits. For multiple large images, the total base64-encoded payload is bounded by the 50 MB total task limit (which produces ~67 MB base64), but in practice the vision token budget is the binding constraint — 10 full-resolution images would consume ~18,820 vision tokens (well within the 100K budget) but produce a very large API payload. The agent should stream images from local files rather than holding all base64 data in memory simultaneously.
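The arithmetic behind these figures can be checked with a short sketch. It assumes padded base64 and reads "MB" as MiB (the ratio is the same either way). The source does not state how the ~18,820 token figure was derived; the last constant shows one reading that reproduces it (ten images at the 1568-token cap with the 1.2x safety margin) and is a guess.

```typescript
// Padded base64 emits 4 output characters for every 3 input bytes, rounded up.
function base64EncodedLength(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

const MIB = 1024 * 1024;

// A single 10 MB attachment grows to roughly 13.33 MB once encoded.
const perImageMiB = base64EncodedLength(10 * MIB) / MIB;

// The 50 MB per-task limit grows to roughly 66.67 MB once encoded.
const perTaskMiB = base64EncodedLength(50 * MIB) / MIB;

// Assumed derivation of the "~18,820" figure: 10 images x 1568 tokens x 1.2 margin.
const tenImageTokens = Math.round(10 * 1568 * 1.2); // 18816
```

The two limits measure different things: the token total stays small no matter how many bytes each image carries, which is why the payload size needs its own bound.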
769763

@@ -1615,7 +1609,7 @@ The implementation is ordered to deliver value incrementally while maintaining s
16151609
34. Add `AttachmentConfig` and `PreparedAttachment` Pydantic models to agent `models.py` (with validators, `s3_version_id` required, `checksum_sha256` required as lowercase hex)
16161610
35. Add attachment download from S3 with pinned `VersionId` (via IAM role) and mandatory SHA-256 integrity verification
16171611
36. Add multimodal content blocks for image attachments in agent prompt
1618-
37. Add token budget accounting with resize-aware formula matching Anthropic docs (1568px cap, 28px tile padding, 1568 token cap, 1.2x safety margin), with explicit error path for `getImageDimensions` failures
1612+
37. Add token budget accounting with resize-aware formula matching Anthropic docs (1568px cap, 28px tile padding, 1568 token cap, 1.2x safety margin), with conservative `MAX_IMAGE_TOKENS` fallback when dimensions are unparseable (non-fatal — avoids rejecting valid images with unusual JPEG encoders)
16191613
38. Add `AttachmentBudgetExceededError` (extends `AttachmentError` base class — already caught by hydration re-throw list from Phase 1 step 10)
16201614
39. Add agent capability check in orchestrator (fail task if agent doesn't support attachments)
16211615
40. Add parity test: `AgentAttachmentPayload` fields match `AttachmentConfig` fields (including `s3_version_id` and `checksum_sha256`)

docs/src/content/docs/architecture/Attachments.md

Lines changed: 12 additions & 18 deletions
Original file line number | Diff line number | Diff line change
@@ -427,6 +427,8 @@ flowchart TD
427427
MB -->|Invalid| R[REJECTED: not a valid image]
428428
MB -->|Valid| D[Dimension check: parse PNG IHDR / JPEG SOF]
429429
D -->|> 8000px| OV[REJECTED: oversized]
430+
D -->|Unparseable + > 5 MB| FC[REJECTED: fail-closed, dimensions unverifiable]
431+
D -->|Unparseable + <= 5 MB| G[Bedrock Guardrail: rely on Bedrock validation]
430432
D -->|OK| G[Bedrock Guardrail: ApplyGuardrail with image content block, retries]
431433
G -->|INTERVENED| B[BLOCKED: content policy violation]
432434
G -->|NONE| P[PASSED: original bytes stored as-is]
@@ -437,7 +439,7 @@ flowchart TD
437439

438440
**Magic bytes validation:** Verify the first bytes against known image signatures before any further processing. A file claiming to be `image/png` must start with `\x89PNG\r\n\x1a\n`. This prevents polyglot files (e.g., an image header followed by executable code) from reaching the screening pipeline.
439441

440-
**Dimension checks:** Image dimensions are read from PNG IHDR chunks and JPEG SOF markers using pure buffer parsing (no native dependencies). Images exceeding 8000px on either side are rejected before the Bedrock call.
442+
**Dimension checks:** Image dimensions are read from PNG IHDR chunks and JPEG SOF markers using pure buffer parsing (no native dependencies). Images exceeding 8000px on either side are rejected before the Bedrock call. For PNGs, a missing IHDR chunk is a hard failure (the file is corrupt or incomplete). For JPEGs, if the SOF marker cannot be found: files > 5 MB are rejected (fail-closed — an unparseable large JPEG is too risky to forward without dimension verification); smaller files are allowed through with a logged warning, relying on Bedrock's own validation to reject oversized images.
441443

442444
**Bedrock image screening:** The `ApplyGuardrailCommand` supports `image` content blocks with `png` and `jpeg` formats. Raw image bytes are passed directly — no re-encoding or format conversion needed.
443445

@@ -732,21 +734,13 @@ async function resolveAttachments(attachments, ...) {
732734

733735
for (const att of attachments) {
734736
if (att.type === 'image') {
735-
// getImageDimensions parses PNG IHDR / JPEG SOF markers from the buffer.
736-
// If dimensions cannot be determined (corrupt image, unsupported format variant),
737-
// throw AttachmentResolutionError — never default to (0,0) or skip the estimate.
738-
let width: number, height: number;
739-
try {
740-
({ width, height } = await getImageDimensions(att));
741-
} catch (err) {
742-
throw new AttachmentResolutionError(
743-
`Cannot determine dimensions for image "${att.filename}". ` +
744-
`The image may be corrupt or in an unsupported format variant. ` +
745-
`Re-export the image and try again.`,
746-
{ cause: err },
747-
);
748-
}
749-
const tokenCost = estimateImageTokens(width, height);
737+
// estimateImageTokensFromBuffer parses PNG IHDR / JPEG SOF markers.
738+
// Returns undefined when dimensions cannot be determined (unusual JPEG
739+
// encoder, corrupt tail). This is non-fatal — use MAX_IMAGE_TOKENS as a
740+
// conservative fallback so budget enforcement still works (overestimates
741+
// rather than underestimates).
742+
const tokenCost = estimateImageTokensFromBuffer(att.content, att.content_type)
743+
?? MAX_IMAGE_TOKENS;
750744
att.token_estimate = tokenCost;
751745
attachmentTokenBudget += tokenCost;
752746
}
@@ -767,7 +761,7 @@ async function resolveAttachments(attachments, ...) {
767761
}
768762
```
769763

770-
**Policy:** If image attachments consume more than `USER_PROMPT_TOKEN_BUDGET - MIN_TEXT_TOKEN_BUDGET` tokens (i.e., they would leave fewer than 20K tokens for text context), the task fails with a clear error. The user can reduce image count or downscale images before resubmitting.
764+
**Policy:** If image attachments consume more than `USER_PROMPT_TOKEN_BUDGET - MIN_TEXT_TOKEN_BUDGET` tokens (i.e., they would leave fewer than 20K tokens for text context), the task fails with a clear error. The user can reduce image count or downscale images before resubmitting. When dimensions are unparseable, `MAX_IMAGE_TOKENS` (1568) is used as a conservative budget estimate. Because this is the per-image token cap, it can overcount substantially for small images, but it ensures the budget check does not underestimate token cost due to parsing limitations.
771765

772766
**Token budget vs. payload size:** The token budget above measures **vision tokens** (based on pixel dimensions). This is separate from the **API payload size**, which is affected by base64 encoding overhead (~33% expansion). Image attachments are sent as multimodal content blocks with base64-encoded data, so a 10 MB image becomes ~13.3 MB in the API request. The Anthropic API has its own request size limits (separate from our Lambda payload limits). The `MAX_ATTACHMENT_SIZE_BYTES` (10 MB) is chosen to ensure that even after base64 expansion, individual images stay within the Anthropic API's per-image limits. For multiple large images, the total base64-encoded payload is bounded by the 50 MB total task limit (which produces ~67 MB base64), but in practice the vision token budget is the binding constraint — 10 full-resolution images would consume ~18,820 vision tokens (well within the 100K budget) but produce a very large API payload. The agent should stream images from local files rather than holding all base64 data in memory simultaneously.
773767

@@ -1619,7 +1613,7 @@ The implementation is ordered to deliver value incrementally while maintaining s
16191613
34. Add `AttachmentConfig` and `PreparedAttachment` Pydantic models to agent `models.py` (with validators, `s3_version_id` required, `checksum_sha256` required as lowercase hex)
16201614
35. Add attachment download from S3 with pinned `VersionId` (via IAM role) and mandatory SHA-256 integrity verification
16211615
36. Add multimodal content blocks for image attachments in agent prompt
1622-
37. Add token budget accounting with resize-aware formula matching Anthropic docs (1568px cap, 28px tile padding, 1568 token cap, 1.2x safety margin), with explicit error path for `getImageDimensions` failures
1616+
37. Add token budget accounting with resize-aware formula matching Anthropic docs (1568px cap, 28px tile padding, 1568 token cap, 1.2x safety margin), with conservative `MAX_IMAGE_TOKENS` fallback when dimensions are unparseable (non-fatal — avoids rejecting valid images with unusual JPEG encoders)
16231617
38. Add `AttachmentBudgetExceededError` (extends `AttachmentError` base class — already caught by hydration re-throw list from Phase 1 step 10)
16241618
39. Add agent capability check in orchestrator (fail task if agent doesn't support attachments)
16251619
40. Add parity test: `AgentAttachmentPayload` fields match `AttachmentConfig` fields (including `s3_version_id` and `checksum_sha256`)
