Commit 988db46

Merge pull request #10 from lua-ai-global/feat/multi-modal-scanning
feat(scan): opt-in multi-modal scan orchestration
2 parents 66bb728 + f89155f commit 988db46

5 files changed

Lines changed: 853 additions & 14 deletions

README.md

Lines changed: 9 additions & 7 deletions
@@ -91,13 +91,15 @@ is exactly what it does and does not do:
 Tool executions **inside** AWS action groups are opaque — the adapter
 cannot see them, let alone block them. Use `guardToolUse()` to enforce
 at the tool level manually, or push tool calls onto the host side.
-- **Multi-modal content is not scanned by default.** Image, PDF, and audio
-blocks on Anthropic/Vercel AI/Genkit/LlamaIndex/Bedrock pass through
-without injection detection in the current release — a vision-enabled
-agent bypasses every input scan unless you wire your own scanner.
-Opt-in per-modality scanning (image OCR, PDF text extract, Whisper for
-audio) is on the near-term roadmap; cost, latency, and data-egress
-considerations mean it will ship as opt-in, not on-by-default.
+- **Multi-modal scanning is opt-in.** Image, PDF, and audio blocks pass
+through without injection detection by default. Register a per-modality
+extractor with `registerModalityScanner()` and call `scanMultiModal()`
+from `governance-sdk/scan/multi-modal` before `enforce()`; the result's
+concatenated text feeds the existing cascade. The SDK ships the
+orchestration only — the actual OCR / PDF parser / ASR is caller-
+supplied so the zero-dep promise stands. Defaults to text-only;
+per-block timeouts and fail-closed semantics (`onMissingScanner`,
+`onExtractError`) are configurable.

 ## Packages

packages/governance/README.md

Lines changed: 9 additions & 7 deletions
@@ -91,13 +91,15 @@ is exactly what it does and does not do:
 Tool executions **inside** AWS action groups are opaque — the adapter
 cannot see them, let alone block them. Use `guardToolUse()` to enforce
 at the tool level manually, or push tool calls onto the host side.
-- **Multi-modal content is not scanned by default.** Image, PDF, and audio
-blocks on Anthropic/Vercel AI/Genkit/LlamaIndex/Bedrock pass through
-without injection detection in the current release — a vision-enabled
-agent bypasses every input scan unless you wire your own scanner.
-Opt-in per-modality scanning (image OCR, PDF text extract, Whisper for
-audio) is on the near-term roadmap; cost, latency, and data-egress
-considerations mean it will ship as opt-in, not on-by-default.
+- **Multi-modal scanning is opt-in.** Image, PDF, and audio blocks pass
+through without injection detection by default. Register a per-modality
+extractor with `registerModalityScanner()` and call `scanMultiModal()`
+from `governance-sdk/scan/multi-modal` before `enforce()`; the result's
+concatenated text feeds the existing cascade. The SDK ships the
+orchestration only — the actual OCR / PDF parser / ASR is caller-
+supplied so the zero-dep promise stands. Defaults to text-only;
+per-block timeouts and fail-closed semantics (`onMissingScanner`,
+`onExtractError`) are configurable.

 ## Packages

packages/governance/package.json

Lines changed: 4 additions & 0 deletions
@@ -73,6 +73,10 @@
       "types": "./dist/scanner-plugins/types.d.ts",
       "import": "./dist/scanner-plugins/types.js"
     },
+    "./scan/multi-modal": {
+      "types": "./dist/scan/multi-modal.d.ts",
+      "import": "./dist/scan/multi-modal.js"
+    },
     "./policy-compose": {
       "types": "./dist/policy-compose.d.ts",
       "import": "./dist/policy-compose.js"
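
The README change in this commit describes a registry of caller-supplied extractors plus configurable failure handling. The diff does not show the module's actual signatures, so the following is only a minimal, self-contained sketch of that pattern, not the SDK's implementation. The functions below are local stand-ins that reuse the names `registerModalityScanner` and `scanMultiModal`; the `Block`, `ScanOptions`, and `ScanResult` shapes and the `"block"` / `"skip"` option values are assumptions made for illustration. Extractors are synchronous here to keep the sketch short, so the per-block timeouts mentioned in the README are omitted.

```typescript
// Hypothetical shapes: the real SDK types are not shown in this diff.
type Modality = "text" | "image" | "pdf" | "audio";

interface Block {
  modality: Modality;
  data: string; // text content, or an opaque payload reference for non-text blocks
}

type Extractor = (block: Block) => string;
type FailureMode = "block" | "skip"; // assumed option values

interface ScanOptions {
  onMissingScanner: FailureMode;
  onExtractError: FailureMode;
}

interface ScanResult {
  text: string; // concatenated text to hand to the existing text scan
  blocked: boolean; // true when a fail-closed rule fired
  reasons: string[];
}

// Local registry standing in for the SDK's scanner registry.
const scanners = new Map<Modality, Extractor>();

function registerModalityScanner(modality: Modality, extractor: Extractor): void {
  scanners.set(modality, extractor);
}

function scanMultiModal(blocks: Block[], options: ScanOptions): ScanResult {
  const parts: string[] = [];
  const reasons: string[] = [];
  let blocked = false;

  for (const block of blocks) {
    // Text blocks need no extractor and pass straight through.
    if (block.modality === "text") {
      parts.push(block.data);
      continue;
    }
    const extractor = scanners.get(block.modality);
    if (extractor === undefined) {
      reasons.push(`no scanner registered for ${block.modality}`);
      if (options.onMissingScanner === "block") blocked = true;
      continue;
    }
    try {
      parts.push(extractor(block));
    } catch (err) {
      reasons.push(`extract failed for ${block.modality}: ${String(err)}`);
      if (options.onExtractError === "block") blocked = true;
    }
  }
  return { text: parts.join("\n"), blocked, reasons };
}

// Usage: only an image extractor is registered, so the audio block fails closed.
registerModalityScanner("image", (block) => `ocr:${block.data}`);

const result = scanMultiModal(
  [
    { modality: "text", data: "hello" },
    { modality: "image", data: "img-1" },
    { modality: "audio", data: "clip-1" },
  ],
  { onMissingScanner: "block", onExtractError: "block" },
);
// result.text === "hello\nocr:img-1", result.blocked === true
```

In this sketch a caller would check `result.blocked` first and pass `result.text` to the text scan only when it is false. With the real package, the new `./scan/multi-modal` export entry suggests the two functions are imported from `governance-sdk/scan/multi-modal`, as the README states.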
