feat(adk/backend/local): add MultiModalRead for images and PDFs #814

Open
JonXSnow wants to merge 1 commit into main from feat/local-multimodal-read

@JonXSnow
Contributor

Add MultiModalRead to *Local that returns structured multimodal content for image and PDF files, with fallback to Read for other types.

Images:

  • Read raw bytes via local os.ReadFile (stat-based size pre-check)
  • Detect MIME type via magic-number headers (PNG/JPEG/GIF/BMP/WebP/TIFF)
  • Enforce 10 MB size limit
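
The magic-number detection listed above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the function body and the exact signatures checked are assumptions based on the publicly documented file headers for each format, and the real detectImageMIME may differ in signature and return values.

```go
package main

import (
	"bytes"
	"fmt"
)

// detectImageMIME is a hypothetical stand-in for the helper named in the PR.
// It inspects the leading bytes of a file and returns a MIME type, or ""
// when no known image signature matches.
func detectImageMIME(b []byte) string {
	switch {
	case bytes.HasPrefix(b, []byte("\x89PNG\r\n\x1a\n")):
		return "image/png"
	case bytes.HasPrefix(b, []byte{0xFF, 0xD8, 0xFF}):
		return "image/jpeg"
	case bytes.HasPrefix(b, []byte("GIF87a")), bytes.HasPrefix(b, []byte("GIF89a")):
		return "image/gif"
	case bytes.HasPrefix(b, []byte("BM")):
		return "image/bmp"
	// WebP is a RIFF container: "RIFF", a 4-byte length, then "WEBP".
	case len(b) >= 12 && string(b[0:4]) == "RIFF" && string(b[8:12]) == "WEBP":
		return "image/webp"
	// TIFF comes in little-endian ("II") and big-endian ("MM") variants.
	case bytes.HasPrefix(b, []byte("II*\x00")), bytes.HasPrefix(b, []byte("MM\x00*")):
		return "image/tiff"
	}
	return ""
}

func main() {
	header := []byte("\x89PNG\r\n\x1a\n") // first 8 bytes of any PNG file
	fmt.Println(detectImageMIME(header))
	fmt.Println(detectImageMIME([]byte("plain text")) == "")
}
```

Sniffing the header rather than trusting the file extension means a mislabeled file is classified by what it actually contains.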

PDFs:

  • Full read (no 'pages'): return raw PDF bytes, 20 MB limit
  • Paged read (with 'pages'): open the file directly from disk via fitz.New(path), which avoids loading the whole PDF into memory, then render the requested page range to PNG at 150 DPI; 100 MB size limit, max 20 pages per request
  • Validate 'pages' parameter syntax and ranges upfront
  • Clamp end page with warning when it exceeds total page count
  • Context-aware render loop: a cancelled context aborts rendering between pages
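
As a rough illustration of the upfront validation and end-page clamping described above, here is a minimal sketch. The accepted syntax ("N" or "N-M", 1-based) is an assumption, since the description does not spell out the 'pages' grammar, and parsePages, clampEnd, and maxPagesPerRequest are hypothetical names rather than the PR's actual parsePagesParam.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxPagesPerRequest mirrors the 20-page cap stated in the description.
const maxPagesPerRequest = 20

// parsePages validates a 'pages' value before any file is opened.
// Assumed syntax: a single 1-based page "N" or an inclusive range "N-M".
func parsePages(s string) (int, int, error) {
	first, last, isRange := strings.Cut(strings.TrimSpace(s), "-")
	start, err := strconv.Atoi(strings.TrimSpace(first))
	if err != nil {
		return 0, 0, fmt.Errorf("invalid pages %q: %w", s, err)
	}
	end := start
	if isRange {
		if end, err = strconv.Atoi(strings.TrimSpace(last)); err != nil {
			return 0, 0, fmt.Errorf("invalid pages %q: %w", s, err)
		}
	}
	if start < 1 || end < start {
		return 0, 0, fmt.Errorf("invalid pages %q: want 1 <= start <= end", s)
	}
	if end-start+1 > maxPagesPerRequest {
		return 0, 0, fmt.Errorf("pages %q spans more than %d pages", s, maxPagesPerRequest)
	}
	return start, end, nil
}

// clampEnd lowers end to the document's page count and returns a warning
// when it had to; a start page beyond the document is an error.
func clampEnd(start, end, total int) (int, string, error) {
	if start > total {
		return 0, "", fmt.Errorf("start page %d exceeds page count %d", start, total)
	}
	if end > total {
		return total, fmt.Sprintf("end page %d exceeds page count %d; clamped", end, total), nil
	}
	return end, "", nil
}

func main() {
	start, end, err := parsePages("3-10")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	end, warning, err := clampEnd(start, end, 6) // pretend the PDF has 6 pages
	fmt.Println(start, end, warning, err)
}
```

Splitting syntax validation from clamping lets a malformed request fail before the PDF is opened, while clamping can only happen once the page count is known.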

Size enforcement:

  • Stat-based primary check rejects oversize files before allocation
  • Secondary length sanity check after ReadFile
  • Actionable error messages (e.g. suggest 'pages' param when full-read limit is exceeded)
  • Not-exist errors wrap os.ErrNotExist so callers can still match them with errors.Is
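
The two-stage size check and the error wrapping described above might look roughly like this. It is a sketch under assumptions, not the PR's code: readWithLimit and maxImageBytes are hypothetical names, and treating "10 MB" as 10 × 2^20 bytes is a guess.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// maxImageBytes assumes "10 MB" means 10 MiB; the PR may define it differently.
const maxImageBytes = 10 << 20

// readWithLimit is a hypothetical helper: it rejects oversize files from
// os.Stat alone, then re-checks the length after reading.
func readWithLimit(path string, limit int64) ([]byte, error) {
	info, err := os.Stat(path)
	if err != nil {
		// %w keeps the underlying error in the chain, so
		// errors.Is(err, os.ErrNotExist) still works for callers.
		return nil, fmt.Errorf("stat %s: %w", path, err)
	}
	if info.Size() > limit {
		// Primary check: nothing has been allocated for the content yet.
		return nil, fmt.Errorf("%s is %d bytes, over the %d-byte limit", path, info.Size(), limit)
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("read %s: %w", path, err)
	}
	if int64(len(data)) > limit {
		// Secondary check: the file may have grown between Stat and ReadFile.
		return nil, fmt.Errorf("%s grew to %d bytes, over the %d-byte limit", path, len(data), limit)
	}
	return data, nil
}

func main() {
	_, err := readWithLimit("no-such-file-for-this-sketch.png", maxImageBytes)
	fmt.Println("is not-exist:", errors.Is(err, os.ErrNotExist))
}
```

The stat-based check is what prevents a large allocation; the post-read check only guards against the file changing in between.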

Also:

  • Bump go directive to 1.23; upgrade eino to v0.9.0-alpha.5.0.2026... to pick up MultiModalRead types
  • Add go-fitz (MuPDF via purego/ffi) dependency; README documents the native library install commands per OS
  • Add table-driven tests for parsePagesParam, detectImageMIME, and the MultiModalRead branches (text fallback, image, oversize image, full/paged PDF, invalid pages)
  • Update examples/backend/main.go with a MultiModalRead demo and fix the stale middlewares/filesystem import path

@github-actions

Need to create a new tag

The following modules have changes and may need version updates:

  • adk/backend/local (Current: adk/backend/local/v0.2.4)

⚠️ Please create and push new version tags for these modules after merging this PR.
