Is this a client library issue or a product issue?
This is both, but the client library has an actionable gap: the SDK raises no exception, warning, or structured signal when the model silently fails to process an uploaded PDF. The call returns 200 OK with a natural-language response asking the user to paste the document manually — indistinguishable from a successful response in automated pipelines. The underlying model-level regression is separately reported on the Google AI Dev Forum.
Environment details
- Programming language: Python
- OS: Linux / macOS (reproduced on both)
- Language runtime version: Python 3.11+
- Package version: google-generativeai (latest)
Steps to reproduce
- Take a PDF file whose Type1 fonts use a custom /Encoding with a /Differences array but have no /ToUnicode map (e.g. any KID document generated by Neevia docCreator v4.5; the full file analysis is in the Dev Forum post linked above).
- Upload the file and call generate_content() targeting any Gemini 3.x model:
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")

    # Upload the PDF through the File API.
    uploaded_file = genai.upload_file(
        path="LU0089290844_KID.pdf",
        mime_type="application/pdf",
    )

    model = genai.GenerativeModel(model_name="gemini-3.5-flash")
    # also reproduced with: gemini-3.1-pro, gemini-3.1-flash-lite

    # Send the uploaded file together with the extraction prompt.
    response = model.generate_content([
        uploaded_file,
        "Extract all data from this PDF document.",
    ])
    print(response.text)
- Observe that:
  - No exception is raised.
  - response.text contains a message such as "It seems the text of the document was not included — please paste it directly."
  - There is no structured field in the response to detect the failure programmatically.
- Switch model_name to "gemini-2.5-flash" with identical code and the same file: the correct extracted content is returned.
Expected behavior
- The model extracts the PDF content correctly (as gemini-2.5-flash does), or
- The SDK raises a warning / structured error signal when the uploaded file is not processed, so callers can detect and handle the failure in automated pipelines.
Actual behavior
The call succeeds with HTTP 200. The model silently ignores the PDF content and returns a natural-language fallback response. No exception, no warning, no detectable signal.
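Until a structured signal exists, a caller can only guess from the response text. Below is a minimal sketch of such a heuristic guard, assuming the fallback wording resembles the message quoted in the reproduction steps. The phrase list and the names `looks_like_missing_document`, `require_document_processed` and `DocumentNotProcessedError` are hypothetical and not part of the SDK.

```python
import re

# Hypothetical phrases; tune them to the fallback wording seen in practice.
_FALLBACK_PATTERNS = [
    r"text of the document was not included",
    r"please paste (it|the (text|document|content))",
    r"(document|file|pdf) (appears|seems) to be (empty|blank)",
]
_FALLBACK_RE = re.compile("|".join(_FALLBACK_PATTERNS), re.IGNORECASE)


class DocumentNotProcessedError(RuntimeError):
    """Raised when a reply looks like a 'please paste the document' fallback."""


def looks_like_missing_document(text: str) -> bool:
    """Return True if the text matches one of the known fallback phrases."""
    return bool(_FALLBACK_RE.search(text))


def require_document_processed(text: str) -> str:
    """Return the text unchanged, or raise if it looks like a fallback reply."""
    if looks_like_missing_document(text):
        # Keep only the start of the reply in the error message.
        raise DocumentNotProcessedError(text[:200])
    return text
```

In the reproduction script this would wrap the result, e.g. `require_document_processed(response.text)`. String matching is brittle, since the wording and language of the fallback can change, which is why a structured field in the response would be preferable.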
Additional context
I analysed 4 failing files and 2 working references. The pattern is fully reproducible:
| File | Producer | /ToUnicode missing | Result |
|---|---|---|---|
| LU0089290844_KID.pdf | Neevia docCreator v4.5 | All Type1 fonts | ❌ Empty |
| LU2533812058_KID.pdf | Neevia docCreator v4.5 | All Type1 fonts | ❌ Empty |
| LU2314312922_KID.pdf | Neevia docCreator v4.5 | All Type1 fonts | ❌ Empty |
| LU2526007799_KID.pdf | Neevia docCreator v5.0 | /R39 on page 2 only | ⚠️ Partial (page 2 corrupted) |
| PRIIP_KID_F0GBR04BQM_299.pdf | Neevia docCreator v5.0 | None | ✅ OK |
The three files produced by Neevia docCreator v4.5 all omit /ToUnicode on Type1 fonts with custom encoding. Without /ToUnicode, an extractor that relies on that map alone cannot map character codes to Unicode and reads the document as empty. ISO 32000-1 (section 9.10.2) describes a fallback for such simple fonts: look up the glyph name for each code in /Differences, then map the name to Unicode via the Adobe Glyph List. gemini-2.5-flash and libraries like pypdf handle these files correctly, which is consistent with that fallback. Gemini 3.x does not appear to apply it.
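The glyph-name fallback mentioned above can be sketched in a few lines. This is an illustration under stated assumptions, not the code used by pypdf or by any Gemini model: `AGL_EXCERPT` is a tiny hand-picked excerpt of the Adobe Glyph List, the function names are hypothetical, and the base encoding for codes not listed in /Differences is ignored.

```python
# Tiny excerpt of the Adobe Glyph List; the real list has several thousand entries.
AGL_EXCERPT = {
    "A": "A",
    "a": "a",
    "space": " ",
    "eacute": "\u00e9",
    "Euro": "\u20ac",
    "fi": "\ufb01",
}


def parse_differences(differences):
    """Expand a /Differences array into a {character code: glyph name} map.

    An integer sets the current code; each following name is assigned to
    that code, which then increments by one.
    """
    mapping = {}
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item
        elif code is None:
            raise ValueError("glyph name before the first character code")
        else:
            mapping[code] = str(item).lstrip("/")
            code += 1
    return mapping


def glyph_name_to_unicode(name, glyph_list=AGL_EXCERPT):
    """Map a glyph name to text, or None if the name is unknown."""
    if name in glyph_list:
        return glyph_list[name]
    # Single 'uniXXXX' names carry the code point in the name itself.
    if name.startswith("uni") and len(name) == 7:
        try:
            return chr(int(name[3:], 16))
        except ValueError:
            return None
    return None


def decode_codes(codes, differences):
    """Decode character codes using only /Differences plus the glyph list."""
    code_to_name = parse_differences(differences)
    out = []
    for c in codes:
        ch = glyph_name_to_unicode(code_to_name.get(c, ""))
        out.append(ch if ch is not None else "\ufffd")  # replacement character
    return "".join(out)
```

For example, `decode_codes([65, 32, 66, 128], [32, "/space", 65, "/A", "/eacute", 128, "/Euro"])` returns `"A é€"`, while a code with no entry decodes to U+FFFD.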
SDK-level ask: even if the model fix must happen on the product side, the library could optionally add a pre-flight check warning callers when a PDF's fonts lack /ToUnicode maps, preventing silent failures in production pipelines.
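A rough sketch of what such a pre-flight check could look like on the caller side. It assumes pypdf is installed and that page-level /Font resources are enough to inspect; fonts nested in form XObjects are not visited. The function names are hypothetical, and nothing here is an existing SDK feature.

```python
def _resolve(obj):
    """Follow an indirect reference if the object offers get_object()."""
    getter = getattr(obj, "get_object", None)
    return getter() if callable(getter) else obj


def fonts_missing_tounicode(font_resources):
    """Return names of fonts with a /Differences encoding and no /ToUnicode.

    `font_resources` is a page's /Font dictionary: {resource name: font dict}.
    """
    flagged = []
    for name, font in (font_resources or {}).items():
        font = _resolve(font)
        if "/ToUnicode" in font:
            continue
        encoding = _resolve(font.get("/Encoding"))
        # A bare name such as /WinAnsiEncoding is predefined, not custom.
        if isinstance(encoding, dict) and "/Differences" in encoding:
            flagged.append(str(name))
    return flagged


def preflight_pdf(path):
    """Map 1-based page numbers to flagged font names, skipping clean pages."""
    from pypdf import PdfReader  # third-party dependency, imported lazily

    report = {}
    for number, page in enumerate(PdfReader(path).pages, start=1):
        resources = _resolve(page.get("/Resources"))
        if not isinstance(resources, dict):
            continue
        fonts = _resolve(resources.get("/Font"))
        if not isinstance(fonts, dict):
            continue
        flagged = fonts_missing_tounicode(fonts)
        if flagged:
            report[number] = flagged
    return report
```

A pipeline could call `preflight_pdf("LU0089290844_KID.pdf")` before uploading and log a warning, or route the file to local text extraction, whenever the report is non-empty. A missing /ToUnicode map is only a risk indicator here, not proof that the model will fail on the file.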
Happy to share the PDF files or further details if helpful.