Hey @Dynteq,

Issue #1531: Suppressing image transcription in PPTX

Q1: Can online image transcription be suppressed?
Yes. Simply run markitdown without the LLM parameters (no llm_client).

CLI (no LLM = no image description, no internet call):
markitdown example.pptx -o example.md

Python (same result):
from markitdown import MarkItDown

Q2: Does markitdown access the internet for other things?
Audio transcription uses speech_recognition, which relies on Google's API, so audio conversion does send data online. (DEV Community)
Text extraction from PPTX/DOCX/PDF is fully local.
The internet is only used when you explicitly configure LLM integration or convert audio files.

For sensitive documents: use markitdown with no llm_client and no audio conversion, and you're fully offline.

👍 If this helped you, please mark it as the answer. It helps others in the community who run into the same issue find the solution faster!
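For anyone who wants the Python route spelled out, here is a minimal sketch. It assumes the markitdown package is installed and uses `MarkItDown()`, `convert()`, and `text_content` as shown in the project README. `convert_locally` is a hypothetical helper name chosen for illustration, not part of the library.

```python
def convert_locally(path: str) -> str:
    """Convert a document to Markdown without configuring any LLM client."""
    # Imported inside the function so this module still loads on machines
    # where markitdown is not installed (assumption: `pip install markitdown`).
    from markitdown import MarkItDown

    # No llm_client / llm_model arguments are passed, so no LLM is
    # available to the converter for describing images.
    converter = MarkItDown()
    result = converter.convert(path)
    return result.text_content


# Example usage (needs a real file on disk):
# print(convert_locally("example.pptx"))
```

Leaving the constructor empty is the whole point here: whether an LLM is used depends on passing a client explicitly, so the safest default is to pass nothing.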
Short answer:
So for sensitive PPTX files, plain local conversion should stay local unless you explicitly enable an online feature.
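If you would rather check this than take it on trust, one option is to make socket creation fail while the conversion runs. The sketch below is a best-effort guard using only the standard library. `no_network` and `NetworkBlockedError` are hypothetical names introduced for this example. It only catches code that creates sockets through Python's `socket.socket`, so treat it as a smoke test, not a security boundary.

```python
import socket
from contextlib import contextmanager


class NetworkBlockedError(RuntimeError):
    """Raised when code tries to create a socket inside no_network()."""


class _BlockedSocket(socket.socket):
    """Stand-in socket class that refuses to be instantiated."""

    def __new__(cls, *args, **kwargs):
        raise NetworkBlockedError("network access attempted while blocked")


@contextmanager
def no_network():
    """Temporarily replace socket.socket so new connections fail loudly."""
    original_socket = socket.socket
    socket.socket = _BlockedSocket
    try:
        yield
    finally:
        # Always restore the real class, even if the body raised.
        socket.socket = original_socket


# Example usage (assumes markitdown is installed and example.pptx exists):
# from markitdown import MarkItDown
# with no_network():
#     print(MarkItDown().convert("example.pptx").text_content)
```

If the conversion finishes inside the `with` block without raising `NetworkBlockedError`, no new socket was created through Python's socket module during that run.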
I just found and installed markitdown, and it works very well.
Even more than I expected. I used the command line to convert a .pptx so I could have a discussion with my favorite LLM about it. To my surprise, it also described the pictures in there, which is not mentioned anywhere on the first page.
Thankfully this was not a sensitive document, but for others I would hate to find out after the fact that they (or the images) were sent over the internet to some unnamed AI to be analyzed.
What I did after installation:
markitdown example.pptx -o example.md
Result:
An excellent .md of all the text in the document, including a description of the enclosed image.
Question 1:
Can the (online) transcription of images be suppressed? (I didn't find an argument to do so)
Question 2:
Does markitdown access the internet for other things? If so, it should be clearly stated, as this would make it unsafe for sensitive data that should stay local.
Kind regards,
Martijn