Supporting MuPDF file recognizer#4481
Conversation
44500cd to
1fb4d0d
Compare
We previously used slightly different logic for opening documents from files versus from memory. This fix strives to always use MuPDF's file type recognition and thus become independent from the value of file extensions as much as possible. This works for almost all document types. Exceptions are "txt", "fb2" and "mobi", where either a valid file extension must be present, or the respective filetype must be provided.
1fb4d0d to
2c90d6c
Compare
| if not handler: | ||
| raise FileDataError("Failed to open stream as {magic}") |
There was a problem hiding this comment.
Does this contradict the new docs which says:
:arg str filetype: A string specifying the type of document. Will be ignored in most [#f8]_ cases for :ref:a supported document type<Supported_File_Types>.
?
Shouldn't we should get a handler with fz_recognize_document_stream_content() here, so that the contents overrides the supplied magic where possible?
Or simply use fz_open_document_with_stream(), i.e. don't create an intermediate handler? I.e. remove the entire if magic: ... block in the code?
There was a problem hiding this comment.
The recognizer seems to always detect the type except in 3 cases: "txt", "fb2", "mobi". This means we have identical behavior between file-open versus stream-open.
Only those three cases still require the correct extension, respectively filetype. Everything else disregards file extension, respectively needs no filetype-help.
The two cases "fb2" and "mobi" can be resolved with a little improved recognizer, which will hopefully happen soon.
The case "txt" remains an exception because of principle reasons.
This is the background of my new logic.
jamie-lemon
left a comment
There was a problem hiding this comment.
Docs side of things is fine. No typos - all good! :)
We previously used slightly different logic for opening documents from files versus from memory. This fix strives to always use MuPDF's file type recognition and thus become independent from the value of file extensions as much as possible. This works for almost all document types. Exceptions are "txt", "fb2" and "mobi", where either a valid file extension must be present, or the respective filetype must be provided.