Describe the bug
I have code that takes uploaded files and passes them to the langchain UnstructuredLoader, which, as the traceback below shows, calls Unstructured's partition function. When the uploaded file is a zip file, I use Python's built-in zipfile module to open its members as file-like objects. I've tried several different text files with the same result. In all cases I'm passing a file-like object into Unstructured.
- Uploading the text file directly: success
- Uploading a zip file containing PDF, DOCX, PNG etc.: success
- Uploading a zip file containing the working text file:
2025-09-23 16:50:09 File "/shabti/.venv/lib/python3.12/site-packages/unstructured/partition/auto.py", line 292, in partition
2025-09-23 16:50:09 elements = partition(filename=filename, file=file, **partitioning_kwargs)
2025-09-23 16:50:09 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-09-23 16:50:09 File "/shabti/.venv/lib/python3.12/site-packages/unstructured/partition/common/metadata.py", line 162, in wrapper
2025-09-23 16:50:09 elements = func(*args, **kwargs)
2025-09-23 16:50:09 ^^^^^^^^^^^^^^^^^^^^^
2025-09-23 16:50:09 File "/shabti/.venv/lib/python3.12/site-packages/unstructured/chunking/dispatch.py", line 74, in wrapper
2025-09-23 16:50:09 elements = func(*args, **kwargs)
2025-09-23 16:50:09 ^^^^^^^^^^^^^^^^^^^^^
2025-09-23 16:50:09 File "/shabti/.venv/lib/python3.12/site-packages/unstructured/partition/text.py", line 81, in partition_text
2025-09-23 16:50:09 encoding, file_text = read_txt_file(file=file, encoding=encoding)
2025-09-23 16:50:09 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-09-23 16:50:09 File "/shabti/.venv/lib/python3.12/site-packages/unstructured/file_utils/encoding.py", line 146, in read_txt_file
2025-09-23 16:50:09 formatted_encoding, file_text = detect_file_encoding(file=file)
2025-09-23 16:50:09 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-09-23 16:50:09 File "/shabti/.venv/lib/python3.12/site-packages/unstructured/file_utils/encoding.py", line 70, in detect_file_encoding
2025-09-23 16:50:09 byte_data = convert_to_bytes(file)
2025-09-23 16:50:09 ^^^^^^^^^^^^^^^^^^^^^^
2025-09-23 16:50:09 File "/shabti/.venv/lib/python3.12/site-packages/unstructured/partition/common/common.py", line 386, in convert_to_bytes
2025-09-23 16:50:09 raise ValueError("Invalid file-like object type")
2025-09-23 16:50:09 ValueError: Invalid file-like object type
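The last frame suggests the failure is a check on the type of the file object rather than a problem with the file's contents. As a rough diagnostic sketch using only the standard library, the snippet below reports what kind of object `ZipFile.open()` returns. The helper names `describe_zip_member` and `make_sample_zip` are hypothetical, and I have not confirmed which types `convert_to_bytes` accepts, so treat the link to the error as an assumption.

```python
import io
import zipfile


def make_sample_zip() -> bytes:
    """Build a small in-memory zip holding one text file (illustrative data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("notes.txt", "hello from inside the zip\n")
    return buf.getvalue()


def describe_zip_member(archive_bytes: bytes, member: str) -> dict:
    """Report the type of the object ZipFile.open() returns for one member."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as my_zip:
        with my_zip.open(member) as fh:
            return {
                "type": type(fh).__name__,
                "is_bytesio": isinstance(fh, io.BytesIO),
                "is_buffered_reader": isinstance(fh, io.BufferedReader),
                "is_buffered_io_base": isinstance(fh, io.BufferedIOBase),
            }


if __name__ == "__main__":
    print(describe_zip_member(make_sample_zip(), "notes.txt"))
```

The member comes back as a `zipfile.ZipExtFile`: a buffered binary stream, but neither an `io.BytesIO` nor an `io.BufferedReader`, so a strict `isinstance` check against those types would reject it.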
To Reproduce
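import zipfile

from langchain_unstructured import UnstructuredLoader

# `file` is the uploaded zip archive (a path or a binary file-like object)
with zipfile.ZipFile(file) as my_zip:
    for info in my_zip.infolist():
        loader = UnstructuredLoader(
            # my_zip.open() returns a zipfile.ZipExtFile
            file=my_zip.open(info),
            strategy="auto",
            chunking_strategy="by_title",
            metadata_filename=info.filename,
        )
        pages = loader.load()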
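A possible workaround, sketched under the assumption that Unstructured accepts an `io.BytesIO` where it rejects the object returned by `ZipFile.open()` (I have not verified this against the library source): read each member fully into memory and wrap it in a `BytesIO` before handing it to the loader. The helper name `iter_zip_members_as_bytesio` is hypothetical.

```python
import io
import zipfile
from typing import Iterator, Tuple


def iter_zip_members_as_bytesio(archive) -> Iterator[Tuple[str, io.BytesIO]]:
    """Yield (filename, BytesIO) for each regular file in a zip archive.

    `archive` may be a path or a seekable binary file-like object.
    Each member is read fully into memory, so this suits modest file sizes.
    """
    with zipfile.ZipFile(archive) as my_zip:
        for info in my_zip.infolist():
            if info.is_dir():
                continue  # directory entries have no content to load
            yield info.filename, io.BytesIO(my_zip.read(info))


# Intended use with the loader from the report (not executed here):
#
#   for name, buffer in iter_zip_members_as_bytesio(file):
#       loader = UnstructuredLoader(
#           file=buffer,
#           strategy="auto",
#           chunking_strategy="by_title",
#           metadata_filename=name,
#       )
#       pages = loader.load()
```

This trades memory for compatibility, since each member is decompressed in full before partitioning. It does not address the underlying question of whether Unstructured should accept zip member streams directly.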
Expected behavior
The text file inside the zip loads successfully, just as it does when uploaded directly.
Environment Info
Please run python scripts/collect_env.py and paste the output here.
I can't find where this collect_env.py is in my installation.
I created a Docker image based on astral/uv:python3.12-trixie-slim with unstructured[all-docs]>=0.18.14 in my Python dependencies. I have installed all the recommended system dependencies except libmagic as I am also having some issues with that.