- Switch PyPI publishing to GitHub trusted publishing so releases can publish via OIDC without a long-lived PYPI_TOKEN secret.
- Align release automation, package metadata, and generator config on 0.43.2 for the trusted-publishing release flow.
- Add split-PDF observability with operation-aware batch planning, timeout, cancellation, and completion logs.
- Make long-running integration tests stream live progress, timings, and backend failure context for split and single partition phases.
- Preserve chunk-local transport retries for split-PDF execution even when SDK-level retries disable connection-error retries for top-level requests.
- Harden split-PDF timeout and cleanup paths against closed event loops and cancelled chunk tasks.
- Stabilize hi_res split integration coverage by using a smaller derived multi-page fixture instead of the flaky full layout-parser-paper.pdf path for equivalence and caching checks.
- Retry on all httpx.TransportError subclasses (including ReadError, WriteError, ConnectError, RemoteProtocolError, and all timeout types) when retry_connection_errors=True. Previously only ConnectError, RemoteProtocolError, and TimeoutException were retried; ReadError (TCP connection reset mid-response) was treated as permanent.
- Retry on httpx.RemoteProtocolError (e.g. "Server disconnected without sending a response") when retry_connection_errors=True. Previously, mid-request server crashes were treated as permanent errors and not retried.
- Add support for on-demand jobs via the CreateJob API
- Add new read-only APIs GetTemplate and ListTemplates
- Bump dependencies to account for vulnerabilities in pypdf < 6.1.3
- Enable arbitrary dictionary inputs for CreateSourceConnectorConfig and CreateDestinationConnectorConfig. This decouples us from the backend schemas. Users can send new connector config fields without having to upgrade their client.
- Enable arbitrary inputs for SourceConnectorType and DestinationConnectorType. This lets the client support new connector types without having to upgrade.
- Fix a potential issue with models being referenced before declaration (commit by @mfbx9da4)
- Fix some environments failing to split PDFs with the error "Can't patch loop of type <class 'uvloop.Loop'>" by removing usage of nest-asyncio
- Remove some operations under client.users that are not fully ready yet
- Provide a base UnstructuredClientError to capture every error raised by the SDK. Note that some exceptions such as SDKError now have more information in the message field. This will impact any users who rely on string matching in their error handling.
- Improve PDF validation error handling by introducing a FileValidationError base class for better error abstraction
- Replace RequestError with PDFValidationError for invalid PDF files to provide more accurate error context
- Throw an appropriate error message when the given PDF file is invalid (corrupted or encrypted).
- Add Unstructured Platform APIs to manage source and destination connectors, workflows, and workflow runs
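
The transport-error retry entries above come down to matching on a base exception class rather than a hand-picked list of subclasses. The following is a minimal, stdlib-only sketch of that idea and not the SDK's actual retry code: `call_with_retries` and `FlakyCall` are hypothetical names, and the built-in `ConnectionError` hierarchy stands in for the httpx one.

```python
import time


def call_with_retries(fn, retryable=(ConnectionError,), max_attempts=3, delay_s=0.0):
    """Call fn(), retrying when it raises an exception matching `retryable`.

    Hypothetical helper for illustration; the SDK's real retry logic
    is not shown in this changelog.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts:
                raise
            # Simple linear backoff between attempts.
            time.sleep(delay_s * attempt)


class FlakyCall:
    """Raises ConnectionResetError a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionResetError("connection reset mid-response")
        return "ok"


# Matching on the base class also covers the ConnectionResetError subclass.
flaky = FlakyCall(failures=2)
result = call_with_retries(flaky, retryable=(ConnectionError,))
```

A narrower tuple such as `(ConnectionRefusedError,)` would let the reset error propagate on the first attempt, which mirrors the earlier behavior described for ReadError. With httpx, the analogous base class is `httpx.TransportError`, from which ReadError, WriteError, ConnectError, RemoteProtocolError, and the timeout exceptions all derive.
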
WARNING: This is a breaking change for clients that use a non-default server_url setting.
To set a custom URL for the client, use the server_url parameter in a given operation:

    elements = client.general.partition(
        request=operations.PartitionRequest(
            partition_parameters=shared.PartitionParameters(
                files=shared.Files(
                    content=doc_file,
                    file_name="your_document.pdf",
                ),
                strategy=shared.Strategy.FAST,
            )
        ),
        server_url="your_server_url",
    )

- Use the configured server_url for our split page "dummy" request
- Switch to an httpx-based client instead of requests
- Switch to poetry for dependency management
- Add client-side parameter checking via Pydantic or TypedDict interfaces
- Add partition_async as a non-blocking alternative to partition
- Address some asyncio-based errors in PDF splitting logic
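
As a rough illustration of why a non-blocking partition call is useful, the sketch below runs several requests concurrently with `asyncio.gather`. It is only a sketch: `fake_partition_async` is a hypothetical stand-in for an awaitable such as partition_async, whose exact signature is not given in this changelog, and `partition_many` is likewise an illustrative name.

```python
import asyncio


async def fake_partition_async(file_name):
    """Hypothetical stand-in for an awaitable partition call."""
    # Simulate network latency without blocking the event loop.
    await asyncio.sleep(0.01)
    return ["element from " + file_name]


async def partition_many(file_names):
    """Start one request per file and wait for all of them together."""
    coroutines = [fake_partition_async(name) for name in file_names]
    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*coroutines)


results = asyncio.run(partition_many(["a.pdf", "b.pdf"]))
```

With a blocking call the two requests would run one after the other; here both are in flight at once, so the total wait is close to that of the slowest request.
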