Version: 1.2 Date: December 19, 2024
This document outlines the software design specification for a new Python client library for the Nutrient Document Web Services (DWS) API. The goal of this project is to create a high-quality, lightweight, and intuitive Python package that simplifies interaction with the Nutrient DWS API for developers.
The library will provide two primary modes of interaction:
- A Direct API for executing single, discrete document processing tasks (e.g., converting a single file, rotating a page).
- A Builder API that offers a fluent, chainable interface for composing and executing complex, multi-step document processing workflows, abstracting the POST /build endpoint of the Nutrient API.
The final product will be a distributable package suitable for publishing on PyPI, with comprehensive documentation. The design prioritizes ease of use, adherence to Python best practices, and clear documentation consumable by both humans and LLMs.
This specification covers the implemented Python client library:
- Client authentication and configuration
- Direct API methods for common document operations
- Builder API for multi-step workflows
- Comprehensive error handling with custom exceptions
- Optimized file input/output handling
- Standard Python package structure
Out of scope:
- Command-line interface (CLI)
- Asynchronous operations (all calls are synchronous)
- Non-Python implementations
- Nutrient DWS OpenAPI Specification: https://dashboard.nutrient.io/assets/specs/public@1.9.0-dfc6ec1c1d008be3dcb81a72be6346b5.yml
- Nutrient DWS API Documentation: https://www.nutrient.io/api/reference/public/
- Nutrient DWS List of Tools: https://www.nutrient.io/api/tools-overview/
- Target API Endpoint: https://api.pspdfkit.com
- Simplicity: Clean, Pythonic interface abstracting HTTP requests, authentication, and file handling
- Flexibility: Direct API for single operations and Builder API for complex workflows
- Lightweight: Single external dependency on the requests library
- Performance: Optimized file handling with streaming for large files (>10MB)
- Distribution-Ready: Standard Python package structure with pyproject.toml
The library is architected around a central NutrientClient class, which is the main entry point for all interactions.
NutrientClient (The Main Client):
- Handles initialization and configuration, including a configurable timeout for API calls.
- Manages the API key for authentication. All outgoing requests will include the X-Api-Key header.
- Contains static methods for the Direct API (e.g., client.rotate_pages(...)), which are derived from the OpenAPI specification.
- Acts as a factory for the Builder API via the client.build() method.
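As a rough illustration of the configuration responsibilities listed above, the sketch below keeps the API key, timeout, and base URL together and derives the request header from them. `ClientConfig`, `headers()`, and `url_for()` are hypothetical names used only for this example; they are not part of the library's documented surface.

```python
from dataclasses import dataclass


@dataclass
class ClientConfig:
    """Hypothetical container for the settings the client keeps per instance."""

    api_key: str
    timeout: int = 300  # seconds; the default this specification states
    base_url: str = "https://api.pspdfkit.com"  # target API endpoint from the references

    def headers(self) -> dict[str, str]:
        # Every outgoing request carries the API key in the X-Api-Key header.
        return {"X-Api-Key": self.api_key}

    def url_for(self, path: str) -> str:
        # Join the base URL and an endpoint path without doubling slashes.
        return f"{self.base_url.rstrip('/')}/{path.lstrip('/')}"


config = ClientConfig(api_key="YOUR_DWS_API_KEY")
print(config.headers())
print(config.url_for("/build"))
```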
Direct API (Static Methods):
- A collection of methods attached directly to the NutrientClient object.
- Each method corresponds to a specific tool available in the OpenAPI specification (e.g., ocr_pdf, rotate_pages).
- These methods abstract the POST /process/{tool} endpoint. They handle file preparation, making the request, and returning the processed file.
BuildAPIWrapper (Builder API):
- A separate class, instantiated via client.build().
- Implements the Builder design pattern with a fluent, chainable interface.
- The execute() method compiles the workflow into a multipart/form-data request for the POST /build endpoint, including a JSON part for actions and the necessary file parts.
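The chaining and compile step described above might look roughly like the following. This is a minimal sketch, not the library's BuildAPIWrapper: `WorkflowBuilder`, `compile()`, and the exact shape of the JSON part (`parts`, `actions`, `type`) are assumptions made for illustration, and the real /build schema may differ.

```python
import json
from typing import Any, Optional


class WorkflowBuilder:
    """Hypothetical stand-in for BuildAPIWrapper; it only assembles the payload."""

    def __init__(self, input_file: str) -> None:
        self._input_file = input_file
        self._actions: list[dict[str, Any]] = []

    def add_step(self, tool: str, options: Optional[dict[str, Any]] = None) -> "WorkflowBuilder":
        # Record the step and return self so calls can be chained.
        self._actions.append({"type": tool, **(options or {})})
        return self

    def compile(self) -> dict[str, Any]:
        # Assumed layout: one JSON part describing the workflow plus one file part.
        instructions = {"parts": [{"file": "file"}], "actions": self._actions}
        return {
            "data": {"instructions": json.dumps(instructions)},
            "files": {"file": self._input_file},
        }


payload = (
    WorkflowBuilder("doc.docx")
    .add_step("rotate-pages", {"degrees": 90})
    .add_step("rotate-pages", {"degrees": 180})
    .compile()
)
print(payload["data"]["instructions"])
```

Returning `self` from `add_step` is what makes the interface chainable; the request is only assembled once, when the chain ends.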
Direct API Call:
- User calls method like client.rotate_pages(input_file='path/to/doc.pdf', degrees=90)
- Method internally uses Builder API with single step
- File is processed via /build endpoint
- Returns processed file bytes or saves to output_path
Builder API Call:
- User chains operations: client.build(input_file='doc.docx').add_step(tool='rotate-pages', options={'degrees': 90})
- execute() sends multipart/form-data request to /build endpoint
- Returns processed file bytes or saves to output_path
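To make the relationship between the two flows concrete, here is a sketch in which a Direct API method is a one-step workflow routed through the same builder. `SketchClient`, `SketchBuilder`, and the injected `fake_transport` are hypothetical; the transport stands in for the HTTP request so the example runs without network access.

```python
from typing import Any, Callable, Optional

# Hypothetical transport: receives the input and compiled steps, returns processed bytes.
Transport = Callable[[str, list[dict[str, Any]]], bytes]


class SketchBuilder:
    """Illustrative stand-in for the builder returned by client.build()."""

    def __init__(self, transport: Transport, input_file: str) -> None:
        self._transport = transport
        self._input_file = input_file
        self._steps: list[dict[str, Any]] = []

    def add_step(self, tool: str, options: Optional[dict[str, Any]] = None) -> "SketchBuilder":
        self._steps.append({"tool": tool, "options": options or {}})
        return self

    def execute(self) -> bytes:
        # One call to the transport, regardless of how many steps were added.
        return self._transport(self._input_file, self._steps)


class SketchClient:
    """Illustrative stand-in for NutrientClient."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def build(self, input_file: str) -> SketchBuilder:
        return SketchBuilder(self._transport, input_file)

    def rotate_pages(self, input_file: str, *, degrees: int) -> bytes:
        # Direct API call: a single-step workflow sent through the same builder.
        return self.build(input_file).add_step("rotate-pages", {"degrees": degrees}).execute()


requests_sent = []


def fake_transport(input_file: str, steps: list[dict[str, Any]]) -> bytes:
    # Records the request instead of calling the /build endpoint.
    requests_sent.append((input_file, list(steps)))
    return b"processed"


client = SketchClient(fake_transport)
result = client.rotate_pages("path/to/doc.pdf", degrees=90)
```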
from nutrient_dws import NutrientClient, AuthenticationError

# API key from parameter (takes precedence) or NUTRIENT_API_KEY env var
client = NutrientClient(api_key="YOUR_DWS_API_KEY", timeout=300)

# Context manager support
with NutrientClient() as client:
    result = client.convert_to_pdf("document.docx")

- API Key: Parameter takes precedence over NUTRIENT_API_KEY environment variable
- Timeout: Default 300 seconds, configurable per client
- Error Handling: AuthenticationError raised on first API call if key invalid
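The precedence rule for the API key can be sketched as a small lookup. `resolve_api_key` is a hypothetical helper name; the function deliberately does not raise when no key is found, since the specification defers the authentication failure to the first API call.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> Optional[str]:
    # An explicit parameter wins; otherwise fall back to the environment variable.
    if api_key:
        return api_key
    return os.environ.get("NUTRIENT_API_KEY")


os.environ["NUTRIENT_API_KEY"] = "key-from-env"
print(resolve_api_key("key-from-param"))  # key-from-param
print(resolve_api_key())                  # key-from-env
```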
Input Types:
- str or Path for local file paths
- bytes objects
- File-like objects (io.IOBase)
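One way to accept all three input types is to normalize them to bytes at a single point. The sketch below assumes a hypothetical helper named `read_input`; it ignores the streaming path for large files and is not the library's actual implementation.

```python
import io
from pathlib import Path
from typing import BinaryIO, Union

FileInput = Union[str, Path, bytes, BinaryIO]


def read_input(file_input: FileInput) -> bytes:
    """Hypothetical helper that turns any accepted input into bytes."""
    if isinstance(file_input, bytes):
        return file_input
    if isinstance(file_input, (str, Path)):
        path = Path(file_input)
        if not path.is_file():
            raise FileNotFoundError(f"File not found: {path}")
        return path.read_bytes()
    if isinstance(file_input, io.IOBase) or hasattr(file_input, "read"):
        data = file_input.read()
        # Text-mode streams return str; encode so callers always get bytes.
        return data.encode() if isinstance(data, str) else data
    raise TypeError(f"Unsupported input type: {type(file_input).__name__}")


print(read_input(b"%PDF-1.7"))
print(read_input(io.BytesIO(b"%PDF-1.7")))
```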
Output Behavior:
- Returns bytes by default
- Saves to output_path and returns None when path provided
- Large files (>10MB) use streaming to optimize memory usage
Method names are snake_case versions of operations. Tool-specific parameters are keyword-only arguments.
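A brief sketch of both conventions follows, assuming tool identifiers are hyphenated as in 'rotate-pages'. `method_name_for` is a hypothetical helper, and the `rotate_pages` function here is a plain stand-in that only echoes its arguments.

```python
def method_name_for(tool: str) -> str:
    # "rotate-pages" becomes "rotate_pages".
    return tool.replace("-", "_")


def rotate_pages(input_file, output_path=None, *, degrees):
    # Everything after the bare * must be passed by keyword.
    return {"input_file": input_file, "output_path": output_path, "degrees": degrees}


print(method_name_for("rotate-pages"))
print(rotate_pages("doc.pdf", degrees=90))

try:
    rotate_pages("doc.pdf", None, 90)  # positional degrees is rejected
except TypeError as exc:
    print(f"TypeError: {exc}")
```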
Example Usage:
# User Story: Convert a DOCX to PDF and rotate it.
# Step 1: Convert DOCX to PDF
pdf_bytes = client.convert_to_pdf(
    input_file="path/to/document.docx"
)
# Step 2: Rotate the newly created PDF from memory
client.rotate_pages(
    input_file=pdf_bytes,
    output_path="path/to/rotated_document.pdf",
    degrees=90  # keyword-only argument
)
print("File saved to path/to/rotated_document.pdf")

Fluent interface for multi-step workflows with single API call:
- client.build(input_file): Starts workflow
- .add_step(tool, options=None): Adds processing step
- .execute(output_path=None): Executes workflow
- .set_output_options(**options): Sets output metadata/optimization
Example Usage:
from nutrient_dws import APIError

# User Story: Convert a DOCX to PDF and rotate it (Builder version)
try:
    client.build(input_file="path/to/document.docx") \
        .add_step(tool="rotate-pages", options={"degrees": 90}) \
        .execute(output_path="path/to/final_document.pdf")
    print("Workflow complete. File saved to path/to/final_document.pdf")
except APIError as e:
    print(f"An API error occurred: Status {e.status_code}, Response: {e.response_body}")

The library provides a comprehensive set of custom exceptions for clear error feedback:
- NutrientError(Exception): The base exception for all library-specific errors.
- AuthenticationError(NutrientError): Raised on 401/403 HTTP errors, indicating an invalid or missing API key.
- APIError(NutrientError): Raised for general API errors (e.g., 400, 422, 5xx status codes). Contains status_code, response_body, and optional request_id attributes.
- ValidationError(NutrientError): Raised when request validation fails, with optional errors dictionary.
- NutrientTimeoutError(NutrientError): Raised when requests time out.
- FileProcessingError(NutrientError): Raised when file processing operations fail.
- FileNotFoundError (Built-in): Standard Python exception for missing file paths.
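Part of this hierarchy, together with the status-code mapping it implies, could be sketched as follows. The class names and attributes come from the list above; the constructor signature of `APIError` and the `check_response` helper are assumptions made for this example.

```python
from typing import Optional


class NutrientError(Exception):
    """Base class for all library-specific errors."""


class AuthenticationError(NutrientError):
    """Invalid or missing API key (HTTP 401/403)."""


class APIError(NutrientError):
    """General API error carrying the response details."""

    def __init__(self, message: str, status_code: int, response_body: str,
                 request_id: Optional[str] = None) -> None:
        super().__init__(message)
        self.status_code = status_code
        self.response_body = response_body
        self.request_id = request_id


def check_response(status_code: int, body: str, request_id: Optional[str] = None) -> None:
    """Hypothetical mapping from an HTTP status to the exceptions above."""
    if status_code in (401, 403):
        raise AuthenticationError("Invalid or missing API key")
    if status_code >= 400:
        raise APIError(f"API request failed with status {status_code}",
                       status_code, body, request_id)


try:
    check_response(422, '{"error": "bad options"}')
except NutrientError as exc:  # the base class catches every library error
    caught = exc

try:
    check_response(401, "")
except AuthenticationError as exc:
    auth_error = exc
```

Because every custom exception derives from NutrientError, callers can catch the base class when they do not need to distinguish failure modes.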
- Layout: Standard src layout with nutrient_dws package
- Configuration: pyproject.toml for project metadata and dependencies
- Dependencies: requests as sole runtime dependency
- Versioning: Semantic versioning starting at 1.0.0
- Large Files: Files >10MB are streamed rather than loaded into memory
- Input Types: Support for str paths, bytes, Path objects, and file-like objects
- Output: Returns bytes by default, or saves to output_path when provided
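The large-file rule can be illustrated with a size check and a chunked reader. This is a sketch under stated assumptions: the threshold is read as 10 * 1024 * 1024 bytes, the 64 KiB chunk size is arbitrary, and `should_stream` and `iter_chunks` are hypothetical names.

```python
import io
from typing import BinaryIO, Iterator

STREAM_THRESHOLD = 10 * 1024 * 1024  # 10MB cutoff; binary megabytes are an assumption
CHUNK_SIZE = 64 * 1024  # assumed chunk size; the specification does not state one


def should_stream(size_in_bytes: int) -> bool:
    # Only files strictly larger than the threshold are streamed.
    return size_in_bytes > STREAM_THRESHOLD


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    # Read fixed-size chunks until the stream is exhausted.
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


chunks = list(iter_chunks(io.BytesIO(b"x" * 150_000)))
print([len(c) for c in chunks])  # [65536, 65536, 18928]
```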