Commit 8e58420

feat: add security changes
1 parent 9099f93 commit 8e58420

12 files changed, +500 -620 lines changed

CHANGELOG.md

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+## [3.0.0] - 2026-01-30
+
+### Security
+
+- **CRITICAL**: Removed client-side URL fetching to prevent SSRF vulnerabilities
+- URLs are now passed to the server for secure server-side fetching
+- Restricted `sign()` method to local files only (API limitation)
+
+### Changed
+
+- **BREAKING**: `sign()` only accepts local files (paths, bytes, file objects) - no URLs
+- **BREAKING**: Most methods now accept `FileInputWithUrl` - URLs passed to server
+- **BREAKING**: Removed client-side PDF parsing - leverage API's negative index support
+- Methods like `rotate()`, `split()`, `delete_pages()` now support negative indices (-1 = last page)
+- All methods except `sign()` accept URLs that are passed securely to the server
+
+### Removed
+
+- **BREAKING**: Removed `process_remote_file_input()` from public API (security risk)
+- **BREAKING**: Removed `get_pdf_page_count()` from public API (client-side PDF parsing)
+- **BREAKING**: Removed `is_valid_pdf()` from public API (internal use only)
+- Removed ~200 lines of client-side PDF parsing code
+
+### Added
+
+- SSRF protection documentation in README
+- Migration guide (docs/MIGRATION.md)
+- Security best practices for handling remote files
+- Support for negative page indices in all page-based methods
+
+## [2.0.0] - 2025-01-09
+
+- Initial stable release with full API coverage
+- Async-first design with httpx and aiofiles
+- Comprehensive type hints and mypy strict mode
+- Workflow builder with staged pattern
+- Error hierarchy with typed exceptions

docs/MIGRATION.md

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+# Migration Guide: v2.x to v3.0
+
+## Overview
+
+Version 3.0.0 introduces SSRF protection and removes client-side PDF parsing.
+
+## Key Changes
+
+### 1. `sign()` No Longer Accepts URLs (API Limitation)
+
+**Before (v2.x)**:
+```python
+result = await client.sign('https://example.com/document.pdf', {...})
+```
+
+**After (v3.0)** - Fetch file first:
+```python
+import httpx
+
+async with httpx.AsyncClient() as http:
+    url = 'https://example.com/document.pdf'
+
+    # IMPORTANT: Validate URL
+    if not url.startswith('https://trusted-domain.com/'):
+        raise ValueError('URL not from trusted domain')
+
+    response = await http.get(url, timeout=10.0)
+    response.raise_for_status()
+    pdf_bytes = response.content
+
+result = await client.sign(pdf_bytes, {...})
+```
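
The prefix check in the guide above covers a single trusted origin. When several hosts are trusted, comparing the parsed hostname against an allowlist may be easier to maintain. The following is a minimal sketch using only the standard library; `TRUSTED_HOSTS` and `is_trusted_url` are hypothetical names for illustration and are not part of `nutrient-dws`.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the hosts your application trusts.
TRUSTED_HOSTS = frozenset({"trusted-domain.com"})


def is_trusted_url(url: str, trusted_hosts: frozenset[str] = TRUSTED_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose host is exactly on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit rejects some malformed inputs (e.g. a broken IPv6 literal).
        return False
    # hostname is lower-cased by urlsplit and excludes userinfo and port.
    return parts.scheme == "https" and parts.hostname in trusted_hosts
```

Because the comparison uses the parsed hostname, look-alike hosts such as `trusted-domain.com.evil.com` and userinfo tricks such as `trusted-domain.com@evil.com` are rejected. httpx does not follow redirects by default; leaving that setting unchanged keeps a trusted URL from silently redirecting to an untrusted host.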
+
+### 2. Most Methods Now Accept URLs (Passed to Server)
+
+Good news! These methods now support URLs passed securely to the server:
+- `rotate()`, `split()`, `add_page()`, `duplicate_pages()`, `delete_pages()`
+- `set_page_labels()`, `set_metadata()`, `optimize()`
+- `flatten()`, `apply_instant_json()`, `apply_xfdf()`
+- All redaction methods
+- `convert()`, `ocr()`, `watermark_*()`, `extract_*()`, `merge()`, `password_protect()`
+
+**Example**:
+```python
+# This now works!
+result = await client.rotate('https://example.com/doc.pdf', 90, pages={'start': 0, 'end': 5})
+```
+
+### 3. Negative Page Indices Now Supported
+
+Use negative indices for "from end" references:
+- `-1` = last page
+- `-2` = second-to-last page
+- etc.
+
+**Examples**:
+```python
+# Rotate last 3 pages
+await client.rotate(pdf, 90, pages={'start': -3, 'end': -1})
+
+# Delete first and last pages
+await client.delete_pages(pdf, [0, -1])
+
+# Split: keep middle pages, excluding first and last
+await client.split(pdf, [{'start': 1, 'end': -2}])
+```
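
To make the negative-index semantics concrete, here is a small sketch of how such indices could resolve to absolute page positions. In v3.0 the resolution happens on the server, so this is only an illustration. It assumes Python-style negative indexing and an inclusive `end`, which is inferred from the "last 3 pages" example (`start: -3`, `end: -1`). `resolve_page_index` and `resolve_page_range` are hypothetical helpers, not library functions.

```python
def resolve_page_index(index: int, page_count: int) -> int:
    """Map a possibly negative page index to a zero-based absolute index."""
    resolved = index + page_count if index < 0 else index
    if not 0 <= resolved < page_count:
        raise IndexError(f"page index {index} out of range for {page_count} pages")
    return resolved


def resolve_page_range(start: int, end: int, page_count: int) -> list[int]:
    """Expand an inclusive start/end range into absolute page indices."""
    first = resolve_page_index(start, page_count)
    last = resolve_page_index(end, page_count)
    return list(range(first, last + 1))
```

Under these assumptions, `{'start': -3, 'end': -1}` on a 10-page document selects pages 7, 8, and 9, and `{'start': 1, 'end': -2}` on a 5-page document selects pages 1, 2, and 3.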
+
+### 4. Removed from Public API
+
+- `process_remote_file_input()` - No longer needed (URLs passed to server)
+- `get_pdf_page_count()` - Use negative indices instead
+- `is_valid_pdf()` - Let server validate (internal use only)
+
+**Still Available:**
+- `is_remote_file_input()` - Helper to detect if input is a URL (still public)
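
Since `sign()` rejects URLs in v3.0, callers may want to fail early with a clear message before making the API call. The sketch below shows one way to do that. The detection rules of the library's `is_remote_file_input()` are not shown in this diff, so `looks_like_url` is a hypothetical stand-in that assumes URL inputs are plain strings starting with `http://` or `https://`. `require_local_input` is likewise hypothetical.

```python
def looks_like_url(value: object) -> bool:
    """Hypothetical stand-in for URL detection: an http(s) string."""
    return isinstance(value, str) and value.lower().startswith(("http://", "https://"))


def require_local_input(value: object) -> object:
    """Return the input unchanged, or raise if it looks like a URL."""
    if looks_like_url(value):
        raise ValueError(
            "sign() accepts local files only; fetch the URL first and pass the bytes"
        )
    return value
```

In real code, the library's own `is_remote_file_input()` helper would be the natural replacement for `looks_like_url`.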

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ nutrient_dws_scripts = [
 
 [project]
 name = "nutrient-dws"
-version = "2.0.0"
+version = "3.0.0"
 description = "Python client library for Nutrient Document Web Services API"
 readme = "README.md"
 requires-python = ">=3.10"

src/nutrient_dws/__init__.py

Lines changed: 4 additions & 2 deletions
@@ -12,16 +12,19 @@
     ValidationError,
 )
 from nutrient_dws.inputs import (
+    FileInputWithUrl,
+    LocalFileInput,
     is_remote_file_input,
     process_file_input,
-    process_remote_file_input,
     validate_file_input,
 )
 from nutrient_dws.utils import get_library_version, get_user_agent
 
 __all__ = [
     "APIError",
     "AuthenticationError",
+    "FileInputWithUrl",
+    "LocalFileInput",
     "NetworkError",
     "NutrientClient",
     "NutrientError",
@@ -30,6 +33,5 @@
     "get_user_agent",
     "is_remote_file_input",
     "process_file_input",
-    "process_remote_file_input",
     "validate_file_input",
 ]

src/nutrient_dws/builder/builder.py

Lines changed: 7 additions & 7 deletions
@@ -26,7 +26,7 @@
     NutrientClientOptions,
 )
 from nutrient_dws.inputs import (
-    FileInput,
+    FileInputWithUrl,
     NormalizedFileData,
     is_remote_file_input,
     process_file_input,
@@ -76,16 +76,16 @@ def __init__(self, client_options: NutrientClientOptions) -> None:
         """
         super().__init__(client_options)
         self.build_instructions: BuildInstructions = {"parts": []}
-        self.assets: dict[str, FileInput] = {}
+        self.assets: dict[str, FileInputWithUrl] = {}
         self.asset_index = 0
         self.current_step = 0
         self.is_executed = False
 
-    def _register_asset(self, asset: FileInput) -> str:
+    def _register_asset(self, asset: FileInputWithUrl) -> str:
         """Register an asset in the workflow and return its key for use in actions.
 
         Args:
-            asset: The asset to register
+            asset: The asset to register (must be local, not URL)
 
         Returns:
             The asset key that can be used in BuildActions
@@ -188,7 +188,7 @@ def _cleanup(self) -> None:
 
     def add_file_part(
         self,
-        file: FileInput,
+        file: FileInputWithUrl,
         options: FilePartOptions | None = None,
         actions: list[ApplicableAction] | None = None,
     ) -> WorkflowWithPartsStage:
@@ -229,8 +229,8 @@ def add_file_part(
 
     def add_html_part(
         self,
-        html: FileInput,
-        assets: list[FileInput] | None = None,
+        html: FileInputWithUrl,
+        assets: list[FileInputWithUrl] | None = None,
         options: HTMLPartOptions | None = None,
         actions: list[ApplicableAction] | None = None,
     ) -> WorkflowWithPartsStage:

src/nutrient_dws/builder/constant.py

Lines changed: 8 additions & 8 deletions
@@ -1,7 +1,7 @@
 from collections.abc import Callable
 from typing import Any, Literal, Protocol, TypeVar, cast
 
-from nutrient_dws.inputs import FileInput
+from nutrient_dws.inputs import FileInputWithUrl
 from nutrient_dws.types.build_actions import (
     ApplyInstantJsonAction,
     ApplyRedactionsAction,
@@ -53,7 +53,7 @@ class ActionWithFileInput(Protocol):
     """Internal action type that holds FileInput for deferred registration."""
 
     __needsFileRegistration: bool
-    fileInput: FileInput
+    fileInput: FileInputWithUrl
     createAction: Callable[[FileHandle], BuildAction]
 
 
@@ -133,7 +133,7 @@ def watermark_text(
 
     @staticmethod
     def watermark_image(
-        image: FileInput, options: ImageWatermarkActionOptions | None = None
+        image: FileInputWithUrl, options: ImageWatermarkActionOptions | None = None
     ) -> ActionWithFileInput:
         """Create an image watermark action.
 
@@ -163,7 +163,7 @@ class ImageWatermarkActionWithFileInput(ActionWithFileInput):
             __needsFileRegistration = True
 
             def __init__(
-                self, file_input: FileInput, opts: ImageWatermarkActionOptions
+                self, file_input: FileInputWithUrl, opts: ImageWatermarkActionOptions
             ):
                 self.fileInput = file_input
                 self.options = opts
@@ -196,7 +196,7 @@ def flatten(annotation_ids: list[str | int] | None = None) -> FlattenAction:
         return result
 
     @staticmethod
-    def apply_instant_json(file: FileInput) -> ActionWithFileInput:
+    def apply_instant_json(file: FileInputWithUrl) -> ActionWithFileInput:
         """Create an apply Instant JSON action.
 
         Args:
@@ -209,7 +209,7 @@ def apply_instant_json(file: FileInput) -> ActionWithFileInput:
         class ApplyInstantJsonActionWithFileInput(ActionWithFileInput):
             __needsFileRegistration = True
 
-            def __init__(self, file_input: FileInput):
+            def __init__(self, file_input: FileInputWithUrl):
                 self.fileInput = file_input
 
             def createAction(self, fileHandle: FileHandle) -> ApplyInstantJsonAction:
@@ -222,7 +222,7 @@ def createAction(self, fileHandle: FileHandle) -> ApplyInstantJsonAction:
 
     @staticmethod
     def apply_xfdf(
-        file: FileInput, options: ApplyXfdfActionOptions | None = None
+        file: FileInputWithUrl, options: ApplyXfdfActionOptions | None = None
     ) -> ActionWithFileInput:
         """Create an apply XFDF action.
 
@@ -240,7 +240,7 @@ class ApplyXfdfActionWithFileInput(ActionWithFileInput):
             __needsFileRegistration = True
 
             def __init__(
-                self, file_input: FileInput, opts: ApplyXfdfActionOptions | None
+                self, file_input: FileInputWithUrl, opts: ApplyXfdfActionOptions | None
             ):
                 self.fileInput = file_input
                 self.options = opts or {}

src/nutrient_dws/builder/staged_builders.py

Lines changed: 4 additions & 4 deletions
@@ -10,7 +10,7 @@
 from nutrient_dws.types.build_actions import BuildAction
 
 if TYPE_CHECKING:
-    from nutrient_dws.inputs import FileInput
+    from nutrient_dws.inputs import FileInputWithUrl
     from nutrient_dws.types.analyze_response import AnalyzeBuildResponse
     from nutrient_dws.types.build_output import (
         ImageOutputOptions,
@@ -114,7 +114,7 @@ class WorkflowInitialStage(ABC):
     @abstractmethod
     def add_file_part(
         self,
-        file: FileInput,
+        file: FileInputWithUrl,
         options: FilePartOptions | None = None,
         actions: list[ApplicableAction] | None = None,
     ) -> WorkflowWithPartsStage:
@@ -124,8 +124,8 @@ def add_file_part(
     @abstractmethod
     def add_html_part(
         self,
-        html: FileInput,
-        assets: list[FileInput] | None = None,
+        html: FileInputWithUrl,
+        assets: list[FileInputWithUrl] | None = None,
         options: HTMLPartOptions | None = None,
         actions: list[ApplicableAction] | None = None,
     ) -> WorkflowWithPartsStage:
