All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- pypdf Security Vulnerability: Upgraded pypdf>=6.8.0, fixing CVE-2026-28804
  - Fixed inefficient decoding of ASCIIHexDecode streams, preventing DoS attacks
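The entry does not describe how pypdf fixed the ASCIIHexDecode issue. As a hedged illustration of what single-pass, linear-time decoding can look like under the PDF filter rules (whitespace is ignored, > marks end of data, an odd final digit is treated as if followed by 0), here is a sketch. ascii_hex_decode is a hypothetical name, not pypdf's API.

```python
import re

# PDF whitespace characters: NUL, TAB, LF, FF, CR, SPACE.
_PDF_WHITESPACE = re.compile(rb"[\x00\t\n\x0c\r ]")


def ascii_hex_decode(data: bytes) -> bytes:
    """Hypothetical sketch: decode an ASCIIHexDecode stream in linear time."""
    # Everything after the '>' end-of-data marker is ignored.
    end = data.find(b">")
    if end != -1:
        data = data[:end]
    # Strip whitespace in one regex pass rather than repeated slicing/concatenation.
    digits = _PDF_WHITESPACE.sub(b"", data)
    # An odd trailing digit is treated as if it were followed by '0'.
    if len(digits) % 2:
        digits += b"0"
    # bytes.fromhex raises ValueError on any non-hex character
    # (UnicodeDecodeError, a ValueError subclass, for non-ASCII bytes).
    return bytes.fromhex(digits.decode("ascii"))
```

For example, ascii_hex_decode(b"48 65 6C 6C 6F>") yields b"Hello". Every step is a single pass over the input, so decoding time grows linearly with stream size.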
- Dependency Upgrades:
  - pypdf>=6.7.4 → pypdf>=6.8.0
- Flexible File Path Access: Removed the DOCUMENT_DIRECTORY restriction; absolute and relative paths are now supported
  - Removed DOCUMENT_DIRECTORY environment variable dependency
  - Removed AppContext dataclass and app_lifespan function
  - Removed _get_document_path() security function
  - read_document() now uses Path(filename) directly for path handling
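With DOCUMENT_DIRECTORY gone, a filename is used exactly as passed. The sketch below uses a hypothetical helper, open_document_path, to show what plain Path(filename) handling implies: absolute paths work unchanged, while relative paths resolve against the server process's current working directory. Reporting only the file name in the error is an assumption here, not documented behavior.

```python
from pathlib import Path


def open_document_path(filename: str) -> Path:
    """Hypothetical helper: use the path exactly as given, with no base directory."""
    # Absolute paths are used as-is; relative paths are interpreted against
    # the current working directory of the server process.
    path = Path(filename)
    if not path.is_file():
        # Mention only the file name, not the full path.
        raise FileNotFoundError(f"File not found: {path.name}")
    return path
```

One consequence of this design is that the same relative filename can point to different files depending on where the server was launched, so clients may prefer absolute paths.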
- Simplified Architecture: Removed FastMCP lifespan configuration for cleaner code
- Test Suite Optimization: Removed test_lifespan.py and updated test_tools.py with new path handling tests
- Removed:
  - DOCUMENT_DIRECTORY environment variable support
  - AppContext dataclass
  - app_lifespan async context manager
  - _get_document_path() helper function
- pypdf Security Vulnerabilities: Upgraded pypdf>=6.7.4, fixing 3 CVEs
  - CVE-2026-28351: RunLengthDecode streams can exhaust RAM
  - CVE-2026-27888: FlateDecode XFA streams can exhaust RAM
  - CVE-2026-27628: Circular references cause infinite loop
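One common mitigation for RAM exhaustion during stream decoding, as in CVE-2026-28351, is to cap the decoded output size. The sketch below applies that idea to RunLengthDecode as defined in the PDF specification: a length byte of 0 to 127 copies the next length+1 bytes literally, 129 to 255 repeats the next byte 257-length times, and 128 ends the data. It is not pypdf's actual fix; run_length_decode and the 64 MiB default for max_output are hypothetical.

```python
def run_length_decode(data: bytes, max_output: int = 64 * 1024 * 1024) -> bytes:
    """Hypothetical sketch: RunLengthDecode with a hard cap on decoded size."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        length = data[i]
        i += 1
        if length == 128:  # end-of-data marker
            break
        if length < 128:
            # Literal run: copy the next length + 1 bytes (fewer if truncated).
            chunk = data[i:i + length + 1]
            i += length + 1
        else:
            if i >= n:  # repeat run is missing its byte
                break
            # Repeat run: the next byte repeated 257 - length times.
            chunk = data[i:i + 1] * (257 - length)
            i += 1
        # Check the limit before growing the buffer, so a hostile stream
        # cannot allocate more than max_output bytes.
        if len(out) + len(chunk) > max_output:
            raise ValueError("decoded stream exceeds output limit")
        out += chunk
    return bytes(out)
```

A repeat run expands 2 input bytes into up to 128 output bytes, which is why a small malicious stream can balloon in memory without such a check.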
- MCP SDK Upgrade: Upgraded mcp>=1.26.0
- Test Code Security: Refactored path traversal test code to avoid static analysis false positives
- Dependency Upgrades:
  - mcp>=1.23.0 → mcp>=1.26.0
  - pypdf>=6.7.1 → pypdf>=6.7.4
  - typing_extensions>=4.12.0 → typing_extensions>=4.15.0
- MCP SDK Security Vulnerabilities: Upgraded mcp>=1.23.0, fixed 3 high-severity CVEs
  - CVE-2025-53365: Unhandled exception in Streamable HTTP Transport leading to DoS
  - CVE-2025-53366: FastMCP Server validation error leading to DoS
  - CVE-2025-66416: DNS rebinding protection not enabled by default
- PyPDF2 Security Vulnerability: Replaced with pypdf>=6.7.1, fixed CVE-2023-36464
- Path Traversal Protection: Added explicit path validation to prevent arbitrary file read attacks
- Error Message Sanitization: Removed full paths from error messages to prevent information disclosure
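A common way to implement this kind of path validation is to resolve the requested path and confirm it stays inside an allowed base directory. The following is a hedged sketch, not the project's code: resolve_within is a hypothetical name and the error texts are illustrative. It also shows the sanitization idea by reporting only the requested file name rather than the resolved absolute path. (A later release removed this restriction in favor of plain Path(filename) handling.)

```python
from pathlib import Path


def resolve_within(base_dir: str, filename: str) -> Path:
    """Hypothetical check: reject any path that escapes base_dir."""
    base = Path(base_dir).resolve()
    # resolve() normalizes '..' segments and follows symlinks, so both
    # '../' tricks and symlinks pointing outside base are caught.
    candidate = (base / filename).resolve()
    # An absolute filename replaces base entirely when joined, so it is
    # also rejected unless it lies inside base.
    if not candidate.is_relative_to(base):  # Path.is_relative_to: Python 3.9+
        # Sanitized message: no resolved or absolute path is leaked.
        raise PermissionError(f"Access denied: {Path(filename).name}")
    if not candidate.is_file():
        raise FileNotFoundError(f"File not found: {Path(filename).name}")
    return candidate
```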
- PyPI Package Metadata: Added project.urls linking to GitHub repository
- Dependency Upgrades:
  - mcp>=0.1.0 → mcp>=1.23.0
  - PyPDF2>=3.0.1 → pypdf>=6.7.1
  - python-docx>=0.8.11 → python-docx>=1.2.0
  - openpyxl>=3.0.10 → openpyxl>=3.1.5
  - typing_extensions>=4.0.0 → typing_extensions>=4.12.0
- CI/CD Migration: Migrated from pip to uv for faster builds
- Python Compatibility: Use typing_extensions.override instead of typing.override (which only exists from Python 3.12) for Python 3.10+ compatibility
- Type Checking: Fixed Basedpyright type errors
  - Fixed optional type checking for openpyxl.Workbook.active
  - Fixed method override parameter name matching
- Fixed:
  - Encoding Handling: Removed the invalid ansi encoding (not supported by the Python standard library)
  - Test Fix: Fixed path traversal test case
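An invalid codec name like ansi only fails when it is actually used, which is easy to miss in a fallback chain. As a hedged sketch (not the project's encoding detection), candidate names can be validated with codecs.lookup, which raises LookupError for unknown codecs, before trying each one in turn. CANDIDATE_ENCODINGS, valid_encodings and decode_text are hypothetical; cp1252 stands in here for the Western-European Windows "ANSI" code page.

```python
import codecs

# Hypothetical candidate list, tried in order. "ansi" is included on purpose
# to show it being filtered out; latin-1 last because it never fails.
CANDIDATE_ENCODINGS: tuple[str, ...] = ("utf-8", "ansi", "cp1252", "latin-1")


def valid_encodings(names: tuple[str, ...]) -> list[str]:
    """Keep only names that Python's codec registry actually knows."""
    valid = []
    for name in names:
        try:
            codecs.lookup(name)
        except LookupError:
            continue  # e.g. "ansi" is not a registered Python codec
        valid.append(name)
    return valid


def decode_text(raw: bytes, names: tuple[str, ...] = CANDIDATE_ENCODINGS) -> tuple[str, str]:
    """Return (text, encoding) for the first valid encoding that decodes raw."""
    for name in valid_encodings(names):
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode text with any candidate encoding")
```

For instance, b"caf\xe9" is not valid UTF-8, so decode_text skips ansi and falls through to cp1252, returning ("café", "cp1252").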
- CI/CD Workflows: Added GitHub Actions workflows for automated testing and release
  - CI workflow: Ruff, Basedpyright, Pytest
  - Release workflow: Publish to PyPI and MCP Registry
  - Support for Python 3.10-3.14
- Test Suite: Test coverage improved to 95%
  - 102 test cases covering all core modules
  - Unit tests for all readers
  - Integration tests for MCP tools
- Documentation: Added complete documentation structure
  - API Reference
  - User Guide
  - Contributing Guide
- Type Checking: Switched to Basedpyright for better type inference
- Code Formatting: Using Ruff format instead of Black
- Development Dependencies: Updated development toolchain
- Type Safety: Fixed all Basedpyright type errors
- Code Quality: Fixed all Ruff linting issues
- MCP Tools: Added complete MCP tool interface
  - read_document: Main reading tool
    - Unified interface for all document types
- Error Handling: Improved error messages and exception handling
  - Better error messages for unsupported formats
  - Graceful handling of corrupted files
- Architecture: Improved reader architecture with factory pattern
- Encoding Detection: Better automatic encoding detection for text files
- Excel Support: Added Excel reader for .xlsx and .xls files
  - Multi-sheet support
  - Cell data extraction
- PDF Support: Added PDF reader
  - Text extraction from PDF pages
  - Multi-page support
- Encoding: Improved encoding detection for text files
- Error Messages: More descriptive error messages
- Initial Release: First public release of MCP Document Reader
  - Abstract base class for document readers
  - DOCX reader using python-docx
  - PDF reader using PyPDF2
  - Excel reader using openpyxl
  - Text reader with encoding detection
  - Factory pattern for reader selection
  - MCP protocol support for AI assistants
- Supported Formats:
  - Input: DOCX, PDF, Excel (XLSX/XLS), Text
- Features:
  - Automatic format detection
  - Encoding detection for text files
  - Error handling for corrupted files
  - MCP tool interface for AI assistants
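The abstract base class plus factory design listed above can be sketched as follows. This is a minimal illustration, assuming that automatic format detection is based on the file extension; DocumentReader, TextReader, ReaderFactory and their methods are hypothetical names, not the project's actual API. DOCX, PDF and Excel readers would subclass DocumentReader and register their own extensions in the same way.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class DocumentReader(ABC):
    """Hypothetical base class: one subclass per document format."""

    extensions: tuple[str, ...] = ()

    @abstractmethod
    def read(self, path: Path) -> str:
        """Return the document's text content."""


class TextReader(DocumentReader):
    extensions = (".txt", ".md")

    def read(self, path: Path) -> str:
        # Assumes UTF-8 for brevity; a real reader would detect the encoding.
        return path.read_text(encoding="utf-8")


class ReaderFactory:
    """Maps file extensions to reader instances."""

    def __init__(self) -> None:
        self._readers: dict[str, DocumentReader] = {}

    def register(self, reader: DocumentReader) -> None:
        for ext in reader.extensions:
            self._readers[ext] = reader

    def get_reader(self, path: Path) -> DocumentReader:
        # Extension-based format detection, case-insensitive.
        reader = self._readers.get(path.suffix.lower())
        if reader is None:
            raise ValueError(f"Unsupported format: {path.suffix or '(none)'}")
        return reader


factory = ReaderFactory()
factory.register(TextReader())
```

Callers only deal with factory.get_reader(path).read(path), so adding a format means adding one subclass and one register call, without touching the MCP tool code.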