Thank you for your interest in contributing to Docsray MCP! This document provides guidelines and information for contributors.

Prerequisites:

- Python 3.9 or higher
- Git
- A virtual environment manager (venv, conda, or similar)

Development setup:

1. Fork and clone the repository:

       git clone https://github.com/your-username/docsray-mcp.git
       cd docsray-mcp

2. Create a virtual environment:

       python -m venv venv
       source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install development dependencies:

       pip install -e ".[dev]"

4. Set up environment variables:

       cp .env.example .env  # If .env.example exists
       # Add your API keys for testing (use either env var):
       echo "DOCSRAY_LLAMAPARSE_API_KEY=llx-your-key-here" >> .env
       # Or:
       echo "LLAMAPARSE_API_KEY=llx-your-key-here" >> .env

5. Verify the installation:

       python -m docsray --help
       pytest tests/unit/ -v
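
Since either DOCSRAY_LLAMAPARSE_API_KEY or LLAMAPARSE_API_KEY is accepted, a quick check of which key is visible can save a confusing integration-test run. The sketch below is a minimal illustration, not Docsray's own lookup: the function name find_llamaparse_key and the precedence it applies (the DOCSRAY_-prefixed name first) are assumptions. It reads the process environment only, so values in .env are seen only once they have been exported or loaded by whatever loader the project uses.

```python
import os
from typing import Mapping, Optional

# Hypothetical lookup order: the guide accepts either name, but checking the
# DOCSRAY_-prefixed variable first is an assumption made for this sketch.
KEY_VARIABLES = ("DOCSRAY_LLAMAPARSE_API_KEY", "LLAMAPARSE_API_KEY")


def find_llamaparse_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty LlamaParse key found, or None."""
    source = os.environ if env is None else env
    for name in KEY_VARIABLES:
        # Treat whitespace-only values as unset.
        value = source.get(name, "").strip()
        if value:
            return value
    return None


if __name__ == "__main__":
    if find_llamaparse_key():
        print("LlamaParse key found; integration tests can run.")
    else:
        print("No LlamaParse key set; run tests/unit/ only.")
```

Passing a mapping explicitly keeps the helper easy to test without touching the real environment.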
The tests are organized as follows:

    tests/
    ├── unit/          # Fast isolated tests (no API calls)
    ├── integration/   # Component interaction tests
    ├── manual/        # Manual testing scripts and debugging
    └── files/         # Test documents (PDFs, etc.)
Common test commands:

    # Run all tests
    pytest

    # Run only unit tests (fast, no API calls)
    pytest tests/unit/

    # Run integration tests (requires API keys)
    pytest tests/integration/

    # Run with coverage
    pytest --cov=src/docsray --cov-report=html

    # Run a specific test file
    pytest tests/unit/test_providers.py -v

    # Run a specific test
    pytest tests/unit/test_providers.py::test_provider_registry -v

Testing guidelines:

- Unit tests: Must run without external dependencies or API calls
- Integration tests: May use API keys but must be marked with @pytest.mark.integration, as in the example test module below
- All tests: Should be deterministic and not depend on external state
- Coverage: Aim for >90% coverage on new code
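
One way to keep a unit test deterministic and free of API calls is to replace the provider with a mock. In this sketch, count_pages and the {"page_count": ...} shape returned by peek() are hypothetical; only Mock and AsyncMock from the standard library's unittest.mock are real APIs.

```python
import asyncio
from unittest.mock import AsyncMock, Mock


# Hypothetical code under test: reads a page count from a provider's peek().
# The result shape ({"page_count": ...}) is an assumption for illustration.
async def count_pages(provider, document_url: str) -> int:
    result = await provider.peek(document_url)
    return int(result["page_count"])


def test_count_pages_uses_mock_provider() -> None:
    # The mock stands in for a real provider, so no network call is made.
    provider = Mock()
    provider.peek = AsyncMock(return_value={"page_count": 3})

    pages = asyncio.run(count_pages(provider, "tests/files/sample.pdf"))

    assert pages == 3
    provider.peek.assert_awaited_once_with("tests/files/sample.pdf")
```

Because the test drives the coroutine with asyncio.run, it does not depend on an async pytest plugin.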
Example test module:

    import pytest
    from unittest.mock import Mock, patch

    from docsray.providers.base import DocumentProvider


    def test_provider_initialization():
        """Test provider initialization with proper setup."""
        provider = DocumentProvider()
        assert provider.get_name() == "base"
        assert len(provider.get_supported_formats()) >= 0


    # Async test functions need a pytest async plugin (such as pytest-asyncio) to run.
    @pytest.mark.integration
    async def test_llamaparse_integration():
        """Integration test requiring API key."""
        # Test implementation
        pass

We follow PEP 8 with some modifications:
- Line length: 88 characters (Black default)
- Imports: Use isort for import organization
- Type hints: Required for all public functions and methods
- Docstrings: Required for all public functions, classes, and modules
Formatting, linting, and type checking:

    # Format code
    black src/ tests/

    # Lint and auto-fix issues
    ruff check src/ tests/ --fix

    # Type checking
    mypy src/

    # Run all linting
    black src/ tests/ && ruff check src/ tests/ && mypy src/

Optionally, set up pre-commit hooks:

    pip install pre-commit
    pre-commit install

Use Google-style docstrings:

    from typing import List


    async def extract_content(
        document_url: str,
        extraction_targets: List[str],
        provider: str = "auto",
    ) -> ExtractResult:
        """Extract content from document.

        Args:
            document_url: URL or path to document
            extraction_targets: List of targets to extract (text, tables, images)
            provider: Provider name or "auto" for automatic selection

        Returns:
            ExtractResult containing extracted content and metadata

        Raises:
            DocumentNotFoundError: If document cannot be accessed
            ProviderError: If extraction fails
        """

Documentation guidelines:

- Update docstrings when changing function signatures
- Add examples to docstrings for complex functions
- Update README.md if adding new features
- Update SYSTEM_INSTRUCTIONS.md for new capabilities
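
For the guideline on adding examples to docstrings, Google-style docstrings accept an Examples: section, and writing those examples as doctest lines lets them be checked mechanically. parse_targets is a hypothetical helper used only to illustrate the format.

```python
from typing import List


def parse_targets(raw: str) -> List[str]:
    """Parse a comma-separated list of extraction targets.

    Whitespace is trimmed, names are lower-cased, and duplicates are
    dropped while keeping the first occurrence.

    Args:
        raw: Comma-separated target names, e.g. "text, tables".

    Returns:
        The cleaned target names in their original order.

    Examples:
        >>> parse_targets("Text, tables, text")
        ['text', 'tables']
        >>> parse_targets("")
        []
    """
    seen: List[str] = []
    for part in raw.split(","):
        name = part.strip().lower()
        # Skip empty entries (e.g. from ",,") and repeats.
        if name and name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    import doctest

    doctest.testmod()
```

Running the module directly executes the docstring examples; pytest can also collect them with its --doctest-modules option.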
When adding new providers:

1. Inherit from DocumentProvider:

       from docsray.providers.base import DocumentProvider


       class NewProvider(DocumentProvider):
           def get_name(self) -> str:
               return "new-provider"

2. Implement the required methods:
   - get_supported_formats()
   - get_capabilities()
   - peek(), map(), xray(), extract(), seek()

3. Add it to the registry:

       # In providers/__init__.py
       from .new_provider import NewProvider

4. Add configuration:

       # In config.py
       from typing import Optional


       class NewProviderConfig:
           api_key: Optional[str] = None
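
The steps above can be sketched end to end. The DocumentProvider class here is a stand-in that mirrors the behavior shown in the example test module (get_name() returns "base"); the method signatures, the capability dictionary, and the PROVIDERS registry are assumptions, and only peek() is shown of the five document operations.

```python
import asyncio
from typing import Any, Dict, List, Type


class DocumentProvider:
    """Stand-in for docsray.providers.base.DocumentProvider (assumed shape)."""

    def get_name(self) -> str:
        return "base"

    def get_supported_formats(self) -> List[str]:
        return []

    def get_capabilities(self) -> Dict[str, bool]:
        return {}

    async def peek(self, document_url: str) -> Dict[str, Any]:
        raise NotImplementedError


class NewProvider(DocumentProvider):
    def get_name(self) -> str:
        return "new-provider"

    def get_supported_formats(self) -> List[str]:
        return ["pdf", "docx"]

    def get_capabilities(self) -> Dict[str, bool]:
        # Advertise only the operations this provider actually implements.
        return {
            "peek": True,
            "map": False,
            "xray": False,
            "extract": False,
            "seek": False,
        }

    async def peek(self, document_url: str) -> Dict[str, Any]:
        # A real provider would open or fetch the document here.
        return {"provider": self.get_name(), "document": document_url}


# Hypothetical name-to-class registry; the real registration in
# providers/__init__.py may work differently.
PROVIDERS: Dict[str, Type[DocumentProvider]] = {}


def register(cls: Type[DocumentProvider]) -> Type[DocumentProvider]:
    PROVIDERS[cls().get_name()] = cls
    return cls


register(NewProvider)

if __name__ == "__main__":
    provider = PROVIDERS["new-provider"]()
    print(asyncio.run(provider.peek("tests/files/sample.pdf")))
```

Keying the registry by get_name() means the name a user passes as provider is the same string the provider reports about itself.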
When adding new tools:

1. Create a tool module: src/docsray/tools/new_tool.py
2. Implement an async handler: async def handle_new_tool(...)
3. Register it in the server: add it to src/docsray/server.py
4. Add tests: both unit and integration tests
5. Update documentation: add examples to PROMPTS.md
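
A tool handler can be kept independent of the MCP server wiring, which makes it straightforward to unit test with asyncio.run. In this sketch the parameters (document_url, max_pages) and the returned dictionary are hypothetical; the real handler signature and how src/docsray/server.py registers it are not shown.

```python
import asyncio
from typing import Any, Dict


# Hypothetical handler for src/docsray/tools/new_tool.py. Argument names and
# the response shape are assumptions, not Docsray's actual contract.
async def handle_new_tool(document_url: str, max_pages: int = 5) -> Dict[str, Any]:
    """Validate inputs, do the work, and return a JSON-serializable result."""
    if not document_url:
        raise ValueError("document_url is required")
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    # Real work (for example, delegating to a provider) would be awaited here.
    await asyncio.sleep(0)
    return {"document": document_url, "pages_requested": max_pages}


if __name__ == "__main__":
    print(asyncio.run(handle_new_tool("tests/files/sample.pdf", max_pages=2)))
```

Validating arguments at the top of the handler keeps bad input from reaching providers and gives unit tests a cheap failure path to cover.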
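Handle errors with Docsray's exception types:

    import logging

    from docsray.exceptions import DocsrayError, DocumentNotFoundError

    logger = logging.getLogger(__name__)


    async def your_function():
        try:
            # Your code
            pass
        except FileNotFoundError as e:
            raise DocumentNotFoundError(f"Document not found: {e}") from e
        except Exception as e:
            # logger.exception also records the stack trace
            logger.exception(f"Unexpected error: {e}")
            raise DocsrayError(f"Operation failed: {e}") from e

When reporting a bug, please include: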

1. Environment details:
   - Python version
   - OS and version
   - Docsray version
   - MCP client (Cursor, Claude Desktop, etc.)
2. Reproduction steps:
   - Exact commands or prompts used
   - Input documents (if shareable)
   - Expected vs. actual behavior
3. Logs and errors:
   - Complete error messages
   - Relevant log output
   - Stack traces
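
Most of the environment details can be gathered with the standard library. This sketch assumes the installed distribution is named docsray-mcp (the name used with pip uninstall elsewhere in this guide); environment_report is a hypothetical helper, and the MCP client still has to be added by hand.

```python
import platform
import sys
from importlib import metadata


def environment_report(dist_name: str = "docsray-mcp") -> str:
    """Collect the environment details requested in bug reports."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = "not installed"
    lines = [
        f"Python: {sys.version.split()[0]}",
        f"OS: {platform.system()} {platform.release()}",
        f"Docsray: {version}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Run it inside the same virtual environment where the problem occurs so the reported versions match.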
When requesting a feature, please include:
- Use case: Describe the problem you're trying to solve
- Proposed solution: How you envision the feature working
- Alternatives: Other approaches you've considered
- Impact: Who would benefit from this feature

Pull request process:

1. Create an issue (for non-trivial changes)
2. Fork the repository
3. Create a feature branch:

       git checkout -b feature/your-feature

4. Write tests for new functionality
5. Update documentation as needed
6. Run the full test suite
7. Check code style with linting tools

Before submitting, check that:

- Tests pass (pytest)
- Code style checks pass (black, ruff, mypy)
- Documentation is updated (if applicable)
- CHANGELOG.md is updated (for significant changes)
- Commit messages are clear and descriptive
Pull request template:

    ## Summary
    Brief description of changes

    ## Type of Change
    - [ ] Bug fix
    - [ ] New feature
    - [ ] Breaking change
    - [ ] Documentation update
    - [ ] Refactoring

    ## Testing
    - [ ] Unit tests added/updated
    - [ ] Integration tests added/updated
    - [ ] Manual testing performed

    ## Documentation
    - [ ] README updated
    - [ ] API documentation updated
    - [ ] Examples added/updated

    ## Notes
    Any additional context or considerations

Review process:

- Automated checks: All CI checks must pass
- Code review: At least one maintainer review required
- Testing: Verify functionality in multiple environments
- Documentation: Ensure documentation is clear and complete
We use semantic versioning (SemVer):
- MAJOR: Breaking changes
- MINOR: New features (backward compatible)
- PATCH: Bug fixes (backward compatible)
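
As a worked illustration of these rules: a MAJOR bump resets MINOR and PATCH, and a MINOR bump resets PATCH. The helper below handles plain MAJOR.MINOR.PATCH strings only; pre-release and build suffixes from the full SemVer specification are deliberately not supported, and the function names are hypothetical.

```python
import re
from typing import Tuple

_SEMVER = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a MAJOR.MINOR.PATCH string into integers."""
    match = _SEMVER.fullmatch(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def bump(version: str, part: str) -> str:
    """Return the next version for a "major", "minor", or "patch" release."""
    major, minor, patch = parse_version(version)
    if part == "major":
        return f"{major + 1}.0.0"  # breaking change
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new, backward-compatible feature
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # backward-compatible bug fix
    raise ValueError(f"unknown part: {part!r}")
```

For example, bump("1.4.2", "minor") returns "1.5.0".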

Release process:

1. Update the version in pyproject.toml
2. Update CHANGELOG.md
3. Create a release branch
4. Run the full test suite
5. Create a GitHub release with notes
6. Deploy to PyPI (maintainers only)
- Be respectful and inclusive
- Focus on constructive feedback
- Help others learn and grow
- Acknowledge contributions
- GitHub Issues: Bug reports and feature requests
- GitHub Discussions: General questions and ideas
- Pull Requests: Code reviews and technical discussion
Contributors are recognized in:
- CHANGELOG.md for significant contributions
- GitHub contributors list
- Release notes for major features
- Check existing issues and documentation
- Search discussions for similar questions
- Create a discussion for general questions
- Create an issue for bugs or specific problems
Common solutions:

    # Clean reinstall
    pip uninstall docsray-mcp
    pip install -e ".[dev]"

    # Reset environment
    rm -rf venv/
    python -m venv venv
    source venv/bin/activate
    pip install -e ".[dev]"

    # Clear caches (including __pycache__ directories in subfolders)
    rm -rf .pytest_cache/
    find . -type d -name "__pycache__" -prune -exec rm -rf {} +
    find . -name "*.pyc" -delete

If this is your first contribution:

- Read this contributing guide
- Set up development environment
- Run tests successfully
- Make a small test change and verify it works
- Look at existing issues for "good first issue" labels
- Join GitHub discussions for community updates
Thank you for contributing to Docsray MCP! Your efforts help make document processing more accessible and powerful for everyone.