Reverse mapping tool: from Python import statements to pip package names
PyImport2Pkg solves a common problem: Given import statements in Python code, how do you know which pip package to install?
For example:
- import cv2 → need to install opencv-python
- from PIL import Image → need to install Pillow
- import sklearn → need to install scikit-learn
- Project Analysis: Scan entire projects and generate requirements.txt
- Smart Mapping: Handle cases where module name ≠ package name
- Namespace Support: Correctly handle google.cloud.*, azure.*, etc.
- Optional Dependencies: Distinguish required vs optional dependencies (try-except, platform checks)
- Python Version Aware: Auto-detect target Python version, handle backports
- High-Performance Database: Smart incremental updates, parallel processing, batch writes
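As an illustration of the project-analysis step, here is a minimal sketch of collecting imported module names from source text with the standard-library `ast` module. The function name `top_level_imports` is hypothetical and is not part of PyImport2Pkg's API. The sketch keeps only the top-level name, whereas a tool with namespace support would need the full dotted path.

```python
import ast


def top_level_imports(source: str) -> set[str]:
    """Collect the top-level module names imported anywhere in `source`."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import os.path" contributes "os"
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level > 0 marks a relative import, which is always project-local
            modules.add(node.module.split(".")[0])
    return modules


example = "import cv2\nfrom PIL import Image\nfrom . import helpers\nimport os.path\n"
print(sorted(top_level_imports(example)))  # ['PIL', 'cv2', 'os']
```

Standard-library modules such as `os` still appear here; filtering them out is a separate step.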
pip install pyimport2pkg

git clone https://github.com/buptanswer/pyimport2pkg.git
cd pyimport2pkg
pip install -e ".[dev]"

pyimport2pkg --version
# pyimport2pkg 1.0.0

# Analyze current directory
pyimport2pkg analyze .
# Output will show:
# Analyzing: .
# Found imports from 24 files
#
# Dependencies:
# numpy
# pandas
# requests
# ...

pyimport2pkg analyze . -o requirements.txt

pyimport2pkg query cv2
# Output:
# Module: cv2
# Source: hardcoded
# Candidates:
# 1. opencv-python (recommended)
# 2. opencv-contrib-python
# 3. opencv-python-headless

Scan a Python project for imports and identify required packages.
pyimport2pkg analyze <path> [options]

Options:
| Option | Description | Default |
|---|---|---|
| -o, --output | Output file path | stdout |
| -f, --format | Format (requirements, json, or simple) | requirements |
| --python-version | Target Python version | current |
| --exclude | Directories to exclude | - |
| --exclude-optional | Exclude optional packages | False |
| --no-comments | Disable comments in output | False |

Examples:
# Basic analysis
pyimport2pkg analyze /path/to/project
# Specify target Python version
pyimport2pkg analyze . --python-version 3.11
# Save as JSON
pyimport2pkg analyze . -o deps.json -f json
# Exclude test directories
pyimport2pkg analyze . --exclude tests,docs
# Simple package list (no comments)
pyimport2pkg analyze . -f simple

Look up which package provides a specific module.
pyimport2pkg query <module_name>

Examples:
# Query a module
pyimport2pkg query cv2
# Query namespace package
pyimport2pkg query google.cloud.storage

Build or update the module-to-package mapping database from PyPI.
pyimport2pkg build-db [options]

Options:
| Option | Description | Default |
|---|---|---|
| --max-packages | Maximum packages to process | 5000 |
| --concurrency | Number of concurrent requests | 50 |
| --resume | Resume interrupted build | False |
| --retry-failed | Retry only failed packages | False |
| --rebuild | Force rebuild from scratch | False |
| --db-path | Custom database path | data/mapping.db |

Examples:
# Build database with top 5000 packages
pyimport2pkg build-db --max-packages 5000
# Expand existing database to 10000 packages (only processes new ones)
pyimport2pkg build-db --max-packages 10000
# Resume interrupted build
pyimport2pkg build-db --resume
# Retry failed packages
pyimport2pkg build-db --retry-failed
# Force rebuild
pyimport2pkg build-db --rebuild --max-packages 5000

Check the status of database building.
pyimport2pkg build-status

Example Output:
Build Status:
Status: completed
Total packages: 5000
Processed: 5000
Failed: 15
Started at: 2025-12-06T10:00:00
Last updated: 2025-12-06T12:30:45
Display database statistics.
pyimport2pkg db-info

Example Output:
Database: data/mapping.db
Total packages: 5000
Module mappings: 12543
Last build: 2025-12-06T12:30:45
Source: https://hugovk.github.io/top-pypi-packages/top-pypi-packages-30-days.min.json
pyimport2pkg analyze . -o requirements.txt

Output:
# Auto-generated by pyimport2pkg
# Generated at: 2025-12-06T15:30:00
# === Required packages ===
numpy
pandas
requests
# === Conditional imports (platform/environment specific) ===
pywin32 # conditional
# === Try-except imports (optional dependencies) ===
ujson # try_except
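The trailing tags in this output can be read back by a script, for example to install only the required group. The following is a minimal sketch, assuming the file layout shown above; `split_by_tag` is an illustrative name and not a function provided by PyImport2Pkg.

```python
def split_by_tag(requirements_text: str) -> dict[str, list[str]]:
    """Group package names by the trailing '# tag' comment; untagged lines are 'required'."""
    groups: dict[str, list[str]] = {"required": [], "conditional": [], "try_except": []}
    for raw in requirements_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section header comments
        package, _, tag = line.partition("#")
        groups.setdefault(tag.strip() or "required", []).append(package.strip())
    return groups


sample = """\
# === Required packages ===
numpy
pandas
requests
# === Conditional imports (platform/environment specific) ===
pywin32 # conditional
# === Try-except imports (optional dependencies) ===
ujson # try_except
"""
groups = split_by_tag(sample)
print(groups["required"])     # ['numpy', 'pandas', 'requests']
print(groups["conditional"])  # ['pywin32']
```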
pyimport2pkg analyze . -f json -o dependencies.json

Output:
{
  "meta": {
    "generated_at": "2025-12-06T15:30:00",
    "tool": "pyimport2pkg",
    "version": "1.0.0"
  },
  "required": [
    {
      "package": "numpy",
      "module": "numpy",
      "source": "database"
    }
  ],
  "optional": [],
  "unresolved": [],
  "warnings": []
}

pyimport2pkg analyze . -f simple

Output:
numpy
pandas
requests
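For downstream tooling, the JSON format is usually the easiest of the three to consume. Here is a minimal sketch that reads the JSON report shown earlier and extracts package names. It assumes entries under "optional" have the same shape as those under "required", which the example output does not confirm; `package_names` is a hypothetical helper.

```python
import json

# Copied from the example JSON output in this document.
SAMPLE_REPORT = """
{
  "meta": {"generated_at": "2025-12-06T15:30:00", "tool": "pyimport2pkg", "version": "1.0.0"},
  "required": [{"package": "numpy", "module": "numpy", "source": "database"}],
  "optional": [],
  "unresolved": [],
  "warnings": []
}
"""


def package_names(report_text: str, section: str = "required") -> list[str]:
    """Return the 'package' field of every entry in one section of the report."""
    report = json.loads(report_text)
    return [entry["package"] for entry in report.get(section, [])]


print(package_names(SAMPLE_REPORT))              # ['numpy']
print(package_names(SAMPLE_REPORT, "optional"))  # []
```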
# Analyze for Python 3.8 (backports are included when the target version needs them)
pyimport2pkg analyze . --python-version 3.8
# Backport packages handled this way include:
# - dataclasses (in the standard library since Python 3.7)
# - importlib-metadata (importlib.metadata is in the standard library since Python 3.8)

# Only show required dependencies
pyimport2pkg analyze . --exclude-optional

# Exclude test and documentation directories
pyimport2pkg analyze . --exclude tests,docs,examples

# Start with top 500 packages
pyimport2pkg build-db --max-packages 500
# Later, expand to 5000 (automatically skips existing 500)
pyimport2pkg build-db --max-packages 5000
# Further expand to 10000
pyimport2pkg build-db --max-packages 10000

# Start building
pyimport2pkg build-db --max-packages 15000
# Press Ctrl+C if needed
# System saves progress automatically
# Resume later
pyimport2pkg build-db --resume

from pyimport2pkg import Scanner, Parser, Filter, Mapper, Exporter
from pathlib import Path
# 1. Scan project
scanner = Scanner()
files = scanner.scan(Path("./my_project"))
# 2. Parse imports
parser = Parser()
imports = []
for file_path in files:
    imports.extend(parser.parse_file(file_path))
# 3. Filter stdlib & local modules
import_filter = Filter(project_root=Path("./my_project"))  # avoid shadowing the built-in filter()
third_party, _ = import_filter.filter_imports(imports)
# 4. Map to packages
mapper = Mapper()
results = mapper.map_imports(third_party)
# 5. Export results
exporter = Exporter()
exporter.export_requirements_txt(results, output=Path("requirements.txt"))

from pyimport2pkg import Mapper, ImportInfo
mapper = Mapper()
imp = ImportInfo.from_module_name("cv2")
result = mapper.map_import(imp)
for candidate in result.candidates:
    print(f"{candidate.package_name}: {candidate.download_count} downloads")

from pyimport2pkg.database import MappingDatabase
db = MappingDatabase("data/mapping.db")
results = db.lookup("cv2")
for pkg_name, downloads in results:
    print(f"{pkg_name}: {downloads}")
db.close()

from pyimport2pkg import Resolver, ResolveStrategy
resolver = Resolver(strategy=ResolveStrategy.ALL)
resolved = resolver.resolve_mappings(results)
# Returns all candidates instead of just the most popular

A: PyImport2Pkg uses a priority system:
- Namespace packages (if submodules exist) - e.g., google.cloud.storage → google-cloud-storage
- Hardcoded mappings - Known mismatches like cv2 → opencv-python
- Namespace packages (top-level only)
- Database lookup - From PyPI wheel top_level.txt
- Guess - Assume module name equals package name
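The priority order above can be sketched as a chain of lookups that returns at the first hit. This is a simplified illustration, not the tool's implementation: every table below is a tiny stand-in, `acme` and `acme-core` are made-up names, and `resolve` is a hypothetical function.

```python
# Illustrative stand-in tables; the real tool's data is far larger.
NAMESPACE_SUBMODULES = {"google.cloud.storage": "google-cloud-storage"}
HARDCODED = {"cv2": "opencv-python", "PIL": "Pillow"}
NAMESPACE_TOP_LEVEL = {"acme": "acme-core"}  # made-up namespace entry
DATABASE = {"yaml": "PyYAML", "sklearn": "scikit-learn"}


def resolve(module: str) -> tuple[str, str]:
    """Return (package, source) for a dotted module name, checking sources in priority order."""
    parts = module.split(".")
    top = parts[0]
    # 1. Namespace packages: try the longest dotted prefix first
    for end in range(len(parts), 1, -1):
        prefix = ".".join(parts[:end])
        if prefix in NAMESPACE_SUBMODULES:
            return NAMESPACE_SUBMODULES[prefix], "namespace"
    # 2. Hardcoded mappings for known mismatches
    if top in HARDCODED:
        return HARDCODED[top], "hardcoded"
    # 3. Namespace packages, top-level only
    if top in NAMESPACE_TOP_LEVEL:
        return NAMESPACE_TOP_LEVEL[top], "namespace"
    # 4. Database lookup
    if top in DATABASE:
        return DATABASE[top], "database"
    # 5. Guess: assume the module name equals the package name
    return top, "guess"


for name in ("google.cloud.storage.blob", "cv2", "yaml", "requests"):
    print(name, "->", resolve(name))
```

Returning the source alongside the package lets the caller attach a warning whenever the result is only a guess.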
A: PyImport2Pkg will:
- Check hardcoded mappings first
- Check namespace packages
- Fall back to guessing (module name = package name)
- Add a warning to review the guess
A: Simply run:
pyimport2pkg build-db --max-packages <new_count>

The tool automatically detects existing packages and only processes new ones.
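One way to picture this incremental behavior is as a difference between the requested top-N list and the set of packages already stored. The sketch below is an assumption about the general idea, not the tool's actual logic; `packages_to_process` and the sample ranking are illustrative.

```python
def packages_to_process(
    ranked_packages: list[str], already_processed: set[str], max_packages: int
) -> list[str]:
    """Return packages within the top `max_packages` that are not yet in the database."""
    wanted = ranked_packages[:max_packages]
    return [name for name in wanted if name not in already_processed]


# Sample ranking, most-downloaded first (order is illustrative).
ranked = ["boto3", "urllib3", "requests", "numpy", "pandas"]
done = {"boto3", "urllib3"}  # processed by an earlier, smaller build

print(packages_to_process(ranked, done, max_packages=4))  # ['requests', 'numpy']
```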
A: Yes! Use the --db-path option:
pyimport2pkg analyze . --db-path /path/to/custom.db

A: PyImport2Pkg analyzes import context:
- try-except blocks → optional
- if platform.system() → conditional
- if TYPE_CHECKING → type-checking only
- Function/class level → may be optional
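The context rules above can be approximated with an `ast` walk that remembers the innermost enclosing block. This is a rough sketch under simplifying assumptions: any `if` other than a TYPE_CHECKING guard is labeled conditional (not only platform checks), and the labels and the name `classify_imports` are illustrative rather than PyImport2Pkg's own.

```python
import ast


def classify_imports(source: str) -> dict[str, str]:
    """Map each imported top-level module to the context of its first import."""
    found: dict[str, str] = {}

    def visit(node: ast.AST, context: str) -> None:
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.setdefault(alias.name.split(".")[0], context)
            return
        if isinstance(node, ast.ImportFrom):
            if node.module and node.level == 0:  # ignore relative imports
                found.setdefault(node.module.split(".")[0], context)
            return
        child_context = context
        if isinstance(node, ast.Try):
            child_context = "optional"
        elif isinstance(node, ast.If):
            guard = ast.unparse(node.test)
            child_context = "type_checking" if "TYPE_CHECKING" in guard else "conditional"
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            child_context = "nested"
        for child in ast.iter_child_nodes(node):
            visit(child, child_context)

    visit(ast.parse(source), "required")
    return found


sample = '''
import numpy
import platform
from typing import TYPE_CHECKING
try:
    import ujson
except ImportError:
    ujson = None
if TYPE_CHECKING:
    import pandas
if platform.system() == "Windows":
    import win32api
def plot():
    import matplotlib.pyplot as plt
'''
for module, context in classify_imports(sample).items():
    print(f"{module}: {context}")
```

The innermost block wins in this sketch, so an import inside a try block inside a function is reported as optional.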
A: No. PyImport2Pkg requires Python 3.10+.
A: Very high for popular packages:
- Top 5000 PyPI packages: ~99% accurate
- Hardcoded mappings: 100% accurate (manually curated)
- Namespace packages: 100% accurate (follows PEP 420)
A: Yes! Add to src/pyimport2pkg/mappings/hardcoded.py and submit a PR.
Solution: Run build-db first to create the database.
Solution: The tool automatically detects and pauses. If it persists, reduce --concurrency:
pyimport2pkg build-db --concurrency 20

Solution: The tool uses chunked processing. For very large builds (15000+), ensure 4GB+ RAM available.
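To show why chunked processing keeps memory bounded, here is a minimal sketch of writing rows to SQLite in fixed-size batches. The table schema, chunk size, and function names are assumptions made for illustration and do not describe the tool's actual database layout.

```python
import sqlite3
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def write_mappings(conn, rows, chunk_size=1000):
    """Insert (module, package) rows in batches, committing once per batch."""
    conn.execute("CREATE TABLE IF NOT EXISTS mapping (module TEXT, package TEXT)")
    for chunk in chunked(rows, chunk_size):
        conn.executemany("INSERT INTO mapping VALUES (?, ?)", chunk)
        conn.commit()  # only one chunk is held in memory at a time


conn = sqlite3.connect(":memory:")
rows = [("cv2", "opencv-python"), ("PIL", "Pillow"), ("sklearn", "scikit-learn")]
write_mappings(conn, rows, chunk_size=2)
row_count = conn.execute("SELECT COUNT(*) FROM mapping").fetchone()[0]
print(row_count)  # 3
```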
Solution:
- Check if it's a local module (should be excluded)
- Try the query command to see the mapping: pyimport2pkg query <module>
- If incorrect, report an issue on GitHub
First Stable Release!
- ✅ Full internationalization (English CLI output)
- ✅ Stable API with root-level imports
- ✅ Dynamic versioning in exports
- ✅ Complete JSON export with unresolved imports
- ✅ Comprehensive documentation
- ✅ Production/Stable status
See CHANGELOG for details.
Need Help?
- 📖 README
- 🐛 Report Issues
- 💬 Discussions
PyImport2Pkg v1.0.0 - December 2025