A comprehensive Python script that extracts metadata from various file types including images, audio files, videos, and documents. This tool is perfect for digital asset management, forensic analysis, and file organization.
Images:
- JPEG/JPG - EXIF data, dimensions, camera settings
- PNG - Dimensions, transparency info
- TIFF - EXIF data, multi-page support
- BMP, GIF - Basic image properties
Audio:
- MP3 - ID3 tags, bitrate, duration, artist, album
- FLAC - Lossless audio metadata
- WAV - Audio properties, duration
- M4A, AAC, OGG, WMA - Various audio metadata
Video:
- MP4, AVI, MKV - Duration, bitrate, basic properties
- MOV, WMV, FLV, WebM - Video metadata extraction
Documents:
- PDF - Page count, author, creation date, encryption status
- DOCX - Author, word count, creation/modification dates
- Single File Analysis - Extract metadata from individual files
- Batch Processing - Process entire directories recursively
- Multiple Output Formats - JSON and CSV export options
- Comprehensive Metadata - File system info + format-specific data
- Error Handling - Graceful handling of unsupported files
- Cross-platform - Works on Windows, macOS, and Linux
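The JSON and CSV export options can be illustrated with a short sketch. This is not the script's own code: the helper names `flatten`, `save_json`, and `save_csv` are hypothetical, and the dotted column names (such as `exif.ISO`) are an assumption about how nested metadata might be fitted into a flat CSV.

```python
import csv
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts so every value fits into a single CSV cell."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def save_json(records, path):
    """Write the list of metadata records as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        # default=str turns non-JSON values (e.g. datetime) into strings
        json.dump(records, handle, indent=2, default=str)


def save_csv(records, path):
    """Write the records as CSV, one row per file."""
    rows = [flatten(record) for record in records]
    # Union of all keys in first-seen order, so different file types
    # (image, audio, document) can share one header row.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

JSON keeps nested structures such as EXIF data intact, while CSV needs one flat row per file, which is why the sketch flattens records only on the CSV path.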
Requirements:
- Python 3.6 or higher
- pip (Python package manager)
Installation:
- Clone or download the script to your local machine.
- Install the required dependencies:
  pip install -r requirements.txt
  Or install them manually:
  pip install Pillow mutagen PyPDF2 python-docx
- Make the script executable (Linux/macOS):
  chmod +x file_metadata_extractor.py
Extract metadata from a single file:
python file_metadata_extractor.py /path/to/your/file.jpg
Process all files in a directory:
python file_metadata_extractor.py /path/to/directory/
Process a directory recursively (including subdirectories):
python file_metadata_extractor.py /path/to/directory/ --recursive
Save results to a JSON file:
python file_metadata_extractor.py /path/to/files/ -o results.json -f json
Save results to a CSV file:
python file_metadata_extractor.py /path/to/files/ -o results.csv -f csv
Process a directory recursively and save the results:
python file_metadata_extractor.py /path/to/files/ -r -o metadata_report.json
Command-line options:
- path - File or directory path to analyze (required)
- -o, --output - Output file path (optional)
- -f, --format - Output format: json or csv (default: json)
- -r, --recursive - Process directories recursively
- -h, --help - Show help message
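For readers who want to see how these options could be declared, here is a minimal `argparse` sketch. The flag names and the `json` default are taken from the option list above; the function name `build_parser` and the help strings are assumptions, and the script's real parser may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(
        description="Extract metadata from files and directories."
    )
    # Positional argument: required by default
    parser.add_argument("path", help="File or directory path to analyze")
    parser.add_argument("-o", "--output", help="Output file path (optional)")
    parser.add_argument(
        "-f", "--format",
        choices=["json", "csv"],
        default="json",
        help="Output format (default: json)",
    )
    parser.add_argument(
        "-r", "--recursive",
        action="store_true",
        help="Process directories recursively",
    )
    # -h/--help is added automatically by argparse
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```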
Image file:
{
  "filename": "photo.jpg",
  "file_size_mb": 2.34,
  "width": 1920,
  "height": 1080,
  "format": "JPEG",
  "exif": {
    "DateTime": "2023:10:15 14:30:22",
    "Camera": "Canon EOS 5D",
    "FNumber": "f/2.8",
    "ISO": "400"
  }
}
Audio file:
{
  "filename": "song.mp3",
  "file_size_mb": 4.56,
  "duration_formatted": "3:42",
  "bitrate": 320,
  "title": "Amazing Song",
  "artist": "Great Artist",
  "album": "Best Album",
  "year": "2023"
}
PDF document:
{
  "filename": "document.pdf",
  "file_size_mb": 1.23,
  "page_count": 15,
  "title": "Important Document",
  "author": "John Doe",
  "creation_date": "2023-10-15T10:30:00"
}
File System Information (All Files):
- File name and full path
- File size (bytes and MB)
- Creation, modification, and access timestamps
- File extension
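The file system fields listed above are all available from `os.stat`. The following is a sketch rather than the script's actual code: the function name `file_system_info` is hypothetical, and only the `filename` and `file_size_mb` keys are taken from the sample output, the remaining key names are assumptions.

```python
import os
from datetime import datetime


def file_system_info(path):
    """Collect format-independent metadata for one file."""
    stat = os.stat(path)
    return {
        "filename": os.path.basename(path),
        "full_path": os.path.abspath(path),
        "extension": os.path.splitext(path)[1].lower(),
        "file_size_bytes": stat.st_size,
        "file_size_mb": round(stat.st_size / (1024 * 1024), 2),
        # st_ctime is the creation time on Windows, but the time of the
        # last metadata change on most Unix systems.
        "created": datetime.fromtimestamp(stat.st_ctime).isoformat(),
        "modified": datetime.fromtimestamp(stat.st_mtime).isoformat(),
        "accessed": datetime.fromtimestamp(stat.st_atime).isoformat(),
    }
```

Because the meaning of `st_ctime` differs between platforms, a "creation" timestamp read this way should be treated with care in forensic work.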
Image-Specific Metadata:
- Dimensions (width/height)
- Color mode and format
- EXIF data (camera settings, GPS, timestamps)
- Transparency information
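The script relies on Pillow for image metadata. To show where some of these values actually live in a file, the sketch below reads the dimensions and color type of a PNG straight from its IHDR header using only the standard library. This is a swapped-in technique for illustration, not what the script does, and the function name `read_png_header` and the returned keys are hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Color type codes defined by the PNG format
COLOR_TYPES = {
    0: "grayscale",
    2: "truecolor",
    3: "indexed",
    4: "grayscale+alpha",
    6: "truecolor+alpha",
}


def read_png_header(path):
    """Read width, height, bit depth, and color type from a PNG file."""
    with open(path, "rb") as handle:
        # 8-byte signature + 4-byte chunk length + 4-byte chunk type
        # + the first 10 bytes of the IHDR data
        header = handle.read(26)
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a valid PNG file: " + path)
    width, height, bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_mode": COLOR_TYPES.get(color_type, "unknown"),
        # Only detects a full alpha channel; transparency declared
        # through a tRNS chunk is not checked in this sketch.
        "has_alpha_channel": color_type in (4, 6),
    }
```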
Audio-Specific Metadata:
- Duration and bitrate
- Sample rate and channels
- ID3 tags (title, artist, album, year, genre)
- Track numbers and album artist
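The script uses mutagen for audio files. As a dependency-free illustration of the duration, sample rate, and channel fields, the sketch below reads an uncompressed WAV file with the standard library `wave` module instead. The function names and most dictionary keys are hypothetical; `duration_formatted` follows the `"3:42"` style shown in the sample output.

```python
import wave


def format_duration(seconds):
    """Format a duration in seconds as M:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


def wav_properties(path):
    """Read basic audio properties from an uncompressed WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        # Duration is the frame count divided by frames per second
        duration = frames / rate if rate else 0.0
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_seconds": round(duration, 2),
            "duration_formatted": format_duration(duration),
        }
```

Tag-based fields such as title, artist, and album are not part of the basic WAV header, which is one reason a dedicated library like mutagen is used for the tagged formats.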
Document-Specific Metadata:
- Page/word counts
- Author and title information
- Creation and modification dates
- Document properties and keywords
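A DOCX file is a ZIP archive, and its author, title, keywords, and dates are stored in the `docProps/core.xml` part. The script reads these through python-docx; the sketch below reaches the same part with the standard library only, as an illustration. The function name `docx_core_properties` and the returned key names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of an OOXML package
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def docx_core_properties(path):
    """Read basic document properties from a .docx file."""
    with zipfile.ZipFile(path) as archive:
        # Raises KeyError if the archive has no core properties part
        root = ET.fromstring(archive.read("docProps/core.xml"))

    def text(tag):
        node = root.find(tag, NAMESPACES)
        return node.text if node is not None else None

    return {
        "title": text("dc:title"),
        "author": text("dc:creator"),
        "keywords": text("cp:keywords"),
        "creation_date": text("dcterms:created"),
        "modification_date": text("dcterms:modified"),
    }
```

Properties that the document does not set come back as `None` rather than raising, so callers can still emit a complete record.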
The script gracefully handles:
- Missing or corrupted files
- Unsupported file formats
- Missing dependencies (with helpful error messages)
- Permission errors
- Large file processing
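One way to get this behavior is to wrap every extractor call so that a failure becomes an `error` field in the result instead of stopping a batch run. The sketch below assumes that design; `safe_extract`, the `extractor` callback, and the error message wording are all hypothetical and may not match the script.

```python
import os


def safe_extract(path, extractor):
    """Run extractor(path) and record any failure in the result dict."""
    result = {"filename": os.path.basename(path)}
    try:
        result.update(extractor(path))
    except FileNotFoundError:
        result["error"] = "file not found"
    except PermissionError:
        result["error"] = "permission denied"
    except ImportError as exc:
        # Raised when an optional library is not installed
        result["error"] = f"missing dependency: {exc.name}"
    except Exception as exc:
        # Corrupted or unsupported content: keep going with the next file
        result["error"] = f"{type(exc).__name__}: {exc}"
    return result
```

For example, `safe_extract("photo.jpg", some_extractor)` always returns a dictionary with at least `filename`, so the export step never has to special-case failed files.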
- Large directories are processed file by file to conserve memory
- EXIF data from images can be extensive
- Video metadata extraction is limited to basic properties
- PDF processing may be slower for large documents
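Processing a large directory file by file maps naturally onto a generator, which yields one path at a time instead of building the full list in memory. The following is a sketch under that assumption; the function name `iter_files` is hypothetical, and the `recursive` parameter mirrors the `-r, --recursive` option.

```python
import os


def iter_files(root, recursive=False):
    """Yield file paths under root one at a time."""
    if os.path.isfile(root):
        yield root
        return
    if recursive:
        # os.walk descends into subdirectories lazily
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                yield os.path.join(dirpath, name)
    else:
        # Top level only: skip subdirectories
        with os.scandir(root) as entries:
            for entry in entries:
                if entry.is_file():
                    yield entry.path
```

A caller can then write `for path in iter_files(directory, recursive=True): ...` and handle each file before the next one is even discovered.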
- Pillow (PIL) - Image metadata and EXIF extraction
- mutagen - Audio and video metadata extraction
- PyPDF2 - PDF document metadata
- python-docx - Microsoft Word document metadata
All dependencies are optional: if a library is missing, the script skips the file formats that depend on it.
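Optional dependencies are commonly handled by attempting the import and recording whether it succeeded. The sketch below shows that pattern with the import names of the four libraries (`PIL` for Pillow, `docx` for python-docx); the names `OPTIONAL_LIBRARIES`, `available_libraries`, and `missing_packages` are hypothetical, and the script may perform this check differently.

```python
import importlib

# Import name -> name of the package to install with pip
OPTIONAL_LIBRARIES = {
    "PIL": "Pillow",
    "mutagen": "mutagen",
    "PyPDF2": "PyPDF2",
    "docx": "python-docx",
}


def available_libraries(modules=OPTIONAL_LIBRARIES):
    """Return {import_name: True/False} for each optional library."""
    status = {}
    for module_name in modules:
        try:
            importlib.import_module(module_name)
            status[module_name] = True
        except ImportError:
            status[module_name] = False
    return status


def missing_packages(modules=OPTIONAL_LIBRARIES):
    """Return the pip package names of libraries that failed to import."""
    status = available_libraries(modules)
    return [modules[name] for name, found in status.items() if not found]
```

The result of `missing_packages()` can be used to print a hint such as which `pip install` command would enable the skipped formats.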
Created for the Rotten-Scripts repository
- Digital Asset Management - Organize photo/music libraries
- Forensic Analysis - Extract file creation timestamps and metadata
- Content Audit - Analyze document properties in bulk
- Data Migration - Catalog files before/after transfers
- Media Organization - Sort files by metadata properties
- Video metadata extraction is basic (duration, bitrate only)
- Some proprietary formats may not be fully supported
- Very large files may take time to process
- DOCX support is limited to basic properties
- Requires appropriate permissions to read files
- Support for more video codecs and detailed metadata
- Excel file metadata extraction
- Database output options
- Graphical user interface (GUI)
- Batch file renaming based on metadata