PixelProbe provides a RESTful API for managing media file corruption detection. The API is built with Flask and follows REST conventions.
- Development: http://localhost:5000
- Production: https://pixelprobe.example.com
As of v2.4.1, all API endpoints require authentication.
PixelProbe supports two authentication methods:
Session authentication:
- Used automatically when logged in through the web interface
- Managed via secure HTTP-only cookies
- Best for browser-based access
API token authentication:
- Generate tokens through the web UI under Account → API Tokens
- Include in requests using the Authorization header
- Two formats are supported:
  - Standard: Authorization: Bearer <your-token>
  - Direct: Authorization: <your-token> (for Swagger UI compatibility)
# Using Bearer format
curl -H "Authorization: Bearer your-api-token-here" \
http://localhost:5000/api/scan-status
# Using direct format (Swagger UI style)
curl -H "Authorization: your-api-token-here" \
http://localhost:5000/api/scan-status
import requests
headers = {
    'Authorization': 'Bearer your-api-token-here'
}
response = requests.get('http://localhost:5000/api/scan-status', headers=headers)
To create an API token:
- Log in to the web interface
- Navigate to Account → API Tokens
- Click "Create New Token"
- Provide a description
- Copy the generated token (it won't be shown again)
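Once a token has been copied, a client needs to turn it into one of the two Authorization header formats described earlier. The following is a minimal sketch, not part of PixelProbe itself; the helper names and the `PIXELPROBE_API_TOKEN` environment variable are assumptions made for illustration.

```python
import os


def build_auth_headers(token: str, bearer: bool = True) -> dict:
    """Return request headers carrying the API token.

    bearer=True  -> "Authorization: Bearer <token>" (standard format)
    bearer=False -> "Authorization: <token>" (direct, Swagger UI style)
    """
    token = token.strip()
    if not token:
        raise ValueError("API token is empty")
    value = f"Bearer {token}" if bearer else token
    return {"Authorization": value}


def token_from_env(var_name: str = "PIXELPROBE_API_TOKEN") -> str:
    # The variable name is an assumption for this example,
    # not a setting that PixelProbe defines.
    return os.environ.get(var_name, "")
```

Reading the token from the environment keeps it out of source control; the returned dictionary can be passed as `headers=` to `requests.get` or `requests.post`.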
The API implements rate limiting to prevent abuse:
- Default limits: 200 requests per day, 50 per hour
- Scan operations: 2-5 requests per minute
- Admin operations: 10 requests per minute
- Maintenance operations: 5 requests per minute
Rate limit headers are included in responses:
- X-RateLimit-Limit: Maximum requests allowed
- X-RateLimit-Remaining: Requests remaining
- X-RateLimit-Reset: Time when the limit resets
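A client can read these headers to decide when to slow down. The sketch below assumes the limit and remaining values are integers; the format of X-RateLimit-Reset is not specified here, so it is passed through as a raw string. The function names are hypothetical.

```python
def parse_rate_limit(headers: dict) -> dict:
    """Extract the X-RateLimit-* values from a response header mapping."""
    def to_int(value):
        try:
            return int(value)
        except (TypeError, ValueError):
            return None

    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    return {
        "limit": to_int(lowered.get("x-ratelimit-limit")),
        "remaining": to_int(lowered.get("x-ratelimit-remaining")),
        # Kept as the raw string: the reset format is an unknown here.
        "reset": lowered.get("x-ratelimit-reset"),
    }


def should_pause(headers: dict, threshold: int = 1) -> bool:
    """True when the remaining budget is known and at or below the threshold."""
    remaining = parse_rate_limit(headers)["remaining"]
    return remaining is not None and remaining <= threshold
```

With the `requests` library, `response.headers` can be passed directly as the `headers` argument.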
- POST requests must include Content-Type: application/json
- All responses are in JSON format
- Dates are in ISO 8601 format
- File sizes are in bytes
Errors are returned with appropriate HTTP status codes and a JSON body:
{
"error": "Description of the error"
}
Common status codes:
- 200: Success
- 400: Bad Request (invalid input)
- 401: Unauthorized (authentication required)
- 403: Forbidden (insufficient permissions)
- 404: Not Found
- 409: Conflict (e.g., scan already running)
- 429: Too Many Requests (rate limit exceeded)
- 500: Internal Server Error
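Because every error carries a JSON body with an "error" field, a client can convert failed responses into exceptions in one place. This is a sketch under that assumption; `PixelProbeAPIError` and `check_response` are hypothetical client-side names, not part of the API.

```python
class PixelProbeAPIError(Exception):
    """Raised for non-2xx responses (hypothetical client-side class)."""

    def __init__(self, status_code: int, message: str):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message


def check_response(status_code: int, body):
    """Return the body on success, otherwise raise PixelProbeAPIError."""
    if 200 <= status_code < 300:
        return body
    # Error bodies are documented as {"error": "..."}; fall back if absent.
    message = "Unknown error"
    if isinstance(body, dict):
        message = body.get("error", message)
    raise PixelProbeAPIError(status_code, message)
```

A caller would typically pass `response.status_code` and `response.json()` and branch on `status_code` in the exception handler, for example to retry on 429 or to give up on 403.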
GET /health
Check if the service is running.
Response:
{
"status": "healthy",
"version": "<current_version>",
"timestamp": "2025-01-20T12:00:00Z"
}
GET /api/version
Get version information.
Response:
{
"version": "<current_version>",
"github_url": "https://github.com/ttlequals0/PixelProbe",
"api_version": "1.0"
}
GET /api/scan-results?page=1&per_page=100&scan_status=all&is_corrupted=all
Get paginated scan results with optional filters.
Query Parameters:
- page (integer): Page number (default: 1)
- per_page (integer): Results per page (default: 100, max: 500)
- scan_status (string): Filter by status: all, pending, scanning, completed, error
- is_corrupted (string): Filter by corruption: all, true, false
Response:
{
"results": [
{
"id": 1,
"file_path": "/media/photos/image.jpg",
"file_name": "image.jpg",
"file_size": 2048576,
"scan_date": "2025-01-20T12:00:00Z",
"discovered_date": "2025-01-19T10:00:00Z",
"last_modified": "2025-01-18T08:00:00Z",
"file_hash": "sha256_hash_here",
"scan_status": "completed",
"error_message": null,
"is_corrupted": false,
"marked_as_good": false,
"media_info": {
"width": 1920,
"height": 1080,
"format": "JPEG"
},
"file_exists": true
}
],
"total": 150,
"page": 1,
"per_page": 100,
"pages": 2
}
GET /api/scan-results/{result_id}
Get detailed information about a specific scan result.
Response: Same as individual result in the list above.
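The paginated list response reports "pages", so a client can walk every page of GET /api/scan-results until the last one. The sketch below takes the HTTP call as an injected function so that it stays independent of any particular client library; `iter_scan_results` and `fetch_page` are hypothetical names.

```python
from typing import Callable, Iterator


def iter_scan_results(fetch_page: Callable[[int, int], dict],
                      per_page: int = 100) -> Iterator[dict]:
    """Yield every result by walking pages 1..pages.

    fetch_page(page, per_page) must return the parsed JSON of
    GET /api/scan-results for that page.
    """
    page = 1
    while True:
        data = fetch_page(page, per_page)
        yield from data.get("results", [])
        # Stop on the last page reported by the server.
        if page >= data.get("pages", 1):
            break
        page += 1
```

With `requests`, `fetch_page` could be a small function that calls the endpoint with `params={"page": page, "per_page": per_page}` plus the Authorization header and returns `response.json()`.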
POST /api/scan-file
Scan a single file for corruption. Rate limited to 5 requests per minute.
Request Body:
{
"file_path": "/media/photos/image.jpg"
}
Response:
{
"message": "Scan started",
"file_path": "/media/photos/image.jpg"
}
POST /api/scan-all
Start scanning all configured directories. Rate limited to 2 requests per minute.
Request Body:
{
"force_rescan": false,
"directories": ["/media/photos", "/media/videos"]
}
Response:
{
"message": "Scan started",
"directories": ["/media/photos", "/media/videos"],
"force_rescan": false
}
POST /api/scan-parallel
Start a parallel scan with multiple workers. Rate limited to 2 requests per minute.
Request Body:
{
"force_rescan": false,
"num_workers": 4,
"directories": ["/media/photos"]
}
POST /api/scan/parallel-v2
Start an enhanced parallel scan that distributes work across all available Celery workers.
Request Body:
{
"directories": ["/media/photos", "/media/videos"],
"force_rescan": false,
"chunk_size": 100
}
Response:
{
"scan_id": "uuid-string",
"message": "Enhanced parallel scan started",
"total_workers": 8,
"chunks_created": 42
}
GET /api/scan/parallel-v2/status/<scan_id>
Get detailed status of an enhanced parallel scan, including chunk progress.
Response:
{
"scan_id": "uuid-string",
"status": "running",
"total_chunks": 42,
"completed_chunks": 15,
"progress_percentage": 35.7,
"estimated_time_remaining": "5 minutes",
"worker_status": {
"active": 8,
"idle": 0
}
}
GET /api/scan/parallel-v2/workers
Get current status and utilization of all Celery workers.
Response:
{
"total_workers": 8,
"active_workers": 6,
"idle_workers": 2,
"worker_details": [
{
"worker_id": "worker-1",
"status": "busy",
"current_task": "processing chunk 5"
}
]
}
GET /api/scan-status
Get the current scan progress and status.
Response:
{
"current": 45,
"total": 100,
"file": "/media/video.mp4",
"status": "scanning",
"is_running": true,
"scan_id": 123,
"start_time": "2025-01-20T12:00:00Z",
"end_time": null,
"directories": ["/media/photos"],
"force_rescan": false
}
Status Values:
- idle: No scan running
- initializing: Preparing to scan
- discovering: Finding media files
- scanning: Scanning files
- completed: Scan finished
- cancelled: Scan was cancelled
- error: Scan encountered an error
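These status values make it possible to poll GET /api/scan-status until a scan stops. The following sketch treats idle, completed, cancelled, and error as terminal, which is an assumption drawn from the descriptions above; `wait_for_scan` and its parameters are hypothetical. The default interval is deliberately long because polling counts against the rate limits.

```python
import time
from typing import Callable

# Assumption: these statuses mean no further progress will be reported.
TERMINAL_STATUSES = {"idle", "completed", "cancelled", "error"}


def wait_for_scan(get_status: Callable[[], dict],
                  interval: float = 120.0,
                  max_polls: int = 100,
                  sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the scan reaches a terminal status, then return that payload.

    get_status() must return the parsed JSON of GET /api/scan-status.
    """
    for _ in range(max_polls):
        status = get_status()
        if status.get("status") in TERMINAL_STATUSES:
            return status
        sleep(interval)
    raise TimeoutError("scan did not finish within the polling budget")
```

Injecting `sleep` keeps the helper easy to test and lets callers substitute their own scheduling.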
POST /api/cancel-scan
Cancel the currently running scan.
GET /api/stats/summary
Get overall statistics about scanned files.
Response:
{
"total_files": 1000,
"scanned_files": 950,
"corrupted_files": 10,
"healthy_files": 940,
"pending_files": 50,
"error_files": 5,
"total_size": 10737418240,
"corrupted_size": 52428800,
"last_scan_date": "2025-01-20T12:00:00Z",
"corruption_rate": 1.05
}
GET /api/stats/corruption-by-type
Get corruption statistics grouped by file type.
Response:
[
{
"file_type": "image/jpeg",
"total_files": 500,
"corrupted_files": 5,
"corruption_rate": 1.0
},
{
"file_type": "video/mp4",
"total_files": 200,
"corrupted_files": 3,
"corruption_rate": 1.5
}
]
GET /api/stats/scan-history?days=30
Get scan history for the specified number of days.
Response:
[
{
"date": "2025-01-20",
"files_scanned": 100,
"corrupted_found": 2
},
{
"date": "2025-01-19",
"files_scanned": 150,
"corrupted_found": 1
}
]
POST /api/mark-as-good
Mark files as healthy (removes the corruption flag). Rate limited to 10 requests per minute.
Request Body:
{
"file_ids": [1, 2, 3, 4, 5]
}
GET /api/ignored-patterns
Get all ignored error patterns.
POST /api/ignored-patterns
Add a new pattern to ignore in error detection.
Request Body:
{
"pattern": "moov atom not found",
"description": "Common false positive for certain MP4 files"
}
GET /api/configurations
Get all scan directory configurations.
POST /api/configurations
Add a new directory to scan.
Request Body:
{
"path": "/media/new-photos"
}
GET /api/error-files
Retrieve a list of all files that failed to scan, with detailed error information. Rate limited to 10 requests per minute.
Use this to review scan failures, identify error patterns, or find files to retry.
Query Parameters:
- page (integer): Page number (default: 1)
- per_page (integer): Results per page (default: 100, use -1 for all)
- sort_field (string): Field to sort by: scan_date, file_path, file_size, file_type, scan_duration (default: scan_date)
- sort_order (string): Sort order: asc or desc (default: desc)
- search (string): Filter by file path (optional, case-insensitive)
Response:
{
"error_files": [
{
"id": 123,
"file_path": "/media/videos/corrupted.mp4",
"file_name": "corrupted.mp4",
"file_size": 15728640,
"file_type": "video/mp4",
"scan_status": "error",
"error_message": "SQLAlchemy session error: This Session's transaction has been rolled back",
"scan_date": "2025-01-20T15:30:00Z",
"scan_duration": 2.5,
"tool_name": "ffmpeg",
"discovered_date": "2025-01-19T10:00:00Z",
"last_modified": "2025-01-18T08:00:00Z"
}
],
"total": 32,
"pages": 1,
"current_page": 1,
"per_page": 100
}
Usage Examples:
Get all error files:
curl -H "Authorization: Bearer your-token" \
http://localhost:5000/api/error-files
Filter by file path and sort by size:
curl -H "Authorization: Bearer your-token" \
"http://localhost:5000/api/error-files?search=videos&sort_field=file_size&sort_order=desc"
Get paginated results:
curl -H "Authorization: Bearer your-token" \
"http://localhost:5000/api/error-files?page=1&per_page=50"
Python example:
import requests
headers = {'Authorization': 'Bearer your-token'}
response = requests.get(
'http://localhost:5000/api/error-files',
headers=headers,
params={
'search': 'mp4',
'sort_field': 'scan_date',
'sort_order': 'desc',
'per_page': 100
}
)
error_files = response.json()
print(f"Found {error_files['total']} files with errors")
for file in error_files['error_files']:
    print(f"{file['file_path']}: {file['error_message']}")
Note: A scan_status of 'error' means the scanning process itself failed, not that the file is corrupted. These errors may be due to:
- Database connection issues (temporary)
- Unsupported file formats
- Permission issues
- Corrupted file metadata
- Tool failures (ffmpeg, exiftool, etc.)
After fixing underlying issues (e.g., database problems), use the /api/reset-files-by-path endpoint to reset error files to 'pending' status for rescanning.
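To spot recurring causes before resetting files, the error_message values in the /api/error-files response can be grouped and counted. This is a small sketch over the documented response shape; `summarize_errors` is a hypothetical helper name.

```python
from collections import Counter


def summarize_errors(payload: dict, top: int = 5) -> list:
    """Count error files per error message, most frequent first."""
    counts = Counter(
        (item.get("error_message") or "unknown").strip()
        for item in payload.get("error_files", [])
    )
    return counts.most_common(top)
```

Passing the parsed JSON of an /api/error-files response returns a list of (message, count) pairs, which makes a burst of identical database errors easy to tell apart from isolated tool failures.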
GET /api/export?format=csv
POST /api/export
Export scan results as CSV, JSON, or PDF.
Query Parameters (GET):
- format (string): Output format: csv, json, or pdf (default: csv)
- scan_status (string): Filter by status: all, pending, completed, error
- is_corrupted (string): Filter by corruption: all, true, false
- start_date (string): Start date in ISO format
- end_date (string): End date in ISO format
Request Body (POST):
{
"format": "pdf",
"filters": {
"scan_status": "completed",
"is_corrupted": "true",
"start_date": "2025-01-01",
"end_date": "2025-01-31"
}
}
Response: File download in the requested format
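A client may want to validate export options before sending the POST body. The sketch below builds the documented body shape and rejects formats or filter keys that are not listed above; the helper name and the choice to drop `None` values are assumptions.

```python
ALLOWED_FORMATS = {"csv", "json", "pdf"}
ALLOWED_FILTERS = {"scan_status", "is_corrupted", "start_date", "end_date"}


def build_export_body(fmt: str = "csv", **filters) -> dict:
    """Build the JSON body for POST /api/export, rejecting unknown keys."""
    fmt = fmt.lower()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unknown filters: {sorted(unknown)}")
    # Drop filters left as None so only explicit choices are sent.
    return {
        "format": fmt,
        "filters": {k: v for k, v in filters.items() if v is not None},
    }
```

The result can be passed as `json=` to `requests.post`; since the response is a file download, the caller would write `response.content` to disk rather than parse it as JSON.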
POST /api/export/csv
Export scan results to CSV format (legacy endpoint; use /api/export instead).
Request Body:
{
"filters": {
"scan_status": "completed",
"is_corrupted": "true",
"start_date": "2025-01-01",
"end_date": "2025-01-31"
}
}
Response: CSV file download
POST /api/cleanup
Remove database entries for files that no longer exist. Rate limited to 10 requests per minute.
Request Body:
{
"dry_run": true,
"directories": ["/media/photos"]
}
Response:
{
"missing_files": 10,
"cleaned_files": 0,
"dry_run": true
}
POST /api/vacuum
Optimize the database by running VACUUM. Rate limited to 5 requests per minute.
GET /api/logs?level=ERROR&per_page=50
Get paginated log entries with optional filters.
Query Parameters:
- since (string): ISO timestamp for polling (returns only newer entries)
- scan_id (string): Filter by scan run ("system" for non-scan logs)
- level (string): Minimum log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
- search (string): Search text on message (case-insensitive)
- start_time / end_time (string): Time range filter
- page / per_page (integer): Pagination (default 200 per page, max 1000)
GET /api/logs/runs
List scan/job runs with log entry counts.
GET /api/logs/download?level=WARNING
Download filtered logs as a .log text file.
GET /api/logs/retention
PUT /api/logs/retention
Get or set log retention period (days).
POST /api/logs/purge
Manually purge log entries. Requires at least one filter parameter.
GET /api/scan-paths
Get list of active configured scan paths for the path filter dropdown.
import requests
# Base URL
BASE_URL = "http://localhost:5000"
# All endpoints require authentication; replace with your own token
HEADERS = {"Authorization": "Bearer your-api-token-here"}
# Get scan results
response = requests.get(f"{BASE_URL}/api/scan-results", headers=HEADERS, params={
    "page": 1,
    "per_page": 50,
    "is_corrupted": "true"
})
results = response.json()
# Start a scan
response = requests.post(f"{BASE_URL}/api/scan-all", headers=HEADERS, json={
    "force_rescan": False,
    "directories": ["/media/photos"]
})
# Check scan status
response = requests.get(f"{BASE_URL}/api/scan-status", headers=HEADERS)
status = response.json()
print(f"Progress: {status['current']}/{status['total']}")
const axios = require('axios');
const BASE_URL = 'http://localhost:5000';
// All endpoints require authentication; replace with your own token
const HEADERS = { Authorization: 'Bearer your-api-token-here' };
// Get scan results
async function getScanResults() {
  const response = await axios.get(`${BASE_URL}/api/scan-results`, {
    params: {
      page: 1,
      per_page: 50,
      is_corrupted: 'true'
    },
    headers: HEADERS
  });
  return response.data;
}
// Start a scan
async function startScan() {
  const response = await axios.post(`${BASE_URL}/api/scan-all`, {
    force_rescan: false,
    directories: ['/media/photos']
  }, { headers: HEADERS });
  return response.data;
}
# Get scan results
curl -X GET "http://localhost:5000/api/scan-results?is_corrupted=true" \
-H "Authorization: Bearer your-api-token-here"
# Start a scan
curl -X POST "http://localhost:5000/api/scan-all" \
-H "Authorization: Bearer your-api-token-here" \
-H "Content-Type: application/json" \
-d '{"force_rescan": false, "directories": ["/media/photos"]}'
# Check scan status
curl -X GET "http://localhost:5000/api/scan-status" \
-H "Authorization: Bearer your-api-token-here"
Future versions will include WebSocket support for real-time updates:
- scan:progress: Scan progress updates
- scan:complete: Scan completion notification
- scan:error: Scan error notification
- Check scan status before starting a new scan to avoid conflicts
- Use pagination when retrieving large result sets
- Implement exponential backoff when rate limited
- Validate file paths before submitting scan requests
- Use dry_run for cleanup operations to preview changes
- Monitor rate limit headers to avoid hitting limits
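The exponential backoff recommended above can be sketched as a delay schedule that doubles on each retry up to a cap. The parameter values here are illustrative assumptions, not limits defined by PixelProbe.

```python
import random


def backoff_delays(retries: int = 5, base: float = 1.0, cap: float = 60.0,
                   jitter: bool = False) -> list:
    """Return the wait (seconds) before each retry: base * 2**attempt, capped."""
    delays = []
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            # Full jitter: pick uniformly in [0, delay] to spread out clients.
            delay = random.uniform(0, delay)
        delays.append(delay)
    return delays


# backoff_delays(5) -> [1.0, 2.0, 4.0, 8.0, 16.0]
```

A client would sleep for each delay in turn after a 429 response and stop retrying once the list is exhausted.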
- Path Validation: All file paths are validated against configured allowed directories
- Input Validation: All inputs are validated for type and length
- Rate Limiting: Prevents abuse and DoS attacks
- CSRF Protection: Enabled for web interface (API endpoints currently exempt)
- Command Injection Prevention: All subprocess calls use validated arguments
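Since the server rejects paths outside the configured directories, a client can run a similar check before submitting a scan request. The sketch below is a purely lexical client-side pre-check and is not the server's implementation; it does not resolve symlinks, and `is_within_allowed` is a hypothetical name.

```python
from pathlib import PurePosixPath


def is_within_allowed(file_path: str, allowed_dirs: list) -> bool:
    """True if file_path is absolute, has no '..' parts, and sits under an allowed dir."""
    path = PurePosixPath(file_path)
    if not path.is_absolute() or ".." in path.parts:
        return False
    for directory in allowed_dirs:
        root = PurePosixPath(directory)
        # Component-wise comparison, so /media/photos-old is not under /media/photos.
        if path == root or root in path.parents:
            return True
    return False
```

Comparing path components rather than string prefixes avoids accepting sibling directories that merely share a prefix. The server's validation remains authoritative.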
409 Conflict - "Another scan is already in progress"
- Solution: Wait for current scan to complete or cancel it
400 Bad Request - "Invalid file path"
- Solution: Ensure file path is within allowed directories
429 Too Many Requests
- Solution: Throttle requests in your client and back off until the limit resets
500 Internal Server Error
- Solution: Check server logs for details
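The remedies above can be encoded in a client so that failures produce an actionable hint, and the 409 case can often be avoided by checking /api/scan-status first. This is a sketch; the names and the wording of the hints are assumptions for illustration.

```python
# Hints paraphrased from the troubleshooting list above.
REMEDIES = {
    400: "check that the file path is inside an allowed directory",
    409: "wait for the current scan to finish or cancel it",
    429: "slow down and retry after the rate limit resets",
    500: "check the server logs for details",
}


def can_start_scan(status: dict) -> bool:
    """True when the /api/scan-status payload reports no running scan."""
    return not status.get("is_running", False)


def remedy_for(status_code: int) -> str:
    """Return the suggested remedy for a status code, if one is documented."""
    return REMEDIES.get(status_code, "no documented remedy")
```

Calling `can_start_scan` on the latest status payload before POSTing to /api/scan-all reduces conflicts, although a 409 can still occur if another client starts a scan in between.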
Include these headers for debugging:
- X-Request-ID: Unique request identifier
- X-Response-Time: Server processing time