PixelProbe API Documentation

Overview

PixelProbe provides a RESTful API for managing media file corruption detection. The API is built with Flask and follows REST conventions.

Base URL

  • Development: http://localhost:5000
  • Production: https://pixelprobe.example.com

Authentication

As of v2.4.1, all API endpoints require authentication.

PixelProbe supports two authentication methods:

1. Session-Based Authentication (Web UI)

  • Used automatically when logged in through the web interface
  • Managed via secure HTTP-only cookies
  • Best for browser-based access

2. API Token Authentication (Programmatic Access)

  • Generate tokens through the web UI under Account → API Tokens
  • Include in requests using the Authorization header
  • Two formats are supported:
    • Standard: Authorization: Bearer <your-token>
    • Direct: Authorization: <your-token> (for Swagger UI compatibility)

Example with curl:

# Using Bearer format
curl -H "Authorization: Bearer your-api-token-here" \
     http://localhost:5000/api/scan-status

# Using direct format (Swagger UI style)
curl -H "Authorization: your-api-token-here" \
     http://localhost:5000/api/scan-status

Example with Python:

import requests

headers = {
    'Authorization': 'Bearer your-api-token-here'
}

response = requests.get('http://localhost:5000/api/scan-status', headers=headers)

Getting an API Token

  1. Log in to the web interface
  2. Navigate to Account → API Tokens
  3. Click "Create New Token"
  4. Provide a description
  5. Copy the generated token (it won't be shown again)

Rate Limiting

The API implements rate limiting to prevent abuse:

  • Default limits: 200 requests per day, 50 per hour
  • Scan operations: 2 to 5 requests per minute, depending on the endpoint
  • Admin operations: 10 requests per minute
  • Maintenance operations: 5 requests per minute

Rate limit headers are included in responses:

  • X-RateLimit-Limit: Maximum requests allowed
  • X-RateLimit-Remaining: Requests remaining
  • X-RateLimit-Reset: Time when the limit resets
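
A client can read these headers to decide when to slow down. The following is a minimal sketch, not part of PixelProbe: the helper names `parse_rate_limit` and `should_pause` are hypothetical, and it assumes the limit and remaining values are integers. The format of X-RateLimit-Reset is not specified here, so the sketch returns it unchanged.

```python
from typing import Mapping, Optional


def parse_rate_limit(headers: Mapping[str, str]) -> dict:
    """Extract the documented X-RateLimit-* headers from a response.

    Assumption: Limit and Remaining are integers. The Reset value is
    passed through untouched because its format is not documented.
    """
    def to_int(value: Optional[str]) -> Optional[int]:
        try:
            return int(value) if value is not None else None
        except ValueError:
            return None

    # Header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "limit": to_int(lowered.get("x-ratelimit-limit")),
        "remaining": to_int(lowered.get("x-ratelimit-remaining")),
        "reset": lowered.get("x-ratelimit-reset"),
    }


def should_pause(info: dict, threshold: int = 1) -> bool:
    """Return True when the remaining budget is at or below the threshold."""
    remaining = info["remaining"]
    return remaining is not None and remaining <= threshold
```

Pass the response headers from any HTTP client (for example `response.headers` from requests) to `parse_rate_limit`.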

Request/Response Format

  • POST requests must include the Content-Type: application/json header
  • All responses are in JSON format
  • Dates are in ISO 8601 format
  • File sizes are in bytes

Error Handling

Errors are returned with appropriate HTTP status codes and a JSON body:

{
  "error": "Description of the error"
}

Common status codes:

  • 200: Success
  • 400: Bad Request (invalid input)
  • 401: Unauthorized (authentication required)
  • 403: Forbidden (insufficient permissions)
  • 404: Not Found
  • 409: Conflict (e.g., scan already running)
  • 429: Too Many Requests (rate limit exceeded)
  • 500: Internal Server Error
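
One way to handle this error format on the client side is to convert non-2xx responses into exceptions that carry the "error" message. This is a sketch under assumptions: `PixelProbeError` and `check_response` are hypothetical names, and the fallback text for a missing or non-JSON body is illustrative only.

```python
import json
from typing import Any


class PixelProbeError(Exception):
    """Hypothetical client-side exception; not part of PixelProbe itself."""

    def __init__(self, status_code: int, message: str):
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code
        self.message = message


def check_response(status_code: int, body: str) -> Any:
    """Return the parsed JSON body on 2xx, otherwise raise PixelProbeError."""
    try:
        payload = json.loads(body) if body else {}
    except json.JSONDecodeError:
        payload = {}
    if 200 <= status_code < 300:
        return payload
    # Error bodies are documented as {"error": "..."}; fall back if absent.
    message = "Unknown error"
    if isinstance(payload, dict):
        message = payload.get("error", message)
    raise PixelProbeError(status_code, message)
```

Callers can then branch on `exc.status_code`, for example to wait and retry on 409 or 429.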

API Endpoints

System Endpoints

Health Check

GET /health

Check if the service is running.

Response:

{
  "status": "healthy",
  "version": "<current_version>",
  "timestamp": "2025-01-20T12:00:00Z"
}

Version

GET /api/version

Get version information.

Response:

{
  "version": "<current_version>",
  "github_url": "https://github.com/ttlequals0/PixelProbe",
  "api_version": "1.0"
}

Scan Endpoints

Get Scan Results

GET /api/scan-results?page=1&per_page=100&scan_status=all&is_corrupted=all

Get paginated scan results with optional filters.

Query Parameters:

  • page (integer): Page number (default: 1)
  • per_page (integer): Results per page (default: 100, max: 500)
  • scan_status (string): Filter by status: all, pending, scanning, completed, error
  • is_corrupted (string): Filter by corruption: all, true, false

Response:

{
  "results": [
    {
      "id": 1,
      "file_path": "/media/photos/image.jpg",
      "file_name": "image.jpg",
      "file_size": 2048576,
      "scan_date": "2025-01-20T12:00:00Z",
      "discovered_date": "2025-01-19T10:00:00Z",
      "last_modified": "2025-01-18T08:00:00Z",
      "file_hash": "sha256_hash_here",
      "scan_status": "completed",
      "error_message": null,
      "is_corrupted": false,
      "marked_as_good": false,
      "media_info": {
        "width": 1920,
        "height": 1080,
        "format": "JPEG"
      },
      "file_exists": true
    }
  ],
  "total": 150,
  "page": 1,
  "per_page": 100,
  "pages": 2
}
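
The "page" and "pages" fields make it possible to walk the full result set. The sketch below assumes a caller-supplied function `fetch_page` (hypothetical) that performs GET /api/scan-results for a given page number and returns the parsed JSON shown above.

```python
from typing import Callable, Iterator


def iter_scan_results(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every result by requesting page 1 up to the reported last page.

    fetch_page is a hypothetical, caller-supplied function that requests
    one page and returns the decoded response body.
    """
    page = 1
    while True:
        payload = fetch_page(page)
        yield from payload["results"]
        # "pages" is the total page count reported by the API.
        if page >= payload["pages"]:
            break
        page += 1
```

Because it is a generator, results are fetched one page at a time and the caller can stop early without requesting the remaining pages.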

Get Single Scan Result

GET /api/scan-results/{result_id}

Get detailed information about a specific scan result.

Response: A single result object with the same fields as an entry in the "results" array above.

Scan Single File

POST /api/scan-file

Scan a single file for corruption. Rate limited to 5 requests per minute.

Request Body:

{
  "file_path": "/media/photos/image.jpg"
}

Response:

{
  "message": "Scan started",
  "file_path": "/media/photos/image.jpg"
}

Scan All Files

POST /api/scan-all

Start scanning all configured directories. Rate limited to 2 requests per minute.

Request Body:

{
  "force_rescan": false,
  "directories": ["/media/photos", "/media/videos"]
}

Response:

{
  "message": "Scan started",
  "directories": ["/media/photos", "/media/videos"],
  "force_rescan": false
}

Parallel Scan

POST /api/scan-parallel

Start a parallel scan with multiple workers. Rate limited to 2 requests per minute.

Request Body:

{
  "force_rescan": false,
  "num_workers": 4,
  "directories": ["/media/photos"]
}

Enhanced Parallel Scan V2

POST /api/scan/parallel-v2

Start an enhanced parallel scan that distributes work across all available Celery workers.

Request Body:

{
  "directories": ["/media/photos", "/media/videos"],
  "force_rescan": false,
  "chunk_size": 100
}

Response:

{
  "scan_id": "uuid-string",
  "message": "Enhanced parallel scan started",
  "total_workers": 8,
  "chunks_created": 42
}

Get Parallel Scan V2 Status

GET /api/scan/parallel-v2/status/{scan_id}

Get detailed status of an enhanced parallel scan including chunk progress.

Response:

{
  "scan_id": "uuid-string",
  "status": "running",
  "total_chunks": 42,
  "completed_chunks": 15,
  "progress_percentage": 35.7,
  "estimated_time_remaining": "5 minutes",
  "worker_status": {
    "active": 8,
    "idle": 0
  }
}

Get Worker Status

GET /api/scan/parallel-v2/workers

Get current status and utilization of all Celery workers.

Response:

{
  "total_workers": 8,
  "active_workers": 6,
  "idle_workers": 2,
  "worker_details": [
    {
      "worker_id": "worker-1",
      "status": "busy",
      "current_task": "processing chunk 5"
    }
  ]
}

Get Scan Status

GET /api/scan-status

Get the current scan progress and status.

Response:

{
  "current": 45,
  "total": 100,
  "file": "/media/video.mp4",
  "status": "scanning",
  "is_running": true,
  "scan_id": 123,
  "start_time": "2025-01-20T12:00:00Z",
  "end_time": null,
  "directories": ["/media/photos"],
  "force_rescan": false
}

Status Values:

  • idle: No scan running
  • initializing: Preparing to scan
  • discovering: Finding media files
  • scanning: Scanning files
  • completed: Scan finished
  • cancelled: Scan was cancelled
  • error: Scan encountered an error
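
These status values lend themselves to a simple polling loop. The sketch below is an illustration only: `wait_for_scan` and `get_status` are hypothetical names, treating "idle" as a terminal state is an assumption, and the default interval is chosen to stay under the documented default of 50 requests per hour in case that limit applies to this endpoint.

```python
import time
from typing import Callable

# Statuses after which no further progress is expected. Treating "idle"
# as terminal is an assumption based on the status list.
TERMINAL_STATUSES = {"idle", "completed", "cancelled", "error"}


def wait_for_scan(get_status: Callable[[], dict],
                  interval: float = 75.0,
                  max_polls: int = 100,
                  sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the scan reaches a terminal status, then return that payload.

    get_status is a hypothetical, caller-supplied function that performs
    GET /api/scan-status and returns the parsed JSON.
    """
    for _ in range(max_polls):
        status = get_status()
        if status["status"] in TERMINAL_STATUSES:
            return status
        sleep(interval)
    raise TimeoutError("scan did not finish within the polling budget")
```

The `sleep` parameter is injectable so the loop can be exercised in tests without waiting.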

Cancel Scan

POST /api/cancel-scan

Cancel the currently running scan.

Statistics Endpoints

Summary Statistics

GET /api/stats/summary

Get overall statistics about scanned files.

Response:

{
  "total_files": 1000,
  "scanned_files": 950,
  "corrupted_files": 10,
  "healthy_files": 940,
  "pending_files": 50,
  "error_files": 5,
  "total_size": 10737418240,
  "corrupted_size": 52428800,
  "last_scan_date": "2025-01-20T12:00:00Z",
  "corruption_rate": 1.05
}

Corruption by File Type

GET /api/stats/corruption-by-type

Get corruption statistics grouped by file type.

Response:

[
  {
    "file_type": "image/jpeg",
    "total_files": 500,
    "corrupted_files": 5,
    "corruption_rate": 1.0
  },
  {
    "file_type": "video/mp4",
    "total_files": 200,
    "corrupted_files": 3,
    "corruption_rate": 1.5
  }
]

Scan History

GET /api/stats/scan-history?days=30

Get scan history for the specified number of days.

Response:

[
  {
    "date": "2025-01-20",
    "files_scanned": 100,
    "corrupted_found": 2
  },
  {
    "date": "2025-01-19",
    "files_scanned": 150,
    "corrupted_found": 1
  }
]

Admin Endpoints

Mark Files as Good

POST /api/mark-as-good

Mark files as healthy/good (removes corruption flag). Rate limited to 10 requests per minute.

Request Body:

{
  "file_ids": [1, 2, 3, 4, 5]
}

Ignored Error Patterns

GET /api/ignored-patterns

Get all ignored error patterns.

POST /api/ignored-patterns

Add a new pattern to ignore in error detection.

Request Body:

{
  "pattern": "moov atom not found",
  "description": "Common false positive for certain MP4 files"
}

Scan Configurations

GET /api/configurations

Get all scan directory configurations.

POST /api/configurations

Add a new directory to scan.

Request Body:

{
  "path": "/media/new-photos"
}

Error Management Endpoints

Get Error Files

GET /api/error-files

Retrieve a list of all files that failed to scan, with detailed error information. Rate limited to 10 requests per minute.

Use this to review scan failures, identify error patterns, or find files to retry.

Query Parameters:

  • page (integer): Page number (default: 1)
  • per_page (integer): Results per page (default: 100, use -1 for all)
  • sort_field (string): Field to sort by - scan_date, file_path, file_size, file_type, scan_duration (default: scan_date)
  • sort_order (string): Sort order - asc or desc (default: desc)
  • search (string): Filter by file path (optional, case-insensitive)

Response:

{
  "error_files": [
    {
      "id": 123,
      "file_path": "/media/videos/corrupted.mp4",
      "file_name": "corrupted.mp4",
      "file_size": 15728640,
      "file_type": "video/mp4",
      "scan_status": "error",
      "error_message": "SQLAlchemy session error: This Session's transaction has been rolled back",
      "scan_date": "2025-01-20T15:30:00Z",
      "scan_duration": 2.5,
      "tool_name": "ffmpeg",
      "discovered_date": "2025-01-19T10:00:00Z",
      "last_modified": "2025-01-18T08:00:00Z"
    }
  ],
  "total": 32,
  "pages": 1,
  "current_page": 1,
  "per_page": 100
}

Usage Examples:

Get all error files:

curl -H "Authorization: Bearer your-token" \
  http://localhost:5000/api/error-files

Search for specific errors:

curl -H "Authorization: Bearer your-token" \
  "http://localhost:5000/api/error-files?search=videos&sort_field=file_size&sort_order=desc"

Get paginated results:

curl -H "Authorization: Bearer your-token" \
  "http://localhost:5000/api/error-files?page=1&per_page=50"

Python example:

import requests

headers = {'Authorization': 'Bearer your-token'}
response = requests.get(
    'http://localhost:5000/api/error-files',
    headers=headers,
    params={
        'search': 'mp4',
        'sort_field': 'scan_date',
        'sort_order': 'desc',
        'per_page': 100
    }
)

error_files = response.json()
print(f"Found {error_files['total']} files with errors")

for file in error_files['error_files']:
    print(f"{file['file_path']}: {file['error_message']}")

Note: A scan_status of 'error' means the scanning process itself failed, not that the file is corrupted. Possible causes include:

  • Database connection issues (temporary)
  • Unsupported file formats
  • Permission issues
  • Corrupted file metadata
  • Tool failures (ffmpeg, exiftool, etc.)

After fixing underlying issues (e.g., database problems), use the /api/reset-files-by-path endpoint to reset error files to 'pending' status for rescanning.
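
To spot recurring error patterns in this endpoint's output, the entries can be grouped by message. A small sketch, assuming the "error_files" list has already been fetched; `summarize_errors` is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def summarize_errors(error_files: Iterable[dict], top: int = 5) -> List[Tuple[str, int]]:
    """Count how often each error_message occurs, most frequent first."""
    counts = Counter(
        entry.get("error_message") or "(no message)" for entry in error_files
    )
    return counts.most_common(top)
```

A message that dominates the summary (for example a database session error) suggests an environmental cause rather than a problem with individual files.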

Export Endpoints

Export Scan Results (Enhanced)

GET /api/export?format=csv
POST /api/export

Export scan results in multiple formats (CSV, JSON, or PDF).

Query Parameters (GET):

  • format (string): Output format - csv, json, or pdf (default: csv)
  • scan_status (string): Filter by status - all, pending, completed, error
  • is_corrupted (string): Filter by corruption - all, true, false
  • start_date (string): Start date in ISO format
  • end_date (string): End date in ISO format

Request Body (POST):

{
  "format": "pdf",
  "filters": {
    "scan_status": "completed",
    "is_corrupted": "true",
    "start_date": "2025-01-01",
    "end_date": "2025-01-31"
  }
}

Response: File download in requested format
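
The GET and POST variants carry the same filters in different shapes. The sketch below builds both from one set of arguments; the helper names are hypothetical, and the client-side format check simply mirrors the three formats listed above.

```python
from typing import Optional
from urllib.parse import urlencode

ALLOWED_FORMATS = ("csv", "json", "pdf")


def _clean(filters: dict) -> dict:
    """Drop filters that were left unset (None)."""
    return {key: value for key, value in filters.items() if value is not None}


def build_export_url(base_url: str, fmt: str = "csv", **filters: Optional[str]) -> str:
    """Build the GET form: format and filters as query parameters."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported export format: {fmt!r}")
    params = {"format": fmt, **_clean(filters)}
    return f"{base_url}/api/export?{urlencode(params)}"


def build_export_body(fmt: str = "csv", **filters: Optional[str]) -> dict:
    """Build the POST form: filters nested under the "filters" key."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported export format: {fmt!r}")
    return {"format": fmt, "filters": _clean(filters)}
```

For example, `build_export_body("pdf", scan_status="completed", is_corrupted="true")` produces a body in the same shape as the POST example above.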

Export to CSV (Legacy)

POST /api/export/csv

Export scan results to CSV format (legacy endpoint, use /api/export instead).

Request Body:

{
  "filters": {
    "scan_status": "completed",
    "is_corrupted": "true",
    "start_date": "2025-01-01",
    "end_date": "2025-01-31"
  }
}

Response: CSV file download

Maintenance Endpoints

Cleanup Missing Files

POST /api/cleanup

Remove database entries for files that no longer exist. Rate limited to 10 requests per minute.

Request Body:

{
  "dry_run": true,
  "directories": ["/media/photos"]
}

Response:

{
  "missing_files": 10,
  "cleaned_files": 0,
  "dry_run": true
}
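
A cautious client can run the dry run first and only apply the cleanup after reviewing the count. This is a sketch under assumptions: `post_cleanup` is a hypothetical caller-supplied function that sends the JSON body to POST /api/cleanup and returns the parsed response, and `confirm` is a hypothetical callback that decides whether to proceed.

```python
from typing import Callable, List


def cleanup_missing_files(post_cleanup: Callable[[dict], dict],
                          directories: List[str],
                          confirm: Callable[[int], bool]) -> dict:
    """Preview with dry_run=True, then apply only if confirm() approves.

    Returns the dry-run response when nothing is missing or the caller
    declines; otherwise returns the response of the real cleanup.
    """
    preview = post_cleanup({"dry_run": True, "directories": directories})
    missing = preview.get("missing_files", 0)
    if missing == 0 or not confirm(missing):
        return preview
    return post_cleanup({"dry_run": False, "directories": directories})
```

The `confirm` callback could prompt a user or enforce a threshold, for example refusing to delete more than a set number of entries at once.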

Vacuum Database

POST /api/vacuum

Optimize the database by running VACUUM. Rate limited to 5 requests per minute.

Log Endpoints

Get Logs

GET /api/logs?level=ERROR&per_page=50

Get paginated log entries with optional filters.

Query Parameters:

  • since (string): ISO timestamp for polling (returns only newer entries)
  • scan_id (string): Filter by scan run ("system" for non-scan logs)
  • level (string): Minimum log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
  • search (string): Search text on message (case-insensitive)
  • start_time / end_time (string): Time range filter
  • page / per_page (integer): Pagination (default 200 per page, max 1000)
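
These parameters can be assembled and checked on the client before sending a request. A minimal sketch: `build_logs_query` is a hypothetical helper, and the validation only mirrors the level names and the per_page maximum listed above.

```python
from typing import Optional
from urllib.parse import urlencode

LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")
MAX_PER_PAGE = 1000


def build_logs_query(level: Optional[str] = None,
                     since: Optional[str] = None,
                     scan_id: Optional[str] = None,
                     search: Optional[str] = None,
                     page: int = 1,
                     per_page: int = 200) -> str:
    """Return a URL-encoded query string for GET /api/logs."""
    if level is not None and level.upper() not in LOG_LEVELS:
        raise ValueError(f"unknown log level: {level!r}")
    if not 1 <= per_page <= MAX_PER_PAGE:
        raise ValueError(f"per_page must be between 1 and {MAX_PER_PAGE}")
    params = {
        "since": since,
        "scan_id": scan_id,
        "level": level.upper() if level else None,
        "search": search,
        "page": page,
        "per_page": per_page,
    }
    # Unset filters are omitted so the server applies its defaults.
    return urlencode({k: v for k, v in params.items() if v is not None})
```

Append the result to `/api/logs?`. When polling, pass the timestamp of the newest entry already seen as `since` so that only newer entries are returned.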

Get Log Runs

GET /api/logs/runs

List scan/job runs with log entry counts.

Download Logs

GET /api/logs/download?level=WARNING

Download filtered logs as a .log text file.

Log Retention

GET /api/logs/retention
PUT /api/logs/retention

Get or set log retention period (days).

Purge Logs

POST /api/logs/purge

Manually purge log entries. Requires at least one filter parameter.

Get Scan Paths

GET /api/scan-paths

Get list of active configured scan paths for the path filter dropdown.

Code Examples

Python

import requests

# Base URL and API token (all endpoints require authentication)
BASE_URL = "http://localhost:5000"
HEADERS = {"Authorization": "Bearer your-api-token-here"}

# Get scan results
response = requests.get(f"{BASE_URL}/api/scan-results", headers=HEADERS, params={
    "page": 1,
    "per_page": 50,
    "is_corrupted": "true"
})
results = response.json()

# Start a scan
response = requests.post(f"{BASE_URL}/api/scan-all", headers=HEADERS, json={
    "force_rescan": False,
    "directories": ["/media/photos"]
})

# Check scan status
response = requests.get(f"{BASE_URL}/api/scan-status", headers=HEADERS)
status = response.json()
print(f"Progress: {status['current']}/{status['total']}")

JavaScript (Node.js)

const axios = require('axios');

const BASE_URL = 'http://localhost:5000';
// All endpoints require authentication
const HEADERS = { Authorization: 'Bearer your-api-token-here' };

// Get scan results
async function getScanResults() {
  const response = await axios.get(`${BASE_URL}/api/scan-results`, {
    headers: HEADERS,
    params: {
      page: 1,
      per_page: 50,
      is_corrupted: 'true'
    }
  });
  return response.data;
}

// Start a scan
async function startScan() {
  const response = await axios.post(`${BASE_URL}/api/scan-all`, {
    force_rescan: false,
    directories: ['/media/photos']
  }, { headers: HEADERS });
  return response.data;
}

cURL

# Get scan results
curl -X GET "http://localhost:5000/api/scan-results?is_corrupted=true" \
  -H "Authorization: Bearer your-api-token-here"

# Start a scan
curl -X POST "http://localhost:5000/api/scan-all" \
  -H "Authorization: Bearer your-api-token-here" \
  -H "Content-Type: application/json" \
  -d '{"force_rescan": false, "directories": ["/media/photos"]}'

# Check scan status
curl -X GET "http://localhost:5000/api/scan-status" \
  -H "Authorization: Bearer your-api-token-here"

WebSocket Events (Future)

Future versions will include WebSocket support for real-time updates:

  • scan:progress: Scan progress updates
  • scan:complete: Scan completion notification
  • scan:error: Scan error notification

Best Practices

  1. Check scan status before starting a new scan to avoid conflicts
  2. Use pagination when retrieving large result sets
  3. Implement exponential backoff when rate limited
  4. Validate file paths before submitting scan requests
  5. Use dry_run for cleanup operations to preview changes
  6. Monitor rate limit headers to avoid hitting limits
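
As an illustration of practice 3, the sketch below retries a request while the server answers 429, doubling the wait each time. The names `backoff_delay` and `call_with_backoff` are hypothetical, `send` is a caller-supplied function returning a (status code, parsed body) pair, and the base delay, cap, and retry count are arbitrary starting values rather than recommendations from PixelProbe.

```python
import random
import time
from typing import Callable, Tuple


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(send: Callable[[], Tuple[int, dict]],
                      max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      jitter: Callable[[], float] = random.random) -> Tuple[int, dict]:
    """Call send() and retry with exponential backoff while it returns 429.

    A random jitter of up to one second is added to each delay so that
    several clients do not retry in lockstep.
    """
    for attempt in range(max_retries + 1):
        status_code, payload = send()
        if status_code != 429:
            return status_code, payload
        if attempt < max_retries:
            sleep(backoff_delay(attempt) + jitter())
    # Retries exhausted: hand the last 429 response back to the caller.
    return status_code, payload
```

With the defaults, the waits grow as roughly 1, 2, 4, 8, and 16 seconds before the last attempt.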

Security Considerations

  1. Path Validation: All file paths are validated against configured allowed directories
  2. Input Validation: All inputs are validated for type and length
  3. Rate Limiting: Prevents abuse and DoS attacks
  4. CSRF Protection: Enabled for web interface (API endpoints currently exempt)
  5. Command Injection: All subprocess calls use validated arguments

Troubleshooting

Common Errors

409 Conflict - "Another scan is already in progress"

  • Solution: Wait for current scan to complete or cancel it

400 Bad Request - "Invalid file path"

  • Solution: Ensure file path is within allowed directories
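
A client can catch many of these errors before sending the request. The check below is a sketch, not the server's validation logic: `is_within_allowed` is a hypothetical helper, it assumes POSIX-style absolute paths, and it compares paths lexically without resolving symlinks. The server's own validation remains authoritative.

```python
import posixpath
from typing import Iterable


def is_within_allowed(file_path: str, allowed_dirs: Iterable[str]) -> bool:
    """Return True if file_path lies inside one of allowed_dirs.

    '..' segments are collapsed before comparison; symlinks are not
    resolved because the paths refer to the server's filesystem.
    """
    if not posixpath.isabs(file_path):
        return False
    normalized = posixpath.normpath(file_path)
    for directory in allowed_dirs:
        if not posixpath.isabs(directory):
            continue  # commonpath cannot mix absolute and relative paths
        root = posixpath.normpath(directory)
        # commonpath compares whole path components, so "/media/photos-old"
        # is not mistaken for a child of "/media/photos".
        if posixpath.commonpath([normalized, root]) == root:
            return True
    return False
```

The list of allowed directories could be taken from GET /api/configurations or GET /api/scan-paths, depending on how the deployment is set up.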

429 Too Many Requests

  • Solution: Throttle requests in your client and back off until the rate limit resets

500 Internal Server Error

  • Solution: Check server logs for details

Debug Headers

These headers help with debugging:

  • X-Request-ID: Unique request identifier
  • X-Response-Time: Server processing time