
[Proposal] Unified Loader Entry Point #65

@ABNER-1

Description


Motivation

For users of inference engines (e.g., vLLM), the storage backend for model weights varies across deployment environments:

| Deployment Environment | Expected Loading Method |
| --- | --- |
| Local NVMe SSD + CUDA | SafeTensorsFileLoader (GDS, auto-fallback to nogds) |
| 3FS Distributed File System | ThreeFSLoader / ParallelThreeFSLoader |
| Other distributed storage (future) | New loader implementations |

Pain point: Switching storage backends requires modifying engine code (different imports, different constructor parameters). This means:

  • Engine maintainers must maintain separate code branches for each storage type — not sustainable
  • End users cannot switch backends via simple configuration
  • The same engine codebase cannot be reused across different storage environments

Goal: Engine code uses a single unified entry point; users switch the underlying loader via environment variables or config files:

# Default (local GDS/nogds)
python run_inference.py

# Switch to 3FS
FASTSAFETENSORS_LOADER=threefs python run_inference.py

# Fine-grained control via config file
FASTSAFETENSORS_CONFIG=/path/to/config.json python run_inference.py

Current State

PipelineParallel (base class: producer-consumer parallel loading framework)
├── ParallelLoader         → internally creates SafeTensorsFileLoader
└── ParallelThreeFSLoader  → internally creates ThreeFSLoader
  • SafeTensorsFileLoader can also be used through ParallelLoader with queue_size=-1, but this usage is not intuitive
  • The two subclasses differ only in which underlying loader is created in the constructor; all other logic (iterate_weights(), iterator, scheduling) is identical
  • Existing env var conventions: FASTSAFETENSORS_DEBUG, FASTSAFETENSORS_UNIFIED_MEM

Approach Comparison

Approach A: New Unified Entry Class (e.g., FastLoader)

from fastsafetensors import FastLoader
loader = FastLoader(pg, files, device="cuda:0")  # auto-selects underlying loader via env/config
for key, tensor in loader.iterate_weights():
    process(key, tensor)
loader.close()
  • Pros: Purely additive — no changes to existing classes; best backward compatibility; clean separation of concerns
  • Cons: New class name; one extra layer of delegation

Approach B: Extend Existing ParallelLoader

from fastsafetensors import ParallelLoader
loader = ParallelLoader(pg, files, device="cuda:0")  # auto-selects underlying loader via env/config
  • Pros: Zero learning curve — entry point stays the same
  • Cons: Modifies existing class behavior; nogds/bbuf_size_kb params would be ignored in threefs mode, which may cause confusion

Backward compatibility: Both approaches behave identically to today when no new env vars are set. Approach A leaves existing ParallelLoader completely untouched; Approach B defaults to loader="base", so existing calls are unaffected.
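The backward-compatibility story for Approach B can be sketched as follows (the real ParallelLoader signature may differ; the warning is one possible way to surface the ignored-params confusion noted above):

```python
import os
import warnings

def pick_loader(loader: str = "base") -> str:
    """Existing call sites pass nothing and keep today's behavior;
    the env var, when set, overrides the code-level default."""
    chosen = os.environ.get("FASTSAFETENSORS_LOADER", loader)
    if chosen == "threefs":
        # GDS-specific params are irrelevant in threefs mode; warning
        # makes the silent-ignore behavior explicit.
        warnings.warn("nogds/bbuf_size_kb are ignored when loader='threefs'")
    return chosen
```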


Configuration Design

Principles

  • Static config (env / config file): loader type, framework, debug toggle, loader-specific tuning params
  • Runtime params (passed in code): device, pg, hf_weights_files — never in config files
  • Priority: env vars > config file > code params > defaults
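The priority chain above could be resolved with a layered merge, applying lower-priority sources first so higher-priority ones overwrite them. Only the key and env var names come from this proposal; the merge logic itself is an assumption about one possible implementation:

```python
import json
import os

DEFAULTS = {"loader": "base", "framework": None, "debug_log": False}

_ENV_KEYS = {
    "loader": "FASTSAFETENSORS_LOADER",
    "framework": "FASTSAFETENSORS_FRAMEWORK",
    "debug_log": "FASTSAFETENSORS_DEBUG",
}

def resolve_config(code_params=None) -> dict:
    merged = dict(DEFAULTS)
    merged.update(code_params or {})              # code params beat defaults
    path = os.environ.get("FASTSAFETENSORS_CONFIG")
    if path:                                      # config file beats code params
        with open(path) as f:
            file_cfg = json.load(f)
        merged.update({k: file_cfg[k] for k in DEFAULTS if k in file_cfg})
    for key, env in _ENV_KEYS.items():            # env vars beat everything
        if env in os.environ:
            merged[key] = os.environ[env]
    return merged
```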

Environment Variables

| Variable | Description | Values | Default |
| --- | --- | --- | --- |
| FASTSAFETENSORS_LOADER | Loader type | base, threefs | base |
| FASTSAFETENSORS_CONFIG | Config file path | file path | none |
| FASTSAFETENSORS_FRAMEWORK | DL framework | pytorch, paddle | none |
| FASTSAFETENSORS_DEBUG | Debug logging (existing) | true, false | false |

Config File (JSON)

Specified via FASTSAFETENSORS_CONFIG:

{
  "loader": "threefs",
  "framework": "pytorch",
  "debug_log": false,
  "base": {
    "nogds": false,
    "bbuf_size_kb": 16384,
    "max_threads": 16
  },
  "threefs": {},
  "parallel": {
    "max_concurrent_producers": 1,
    "queue_size": 0,
    "use_tqdm_on_load": true
  }
}
  • loader: "base" → SafeTensorsFileLoader (handles GDS/nogds fallback internally), "threefs" → ThreeFSLoader
  • base / threefs: loader-specific tuning params
  • parallel: PipelineParallel-level tuning params
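Consuming this layout could look like the following sketch: pick the active loader's tuning section plus the shared parallel section. Key names follow the example JSON; the helper itself is illustrative:

```python
def split_config(cfg: dict):
    """Return (loader kind, loader-specific params, parallel-level params)."""
    kind = cfg.get("loader", "base")
    loader_params = cfg.get(kind, {})        # e.g. cfg["base"] or cfg["threefs"]
    parallel_params = cfg.get("parallel", {})
    return kind, loader_params, parallel_params
```

Keeping the loader-specific sections namespaced by loader type means unknown sections are simply ignored, so a config file can carry tuning for multiple backends at once.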

Discussion

The above proposal is based on real-world inference engine integration scenarios. We'd love to hear the maintainers' thoughts:

  1. Do you agree with the direction of unifying the loader entry point via configuration?
  2. If so, any suggestions or alternative approaches you'd prefer?
