## Motivation
For users of inference engines (e.g., vLLM), the storage backend for model weights varies across deployment environments:
| Deployment Environment | Expected Loading Method |
| --- | --- |
| Local NVMe SSD + CUDA | `SafeTensorsFileLoader` (GDS, auto-fallback to nogds) |
| 3FS Distributed File System | `ThreeFSLoader` / `ParallelThreeFSLoader` |
| Other distributed storage (future) | New loader implementations |
Pain point: Switching storage backends requires modifying engine code (different imports, different constructor parameters). This means:
- Engine maintainers must maintain separate code branches for each storage type — not sustainable
- End users cannot switch backends via simple configuration
- The same engine codebase cannot be reused across different storage environments
Goal: Engine code uses a single unified entry point; users switch the underlying loader via environment variables or config files:
```bash
# Default (local GDS/nogds)
python run_inference.py

# Switch to 3FS
FASTSAFETENSORS_LOADER=threefs python run_inference.py

# Fine-grained control via config file
FASTSAFETENSORS_CONFIG=/path/to/config.json python run_inference.py
```
## Current State
```
PipelineParallel (base class: producer-consumer parallel loading framework)
├── ParallelLoader → internally creates SafeTensorsFileLoader
└── ParallelThreeFSLoader → internally creates ThreeFSLoader
```

- `SafeTensorsFileLoader` can also be used via `ParallelLoader` with `queue_size=-1`, but this is not intuitive
- The two parallel loaders differ only in which underlying loader is created in the constructor; all other logic (`iterate_weights()`, iteration, scheduling) is identical
- Existing env var conventions: `FASTSAFETENSORS_DEBUG`, `FASTSAFETENSORS_UNIFIED_MEM`
## Approach Comparison

### Approach A: New Unified Entry Class (e.g., `FastLoader`)
```python
from fastsafetensors import FastLoader

loader = FastLoader(pg, files, device="cuda:0")  # auto-selects underlying loader via env/config
for key, tensor in loader.iterate_weights():
    process(key, tensor)
loader.close()
```
- Pros: Purely additive — no changes to existing classes; best backward compatibility; clean separation of concerns
- Cons: New class name; one extra layer of delegation
### Approach B: Extend Existing `ParallelLoader`
```python
from fastsafetensors import ParallelLoader

loader = ParallelLoader(pg, files, device="cuda:0")  # auto-selects underlying loader via env/config
```
- Pros: Zero learning curve — entry point stays the same
- Cons: Modifies existing class behavior; `nogds`/`bbuf_size_kb` params would be ignored in `threefs` mode, which may cause confusion
Backward compatibility: Both approaches behave identically to today when no new env vars are set. Approach A leaves existing ParallelLoader completely untouched; Approach B defaults to loader="base", so existing calls are unaffected.
## Configuration Design

### Principles
- Static config (env / config file): loader type, framework, debug toggle, loader-specific tuning params
- Runtime params (passed in code): `device`, `pg`, `hf_weights_files` — never in config files
- Priority: env vars > config file > code params > defaults
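The stated priority can be implemented by layering the sources lowest-priority first, so each later source overwrites the earlier ones. `resolve_config` is a hypothetical helper, and only the top-level keys are shown:

```python
import json
import os

DEFAULTS = {"loader": "base", "framework": None, "debug_log": False}

def resolve_config(code_params=None):
    """Merge settings with priority: env vars > config file > code params > defaults."""
    cfg = dict(DEFAULTS)
    cfg.update(code_params or {})                 # code params override defaults
    path = os.environ.get("FASTSAFETENSORS_CONFIG")
    if path:
        with open(path) as f:
            cfg.update(json.load(f))              # config file overrides code params
    if "FASTSAFETENSORS_LOADER" in os.environ:    # env vars win over everything
        cfg["loader"] = os.environ["FASTSAFETENSORS_LOADER"]
    if "FASTSAFETENSORS_FRAMEWORK" in os.environ:
        cfg["framework"] = os.environ["FASTSAFETENSORS_FRAMEWORK"]
    return cfg
```

For example, `resolve_config({"loader": "threefs"})` returns `"threefs"` only when neither `FASTSAFETENSORS_LOADER` nor a config file overrides it.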
### Environment Variables
| Variable | Description | Values | Default |
| --- | --- | --- | --- |
| `FASTSAFETENSORS_LOADER` | Loader type | `base`, `threefs` | `base` |
| `FASTSAFETENSORS_CONFIG` | Config file path | file path | none |
| `FASTSAFETENSORS_FRAMEWORK` | DL framework | `pytorch`, `paddle` | none |
| `FASTSAFETENSORS_DEBUG` | Debug logging (existing) | `true`, `false` | `false` |
### Config File (JSON)
Specified via FASTSAFETENSORS_CONFIG:
```json
{
  "loader": "threefs",
  "framework": "pytorch",
  "debug_log": false,
  "base": {
    "nogds": false,
    "bbuf_size_kb": 16384,
    "max_threads": 16
  },
  "threefs": {},
  "parallel": {
    "max_concurrent_producers": 1,
    "queue_size": 0,
    "use_tqdm_on_load": true
  }
}
```
- `loader`: `"base"` → `SafeTensorsFileLoader` (handles GDS/nogds fallback internally); `"threefs"` → `ThreeFSLoader`
- `base` / `threefs`: loader-specific tuning params
- `parallel`: `PipelineParallel`-level tuning params
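As a sketch of how these sections could map to constructor arguments (the helper name `split_loader_kwargs` is hypothetical), the parsed config splits into kwargs for the selected loader plus the `PipelineParallel`-level options:

```python
def split_loader_kwargs(cfg: dict):
    """Return (loader_name, loader_kwargs, parallel_kwargs) from a parsed config dict."""
    loader = cfg.get("loader", "base")
    loader_kwargs = cfg.get(loader, {})        # the "base" or "threefs" section
    parallel_kwargs = cfg.get("parallel", {})  # PipelineParallel-level tuning
    return loader, loader_kwargs, parallel_kwargs

# Using the example config above (abridged):
cfg = {
    "loader": "threefs",
    "base": {"nogds": False, "bbuf_size_kb": 16384},
    "threefs": {},
    "parallel": {"queue_size": 0},
}
name, loader_kw, parallel_kw = split_loader_kwargs(cfg)
```

This keeps the inactive section (here `base`) parsed but unused, which matches the Approach B concern that `nogds`/`bbuf_size_kb` are silently ignored in `threefs` mode.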
## Discussion
The above proposal is based on real-world inference engine integration scenarios. We'd love to hear the maintainers' thoughts:
- Do you agree with the direction of unifying the loader entry point via configuration?
- If so, any suggestions or alternative approaches you'd prefer?