[Bug]: PyTorch loads unconditionally regardless of embedding provider, which prevents memory savings when using external embeddings #1587

### Bug Description

When `HINDSIGHT_API_EMBEDDINGS_PROVIDER` is set to an external provider (e.g., `openai`), the hindsight-api Python process still loads PyTorch and local model files at startup. The result is a ~1.15 GB RSS baseline with no local model configured, only ~950 MB less than a BGE-local instance.

### Steps to Reproduce

- Configure hindsight-api with `HINDSIGHT_API_EMBEDDINGS_PROVIDER=openai` and a valid `HINDSIGHT_API_EMBEDDINGS_OPENAI_API_KEY`
- Start the daemon (no local embedding model needed or configured)
- Check mapped files: `grep -c "torch\|bge" /proc/<hindsight-pid>/maps`
- Observe PyTorch and BGE file mappings present despite no local model being used
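
For anyone who wants to script the check instead of running `grep` by hand, here is a minimal sketch that reads the same `/proc` files. The function names are hypothetical and not part of hindsight-api, and the pattern is case-sensitive to mirror the `grep` call above.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Same alternation as grep "torch\|bge" (case-sensitive).
PATTERN = re.compile(r"torch|bge")


def count_matching_mappings(maps_text: str) -> int:
    """Count lines of /proc/<pid>/maps content that match, like `grep -c`."""
    return sum(1 for line in maps_text.splitlines() if PATTERN.search(line))


def rss_kb(status_text: str) -> Optional[int]:
    """Return the VmRSS value (in kB) from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def inspect_pid(pid: int) -> Tuple[int, Optional[int]]:
    """Return (matching mapping count, RSS in kB) for a running process."""
    proc = Path("/proc") / str(pid)
    maps_text = (proc / "maps").read_text()
    status_text = (proc / "status").read_text()
    return count_matching_mappings(maps_text), rss_kb(status_text)
```

Calling `inspect_pid(<hindsight-pid>)` on Linux should give the same count as the `grep -c` step, plus the RSS figure in kB.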


- **Hindsight version:** v0.6.5
- **Operating system:** Ubuntu 24.04.4 LTS, Linux 6.8.0-111-generic x86_64
- **Install method:** `uvx hindsight-api@latest` via uv, running as background daemon process
- **Model:** N/A (embedding-provider configuration issue, not LLM)
- **Provider / routing chain:** `HINDSIGHT_API_EMBEDDINGS_PROVIDER=openai`, `HINDSIGHT_API_EMBEDDINGS_OPENAI_MODEL=text-embedding-3-small`. No local sentence-transformers model configured.

**Additional context**
Discovered while migrating from local BGE-small embeddings to OpenAI external embeddings to reduce CPU load and RSS. The migration eliminated embedding inference CPU saturation (the primary goal), but the memory savings were much smaller than expected (~950 MB instead of ~1.5–1.8 GB), apparently because PyTorch loads unconditionally.

Likely cause: `import torch` or `from sentence_transformers import ...` at module top level in `embeddings.py` or a dependency, rather than inside the `LocalEmbeddings` class initializer. A lazy import pattern (importing only when `HINDSIGHT_API_EMBEDDINGS_PROVIDER=local`) should fix this.
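
To illustrate the lazy import pattern suggested above, here is a rough sketch. I have not read the hindsight-api source, so the class layout, the `make_embeddings` factory, the `"local"` default, and the BGE model id are all assumptions for illustration only.

```python
import os
import sys


class OpenAIEmbeddings:
    """Hypothetical stand-in for an external provider; needs no torch."""

    def __init__(self, model: str = "text-embedding-3-small"):
        self.model = model


class LocalEmbeddings:
    """Hypothetical local provider that defers the heavy import."""

    def __init__(self, model_name: str = "BAAI/bge-small-en-v1.5"):  # assumed model id
        # Imported here rather than at module top level, so torch is only
        # pulled in when a local provider is actually constructed.
        from sentence_transformers import SentenceTransformer

        self._model = SentenceTransformer(model_name)


def make_embeddings(provider=None):
    """Pick a provider from the argument or the environment (default is assumed)."""
    provider = provider or os.environ.get("HINDSIGHT_API_EMBEDDINGS_PROVIDER", "local")
    if provider == "openai":
        return OpenAIEmbeddings()
    if provider == "local":
        return LocalEmbeddings()
    raise ValueError(f"unknown embeddings provider: {provider!r}")


def heavy_modules_loaded() -> bool:
    """True if sentence_transformers has been imported in this process."""
    return "sentence_transformers" in sys.modules
```

With this layout, `make_embeddings("openai")` never touches `sentence_transformers`, so the PyTorch runtime stays out of the process unless the local provider is selected.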

### Expected Behavior

When an external embedding provider is configured, PyTorch and local model weights should not be imported or loaded. RSS baseline should reflect only the API server, database client, and LLM client — expected ~200–400 MB, not ~1.15 GB.
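
The expectation above could be turned into a regression check. The sketch below imports a module in a fresh interpreter and reports which heavy modules ended up in `sys.modules`. The helper name is hypothetical, and the package's import name (`hindsight_api` in the usage note) is an assumption on my part.

```python
import os
import subprocess
import sys

MARKER = "LOADED:"


def heavy_modules_after_import(module, watch=("torch", "sentence_transformers"), extra_env=None):
    """Import `module` in a child interpreter; return the watched modules it loaded."""
    code = (
        "import importlib, sys\n"
        f"importlib.import_module({module!r})\n"
        f"loaded = [m for m in {tuple(watch)!r} if m in sys.modules]\n"
        f"print({MARKER!r} + ','.join(loaded))\n"
    )
    # Inherit the current environment and layer any overrides on top.
    env = {**os.environ, **(extra_env or {})}
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # The imported module may print to stdout, so look for the marker line.
    for line in result.stdout.splitlines():
        if line.startswith(MARKER):
            names = line[len(MARKER):]
            return names.split(",") if names else []
    raise RuntimeError("marker line not found in child output")
```

A test could then assert that `heavy_modules_after_import("hindsight_api", extra_env={"HINDSIGHT_API_EMBEDDINGS_PROVIDER": "openai"})` returns an empty list. This only covers import-time loading; if the daemon pulls in torch later during startup, the `/proc/<pid>/maps` check is still needed.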

### Actual Behavior

64 PyTorch/BGE file mappings present in `/proc/<pid>/maps` on a daemon configured exclusively for OpenAI embeddings. RSS sits at ~1,152 MB at idle, consistent with the PyTorch runtime being fully loaded despite serving zero inference requests.

```
# On daemon with HINDSIGHT_API_EMBEDDINGS_PROVIDER=openai
grep -c "torch\|bge" /proc/249656/maps
# Output: 64
```

```
# RSS
grep VmRSS /proc/249656/status
# VmRSS: 1178492 kB  (~1.15 GB)
```

For comparison, a BGE-local instance on the same host:
```
grep VmRSS /proc/142163/status
# VmRSS: 2166572 kB  (~2.1 GB)
```

Switching to external embeddings saves ~950 MB — but the expected saving was ~1.5–1.8 GB (PyTorch runtime + model weights). PyTorch appears to be imported unconditionally at module load time rather than lazily when a local provider is actually selected.
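
For reference, the arithmetic behind the savings figure, using the two `VmRSS` readings quoted in this report. `/proc` labels the unit "kB" but counts in multiples of 1024 bytes, so the sketch converts to MiB; the variable names are only illustrative.

```python
KIB_PER_MIB = 1024  # /proc "kB" values are multiples of 1024 bytes


def kib_to_mib(kib: int) -> float:
    """Convert a /proc kB reading to MiB."""
    return kib / KIB_PER_MIB


rss_openai_kib = 1_178_492  # daemon with the openai provider
rss_local_kib = 2_166_572   # BGE-local instance on the same host

saved_kib = rss_local_kib - rss_openai_kib
saved_mib = kib_to_mib(saved_kib)

print(f"openai: {kib_to_mib(rss_openai_kib):.1f} MiB")
print(f"local:  {kib_to_mib(rss_local_kib):.1f} MiB")
print(f"saved:  {saved_mib:.1f} MiB")
```

The difference comes out to roughly 965 MiB, which is the "~950 MB" figure quoted above.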

### Version

Hindsight version: v0.6.5, Operating system: Ubuntu 24.04.4 LTS, Linux 6.8.0-111-generic x86_64

### LLM Provider

OpenAI
