[Bug]: PyTorch loads unconditionally regardless of embedding provider — prevents memory savings when using external embeddings #1587

@zip-it-e-dee-doo-dah

Bug Description

When HINDSIGHT_API_EMBEDDINGS_PROVIDER is set to an external provider (e.g., openai), the hindsight-api Python process still loads PyTorch and local model files at startup, resulting in ~1.15 GB RSS baseline with no local model configured — only ~950 MB less than a BGE-local instance.

Steps to Reproduce

  • Configure hindsight-api with HINDSIGHT_API_EMBEDDINGS_PROVIDER=openai and a valid HINDSIGHT_API_EMBEDDINGS_OPENAI_API_KEY

  • Start the daemon (no local embedding model needed or configured)

  • Check mapped files: grep -cE "torch|bge" /proc/<PID>/maps

  • Observe PyTorch and BGE file mappings present despite no local model being used

Environment

  • Hindsight version: v0.6.5

  • Operating system: Ubuntu 24.04.4 LTS, Linux 6.8.0-111-generic x86_64

  • Install method: uvx hindsight-api@latest via uv, running as background daemon process

  • Model: N/A (embedding-provider configuration issue, not LLM)

  • Provider / routing chain: HINDSIGHT_API_EMBEDDINGS_PROVIDER=openai, HINDSIGHT_API_EMBEDDINGS_OPENAI_MODEL=text-embedding-3-small. No local sentence-transformers model configured.

Additional context
Discovered while migrating from local BGE-small embeddings to OpenAI external embeddings to reduce CPU load and RSS. The migration successfully eliminates embedding inference CPU saturation (the primary goal), but the memory savings were significantly smaller than expected due to PyTorch loading unconditionally.

Likely cause: import torch or from sentence_transformers import ... runs at module top level in embeddings.py or in one of its dependencies, rather than inside the LocalEmbeddings class initializer. A lazy import pattern (importing only when HINDSIGHT_API_EMBEDDINGS_PROVIDER=local) would fix this.
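A minimal sketch of the lazy import pattern suggested above. I have not read the hindsight-api source, so the class names, the factory function `create_embeddings`, and the `backend` parameter are all hypothetical; the only point is that the heavy import happens inside the local provider's initializer instead of at module import time.

```python
import importlib


class OpenAIEmbeddings:
    """Hypothetical external provider: needs no local ML runtime."""

    def __init__(self, model: str) -> None:
        self.model = model


class LocalEmbeddings:
    """Hypothetical local provider: pays the import cost only when built."""

    def __init__(self, model_name: str, backend: str = "sentence_transformers") -> None:
        # Deferred import: nothing heavy is loaded until a local
        # provider is actually instantiated.
        self._backend = importlib.import_module(backend)
        self.model_name = model_name


def create_embeddings(provider: str, model: str):
    """Pick a provider by name (assumed to come from the provider env var)."""
    if provider == "openai":
        return OpenAIEmbeddings(model)
    if provider == "local":
        return LocalEmbeddings(model)
    raise ValueError(f"unknown embeddings provider: {provider!r}")
```

With this shape, `create_embeddings("openai", "text-embedding-3-small")` never touches `sentence_transformers` or `torch`, so neither should show up in the process's mapped files.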

Expected Behavior

When an external embedding provider is configured, PyTorch and local model weights should not be imported or loaded. RSS baseline should reflect only the API server, database client, and LLM client — expected ~200–400 MB, not ~1.15 GB.
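The expected behavior could be guarded by a regression check along these lines. This is only a sketch: the helper name `heavy_modules_loaded_by` is made up, and the real import name of the hindsight-api package is an assumption you would need to fill in. It imports a module in a fresh interpreter and reports which of the listed heavy modules ended up in `sys.modules`.

```python
import json
import subprocess
import sys


def heavy_modules_loaded_by(module_name, heavy=("torch", "sentence_transformers")):
    """Import module_name in a clean subprocess and return the heavy
    modules that were loaded as a side effect."""
    code = (
        "import importlib, json, sys\n"
        f"importlib.import_module({module_name!r})\n"
        f"print(json.dumps([m for m in {list(heavy)!r} if m in sys.modules]))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        check=True,
        capture_output=True,
        text=True,
    )
    # Take the last line in case the imported module prints on import.
    return json.loads(result.stdout.strip().splitlines()[-1])
```

A test would then assert that the returned list is empty for the package's entry module when the provider is set to openai. Running the daemon's entry point under `python -X importtime` is another way to see which import pulls PyTorch in.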

Actual Behavior

64 PyTorch/BGE file mappings are present in /proc/<PID>/maps on a daemon configured exclusively for OpenAI embeddings. RSS sits at ~1.15 GB at idle (VmRSS: 1178492 kB), consistent with the PyTorch runtime being fully loaded despite serving zero inference requests.

```
# On daemon with HINDSIGHT_API_EMBEDDINGS_PROVIDER=openai
grep -cE "torch|bge" /proc/249656/maps
# Output: 64
```

```
# RSS
grep VmRSS /proc/249656/status
# VmRSS: 1178492 kB (~1.15 GB)
```

For comparison, a BGE-local instance on the same host:
```
grep VmRSS /proc/142163/status
# VmRSS: 2166572 kB (~2.1 GB)
```

Switching to external embeddings saves ~950 MB — but the expected saving was ~1.5–1.8 GB (PyTorch runtime + model weights). PyTorch appears to be imported unconditionally at module load time rather than lazily when a local provider is actually selected.
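For anyone wanting to repeat the measurements above without hand-running grep, here is a small sketch that does the same two checks on the text of /proc/<PID>/maps and /proc/<PID>/status. The function names are my own, and the pattern simply mirrors the grep expression used in this report.

```python
import re


def count_mappings(maps_text: str, pattern: str = r"torch|bge") -> int:
    """Count lines of a /proc/<PID>/maps dump matching the pattern
    (equivalent to grep -cE)."""
    regex = re.compile(pattern)
    return sum(1 for line in maps_text.splitlines() if regex.search(line))


def parse_vmrss_kb(status_text: str) -> int:
    """Extract the VmRSS value in kB from a /proc/<PID>/status dump."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def rss_saving_kb(local_status: str, external_status: str) -> int:
    """Difference in resident memory between two processes, in kB."""
    return parse_vmrss_kb(local_status) - parse_vmrss_kb(external_status)
```

Fed the two VmRSS lines quoted above, `rss_saving_kb` gives 2166572 - 1178492 = 988080 kB, which is the roughly 950 MB saving reported here. On Linux the inputs can be read with `open(f"/proc/{pid}/maps").read()` and `open(f"/proc/{pid}/status").read()`.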

Version

Hindsight version: v0.6.5, Operating system: Ubuntu 24.04.4 LTS, Linux 6.8.0-111-generic x86_64

LLM Provider

OpenAI
