
fix: Resolve socket hang up errors for large file execution #66

Merged
usnavy13 merged 1 commit into dev from fix/large-file-socket-hangup
Mar 5, 2026

@usnavy13 usnavy13 commented Mar 5, 2026

Summary

  • Fixes "socket hang up" errors when executing code with large files (>40MB) from LibreChat
  • Root cause: Node.js 20 sets a default 5-second socket timeout on HTTP connections; when server operations (cold sandbox starts, large file mounting) run longer than this, the client destroys the socket before the server responds
  • Three complementary server-side fixes applied without any client code changes

Changes

Streaming keepalive on /exec (src/api/exec.py)

  • Converts the endpoint to use StreamingResponse that sends a space character every 3 seconds while execution is running
  • JSON parsers ignore leading whitespace, so the padding is transparent to clients that parse the response body as JSON
  • Properly re-raises ValidationError/ServiceUnavailableError so FastAPI exception handlers still return correct HTTP status codes
  • Preserves OpenAPI schema via responses parameter
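
For illustration, the keepalive pattern described above can be sketched with a plain async generator. This is a minimal sketch under stated assumptions, not the code in src/api/exec.py: `keepalive_json_stream`, `slow_job`, and the shortened demo intervals are hypothetical. In a FastAPI endpoint, a generator like this would be passed to StreamingResponse.

```python
import asyncio
import json
from typing import AsyncIterator, Awaitable


async def keepalive_json_stream(
    work: Awaitable[dict], interval: float = 3.0
) -> AsyncIterator[bytes]:
    """Yield a space every `interval` seconds until `work` finishes, then the JSON body."""
    task = asyncio.ensure_future(work)
    try:
        while True:
            # Wait for the work to finish, but give up after `interval` seconds.
            done, _ = await asyncio.wait({task}, timeout=interval)
            if done:
                break
            yield b" "  # whitespace before a JSON value is legal padding
        # task.result() re-raises any exception raised by the work itself.
        yield json.dumps(task.result()).encode()
    finally:
        if not task.done():
            task.cancel()  # client went away; stop the background work


async def _demo() -> bytes:
    async def slow_job() -> dict:  # hypothetical stand-in for a code execution
        await asyncio.sleep(0.05)
        return {"stdout": "ok"}

    chunks = [c async for c in keepalive_json_stream(slow_job(), interval=0.01)]
    return b"".join(chunks)


body = asyncio.run(_demo())
print(json.loads(body))
```

The body consists of a few spaces followed by the JSON document, so `json.loads` accepts it unchanged. One consequence of this design is that the response headers are sent before the work completes, so errors that should change the HTTP status code need to surface before the first byte is streamed.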

Non-blocking file I/O (src/services/file.py, src/services/execution/runner.py)

  • Fixed get_file_content(): response.read() was running synchronously on the asyncio event loop, blocking all concurrent HTTP connections during large file downloads from MinIO
  • Added stream_file_to_path() using MinIO's fget_object to write objects straight to a file on disk without loading them into memory
  • Updated _mount_files_to_sandbox() to use streaming, with non-blocking os.chown/os.chmod
  • Added stream_file_to_path to FileServiceInterface
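
The blocking-read fix above can be sketched roughly as follows. This is a minimal sketch, assuming the default thread pool executor is used; `FakeResponse`, `read_without_blocking`, and the timings are hypothetical stand-ins and do not reproduce the code in src/services/file.py or the MinIO client.

```python
import asyncio
import io
import time
from typing import Tuple


class FakeResponse:
    """Hypothetical stand-in for a blocking object-storage response."""

    def __init__(self, payload: bytes, delay: float) -> None:
        self._buf = io.BytesIO(payload)
        self._delay = delay

    def read(self) -> bytes:
        time.sleep(self._delay)  # simulates a slow, blocking network read
        return self._buf.read()


async def read_without_blocking(response: FakeResponse) -> bytes:
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor, so the
    # blocking read runs in a worker thread instead of on the event loop.
    return await loop.run_in_executor(None, response.read)


async def _demo() -> Tuple[bytes, int]:
    ticks = 0

    async def ticker() -> None:
        # Stands in for other requests that must keep being served.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    background = asyncio.create_task(ticker())
    data = await read_without_blocking(FakeResponse(b"a,b\n1,2\n", delay=0.1))
    background.cancel()
    return data, ticks


data, ticks = asyncio.run(_demo())
print(len(data), "bytes read;", ticks, "ticks ran during the read")
```

If `response.read()` were called directly inside the coroutine, the ticker would not advance until the read returned, which is the starvation the concurrent-ping test is meant to catch.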

Configuration (.env)

  • Increased SANDBOX_POOL_PY from 2 to 5 to reduce cold-start frequency

Test plan

  • New functional test: tests/functional/test_concurrent_file_exec.py uploads a 50MB CSV, fires 5 concurrent file execs + 3 simple pings, and asserts that the pings are not blocked
  • Verified fix from LibreChat container using node-fetch (exact CodeExecutor flow)
  • Verified existing functional tests pass
  • Lint checks pass (black, flake8)

🤖 Generated with Claude Code

Node.js 20 sets a default 5-second socket timeout on HTTP connections.
When code execution takes longer (cold sandbox starts, large file
mounting, heavy pandas operations), the client destroys the socket
before the server responds, causing "socket hang up" errors.

Three fixes applied:

1. Streaming keepalive on /exec endpoint: sends whitespace every 3s
   to keep the TCP connection alive during long operations. JSON
   parsers ignore leading whitespace so this is fully transparent.

2. Non-blocking file I/O: moved MinIO response.read() into the thread
   pool executor (was blocking the asyncio event loop), and added
   stream_file_to_path() using fget_object for direct disk-to-disk
   transfer without loading files into memory.

3. Increased default sandbox pool size (SANDBOX_POOL_PY=5) to reduce
   cold-start frequency under concurrent load.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@usnavy13 usnavy13 merged commit 2529fe6 into dev Mar 5, 2026
2 checks passed
djuillard pushed a commit to On-Behalf-AI/LibreCodeInterpreter that referenced this pull request Apr 21, 2026
…ngup

fix: Resolve socket hang up errors for large file execution
@usnavy13 usnavy13 deleted the fix/large-file-socket-hangup branch May 7, 2026 02:20