Skip to content

feat(files): stream uploads to disk instead of buffering#9527

Open
mscolnick wants to merge 1 commit into
mainfrom
ms/feature/files-create-streaming-upload
Open

feat(files): stream uploads to disk instead of buffering#9527
mscolnick wants to merge 1 commit into
mainfrom
ms/feature/files-create-streaming-upload

Conversation

@mscolnick
Copy link
Copy Markdown
Contributor

Previously /api/files/create read the entire UploadFile into memory
before writing to disk, peaking at ~100 MB for a 100 MB upload. Now we
drain the upload in 1 MiB chunks straight to a .part temp file and
atomically rename on success, so peak memory stays bounded regardless
of file size and a failed upload never leaves a half-written file at
the final path.

  • OSFileSystem.stream_create_file does the chunked write with atomic
    rename and cleanup on failure
  • parse_multipart_request now hands callers un-read UploadFile
    handles (instead of pre-read bytes), so streaming is possible without
    giving up the small-payload .read() path
  • Directories and the default-template notebook still go through the
    in-memory create_file_or_directory path; only real file content
    streams

Copilot AI review requested due to automatic review settings May 12, 2026 20:03
@vercel
Copy link
Copy Markdown

vercel Bot commented May 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
marimo-docs Ready Ready Preview, Comment May 12, 2026 8:38pm

Request Review

@github-actions github-actions Bot added the bash-focus Area to focus on during release bug bash label May 12, 2026
@mscolnick mscolnick requested a review from kirangadhave May 12, 2026 20:06
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 8 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="marimo/_server/files/os_file_system.py">

<violation number="1" location="marimo/_server/files/os_file_system.py:243">
P1: This re-loads the whole uploaded file into memory after streaming. Return metadata directly instead of calling `get_details()` here.</violation>
</file>
Architecture diagram
sequenceDiagram
    participant Client
    participant API as API: create_file_or_directory
    participant Utils as Utils: parse_multipart_request
    participant Starlette as Starlette: UploadFile
    participant FS as FS: OSFileSystem
    participant Disk as Local Disk

    Client->>API: POST /api/files/create (Multipart)
    API->>Utils: CHANGED: parse_multipart_request()
    Utils->>Starlette: Access request form data
    Note over Utils,Starlette: CHANGED: No longer calls .read() <br/>to buffer full content into memory
    Utils-->>API: Return MultipartRequest (contains UploadFile handles)

    alt NEW: Streaming Path (Type is "file" or "notebook")
        API->>FS: NEW: stream_create_file(path, name, upload_file)
        FS->>FS: _validate_create_name()
        FS->>Disk: Create unique temporary file (path/name.part)
        
        loop Every 1 MiB Chunk
            FS->>Starlette: CHANGED: read(1024 * 1024)
            Starlette-->>FS: bytes
            FS->>Disk: write chunk to .part file
        end

        alt Success
            FS->>Disk: NEW: Atomic rename (.part to final name)
            FS->>FS: get_details(final_path)
            FS-->>API: Return FileInfo
        else Failure (e.g. Disconnect/IO Error)
            FS->>Disk: NEW: unlink/cleanup .part file
            FS-->>API: Raise Exception
        end

    else Traditional Path (Directory or Template)
        API->>FS: create_file_or_directory(..., contents=None)
        FS->>Disk: mkdir or write default template
        FS-->>API: Return FileInfo
    end

    API-->>Client: 200 OK (FileCreateResponse)
Loading

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread marimo/_server/files/os_file_system.py Outdated
@mscolnick mscolnick force-pushed the ms/feature/files-create-streaming-upload branch from 977f8ec to 2612cf9 Compare May 12, 2026 20:06
@mscolnick mscolnick added the enhancement New feature or request label May 12, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves file upload handling for POST /api/files/create by enabling streaming writes to disk (instead of buffering entire uploads in memory), and updates multipart parsing to return un-read UploadFile handles so callers can stream.

Changes:

  • Added OSFileSystem.stream_create_file() to drain an async byte source into a .part temp file and atomically rename on success.
  • Updated parse_multipart_request() to return raw UploadFile objects (not pre-read bytes) and adjusted the /api/files/create endpoint to stream uploads.
  • Updated tests and OpenAPI docs to reflect the new multipart/file-upload behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/_server/files/test_os_file_system.py Adds async tests covering chunked streaming, unique names, traversal rejection, and cleanup on failure.
tests/_server/api/test_api_utils.py Updates multipart parsing test to expect UploadFile handles and reads bytes in-endpoint.
marimo/_server/files/os_file_system.py Introduces streaming upload write path + shared name validation.
marimo/_server/api/utils.py Changes multipart parsing to return unconsumed UploadFile objects for streaming.
marimo/_server/api/endpoints/file_explorer.py Switches /create to stream uploads when provided.
marimo/_server/models/files.py Clarifies FileCreateMultipartRequest is schema-only and explains runtime behavior.
packages/openapi/api.yaml Regenerates/adjusts OpenAPI descriptions (including multipart schema clarification).
packages/openapi/src/api.ts Regenerates/adjusts TS OpenAPI typings/doc comments for multipart clarification/formatting.

Comment thread marimo/_server/files/os_file_system.py
Comment thread marimo/_server/api/endpoints/file_explorer.py Outdated
Comment thread marimo/_server/api/utils.py Outdated
Previously /api/files/create read the entire UploadFile into memory
before writing to disk, peaking at ~100 MB for a 100 MB upload. Now we
drain the upload in 1 MiB chunks straight to a `.part` temp file and
atomically rename on success, so peak memory stays bounded regardless
of file size and a failed upload never leaves a half-written file at
the final path.

- `OSFileSystem.stream_create_file` does the chunked write with atomic
  rename and cleanup on failure
- `parse_multipart_request` now hands callers un-read `UploadFile`
  handles (instead of pre-read bytes), so streaming is possible without
  giving up the small-payload `.read()` path
- Directories and the default-template notebook still go through the
  in-memory `create_file_or_directory` path; only real file content
  streams
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bash-focus Area to focus on during release bug bash enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants