feat(files): stream uploads to disk instead of buffering#9527
Open

mscolnick wants to merge 1 commit into

Conversation
Contributor
1 issue found across 8 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="marimo/_server/files/os_file_system.py">
<violation number="1" location="marimo/_server/files/os_file_system.py:243">
P1: This re-loads the whole uploaded file into memory after streaming. Return metadata directly instead of calling `get_details()` here.</violation>
</file>
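The fix the reviewer suggests can be sketched by deriving the metadata from a single `stat()` call on the freshly renamed path, rather than re-reading the file's contents. The `FileInfo` dataclass below is a simplified stand-in for marimo's model, and `details_from_stat` is a hypothetical helper, not the project's actual API:

```python
import os
from dataclasses import dataclass


# Hypothetical, simplified stand-in for marimo's FileInfo model.
@dataclass
class FileInfo:
    path: str
    name: str
    size: int


def details_from_stat(path: str) -> FileInfo:
    """Build file metadata from a single stat() call, without
    re-loading the file's contents into memory."""
    st = os.stat(path)
    return FileInfo(path=path, name=os.path.basename(path), size=st.st_size)
```

After the atomic rename, the streamed file is already on disk, so everything the response needs is available from the filesystem metadata alone.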
Architecture diagram
```mermaid
sequenceDiagram
    participant Client
    participant API as API: create_file_or_directory
    participant Utils as Utils: parse_multipart_request
    participant Starlette as Starlette: UploadFile
    participant FS as FS: OSFileSystem
    participant Disk as Local Disk
    Client->>API: POST /api/files/create (Multipart)
    API->>Utils: CHANGED: parse_multipart_request()
    Utils->>Starlette: Access request form data
    Note over Utils,Starlette: CHANGED: No longer calls .read()<br/>to buffer full content into memory
    Utils-->>API: Return MultipartRequest (contains UploadFile handles)
    alt NEW: Streaming Path (Type is "file" or "notebook")
        API->>FS: NEW: stream_create_file(path, name, upload_file)
        FS->>FS: _validate_create_name()
        FS->>Disk: Create unique temporary file (path/name.part)
        loop Every 1 MiB Chunk
            FS->>Starlette: CHANGED: read(1024 * 1024)
            Starlette-->>FS: bytes
            FS->>Disk: write chunk to .part file
        end
        alt Success
            FS->>Disk: NEW: Atomic rename (.part to final name)
            FS->>FS: get_details(final_path)
            FS-->>API: Return FileInfo
        else Failure (e.g. Disconnect/IO Error)
            FS->>Disk: NEW: unlink/cleanup .part file
            FS-->>API: Raise Exception
        end
    else Traditional Path (Directory or Template)
        API->>FS: create_file_or_directory(..., contents=None)
        FS->>Disk: mkdir or write default template
        FS-->>API: Return FileInfo
    end
    API-->>Client: 200 OK (FileCreateResponse)
```
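The streaming path in the diagram can be sketched in a few lines. This is a hedged, self-contained approximation, not marimo's actual `OSFileSystem.stream_create_file`: it assumes the source only exposes an async `read(size)` method (as Starlette's `UploadFile` does) and omits the unique-name and validation steps shown above.

```python
import os

CHUNK = 1024 * 1024  # 1 MiB, matching the chunk size in the diagram


async def stream_create_file(directory: str, name: str, source) -> str:
    """Drain an async byte source into a .part temp file, then
    atomically rename it to the final name on success.

    `source` only needs an async `read(size)` method that returns
    b"" when exhausted.
    """
    final_path = os.path.join(directory, name)
    part_path = final_path + ".part"
    try:
        with open(part_path, "wb") as f:
            while True:
                chunk = await source.read(CHUNK)
                if not chunk:
                    break
                f.write(chunk)
        os.replace(part_path, final_path)  # atomic rename on POSIX
    except BaseException:
        # Never leave a half-written .part file behind on failure.
        try:
            os.unlink(part_path)
        except FileNotFoundError:
            pass
        raise
    return final_path
```

Because the rename is the last step, a client disconnect or I/O error mid-upload only ever touches the `.part` file, which is unlinked before the exception propagates.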
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Force-pushed from 977f8ec to 2612cf9
Contributor
Pull request overview
This PR improves file upload handling for POST /api/files/create by enabling streaming writes to disk (instead of buffering entire uploads in memory), and updates multipart parsing to return un-read UploadFile handles so callers can stream.
Changes:
- Added `OSFileSystem.stream_create_file()` to drain an async byte source into a `.part` temp file and atomically rename on success.
- Updated `parse_multipart_request()` to return raw `UploadFile` objects (not pre-read bytes) and adjusted the `/api/files/create` endpoint to stream uploads.
- Updated tests and OpenAPI docs to reflect the new multipart/file-upload behavior.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/_server/files/test_os_file_system.py | Adds async tests covering chunked streaming, unique names, traversal rejection, and cleanup on failure. |
| tests/_server/api/test_api_utils.py | Updates multipart parsing test to expect UploadFile handles and reads bytes in-endpoint. |
| marimo/_server/files/os_file_system.py | Introduces streaming upload write path + shared name validation. |
| marimo/_server/api/utils.py | Changes multipart parsing to return unconsumed UploadFile objects for streaming. |
| marimo/_server/api/endpoints/file_explorer.py | Switches /create to stream uploads when provided. |
| marimo/_server/models/files.py | Clarifies FileCreateMultipartRequest is schema-only and explains runtime behavior. |
| packages/openapi/api.yaml | Regenerates/adjusts OpenAPI descriptions (including multipart schema clarification). |
| packages/openapi/src/api.ts | Regenerates/adjusts TS OpenAPI typings/doc comments for multipart clarification/formatting. |
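The traversal-rejection behavior exercised by the tests can be illustrated with a minimal validator. This is a guess at the kind of checks `_validate_create_name` performs, not its actual implementation:

```python
import os


def validate_create_name(name: str) -> None:
    """Reject file names that could escape the target directory
    (a sketch of what a create-name validator might check)."""
    if not name or name in (".", ".."):
        raise ValueError(f"Invalid file name: {name!r}")
    # Reject both POSIX and Windows path separators.
    if os.path.sep in name or (os.path.altsep and os.path.altsep in name):
        raise ValueError(f"File name must not contain path separators: {name!r}")
    # Defense in depth: the name must be its own basename.
    if name != os.path.basename(name):
        raise ValueError(f"Path traversal rejected: {name!r}")
```

Validating the bare name before joining it onto the target directory means the `.part` file and the final rename both stay inside the directory the endpoint was asked to write to.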
Force-pushed from 2612cf9 to a3ee101
Previously `/api/files/create` read the entire `UploadFile` into memory before writing to disk, peaking at ~100 MB for a 100 MB upload. Now we drain the upload in 1 MiB chunks straight to a `.part` temp file and atomically rename on success, so peak memory stays bounded regardless of file size, and a failed upload never leaves a half-written file at the final path.

- `OSFileSystem.stream_create_file` does the chunked write with atomic rename and cleanup on failure
- `parse_multipart_request` now hands callers un-read `UploadFile` handles (instead of pre-read bytes), so streaming is possible without giving up the small-payload `.read()` path
- Directories and the default-template notebook still go through the in-memory `create_file_or_directory` path; only real file content streams
Force-pushed from a3ee101 to 54ba3f1