[Bug]: Download of a large number of data objects fails #1783

@fashxp

Description

Studio API version

2025.4

Steps to reproduce

Open the data object grid of the cars folder in the demo and download all cars as CSV.

Try to download the result.

Actual Behavior

For me, it tries to download a JSON file (even though it should be a CSV) and fails.

Expected Behavior

The download should succeed.

If I read the code in DownloadService correctly, we create a streamed response and then clean up the data before the response has actually been sent.

I stumbled upon this during Playwright frontend testing. Here is Claude's analysis as well:

 Why the data-object CSV download gets canceled                                                          
  
  The code path is identical for both asset and data-object CSV exports: the same controller (/api/export/download/csv/{jobRunId} at studio-backend-bundle/src/Export/Controller/Csv/DownloadController.php:70) and the same service method (DownloadService::downloadResourceByJobRunId). So the bug isn't in "which code runs"; it's a race inside that shared code.

  The race:
  // src/Export/Service/DownloadService.php:47-80
  $streamedResponse = $this->getFileStreamedResponse(...);  // opens the stream, body is sent lazily
  $storage->delete($filePath);                               // ← deletes file NOW
  $this->storageService->cleanUpFolder($folderName);         // ← deletes folder NOW
  $this->executionEngineService->hideJobRun($jobRunId);      // hides the job run
  return $streamedResponse;                                  // only NOW does the body stream
                                                                                                          
  getFileStreamedResponse (in StreamedResponseTrait.php:103) calls $storage->readStream($path) and returns a StreamedResponse(fn() => fpassthru($stream), …). The fpassthru only runs when Symfony calls ->send() on the response, which happens after the controller has returned, i.e. after delete and cleanUpFolder have already run.
                                                                                                          
  On a local Flysystem filesystem this usually works (POSIX semantics keep the inode alive while a handle is open), but Flysystem does not guarantee that across adapters and stream types. If readStream returns a wrapper that re-reads from storage lazily, or if cleanUpFolder removes data the stream still depends on, the transfer gets truncated or aborted mid-response. Chromium reports that as download.failure() = 'canceled'.
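A minimal plain-PHP model of the current ordering, assuming a local filesystem on Linux or macOS. makeLazyBody() is a hypothetical stand-in for getFileStreamedResponse(), and no Flysystem or Symfony code is involved: the read handle is opened first, the file is deleted next, and the body callback (what StreamedResponse::send() would invoke) runs last.

```php
<?php
// Plain-PHP model of the current ordering (hypothetical names, no framework):
// 1. open the handle and build the body callback, 2. delete the file,
// 3. only then run the callback, as send() would after the controller returns.

function makeLazyBody(string $path): Closure
{
    $stream = fopen($path, 'rb');   // handle opened eagerly, like readStream()
    return static function () use ($stream): void {
        fpassthru($stream);         // body is only written when invoked
        fclose($stream);
    };
}

$path = tempnam(sys_get_temp_dir(), 'export');
file_put_contents($path, "id,name\n1,car\n");

$body = makeLazyBody($path);        // corresponds to getFileStreamedResponse(...)
unlink($path);                      // corresponds to $storage->delete($filePath)

ob_start();
$body();                            // corresponds to ->send() after the controller returned
$sent = ob_get_clean();
```

On a POSIX local disk, $sent still contains the full CSV because the open handle keeps the unlinked inode alive, which fits why the race mostly stays hidden locally; other storage adapters give no such guarantee.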

  Why the asset side works but the data-object side fails (my best theory): same bug, different file sizes. Asset CSVs in the test are small (~10 rows, default columns), so the body finishes streaming before delete/cleanUpFolder has any effect. The data-object test (test 6 in 040) deliberately adds every available leaf column (easily dozens) and then exports 10 rows × many wide cells, so the stream takes long enough for the cleanup to race ahead of the read.

  Why it works manually for you: same reason. A real browser on the host drains the response in one continuous pipe and usually beats the cleanup, while the Playwright browser (same binary, but possibly with different network buffering and a busier event loop while Playwright probes state) is just slightly slower, which is enough to lose the race.

  Proper fix (in the backend, not the test): change the order in DownloadService::downloadResourceByJobRunId so that the body is fully transmitted (or read into memory) before $storage->delete, cleanUpFolder, and hideJobRun run. In Symfony you can do that by running the cleanup at the end of the StreamedResponse callback, right after fpassthru, or in a kernel.terminate listener, so it runs after the body has been sent.
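A minimal sketch of that reordering in plain PHP, assuming the stream has already been opened via readStream(). buildDownloadCallback() and the $cleanup closure are hypothetical names, not the bundle's API; in the real service the returned closure would be passed to Symfony's StreamedResponse, and $cleanup would wrap the delete, cleanUpFolder and hideJobRun calls.

```php
<?php
// Hypothetical helper (not part of the bundle): the body is written first and
// the cleanup runs only after fpassthru() has returned. In a service it would
// be used roughly as:
//   return new StreamedResponse(buildDownloadCallback($stream, $cleanup), 200, $headers);

/**
 * @param resource $stream  an already opened, readable stream
 * @param callable $cleanup e.g. delete file, clean up folder, hide job run
 */
function buildDownloadCallback($stream, callable $cleanup): Closure
{
    return static function () use ($stream, $cleanup): void {
        fpassthru($stream); // 1. write the complete body to the output
        fclose($stream);    // 2. release the read handle
        $cleanup();         // 3. only now remove the exported data
    };
}

// Usage: the cleanup observes that the full body was already written.
$path = tempnam(sys_get_temp_dir(), 'export');
file_put_contents($path, "id,name\n1,car\n");

$writtenBeforeCleanup = null;
$callback = buildDownloadCallback(
    fopen($path, 'rb'),
    static function () use ($path, &$writtenBeforeCleanup): void {
        $writtenBeforeCleanup = ob_get_contents(); // output written so far
        unlink($path);
    }
);

ob_start();
$callback();
$sent = ob_get_clean();
```

One caveat worth checking: if the client disconnects mid-transfer, PHP may abort the script before $cleanup() runs unless ignore_user_abort(true) is set, so the exported files could be left behind in that case.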

  I left the test's .csv assertion intact so the race stays visible as a failing test.                    
  

Also check for other places with a similar architecture!
