
Write on-disk read-only mirror of virtual datasets #9512

Open
fm3 wants to merge 16 commits into master from mirror-virtual

Conversation

@fm3 (Member) commented Apr 23, 2026

Steps to test:

  • Enable the feature in the config (datastore.writeVirtualDatasetsMirror=true); see the sketch after this list
  • Upload a dataset (this creates a virtual dataset)
  • A mirror should be created on disk
  • As superuser, send POST /api/datasets/writeAllMirrors; this should create mirrors for all virtual datasets
  • The readme in the mirror directory should adequately describe the situation
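
A minimal sketch of the setup, assuming the flag nests under the existing datastore block of conf/application.conf (the exact nesting and the request invocation are assumptions):

# conf/application.conf
datastore {
  writeVirtualDatasetsMirror = true
}

# as superuser, trigger mirrors for all existing virtual datasets
# (authentication omitted; <webknossos-host> is a placeholder)
curl -X POST "https://<webknossos-host>/api/datasets/writeAllMirrors"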

TODOs:

  • Make this opt-in
  • Trigger this similarly to the realpath scan, see Scan realpaths for virtual datasets #9518
    • For existing datasets: use a ticker? Or a superuser-only route?
  • Handle lacking write access
  • Add a readme note that this is read-only

Issues:


  • Added changelog entry (create a $PR_NUMBER.md file in unreleased_changes or use ./tools/create-changelog-entry.py)
  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases
  • Needs datastore update after deployment

@fm3 fm3 self-assigned this Apr 23, 2026
@coderabbitai bot commented Apr 23, 2026

Warning

Rate limit exceeded

@fm3 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 16 minutes and 24 seconds before requesting another review.

To continue reviewing without waiting, purchase usage credits in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a01e29ea-fdea-4b08-8602-a5c86782ad43

📥 Commits

Reviewing files that changed from the base of the PR and between 89dac10 and 290f7ca.

📒 Files selected for processing (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceMirrorService.scala
📝 Walkthrough

This PR introduces functionality to create on-disk mirrors of virtual, database-backed datasets. A new mirror-writing service is added to the datastore, integrated with dataset update and upload workflows, and exposed via new controller endpoints. Configuration flags control the feature across both applications.

Changes

Virtual Dataset Mirroring

Each entry lists the layer, the affected file(s), and a summary of the change:

  • Configuration & Infrastructure
    Files: conf/application.conf, webknossos-datastore/conf/standalone-datastore.conf, webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala
    Summary: New writeVirtualDatasetsMirror boolean flag added to the datastore configuration in both the main application and the standalone datastore configs.
  • Core Mirror Service
    Files: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceMirrorService.scala
    Summary: New service creates and maintains on-disk filesystem mirrors for data sources. It writes per-layer subdirectories, symbolic links for mag (magnification) data and attachments, a relativized properties JSON, and readme metadata (a minimal sketch follows this list).
  • Datastore Endpoint & Routing
    Files: webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala, webknossos-datastore/conf/datastore.latest.routes
    Summary: DataSourceController gains a writeMirror(failOnError: Boolean) action that accepts a sequence of dataset IDs. New POST /datasets/writeMirror route wired to the controller.
  • Client-side Integration
    Files: app/models/dataset/WKRemoteDataStoreClient.scala, app/models/dataset/DatasetService.scala
    Summary: WKRemoteDataStoreClient adds a writeMirror() method to invoke the remote datastore endpoint. DatasetService adds writeMirrorForVirtual() to conditionally trigger mirroring for virtual and usable datasets.
  • Main Controller Wiring
    Files: app/controllers/DatasetController.scala, app/controllers/WKRemoteDataStoreController.scala
    Summary: DatasetController constructor extended with an rpc dependency. Mirror-writing calls integrated into the updatePartial(), finishMagUploadToPath(), and finishUploadToPaths() workflows. WKRemoteDataStoreController adds mirror writing to the non-conversion dataset upload path.
  • Public Endpoints & Routing
    Files: conf/webknossos.latest.routes
    Summary: Two new POST routes added: /datasets/:datasetId/writeMirror and /datasets/writeAllMirrors, mapped to DatasetController actions.
  • Documentation
    Files: unreleased_changes/9512.md
    Summary: Release notes document the new datastore.writeVirtualDatasetsMirror configuration option.
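
To make the mirror layout concrete, here is a minimal, hypothetical sketch of such a writer. All names and the directory layout are illustrative assumptions, not the actual DataSourceMirrorService, which additionally relativizes paths inside the properties JSON, handles attachments, and writes into a temporary directory first:

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical, simplified mirror writer: one subdirectory per layer, a symlink per mag
// pointing back at the original data, a properties JSON, and a readme that marks the
// directory as an auto-generated read-only mirror.
object DataSourceMirrorSketch {
  def writeMirror(mirrorDir: Path, layers: Map[String, Seq[Path]], propertiesJson: String): Unit = {
    Files.createDirectories(mirrorDir)
    layers.foreach { case (layerName, magPaths) =>
      val layerDir = Files.createDirectories(mirrorDir.resolve(layerName))
      magPaths.foreach { magPath =>
        val link = layerDir.resolve(magPath.getFileName)
        Files.deleteIfExists(link)
        Files.createSymbolicLink(link, magPath)
      }
    }
    Files.write(mirrorDir.resolve("datasource-properties.json"), propertiesJson.getBytes(StandardCharsets.UTF_8))
    Files.write(mirrorDir.resolve("README.txt"),
      "Auto-generated read-only mirror of a virtual dataset. Do not edit.".getBytes(StandardCharsets.UTF_8))
  }
}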

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • scalableminds/webknossos#9518: Modifies the same dataset update/upload flows in DatasetController to add post-update operations, creating a potential conflict or sequencing dependency.

Suggested labels

enhancement

Suggested reviewers

  • MichaelBuessemeyer

Poem

🐰 A rabbit hops through disk and tree,
Mirroring datasets wild and free,
With symlinks soft and paths made right,
Virtual worlds now shine so bright!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description check ✅ Passed: The description provides clear testing steps, documents completed TODOs, references linked issues, and includes deployment notes related to the changeset.
  • Linked Issues check ✅ Passed: The PR implements the core requirement from #9449: creating on-disk mirrors of virtual datasets with automatic triggering and explicit endpoints for management.
  • Out of Scope Changes check ✅ Passed: All changes are directly related to implementing virtual dataset mirroring: configuration options, controllers, services, routes, and documentation are all in scope.
  • Docstring Coverage ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping the docstring coverage check.
  • Title check ✅ Passed: The title accurately describes the main change: introducing a feature to write on-disk mirrors for virtual datasets, which is the primary objective reflected across all modified files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch mirror-virtual

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

@fm3 fm3 marked this pull request as ready for review May 6, 2026 13:14
@fm3 fm3 changed the title from "WIP Write on-disk read-only mirror of virtual datasets" to "Write on-disk read-only mirror of virtual datasets" May 6, 2026
@coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/controllers/DatasetController.scala`:
- Around line 484-485: The call to datasetService.writeMirrorForVirtual inside
the request flow (followed by
analyticsService.track(ChangeDatasetSettingsEvent(...))) must not make the whole
request fail if mirroring errors; change these paths (the writeMirrorForVirtual
invocations used after the primary commit) to run mirror generation
asynchronously or to guard them with error handling so failures are logged and the request continues.
Specifically, wrap or dispatch
datasetService.writeMirrorForVirtual(updated)(GlobalAccessContext) so any thrown
exceptions are caught or converted to a non-fatal Future recovery and logged
(use the existing logger/analytics for context) instead of propagating, allowing
the primary mutation and analyticsService.track(ChangeDatasetSettingsEvent(...))
to complete; apply the same change to the other occurrences of
writeMirrorForVirtual in this file.
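
As an illustration of the suggested log-and-continue shape (plain Futures stand in for the project's Fox type here; the helper name is made up):

import scala.concurrent.{ExecutionContext, Future}

// Run the mirror write, but never let its failure abort the surrounding request flow:
// errors are logged and swallowed so the primary mutation and analytics tracking proceed.
def writeMirrorNonFatal(write: => Future[Unit], datasetId: String)(implicit ec: ExecutionContext): Future[Unit] =
  Future(write).flatten.recover { case e =>
    println(s"Mirror write for dataset $datasetId failed (non-fatal): ${e.getMessage}")
  }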

In `@app/models/dataset/DatasetService.scala`:
- Around line 705-711: The helper writeMirrorForVirtual currently calls
client.writeMirror(Seq(dataset._id), failOnError = true) which causes
mirror-write failures to bubble up and abort upload-report flows; change this
call to use failOnError = false so mirror write errors are advisory and don't
fail reportDatasetUpload, and ensure any errors are logged locally inside
writeMirrorForVirtual (do not change the superuser writeAllMirrors flow where
hard failures should remain). Locate writeMirrorForVirtual and the
client.writeMirror invocation to make the adjustment and add a local log on
error instead of propagating the exception.
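
Sketched with the same Future stand-in (the writeMirror signature is taken from the comment above; the stub types exist only to keep the snippet self-contained):

import scala.concurrent.{ExecutionContext, Future}

// Stubs so the sketch compiles on its own; the real types live in the webknossos codebase.
case class DatasetStub(_id: String, isVirtual: Boolean, isUsable: Boolean)
trait DataStoreClientStub { def writeMirror(datasetIds: Seq[String], failOnError: Boolean): Future[Unit] }

// Advisory mirror write: failOnError = false, and errors are logged locally instead of
// propagating, so the upload-report flow is never aborted by mirror issues.
def writeMirrorForVirtual(client: DataStoreClientStub, dataset: DatasetStub)(implicit ec: ExecutionContext): Future[Unit] =
  if (dataset.isVirtual && dataset.isUsable)
    client.writeMirror(Seq(dataset._id), failOnError = false).recover { case e =>
      println(s"Mirror write for dataset ${dataset._id} failed: ${e.getMessage}")
    }
  else Future.unit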

In
`@webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala`:
- Around line 756-764: writeMirrorForDataset currently collapses
datasetCache.getById(datasetId) failures by calling dataSourceBox.toOption and
Fox.runOptional, so lookup errors are silently ignored; instead surface the Box
as a Fox (do not call toOption) so failures propagate or are counted: replace
the Fox.runOptional(dataSourceBox.toOption) branch with code that converts the
Box into a Fox (e.g. use the library helper to lift a Box into a Fox or
pattern-match on dataSourceBox and turn Failure/Empty into a failing Fox) and
then call dataSourceMirrorService.writeMirror(dataSource, datasetId) ?~> s"Error
writing datasource mirror for $datasetId" so real lookup failures from
datasetCache.getById and errors from writeMirror are not swallowed.
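
One possible shape for surfacing the Box rather than collapsing it (Box from net.liftweb.common; a plain Future stands in for Fox as the failable result type):

import net.liftweb.common.{Box, Empty, Failure, Full}
import scala.concurrent.Future

// Lift a Box into a failable async value so Failure/Empty propagate (and can be counted)
// instead of being silently dropped via toOption.
def boxToAsync[A](box: Box[A], context: String): Future[A] = box match {
  case Full(value)        => Future.successful(value)
  case Failure(msg, _, _) => Future.failed(new Exception(s"$context: $msg"))
  case Empty              => Future.failed(new Exception(s"$context: empty result"))
}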

In
`@webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceMirrorService.scala`:
- Around line 64-72: The current implementation in writeMirrorMag (and the
analogous writeMirrorAttachment around lines 98-107) creates symlinks to the
original files which allows writes through the mirror if the caller has
permission; replace the symlink approach with a true read-only mirror: create an
actual copy of the target into layerDir (e.g. use Files.copy from
explicitPath.toLocalPathUnsafe to defaultMagPath), then explicitly set the
copied file's permissions to be read-only for all principals (and where
available set an immutable flag via platform-specific APIs), and update the
returned MagLocator path to point to the copied read-only file; ensure the same
logic is applied in the writeMirrorAttachment routine so mirrored attachments
cannot be modified via the mirror.
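
A small sketch of the copy-and-drop-write-permissions alternative (java.nio only; the POSIX permissions assume a Unix-like filesystem, and a true immutable flag would need platform-specific tooling such as chattr):

import java.nio.file.attribute.PosixFilePermissions
import java.nio.file.{Files, Path, StandardCopyOption}

// Copy the source file into the mirror and strip all write bits, so the mirrored file
// cannot be used to modify data even by callers with write access to the original.
def copyReadOnly(source: Path, target: Path): Path = {
  Files.createDirectories(target.getParent)
  Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING)
  Files.setPosixFilePermissions(target, PosixFilePermissions.fromString("r--r--r--"))
  target
}
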
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cd7adbc8-fc0f-4fbe-8f9d-e7e419e5f645

📥 Commits

Reviewing files that changed from the base of the PR and between 409f236 and 89dac10.

📒 Files selected for processing (12)
  • app/controllers/DatasetController.scala
  • app/controllers/WKRemoteDataStoreController.scala
  • app/models/dataset/DatasetService.scala
  • app/models/dataset/WKRemoteDataStoreClient.scala
  • conf/application.conf
  • conf/webknossos.latest.routes
  • unreleased_changes/9512.md
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceMirrorService.scala
  • webknossos-datastore/conf/datastore.latest.routes
  • webknossos-datastore/conf/standalone-datastore.conf

Comment threads: app/controllers/DatasetController.scala, app/models/dataset/DatasetService.scala
@fm3 fm3 requested a review from MichaelBuessemeyer May 6, 2026 14:14
@MichaelBuessemeyer (Contributor) left a comment


Neat, thanks for implementing this 🎉
I mainly found messages not present in the conf/messages file.
Testing is still an open TODO.

request.body.attachmentRenamings.getOrElse(Seq.empty)))
updated <- datasetDAO.findOne(datasetId)
_ <- datasetService.scanRealpathsIfVirtual(updated)
_ <- datasetService.writeMirrorForVirtual(updated)(GlobalAccessContext)

I'd say to also include this in the update function below

_ <- Fox.assertTrue(datasetService.isEditableBy(dataset, Some(request.identity))) ?~> "notAllowed" ~> FORBIDDEN
_ <- Fox.fromBool(dataset.isVirtual) ?~> "dataset.writeMirror.onlyForVirtual"
_ <- Fox.fromBool(dataset.isUsable) ?~> "dataset.writeMirror.onlyForUsable"
_ <- datasetService.writeMirrorForVirtual(dataset) ?~> "dataset.writeMirror.failed"

The messages are missing in conf/messages.
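
For example, conf/messages would need entries along these lines (only the keys are taken from the snippet above; the wording is a suggestion):

dataset.writeMirror.onlyForVirtual=Mirrors can only be written for virtual datasets
dataset.writeMirror.onlyForUsable=Mirrors can only be written for usable datasets
dataset.writeMirror.failed=Failed to write the on-disk mirror for this dataset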

DELETE /datasets/deletePaths @com.scalableminds.webknossos.datastore.controllers.DataSourceController.deletePaths()
POST /datasets/exploreRemote @com.scalableminds.webknossos.datastore.controllers.DataSourceController.exploreRemoteDataset()
POST /datasets/validatePaths @com.scalableminds.webknossos.datastore.controllers.DataSourceController.validatePaths()
POST /datasets/writeMirror @com.scalableminds.webknossos.datastore.controllers.DataSourceController.writeMirror(failOnError: Boolean)

Maybe name the route and the handler function writeMirrors instead, as it expects a list of dataset IDs 🤔

_ <- ensureMirrorParent(mirrorDir)
_ <- Fox.runIf(Files.exists(tempMirrorDir)) {
tryo(FileUtils.deleteDirectory(tempMirrorDir.toFile)).toFox
} ?~> "dataset.writeMirror.deleteStaleTempMirrorFailed"

The messages in this file also need to be added to conf/messages.
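
For instance, the key from the snippet above would need an entry like this in conf/messages (wording is a suggestion):

dataset.writeMirror.deleteStaleTempMirrorFailed=Failed to delete a stale temporary mirror directory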

@MichaelBuessemeyer (Contributor) left a comment

Testing session results:

I am honestly confused: why is this feature needed? When I upload a dataset, it is automatically added to the organization's binaryData folder, not to a .virtual folder or the like. Only remote datasets seem to get no on-disk representation, but they also do not get a mirror (which makes sense). So is this feature "just" for datasets that are composed of other datasets which are all available on the local file system (i.e. not remote)?

Anyway: testing went well for these datasets, although datasets which are already in the orga's binaryData folder also get a mirror, which is OK I guess, but a bit of a duplicate 🙈



Development

Successfully merging this pull request may close these issues.

Create on-disk mirror of virtual/db-based datasets where possible

2 participants