Write on-disk read-only mirror of virtual datasets#9512
Conversation
📝 Walkthrough

This PR introduces functionality to create on-disk mirrors of virtual, database-backed datasets. A new mirror-writing service is added to the datastore, integrated with the dataset update and upload workflows, and exposed via new controller endpoints. Configuration flags control the feature across both applications.

Changes: Virtual Dataset Mirroring
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/controllers/DatasetController.scala`:
- Around lines 484-485: The call to datasetService.writeMirrorForVirtual inside the request flow (followed by analyticsService.track(ChangeDatasetSettingsEvent(...))) must not fail the whole request when mirroring errors. Change these paths (the writeMirrorForVirtual invocations that run after the primary commit) to dispatch mirror generation asynchronously or guard it with error handling, so that failures are logged and the request continues. Specifically, wrap or dispatch datasetService.writeMirrorForVirtual(updated)(GlobalAccessContext) so that any thrown exceptions are caught or converted into a non-fatal Future recovery and logged (using the existing logger/analytics for context) instead of propagating, allowing the primary mutation and analyticsService.track(ChangeDatasetSettingsEvent(...)) to complete. Apply the same change to the other occurrences of writeMirrorForVirtual in this file.
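A minimal sketch of the log-and-continue pattern this asks for, using plain Scala Futures for illustration (the real code would use the project's Fox type and its existing logger; `BestEffort` and the task name are hypothetical):

```scala
import scala.concurrent.{ExecutionContext, Future}

object BestEffort {
  // Runs a side task (e.g. mirror generation) so that its failure is
  // logged and swallowed instead of failing the surrounding request.
  def apply[A](taskName: String)(task: => Future[A])(
      implicit ec: ExecutionContext): Future[Unit] =
    Future(task).flatten.map(_ => ()).recover {
      case e => println(s"$taskName failed, continuing: ${e.getMessage}")
    }
}

// Usage inside the request's for-comprehension (sketch):
//   _ <- BestEffort("writeMirrorForVirtual")(mirrorTask)
// The primary mutation and the analytics tracking still complete.
```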
In `@app/models/dataset/DatasetService.scala`:
- Around lines 705-711: The helper writeMirrorForVirtual currently calls client.writeMirror(Seq(dataset._id), failOnError = true), which causes mirror-write failures to bubble up and abort upload-report flows. Change this call to use failOnError = false so that mirror-write errors are advisory and do not fail reportDatasetUpload, and ensure any errors are logged locally inside writeMirrorForVirtual (do not change the superuser writeAllMirrors flow, where hard failures should remain). Locate writeMirrorForVirtual and the client.writeMirror invocation to make the adjustment, and add a local log on error instead of propagating the exception.
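The advisory-vs-hard-fail split might look like the following self-contained sketch (`MirrorWriter` and `tryWrite` are illustrative stand-ins, not the project's API):

```scala
object MirrorWriter {
  // Hypothetical per-dataset write; real code would hit the datastore.
  private def tryWrite(id: String): Either[String, Unit] =
    if (id.isEmpty) Left("empty dataset id") else Right(())

  // failOnError = true  -> first error aborts the call (writeAllMirrors flow)
  // failOnError = false -> errors are logged locally, the call still succeeds
  def writeMirror(ids: Seq[String], failOnError: Boolean): Either[String, Unit] = {
    val errors = ids.flatMap(id => tryWrite(id).left.toOption)
    if (errors.nonEmpty && failOnError) Left(errors.mkString("; "))
    else {
      errors.foreach(e => println(s"mirror write failed (advisory): $e"))
      Right(())
    }
  }
}
```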
In
`@webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala`:
- Around lines 756-764: writeMirrorForDataset currently collapses datasetCache.getById(datasetId) failures by calling dataSourceBox.toOption and Fox.runOptional, so lookup errors are silently ignored. Instead, surface the Box as a Fox (do not call toOption) so that failures propagate or are counted: replace the Fox.runOptional(dataSourceBox.toOption) branch with code that converts the Box into a Fox (e.g. use the library helper to lift a Box into a Fox, or pattern-match on dataSourceBox and turn Failure/Empty into a failing Fox), and then call dataSourceMirrorService.writeMirror(dataSource, datasetId) ?~> s"Error writing datasource mirror for $datasetId" so that real lookup failures from datasetCache.getById and errors from writeMirror are not swallowed.
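The difference between collapsing and lifting a Box can be shown with a tiny re-definition (Lift's Box carries more structure, and the Fox conversion in webknossos is analogous, so this is only a sketch):

```scala
sealed trait Box[+A]
final case class Full[A](value: A)    extends Box[A]
case object Empty                     extends Box[Nothing]
final case class Failure(msg: String) extends Box[Nothing]

// toOption discards the failure message: Failure("db down") becomes None.
def toOption[A](b: Box[A]): Option[A] =
  b match { case Full(a) => Some(a); case _ => None }

// Lifting into Either keeps it, so real lookup errors reach the caller.
def liftBox[A](b: Box[A]): Either[String, A] = b match {
  case Full(a)      => Right(a)
  case Empty        => Left("not found")
  case Failure(msg) => Left(msg)
}
```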
In
`@webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceMirrorService.scala`:
- Around lines 64-72: The current implementation in writeMirrorMag (and the analogous writeMirrorAttachment around lines 98-107) creates symlinks to the original files, which allows writes through the mirror if the caller has permission. Replace the symlink approach with a true read-only mirror: create an actual copy of the target inside layerDir (e.g. use Files.copy from explicitPath.toLocalPathUnsafe to defaultMagPath), explicitly set the copied file's permissions to read-only for all principals (and, where available, set an immutable flag via platform-specific APIs), and update the returned MagLocator path to point to the copied read-only file. Apply the same logic in the writeMirrorAttachment routine so that mirrored attachments cannot be modified via the mirror.
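Copy-plus-read-only can be done with standard java.nio APIs; this sketch assumes a POSIX filesystem with a non-POSIX fallback (the function name is illustrative):

```scala
import java.nio.file.{Files, Path, StandardCopyOption}
import java.nio.file.attribute.PosixFilePermissions

// Mirror a file by copying it (instead of symlinking) and making the
// copy read-only, so the mirror cannot be used to modify the original.
def copyReadOnly(source: Path, target: Path): Path = {
  Option(target.getParent).foreach(Files.createDirectories(_))
  Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING)
  // r--r--r-- on POSIX; fall back to the coarser File.setReadOnly elsewhere
  try Files.setPosixFilePermissions(target, PosixFilePermissions.fromString("r--r--r--"))
  catch { case _: UnsupportedOperationException => target.toFile.setReadOnly() }
  target
}
```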
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cd7adbc8-fc0f-4fbe-8f9d-e7e419e5f645
📒 Files selected for processing (12)

- app/controllers/DatasetController.scala
- app/controllers/WKRemoteDataStoreController.scala
- app/models/dataset/DatasetService.scala
- app/models/dataset/WKRemoteDataStoreClient.scala
- conf/application.conf
- conf/webknossos.latest.routes
- unreleased_changes/9512.md
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceMirrorService.scala
- webknossos-datastore/conf/datastore.latest.routes
- webknossos-datastore/conf/standalone-datastore.conf
MichaelBuessemeyer
left a comment
Neat, thanks for implementing this 🎉
Mainly found messages not present in the conf/messages file.
Testing is still an open TODO.
```scala
request.body.attachmentRenamings.getOrElse(Seq.empty)))
updated <- datasetDAO.findOne(datasetId)
_ <- datasetService.scanRealpathsIfVirtual(updated)
_ <- datasetService.writeMirrorForVirtual(updated)(GlobalAccessContext)
```
I'd say to also include this in the update function below
```scala
_ <- Fox.assertTrue(datasetService.isEditableBy(dataset, Some(request.identity))) ?~> "notAllowed" ~> FORBIDDEN
_ <- Fox.fromBool(dataset.isVirtual) ?~> "dataset.writeMirror.onlyForVirtual"
_ <- Fox.fromBool(dataset.isUsable) ?~> "dataset.writeMirror.onlyForUsable"
_ <- datasetService.writeMirrorForVirtual(dataset) ?~> "dataset.writeMirror.failed"
```
The messages are missing in messages.conf
```
DELETE /datasets/deletePaths    @com.scalableminds.webknossos.datastore.controllers.DataSourceController.deletePaths()
POST   /datasets/exploreRemote  @com.scalableminds.webknossos.datastore.controllers.DataSourceController.exploreRemoteDataset()
POST   /datasets/validatePaths  @com.scalableminds.webknossos.datastore.controllers.DataSourceController.validatePaths()
POST   /datasets/writeMirror    @com.scalableminds.webknossos.datastore.controllers.DataSourceController.writeMirror(failOnError: Boolean)
```
maybe name the route and the handler function writeMirrors instead, as it expects a list of dataset ids 🤔
```scala
_ <- ensureMirrorParent(mirrorDir)
_ <- Fox.runIf(Files.exists(tempMirrorDir)) {
  tryo(FileUtils.deleteDirectory(tempMirrorDir.toFile)).toFox
} ?~> "dataset.writeMirror.deleteStaleTempMirrorFailed"
```
the messages in this file also need to be added to conf/messages
MichaelBuessemeyer
left a comment
Testing session results:
I am honestly confused: why is this feature needed? When I upload a dataset, it is automatically added to the organization's binaryData folder, not to a .virtual folder or similar. Only remote datasets seem to get no on-disk representation, but they also do not get a mirror (which makes sense). So is this feature "just" for datasets that are composed of other datasets which are all available on the local file system (i.e., not remote)?
Anyway: testing went well for these datasets, although datasets that are already in the orga's binaryData folder also get a mirror, which is OK I guess, but a bit of a duplicate 🙈
Steps to test:
1. Enable the feature in the datastore config (`datastore.writeVirtualDatasetsMirror=true`)
2. `POST /api/datasets/writeAllMirrors`, which should create mirrors for all virtual datasets

TODOs:
Issues:
- Add a `$PR_NUMBER.md` file in `unreleased_changes` (or use `./tools/create-changelog-entry.py`)