Skip to content

Fix performance when checking dataset version file count#12494

Open
stevenwinship wants to merge 2 commits into
developfrom
performance-fix-for-dataset-version-has-files
Open

Fix performance when checking dataset version file count#12494
stevenwinship wants to merge 2 commits into
developfrom
performance-fix-for-dataset-version-has-files

Conversation

@stevenwinship

Copy link
Copy Markdown
Contributor

What this PR does / why we need it: Performance issues with sql call
#12284

Which issue(s) this PR closes:

  • Closes #

Special notes for your reviewer:

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

@stevenwinship stevenwinship self-assigned this Jun 29, 2026
@stevenwinship stevenwinship moved this to In Progress 💻 in IQSS Dataverse Project Jun 29, 2026
@stevenwinship stevenwinship added this to the 6.11 milestone Jun 29, 2026
@stevenwinship stevenwinship added Size: 10 A percentage of a sprint. 7 hours. Feature: Performance & Stability labels Jun 29, 2026
@coveralls

coveralls commented Jun 29, 2026

Copy link
Copy Markdown

Coverage Status

Coverage is 25.022%performance-fix-for-dataset-version-has-files into develop. No base build found for develop.

landreev
landreev previously approved these changes Jun 29, 2026

@landreev landreev left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good; extra logic for guest user helps.

@github-project-automation github-project-automation Bot moved this from In Progress 💻 to Ready for QA ⏩ in IQSS Dataverse Project Jun 29, 2026
@github-actions

This comment has been minimized.

@stevenwinship stevenwinship removed their assignment Jun 29, 2026
@@ -257,15 +260,17 @@ public boolean canIssueDeleteDatasetCommand(DvObject dvo){
// PUBLISH DATASET
public boolean canIssuePublishDatasetCommand(DvObject dvo){

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are in the permissionsWrapper class - shouldn't the results also be cached for the life of the bean?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we cache the results? It would change when a file upload completes

@qqmyers qqmyers Jun 29, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If caching isn't appropriate, should the method be in the PermissionServiceBean instead? The *Wrapper classes are viewscoped and should just have cached info (probably not enforced). That would make the call accessable via API as well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, we want everything that is called from JSF rendered="#{...}" attributes cached (because these are evaluated 7 times). Probably a good idea even when there is no reason to expect the task to be expensive.

In this specific case, I am actually not sure off the top of my head about the "would change when a file upload completes"... PermissionsWrapper is @ViewScoped. Does that mean that it will be a new "View" when the page re-renders itself after an upload completes? Or the same one?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be easy to test though.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - I'm not sure w/o digging - @viewScoped would stay as long as we're just doing ajax updates? We also cache in the DatasetPage at times - that would be a place where the cache could be emptied when a file upload completes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looked into it some more

  1. From a quick test, caching that info within a View does NOT prevent the page from properly updating itself once one or more files has been uploaded (that re-rendering apparently counts as a new View).

  2. Yes, we DO want to cache this (on top of caching the result of permissionService...canIssue(command) that we are already doing there). JSF is calling these 2 methods over and over incessantly. Having that NativeQuery executed every time feels wrong.

  3. I mentioned in the perf. issue that "The query count goes from 9K+ down to 813 ... This is ~100 more than it has been consistently between the last few releases, but ..." That was from a test of the first iteration of this PR, that was not checking for || u instanceof GuestUser. The extra 100 queries were solely from the native query and whatever it instantiated in the datasetVersionService.hasFiles() performed on the 10 underlying datasets. Re-testing the PR in its current state brought that query count down to 715. (it was 714 in 6.10.1).
    It will be worse for the dataset page and a logged in browser user.

... So yes, we want to cache.
We'll wrap it up tomorrow.

@@ -257,15 +260,17 @@ public boolean canIssueDeleteDatasetCommand(DvObject dvo){
// PUBLISH DATASET
public boolean canIssuePublishDatasetCommand(DvObject dvo){
User u = session.getUser();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't getUser return a GuestUser? i.e. getAuthenticatedUser() is needed for the check?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@github-actions

Copy link
Copy Markdown

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:performance-fix-for-dataset-version-has-files
ghcr.io/gdcc/configbaker:performance-fix-for-dataset-version-has-files

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@landreev landreev dismissed their stale review June 29, 2026 21:29

(I meant to add that as a comment; it would be great if somebody else approved it)

@github-actions

Copy link
Copy Markdown

Test Results

403 tests   388 ✅  32m 16s ⏱️
 55 suites   15 💤
 55 files      0 ❌

Results for commit 644795a.

@stevenwinship stevenwinship moved this from Ready for QA ⏩ to Ready for Review ⏩ in IQSS Dataverse Project Jun 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Feature: Performance & Stability Size: 10 A percentage of a sprint. 7 hours.

Projects

Status: Ready for Review ⏩

Development

Successfully merging this pull request may close these issues.

4 participants