Skip to content

feat: asynchronous checks#62

Open
carlinmack wants to merge 8 commits into
inveniosoftware:masterfrom
carlinmack:async-checks
Open

feat: asynchronous checks#62
carlinmack wants to merge 8 commits into
inveniosoftware:masterfrom
carlinmack:async-checks

Conversation

@carlinmack
Copy link
Copy Markdown
Contributor

@carlinmack carlinmack commented Apr 27, 2026

❤️ Thank you for your contribution!

Description

Please describe briefly your pull request.

Celery tasks are used to execute the check, however we do not want to duplicate the logic of executing a task, and so in the task context the asynchronous task is run synchronously. A visual explanation of this is shown below (see run_check(..., sync=true)).

image

Checklist

Ticks in all boxes and 🟢 on all GitHub actions status checks are required to merge:

Frontend

Reminder

By using GitHub, you have already agreed to the GitHub’s Terms of Service including that:

  1. You license your contribution under the same terms as the current repository’s license.
  2. You agree that you have the right to license your contribution under the current repository’s license.

@carlinmack carlinmack marked this pull request as ready for review May 6, 2026 09:24
@carlinmack carlinmack moved this from In progress to In review 🔍 in Sprint Q2 2026 ☀️ May 6, 2026
@carlinmack carlinmack removed their assignment May 6, 2026
Comment thread invenio_checks/api.py Outdated
"""Add a rule result and update the overall success."""
self.rule_results.append(rule_result)
if not rule_result.success and rule_result.level == "failure":
if rule_result.success and rule_result.level == "failure":
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the change?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you read the statement it's

if not X and ...:
     X = false

so it's would always be a no-op. In the DB all checks have a success of true regardless of it the checks are passing

Copy link
Copy Markdown
Contributor

@ntarocco ntarocco May 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for coming back to this. The condition was checking rule_result.success, and then setting self.success, so not changing the same var.
It looked more correct before, it feels wrong checking for rule_result.success True and level Failure.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay I will double check

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay great catch nico! there is an error with this line but it's

Suggested change
if rule_result.success and rule_result.level == "failure":
if not rule_result.success and rule_result.level == "error":

Comment thread invenio_checks/tasks.py
Comment on lines +56 to +61
db.session.commit()

with UnitOfWork() as uow:
# as we are in the task, run the check synchronously
ChecksAPI.run_check(config, record, uow, sync=True)
uow.commit()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we mixingg manual commits with UnitOfWork?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question. I was imagining that the uow would automatically commit when exiting the scope but this is not the behaviour. @slint what's the best practise here?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually surprisingly doesn't commit, so it has to be explicit... there's a similar behavior with SQLAlchemy's session so I think this where this design is coming from.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could pass an optional uow to the run_check APIs, and treat that class as a service. The decorator will take care of the commit.

Comment thread invenio_checks/tasks.py Outdated
Comment thread invenio_checks/api.py
Comment thread invenio_checks/api.py
Comment thread invenio_checks/api.py
return result_run

@classmethod
def run_check(cls, config, record, uow, is_draft=None, sync=False):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think run_check is mixing execution and persistence, which can causes a problem in the async flow. The task recieves a check_run_id, but then run_check queries by config_id, record_id and is_draft instead of using that id, so it may update or overwrite the wrong row if multiple runs exist. I think it would be safer to separate concerns so the execution just returns a result, and the task updates the existing CheckRun directly by id.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be possible for there to be multiple CheckRuns as they are only created in _create_or_update_check_run which handles this. I don't think I need to make any other changes from your comment.

For clarity, refer to the diagram in the original message in this PR and note if you think there's anything wrong with when the status is being updated

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that it feels brittle that the task receives a check_run_id, but run_check(sync=True) then re-queries instead of updating that specific row directly, since the async task already have owns the CheckRun conceptually.

I was thinking it might be cleaner if run_check just returned the result/state, and the task then updated the existing CheckRun directly by id.

Copy link
Copy Markdown
Contributor Author

@carlinmack carlinmack May 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please refer to the diagram so we're on the same page :)

image

If run_check (in api.py) just returned the result state, then the component would have to update the db (which is not correct).

What you're suggesting I think would to run all checks via a task, so that the task is the only thing updating the DB, however Alex already said that this is not correct and that instead we should consider creating a current_checks_api or a service (which I haven't proceeded with as it requires more design thinking about how we want the service/proxy to work)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea referring to the diagram 😄 What I meant isn’t that the return value from api.py should change, but that in the async execution flow the task already has the check_run_id, so it could update/save that specific CheckRun directly instead of run_check(sync=True) re-querying it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but then we would need to make the params of run_check fully optional as you can either call it with (config, record) or (check_run_id) with the additional implied checking of params



class AsyncCheck(Check):
"""Example of an async check."""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this code that we need to change later?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll drop this commit before we merge, it's just to have something to test with and give an example of how to configure it

Comment thread invenio_checks/tasks.py
Comment on lines +56 to +61
db.session.commit()

with UnitOfWork() as uow:
# as we are in the task, run the check synchronously
ChecksAPI.run_check(config, record, uow, sync=True)
uow.commit()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could pass an optional uow to the run_check APIs, and treat that class as a service. The decorator will take care of the commit.

@carlinmack carlinmack moved this from In review 🔍 to In progress in Sprint Q2 2026 ☀️ May 13, 2026
@carlinmack carlinmack moved this from In progress to In review 🔍 in Sprint Q2 2026 ☀️ May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In review 🔍

Development

Successfully merging this pull request may close these issues.

4 participants