Skip to content

Revert "Retry embeddings tasks on Qdrant gRPC DEADLINE_EXCEEDED (#3520)"#3522

Closed
mbertrand wants to merge 1 commit into
mainfrom
mb/no_gpc_retry
Closed

Revert "Retry embeddings tasks on Qdrant gRPC DEADLINE_EXCEEDED (#3520)"#3522
mbertrand wants to merge 1 commit into
mainfrom
mb/no_gpc_retry

Conversation

@mbertrand

Copy link
Copy Markdown
Member

This reverts a previous commit that rertried grpc errors and raise after final retry. Reverting because the embed_run_content_files task creates a chain of generate_embeddings tasks and if one fails, the entire rest of the chain is halted. Will figure out an alternative approach later.

What are the relevant tickets?

Related to https://github.com/mitodl/hq/issues/11987

Description (What does it do?)

Just reverts the commit

How can this be tested?

Tests should pass

Copilot AI review requested due to automatic review settings June 25, 2026 10:34
@mbertrand mbertrand added the Needs Review An open Pull Request that is ready for review label Jun 25, 2026
@github-actions

Copy link
Copy Markdown

OpenAPI Changes

No changes detected

View full changelog

Unexpected changes? Ensure your branch is up-to-date with main (consider rebasing).

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reverts prior changes that retried Qdrant gRPC DEADLINE_EXCEEDED errors in the vector-search embedding/removal Celery tasks, to avoid halting long Celery chains (notably those created by embed_run_content_files) when a single task fails.

Changes:

  • Remove gRPC-specific retry/raise behavior from generate_embeddings / remove_embeddings and revert to “swallow and log” error handling.
  • Drop the gRPC/RetryError-focused unit tests that enforced the prior retry/raise behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
vector_search/tasks.py Reverts task-level exception handling from gRPC-aware retry/raise back to a catch-all “log and return” approach.
vector_search/tasks_test.py Removes tests that asserted gRPC deadline behavior and related imports/helpers.
Comments suppressed due to low confidence (1)

vector_search/tasks_test.py:37

  • With the gRPC-specific tests removed, there’s no longer any coverage for the current behavior that generate_embeddings/remove_embeddings swallow unexpected exceptions and return an error string. Adding a small regression test for the swallow-and-log behavior would help ensure future changes don’t reintroduce chain-halting failures unintentionally.
from vector_search.tasks import (
    embed_learning_resources_by_id,
    embed_new_content_files,
    embed_new_learning_resources,
    embed_run_content_files,
    embeddings_healthcheck,
    remove_run_content_files,
    remove_unpublished_run_content_files,
    start_embed_resources,
)

Comment thread vector_search/tasks.py
Comment on lines +122 to +125
except: # noqa: E722
error = "generate_embeddings threw an error"
log.exception(error)
return error
Comment thread vector_search/tasks.py
Comment on lines +151 to +154
except: # noqa: E722
error = "generate_embeddings threw an error"
log.exception(error)
return error
@shanbady shanbady self-requested a review June 25, 2026 15:17

@shanbady shanbady left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a revert. approved

@mbertrand mbertrand closed this Jun 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Needs Review An open Pull Request that is ready for review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants