fix: Every time a vectorized document is generated, the entire vectorized data of the document is deleted #2721
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected; please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
    VectorStore.get_embedding_vector().delete_by_document_id(document_id)

    # Vectorize paragraph by paragraph
    page_desc(QuerySet(Paragraph)
The provided code snippet does not contain any obvious irregularities or potential issues. However, there are some general optimizations that can be made:

Potential Improvements

- Use of Generators for Data Handling: If QuerySet supports generator operations, you might use generators to handle data more efficiently, especially when dealing with large datasets.
- Exception Handling: Consider adding try-except blocks around database operations and document management tasks so that failures during execution are handled rather than propagating unnoticed.
- Code Clarity: Ensure that variable names and function structures are clear and concise. This makes the code easier to understand and maintain.
- Performance Optimization: If VectorStore's delete_by_document_id method is slow due to indexing or other optimizations, consider caching results or using batch deletion when applicable.

Here's a slightly optimized version of the relevant part of the code (assuming document_id is valid):

    def is_the_task_interrupted():
        try:
            # Update the listener status
            ListenerManagement.update_status(
                QuerySet(Document).filter(id=document_id),
                TaskType.EMBEDDING,
                State.STARTED
            )
            # Delete the document's vector data
            VectorStore.get_embedding_vector().delete_by_document_id(document_id)
            # Vectorize paragraph by paragraph
            page_desc(QuerySet(Paragraph))
        except Exception as e:
            print(f"An error occurred: {e}")

By implementing these suggestions, the code will be more robust, efficient, and readable. Adjustments may vary based on specific requirements and constraints within your project.
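
The generator and exception-handling suggestions in this review could look roughly like the sketch below. It is not taken from this PR or from the project's code: `iter_batches`, `process_in_batches`, and the `handle_batch` callback are hypothetical names used only for illustration, and plain Python iterables stand in for a QuerySet.

```python
import logging
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items without loading everything at once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def process_in_batches(
    items: Iterable[T],
    handle_batch: Callable[[List[T]], None],  # hypothetical callback, e.g. an embedding step
    batch_size: int = 2,
) -> List[int]:
    """Run handle_batch on every batch and return the indices of batches that failed."""
    failed: List[int] = []
    for index, batch in enumerate(iter_batches(items, batch_size)):
        try:
            handle_batch(batch)
        except Exception:
            # Log the traceback instead of printing, then continue with the next batch.
            logger.exception("Batch %d failed", index)
            failed.append(index)
    return failed
```

Returning the failed batch indices, rather than swallowing the error with a print, leaves the caller free to retry or to mark the task as failed. Whether continuing after a failed batch is appropriate depends on the project's requirements.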