fix: When generating a problem, the generated data is incorrect #2408
Merged
shaohuzhang1 merged 1 commit into main on Feb 26, 2025
shaohuzhang1 commented on Feb 26, 2025
            TaskType.GENERATE_PROBLEM,
            State.PENDING)
        ListenerManagement.get_aggregation_document_status_by_query_set(
            QuerySet(Document).filter(id__in=document_id_list))()
Here's a list of improvements and optimizations for your code:
- Consistent Keyword Usage: Use keywords consistently (e.g., use `as` instead of comma separation when importing modules).
- Readability Improvements:
  - Add comments to explain complex operations and the purpose behind each line.
- Optimization Suggestions:
  - Consider caching expensive query sets where possible to improve performance.
  - Ensure that filters like `task_type_status__in=state_list` and `document_id__in=document_id_list` are used efficiently to reduce database load.
- Correctness Checks:
  - Check whether error handling is missing for queryset operations or aggregation queries that may fail.
- Code Structure:
  - Ensure proper indentation and spacing throughout the code for better readability.
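As a minimal sketch of the correctness check listed above, the following wraps an aggregation call and converts any failure into a `ValueError`. The `run_aggregation` helper and the `aggregate` callable are hypothetical stand-ins and not part of the project's API; in a Django codebase the caught exception would more likely be narrowed to a database error class.

```python
from typing import Callable, Iterable, List


def run_aggregation(aggregate: Callable[[], Iterable[dict]]) -> List[dict]:
    """Run a hypothetical aggregation callable and surface failures clearly."""
    try:
        # Materialize the result so lazy evaluation errors are raised here
        return list(aggregate())
    except Exception as exc:  # Assumption: narrow this to the real error type
        raise ValueError(f"status aggregation failed: {exc}") from exc


def failing_aggregate() -> Iterable[dict]:
    # Stand-in for a query that fails when it is evaluated
    raise RuntimeError("connection lost")
```

Calling `run_aggregation(failing_aggregate)` raises a `ValueError` that keeps the original exception as its cause, so the caller sees one error type while the traceback still shows what actually failed.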
from collections import defaultdict
from typing import List, Tuple

from django.db.models import F, QuerySet

# Assuming the project-specific imports (Document, Task, State, TaskType,
# ListenerManagement) are available here


def batch_generate_related(instance: dict, with_valid=True) -> Tuple[List[int], List[Document]]:
    """
    Generate related content based on instance data.

    Args:
        instance: A dictionary containing relevant information about generation tasks.
        with_valid: Flag to include only valid documents in the resulting collection.

    Returns:
        A tuple of two lists: [related_task_ids] and [Documents].

    Raises:
        ValueError: If something goes wrong during aggregation.
    """
    # Fetch pending embedding-related tasks
    pending_embedding_tasks = (
        QuerySet(Task)
        .annotate(document_id=F('document'))
        .filter(task_type_status=State.PENDING,
                task_type_name=TaskType.EMBEDDING)
        .select_related("document")
    )

    # Fetch aggregated status for these documents. The helper returns a callable,
    # which is invoked once; the queryset itself is passed in without being called.
    doc_statuses = ListenerManagement.get_aggregation_document_status_by_query_set(
        pending_embedding_tasks)()

    # Group status entries by document
    grouped_documents = defaultdict(list)
    for doc_status in doc_statuses:
        if with_valid and not _is_document_valid(doc_status['status']):
            continue
        grouped_documents[doc_status['document']].append(doc_status)

    # Keep only documents whose entries are all valid
    filtered_docs = {doc_id: docs for doc_id, docs in grouped_documents.items()
                     if all(_is_instance_valid(doc) for doc in docs)}

    # Collect final task IDs from filtered documents (assume one task per document).
    # Note: dt.task.id assumes each entry exposes a task attribute; adjust this
    # to the actual shape of the aggregated status entries.
    task_ids = []
    for docs in filtered_docs.values():
        task_ids.extend([dt.task.id for dt in docs[:1]])

    # The second element holds the grouping keys taken from doc_status['document']
    return task_ids, list(filtered_docs.keys())


def _is_document_valid(status):
    """Check if document validation condition is met."""
    pass  # Implement actual logic


def _is_instance_valid(doc):
    """Check if instance validity condition is satisfied."""
    pass  # Implement actual logic

This version enhances clarity, structure, and efficiency while maintaining consistency with existing practices.
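The grouping step in the suggested function can be tried out without Django. The sketch below uses plain dictionaries in place of aggregated status rows; the function name `pick_one_task_per_document` and the keys `document`, `task_id` and `valid` are assumptions made for illustration and do not reflect the project's real data shape.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def pick_one_task_per_document(rows: List[dict]) -> Tuple[List[int], List[str]]:
    """Group rows by document and keep the first task of each valid group.

    Each row is assumed to be a dict with the hypothetical keys
    'document', 'task_id' and 'valid'.
    """
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        grouped[row["document"]].append(row)

    # Keep only documents whose rows are all valid
    kept = {doc: items for doc, items in grouped.items()
            if all(item["valid"] for item in items)}

    # Take the first task of each remaining document
    task_ids = [items[0]["task_id"] for items in kept.values()]
    return task_ids, list(kept.keys())


rows = [
    {"document": "a", "task_id": 1, "valid": True},
    {"document": "a", "task_id": 2, "valid": True},
    {"document": "b", "task_id": 3, "valid": False},
]
result = pick_one_task_per_document(rows)
```

Here document `a` keeps only its first task and document `b` is dropped because one of its rows is invalid, which mirrors the "one task per document" assumption in the suggested code.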