
fix: When generating a problem, the generated data is incorrect #2408

Merged
shaohuzhang1 merged 1 commit into main from pr@main@fix_generate_related on Feb 26, 2025

Conversation

@shaohuzhang1
Contributor

fix: When generating a problem, the generated data is incorrect

TaskType.GENERATE_PROBLEM,
State.PENDING)
ListenerManagement.get_aggregation_document_status_by_query_set(
    QuerySet(Document).filter(id__in=document_id_list))()
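The fragment appears to set the GENERATE_PROBLEM task of the selected documents to PENDING and then re-run the document status aggregation. Below is a minimal in-memory sketch of that two-step pattern, written without Django. The Doc class, the update_status/aggregate_status/start_generate_problem helpers, the enum values, and the aggregation rule (any FAILURE wins, then any PENDING, otherwise SUCCESS) are all hypothetical stand-ins, not MaxKB's actual models or SQL.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class TaskType(Enum):
    # Hypothetical stand-in for the project's task types
    EMBEDDING = "embedding"
    GENERATE_PROBLEM = "generate_problem"


class State(Enum):
    # Hypothetical stand-in for the project's task states
    PENDING = "pending"
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class Doc:
    id: int
    task_status: Dict[TaskType, State] = field(default_factory=dict)
    aggregated_status: State = State.SUCCESS


def update_status(docs: List[Doc], task_type: TaskType, state: State) -> None:
    """Set the status of one task type on every selected document."""
    for doc in docs:
        doc.task_status[task_type] = state


def aggregate_status(docs: List[Doc]) -> None:
    """Recompute each document's overall status from its per-task statuses.

    Assumed rule: any FAILURE wins, then any PENDING, otherwise SUCCESS.
    """
    for doc in docs:
        states = set(doc.task_status.values())
        if State.FAILURE in states:
            doc.aggregated_status = State.FAILURE
        elif State.PENDING in states:
            doc.aggregated_status = State.PENDING
        else:
            doc.aggregated_status = State.SUCCESS


def start_generate_problem(all_docs: List[Doc], document_id_list: List[int]) -> List[Doc]:
    wanted = set(document_id_list)
    # In-memory analogue of QuerySet(Document).filter(id__in=document_id_list)
    selected = [doc for doc in all_docs if doc.id in wanted]
    # Step 1: mark the GENERATE_PROBLEM task as pending on the selected documents
    update_status(selected, TaskType.GENERATE_PROBLEM, State.PENDING)
    # Step 2: re-aggregate so the document-level status reflects step 1
    aggregate_status(selected)
    return selected
```

For two documents whose embedding has succeeded, calling start_generate_problem(docs, [1]) leaves document 1 with an aggregated status of PENDING and leaves document 2 untouched. In this sketch, skipping step 2 would leave the document-level status out of sync with the newly pending task.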
Contributor Author

Here's a list of improvements and optimizations for your code:

  1. Consistent Import Style: Import and alias modules the same way throughout (e.g., use the as keyword for aliases rather than mixing styles).

  2. Readability Improvements:

    • Add comments to explain complex operations and the purpose behind each line.
  3. Optimization Suggestions:

    • Consider evaluating expensive querysets once and reusing the results where possible, to avoid repeated database hits.
    • Ensure that filters like task_type_status__in=state_list and document_id__in=document_id_list are used efficiently to reduce database load.
  4. Correctness Checks:

    • Check whether failures in queryset operations or aggregation queries are handled, and add error handling where it is missing.
  5. Code Structure:

    • Ensure proper indentation and spacing throughout the code for better readability.
from collections import defaultdict
from typing import Tuple

from django.db.models import QuerySet

# Project-specific names (Task, Document, State, TaskType, ListenerManagement)
# are assumed to be imported from the application here.


def batch_generate_related(instance: dict, with_valid=True) -> Tuple[list, list]:
    """
    Generate related content based on instance data.

    Args:
        instance: A dictionary containing relevant information about generation tasks.
        with_valid: If True, keep only documents that pass validation.

    Returns:
        A tuple of two lists: (related task IDs, document IDs).
    """

    # Fetch pending embedding tasks. A ForeignKey named `document` already
    # exposes `document_id`, so no annotate(document_id=F(...)) is needed
    # (Django rejects an annotation that clashes with a field name).
    pending_embedding_tasks = (
        QuerySet(Task)
        .filter(task_type_status=State.PENDING,
                task_type_name=TaskType.EMBEDDING)
        .select_related("document")
    )

    # Pass a Document queryset (as the merged change does) and call the
    # returned function; the queryset itself is not callable.
    documents = QuerySet(Document).filter(
        id__in=pending_embedding_tasks.values("document_id"))
    # Assumption: the call returns an iterable of dicts with
    # 'document' and 'status' keys.
    doc_statuses = ListenerManagement.get_aggregation_document_status_by_query_set(documents)()

    # Group the status entries by document ID
    grouped_documents = defaultdict(list)
    for doc_status in doc_statuses:
        if with_valid and not _is_document_valid(doc_status['status']):
            continue
        grouped_documents[doc_status['document']].append(doc_status)

    # Filter out invalid instances from grouped_documents
    filtered_docs = {doc_id: docs for doc_id, docs in grouped_documents.items() if all(_is_instance_valid(doc) for doc in docs)}

    # Collect one task ID per document. Assumption: each status entry
    # carries a 'task_id' key.
    task_ids = [docs[0]['task_id'] for docs in filtered_docs.values()]

    return task_ids, list(filtered_docs.keys())

def _is_document_valid(status):
    """Check whether the document-level validation condition is met."""
    raise NotImplementedError("Implement actual validation logic")


def _is_instance_valid(doc):
    """Check whether the instance-level validity condition is satisfied."""
    raise NotImplementedError("Implement actual validation logic")

This version adds type hints, a docstring, explanatory comments, and placeholder validation helpers, and it groups the aggregated status entries by document.
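Item 4's point about error handling can be illustrated without Django. The following minimal sketch runs an aggregation callable, logs failures together with the affected document IDs, and re-raises them so they are not silently lost. The run_aggregation name and the choice to wrap errors in RuntimeError are hypothetical. The aggregate argument stands in for the zero-argument function that get_aggregation_document_status_by_query_set(...) returns in the snippets above.

```python
import logging
from typing import Callable, List, TypeVar

logger = logging.getLogger(__name__)

T = TypeVar("T")


def run_aggregation(aggregate: Callable[[], T], document_ids: List[int]) -> T:
    """Run an aggregation callable, adding context if it fails (hypothetical helper)."""
    try:
        return aggregate()
    except Exception as exc:
        # Log with the affected IDs, then re-raise so the caller can decide
        # whether to retry or mark the documents as failed.
        logger.exception("Status aggregation failed for documents %s", document_ids)
        raise RuntimeError(
            f"aggregation failed for {len(document_ids)} document(s)"
        ) from exc
```

Re-raising with "from exc" keeps the original database error available as __cause__, so callers get added context without the underlying traceback being hidden.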

shaohuzhang1 merged commit 413fa6f into main on Feb 26, 2025
4 checks passed
shaohuzhang1 deleted the pr@main@fix_generate_related branch on February 26, 2025 03:00