feat: update document index status model and related components #983

Closed: iziang wants to merge 1 commit into main from support/optimize-indexing
@iziang iziang commented Jun 24, 2025

  • Refactored DocumentIndex to use a simplified status model with new states: pending, creating, active, deleting, deletion_in_progress, and failed.
  • Updated related schemas, services, and frontend components to reflect the new status model.
  • Removed old desired_state and actual_state fields, streamlining the index lifecycle management.
  • Adjusted API responses and documentation to align with the new status definitions.
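As a rough illustration of the simplified status model listed above, the six states could be captured in a single enum. This is only a sketch: the class name `DocumentIndexStatus` and the helper `parse_status` are hypothetical, and only the six state names come from the PR description.

```python
from enum import Enum


class DocumentIndexStatus(str, Enum):
    """Hypothetical enum; only the six state names come from the PR description."""

    PENDING = "pending"
    CREATING = "creating"
    ACTIVE = "active"
    DELETING = "deleting"
    DELETION_IN_PROGRESS = "deletion_in_progress"
    FAILED = "failed"


def parse_status(raw: str) -> DocumentIndexStatus:
    """Convert a stored string into the enum; unknown values raise ValueError."""
    return DocumentIndexStatus(raw)
```

A single field of this kind would replace a desired_state/actual_state pair, so any value outside the six states is rejected when it is parsed.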

@apecloud-bot apecloud-bot added the size/XL Denotes a PR that changes 500-999 lines. label Jun 24, 2025
@cursor cursor Bot left a comment

Bug: Callback Signature Mismatch Causes Runtime Error

The on_index_failed callback in aperag.index.reconciler was updated to expect a single index_type (string) and a task_context (dictionary) parameter. However, the _handle_index_failure helper method, used by tasks such as delete_index_task and implicitly by update_document_indexes_workflow, still calls on_index_failed with the old signature: it passes a list of index_types and omits task_context. When an index operation fails and the callback is invoked, the error_msg string lands in the task_context position, so the call fails at runtime: task_context.get('version') raises an AttributeError on the string (or the call itself raises a TypeError if a required argument is missing).

config/celery_tasks.py#L154-L161

def _handle_index_failure(self, document_id: str, index_types: List[str], error_msg: str):
    try:
        from aperag.index.reconciler import index_task_callbacks
        index_task_callbacks.on_index_failed(document_id, index_types, error_msg)
        logger.info(f"Index failure callback executed for {index_types} indexes of document {document_id}")
    except Exception as e:
        logger.warning(f"Failed to execute index failure callback for {document_id}: {e}", exc_info=True)

aperag/tasks/scheduler.py#L270-L278

def schedule_update_index(self, document_id: str, index_types: List[str], **kwargs) -> str:
    """Schedule index update workflow"""
    from config.celery_tasks import update_document_indexes_workflow
    try:
        # Execute workflow and return AsyncResult ID (not calling .get())
        workflow_result = update_document_indexes_workflow(document_id, index_types)
        workflow_id = workflow_result.id  # Use .id instead of .get('workflow_id')

aperag/tasks/scheduler.py#L156-L157

file_path=file_path,
)

config/celery_tasks.py#L272-L277

# Only mark as failed if all retries are exhausted
if self.request.retries >= self.max_retries:
    self._handle_index_failure(document_id, [index_type], result.error)
return result.to_dict()
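One way to reconcile the list-based helper with a per-index callback is to fan out over the index types and forward the task context on every call. The sketch below assumes the callback takes `(document_id, index_type, task_context, error_msg)` in that order; the PR excerpts do not confirm the order, and `notify_index_failure` is a hypothetical helper, not code from the repository.

```python
from typing import Callable, Dict, List


def notify_index_failure(
    on_index_failed: Callable[[str, str, Dict, str], None],
    document_id: str,
    index_types: List[str],
    task_context: Dict,
    error_msg: str,
) -> None:
    """Invoke the failure callback once per index type.

    Assumed callback signature (not confirmed by the PR):
    on_index_failed(document_id, index_type, task_context, error_msg)
    """
    for index_type in index_types:
        # Pass a single index type and the context, never the whole list.
        on_index_failed(document_id, index_type, task_context, error_msg)
```

Taking the callback as a parameter keeps the helper testable with a recording stub in place of the real reconciler.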



Bug: Celery Task Callbacks Missing Context Parameter

The _handle_index_success and _handle_index_failure methods in config/celery_tasks.py call index_task_callbacks.on_index_created and on_index_failed with outdated signatures. These callbacks now require a task_context parameter, so a call that omits it either raises a TypeError for a missing argument or shifts the remaining arguments into the wrong positions. Specifically, _handle_index_success passes index_data_json (a string) in the task_context position, leading to an AttributeError when task_context.get('version') is called.

config/celery_tasks.py#L125-L161

abstract = True

def _handle_index_success(self, document_id: str, index_type: str, index_data: dict = None):
    try:
        from aperag.index.reconciler import index_task_callbacks
        index_data_json = json.dumps(index_data) if index_data else None
        index_task_callbacks.on_index_created(document_id, index_type, index_data_json)
        logger.info(f"Index success callback executed for {index_type} index of document {document_id}")
    except Exception as e:
        logger.warning(f"Failed to execute index success callback for {index_type} of {document_id}: {e}", exc_info=True)

def _handle_index_success_with_context(self, document_id: str, index_type: str, task_context: dict, index_data: dict = None):
    try:
        from aperag.index.reconciler import index_task_callbacks
        index_data_json = json.dumps(index_data) if index_data else None
        index_task_callbacks.on_index_created(document_id, index_type, task_context, index_data_json)
        version = task_context.get('version', 'unknown')
        logger.info(f"Index success callback executed for {index_type} index of document {document_id} version {version}")
    except Exception as e:
        version = task_context.get('version', 'unknown')
        logger.warning(f"Failed to execute index success callback for {index_type} of {document_id} version {version}: {e}", exc_info=True)

def _handle_index_deletion_success(self, document_id: str, index_type: str):
    try:
        from aperag.index.reconciler import index_task_callbacks
        index_task_callbacks.on_index_deleted(document_id, index_type)
        logger.info(f"Index deletion callback executed for {index_type} index of document {document_id}")
    except Exception as e:
        logger.warning(f"Failed to execute index deletion callback for {index_type} of {document_id}: {e}", exc_info=True)

def _handle_index_failure(self, document_id: str, index_types: List[str], error_msg: str):
    try:
        from aperag.index.reconciler import index_task_callbacks
        index_task_callbacks.on_index_failed(document_id, index_types, error_msg)
        logger.info(f"Index failure callback executed for {index_types} indexes of document {document_id}")
    except Exception as e:
        logger.warning(f"Failed to execute index failure callback for {document_id}: {e}", exc_info=True)

aperag/tasks/scheduler.py#L130-L131

def schedule_create_index(self, document_id: str, index_types: List[str], **kwargs) -> str:
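The two failure modes named in this report can be reproduced in isolation. The stand-in below only imitates the shape of the callback; its parameter order is an assumption, and `describe_failure` is a hypothetical helper added for the demonstration.

```python
def on_index_created(document_id, index_type, task_context, index_data=None):
    """Stand-in for the real callback; the parameter order is an assumption."""
    return task_context.get("version", "unknown")


def describe_failure(*args):
    """Call the stand-in and report which exception type, if any, is raised."""
    try:
        on_index_created(*args)
    except (TypeError, AttributeError) as exc:
        return type(exc).__name__
    return "ok"
```

Under these assumptions, a JSON string in the `task_context` slot fails on `.get` with an AttributeError, while leaving the argument out entirely fails earlier with a TypeError for the missing positional parameter.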



Bug: TaskScheduler Method Signature Mismatch

The LocalTaskScheduler.schedule_create_index method signature does not match the updated abstract TaskScheduler.schedule_create_index. The abstract method expects (index_types: List[DocumentIndexType], document_id: str, task_context: dict, task_id: str = None), while the LocalTaskScheduler implementation retains the old signature (document_id: str, index_types: List[str], **kwargs). This mismatch will cause a TypeError at runtime.

aperag/tasks/scheduler.py#L37-L54

@abstractmethod
def schedule_create_index(
    self, index_types: List[DocumentIndexType], document_id: str, task_context: dict, task_id: str = None
):
    """
    Schedule single index creation task

    Args:
        document_id: Document ID to process
        index_types: List of index types (vector, fulltext, graph)
        task_context: Task metadata context (version, etc.)
        task_id: Task ID for tracking (optional)

    Returns:
        Task ID for tracking
    """
    pass
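Python's `abc` module only checks that an abstract method is overridden, not that the override keeps the same parameters, so drift like this goes unnoticed until a call fails. A small check built on `inspect.signature` can catch it in a unit test. The classes below are simplified stand-ins that use `str` in place of `DocumentIndexType`, and `mismatched_overrides` is a hypothetical helper, not part of the project.

```python
import inspect
from abc import ABC, abstractmethod
from typing import List, Optional


class TaskScheduler(ABC):
    @abstractmethod
    def schedule_create_index(
        self, index_types: List[str], document_id: str, task_context: dict, task_id: Optional[str] = None
    ) -> str:
        ...


class LocalTaskScheduler(TaskScheduler):
    # Old-style signature, mirroring the one quoted in the report.
    def schedule_create_index(self, document_id: str, index_types: List[str], **kwargs) -> str:
        return "local-task"


def mismatched_overrides(base: type, impl: type) -> List[str]:
    """Return names of abstract methods whose parameter names or order differ in impl."""
    mismatched = []
    for name in sorted(base.__abstractmethods__):
        base_params = list(inspect.signature(getattr(base, name)).parameters)
        impl_params = list(inspect.signature(getattr(impl, name)).parameters)
        if base_params != impl_params:
            mismatched.append(name)
    return mismatched
```

Calling `mismatched_overrides(TaskScheduler, LocalTaskScheduler)` flags `schedule_create_index`, because the override reorders the first two parameters and replaces `task_context` and `task_id` with `**kwargs`.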

@iziang iziang closed this Jun 24, 2025