Skip to content

fix(mongodb): add pymongo error handling to classify user vs platform errors#677

Open
aballman wants to merge 1 commit intomainfrom
aballman/mongodb-error-handling
Open

fix(mongodb): add pymongo error handling to classify user vs platform errors#677
aballman wants to merge 1 commit intomainfrom
aballman/mongodb-error-handling

Conversation

@aballman
Copy link
Copy Markdown
Contributor

@aballman aballman commented Mar 30, 2026

Summary

  • Add explicit pymongo exception handling to run_data() and delete_by_record_id() in the MongoDB uploader
  • Pymongo exceptions were falling through to the generic Exception catch in api_generator.py, defaulting to HTTP 500 and being classified as "platform" errors — this was the largest source of SLI misattribution for the DAG Job Completions SLO (933 errors in 7 days)
  • Map pymongo exceptions to appropriate error types: OperationFailure with quota → QuotaError (401), ServerSelectionTimeoutErrorTimeoutError (408), BulkWriteErrorWriteError (400), AutoReconnectDestinationConnectionError (400)

Test plan

  • 8 new unit tests covering all exception mappings for both run_data() and delete_by_record_id()
  • Ruff lint and format pass
  • After deploy: query Elasticsearch to confirm MongoDB uploader errors shift from error_type: "platform" to error_type: "user"

Note

Medium Risk
Changes how MongoDB write/delete failures are surfaced (error type/status code), which may affect downstream error handling and retries. Logic is straightforward and covered by new unit tests, but impacts runtime classification of production failures.

Overview
MongoDB uploader errors are now explicitly classified. MongoDBUploader.run_data() and delete_by_record_id() catch key pymongo exceptions and re-raise them as typed ingest errors (QuotaError, TimeoutError, WriteError, DestinationConnectionError) instead of falling through as generic failures.

Adds a focused unit test suite (test_mongodb_errors.py) to assert the exception-to-error mapping for both insert and delete paths.

Written by Cursor Bugbot for commit 75eb740. This will update automatically on new commits. Configure here.

… errors

Pymongo exceptions in run_data() and delete_by_record_id() were falling
through to the generic Exception catch in api_generator.py, defaulting
to HTTP 500 and being classified as "platform" errors. This was the
largest source of SLI misattribution (933 errors in 7 days).

Add explicit try/except for pymongo-specific exceptions:
- OperationFailure with "quota" → QuotaError (401)
- OperationFailure (other) → DestinationConnectionError (400)
- ServerSelectionTimeoutError → TimeoutError (408)
- AutoReconnect → DestinationConnectionError (400)
- BulkWriteError → WriteError (400)
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

except AutoReconnect as e:
raise DestinationConnectionError(f"MongoDB connection lost: {e}") from e
except Exception as e:
raise DestinationConnectionError(f"MongoDB error: {e}") from e
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catch-all swallows correctly classified errors from delete_by_record_id

High Severity

The except Exception catch-all in run_data re-wraps already-classified exceptions from delete_by_record_id as DestinationConnectionError. When delete_by_record_id raises QuotaError or TimeoutError, those propagate into run_data's try block, skip all the pymongo-specific except handlers, and get caught by the generic except Exception — converting them back to DestinationConnectionError. This directly undermines the PR's goal of correct error classification. The catch-all needs to re-raise UnstructuredIngestError subclasses before falling through to the generic handler.

Additional Locations (1)
Fix in Cursor Fix in Web

Copy link
Copy Markdown
Contributor

@brndnblck brndnblck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bugbot flagged a real issue that needs addressing:

except Exception swallows correctly classified errors

When delete_by_record_id() raises QuotaError or TimeoutError, those exceptions propagate into run_data()'s try block, skip the pymongo-specific handlers (since they're no longer pymongo exceptions), and get caught by the generic except Exception - converting them back to DestinationConnectionError. This directly undermines the PR's goal.

Fix by re-raising already-classified errors before the catch-all:

except (QuotaError, TimeoutError, WriteError, DestinationConnectionError):
    raise  # Re-raise already-classified errors
except Exception as e:
    raise DestinationConnectionError(f"MongoDB error: {e}") from e

Or if UnstructuredIngestError is the base class:

except UnstructuredIngestError:
    raise
except Exception as e:
    raise DestinationConnectionError(f"MongoDB error: {e}") from e

Also worth adding a test that exercises the delete -> error -> propagation path to catch this regression.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants