Fix: forward aggregate report_type to TriggerReconAggregateService by moomindani · Pull Request #2401 · databrickslabs/lakebridge

moomindani · 2026-05-01T04:42:05Z

Changes

What does this PR do?

Forwards report_type=="aggregate" from TriggerReconService.trigger_recon to TriggerReconAggregateService.trigger_recon_aggregates so programmatic callers that pass an aggregate report type get the actual aggregate comparison instead of a silent no-op.

Relevant implementation details

TriggerReconService.trigger_recon validates report_type against _RECON_REPORT_TYPES = {"schema","data","row","all","aggregate"}, so "aggregate" is accepted at entry. However, _do_recon_one only branches on {"schema","all"} and {"data","row","all"}. The aggregate case falls through, no comparison runs, and recon_capture.start() records the run with status=true. Callers see a successful reconciliation even when source and target differ.

The CLI dispatch in execute.py already routes AGG_RECONCILE_OPERATION_NAME to TriggerReconAggregateService.trigger_recon_aggregates, but programmatic users of the public Reconcile API (notebooks, integrations) hit the silent no-op without this change.

This PR adds an early dispatch at the top of trigger_recon. The import is deferred because trigger_recon_aggregate_service already imports trigger_recon_service and a top-level import would be circular.

Caveats/things to watch out for when reviewing:

Deferred import inside the function — done to avoid the circular import; behavior identical to execute.py's existing dispatch
All existing report types (schema, data, row, all) keep the original path

Linked issues

None

Functionality

added relevant user documentation
added new CLI command
modified existing command: programmatic API of TriggerReconService.trigger_recon

Tests

manually tested
added unit tests
added integration tests

Manual end-to-end on Databricks Serverless with two tables of different row counts:

Before: aggregate=None, status=true, no rows in recon.aggregate_*
After: aggregate=False, mismatch=1, rows present in recon.aggregate_metrics

Unit tests added in tests/unit/reconcile/test_verify_reconcile.py:

test_trigger_recon_dispatches_aggregate_to_aggregate_service: forwarding works and the data-path setup is skipped
test_trigger_recon_does_not_dispatch_non_aggregate: schema / data / row / all keep the original path

TriggerReconService.trigger_recon validates that report_type is in {"schema","data","row","all","aggregate"} via _RECON_REPORT_TYPES, so "aggregate" is accepted at the entry point. However, _do_recon_one only branches on {"schema","all"} and {"data","row","all"}; the "aggregate" case falls through, no comparison is performed, and recon_capture.start() records the run with status=true. Callers see a successful reconciliation even when source and target differ. The aggregate path lives in TriggerReconAggregateService. trigger_recon_aggregates. CLI dispatch in execute.py already routes the AGG_RECONCILE_OPERATION_NAME to that service, but programmatic callers of TriggerReconService.trigger_recon (e.g. notebooks built around the public Reconcile API) hit the silent no-op. Forward report_type=="aggregate" to TriggerReconAggregateService at the top of trigger_recon. The import is deferred because trigger_recon_aggregate_service already imports trigger_recon_service. Co-authored-by: Isaac

Cover the routing introduced in this branch: - report_type='aggregate' is forwarded to TriggerReconAggregateService.trigger_recon_aggregates without going through create_recon_dependencies of the data path - The aggregate-specific kwargs (table_recon, reconcile_config, local_test_run) are passed through unchanged - All other report_types ('schema', 'data', 'row', 'all') keep the existing path and never call the aggregate service Without the dispatch, the existing _do_recon_one short-circuits silently for report_type='aggregate' and the run is recorded as successful. The first test reproduces the original behavior (no aggregate forward) and fails on origin/main. Co-authored-by: Isaac

m-abulazm

why do we need this? notebook/python users should use TriggerReconAggregateService.trigger_recon_aggregates

moomindani · 2026-05-07T00:42:53Z

Fair question — TriggerReconAggregateService.trigger_recon_aggregates does exist as a direct entry point, so this isn't about adding new capability. The motivation is consistency / discoverability of the public TriggerReconService.trigger_recon surface.

trigger_recon already accepts report_type in {"schema","data","row","all","aggregate"} (validated against _RECON_REPORT_TYPES), and for data / row / all / schema it does the right thing. aggregate is the only value that's accepted at the entrypoint but silently no-ops downstream — _do_recon_one only branches on {"schema","all"} and {"data","row","all"}, so the aggregate case falls through and recon_capture.start() records the run with status=true even when source/target differ.

We've actually seen users get caught by this in practice — they followed the same trigger_recon(..., report_type=...) pattern they already use for data/row, passed "aggregate", got back a successful-looking result, and didn't realize no comparison had run. The CLI dispatch in execute.py already routes the aggregate case to TriggerReconAggregateService, so this PR just brings the programmatic API in line with that, keeping a single trigger_recon entry point that behaves uniformly across all accepted report_type values.

Happy to take an alternate fix if you'd prefer — e.g., narrowing _RECON_REPORT_TYPES in trigger_recon to exclude "aggregate" and explicitly documenting that aggregate users must call TriggerReconAggregateService directly. The current shape was chosen to avoid breaking callers that already pass "aggregate".

…ispatch-in-trigger-recon # Conflicts: # src/databricks/labs/lakebridge/reconcile/trigger_recon_service.py

moomindani · 2026-05-08T15:52:05Z

Reopened as #2423 on an upstream branch to bypass the fork-PR OIDC restriction on JFrog auth (CI cannot run on fork PRs). All review comments here are preserved as history; further discussion will happen on #2423.

moomindani added 2 commits May 1, 2026 11:27

moomindani requested a review from a team as a code owner May 1, 2026 04:42

moomindani had a problem deploying to tool May 1, 2026 04:42 — with GitHub Actions Failure

m-abulazm requested changes May 1, 2026

View reviewed changes

moomindani mentioned this pull request May 7, 2026

Feat: support agg_columns=["*"] for COUNT(*) in aggregate reconcile #2403

Closed

6 tasks

Merge remote-tracking branch 'origin/main' into fix/recon-aggregate-d…

6f6889e

…ispatch-in-trigger-recon # Conflicts: # src/databricks/labs/lakebridge/reconcile/trigger_recon_service.py

moomindani had a problem deploying to tool May 7, 2026 06:15 — with GitHub Actions Failure

moomindani mentioned this pull request May 8, 2026

Fix: forward aggregate report_type to TriggerReconAggregateService #2423

Open

6 tasks

moomindani temporarily deployed to tool May 8, 2026 15:52 — with GitHub Actions Inactive

moomindani closed this May 8, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: forward aggregate report_type to TriggerReconAggregateService#2401

Fix: forward aggregate report_type to TriggerReconAggregateService#2401
moomindani wants to merge 3 commits intodatabrickslabs:mainfrom
moomindani:fix/recon-aggregate-dispatch-in-trigger-recon

moomindani commented May 1, 2026

Uh oh!

m-abulazm left a comment

Uh oh!

moomindani commented May 7, 2026

Uh oh!

moomindani commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

moomindani commented May 1, 2026

Changes

What does this PR do?

Relevant implementation details

Caveats/things to watch out for when reviewing:

Linked issues

Functionality

Tests

Uh oh!

m-abulazm left a comment

Choose a reason for hiding this comment

Uh oh!

moomindani commented May 7, 2026

Uh oh!

moomindani commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants