Skip to content

Region normalization fix#46792

Open
dibahlfi wants to merge 3 commits intohotfix/azure-cosmos_4.14.6from
users/dibahl/region-normalization-fix
Open

Region normalization fix#46792
dibahlfi wants to merge 3 commits intohotfix/azure-cosmos_4.14.6from
users/dibahl/region-normalization-fix

Conversation

@dibahlfi
Copy link
Copy Markdown
Member

@dibahlfi dibahlfi commented May 8, 2026

When a customer configures a Cosmos client, they pass region names as strings (preferred_locations, excluded_locations). The SDK previously did exact string comparisons against the canonical names returned by the account ("East US 2", "West US 3", ...).
A small spelling difference — "eastus2", "east-us-2", "east_us_2" — silently failed to match, the entry was dropped, and the client could end up routing all traffic through the global endpoint instead of the regional pool.

What this change does:
Region matching is now tolerant of case, surrounding/internal whitespace, hyphens, and underscores.
Equivalent inputs all resolve to the same region:
"East US 2"
"east us 2"
"eastus2"
"EASTUS2"
"east-us-2"
"east_us_2"
" EastUs2 "

Anything beyond that (punctuation, digits, fuzzy matching) is intentionally not stripped — a more aggressive rule could collapse genuinely different regions like "East US" and "East US 2" into the same key and silently route to the wrong region. All client-supplied region-name strings — client-level preferred, client-level excluded, and request-level excluded are normalized

The same matching rule is applied wherever the customer's region string is consumed. Previously some paths used exact-string comparisons; now they all share one normalization rule:

Routing: which region serves a request (preferred + excluded, client-level + per-request).
Refresh decision: whether to schedule a background refresh based on the most-preferred region.
Bootstrap fallback URL: when the global endpoint can't be reached at startup and the SDK constructs a regional URL from a preferred region.

Misconfigured region names now produce a visible warning that names the dropped entry and the regions that were available, emitted at config time — when account metadata is processed at startup and on each background refresh.
There is no warning for per-request values as those happen thousands of times and its a lower blast radius.

No public API change - preferred_locations and excluded_locations remain plain list[str].

Behavior examples
Account regions: ["East US 2", "West US 3"]

Customer input Before After
preferred_locations=["East US 2"] routed to East US 2 unchanged — routed to East US 2
preferred_locations=["eastus2"] dropped silently → fell back to global endpoint routed to East US 2
preferred_locations=["east-us-2", "west_us_3"] both dropped silently → fell back to global endpoint routed to East US 2 with West US 3 as secondary
excluded_locations=["eastus2"] exclusion silently ignored East US 2 actually excluded
preferred_locations=["eastus3"] (typo, not in account) dropped silently dropped, with a warning naming the unmatched entry and the regions that were available
Bootstrap fallback with preferred_locations=["east-us-2"] constructed contoso-east-us-2.documents.azure.com (wrong host) constructs contoso-eastus2.documents.azure.com

Backwards compatibility:
If client config already uses the exact spelling Azure returns (e.g., "East US 2"), nothing changes
New Azure regions continue to work without an SDK upgrade; nothing in this change hardcodes a region list.

What this PR does not do:
Does not add a Regions constants surface. Adding a named-constant list of well-known regions would expand the public API surface area. Normalization plus a visible warning is enough to close the failure mode without touching the public surface today. Constants are therefore deferred to a later stable release as a separate, additive change.

@dibahlfi
Copy link
Copy Markdown
Member Author

dibahlfi commented May 8, 2026

/azp run python - cosmos - tests

@dibahlfi
Copy link
Copy Markdown
Member Author

dibahlfi commented May 8, 2026

@sdkReviewAgent-2

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@dibahlfi
Copy link
Copy Markdown
Member Author

dibahlfi commented May 8, 2026

@sdkReviewAgent-2

@dibahlfi
Copy link
Copy Markdown
Member Author

dibahlfi commented May 8, 2026

/azp run python - cosmos - tests

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Comment thread sdk/cosmos/azure-cosmos/tests/test_location_cache.py
Comment thread sdk/cosmos/azure-cosmos/azure/cosmos/_location_cache.py
Comment thread sdk/cosmos/azure-cosmos/azure/cosmos/_location_cache.py
Comment thread sdk/cosmos/azure-cosmos/azure/cosmos/_location_cache.py Outdated
Comment thread sdk/cosmos/azure-cosmos/azure/cosmos/_location_cache.py Outdated
@xinlian12
Copy link
Copy Markdown
Member

Review complete (39:06)

Posted 5 inline comment(s).

Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage

@dibahlfi
Copy link
Copy Markdown
Member Author

dibahlfi commented May 8, 2026

/azp run python - cosmos - tests

@dibahlfi
Copy link
Copy Markdown
Member Author

dibahlfi commented May 8, 2026

@sdkReviewAgent-2

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

# Most-preferred is already the primary; no background refresh should be triggered.
assert lc.should_refresh_endpoints() is False

def test_get_locational_endpoint_normalizes_customer_region_string(self):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Test Coverage: Missing collision-safety test for normalization invariant

The PR description explicitly notes: "a more aggressive rule could collapse genuinely different regions like 'East US' and 'East US 2' into the same key." This is the critical design invariant — but no test guards it.

All normalization tests use "East US 2" in different spellings. None verify that structurally similar but distinct regions (e.g., "East US" vs "East US 2", "West US" vs "West US 2" vs "West US 3") remain distinct after normalization. If a future refactor of _normalize_region_name accidentally strips digits, traffic could silently route to the wrong region with no test catching it.

Suggestion: Add a direct unit test for the normalization boundary:

from azure.cosmos._location_cache import _normalize_region_name

def test_normalize_preserves_distinct_regions():

    # Regions that differ only by trailing digit must stay distinct
    assert _normalize_region_name("East US") != _normalize_region_name("East US 2")
    assert _normalize_region_name("West US") != _normalize_region_name("West US 2")
    assert _normalize_region_name("West US 2") != _normalize_region_name("West US 3")

    # Verify exact canonical outputs
    assert _normalize_region_name("East US 2") == "eastus2"
    assert _normalize_region_name("East US") == "eastus"
    assert _normalize_region_name(None) == ""

This is cheap to add and protects the most important invariant in the PR.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

]
assert len(unmatched_logs) == 1

def test_unmatched_preferred_locations_warning_is_deduped(self, caplog):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Test Coverage: Warning re-emission on topology change is untested

The dedupe key at _emit_config_mismatch_warning_once intentionally includes tuple(sorted(available_by_normalized.keys())) so that warnings re-fire when the account topology changes (e.g., a new region appears). This is the right design — if the available regions change, the user should be re-notified that their config still doesn't match.

However, both test_unmatched_excluded_locations_warning_is_deduped and this test only exercise suppression (same db_acc repeated). Neither verifies the positive case: that a different topology does re-emit the warning (count == 2).

Suggestion: Add a test that changes the available regions between calls:

def test_topology_change_retriggers_mismatch_warning(self, caplog):
    lc = refresh_location_cache(["unknown-region"], True)
    db_acc1 = create_database_account_with_canonical_regions(True)
    db_acc2 = create_database_account_with_canonical_regions(True)

    # Mutate topology for the second refresh
    db_acc2._ReadableLocations.append(
        {"name": "New Region", "databaseAccountEndpoint": "https://newregion.documents.azure.com"}
    )
    with caplog.at_level("WARNING", logger="azure.cosmos.LocationCache"):
        lc.perform_on_database_account_read(db_acc1)
        lc.perform_on_database_account_read(db_acc2)
    warnings = [r for r in caplog.records if "Ignoring" in r.getMessage()]
    assert len(warnings) == 2  # One per distinct topology

This protects the deliberate dedupe-key design choice against accidental simplification.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

@xinlian12
Copy link
Copy Markdown
Member

Review complete (41:45)

Posted 2 inline comment(s).

Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants