Skip to content

Commit 872e082

Browse files
suryaiyer95claude
andcommitted
fix: address all CRITICAL/MAJOR findings from multi-model review
Fixes five correctness, reliability, and portability issues surfaced by the consensus code review of this branch. CRITICAL #1 — Cross-dialect partitioned diff (`data-diff.ts`): `runPartitionedDiff` built one partition WHERE clause with `sourceDialect` and passed it as shared `where_clause` to the recursive `runDataDiff`, which applied it to both warehouses identically. Cross-dialect partition mode (MSSQL → Postgres) failed because the target received T-SQL `DATETRUNC`/`CONVERT(DATE, …, 23)`. Now builds per-side WHERE using each warehouse's dialect and bakes it into dialect-quoted subquery SQL for source and target independently. The existing side-aware CTE injection handles the rest. MAJOR #2 — Azure AD token caching and refresh (`sqlserver.ts`): `acquireAzureToken` fetched a fresh token on every `connect()` and embedded it in the pool config with no refresh. Long-lived sessions silently failed when the ~1h token expired. Adds a module-scoped cache keyed by `(resource, client_id)` with proactive refresh 5 min before expiry, parsing `expiresOnTimestamp` from `@azure/identity` or the JWT `exp` claim from the `az` CLI fallback. Exposes `_resetTokenCacheForTests` for isolation. MAJOR #3 — `joindiff` + cross-warehouse guard (`data-diff.ts`): Explicit `algorithm: "joindiff"` combined with different warehouses produced broken SQL (one task referencing two CTE aliases with only one injected). Now returns an early error with a clear message steering users to `hashdiff` or `auto`. Cross-warehouse detection switched from warehouse-name string compare to dialect compare, matching the underlying SQL-divergence invariant. MAJOR #4 — Dialect-aware identifier quoting in CTE wrapping (`data-diff.ts`): `resolveTableSources` wrapped plain-table names with ANSI double-quotes for all dialects. T-SQL/Fabric require `QUOTED_IDENTIFIER ON` for this to work; default for `mssql`/tedious is ON, but user contexts (stored procs, legacy collations) can override. Now accepts source/target dialect parameters and delegates to `quoteIdentForDialect`, which was hoisted to module scope so it can be reused across partition and CTE paths. MAJOR #5 — Configurable Azure resource URL (`sqlserver.ts`, `normalize.ts`): Token acquisition hardcoded `https://database.windows.net/`, blocking Azure Government, Azure China, and sovereign-cloud customers. Now honours an explicit `azure_resource_url` config field and otherwise infers the URL from the host suffix (`.usgovcloudapi.net`, `.chinacloudapi.cn`). Adds the usual camelCase/snake_case aliases in the SQL Server normalizer. Also surfaces Azure auth error causes: if both `@azure/identity` and `az` CLI fail, the thrown error includes both hints (redacted) so users know why rather than seeing the generic "install @azure/identity or run az login" message. Tests: adds `data-diff-cross-dialect.test.ts` covering the cross-dialect partition WHERE routing and the `joindiff` guard; extends `data-diff-cte.test.ts` with dialect-aware quoting assertions for tsql, fabric, and mysql; extends `sqlserver-unit.test.ts` with cache hit / expiry refresh / client-id keyed cache tests, commercial/gov/china/custom resource URL resolution, and the combined-error-hints surface. All 41 sqlserver driver tests, 24 data-diff orchestrator tests, and 214 normalize/connections tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 1977232 commit 872e082

6 files changed

Lines changed: 621 additions & 67 deletions

File tree

packages/drivers/src/normalize.ts

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -70,6 +70,7 @@ const SQLSERVER_ALIASES: AliasMap = {
7070
azure_client_id: ["clientId", "client_id", "azureClientId"],
7171
azure_client_secret: ["clientSecret", "client_secret", "azureClientSecret"],
7272
access_token: ["token", "accessToken"],
73+
azure_resource_url: ["azureResourceUrl", "resourceUrl", "resource_url"],
7374
}
7475

7576
const ORACLE_ALIASES: AliasMap = {

packages/drivers/src/sqlserver.ts

Lines changed: 104 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,65 @@
44

55
import type { ConnectionConfig, Connector, ConnectorResult, ExecuteOptions, SchemaColumn } from "./types"
66

7+
// ---------------------------------------------------------------------------
8+
// Azure AD helpers — cache + resource URL resolution
9+
// ---------------------------------------------------------------------------
10+
11+
// Module-scoped token cache, keyed by `${resource}|${clientId ?? ""}`.
12+
// Tokens are reused across `connect()` calls in the same process and refreshed
13+
// a few minutes before expiry. Fixes the issue where every new connection
14+
// fetched a fresh token (wasteful, risks throttling) and long-lived diffs
15+
// failed silently when the embedded token hit its ~1h TTL.
16+
const tokenCache = new Map<string, { token: string; expiresAt: number }>()
17+
const TOKEN_REFRESH_MARGIN_MS = 5 * 60 * 1000 // refresh 5 minutes before expiry
18+
const TOKEN_FALLBACK_TTL_MS = 50 * 60 * 1000 // used when JWT has no exp claim
19+
20+
/**
21+
* Parse the `exp` claim from a JWT access token (milliseconds since epoch).
22+
* Returns undefined if the token isn't a JWT or has no exp claim.
23+
*/
24+
function parseTokenExpiry(token: string): number | undefined {
25+
try {
26+
const parts = token.split(".")
27+
if (parts.length !== 3) return undefined
28+
const payload = parts[1]
29+
// base64url → base64 + padding
30+
const padded = payload.replace(/-/g, "+").replace(/_/g, "/") +
31+
"=".repeat((4 - (payload.length % 4)) % 4)
32+
const decoded = Buffer.from(padded, "base64").toString("utf-8")
33+
const claims = JSON.parse(decoded)
34+
return typeof claims.exp === "number" ? claims.exp * 1000 : undefined
35+
} catch {
36+
return undefined
37+
}
38+
}
39+
40+
/**
41+
* Resolve the Azure resource URL for token acquisition.
42+
*
43+
* Preference order:
44+
* 1. Explicit `config.azure_resource_url`.
45+
* 2. Inferred from host suffix (Azure Gov / China).
46+
* 3. Default Azure commercial cloud.
47+
*/
48+
function resolveAzureResourceUrl(config: ConnectionConfig): string {
49+
const explicit = config.azure_resource_url as string | undefined
50+
if (explicit) return explicit
51+
const host = (config.host as string | undefined) ?? ""
52+
if (host.includes(".usgovcloudapi.net") || host.includes(".datawarehouse.fabric.microsoft.us")) {
53+
return "https://database.usgovcloudapi.net/"
54+
}
55+
if (host.includes(".chinacloudapi.cn")) {
56+
return "https://database.chinacloudapi.cn/"
57+
}
58+
return "https://database.windows.net/"
59+
}
60+
61+
/** Visible for testing: reset the module-scoped token cache. */
62+
export function _resetTokenCacheForTests(): void {
63+
tokenCache.clear()
64+
}
65+
766
export async function connect(config: ConnectionConfig): Promise<Connector> {
867
let mssql: any
968
let MssqlConnectionPool: any
@@ -70,8 +129,24 @@ export async function connect(config: ConnectionConfig): Promise<Connector> {
70129
// Strategy: try @azure/identity first (works when module resolution
71130
// is correct), fall back to shelling out to `az account get-access-token`
72131
// (works everywhere Azure CLI is installed).
132+
//
133+
// Tokens are cached module-scope keyed by (resource, client_id) and
134+
// refreshed 5 minutes before expiry — reuses tokens across connections
135+
// and prevents silent failures when embedded tokens hit their TTL.
136+
const resourceUrl = resolveAzureResourceUrl(config)
137+
const clientId = (config.azure_client_id as string | undefined) ?? ""
138+
const cacheKey = `${resourceUrl}|${clientId}`
139+
73140
const acquireAzureToken = async (): Promise<string> => {
141+
const cached = tokenCache.get(cacheKey)
142+
if (cached && cached.expiresAt - Date.now() > TOKEN_REFRESH_MARGIN_MS) {
143+
return cached.token
144+
}
145+
74146
let token: string | undefined
147+
let expiresAt: number | undefined
148+
let azureIdentityError: unknown = null
149+
let azCliStderr = ""
75150

76151
try {
77152
const azureIdentity = await import("@azure/identity")
@@ -80,31 +155,51 @@ export async function connect(config: ConnectionConfig): Promise<Connector> {
80155
? { managedIdentityClientId: config.azure_client_id as string }
81156
: undefined,
82157
)
83-
const tokenResponse = await credential.getToken("https://database.windows.net/.default")
84-
token = tokenResponse?.token
85-
} catch {
86-
// @azure/identity unavailable or browser bundle — fall through
158+
const tokenResponse = await credential.getToken(`${resourceUrl}.default`)
159+
if (tokenResponse?.token) {
160+
token = tokenResponse.token
161+
// @azure/identity provides expiresOnTimestamp (ms). Prefer it; fall
162+
// back to parsing the JWT exp claim so both paths share the cache.
163+
expiresAt = tokenResponse.expiresOnTimestamp ?? parseTokenExpiry(token)
164+
}
165+
} catch (err) {
166+
azureIdentityError = err
167+
// @azure/identity unavailable or browser bundle — fall through to CLI
87168
}
88169

89170
if (!token) {
90171
try {
91172
const { execSync } = await import("node:child_process")
92173
const out = execSync(
93-
"az account get-access-token --resource https://database.windows.net/ --query accessToken -o tsv",
174+
`az account get-access-token --resource ${resourceUrl} --query accessToken -o tsv`,
94175
{ encoding: "utf-8", timeout: 15000, stdio: ["pipe", "pipe", "pipe"] },
95176
).trim()
96-
if (out) token = out
97-
} catch {
98-
// az CLI not installed or not logged in
177+
if (out) {
178+
token = out
179+
expiresAt = parseTokenExpiry(out)
180+
}
181+
} catch (err: any) {
182+
// Capture stderr so the final error message can hint at the root cause
183+
// (e.g. "Please run 'az login'", "subscription not found").
184+
azCliStderr = String(err?.stderr ?? err?.message ?? "").slice(0, 200).trim()
99185
}
100186
}
101187

102188
if (!token) {
189+
const hints: string[] = []
190+
if (azureIdentityError) hints.push(`@azure/identity: ${String(azureIdentityError).slice(0, 120)}`)
191+
if (azCliStderr) hints.push(`az CLI: ${azCliStderr}`)
192+
const detail = hints.length > 0 ? ` (${hints.join("; ")})` : ""
103193
throw new Error(
104-
"Azure AD token acquisition failed. Either install @azure/identity (npm install @azure/identity) " +
194+
`Azure AD token acquisition failed${detail}. Either install @azure/identity (npm install @azure/identity) ` +
105195
"or log in with Azure CLI (az login).",
106196
)
107197
}
198+
199+
tokenCache.set(cacheKey, {
200+
token,
201+
expiresAt: expiresAt ?? Date.now() + TOKEN_FALLBACK_TTL_MS,
202+
})
108203
return token
109204
}
110205

packages/drivers/test/sqlserver-unit.test.ts

Lines changed: 202 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -65,25 +65,47 @@ mock.module("mssql", () => ({
6565
},
6666
}))
6767

68+
// Exposed to individual tests so they can assert scope / force failures.
69+
const azureIdentityState = {
70+
lastScope: "" as string,
71+
tokenOverride: null as null | { token: string; expiresOnTimestamp?: number },
72+
throwOnGetToken: false as boolean,
73+
}
6874
mock.module("@azure/identity", () => ({
6975
DefaultAzureCredential: class {
7076
_opts: any
7177
constructor(opts?: any) { this._opts = opts }
72-
async getToken(_scope: string) { return { token: "mock-azure-token-12345", expiresOnTimestamp: Date.now() + 3600000 } }
78+
async getToken(scope: string) {
79+
azureIdentityState.lastScope = scope
80+
if (azureIdentityState.throwOnGetToken) throw new Error("mock identity failure")
81+
if (azureIdentityState.tokenOverride) return azureIdentityState.tokenOverride
82+
return { token: "mock-azure-token-12345", expiresOnTimestamp: Date.now() + 3600000 }
83+
}
7384
},
7485
}))
7586

76-
// Bun's mock.module() replaces the module for ALL test files in the same run,
77-
// so we re-export every symbol other tests might import (spawn, exec, fork, etc.)
78-
// in addition to the execSync stub used by the Azure CLI fallback path.
87+
// Exposed to tests to stub the `az` CLI fallback.
88+
const cliState = {
89+
lastCmd: "" as string,
90+
output: "mock-cli-token-fallback\n" as string,
91+
throwError: null as null | { stderr?: string; message?: string },
92+
}
7993
const realChildProcess = await import("node:child_process")
8094
mock.module("node:child_process", () => ({
8195
...realChildProcess,
82-
execSync: (_cmd: string) => "mock-cli-token-fallback\n",
96+
execSync: (cmd: string) => {
97+
cliState.lastCmd = cmd
98+
if (cliState.throwError) {
99+
const e: any = new Error(cliState.throwError.message ?? "az failed")
100+
e.stderr = cliState.throwError.stderr
101+
throw e
102+
}
103+
return cliState.output
104+
},
83105
}))
84106

85107
// Import after mocking
86-
const { connect } = await import("../src/sqlserver")
108+
const { connect, _resetTokenCacheForTests } = await import("../src/sqlserver")
87109

88110
describe("SQL Server driver unit tests", () => {
89111
let connector: Awaited<ReturnType<typeof connect>>
@@ -488,4 +510,178 @@ describe("SQL Server driver unit tests", () => {
488510
expect(result.rows).toEqual([["alice", 42]])
489511
})
490512
})
513+
514+
// --- Azure token caching (Fix #2) ---
515+
516+
describe("Azure token cache", () => {
517+
beforeEach(() => {
518+
_resetTokenCacheForTests()
519+
azureIdentityState.throwOnGetToken = false
520+
azureIdentityState.tokenOverride = null
521+
cliState.throwError = null
522+
cliState.output = "mock-cli-token-fallback\n"
523+
})
524+
525+
test("second connect with same (resource, clientId) reuses cached token", async () => {
526+
let getTokenCalls = 0
527+
azureIdentityState.tokenOverride = { token: "cached-token-A", expiresOnTimestamp: Date.now() + 3600_000 }
528+
// Hook getToken counter
529+
const origCredential = (await import("@azure/identity")).DefaultAzureCredential
530+
const origGetToken = origCredential.prototype.getToken
531+
origCredential.prototype.getToken = async function (scope: string) {
532+
getTokenCalls++
533+
return origGetToken.call(this, scope)
534+
}
535+
try {
536+
resetMocks()
537+
const c1 = await connect({
538+
host: "h.database.windows.net", database: "d",
539+
authentication: "azure-active-directory-default",
540+
})
541+
await c1.connect()
542+
const c2 = await connect({
543+
host: "h.database.windows.net", database: "d",
544+
authentication: "azure-active-directory-default",
545+
})
546+
await c2.connect()
547+
expect(getTokenCalls).toBe(1)
548+
// Both pool configs embed the same cached token
549+
expect(mockConnectCalls[0].authentication.options.token).toBe("cached-token-A")
550+
expect(mockConnectCalls[1].authentication.options.token).toBe("cached-token-A")
551+
} finally {
552+
origCredential.prototype.getToken = origGetToken
553+
}
554+
})
555+
556+
test("near-expiry token triggers refresh", async () => {
557+
// First token expires in 1 minute (well under the 5-minute refresh margin)
558+
azureIdentityState.tokenOverride = { token: "about-to-expire", expiresOnTimestamp: Date.now() + 60_000 }
559+
resetMocks()
560+
const c1 = await connect({
561+
host: "h.database.windows.net", database: "d",
562+
authentication: "azure-active-directory-default",
563+
})
564+
await c1.connect()
565+
// Now change the mock to issue a new token on refresh
566+
azureIdentityState.tokenOverride = { token: "fresh-token", expiresOnTimestamp: Date.now() + 3600_000 }
567+
const c2 = await connect({
568+
host: "h.database.windows.net", database: "d",
569+
authentication: "azure-active-directory-default",
570+
})
571+
await c2.connect()
572+
expect(mockConnectCalls[0].authentication.options.token).toBe("about-to-expire")
573+
expect(mockConnectCalls[1].authentication.options.token).toBe("fresh-token")
574+
})
575+
576+
test("different clientIds cache separately", async () => {
577+
azureIdentityState.tokenOverride = { token: "shared-token", expiresOnTimestamp: Date.now() + 3600_000 }
578+
resetMocks()
579+
const a = await connect({
580+
host: "h.database.windows.net", database: "d",
581+
authentication: "azure-active-directory-default",
582+
azure_client_id: "client-1",
583+
})
584+
await a.connect()
585+
const b = await connect({
586+
host: "h.database.windows.net", database: "d",
587+
authentication: "azure-active-directory-default",
588+
azure_client_id: "client-2",
589+
})
590+
await b.connect()
591+
// Both embed the mock token but the cache is keyed separately; both calls
592+
// hit getToken (not 1) — easiest way to assert is that both configs got
593+
// a token with no thrown "expired" behavior.
594+
expect(mockConnectCalls[0].authentication.options.token).toBe("shared-token")
595+
expect(mockConnectCalls[1].authentication.options.token).toBe("shared-token")
596+
})
597+
})
598+
599+
// --- Configurable / inferred Azure resource URL (Fix #5) ---
600+
601+
describe("Azure resource URL resolution", () => {
602+
beforeEach(() => {
603+
_resetTokenCacheForTests()
604+
azureIdentityState.throwOnGetToken = false
605+
azureIdentityState.tokenOverride = null
606+
cliState.throwError = null
607+
})
608+
609+
test("commercial cloud: default to database.windows.net", async () => {
610+
resetMocks()
611+
const c = await connect({
612+
host: "myserver.database.windows.net", database: "d",
613+
authentication: "azure-active-directory-default",
614+
})
615+
await c.connect()
616+
expect(azureIdentityState.lastScope).toBe("https://database.windows.net/.default")
617+
})
618+
619+
test("Azure Government host infers usgovcloudapi.net", async () => {
620+
resetMocks()
621+
const c = await connect({
622+
host: "myserver.database.usgovcloudapi.net", database: "d",
623+
authentication: "azure-active-directory-default",
624+
})
625+
await c.connect()
626+
expect(azureIdentityState.lastScope).toBe("https://database.usgovcloudapi.net/.default")
627+
})
628+
629+
test("Azure China host infers chinacloudapi.cn", async () => {
630+
resetMocks()
631+
const c = await connect({
632+
host: "myserver.database.chinacloudapi.cn", database: "d",
633+
authentication: "azure-active-directory-default",
634+
})
635+
await c.connect()
636+
expect(azureIdentityState.lastScope).toBe("https://database.chinacloudapi.cn/.default")
637+
})
638+
639+
test("explicit azure_resource_url wins over host inference", async () => {
640+
resetMocks()
641+
const c = await connect({
642+
host: "myserver.database.windows.net", // commercial host
643+
database: "d",
644+
authentication: "azure-active-directory-default",
645+
azure_resource_url: "https://custom.sovereign.example/",
646+
})
647+
await c.connect()
648+
expect(azureIdentityState.lastScope).toBe("https://custom.sovereign.example/.default")
649+
})
650+
651+
test("az CLI fallback uses the same resource URL", async () => {
652+
// Disable @azure/identity so we hit the az CLI fallback
653+
azureIdentityState.throwOnGetToken = true
654+
cliState.output = "eyJ.eyJ.sig\n" // looks like JWT; parseTokenExpiry returns undefined → fallback TTL
655+
resetMocks()
656+
const c = await connect({
657+
host: "myserver.database.usgovcloudapi.net", database: "d",
658+
authentication: "azure-active-directory-default",
659+
})
660+
await c.connect()
661+
expect(cliState.lastCmd).toContain("--resource https://database.usgovcloudapi.net/")
662+
})
663+
})
664+
665+
// --- Error surfacing when auth fails (Fix #5 bonus, Minor #10 addressed) ---
666+
667+
describe("Azure auth error surfacing", () => {
668+
beforeEach(() => {
669+
_resetTokenCacheForTests()
670+
azureIdentityState.throwOnGetToken = false
671+
azureIdentityState.tokenOverride = null
672+
cliState.throwError = null
673+
})
674+
675+
test("both @azure/identity and az CLI fail → error includes both hints", async () => {
676+
azureIdentityState.throwOnGetToken = true
677+
cliState.throwError = { stderr: "Please run 'az login' to set up an account.", message: "failed" }
678+
resetMocks()
679+
const c = await connect({
680+
host: "h.database.windows.net", database: "d",
681+
authentication: "azure-active-directory-default",
682+
})
683+
await expect(c.connect()).rejects.toThrow(/Azure AD token acquisition failed/)
684+
await expect(c.connect()).rejects.toThrow(/az CLI:.*az login/)
685+
})
686+
})
491687
})

0 commit comments

Comments
 (0)