Skip to content

[refactor] Introduce SpannerConnector to decouple normalized client#1897

Draft
clincoln8 wants to merge 10 commits into
datacommonsorg:masterfrom
clincoln8:spanner-client
Draft

[refactor] Introduce SpannerConnector to decouple normalized client#1897
clincoln8 wants to merge 10 commits into
datacommonsorg:masterfrom
clincoln8:spanner-client

Conversation

@clincoln8
Copy link
Copy Markdown
Contributor

@clincoln8 clincoln8 commented May 15, 2026

High Level Summary

This PR refactors the Spanner client implementation to improve maintainability and testability by consolidating components and separating concerns. It moves the schema selector logic into the main client.go, separates schema-specific logic into dedicated files, and extracts common execution details into a new SpannerConnector. It also reorganizes tests to separate unit tests from integration tests.

Overall Approach & Structure

  • Consolidation: Moved the selectorClient from client_selector.go into client.go to centralize routing concerns.
  • Component Separation:
    • Renamed query.go to standard_client.go and focused it on the legacy schema.
    • Renamed query_normalized.go to normalized_client.go and consolidated normalized schema logic there.
    • Created connector.go to encapsulate low-level connection and execution logic.
  • Test Isolation: Moved unit tests out of the golden directory into the main package directory, and refactored the selector test to be a pure unit test using mocks.

Detailed Changes

Design & Architecture

  • Interface for Testability: In client.go, changed selectorClient's normalized field from a concrete *normalizedSchemaClient to the SpannerClient interface. This was necessary to allow passing a mock client in unit tests.
  • Encapsulation of Execution: Extracted all low-level Spanner client interactions, background timestamp polling, and generic query execution methods into SpannerConnector (in connector.go). Both legacy and normalized clients now share this connector, avoiding duplicate connections.

Testing Strategy

  • Isolation of Unit Tests: Moved tests that do not require a live database connection out of the golden directory into the main package directory (datasource_test.go). This makes it easier to run fast unit tests during iteration.
  • Mocking Improvements: Used interface embedding in mockSpannerClient in client_test.go to avoid implementing dozens of dummy methods, keeping the mock focused only on what is being tested.
  • Pure Unit Tests for Selector: Replaced the database-dependent selector test (which was in golden/datasource_normalized_test.go) with pure unit tests in client_test.go that use mocks and do not require a database connection.

Testing

  • Ran unit tests in internal/server/spanner and internal/server/spanner/golden. All tests passed.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the Spanner client implementation by extracting common execution and staleness logic into a new SpannerExecutor component. It renames spannerDatabaseClient to defaultSpannerClient, introduces the NormalizedObservationProvider interface, and updates the schemaSelectorClient to use this interface. These changes decouple the high-level client logic from low-level Spanner operations and improve the overall structure for supporting multiple schemas. I have no feedback to provide.

@clincoln8 clincoln8 changed the title [refactor] Introduce SpannerExecutor to decouple normalized client [refactor] Introduce SpannerConnector to decouple normalized client May 15, 2026
@clincoln8
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the Spanner client architecture by introducing a SpannerConnector for low-level operations and a selectorClient to route requests between standard and normalized schema implementations. Key changes include moving legacy logic to standard_client.go and reorganizing unit tests. Reviewer feedback highlights opportunities to improve logging by reducing the severity of staleness timestamp errors, ensuring safer data type handling in dynamic queries, and refactoring the normalized client's constructor to avoid brittle type assertions and panics. Additionally, it is recommended to restore documentation regarding concurrency limits for parallel operations.

if val != 0 {
return time.Unix(0, val).UTC(), nil
}
slog.Error("Spanner staleness timestamp not available")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This error log will trigger for every query if the staleness timestamp is unavailable. Since the system handles the fallback to StrongRead gracefully, consider reducing the log level to Debug to avoid log spam, ensuring that failures in pre-processors or background tasks do not clutter logs while still allowing for local fallback processing.

Suggested change
slog.Error("Spanner staleness timestamp not available")
slog.Debug("Spanner staleness timestamp not available")
References
  1. Implement graceful error handling in pre-processors by logging failures and falling back to local processing, ensuring the entire request does not fail.

if err := row.Column(i, &val); err != nil {
return rowData, err
}
data = append(data, val.Value.GetStringValue())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Avoid relying on generic string conversions like GetStringValue() for numeric types returned by the database (e.g., INT64 in Spanner), as this can lead to errors. Instead, explicitly cast the value to a string within the SQL query or implement a custom row handler that directly processes numeric types to ensure data integrity.

References
  1. When processing query results involving aggregate functions or numeric types, ensure the data type is handled correctly. Avoid generic string conversions; explicitly cast in SQL or use a custom row handler.

Comment on lines +42 to +53
func NewNormalizedClient(client SpannerClient) *normalizedSchemaClient {
var conn *SpannerConnector
if sc, ok := client.(*standardSpannerClient); ok {
conn = sc.exec
} else if sc, ok := client.(*selectorClient); ok {
if std, ok := sc.SpannerClient.(*standardSpannerClient); ok {
conn = std.exec
}
}
if conn == nil {
panic(fmt.Sprintf("NewNormalizedClient: unexpected client type %T", client))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

NewNormalizedClient relies on brittle type assertions to extract the SpannerConnector. This prevents the use of mock clients and could cause a panic if the interface is wrapped. For recoverable errors or configuration issues, prefer returning an error to allow for graceful handling rather than risking a panic.

Suggested change
func NewNormalizedClient(client SpannerClient) *normalizedSchemaClient {
var conn *SpannerConnector
if sc, ok := client.(*standardSpannerClient); ok {
conn = sc.exec
} else if sc, ok := client.(*selectorClient); ok {
if std, ok := sc.SpannerClient.(*standardSpannerClient); ok {
conn = std.exec
}
}
if conn == nil {
panic(fmt.Sprintf("NewNormalizedClient: unexpected client type %T", client))
}
func NewNormalizedClient(connector *SpannerConnector) *NormalizedClient {
References
  1. For recoverable errors, prefer returning an error to allow graceful handling rather than panicking.

response := map[string]*FilteredStatVarGroupNode{}
errGroup, errCtx := errgroup.WithContext(ctx)
errGroup.SetLimit(maxConcurrentFilteredSVGGoroutines) // Limit the number of concurrent goroutines to avoid overwhelming Spanner with too many requests.
errGroup.SetLimit(maxConcurrentFilteredSVGGoroutines)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment explaining the purpose of the concurrency limit was removed. It is helpful to keep this context, as adding a concurrency limit when using errgroup for parallel operations is recommended to prevent overwhelming the system with too many simultaneous requests.

Suggested change
errGroup.SetLimit(maxConcurrentFilteredSVGGoroutines)
errGroup.SetLimit(maxConcurrentFilteredSVGGoroutines) // Limit the number of concurrent goroutines to avoid overwhelming Spanner with too many requests.
References
  1. When using errgroup for parallel operations, consider adding a concurrency limit to prevent overwhelming the system.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant