Commit cf6e3cc
Add Go-specific telemetry design document (#298)
## Summary
This PR adds a telemetry design document for the `databricks-sql-go`
driver. The design is adapted from the C#/.NET ADBC driver design and
reworked to follow Go best practices and idiomatic patterns.
## Go-Specific Adaptations
This design document has been completely rewritten to align with Go
conventions and the existing codebase patterns:
### 1. **Replaced C#/.NET Concepts with Go Equivalents**
| C#/.NET Pattern | Go Pattern |
|-----------------|------------|
| `Activity`/`ActivitySource` | `context.Context` + middleware interceptors |
| `ActivityListener` | Custom telemetry interceptor pattern |
| `async`/`await` | Goroutines and channels |
| `ConcurrentDictionary` | `map` with `sync.RWMutex` |
| `IDisposable` | `Close()` methods |
| C# namespaces | Go packages |
### 2. **Applied Go Naming Conventions**
- **Unexported types**: `featureFlagCache`, `clientManager`,
`metricsAggregator` (lowercase for internal types)
- **Exported functions**: Following Go conventions for public APIs
- **Idiomatic names**: `mu` for mutex, `cfg` for config, `ctx` for
context
- **Package naming**: Single lowercase word (`telemetry`)
### 3. **Idiomatic Go Code Patterns**
#### Concurrency & Thread Safety
```go
// Singleton with sync.Once
var (
	managerOnce     sync.Once
	managerInstance *clientManager
)

func getClientManager() *clientManager {
	managerOnce.Do(func() {
		managerInstance = &clientManager{
			clients: make(map[string]*clientHolder),
		}
	})
	return managerInstance
}

// Thread-safe operations with RWMutex
func (m *clientManager) getOrCreateClient(host string, ...) *telemetryClient {
	m.mu.Lock()
	defer m.mu.Unlock()
	// ...
}
```
#### Context Propagation
```go
// Context-based metric collection
func (i *interceptor) beforeExecute(ctx context.Context, statementID string) context.Context {
	mc := &metricContext{
		statementID: statementID,
		startTime:   time.Now(),
		tags:        make(map[string]interface{}),
	}
	return withMetricContext(ctx, mc)
}
```
#### Error Handling
```go
// Defer/recover pattern for error swallowing.
// recover() only stops a panic when called directly by the deferred
// function, so recoverAndLog must itself be the deferred call.
func recoverAndLog(operation string) {
	if r := recover(); r != nil {
		// Log at trace level only
	}
}

func (i *interceptor) afterExecute(ctx context.Context, err error) {
	defer recoverAndLog("afterExecute")
	// Telemetry logic
}
```
### 4. **Async Patterns with Goroutines**
```go
// Background flush loop
func (agg *metricsAggregator) flushLoop() {
	ticker := time.NewTicker(agg.flushInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			agg.flush(context.Background())
		case <-agg.stopCh:
			return
		}
	}
}

// Async export (fragment: runs where ctx and metrics are in scope)
go func() {
	defer recoverAndLog("export")
	agg.exporter.export(ctx, metrics)
}()
```
### 5. **Standard Library Integration**
- **`net/http`**: HTTP client for telemetry export
- **`context.Context`**: Cancellation and deadline propagation
- **`time`**: Timers, tickers, and duration handling
- **`sync`**: Mutexes, WaitGroups, and Once
- **`encoding/json`**: Metric serialization
### 6. **Driver Integration Points**
#### In `connector.go`
```go
func (c *connector) Connect(ctx context.Context) (driver.Conn, error) {
	// ... existing code ...
	if c.cfg.telemetryEnabled {
		conn.telemetry = newTelemetryInterceptor(conn.id, c.cfg)
		conn.telemetry.recordConnection(ctx, tags)
	}
	return conn, nil
}
```
#### In `statement.go`
```go
func (s *stmt) QueryContext(ctx context.Context, args []driver.NamedValue) (rows driver.Rows, err error) {
	if s.conn.telemetry != nil {
		ctx = s.conn.telemetry.beforeExecute(ctx, statementID)
		// err is a named result so the deferred call sees the final error.
		defer func() {
			s.conn.telemetry.afterExecute(ctx, err)
		}()
	}
	// ... existing implementation ...
}
```
### 7. **Testing Strategy**
- **Unit tests**: Standard `*testing.T` patterns
- **Integration tests**: Using `testing.Short()` for skip flags
- **Benchmarks**: `BenchmarkXxx` functions to measure overhead
- **Table-driven tests**: Go idiomatic test patterns
```go
func BenchmarkInterceptor_Overhead(b *testing.B) {
	// ... setup ...
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		ctx = interceptor.beforeExecute(ctx, "stmt-123")
		interceptor.afterExecute(ctx, nil)
	}
}
```
## Key Design Features
### Per-Host Resource Management
- **Feature Flag Cache**: Singleton per host with reference counting
and a 15-minute TTL
- **Telemetry Client**: One shared client per host to prevent rate
limiting
- **Circuit Breaker**: Per-host protection against failing endpoints
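The per-host sharing described above can be sketched with a reference-counted map. This is an illustration under assumptions: the `acquire` and `release` method names, the `refCount` field, and the stub `telemetryClient` are hypothetical, and the design's actual `getOrCreateClient` signature is elided in this summary.

```go
package main

import (
	"fmt"
	"sync"
)

// telemetryClient is a stub; the real client would own an exporter.
type telemetryClient struct {
	host   string
	closed bool
}

func (c *telemetryClient) Close() { c.closed = true }

// clientHolder pairs a shared client with the number of connections using it.
type clientHolder struct {
	client   *telemetryClient
	refCount int
}

type clientManager struct {
	mu      sync.Mutex
	clients map[string]*clientHolder
}

// acquire returns the shared client for host, creating it on first use.
func (m *clientManager) acquire(host string) *telemetryClient {
	m.mu.Lock()
	defer m.mu.Unlock()
	h, ok := m.clients[host]
	if !ok {
		h = &clientHolder{client: &telemetryClient{host: host}}
		m.clients[host] = h
	}
	h.refCount++
	return h.client
}

// release drops one reference and closes the client when none remain.
func (m *clientManager) release(host string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	h, ok := m.clients[host]
	if !ok {
		return
	}
	h.refCount--
	if h.refCount <= 0 {
		delete(m.clients, host)
		h.client.Close()
	}
}

func main() {
	m := &clientManager{clients: make(map[string]*clientHolder)}
	a := m.acquire("host-a.example.com")
	b := m.acquire("host-a.example.com")
	fmt.Println("shared:", a == b) // prints "shared: true"
	m.release("host-a.example.com")
	fmt.Println("closed after first release:", a.closed) // prints "... false"
	m.release("host-a.example.com")
	fmt.Println("closed after last release:", a.closed) // prints "... true"
}
```

Each connection would call `acquire` when it opens and `release` when it closes, so the shared client lives exactly as long as some connection to that host does.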
### Privacy & Security
- ✅ No PII collected (no SQL queries, user data, or credentials)
- ✅ Tag filtering ensures only approved metrics exported
- ✅ All sensitive info excluded from Databricks export
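Tag filtering of this kind is commonly implemented as an allowlist, so anything not explicitly approved is dropped. The sketch below assumes that approach. The tag names and the `filterTags` function are hypothetical and are not the tag set defined in the design document.

```go
package main

import (
	"fmt"
	"sort"
)

// exportableTags is a hypothetical allowlist of tag names approved for export.
var exportableTags = map[string]struct{}{
	"driver.version":       {},
	"statement.latency_ms": {},
	"error.type":           {},
}

// filterTags returns a copy of tags containing only allowlisted keys.
// Unknown keys are dropped by default, so a newly added tag cannot leak
// until it has been approved explicitly.
func filterTags(tags map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(tags))
	for k, v := range tags {
		if _, ok := exportableTags[k]; ok {
			out[k] = v
		}
	}
	return out
}

func main() {
	filtered := filterTags(map[string]interface{}{
		"driver.version":       "1.0.0",
		"statement.latency_ms": 42,
		"sql.text":             "SELECT ...", // not allowlisted, so it is dropped
	})
	keys := make([]string, 0, len(filtered))
	for k := range filtered {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys) // prints "[driver.version statement.latency_ms]"
}
```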
### Reliability
- ✅ All telemetry errors swallowed (never impacts driver)
- ✅ Circuit breaker prevents cascade failures
- ✅ Graceful shutdown with proper resource cleanup
- ✅ Terminal vs retryable error classification
## File Structure
```
telemetry/
├── DESIGN.md          # This comprehensive design document
├── config.go          # Configuration types
├── tags.go            # Tag definitions and filtering
├── featureflag.go     # Per-host feature flag caching
├── manager.go         # Per-host client management
├── circuitbreaker.go  # Circuit breaker implementation
├── interceptor.go     # Telemetry interceptor
├── aggregator.go      # Metrics aggregation
├── exporter.go        # Export to Databricks
├── client.go          # Telemetry client
├── errors.go          # Error classification
└── *_test.go          # Test files
```
## Alignment with Existing Codebase
This design follows patterns observed in:
- `connection.go`: Connection lifecycle management
- `connector.go`: Factory patterns and options
- `internal/config/config.go`: Configuration structures
- `internal/client/client.go`: HTTP client patterns
## Next Steps
This is a **design document only**. Implementation will be tracked in
separate PRs following the implementation checklist in the design.
## Related Work
- Based on JDBC driver telemetry implementation patterns
- Adapted from C#/.NET ADBC driver design
- Follows Go best practices and standard library patterns
---
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

1 file changed, 1818 insertions(+), 0 deletions(-)