# Couchbase Ruby SDK — OpenTelemetry examples

## inventory_with_opentelemetry.rb

Demonstrates how to instrument a Couchbase Ruby application with OpenTelemetry
distributed tracing and metrics, and how to ship both signals to a local
observability stack (Jaeger, Prometheus, Grafana) via an OTel Collector.

### OpenTelemetry integration with the Couchbase Ruby SDK

The SDK exposes two hook points in `Couchbase::Options::Cluster`:

**Tracing** (`Couchbase::OpenTelemetry::RequestTracer`)

Wraps an `OpenTelemetry::Trace::TracerProvider`. Installed via:

```ruby
options = Couchbase::Options::Cluster.new(tracer: request_tracer)
```

Every SDK operation — upsert, get, query, etc. — creates a child span under the
`parent_span` supplied at call time (e.g. `Couchbase::Options::Upsert.new(parent_span: cb_parent)`).
Child spans are annotated with the bucket, scope, collection, and internal timing
(encode / dispatch / decode).

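End to end, the tracer wiring might look like the sketch below. Treat it as a sketch under assumptions: the `RequestTracer.new(provider)` constructor shape, and passing the OTel span object directly as `parent_span`, are guesses to be checked against the gem's documentation; the connection details mirror the quick-start defaults.

```ruby
require "couchbase"
require "opentelemetry/sdk"
require "opentelemetry-exporter-otlp"

# Configure the global OTel tracer provider. The OTLP exporter honours
# OTEL_EXPORTER_OTLP_ENDPOINT and defaults to http://localhost:4318.
OpenTelemetry::SDK.configure do |c|
  c.service_name = "inventory-service"
end

# Assumed constructor shape: the wrapper takes the provider it wraps.
request_tracer = Couchbase::OpenTelemetry::RequestTracer.new(OpenTelemetry.tracer_provider)

options = Couchbase::Options::Cluster.new(tracer: request_tracer)
options.authenticate("Administrator", "password")
cluster    = Couchbase::Cluster.connect("couchbase://127.0.0.1", options)
collection = cluster.bucket("default").default_collection

# Each SDK call nests under the application span handed in as parent_span.
tracer = OpenTelemetry.tracer_provider.tracer("inventory-demo")
tracer.in_span("update-inventory") do |span|
  collection.upsert("item-1", { "qty" => 5 },
                    Couchbase::Options::Upsert.new(parent_span: span))
end
```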
**Metrics** (`Couchbase::OpenTelemetry::Meter`)

Wraps an `OpenTelemetry::Metrics::MeterProvider`. Installed via:

```ruby
options = Couchbase::Options::Cluster.new(meter: sdk_meter)
```

The SDK records per-operation latency histograms (`db.client.operation.duration`,
unit `"s"`) and retry/timeout counters, all labelled by bucket, scope, collection,
and operation type. A `PeriodicMetricReader` (default interval: 5 s) pushes those
measurements to the configured OTLP endpoint.

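The metrics side could be wired up roughly as follows. The class and keyword names (`MetricsExporter`, `PeriodicMetricReader`, `export_interval_millis`) are assumed from the experimental `opentelemetry-metrics-sdk` / `opentelemetry-exporter-otlp-metrics` gems and should be verified against the installed versions:

```ruby
require "opentelemetry-metrics-sdk"
require "opentelemetry-exporter-otlp-metrics"

# Export metrics over OTLP/HTTP; assumed API from the experimental gems.
exporter = OpenTelemetry::Exporter::OTLP::Metrics::MetricsExporter.new(
  endpoint: "http://localhost:4318/v1/metrics"
)
reader = OpenTelemetry::SDK::Metrics::Export::PeriodicMetricReader.new(
  exporter: exporter,
  export_interval_millis: 5_000,  # the 5 s default noted above
  export_timeout_millis: 10_000
)
OpenTelemetry.meter_provider.add_metric_reader(reader)

# Hand the wrapped provider to the Couchbase SDK.
sdk_meter = Couchbase::OpenTelemetry::Meter.new(OpenTelemetry.meter_provider)
options   = Couchbase::Options::Cluster.new(meter: sdk_meter)
```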
**Histogram bucket calibration** — The OTel SDK's built-in default histogram
boundaries are calibrated for millisecond values. For second-valued Couchbase
histograms — where a well-connected operation typically completes in under 10 ms —
almost every sample would land in the first bucket, making p50/p99 estimates
meaningless. The example installs a process-wide catch-all View that replaces
those defaults with eight boundaries spanning 100 µs to 10 s, matching the
Couchbase Java SDK's canonical nanosecond recommendation scaled to seconds:

| Java SDK (ns)  | Ruby SDK (s) | Human-readable |
|----------------|--------------|----------------|
| 100 000        | 0.0001       | 100 µs         |
| 250 000        | 0.00025      | 250 µs         |
| 500 000        | 0.0005       | 500 µs         |
| 1 000 000      | 0.001        | 1 ms           |
| 10 000 000     | 0.01         | 10 ms          |
| 100 000 000    | 0.1          | 100 ms         |
| 1 000 000 000  | 1.0          | 1 s            |
| 10 000 000 000 | 10.0         | 10 s           |

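The scaling in the table is a straight division by 10⁹, which a few lines of plain Ruby make explicit. The commented-out `add_view` registration is an assumed shape from the experimental metrics SDK, not a confirmed API:

```ruby
# Java SDK nanosecond boundaries, scaled to the seconds the Ruby SDK records.
JAVA_BOUNDARIES_NS = [
  100_000, 250_000, 500_000, 1_000_000,
  10_000_000, 100_000_000, 1_000_000_000, 10_000_000_000
].freeze

BOUNDARIES_S = JAVA_BOUNDARIES_NS.map { |ns| ns / 1_000_000_000.0 }
puts BOUNDARIES_S.inspect
# => [0.0001, 0.00025, 0.0005, 0.001, 0.01, 0.1, 1.0, 10.0]

# The catch-all View would then be registered roughly like this
# (assumed experimental opentelemetry-metrics-sdk API):
#
#   OpenTelemetry.meter_provider.add_view(
#     "*",
#     aggregation: OpenTelemetry::SDK::Metrics::Aggregation::ExplicitBucketHistogram.new(
#       boundaries: BOUNDARIES_S
#     )
#   )
```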
Both providers export via OTLP/HTTP to the OTel Collector; the tracer provider
uses an AlwaysOn sampler and the meter provider cumulative aggregation.
`force_flush` is called explicitly before exit so no spans or metrics are dropped.

> **Note** — AlwaysOn sampling (100 %) is fine for demos and development but is
> rarely appropriate in production. Consider `ParentBased(TraceIdRatioBased(N))`
> for head-based probabilistic sampling or a tail-based sampler in the Collector.

> **Note** — The Couchbase Ruby SDK currently supports only metrics and traces.
> It does not emit logs via OpenTelemetry. The Loki and Promtail containers in
> the telemetry stack are present for completeness but receive no data from this
> example.

### Signal flow

```
This program
  │  OTLP/HTTP http://localhost:4318/v1/traces   (traces)
  │  OTLP/HTTP http://localhost:4318/v1/metrics  (metrics)
  ▼
OpenTelemetry Collector (telemetry-cluster/otel-collector-config.yaml)
  │  traces  ── OTLP/gRPC ──► Jaeger (port 16686)
  │  metrics ── Prometheus scrape endpoint :8889 ──► Prometheus (port 9090)
  ▼
Jaeger      http://localhost:16686 — distributed trace viewer
Prometheus  http://localhost:9090  — time-series metrics store
Grafana     http://localhost:3000  — unified dashboards (queries both)
```

### Quick-start

#### 1. Start the observability stack

The Docker Compose files for the telemetry stack live in the
[couchbase-cxx-client-demo](https://github.com/couchbaselabs/couchbase-cxx-client-demo/tree/main/telemetry-cluster)
repository. Clone it and start the stack:

```sh
git clone https://github.com/couchbaselabs/couchbase-cxx-client-demo.git
cd couchbase-cxx-client-demo/telemetry-cluster
docker compose up -d
```

Containers started: `otel-collector`, `jaeger`, `prometheus`, `loki`,
`promtail`, `grafana`. Allow ~10 s for all services to become healthy.

#### 2. Install dependencies

```sh
cd couchbase-opentelemetry/examples
bundle install
```

#### 3. Run the example

```sh
CONNECTION_STRING=couchbase://127.0.0.1 \
USER_NAME=Administrator \
PASSWORD=password \
BUCKET_NAME=default \
  bundle exec ruby inventory_with_opentelemetry.rb
```

The OTLP endpoints default to `http://localhost:4318/v1/{traces,metrics}`,
pointing at the OTel Collector started above.

### Environment variables

| Variable | Default | Description |
|---|---|---|
| `CONNECTION_STRING` | `couchbase://127.0.0.1` | Couchbase connection string |
| `USER_NAME` | `Administrator` | RBAC username |
| `PASSWORD` | `password` | RBAC password |
| `BUCKET_NAME` | `default` | Bucket to write into |
| `SCOPE_NAME` | `_default` | Scope within the bucket |
| `COLLECTION_NAME` | `_default` | Collection within the scope |
| `PROFILE` | _(none)_ | SDK connection profile, e.g. `wan_development` |
| `NUM_ITERATIONS` | `1000` | Number of upsert+get loop iterations |
| `VERBOSE` | `false` | `true` — enable Couchbase SDK trace-level logging to stderr |
| `OTEL_VERBOSE` | `false` | `true` — print OTel SDK internal warnings/errors to stderr |
| `OTEL_TRACES_ENDPOINT` | `http://localhost:4318/v1/traces` | OTLP HTTP endpoint for traces |
| `OTEL_METRICS_ENDPOINT` | `http://localhost:4318/v1/metrics` | OTLP HTTP endpoint for metrics |
| `OTEL_METRICS_READER_EXPORT_INTERVAL_MS` | `5000` | How often the metric reader collects and exports |
| `OTEL_METRICS_READER_EXPORT_TIMEOUT_MS` | `10000` | Timeout for a single metric export call |

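The table above maps onto plain `ENV` lookups with fallbacks. A sketch of the pattern, including the integer and boolean coercions for `NUM_ITERATIONS` and `VERBOSE` (the `env_or` helper name is illustrative, not lifted from the example script):

```ruby
# Hypothetical helper: read an environment variable, falling back to a default.
def env_or(name, default)
  ENV.fetch(name, default)
end

connection_string = env_or("CONNECTION_STRING", "couchbase://127.0.0.1")
num_iterations    = Integer(env_or("NUM_ITERATIONS", "1000"))  # raises on non-numeric input
verbose           = env_or("VERBOSE", "false") == "true"       # only the literal "true" enables it

puts connection_string
puts num_iterations
puts verbose
```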
### Where to see the results

**Traces → Jaeger UI** `http://localhost:16686`

1. Open the Jaeger UI in a browser.
2. In the **Service** drop-down select `inventory-service`.
3. Click **Find Traces**.
4. Open an `update-inventory` trace. The hierarchy looks like:

   ```
   update-inventory         ← top-level span (this program)
     upsert                 ← SDK upsert operation
       request_encoding     ← document serialization
       dispatch_to_server   ← server round-trip
     get                    ← SDK get operation
       dispatch_to_server   ← server round-trip
   ```

   Operation spans (`upsert`, `get`) carry: `db.system.name`, `db.namespace`,
   `db.operation.name`, `couchbase.collection.name`, `couchbase.scope.name`,
   `couchbase.service`, `couchbase.retries`.

   `dispatch_to_server` spans carry: `network.peer.address`, `network.peer.port`,
   `network.transport`, `server.address`, `server.port`, `couchbase.operation_id`,
   `couchbase.server_duration`, `couchbase.local_id`.

**Metrics → Prometheus** `http://localhost:9090`

The OTel Collector exposes a Prometheus scrape endpoint on `:8889`; Prometheus
scrapes it every 15 s (`telemetry-cluster/prometheus.yml`).

The Couchbase Ruby SDK records:

```
db_client_operation_duration_seconds_bucket — per-bucket sample counts (use for percentiles)
db_client_operation_duration_seconds_sum    — cumulative latency across all operations
db_client_operation_duration_seconds_count  — total number of completed operations
```

Each series is labelled with the service type (`kv`, `query`, …) and operation
name (`upsert`, `get`, …).

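For example, a p99 operation latency over the last five minutes can be estimated in Prometheus with `histogram_quantile`. This is a query sketch; the exact label names depend on how the Collector maps OTel attributes to Prometheus labels:

```
histogram_quantile(
  0.99,
  sum by (le) (rate(db_client_operation_duration_seconds_bucket[5m]))
)
```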
The example also records two application-level histograms:

```
inventory_demo_iteration_duration_ms — wall-clock duration of each upsert+get iteration
inventory_demo_run_duration_ms       — total wall-clock duration of the demo run
```

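A per-iteration histogram like `inventory_demo_iteration_duration_ms` is typically fed from a monotonic clock. The sketch below shows the measurement only; the `record` call is commented out because the meter wiring is omitted, and the `iteration_histogram` handle name is assumed:

```ruby
# Measure one iteration's wall-clock duration in milliseconds using the
# monotonic clock (unaffected by system clock adjustments).
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
sleep(0.01) # stand-in for the upsert + get pair
elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0

# iteration_histogram.record(elapsed_ms, attributes: { "outcome" => "ok" })
puts format("iteration took %.2f ms", elapsed_ms)
```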
**Metrics + Traces → Grafana** `http://localhost:3000`

Grafana is pre-provisioned (anonymous Admin, no login required) with Prometheus
and Jaeger as data sources.

- **Explore → Prometheus**: query SDK metrics by name or label.
- **Explore → Jaeger**: search traces by service `inventory-service`.