Commit 8991692

Add inventory_with_opentelemetry example for the couchbase-opentelemetry gem (#222)
Port of the C++ couchbase-cxx-client-demo inventory example. Wires RequestTracer and Meter into Couchbase::Options::Cluster, ships traces and metrics to a local OTel Collector via OTLP/HTTP, and installs a catch-all histogram View with second-scale bucket boundaries calibrated for Couchbase KV latency. All settings are overridable via environment variables. Includes a README.md with setup instructions, environment variable reference, signal flow diagram, and guidance on reading results in Jaeger, Prometheus, and Grafana; links to the telemetry-cluster Docker Compose stack in couchbase-cxx-client-demo.
1 parent 1503747 commit 8991692

File tree

3 files changed: +719 −0 lines changed
Lines changed: 11 additions & 0 deletions

```ruby
# frozen_string_literal: true

source "https://rubygems.org"

gem "couchbase"
gem "couchbase-opentelemetry"

gem "opentelemetry-exporter-otlp"
gem "opentelemetry-exporter-otlp-metrics"
gem "opentelemetry-metrics-sdk"
gem "opentelemetry-sdk"
```
Lines changed: 198 additions & 0 deletions
# Couchbase Ruby SDK — OpenTelemetry examples

## inventory_with_opentelemetry.rb

Demonstrates how to instrument a Couchbase Ruby application with OpenTelemetry
distributed tracing and metrics, and how to ship both signals to a local
observability stack (Jaeger, Prometheus, Grafana) via an OTel Collector.

### OpenTelemetry integration with the Couchbase Ruby SDK

The SDK exposes two hook points in `Couchbase::Options::Cluster`:

**Tracing** (`Couchbase::OpenTelemetry::RequestTracer`)

Wraps an `OpenTelemetry::Trace::TracerProvider`. Installed via:

```ruby
options = Couchbase::Options::Cluster.new(tracer: request_tracer)
```

Every SDK operation — upsert, get, query, etc. — creates a child span under the
`parent_span` supplied at call time (e.g. `Couchbase::Options::Upsert.new(parent_span: cb_parent)`).
Child spans are annotated with the bucket, scope, collection, and internal timing
(encode / dispatch / decode).
**Metrics** (`Couchbase::OpenTelemetry::Meter`)

Wraps an `OpenTelemetry::Metrics::MeterProvider`. Installed via:

```ruby
options = Couchbase::Options::Cluster.new(meter: sdk_meter)
```

The SDK records per-operation latency histograms (`db.client.operation.duration`,
unit `"s"`) and retry/timeout counters, all labelled by bucket, scope, collection,
and operation type. A `PeriodicMetricReader` (default interval: 5 s) pushes those
measurements to the configured OTLP endpoint.

**Histogram bucket calibration** — The OTel SDK's built-in default histogram
boundaries are calibrated for millisecond values. For second-valued Couchbase
histograms — where a well-connected operation typically completes in under 10 ms —
almost every sample would land in the first bucket, making p50/p99 estimates
meaningless. The example installs a process-wide catch-all View that replaces
those defaults with eight boundaries spanning 100 µs to 10 s, matching the
Couchbase Java SDK's canonical nanosecond recommendation scaled to seconds:

| Java SDK (ns)  | Ruby SDK (s) | Human-readable |
|----------------|--------------|----------------|
| 100 000        | 0.0001       | 100 µs         |
| 250 000        | 0.00025      | 250 µs         |
| 500 000        | 0.0005       | 500 µs         |
| 1 000 000      | 0.001        | 1 ms           |
| 10 000 000     | 0.01         | 10 ms          |
| 100 000 000    | 0.1          | 100 ms         |
| 1 000 000 000  | 1.0          | 1 s            |
| 10 000 000 000 | 10.0         | 10 s           |
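The seconds column is a straight unit conversion of the Java SDK's nanosecond list, which can be checked mechanically:

```ruby
# Canonical Couchbase Java SDK bucket boundaries, in nanoseconds.
JAVA_BOUNDARIES_NS = [
  100_000, 250_000, 500_000, 1_000_000,
  10_000_000, 100_000_000, 1_000_000_000, 10_000_000_000
].freeze

# Divide by 1e9 to obtain the seconds-scale boundaries the example's View installs.
RUBY_BOUNDARIES_S = JAVA_BOUNDARIES_NS.map { |ns| ns / 1_000_000_000.0 }

p RUBY_BOUNDARIES_S
# => [0.0001, 0.00025, 0.0005, 0.001, 0.01, 0.1, 1.0, 10.0]
```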

Both providers use an AlwaysOn sampler / cumulative aggregation and export via
OTLP/HTTP to the OTel Collector. `force_flush` is called explicitly before exit
so no spans or metrics are dropped.
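A minimal sketch of that wiring, assuming the gem APIs shown in the snippets above. The `RequestTracer.new` / `Meter.new` constructor arguments and the exact require paths are assumptions, not taken from the example source; the OTel SDK calls follow the public `opentelemetry-sdk` / `opentelemetry-metrics-sdk` APIs:

```ruby
require "couchbase"
require "opentelemetry/sdk"
require "opentelemetry-metrics-sdk"
require "opentelemetry/exporter/otlp"
require "opentelemetry-exporter-otlp-metrics"

# Traces: OTLP/HTTP span exporter behind a batch processor.
OpenTelemetry::SDK.configure do |c|
  c.service_name = "inventory-service"
  c.add_span_processor(
    OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new(
      OpenTelemetry::Exporter::OTLP::Exporter.new(
        endpoint: ENV.fetch("OTEL_TRACES_ENDPOINT", "http://localhost:4318/v1/traces")
      )
    )
  )
end

# Metrics: periodic reader pushing to the Collector every 5 s.
OpenTelemetry.meter_provider.add_metric_reader(
  OpenTelemetry::SDK::Metrics::Export::PeriodicMetricReader.new(
    exporter: OpenTelemetry::Exporter::OTLP::Metrics::MetricsExporter.new(
      endpoint: ENV.fetch("OTEL_METRICS_ENDPOINT", "http://localhost:4318/v1/metrics")
    ),
    export_interval_millis: 5_000
  )
)

# Hand both providers to the Couchbase SDK (constructor args assumed).
options = Couchbase::Options::Cluster.new(
  tracer: Couchbase::OpenTelemetry::RequestTracer.new(OpenTelemetry.tracer_provider),
  meter:  Couchbase::OpenTelemetry::Meter.new(OpenTelemetry.meter_provider)
)
options.authenticate("Administrator", "password")
cluster = Couchbase::Cluster.connect("couchbase://127.0.0.1", options)

# Flush before exit so buffered spans/metrics are not dropped.
at_exit do
  OpenTelemetry.tracer_provider.force_flush
  OpenTelemetry.meter_provider.force_flush
end
```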

> **Note** — AlwaysOn sampling (100 %) is fine for demos and development but is
> rarely appropriate in production. Consider `ParentBased(TraceIdRatioBased(N))`
> for head-based probabilistic sampling or a tail-based sampler in the Collector.

> **Note** — The Couchbase Ruby SDK currently supports only metrics and traces.
> It does not emit logs via OpenTelemetry. The Loki and Promtail containers in
> the telemetry stack are present for completeness but receive no data from this
> example.
### Signal flow

```
This program
│ OTLP/HTTP http://localhost:4318/v1/traces  (traces)
│ OTLP/HTTP http://localhost:4318/v1/metrics (metrics)

OpenTelemetry Collector (telemetry-cluster/otel-collector-config.yaml)
│ traces  ── OTLP/gRPC ──► Jaeger (port 16686)
│ metrics ── Prometheus scrape endpoint :8889 ──► Prometheus (port 9090)

Jaeger     http://localhost:16686 — distributed trace viewer
Prometheus http://localhost:9090  — time-series metrics store
Grafana    http://localhost:3000  — unified dashboards (queries both)
```

### Quick-start

#### 1. Start the observability stack

The Docker Compose files for the telemetry stack live in the
[couchbase-cxx-client-demo](https://github.com/couchbaselabs/couchbase-cxx-client-demo/tree/main/telemetry-cluster)
repository. Clone it and start the stack:

```sh
git clone https://github.com/couchbaselabs/couchbase-cxx-client-demo.git
cd couchbase-cxx-client-demo/telemetry-cluster
docker compose up -d
```

Containers started: `otel-collector`, `jaeger`, `prometheus`, `loki`,
`promtail`, `grafana`. Allow ~10 s for all services to become healthy.

#### 2. Install dependencies

```sh
cd couchbase-opentelemetry/examples
bundle install
```

#### 3. Run the example

```sh
CONNECTION_STRING=couchbase://127.0.0.1 \
USER_NAME=Administrator \
PASSWORD=password \
BUCKET_NAME=default \
bundle exec ruby inventory_with_opentelemetry.rb
```

The OTLP endpoints default to `http://localhost:4318/v1/{traces,metrics}`,
pointing at the OTel Collector started above.

### Environment variables

| Variable | Default | Description |
|---|---|---|
| `CONNECTION_STRING` | `couchbase://127.0.0.1` | Couchbase connection string |
| `USER_NAME` | `Administrator` | RBAC username |
| `PASSWORD` | `password` | RBAC password |
| `BUCKET_NAME` | `default` | Bucket to write into |
| `SCOPE_NAME` | `_default` | Scope within the bucket |
| `COLLECTION_NAME` | `_default` | Collection within the scope |
| `PROFILE` | _(none)_ | SDK connection profile, e.g. `wan_development` |
| `NUM_ITERATIONS` | `1000` | Number of upsert+get loop iterations |
| `VERBOSE` | `false` | `true` — enable Couchbase SDK trace-level logging to stderr |
| `OTEL_VERBOSE` | `false` | `true` — print OTel SDK internal warnings/errors to stderr |
| `OTEL_TRACES_ENDPOINT` | `http://localhost:4318/v1/traces` | OTLP HTTP endpoint for traces |
| `OTEL_METRICS_ENDPOINT` | `http://localhost:4318/v1/metrics` | OTLP HTTP endpoint for metrics |
| `OTEL_METRICS_READER_EXPORT_INTERVAL_MS` | `5000` | How often the metric reader collects and exports |
| `OTEL_METRICS_READER_EXPORT_TIMEOUT_MS` | `10000` | Timeout for a single metric export call |
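The override pattern behind the table can be sketched with plain `ENV.fetch` defaults (the `setting` helper name is illustrative, not from the example source):

```ruby
# Every setting has a baked-in default and an environment-variable override.
def setting(name, default)
  ENV.fetch(name, default)
end

connection_string = setting("CONNECTION_STRING", "couchbase://127.0.0.1")
bucket_name       = setting("BUCKET_NAME", "default")
num_iterations    = Integer(setting("NUM_ITERATIONS", "1000"))
verbose           = setting("VERBOSE", "false") == "true"
```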

### Where to see the results

**Traces → Jaeger UI** `http://localhost:16686`

1. Open the Jaeger UI in a browser.
2. In the **Service** drop-down select `inventory-service`.
3. Click **Find Traces**.
4. Open an `update-inventory` trace. The hierarchy looks like:

```
update-inventory          ← top-level span (this program)
  upsert                  ← SDK upsert operation
    request_encoding      ← document serialization
    dispatch_to_server    ← server round-trip
  get                     ← SDK get operation
    dispatch_to_server    ← server round-trip
```
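A sketch of the work that produces this tree, using the option shape shown earlier in this README. How `cb_parent` is obtained from the `RequestTracer`, the document key and body, and whether `Options::Get` accepts `parent_span` the same way as `Options::Upsert` are all assumptions here:

```ruby
# One iteration: upsert then get, both parented under the same span so they
# appear as siblings inside the `update-inventory` trace in Jaeger.
doc = { "name" => "Sample Hotel", "quantity" => 42 } # illustrative document

collection.upsert(
  "hotel_10138", doc,
  Couchbase::Options::Upsert.new(parent_span: cb_parent)
)
collection.get(
  "hotel_10138",
  Couchbase::Options::Get.new(parent_span: cb_parent)
)
```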

Operation spans (`upsert`, `get`) carry: `db.system.name`, `db.namespace`,
`db.operation.name`, `couchbase.collection.name`, `couchbase.scope.name`,
`couchbase.service`, `couchbase.retries`.

`dispatch_to_server` spans carry: `network.peer.address`, `network.peer.port`,
`network.transport`, `server.address`, `server.port`, `couchbase.operation_id`,
`couchbase.server_duration`, `couchbase.local_id`.
**Metrics → Prometheus** `http://localhost:9090`

The OTel Collector exposes a Prometheus scrape endpoint on `:8889`; Prometheus
scrapes it every 15 s (`telemetry-cluster/prometheus.yml`).

The Couchbase Ruby SDK records:

```
db_client_operation_duration_seconds_bucket — per-bucket sample counts (use for percentiles)
db_client_operation_duration_seconds_sum    — cumulative latency across all operations
db_client_operation_duration_seconds_count  — total number of completed operations
```

Each series is labelled with the service type (`kv`, `query`, …) and operation
name (`upsert`, `get`, …).
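Because only bucket counts are exported, percentiles are computed at query time with `histogram_quantile`. A starting-point PromQL query, aggregating across all labels except `le` (add `by` labels once you have inspected the actual label set on the series):

```
histogram_quantile(
  0.99,
  sum by (le) (rate(db_client_operation_duration_seconds_bucket[1m]))
)
```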

The example also records two application-level histograms:

```
inventory_demo_iteration_duration_ms — wall-clock duration of each upsert+get iteration
inventory_demo_run_duration_ms       — total wall-clock duration of the demo run
```

**Metrics + Traces → Grafana** `http://localhost:3000`

Grafana is pre-provisioned (anonymous Admin, no login required) with Prometheus
and Jaeger as data sources.

- **Explore → Prometheus**: query SDK metrics by name or label.
- **Explore → Jaeger**: search traces by service `inventory-service`.
