Skip to content

feat: Integrate OpenTelemetry for distributed tracing and metrics#186

Open
KitAndrew wants to merge 7 commits into
crossoverJie:masterfrom
KitAndrew:feat/opentelemetry-integration
Open

feat: Integrate OpenTelemetry for distributed tracing and metrics#186
KitAndrew wants to merge 7 commits into
crossoverJie:masterfrom
KitAndrew:feat/opentelemetry-integration

Conversation

@KitAndrew
Copy link
Copy Markdown

@KitAndrew KitAndrew commented Feb 11, 2026

What

Integrate OpenTelemetry into cim-server and cim-forward-route modules to enable distributed tracing and business metrics collection, as listed in the TODO.

Why

As CIM is a distributed IM system with multiple services (server, route), it's essential to have end-to-end tracing and metrics to monitor message flow, identify bottlenecks, and debug issues in production.

Changes

Dependencies (pom.xml)

  • Add micrometer-tracing-bridge-otel, opentelemetry-exporter-otlp, opentelemetry-api, opentelemetry-sdk, micrometer-registry-otlp to root and module POMs.

cim-server

  • New OtelConfig.java: Initialize OpenTelemetry SDK, configure OTLP exporter to Jaeger, register custom metrics.
  • Modified CIMServerHandle.java: Add tracing spans for LOGIN_REQUEST and PING commands with metrics counters.
  • Modified CIMServer.java: Add tracing span for sendMsg with message pushed counter.
  • Modified application.yaml: Add OTLP endpoints and Actuator metrics exposure.

cim-forward-route

  • New OtelConfig.java: Initialize OpenTelemetry SDK with route-specific metrics (group/p2p message counters, login/register counters).
  • Modified RouteController.java: Add tracing spans for groupRoute, p2pRoute, login with corresponding metrics.
  • Modified application.yaml: Add OTLP endpoints and Actuator metrics exposure.

Docker

  • Modified allin1-ubuntu.Dockerfile: Add Jaeger all-in-one binary.
  • Modified supervisord.conf: Add Jaeger service, server/route wait for Jaeger OTLP port (4317) before starting.

Custom Metrics

Module Metric Type
cim-server cim.server.online.users Gauge
cim-server cim.server.messages.pushed Counter
cim-server cim.server.heartbeat.count Counter
cim-server cim.server.login.count Counter
cim-forward-route cim.route.messages.group Counter
cim-forward-route cim.route.messages.p2p Counter
cim-forward-route cim.route.login.count Counter
cim-forward-route cim.route.register.count Counter

How to verify

  1. Start the all-in-one Docker container (includes Jaeger).
  2. Access Jaeger UI at http://localhost:16686.
  3. Select service cim-server or cim-forward-route to view traces.
  4. Access metrics via Actuator: http://localhost:8081/actuator/metrics/cim.server.online.users.

Summary by CodeRabbit

  • New Features

    • Added optional distributed tracing and metrics (OTLP/Jaeger) and instrumentation for login, heartbeat, message push and routing flows.
  • Documentation

    • Updated README(s) to mark OpenTelemetry integration as completed.
  • Chores

    • Added runtime configuration switches and container updates to optionally run Jaeger and expose OTLP/tracing ports.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds OpenTelemetry tracing and Micrometer OTLP metrics: dependency additions, new OtelConfig classes, Tracer and Counter/Gauge beans, instrumentation of route and server flows with spans and counters, config toggles in application YAMLs, and optional Jaeger runtime entries in Docker files.

Changes

Cohort / File(s) Summary
Documentation
README.md, README-zh.md
Marked OpenTelemetry integration item as completed (checkbox toggled).
Root dependency management
pom.xml
Added opentelemetry and micrometer version properties and dependencyManagement entries for OTLP/tracing libraries.
Module dependencies
cim-forward-route/pom.xml, cim-server/pom.xml
Added dependencies: micrometer-tracing-bridge-otel, opentelemetry-api, opentelemetry-sdk, opentelemetry-exporter-otlp, micrometer-registry-otlp.
OpenTelemetry config classes
cim-forward-route/src/.../config/OtelConfig.java, cim-server/src/.../config/OtelConfig.java
New Spring configs creating OpenTelemetry SDK, OTLP gRPC exporter, BatchSpanProcessor, Tracer beans, Micrometer Counters/Gauges, resource attributes, and graceful shutdown hooks (enabled via cim.otel.enabled).
Controller instrumentation
cim-forward-route/src/.../controller/RouteController.java
Injected Tracer and Counter beans; wrapped groupRoute, p2pRoute, login, registerAccount flows with spans (attributes, status, exception recording) and incremented counters.
Server instrumentation
cim-server/src/.../handle/CIMServerHandle.java, cim-server/src/.../server/CIMServer.java
Added tracer usage and spans for LOGIN/PING and message push flows; incremented login/heartbeat/message counters; added lazy tracer lookup helper and minor lifecycle/channel-close adjustments.
Application configs
cim-forward-route/src/.../application.yaml, cim-server/src/.../application.yaml
Added cim.otel.enabled flag, OTLP tracing/metrics endpoints (HTTP and gRPC), sampling probability, and actuator exposure settings; env var overrides supported.
Docker / runtime
docker/allin1-ubuntu.Dockerfile, docker/supervisord.conf
Added build args and ENV for enabling OTEL, Jaeger all-in-one install/startup (gated by ENABLE_OTEL), exposed OTLP/Jaeger ports, and supervisord jaeger program; updated server start command to wait for dependencies.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant RouteController
    participant CIMServer
    participant Tracer as OpenTelemetry
    participant OTLP

    Client->>RouteController: request (message / login / register)
    activate RouteController
    RouteController->>Tracer: start span (route)
    RouteController->>CIMServer: forward request
    activate CIMServer
    CIMServer->>Tracer: start span (server)
    CIMServer->>CIMServer: write socket / update session / counters++
    CIMServer->>Tracer: set attributes, set status, end span
    deactivate CIMServer
    RouteController->>Tracer: set attributes, set status, end span
    deactivate RouteController
    Tracer->>OTLP: BatchSpanProcessor exports spans
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Feat/offline msg #170: Modifies cim-forward-route files (including RouteController and pom) and is closely related to these tracing/instrumentation changes.

Suggested labels

new features

Suggested reviewers

  • crossoverJie

Poem

🐰 Hopping through spans with a carrot of cheer,

Counters that tally each hop and each call,
Jaeger's warm glow brings traces near,
I nibble metrics as spring starts to sprawl,
A rabbitly dance — observability for all!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly and accurately summarizes the main change—integrating OpenTelemetry for distributed tracing and metrics across the codebase.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can enable review details to help with troubleshooting, context usage and more.

Enable the reviews.review_details setting to include review details such as the model used, the time taken for each step and more in the review comments.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

🤖 Fix all issues with AI agents
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.java`:
- Around line 50-52: The endpoint derivation using otlpEndpoint.replace(...) is
fragile; introduce a dedicated gRPC endpoint configuration (e.g., a new property
name like otlpGrpcEndpoint) and use that when building the OtlpGrpcSpanExporter
instead of mutating otlpEndpoint. Update OtelConfig to read the new
otlpGrpcEndpoint (with a sensible default or explicit validation) and construct
OtlpGrpcSpanExporter.builder().setEndpoint(otlpGrpcEndpoint).build(); leave
otlpEndpoint for HTTP exporters and ensure any existing callers/configs are
updated to provide the new property.

In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java`:
- Around line 186-189: In RouteController's p2pRoute method catch block for
CIMException you only call span.setStatus(...) and set the response, so
exception details aren't recorded in tracing; add span.recordException(e)
alongside span.setStatus(StatusCode.ERROR, e.getErrorMessage()) (matching the
pattern used in groupRoute and login) so the exception and stack trace are
emitted to Jaeger when p2pRoute fails.
- Around line 235-238: The early return when status != StatusEnum.SUCCESS sets
span attribute "loginStatus" but leaves the span status UNSET; before returning
(in the failure branch in RouteController where you set
span.setAttribute("loginStatus", "failed")), also set the span status to an
error (e.g., span.setStatus(...) with an ERROR/StatusCode.ERROR and optional
message) so the finally block that ends the span records a non-OK status and
failed logins are visible in Jaeger.
- Around line 101-102: The created Span in RouteController (Span span =
tracer.spanBuilder("cim.route.groupMsg").startSpan() inside groupRoute — and
similarly in p2pRoute and login) is never made current, breaking context
propagation; fix by making the span current with span.makeCurrent() (assign to a
Scope, e.g., try-with-resources Scope scope = span.makeCurrent()) before
executing downstream work and ensure the Scope is closed and span.end() is
always called in a finally block so child spans and instrumented libraries
inherit the context.

In `@cim-forward-route/src/main/resources/application.yaml`:
- Around line 73-82: The OTLP metrics property path is incorrect and tracing
sampling is hard-coded; change the metrics config key from
management.otlp.metrics.endpoint to management.otlp.metrics.export.url (per
Spring Boot/Micrometer OTLP) and externalize the tracing sampling probability
(management.tracing.sampling.probability) by reading it from an environment
variable or profile-specific property (e.g.,
${TRACING_SAMPLING_PROBABILITY:0.1}) or a non-dev profile override so production
does not trace 100% of requests.

In `@cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java`:
- Around line 47-52: The static Counter fields loginCounter, heartbeatCounter,
and messagePushedCounter in OtelConfig are written during Spring initialization
and later read from Netty event-loop threads, so add the volatile modifier to
each static field declaration to ensure visibility across threads (Lombok's
`@Getter` can stay as-is); update the declarations of loginCounter,
heartbeatCounter, and messagePushedCounter to be private static volatile Counter
... so Netty threads reliably see the initialized non-null counters.

In
`@cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java`:
- Around line 93-101: The empty if block after creating Span (Span span in
CIMServerHandle) should actually make the span current: call span.makeCurrent()
and assign to a Scope (e.g., Scope scope) when span != null, and ensure the
Scope is closed and span.end() in the finally block; alternatively remove the
misleading if block—prefer implementing Scope scope = span.makeCurrent() inside
the try and closing scope.close() (and ending span) in the finally to ensure
proper context propagation.

In `@cim-server/src/main/resources/application.yaml`:
- Around line 40-52: The YAML uses an unrecognized property for OTLP metrics:
replace management.otlp.metrics.endpoint with the correct nested property
management.otlp.metrics.export.url so that metrics export is configured as
management -> otlp -> metrics -> export -> url (keep
management.otlp.tracing.endpoint as-is); update the application.yaml to reflect
this property name change so Spring Boot 3.x will actually export metrics to the
OTLP collector.

In `@docker/allin1-ubuntu.Dockerfile`:
- Around line 45-46: The EXPOSE directive currently lists only Jaeger/OTel ports
(16686, 4317, 4318) but omits Zookeeper and Redis; update the EXPOSE instruction
in the Dockerfile (modify the EXPOSE line) to include ports 2181 and 6379 so it
reads to expose 16686 4317 4318 2181 6379, and verify whether port 8083 should
be added based on the application before including it.
- Around line 13-18: The Dockerfile currently hardcodes the Jaeger artifact
"jaeger-1.57.0-linux-amd64.tar.gz" and will fail for non-amd64 builds; change
the step to accept a build ARG TARGETARCH, map TARGETARCH values to the correct
Jaeger archive arch string (e.g., use "amd64" for TARGETARCH=amd64 and "arm64"
for TARGETARCH=arm64, and fail/skip with a clear message for unsupported
TARGETARCH like linux/arm/v7), then use that mapped arch when composing the
download filename and extracting the "jaeger-all-in-one" binary so the RUN
wget/tar/mv/rm sequence fetches the correct platform-specific tarball.

In `@docker/supervisord.conf`:
- Around line 34-38: The supervisord [program:server] startup sequence makes
Jaeger/OTLP a hard dependency by calling /wait-for-it.sh on port 4317 before
launching java -jar /cim-server.jar; update the command in the supervisord
[program:server] (and any analogous [program:cim-route]) to remove the
/wait-for-it.sh -t 30 localhost:4317 check (or if you must keep it, increase the
timeout significantly and add a clear comment documenting that Jaeger is
required), leaving only the Redis (6379) and Zookeeper (2181) waits so the OTLP
exporter can retry/buffer without blocking application start.

In `@pom.xml`:
- Around line 26-28: The POM declares redundant properties
micrometer-tracing.version and micrometer.version (matching Spring Boot 3.3.0
managed versions) and a mismatched opentelemetry.version; remove the
micrometer-tracing.version and micrometer.version properties from the pom.xml so
Spring Boot's dependency management controls those versions, and either change
opentelemetry.version to 1.37.0 to match Spring Boot 3.3.0 or remove the
opentelemetry.version property if you intend to use the Boot-managed version
instead (check for any intentional override before removal).

In `@README.md`:
- Line 25: The checklist item "Integrate OpenTelemetry ." has an extra space
before the period; update that checklist line (the "- [x] Integrate
OpenTelemetry ." entry) to remove the extra space so it reads "Integrate
OpenTelemetry." to correct the punctuation.
🧹 Nitpick comments (6)
docker/supervisord.conf (1)

24-32: Nit: Redundant OTLP enablement — both CLI flag and environment variable are set.

Line 25 passes --collector.otlp.enabled=true and Line 26 sets the equivalent COLLECTOR_OTLP_ENABLED="true" env var. One of them is sufficient.

cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.java (2)

115-121: tracerProvider.shutdown() is fire-and-forget — pending spans may be lost.

SdkTracerProvider.shutdown() returns a CompletableResultCode. Without awaiting its completion, in-flight spans in the BatchSpanProcessor may not be flushed before the JVM exits.

Proposed fix: await shutdown completion
     `@PreDestroy`
     public void shutdown() {
         if (tracerProvider != null) {
-            tracerProvider.shutdown();
+            tracerProvider.shutdown().join(10, java.util.concurrent.TimeUnit.SECONDS);
             log.info("OpenTelemetry TracerProvider shut down");
         }
     }

1-121: Significant code duplication with cim-server's OtelConfig.

The OTLP SDK initialization, tracer bean creation, and shutdown logic are nearly identical between this class and cim-server/…/OtelConfig.java. Consider extracting a shared base configuration (e.g., in cim-common) parameterized by service.name, with module-specific subclasses or configs only defining their custom metrics.

cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java (1)

116-141: @Bean returning a dummy String to trigger metric registration is a code smell.

Returning "metricsRegistered" registers an unrelated String bean in the context and obscures the intent. A @PostConstruct method on this @Configuration class (with MeterRegistry injected as a field) or a SmartInitializingSingleton would be more idiomatic.

Suggested alternative
+    `@Resource`
+    private MeterRegistry meterRegistry;
+
-    `@Bean`
-    public String registerMetrics(MeterRegistry meterRegistry) {
+    `@PostConstruct`
+    public void registerMetrics() {
         Gauge.builder("cim.server.online.users",
                 () -> SessionSocketHolder.getRelationShip().size())
                 .description("Current number of online users connected to this server")
                 .register(meterRegistry);

         loginCounter = Counter.builder("cim.server.login.count")
                 .description("Total number of login requests processed")
                 .register(meterRegistry);

         heartbeatCounter = Counter.builder("cim.server.heartbeat.count")
                 .description("Total number of heartbeat (PING) messages received")
                 .register(meterRegistry);

         messagePushedCounter = Counter.builder("cim.server.messages.pushed")
                 .description("Total number of messages pushed to clients")
                 .register(meterRegistry);

         log.info("CIM server custom metrics registered");
-        return "metricsRegistered";
     }
cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java (1)

41-47: Per-request SpringBeanFactory.getBean() lookup for the Tracer.

getTracer() performs a Spring context lookup on every LOGIN_REQUEST. While singletons are cached, the lookup still traverses the bean factory. Cache the result in a volatile field after the first successful retrieval.

Proposed fix
+    private volatile Tracer cachedTracer;
+
     private Tracer getTracer() {
+        if (cachedTracer != null) {
+            return cachedTracer;
+        }
         try {
-            return SpringBeanFactory.getBean(Tracer.class);
+            cachedTracer = SpringBeanFactory.getBean(Tracer.class);
+            return cachedTracer;
         } catch (Exception e) {
             return null;
         }
     }
cim-server/src/main/java/com/crossoverjie/cim/server/server/CIMServer.java (1)

85-139: Span ends before the async writeAndFlush completes.

span.end() in finally (line 137) executes immediately after socketChannel.writeAndFlush(protocol) is called, but the actual network write hasn't completed yet — the ChannelFutureListener fires asynchronously later. This means the span duration does not reflect actual write latency or failures. If accurate push timing matters, consider ending the span inside the listener callback instead.

Comment on lines +235 to +238
if (status != StatusEnum.SUCCESS) {
span.setAttribute("loginStatus", "failed");
return res;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Login early-return leaves span status as UNSET.

When status != SUCCESS, the method returns early at line 237. The span is ended in finally, but its status is never explicitly set (only the attribute loginStatus=failed is recorded). In Jaeger, this span will appear without a clear OK/ERROR status, making it harder to filter and alert on failed logins.

Proposed fix
             if (status != StatusEnum.SUCCESS) {
                 span.setAttribute("loginStatus", "failed");
+                span.setStatus(StatusCode.OK); // request itself succeeded; login was rejected by business logic
                 return res;
             }
🤖 Prompt for AI Agents
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java`
around lines 235 - 238, The early return when status != StatusEnum.SUCCESS sets
span attribute "loginStatus" but leaves the span status UNSET; before returning
(in the failure branch in RouteController where you set
span.setAttribute("loginStatus", "failed")), also set the span status to an
error (e.g., span.setStatus(...) with an ERROR/StatusCode.ERROR and optional
message) so the finally block that ends the span records a non-OK status and
failed logins are visible in Jaeger.

Comment thread cim-forward-route/src/main/resources/application.yaml Outdated
Comment thread docker/allin1-ubuntu.Dockerfile Outdated
Comment on lines +13 to +18
# install Jaeger all-in-one (OpenTelemetry trace backend)
# Jaeger UI: 16686, OTLP gRPC: 4317, OTLP HTTP: 4318
RUN wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-amd64.tar.gz \
&& tar -xzf jaeger-1.57.0-linux-amd64.tar.gz \
&& mv jaeger-1.57.0-linux-amd64/jaeger-all-in-one /usr/local/bin/ \
&& rm -rf jaeger-1.57.0-linux-amd64*
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check Jaeger v1.57.0 available architectures
curl -sL https://api.github.com/repos/jaegertracing/jaeger/releases/tags/v1.57.0 | \
  jq -r '.assets[].name' | grep -i 'linux'

Repository: crossoverJie/cim

Length of output: 505


🏁 Script executed:

#!/bin/bash
# Get more details on Jaeger release to understand available architectures
curl -sL https://api.github.com/repos/jaegertracing/jaeger/releases/tags/v1.57.0 | \
  jq -r '.assets[] | "\(.name)"' | sort

Repository: crossoverJie/cim

Length of output: 965


🏁 Script executed:

#!/bin/bash
# Find and examine the Dockerfile to understand context
fd -t f -name "*allin1-ubuntu.Dockerfile" -o -name "*.Dockerfile" | head -20

Repository: crossoverJie/cim

Length of output: 290


🏁 Script executed:

#!/bin/bash
# Check for README to verify platform claims
fd -t f -name "README*" | head -5

Repository: crossoverJie/cim

Length of output: 290


🏁 Script executed:

#!/bin/bash
cat docker/allin1-ubuntu.Dockerfile | head -50

Repository: crossoverJie/cim

Length of output: 1749


🏁 Script executed:

#!/bin/bash
find . -name README.md -o -name README.txt | xargs cat 2>/dev/null | head -100

Repository: crossoverJie/cim

Length of output: 3877


🏁 Script executed:

#!/bin/bash
# Check if there's a Dockerfile.build context or multi-arch setup mentioned
find . -name Dockerfile -o -name "*.Dockerfile" | xargs ls -la

Repository: crossoverJie/cim

Length of output: 225


🏁 Script executed:

#!/bin/bash
# Look for docker build scripts, CI/CD configs, or hints about multi-arch builds
find . -name "*.yml" -o -name "*.yaml" -o -name "Makefile" -o -name "*.sh" | xargs grep -l "docker\|buildx\|arm64\|amd64" 2>/dev/null | head -10

Repository: crossoverJie/cim

Length of output: 90


🏁 Script executed:

#!/bin/bash
# Check if there's any reference to multi-arch or arm in docker-related configs
git ls-files | grep -E "(docker|build|workflow|\.github)" | head -20

Repository: crossoverJie/cim

Length of output: 337


🏁 Script executed:

#!/bin/bash
cat .github/workflows/docker.yml

Repository: crossoverJie/cim

Length of output: 1387


Jaeger binary is hardcoded to linux-amd64, breaking multi-arch CI/CD builds.

The CI/CD workflow explicitly builds for linux/amd64, linux/arm64, and linux/arm/v7, but this Dockerfile step downloads only the amd64 binary, causing arm64 builds to fail. Use TARGETARCH build arg to select the correct architecture:

🐛 Proposed fix
+ARG TARGETARCH
 # install Jaeger all-in-one (OpenTelemetry trace backend)
 # Jaeger UI: 16686, OTLP gRPC: 4317, OTLP HTTP: 4318
-RUN wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-amd64.tar.gz \
-    && tar -xzf jaeger-1.57.0-linux-amd64.tar.gz \
-    && mv jaeger-1.57.0-linux-amd64/jaeger-all-in-one /usr/local/bin/ \
-    && rm -rf jaeger-1.57.0-linux-amd64*
+RUN JAEGER_ARCH=$(case "${TARGETARCH}" in arm64) echo "arm64";; *) echo "amd64";; esac) \
+    && wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \
+    && tar -xzf jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \
+    && mv jaeger-1.57.0-linux-${JAEGER_ARCH}/jaeger-all-in-one /usr/local/bin/ \
+    && rm -rf jaeger-1.57.0-linux-${JAEGER_ARCH}*

Note: Jaeger v1.57.0 provides only amd64 and arm64 binaries. The linux/arm/v7 platform in the CI/CD workflow cannot be satisfied with this version.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# install Jaeger all-in-one (OpenTelemetry trace backend)
# Jaeger UI: 16686, OTLP gRPC: 4317, OTLP HTTP: 4318
RUN wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-amd64.tar.gz \
&& tar -xzf jaeger-1.57.0-linux-amd64.tar.gz \
&& mv jaeger-1.57.0-linux-amd64/jaeger-all-in-one /usr/local/bin/ \
&& rm -rf jaeger-1.57.0-linux-amd64*
ARG TARGETARCH
# install Jaeger all-in-one (OpenTelemetry trace backend)
# Jaeger UI: 16686, OTLP gRPC: 4317, OTLP HTTP: 4318
RUN JAEGER_ARCH=$(case "${TARGETARCH}" in arm64) echo "arm64";; *) echo "amd64";; esac) \
&& wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \
&& tar -xzf jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \
&& mv jaeger-1.57.0-linux-${JAEGER_ARCH}/jaeger-all-in-one /usr/local/bin/ \
&& rm -rf jaeger-1.57.0-linux-${JAEGER_ARCH}*
🤖 Prompt for AI Agents
In `@docker/allin1-ubuntu.Dockerfile` around lines 13 - 18, The Dockerfile
currently hardcodes the Jaeger artifact "jaeger-1.57.0-linux-amd64.tar.gz" and
will fail for non-amd64 builds; change the step to accept a build ARG
TARGETARCH, map TARGETARCH values to the correct Jaeger archive arch string
(e.g., use "amd64" for TARGETARCH=amd64 and "arm64" for TARGETARCH=arm64, and
fail/skip with a clear message for unsupported TARGETARCH like linux/arm/v7),
then use that mapped arch when composing the download filename and extracting
the "jaeger-all-in-one" binary so the RUN wget/tar/mv/rm sequence fetches the
correct platform-specific tarball.

Comment thread docker/allin1-ubuntu.Dockerfile Outdated
Comment thread docker/supervisord.conf
Comment thread pom.xml Outdated
Comment thread README.md Outdated
- Use dedicated gRPC endpoint property (otel.exporter.otlp.grpc.endpoint)
- Add span.makeCurrent() with Scope for proper context propagation
- Add span.recordException() in p2pRoute catch block
- Set span status on login failure early return
- Fix OTLP metrics property (management.otlp.metrics.export.url)
- Externalize tracing sampling probability via env var
- Use volatile for static counter fields in cim-server OtelConfig
- Use @PostConstruct instead of @bean returning String for metrics
- Await TracerProvider shutdown completion with join()
- Cache Tracer with volatile field in CIMServerHandle
- Support multi-arch Docker builds with TARGETARCH
- Expose Zookeeper/Redis ports in Dockerfile
- Remove Jaeger as hard startup dependency in supervisord
- Remove redundant micrometer version properties from POM
- Update OpenTelemetry version to 1.40.0
- Fix README punctuation
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java`:
- Around line 155-196: p2pRoute currently only catches CIMException so non-CIM
exceptions leave the OpenTelemetry Span with StatusCode.UNSET and unrecorded;
add a general catch (Exception e) after the existing catch to set
span.setStatus(StatusCode.ERROR, e.getMessage()), span.recordException(e), and
populate the response with a generic error code/message (consistent with other
handlers like groupRoute/login), ensuring the catch references the existing Span
variable `span`, the method `p2pRoute`, and the response `res` to mirror the
error handling pattern used elsewhere.

In `@cim-forward-route/src/main/resources/application.yaml`:
- Line 95: Remove the trailing blank line at the end of application.yaml so the
file ends immediately after the last YAML content; open application.yaml, delete
the extra empty newline/whitespace after the final entry, save, and re-run
yamllint to confirm no EOF blank-line warning remains.
- Around line 73-94: The YAML and manual SDK both configure tracing causing
spans to be exported to different endpoints; consolidate to one approach by
either removing the custom OtelConfig and its
OpenTelemetrySdk/SdkTracerProvider/OtlpGrpcSpanExporter so all spans use Spring
Boot auto-config (management.otlp.tracing.endpoint ->
http://localhost:4318/v1/traces) and inject the auto-configured Tracer into
RouteController, or disable Spring Boot tracing auto-config (via
spring.autoconfigure.exclude) and keep OtelConfig
(otel.exporter.otlp.grpc.endpoint -> http://localhost:4317) as the sole
tracer—update RouteController to use whichever Tracer bean remains and remove
the unused tracing configuration to avoid dual export.
🧹 Nitpick comments (3)
cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java (1)

38-54: Lazy Tracer initialization is fine for this non-Spring-managed handler, but consider a small robustness improvement.

The double-read of cachedTracer without synchronization is benign (worst case: redundant getBean calls). However, the catch (Exception e) silently swallows all exceptions and returns null, which means configuration errors (e.g., missing bean definition) will be hidden. Consider logging at debug/warn level on first failure so misconfigurations are detectable.

Proposed improvement
         try {
             cachedTracer = SpringBeanFactory.getBean(Tracer.class);
             return cachedTracer;
         } catch (Exception e) {
+            log.debug("Tracer bean not available, tracing disabled: {}", e.getMessage());
             return null;
         }
cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.java (1)

36-37: Unused meterRegistry field injection.

The meterRegistry field is injected via @Resource but never referenced. The counter @Bean methods receive MeterRegistry as a method parameter (Spring auto-injects it), so this field is dead code.

Proposed fix
-    `@jakarta.annotation.Resource`
-    private MeterRegistry meterRegistry;
-
cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java (1)

285-301: Consider adding span tracing to registerAccount for consistency.

All other mutating endpoints (groupRoute, p2pRoute, login) are instrumented with both spans and counters, but registerAccount only has a counter increment. If a registration fails, there's no trace to diagnose it. This is a minor consistency gap.

Comment on lines 155 to 196
public BaseResponse<NULLBody> p2pRoute(@RequestBody P2PReqVO p2pRequest) {
BaseResponse<NULLBody> res = new BaseResponse();

try {
//获取接收消息用户的路由信息
Optional<CIMServerResVO> cimServerResVO = accountService.loadRouteRelatedByUserId(p2pRequest.getReceiveUserId());
// === OpenTelemetry: 创建私聊路由 Span ===
Span span = tracer.spanBuilder("cim.route.p2pMsg").startSpan();

try (Scope scope = span.makeCurrent()) {
// 获取接收消息用户的路由信息
Optional<CIMServerResVO> cimServerResVO = accountService
.loadRouteRelatedByUserId(p2pRequest.getReceiveUserId());
if (cimServerResVO.isEmpty()) {
log.warn("userId={} not online, save offline msg", p2pRequest.getReceiveUserId());
offlineMsgService.saveOfflineMsg(p2pRequest);
throw new CIMException(OFF_LINE);
}

//p2pRequest.getReceiveUserId()==>消息接收者的 userID
ChatReqVO chatVO = new ChatReqVO(p2pRequest.getReceiveUserId(), p2pRequest.getMsg(), p2pRequest.getBatchMsg());
// p2pRequest.getReceiveUserId()==>消息接收者的 userID
ChatReqVO chatVO = new ChatReqVO(p2pRequest.getReceiveUserId(), p2pRequest.getMsg(),
p2pRequest.getBatchMsg());
accountService.pushMsg(cimServerResVO.get(), p2pRequest.getUserId(), chatVO);

// 记录 Span 属性
span.setAttribute("senderId", p2pRequest.getUserId());
span.setAttribute("receiverId", p2pRequest.getReceiveUserId());
span.setStatus(StatusCode.OK);

// === Metrics: 递增私聊计数器 ===
p2pMessageCounter.increment();

res.setCode(StatusEnum.SUCCESS.getCode());
res.setMessage(StatusEnum.SUCCESS.getMessage());

} catch (CIMException e) {
span.setStatus(StatusCode.ERROR, e.getErrorMessage());
span.recordException(e);
res.setCode(e.getErrorCode());
res.setMessage(e.getErrorMessage());
} finally {
span.end();
}
return res;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

p2pRoute only catches CIMException — non-CIM exceptions leave span status UNSET.

If a non-CIMException (e.g., NullPointerException, network error) is thrown, it bypasses the catch (CIMException e) block and goes directly to finally, where span.end() is called but the span status remains UNSET and the exception is not recorded. This contrasts with groupRoute and login, which catch all Exception types.

Proposed fix — add a general catch block
         } catch (CIMException e) {
             span.setStatus(StatusCode.ERROR, e.getErrorMessage());
             span.recordException(e);
             res.setCode(e.getErrorCode());
             res.setMessage(e.getErrorMessage());
+        } catch (Exception e) {
+            span.setStatus(StatusCode.ERROR, e.getMessage());
+            span.recordException(e);
+            throw e;
         } finally {
             span.end();
         }
🤖 Prompt for AI Agents
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java`
around lines 155 - 196, p2pRoute currently only catches CIMException so non-CIM
exceptions leave the OpenTelemetry Span with StatusCode.UNSET and unrecorded;
add a general catch (Exception e) after the existing catch to set
span.setStatus(StatusCode.ERROR, e.getMessage()), span.recordException(e), and
populate the response with a generic error code/message (consistent with other
handlers like groupRoute/login), ensuring the catch references the existing Span
variable `span`, the method `p2pRoute`, and the response `res` to mirror the
error handling pattern used elsewhere.

Comment on lines +73 to +94
# OpenTelemetry 配置
management:
otlp:
tracing:
endpoint: http://localhost:4318/v1/traces
metrics:
export:
url: http://localhost:4318/v1/metrics
tracing:
sampling:
probability: ${TRACING_SAMPLING_PROBABILITY:1.0}
endpoints:
web:
exposure:
include: health,info,metrics

# OpenTelemetry gRPC endpoint for SDK exporter
otel:
exporter:
otlp:
grpc:
endpoint: ${OTEL_EXPORTER_OTLP_GRPC_ENDPOINT:http://localhost:4317}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find and examine OtelConfig.java
fd -type f -name "OtelConfig.java" -exec cat -n {} \;
echo "---"
# Check the application.yaml file to confirm configuration
fd -type f -name "application.yaml" -path "*/cim-forward-route/*" -exec cat -n {} \;

Repository: crossoverJie/cim

Length of output: 422


🏁 Script executed:

#!/bin/bash
# Find OtelConfig.java anywhere in the repo
find . -name "OtelConfig.java" -type f
echo "---"
# Also search for files containing OpenTelemetrySdk
rg -l "OpenTelemetrySdk" --type java
echo "---"
# Check for application.yaml in the cim-forward-route module
find . -path "*/cim-forward-route/*/application.yaml" -type f
echo "---"
# Look for pom.xml in cim-forward-route to check dependencies
find . -path "*/cim-forward-route/pom.xml" -type f

Repository: crossoverJie/cim

Length of output: 531


🏁 Script executed:

#!/bin/bash
# Examine OtelConfig.java in cim-forward-route
cat -n cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.java

Repository: crossoverJie/cim

Length of output: 4419


🏁 Script executed:

#!/bin/bash
# Check pom.xml for Spring Boot version and OpenTelemetry dependencies
cat -n cim-forward-route/pom.xml

Repository: crossoverJie/cim

Length of output: 6444


🏁 Script executed:

#!/bin/bash
# Read the application.yaml file for cim-forward-route
cat -n cim-forward-route/src/main/resources/application.yaml
echo "---"
# Check if there are any autoconfiguration exclusions
rg -n "spring.autoconfigure.exclude" cim-forward-route/

Repository: crossoverJie/cim

Length of output: 2881


🏁 Script executed:

#!/bin/bash
# Check if GlobalOpenTelemetry.set() is called
rg -n "GlobalOpenTelemetry.set" --type java
echo "---"
# Check parent pom.xml for Spring Boot version
cat cim/pom.xml | head -50

Repository: crossoverJie/cim

Length of output: 124


🏁 Script executed:

#!/bin/bash
# Find the parent pom.xml
find . -name "pom.xml" -path "*/cim/*" -type f | head -5
echo "---"
# Check the root pom.xml
ls -la pom.xml 2>/dev/null && cat -n pom.xml | head -100

Repository: crossoverJie/cim

Length of output: 4736


🏁 Script executed:

#!/bin/bash
# Check for any spring.autoconfigure.exclude in the parent or module poms
rg -n "spring.autoconfigure.exclude" --type xml
echo "---"
# Check for how the Tracer bean is being used in the codebase
rg -n "private.*Tracer\|@Autowired.*Tracer\|@Resource.*Tracer" --type java cim-forward-route/src/main/java/
echo "---"
# Check if spring-boot-starter-actuator triggers auto-config
rg -n "spring.boot.actuate" --type java cim-forward-route/src/main/java/ | head -20

Repository: crossoverJie/cim

Length of output: 67


🏁 Script executed:

#!/bin/bash
# Search for any usage of the Tracer bean in cim-forward-route
rg -n "tracer\|Tracer" --type java cim-forward-route/src/main/java/ | grep -v "TracerProvider\|OpenTelemetry" | head -30
echo "---"
# Check if there are any span creations
rg -n "startSpan\|startActiveSpan" --type java cim-forward-route/src/main/java/

Repository: crossoverJie/cim

Length of output: 63


🏁 Script executed:

#!/bin/bash
# Check cim-server's OtelConfig to see if it follows the same pattern
cat -n cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java | head -70
echo "---"
# Check if cim-server's application.yaml has similar configuration
cat -n cim-server/src/main/resources/application.yaml | grep -A 25 "OpenTelemetry"

Repository: crossoverJie/cim

Length of output: 3954


🏁 Script executed:

#!/bin/bash
# Check if cim-server uses the Tracer bean
rg -n "Tracer" cim-server/src/main/java/ --type java | grep -v "TracerProvider\|@Bean" | head -20
echo "---"
# Check how GlobalOpenTelemetry is used or if it's set anywhere
rg -n "GlobalOpenTelemetry\|setGlobal" --type java

Repository: crossoverJie/cim

Length of output: 1948


🏁 Script executed:

#!/bin/bash
# Check if cim-forward-route's RouteHandler or MessageHandler uses Tracer
rg -n "import.*Tracer|new.*Span|startSpan|startActiveSpan" cim-forward-route/src/main/java/ --type java
echo "---"
# Check for Tracer usage in cim-forward-route via SpringBeanFactory or direct injection
rg -n "@Autowired.*Tracer|@Resource.*Tracer|getBean.*Tracer" cim-forward-route/src/main/java/ --type java
echo "---"
# List main classes in cim-forward-route to check if Tracer is injected
ls -la cim-forward-route/src/main/java/com/crossoverjie/cim/route/*/

Repository: crossoverJie/cim

Length of output: 3642


🏁 Script executed:

#!/bin/bash
# Check how tracer is injected in RouteController
cat -n cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java | head -50

Repository: crossoverJie/cim

Length of output: 2603


🏁 Script executed:

#!/bin/bash
# Check the field declarations in RouteController
sed -n '40,120p' cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java

Repository: crossoverJie/cim

Length of output: 2486


Dual tracing setup exports spans to different endpoints — consolidate to single configuration.

The application configures two separate OpenTelemetry tracing paths:

  1. Spring Boot auto-configuration (via micrometer-tracing-bridge-otel on classpath + management.otlp.tracing.endpoint in YAML) auto-instruments HTTP requests and exports spans via HTTP to http://localhost:4318/v1/traces.

  2. Manual SDK in OtelConfig creates a separate OpenTelemetrySdk with SdkTracerProvider and OtlpGrpcSpanExporter, exporting spans via gRPC to http://localhost:4317 (configured by otel.exporter.otlp.grpc.endpoint).

RouteController explicitly injects and uses the manual Tracer bean to create spans for group messaging, P2P messaging, and login operations. This means spans from these operations use the manual gRPC path (port 4317), while Spring Boot's auto-instrumented request/response tracing uses the HTTP path (port 4318). If only one collector is running, you'll lose traces; if both are running, traces are split across endpoints and difficult to correlate.

Choose one approach:

  • Option A: Rely solely on Spring Boot's auto-configuration by removing OtelConfig and letting Spring Boot handle all tracing.
  • Option B: Disable Spring Boot's auto-configuration (spring.autoconfigure.exclude) and keep only OtelConfig for manual control.
🤖 Prompt for AI Agents
In `@cim-forward-route/src/main/resources/application.yaml` around lines 73 - 94,
The YAML and manual SDK both configure tracing causing spans to be exported to
different endpoints; consolidate to one approach by either removing the custom
OtelConfig and its OpenTelemetrySdk/SdkTracerProvider/OtlpGrpcSpanExporter so
all spans use Spring Boot auto-config (management.otlp.tracing.endpoint ->
http://localhost:4318/v1/traces) and inject the auto-configured Tracer into
RouteController, or disable Spring Boot tracing auto-config (via
spring.autoconfigure.exclude) and keep OtelConfig
(otel.exporter.otlp.grpc.endpoint -> http://localhost:4317) as the sole
tracer—update RouteController to use whichever Tracer bean remains and remove
the unused tracing configuration to avoid dual export.

@crossoverJie
Copy link
Copy Markdown
Owner

Sorry for the late reply, please check the CI.

Spring Boot BOM does not manage micrometer-tracing-bridge-otel and
micrometer-registry-otlp versions. Restored version properties with
updated versions (micrometer-tracing 1.4.2, micrometer 1.14.3).
@crossoverJie
Copy link
Copy Markdown
Owner

Please refer to the checkstyle fix here

- Remove unused import io.opentelemetry.context.Scope in CIMServerHandle
- Remove trailing whitespace in CIMServer Javadoc
@KitAndrew
Copy link
Copy Markdown
Author

Please refer to the checkstyle fix here

done.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
cim-server/src/main/java/com/crossoverjie/cim/server/server/CIMServer.java (1)

41-42: Consider constructor injection for better testability.

Field injection with @Autowired makes the class harder to unit test and hides dependencies. Constructor injection is generally preferred in Spring applications.

♻️ Suggested refactor to constructor injection
 `@Component`
 `@Slf4j`
 public class CIMServer {

     private EventLoopGroup boss = new NioEventLoopGroup();
     private EventLoopGroup work = new NioEventLoopGroup();

-    `@Autowired`
-    private Tracer tracer;
+    private final Tracer tracer;
+
+    public CIMServer(Tracer tracer) {
+        this.tracer = tracer;
+    }

     `@Value`("${cim.server.port}")
     private int nettyPort;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cim-server/src/main/java/com/crossoverjie/cim/server/server/CIMServer.java`
around lines 41 - 42, The CIMServer class currently uses field injection for the
Tracer (private Tracer tracer) which reduces testability; change to constructor
injection by removing the `@Autowired` field and adding a constructor (e.g.,
public CIMServer(Tracer tracer, ...) or overloaded constructor) that accepts
Tracer and assigns it to the instance field, update any `@Autowired` annotations
accordingly, and ensure tests and Spring configuration instantiate CIMServer via
its constructor so the Tracer dependency can be injected/mocked.
cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java (1)

47-52: Surface tracer lookup failures instead of failing silently

Line 50 currently swallows initialization errors, which can disable tracing with no operational signal. Log at least once when bean lookup fails.

Proposed fix
+    private final java.util.concurrent.atomic.AtomicBoolean tracerLookupWarned =
+            new java.util.concurrent.atomic.AtomicBoolean(false);
+
     private Tracer getTracer() {
         if (cachedTracer != null) {
             return cachedTracer;
         }
         try {
             cachedTracer = SpringBeanFactory.getBean(Tracer.class);
             return cachedTracer;
         } catch (Exception e) {
+            if (tracerLookupWarned.compareAndSet(false, true)) {
+                log.warn("Tracer bean unavailable; login tracing is disabled.", e);
+            }
             return null;
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java`
around lines 47 - 52, The current try/catch around
SpringBeanFactory.getBean(Tracer.class) in CIMServerHandle swallows exceptions;
change the catch to log the failure (include the exception) before returning
null so tracer lookup failures are visible; use the class logger (e.g., the
CIMServerHandle logger) and, if you want to avoid noisy repeated logs, gate the
log with a static flag/AtomicBoolean (e.g., tracerLookupLogged) so the failure
is logged at least once, then return null as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java`:
- Around line 110-113: The span currently exports a raw username via
span.setAttribute("userName", msg.getReqMsg()) in CIMServerHandle; replace that
exposure with an anonymized value (e.g., a deterministic hash or truncated
token) or remove the attribute entirely: compute a non-reversible identifier
from msg.getReqMsg() and set that with span.setAttribute("userNameHash",
<hashedValue>) or only emit a boolean/role attribute instead, keeping
span.setAttribute("userId", msg.getRequestId()) and
span.setStatus(StatusCode.OK) unchanged.

In `@cim-server/src/main/java/com/crossoverjie/cim/server/server/CIMServer.java`:
- Around line 93-139: The span is ended and marked OK before the async write
completes; update the logic so the ChannelFutureListener handles span status,
exception recording, counter increment and span.end() based on the write result:
keep creating the Span (variable span) as-is, but remove setting
span.setStatus(StatusCode.OK) and the
OtelConfig.getMessagePushedCounter().increment() from the try block and also
remove span.end() from the finally; instead, in the ChannelFutureListener
attached to socketChannel.writeAndFlush(protocol) inspect
channelFuture.isSuccess() and on success set span attributes
(targetUserId/messageType), set StatusCode.OK, increment
OtelConfig.getMessagePushedCounter(), and call span.end(); on failure set
StatusCode.ERROR, call span.recordException(channelFuture.cause()) and then
span.end(); preserve the existing catch block behavior for synchronous
exceptions thrown before write.

---

Nitpick comments:
In
`@cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java`:
- Around line 47-52: The current try/catch around
SpringBeanFactory.getBean(Tracer.class) in CIMServerHandle swallows exceptions;
change the catch to log the failure (include the exception) before returning
null so tracer lookup failures are visible; use the class logger (e.g., the
CIMServerHandle logger) and, if you want to avoid noisy repeated logs, gate the
log with a static flag/AtomicBoolean (e.g., tracerLookupLogged) so the failure
is logged at least once, then return null as before.

In `@cim-server/src/main/java/com/crossoverjie/cim/server/server/CIMServer.java`:
- Around line 41-42: The CIMServer class currently uses field injection for the
Tracer (private Tracer tracer) which reduces testability; change to constructor
injection by removing the `@Autowired` field and adding a constructor (e.g.,
public CIMServer(Tracer tracer, ...) or overloaded constructor) that accepts
Tracer and assigns it to the instance field, update any `@Autowired` annotations
accordingly, and ensure tests and Spring configuration instantiate CIMServer via
its constructor so the Tracer dependency can be injected/mocked.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 974aca4 and 348e192.

📒 Files selected for processing (2)
  • cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java
  • cim-server/src/main/java/com/crossoverjie/cim/server/server/CIMServer.java

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 0% with 130 lines in your changes missing coverage. Please review.
✅ Project coverage is 47.65%. Comparing base (f1f3bae) to head (9542423).

Files with missing lines Patch % Lines
...soverjie/cim/route/controller/RouteController.java 0.00% 96 Missing ⚠️
.../com/crossoverjie/cim/route/config/OtelConfig.java 0.00% 34 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #186      +/-   ##
============================================
- Coverage     50.34%   47.65%   -2.69%     
  Complexity      320      320              
============================================
  Files            79       80       +1     
  Lines          1738     1836      +98     
  Branches        134      157      +23     
============================================
  Hits            875      875              
- Misses          822      920      +98     
  Partials         41       41              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread docker/allin1-ubuntu.Dockerfile Outdated
Comment thread cim-forward-route/src/main/resources/application.yaml
- Add @ConditionalOnProperty(cim.otel.enabled) to OtelConfig in both
  cim-server and cim-forward-route, so OTel beans are only created
  when explicitly enabled
- Use @Autowired(required=false) with null guards for Tracer/Counter
  in RouteController; CIMServerHandle already has null protection
- Add cim.otel.enabled=${OTEL_ENABLED:false} to both application.yaml
  files, defaulting to disabled
- Parameterize OTLP endpoints with environment variables
- Dockerfile: use ARG ENABLE_OTEL=false to conditionally install Jaeger
- supervisord: use %(ENV_ENABLE_OTEL)s for Jaeger autostart

To enable OTel, build with --build-arg ENABLE_OTEL=true and run with
OTEL_ENABLED=true.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
docker/allin1-ubuntu.Dockerfile (1)

17-23: ⚠️ Potential issue | 🟠 Major

Handle unsupported TARGETARCH explicitly before downloading Jaeger.

Line 18 currently does a no-op transform, so unsupported values (for example arm / arm/v7) pass through and produce invalid artifact URLs when ENABLE_OTEL=true. Add an explicit architecture map and fail fast with a clear error.

🐛 Proposed fix
 RUN if [ "$ENABLE_OTEL" = "true" ]; then \
-      JAEGER_ARCH=$(echo ${TARGETARCH:-amd64} | sed 's/arm64/arm64/;s/amd64/amd64/') \
+      case "${TARGETARCH:-amd64}" in \
+        amd64) JAEGER_ARCH="amd64" ;; \
+        arm64) JAEGER_ARCH="arm64" ;; \
+        *) echo "Unsupported TARGETARCH='${TARGETARCH}' for Jaeger v1.57.0 (supported: amd64, arm64)" >&2; exit 1 ;; \
+      esac \
       && wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \
       && tar -xzf jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \
       && mv jaeger-1.57.0-linux-${JAEGER_ARCH}/jaeger-all-in-one /usr/local/bin/ \
       && rm -rf jaeger-1.57.0-linux-${JAEGER_ARCH}*; \
     fi

Use this read-only verification script to confirm the current mapping problem and Jaeger asset support:

#!/bin/bash
set -euo pipefail

echo "== Current Dockerfile mapping line =="
fd -t f allin1-ubuntu.Dockerfile --exec rg -n 'JAEGER_ARCH|TARGETARCH|jaeger-1\.57\.0-linux-\$\{JAEGER_ARCH\}' {}

echo
echo "== Simulate current sed mapping =="
for a in amd64 arm64 arm arm/v7; do
  mapped="$(printf '%s' "$a" | sed 's/arm64/arm64/;s/amd64/amd64/')"
  echo "$a -> $mapped"
done

echo
echo "== Jaeger v1.57.0 Linux assets =="
python - <<'PY'
import json, urllib.request
url="https://api.github.com/repos/jaegertracing/jaeger/releases/tags/v1.57.0"
data=json.load(urllib.request.urlopen(url))
assets=sorted(a["name"] for a in data.get("assets",[]) if "linux" in a.get("name",""))
for a in assets:
    print(a)
PY

Expected result: arm/arm/v7 remain unmapped under current logic, while Jaeger v1.57.0 provides linux archives for amd64/arm64 only.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/allin1-ubuntu.Dockerfile` around lines 17 - 23, The current
JAEGER_ARCH mapping (sed 's/arm64/arm64/;s/amd64/amd64/') silently passes
unsupported TARGETARCH values and produces invalid Jaeger URLs when
ENABLE_OTEL=true; update the Dockerfile block that computes JAEGER_ARCH to
implement an explicit architecture map (e.g., switch/case or if/elif) that
translates known TARGETARCH values (amd64 -> amd64, arm64 -> arm64) and
otherwise aborts with a clear error message (exit 1) before attempting the wget;
ensure you reference and validate the TARGETARCH variable, set JAEGER_ARCH only
on supported values, and fail fast with a descriptive message so the subsequent
wget/tar/mv steps are never run for unsupported architectures.
cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java (1)

203-216: ⚠️ Potential issue | 🟡 Minor

p2pRoute only catches CIMException — non-CIM exceptions leave span with UNSET status.

Unlike groupRoute and login which catch all Exception types, p2pRoute only catches CIMException. If a different exception occurs (e.g., NullPointerException from accountService), the span status remains UNSET and the exception is not recorded, making troubleshooting harder in distributed traces.

🔧 Proposed fix — add a general catch block
         } catch (CIMException e) {
             if (span != null) {
                 span.setStatus(StatusCode.ERROR, e.getErrorMessage());
                 span.recordException(e);
             }
             res.setCode(e.getErrorCode());
             res.setMessage(e.getErrorMessage());
+        } catch (Exception e) {
+            if (span != null) {
+                span.setStatus(StatusCode.ERROR, e.getMessage());
+                span.recordException(e);
+            }
+            throw e;
         } finally {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java`
around lines 203 - 216, The p2pRoute handler currently only catches CIMException
so non-CIM exceptions leave the OpenTelemetry span UNSET and unrecorded; add a
general catch (Exception e) after the existing catch(CIMException e) in the
p2pRoute method to set span.setStatus(StatusCode.ERROR, e.getMessage()), call
span.recordException(e), and populate the response (res) with a generic error
code/message (or map to existing error contract) so unexpected exceptions are
logged similarly to CIMException; keep the existing finally block that closes
scope and ends span.
🧹 Nitpick comments (2)
cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java (2)

75-75: Prefer dynamic tracer version over hardcoded 1.0.0.

Line 75 hardcodes instrumentation version, which can become stale after releases and make trace metadata less reliable for debugging.

Proposed refactor
- return openTelemetry.getTracer("cim-server", "1.0.0");
+ String version = OtelConfig.class.getPackage().getImplementationVersion();
+ return openTelemetry.getTracer("cim-server", version != null ? version : "unknown");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java`
at line 75, The tracer version is hardcoded in the getTracer call; change
OtelConfig to compute a dynamic version string (e.g., read from the package
implementation version via
OtelConfig.class.getPackage().getImplementationVersion() or from project/Maven
properties like pom.properties/manifest) and pass that variable into
openTelemetry.getTracer("cim-server", version) with a safe fallback (e.g.,
"unknown" or project version) if the lookup returns null. Update the code path
that currently returns openTelemetry.getTracer("cim-server", "1.0.0") to use
this computed version value.

32-32: Avoid exposing nullable static counters when OTel is disabled.

Because Line 32 disables this config by default unless enabled, counters at Lines 43-48 are never initialized unless registerMetrics() runs (Lines 85-95). Static getters therefore expose nullable counters, which is a runtime NPE hazard for any direct .increment() call path.

Proposed safer pattern (null-safe increment API)
 `@Getter`
 private static volatile Counter loginCounter;
 `@Getter`
 private static volatile Counter heartbeatCounter;
 `@Getter`
 private static volatile Counter messagePushedCounter;

+public static void incrementLoginCounter() {
+    Counter c = loginCounter;
+    if (c != null) {
+        c.increment();
+    }
+}
+
+public static void incrementHeartbeatCounter() {
+    Counter c = heartbeatCounter;
+    if (c != null) {
+        c.increment();
+    }
+}
+
+public static void incrementMessagePushedCounter() {
+    Counter c = messagePushedCounter;
+    if (c != null) {
+        c.increment();
+    }
+}

Then switch call sites from get...Counter().increment() to these helpers.

Also applies to: 43-48, 85-95

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java`
at line 32, OtelConfig currently exposes nullable static counter fields via
get...Counter() that are only set in registerMetrics(), creating NPE risk when
cim.otel.enabled is false; change to provide null-safe behavior by replacing
direct static counter exposure with either (a) initialize counters to a
Noop/NoopDoubleCounter placeholder or (b) remove public static getters and add
null-safe helper methods on OtelConfig such as incrementXCounter(),
incrementYCounter() that check the real counter and perform a no-op if null;
update all call sites to use these new incrementXCounter() helpers instead of
get...Counter().increment(), and keep registerMetrics() responsible only for
swapping in real counters when enabled. Ensure symbols referenced: class
OtelConfig, registerMetrics(), all get...Counter() methods and the new
incrementXCounter()/incrementYCounter() helpers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cim-forward-route/src/main/resources/application.yaml`:
- Line 100: Remove the extra blank line at the end of application.yaml so the
file ends immediately after the last YAML content (no trailing empty line); open
application.yaml and delete the final empty line so YAMLlint no longer flags a
trailing blank line.

---

Duplicate comments:
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java`:
- Around line 203-216: The p2pRoute handler currently only catches CIMException
so non-CIM exceptions leave the OpenTelemetry span UNSET and unrecorded; add a
general catch (Exception e) after the existing catch(CIMException e) in the
p2pRoute method to set span.setStatus(StatusCode.ERROR, e.getMessage()), call
span.recordException(e), and populate the response (res) with a generic error
code/message (or map to existing error contract) so unexpected exceptions are
logged similarly to CIMException; keep the existing finally block that closes
scope and ends span.

In `@docker/allin1-ubuntu.Dockerfile`:
- Around line 17-23: The current JAEGER_ARCH mapping (sed
's/arm64/arm64/;s/amd64/amd64/') silently passes unsupported TARGETARCH values
and produces invalid Jaeger URLs when ENABLE_OTEL=true; update the Dockerfile
block that computes JAEGER_ARCH to implement an explicit architecture map (e.g.,
switch/case or if/elif) that translates known TARGETARCH values (amd64 -> amd64,
arm64 -> arm64) and otherwise aborts with a clear error message (exit 1) before
attempting the wget; ensure you reference and validate the TARGETARCH variable,
set JAEGER_ARCH only on supported values, and fail fast with a descriptive
message so the subsequent wget/tar/mv steps are never run for unsupported
architectures.

---

Nitpick comments:
In `@cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java`:
- Line 75: The tracer version is hardcoded in the getTracer call; change
OtelConfig to compute a dynamic version string (e.g., read from the package
implementation version via
OtelConfig.class.getPackage().getImplementationVersion() or from project/Maven
properties like pom.properties/manifest) and pass that variable into
openTelemetry.getTracer("cim-server", version) with a safe fallback (e.g.,
"unknown" or project version) if the lookup returns null. Update the code path
that currently returns openTelemetry.getTracer("cim-server", "1.0.0") to use
this computed version value.
- Line 32: OtelConfig currently exposes nullable static counter fields via
get...Counter() that are only set in registerMetrics(), creating NPE risk when
cim.otel.enabled is false; change to provide null-safe behavior by replacing
direct static counter exposure with either (a) initialize counters to a
Noop/NoopDoubleCounter placeholder or (b) remove public static getters and add
null-safe helper methods on OtelConfig such as incrementXCounter(),
incrementYCounter() that check the real counter and perform a no-op if null;
update all call sites to use these new incrementXCounter() helpers instead of
get...Counter().increment(), and keep registerMetrics() responsible only for
swapping in real counters when enabled. Ensure symbols referenced: class
OtelConfig, registerMetrics(), all get...Counter() methods and the new
incrementXCounter()/incrementYCounter() helpers.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 348e192 and 35a3841.

📒 Files selected for processing (7)
  • cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.java
  • cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java
  • cim-forward-route/src/main/resources/application.yaml
  • cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java
  • cim-server/src/main/resources/application.yaml
  • docker/allin1-ubuntu.Dockerfile
  • docker/supervisord.conf
🚧 Files skipped from review as they are similar to previous changes (3)
  • cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.java
  • cim-server/src/main/resources/application.yaml
  • docker/supervisord.conf

otlp:
grpc:
endpoint: ${OTEL_EXPORTER_OTLP_GRPC_ENDPOINT:http://localhost:4317}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove trailing blank line.

YAMLlint flags an extra blank line at the end of the file. Remove it to satisfy the linter.

🧰 Tools
🪛 YAMLlint (1.38.0)

[warning] 100-100: too many blank lines (1 > 0)

(empty-lines)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cim-forward-route/src/main/resources/application.yaml` at line 100, Remove
the extra blank line at the end of application.yaml so the file ends immediately
after the last YAML content (no trailing empty line); open application.yaml and
delete the final empty line so YAMLlint no longer flags a trailing blank line.

crossoverJie and others added 2 commits March 17, 2026 09:33
- Replace raw userName with userNameHash in span attributes to avoid
  privacy/compliance risk (CIMServerHandle)
- Move span lifecycle management into ChannelFutureListener callback
  to correctly reflect async write results (CIMServer)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants