feat: Integrate OpenTelemetry for distributed tracing and metrics#186
feat: Integrate OpenTelemetry for distributed tracing and metrics#186KitAndrew wants to merge 7 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds OpenTelemetry tracing and Micrometer OTLP metrics: dependency additions, new OtelConfig classes, Tracer and Counter/Gauge beans, instrumentation of route and server flows with spans and counters, config toggles in application YAMLs, and optional Jaeger runtime entries in Docker files. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant RouteController
participant CIMServer
participant Tracer as OpenTelemetry
participant OTLP
Client->>RouteController: request (message / login / register)
activate RouteController
RouteController->>Tracer: start span (route)
RouteController->>CIMServer: forward request
activate CIMServer
CIMServer->>Tracer: start span (server)
CIMServer->>CIMServer: write socket / update session / counters++
CIMServer->>Tracer: set attributes, set status, end span
deactivate CIMServer
RouteController->>Tracer: set attributes, set status, end span
deactivate RouteController
Tracer->>OTLP: BatchSpanProcessor exports spans
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip You can enable review details to help with troubleshooting, context usage and more.Enable the |
There was a problem hiding this comment.
Actionable comments posted: 13
🤖 Fix all issues with AI agents
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.java`:
- Around line 50-52: The endpoint derivation using otlpEndpoint.replace(...) is
fragile; introduce a dedicated gRPC endpoint configuration (e.g., a new property
name like otlpGrpcEndpoint) and use that when building the OtlpGrpcSpanExporter
instead of mutating otlpEndpoint. Update OtelConfig to read the new
otlpGrpcEndpoint (with a sensible default or explicit validation) and construct
OtlpGrpcSpanExporter.builder().setEndpoint(otlpGrpcEndpoint).build(); leave
otlpEndpoint for HTTP exporters and ensure any existing callers/configs are
updated to provide the new property.
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java`:
- Around line 186-189: In RouteController's p2pRoute method catch block for
CIMException you only call span.setStatus(...) and set the response, so
exception details aren't recorded in tracing; add span.recordException(e)
alongside span.setStatus(StatusCode.ERROR, e.getErrorMessage()) (matching the
pattern used in groupRoute and login) so the exception and stack trace are
emitted to Jaeger when p2pRoute fails.
- Around line 235-238: The early return when status != StatusEnum.SUCCESS sets
span attribute "loginStatus" but leaves the span status UNSET; before returning
(in the failure branch in RouteController where you set
span.setAttribute("loginStatus", "failed")), also set the span status to an
error (e.g., span.setStatus(...) with an ERROR/StatusCode.ERROR and optional
message) so the finally block that ends the span records a non-OK status and
failed logins are visible in Jaeger.
- Around line 101-102: The created Span in RouteController (Span span =
tracer.spanBuilder("cim.route.groupMsg").startSpan() inside groupRoute — and
similarly in p2pRoute and login) is never made current, breaking context
propagation; fix by making the span current with span.makeCurrent() (assign to a
Scope, e.g., try-with-resources Scope scope = span.makeCurrent()) before
executing downstream work and ensure the Scope is closed and span.end() is
always called in a finally block so child spans and instrumented libraries
inherit the context.
In `@cim-forward-route/src/main/resources/application.yaml`:
- Around line 73-82: The OTLP metrics property path is incorrect and tracing
sampling is hard-coded; change the metrics config key from
management.otlp.metrics.endpoint to management.otlp.metrics.export.url (per
Spring Boot/Micrometer OTLP) and externalize the tracing sampling probability
(management.tracing.sampling.probability) by reading it from an environment
variable or profile-specific property (e.g.,
${TRACING_SAMPLING_PROBABILITY:0.1}) or a non-dev profile override so production
does not trace 100% of requests.
In `@cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java`:
- Around line 47-52: The static Counter fields loginCounter, heartbeatCounter,
and messagePushedCounter in OtelConfig are written during Spring initialization
and later read from Netty event-loop threads, so add the volatile modifier to
each static field declaration to ensure visibility across threads (Lombok's
`@Getter` can stay as-is); update the declarations of loginCounter,
heartbeatCounter, and messagePushedCounter to be private static volatile Counter
... so Netty threads reliably see the initialized non-null counters.
In
`@cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java`:
- Around line 93-101: The empty if block after creating Span (Span span in
CIMServerHandle) should actually make the span current: call span.makeCurrent()
and assign to a Scope (e.g., Scope scope) when span != null, and ensure the
Scope is closed and span.end() in the finally block; alternatively remove the
misleading if block—prefer implementing Scope scope = span.makeCurrent() inside
the try and closing scope.close() (and ending span) in the finally to ensure
proper context propagation.
In `@cim-server/src/main/resources/application.yaml`:
- Around line 40-52: The YAML uses an unrecognized property for OTLP metrics:
replace management.otlp.metrics.endpoint with the correct nested property
management.otlp.metrics.export.url so that metrics export is configured as
management -> otlp -> metrics -> export -> url (keep
management.otlp.tracing.endpoint as-is); update the application.yaml to reflect
this property name change so Spring Boot 3.x will actually export metrics to the
OTLP collector.
In `@docker/allin1-ubuntu.Dockerfile`:
- Around line 45-46: The EXPOSE directive currently lists only Jaeger/OTel ports
(16686, 4317, 4318) but omits Zookeeper and Redis; update the EXPOSE instruction
in the Dockerfile (modify the EXPOSE line) to include ports 2181 and 6379 so it
reads to expose 16686 4317 4318 2181 6379, and verify whether port 8083 should
be added based on the application before including it.
- Around line 13-18: The Dockerfile currently hardcodes the Jaeger artifact
"jaeger-1.57.0-linux-amd64.tar.gz" and will fail for non-amd64 builds; change
the step to accept a build ARG TARGETARCH, map TARGETARCH values to the correct
Jaeger archive arch string (e.g., use "amd64" for TARGETARCH=amd64 and "arm64"
for TARGETARCH=arm64, and fail/skip with a clear message for unsupported
TARGETARCH like linux/arm/v7), then use that mapped arch when composing the
download filename and extracting the "jaeger-all-in-one" binary so the RUN
wget/tar/mv/rm sequence fetches the correct platform-specific tarball.
In `@docker/supervisord.conf`:
- Around line 34-38: The supervisord [program:server] startup sequence makes
Jaeger/OTLP a hard dependency by calling /wait-for-it.sh on port 4317 before
launching java -jar /cim-server.jar; update the command in the supervisord
[program:server] (and any analogous [program:cim-route]) to remove the
/wait-for-it.sh -t 30 localhost:4317 check (or if you must keep it, increase the
timeout significantly and add a clear comment documenting that Jaeger is
required), leaving only the Redis (6379) and Zookeeper (2181) waits so the OTLP
exporter can retry/buffer without blocking application start.
In `@pom.xml`:
- Around line 26-28: The POM declares redundant properties
micrometer-tracing.version and micrometer.version (matching Spring Boot 3.3.0
managed versions) and a mismatched opentelemetry.version; remove the
micrometer-tracing.version and micrometer.version properties from the pom.xml so
Spring Boot's dependency management controls those versions, and either change
opentelemetry.version to 1.37.0 to match Spring Boot 3.3.0 or remove the
opentelemetry.version property if you intend to use the Boot-managed version
instead (check for any intentional override before removal).
In `@README.md`:
- Line 25: The checklist item "Integrate OpenTelemetry ." has an extra space
before the period; update that checklist line (the "- [x] Integrate
OpenTelemetry ." entry) to remove the extra space so it reads "Integrate
OpenTelemetry." to correct the punctuation.
🧹 Nitpick comments (6)
docker/supervisord.conf (1)
24-32: Nit: Redundant OTLP enablement — both CLI flag and environment variable are set.Line 25 passes
--collector.otlp.enabled=trueand Line 26 sets the equivalentCOLLECTOR_OTLP_ENABLED="true"env var. One of them is sufficient.cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.java (2)
115-121:tracerProvider.shutdown()is fire-and-forget — pending spans may be lost.
SdkTracerProvider.shutdown()returns aCompletableResultCode. Without awaiting its completion, in-flight spans in theBatchSpanProcessormay not be flushed before the JVM exits.Proposed fix: await shutdown completion
`@PreDestroy` public void shutdown() { if (tracerProvider != null) { - tracerProvider.shutdown(); + tracerProvider.shutdown().join(10, java.util.concurrent.TimeUnit.SECONDS); log.info("OpenTelemetry TracerProvider shut down"); } }
1-121: Significant code duplication withcim-server'sOtelConfig.The OTLP SDK initialization, tracer bean creation, and shutdown logic are nearly identical between this class and
cim-server/…/OtelConfig.java. Consider extracting a shared base configuration (e.g., incim-common) parameterized byservice.name, with module-specific subclasses or configs only defining their custom metrics.cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java (1)
116-141:@Beanreturning a dummyStringto trigger metric registration is a code smell.Returning
"metricsRegistered"registers an unrelatedStringbean in the context and obscures the intent. A@PostConstructmethod on this@Configurationclass (withMeterRegistryinjected as a field) or aSmartInitializingSingletonwould be more idiomatic.Suggested alternative
+ `@Resource` + private MeterRegistry meterRegistry; + - `@Bean` - public String registerMetrics(MeterRegistry meterRegistry) { + `@PostConstruct` + public void registerMetrics() { Gauge.builder("cim.server.online.users", () -> SessionSocketHolder.getRelationShip().size()) .description("Current number of online users connected to this server") .register(meterRegistry); loginCounter = Counter.builder("cim.server.login.count") .description("Total number of login requests processed") .register(meterRegistry); heartbeatCounter = Counter.builder("cim.server.heartbeat.count") .description("Total number of heartbeat (PING) messages received") .register(meterRegistry); messagePushedCounter = Counter.builder("cim.server.messages.pushed") .description("Total number of messages pushed to clients") .register(meterRegistry); log.info("CIM server custom metrics registered"); - return "metricsRegistered"; }cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java (1)
41-47: Per-requestSpringBeanFactory.getBean()lookup for theTracer.
getTracer()performs a Spring context lookup on everyLOGIN_REQUEST. While singletons are cached, the lookup still traverses the bean factory. Cache the result in avolatilefield after the first successful retrieval.Proposed fix
+ private volatile Tracer cachedTracer; + private Tracer getTracer() { + if (cachedTracer != null) { + return cachedTracer; + } try { - return SpringBeanFactory.getBean(Tracer.class); + cachedTracer = SpringBeanFactory.getBean(Tracer.class); + return cachedTracer; } catch (Exception e) { return null; } }cim-server/src/main/java/com/crossoverjie/cim/server/server/CIMServer.java (1)
85-139: Span ends before the asyncwriteAndFlushcompletes.
span.end()infinally(line 137) executes immediately aftersocketChannel.writeAndFlush(protocol)is called, but the actual network write hasn't completed yet — theChannelFutureListenerfires asynchronously later. This means the span duration does not reflect actual write latency or failures. If accurate push timing matters, consider ending the span inside the listener callback instead.
| if (status != StatusEnum.SUCCESS) { | ||
| span.setAttribute("loginStatus", "failed"); | ||
| return res; | ||
| } |
There was a problem hiding this comment.
Login early-return leaves span status as UNSET.
When status != SUCCESS, the method returns early at line 237. The span is ended in finally, but its status is never explicitly set (only the attribute loginStatus=failed is recorded). In Jaeger, this span will appear without a clear OK/ERROR status, making it harder to filter and alert on failed logins.
Proposed fix
if (status != StatusEnum.SUCCESS) {
span.setAttribute("loginStatus", "failed");
+ span.setStatus(StatusCode.OK); // request itself succeeded; login was rejected by business logic
return res;
}🤖 Prompt for AI Agents
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java`
around lines 235 - 238, The early return when status != StatusEnum.SUCCESS sets
span attribute "loginStatus" but leaves the span status UNSET; before returning
(in the failure branch in RouteController where you set
span.setAttribute("loginStatus", "failed")), also set the span status to an
error (e.g., span.setStatus(...) with an ERROR/StatusCode.ERROR and optional
message) so the finally block that ends the span records a non-OK status and
failed logins are visible in Jaeger.
| # install Jaeger all-in-one (OpenTelemetry trace backend) | ||
| # Jaeger UI: 16686, OTLP gRPC: 4317, OTLP HTTP: 4318 | ||
| RUN wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-amd64.tar.gz \ | ||
| && tar -xzf jaeger-1.57.0-linux-amd64.tar.gz \ | ||
| && mv jaeger-1.57.0-linux-amd64/jaeger-all-in-one /usr/local/bin/ \ | ||
| && rm -rf jaeger-1.57.0-linux-amd64* |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check Jaeger v1.57.0 available architectures
curl -sL https://api.github.com/repos/jaegertracing/jaeger/releases/tags/v1.57.0 | \
jq -r '.assets[].name' | grep -i 'linux'Repository: crossoverJie/cim
Length of output: 505
🏁 Script executed:
#!/bin/bash
# Get more details on Jaeger release to understand available architectures
curl -sL https://api.github.com/repos/jaegertracing/jaeger/releases/tags/v1.57.0 | \
jq -r '.assets[] | "\(.name)"' | sortRepository: crossoverJie/cim
Length of output: 965
🏁 Script executed:
#!/bin/bash
# Find and examine the Dockerfile to understand context
fd -t f -name "*allin1-ubuntu.Dockerfile" -o -name "*.Dockerfile" | head -20Repository: crossoverJie/cim
Length of output: 290
🏁 Script executed:
#!/bin/bash
# Check for README to verify platform claims
fd -t f -name "README*" | head -5Repository: crossoverJie/cim
Length of output: 290
🏁 Script executed:
#!/bin/bash
cat docker/allin1-ubuntu.Dockerfile | head -50Repository: crossoverJie/cim
Length of output: 1749
🏁 Script executed:
#!/bin/bash
find . -name README.md -o -name README.txt | xargs cat 2>/dev/null | head -100Repository: crossoverJie/cim
Length of output: 3877
🏁 Script executed:
#!/bin/bash
# Check if there's a Dockerfile.build context or multi-arch setup mentioned
find . -name Dockerfile -o -name "*.Dockerfile" | xargs ls -laRepository: crossoverJie/cim
Length of output: 225
🏁 Script executed:
#!/bin/bash
# Look for docker build scripts, CI/CD configs, or hints about multi-arch builds
find . -name "*.yml" -o -name "*.yaml" -o -name "Makefile" -o -name "*.sh" | xargs grep -l "docker\|buildx\|arm64\|amd64" 2>/dev/null | head -10Repository: crossoverJie/cim
Length of output: 90
🏁 Script executed:
#!/bin/bash
# Check if there's any reference to multi-arch or arm in docker-related configs
git ls-files | grep -E "(docker|build|workflow|\.github)" | head -20Repository: crossoverJie/cim
Length of output: 337
🏁 Script executed:
#!/bin/bash
cat .github/workflows/docker.ymlRepository: crossoverJie/cim
Length of output: 1387
Jaeger binary is hardcoded to linux-amd64, breaking multi-arch CI/CD builds.
The CI/CD workflow explicitly builds for linux/amd64, linux/arm64, and linux/arm/v7, but this Dockerfile step downloads only the amd64 binary, causing arm64 builds to fail. Use TARGETARCH build arg to select the correct architecture:
🐛 Proposed fix
+ARG TARGETARCH
# install Jaeger all-in-one (OpenTelemetry trace backend)
# Jaeger UI: 16686, OTLP gRPC: 4317, OTLP HTTP: 4318
-RUN wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-amd64.tar.gz \
- && tar -xzf jaeger-1.57.0-linux-amd64.tar.gz \
- && mv jaeger-1.57.0-linux-amd64/jaeger-all-in-one /usr/local/bin/ \
- && rm -rf jaeger-1.57.0-linux-amd64*
+RUN JAEGER_ARCH=$(case "${TARGETARCH}" in arm64) echo "arm64";; *) echo "amd64";; esac) \
+ && wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \
+ && tar -xzf jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \
+ && mv jaeger-1.57.0-linux-${JAEGER_ARCH}/jaeger-all-in-one /usr/local/bin/ \
+ && rm -rf jaeger-1.57.0-linux-${JAEGER_ARCH}*Note: Jaeger v1.57.0 provides only amd64 and arm64 binaries. The linux/arm/v7 platform in the CI/CD workflow cannot be satisfied with this version.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # install Jaeger all-in-one (OpenTelemetry trace backend) | |
| # Jaeger UI: 16686, OTLP gRPC: 4317, OTLP HTTP: 4318 | |
| RUN wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-amd64.tar.gz \ | |
| && tar -xzf jaeger-1.57.0-linux-amd64.tar.gz \ | |
| && mv jaeger-1.57.0-linux-amd64/jaeger-all-in-one /usr/local/bin/ \ | |
| && rm -rf jaeger-1.57.0-linux-amd64* | |
| ARG TARGETARCH | |
| # install Jaeger all-in-one (OpenTelemetry trace backend) | |
| # Jaeger UI: 16686, OTLP gRPC: 4317, OTLP HTTP: 4318 | |
| RUN JAEGER_ARCH=$(case "${TARGETARCH}" in arm64) echo "arm64";; *) echo "amd64";; esac) \ | |
| && wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \ | |
| && tar -xzf jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \ | |
| && mv jaeger-1.57.0-linux-${JAEGER_ARCH}/jaeger-all-in-one /usr/local/bin/ \ | |
| && rm -rf jaeger-1.57.0-linux-${JAEGER_ARCH}* |
🤖 Prompt for AI Agents
In `@docker/allin1-ubuntu.Dockerfile` around lines 13 - 18, The Dockerfile
currently hardcodes the Jaeger artifact "jaeger-1.57.0-linux-amd64.tar.gz" and
will fail for non-amd64 builds; change the step to accept a build ARG
TARGETARCH, map TARGETARCH values to the correct Jaeger archive arch string
(e.g., use "amd64" for TARGETARCH=amd64 and "arm64" for TARGETARCH=arm64, and
fail/skip with a clear message for unsupported TARGETARCH like linux/arm/v7),
then use that mapped arch when composing the download filename and extracting
the "jaeger-all-in-one" binary so the RUN wget/tar/mv/rm sequence fetches the
correct platform-specific tarball.
- Use dedicated gRPC endpoint property (otel.exporter.otlp.grpc.endpoint) - Add span.makeCurrent() with Scope for proper context propagation - Add span.recordException() in p2pRoute catch block - Set span status on login failure early return - Fix OTLP metrics property (management.otlp.metrics.export.url) - Externalize tracing sampling probability via env var - Use volatile for static counter fields in cim-server OtelConfig - Use @PostConstruct instead of @bean returning String for metrics - Await TracerProvider shutdown completion with join() - Cache Tracer with volatile field in CIMServerHandle - Support multi-arch Docker builds with TARGETARCH - Expose Zookeeper/Redis ports in Dockerfile - Remove Jaeger as hard startup dependency in supervisord - Remove redundant micrometer version properties from POM - Update OpenTelemetry version to 1.40.0 - Fix README punctuation
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java`:
- Around line 155-196: p2pRoute currently only catches CIMException so non-CIM
exceptions leave the OpenTelemetry Span with StatusCode.UNSET and unrecorded;
add a general catch (Exception e) after the existing catch to set
span.setStatus(StatusCode.ERROR, e.getMessage()), span.recordException(e), and
populate the response with a generic error code/message (consistent with other
handlers like groupRoute/login), ensuring the catch references the existing Span
variable `span`, the method `p2pRoute`, and the response `res` to mirror the
error handling pattern used elsewhere.
In `@cim-forward-route/src/main/resources/application.yaml`:
- Line 95: Remove the trailing blank line at the end of application.yaml so the
file ends immediately after the last YAML content; open application.yaml, delete
the extra empty newline/whitespace after the final entry, save, and re-run
yamllint to confirm no EOF blank-line warning remains.
- Around line 73-94: The YAML and manual SDK both configure tracing causing
spans to be exported to different endpoints; consolidate to one approach by
either removing the custom OtelConfig and its
OpenTelemetrySdk/SdkTracerProvider/OtlpGrpcSpanExporter so all spans use Spring
Boot auto-config (management.otlp.tracing.endpoint ->
http://localhost:4318/v1/traces) and inject the auto-configured Tracer into
RouteController, or disable Spring Boot tracing auto-config (via
spring.autoconfigure.exclude) and keep OtelConfig
(otel.exporter.otlp.grpc.endpoint -> http://localhost:4317) as the sole
tracer—update RouteController to use whichever Tracer bean remains and remove
the unused tracing configuration to avoid dual export.
🧹 Nitpick comments (3)
cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java (1)
38-54: LazyTracerinitialization is fine for this non-Spring-managed handler, but consider a small robustness improvement.The double-read of
cachedTracerwithout synchronization is benign (worst case: redundantgetBeancalls). However, thecatch (Exception e)silently swallows all exceptions and returnsnull, which means configuration errors (e.g., missing bean definition) will be hidden. Consider logging atdebug/warnlevel on first failure so misconfigurations are detectable.Proposed improvement
try { cachedTracer = SpringBeanFactory.getBean(Tracer.class); return cachedTracer; } catch (Exception e) { + log.debug("Tracer bean not available, tracing disabled: {}", e.getMessage()); return null; }cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.java (1)
36-37: UnusedmeterRegistryfield injection.The
meterRegistryfield is injected via@Resourcebut never referenced. The counter@Beanmethods receiveMeterRegistryas a method parameter (Spring auto-injects it), so this field is dead code.Proposed fix
- `@jakarta.annotation.Resource` - private MeterRegistry meterRegistry; -cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java (1)
285-301: Consider adding span tracing toregisterAccountfor consistency.All other mutating endpoints (
groupRoute,p2pRoute,login) are instrumented with both spans and counters, butregisterAccountonly has a counter increment. If a registration fails, there's no trace to diagnose it. This is a minor consistency gap.
| public BaseResponse<NULLBody> p2pRoute(@RequestBody P2PReqVO p2pRequest) { | ||
| BaseResponse<NULLBody> res = new BaseResponse(); | ||
|
|
||
| try { | ||
| //获取接收消息用户的路由信息 | ||
| Optional<CIMServerResVO> cimServerResVO = accountService.loadRouteRelatedByUserId(p2pRequest.getReceiveUserId()); | ||
| // === OpenTelemetry: 创建私聊路由 Span === | ||
| Span span = tracer.spanBuilder("cim.route.p2pMsg").startSpan(); | ||
|
|
||
| try (Scope scope = span.makeCurrent()) { | ||
| // 获取接收消息用户的路由信息 | ||
| Optional<CIMServerResVO> cimServerResVO = accountService | ||
| .loadRouteRelatedByUserId(p2pRequest.getReceiveUserId()); | ||
| if (cimServerResVO.isEmpty()) { | ||
| log.warn("userId={} not online, save offline msg", p2pRequest.getReceiveUserId()); | ||
| offlineMsgService.saveOfflineMsg(p2pRequest); | ||
| throw new CIMException(OFF_LINE); | ||
| } | ||
|
|
||
| //p2pRequest.getReceiveUserId()==>消息接收者的 userID | ||
| ChatReqVO chatVO = new ChatReqVO(p2pRequest.getReceiveUserId(), p2pRequest.getMsg(), p2pRequest.getBatchMsg()); | ||
| // p2pRequest.getReceiveUserId()==>消息接收者的 userID | ||
| ChatReqVO chatVO = new ChatReqVO(p2pRequest.getReceiveUserId(), p2pRequest.getMsg(), | ||
| p2pRequest.getBatchMsg()); | ||
| accountService.pushMsg(cimServerResVO.get(), p2pRequest.getUserId(), chatVO); | ||
|
|
||
| // 记录 Span 属性 | ||
| span.setAttribute("senderId", p2pRequest.getUserId()); | ||
| span.setAttribute("receiverId", p2pRequest.getReceiveUserId()); | ||
| span.setStatus(StatusCode.OK); | ||
|
|
||
| // === Metrics: 递增私聊计数器 === | ||
| p2pMessageCounter.increment(); | ||
|
|
||
| res.setCode(StatusEnum.SUCCESS.getCode()); | ||
| res.setMessage(StatusEnum.SUCCESS.getMessage()); | ||
|
|
||
| } catch (CIMException e) { | ||
| span.setStatus(StatusCode.ERROR, e.getErrorMessage()); | ||
| span.recordException(e); | ||
| res.setCode(e.getErrorCode()); | ||
| res.setMessage(e.getErrorMessage()); | ||
| } finally { | ||
| span.end(); | ||
| } | ||
| return res; | ||
| } |
There was a problem hiding this comment.
p2pRoute only catches CIMException — non-CIM exceptions leave span status UNSET.
If a non-CIMException (e.g., NullPointerException, network error) is thrown, it bypasses the catch (CIMException e) block and goes directly to finally, where span.end() is called but the span status remains UNSET and the exception is not recorded. This contrasts with groupRoute and login, which catch all Exception types.
Proposed fix — add a general catch block
} catch (CIMException e) {
span.setStatus(StatusCode.ERROR, e.getErrorMessage());
span.recordException(e);
res.setCode(e.getErrorCode());
res.setMessage(e.getErrorMessage());
+ } catch (Exception e) {
+ span.setStatus(StatusCode.ERROR, e.getMessage());
+ span.recordException(e);
+ throw e;
} finally {
span.end();
}🤖 Prompt for AI Agents
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java`
around lines 155 - 196, p2pRoute currently only catches CIMException so non-CIM
exceptions leave the OpenTelemetry Span with StatusCode.UNSET and unrecorded;
add a general catch (Exception e) after the existing catch to set
span.setStatus(StatusCode.ERROR, e.getMessage()), span.recordException(e), and
populate the response with a generic error code/message (consistent with other
handlers like groupRoute/login), ensuring the catch references the existing Span
variable `span`, the method `p2pRoute`, and the response `res` to mirror the
error handling pattern used elsewhere.
| # OpenTelemetry 配置 | ||
| management: | ||
| otlp: | ||
| tracing: | ||
| endpoint: http://localhost:4318/v1/traces | ||
| metrics: | ||
| export: | ||
| url: http://localhost:4318/v1/metrics | ||
| tracing: | ||
| sampling: | ||
| probability: ${TRACING_SAMPLING_PROBABILITY:1.0} | ||
| endpoints: | ||
| web: | ||
| exposure: | ||
| include: health,info,metrics | ||
|
|
||
| # OpenTelemetry gRPC endpoint for SDK exporter | ||
| otel: | ||
| exporter: | ||
| otlp: | ||
| grpc: | ||
| endpoint: ${OTEL_EXPORTER_OTLP_GRPC_ENDPOINT:http://localhost:4317} |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find and examine OtelConfig.java
fd -type f -name "OtelConfig.java" -exec cat -n {} \;
echo "---"
# Check the application.yaml file to confirm configuration
fd -type f -name "application.yaml" -path "*/cim-forward-route/*" -exec cat -n {} \;Repository: crossoverJie/cim
Length of output: 422
🏁 Script executed:
#!/bin/bash
# Find OtelConfig.java anywhere in the repo
find . -name "OtelConfig.java" -type f
echo "---"
# Also search for files containing OpenTelemetrySdk
rg -l "OpenTelemetrySdk" --type java
echo "---"
# Check for application.yaml in the cim-forward-route module
find . -path "*/cim-forward-route/*/application.yaml" -type f
echo "---"
# Look for pom.xml in cim-forward-route to check dependencies
find . -path "*/cim-forward-route/pom.xml" -type fRepository: crossoverJie/cim
Length of output: 531
🏁 Script executed:
#!/bin/bash
# Examine OtelConfig.java in cim-forward-route
cat -n cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.javaRepository: crossoverJie/cim
Length of output: 4419
🏁 Script executed:
#!/bin/bash
# Check pom.xml for Spring Boot version and OpenTelemetry dependencies
cat -n cim-forward-route/pom.xmlRepository: crossoverJie/cim
Length of output: 6444
🏁 Script executed:
#!/bin/bash
# Read the application.yaml file for cim-forward-route
cat -n cim-forward-route/src/main/resources/application.yaml
echo "---"
# Check if there are any autoconfiguration exclusions
rg -n "spring.autoconfigure.exclude" cim-forward-route/Repository: crossoverJie/cim
Length of output: 2881
🏁 Script executed:
#!/bin/bash
# Check if GlobalOpenTelemetry.set() is called
rg -n "GlobalOpenTelemetry.set" --type java
echo "---"
# Check parent pom.xml for Spring Boot version
cat cim/pom.xml | head -50Repository: crossoverJie/cim
Length of output: 124
🏁 Script executed:
#!/bin/bash
# Find the parent pom.xml
find . -name "pom.xml" -path "*/cim/*" -type f | head -5
echo "---"
# Check the root pom.xml
ls -la pom.xml 2>/dev/null && cat -n pom.xml | head -100Repository: crossoverJie/cim
Length of output: 4736
🏁 Script executed:
#!/bin/bash
# Check for any spring.autoconfigure.exclude in the parent or module poms
rg -n "spring.autoconfigure.exclude" --type xml
echo "---"
# Check for how the Tracer bean is being used in the codebase
rg -n "private.*Tracer\|@Autowired.*Tracer\|@Resource.*Tracer" --type java cim-forward-route/src/main/java/
echo "---"
# Check if spring-boot-starter-actuator triggers auto-config
rg -n "spring.boot.actuate" --type java cim-forward-route/src/main/java/ | head -20Repository: crossoverJie/cim
Length of output: 67
🏁 Script executed:
#!/bin/bash
# Search for any usage of the Tracer bean in cim-forward-route
rg -n "tracer\|Tracer" --type java cim-forward-route/src/main/java/ | grep -v "TracerProvider\|OpenTelemetry" | head -30
echo "---"
# Check if there are any span creations
rg -n "startSpan\|startActiveSpan" --type java cim-forward-route/src/main/java/Repository: crossoverJie/cim
Length of output: 63
🏁 Script executed:
#!/bin/bash
# Check cim-server's OtelConfig to see if it follows the same pattern
cat -n cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java | head -70
echo "---"
# Check if cim-server's application.yaml has similar configuration
cat -n cim-server/src/main/resources/application.yaml | grep -A 25 "OpenTelemetry"Repository: crossoverJie/cim
Length of output: 3954
🏁 Script executed:
#!/bin/bash
# Check if cim-server uses the Tracer bean
rg -n "Tracer" cim-server/src/main/java/ --type java | grep -v "TracerProvider\|@Bean" | head -20
echo "---"
# Check how GlobalOpenTelemetry is used or if it's set anywhere
rg -n "GlobalOpenTelemetry\|setGlobal" --type javaRepository: crossoverJie/cim
Length of output: 1948
🏁 Script executed:
#!/bin/bash
# Check if cim-forward-route's RouteHandler or MessageHandler uses Tracer
rg -n "import.*Tracer|new.*Span|startSpan|startActiveSpan" cim-forward-route/src/main/java/ --type java
echo "---"
# Check for Tracer usage in cim-forward-route via SpringBeanFactory or direct injection
rg -n "@Autowired.*Tracer|@Resource.*Tracer|getBean.*Tracer" cim-forward-route/src/main/java/ --type java
echo "---"
# List main classes in cim-forward-route to check if Tracer is injected
ls -la cim-forward-route/src/main/java/com/crossoverjie/cim/route/*/Repository: crossoverJie/cim
Length of output: 3642
🏁 Script executed:
#!/bin/bash
# Check how tracer is injected in RouteController
cat -n cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java | head -50Repository: crossoverJie/cim
Length of output: 2603
🏁 Script executed:
#!/bin/bash
# Check the field declarations in RouteController
sed -n '40,120p' cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.javaRepository: crossoverJie/cim
Length of output: 2486
Dual tracing setup exports spans to different endpoints — consolidate to single configuration.
The application configures two separate OpenTelemetry tracing paths:
-
Spring Boot auto-configuration (via
micrometer-tracing-bridge-otelon classpath +management.otlp.tracing.endpointin YAML) auto-instruments HTTP requests and exports spans via HTTP tohttp://localhost:4318/v1/traces. -
Manual SDK in OtelConfig creates a separate
OpenTelemetrySdkwithSdkTracerProviderandOtlpGrpcSpanExporter, exporting spans via gRPC tohttp://localhost:4317(configured byotel.exporter.otlp.grpc.endpoint).
RouteController explicitly injects and uses the manual Tracer bean to create spans for group messaging, P2P messaging, and login operations. This means spans from these operations use the manual gRPC path (port 4317), while Spring Boot's auto-instrumented request/response tracing uses the HTTP path (port 4318). If only one collector is running, you'll lose traces; if both are running, traces are split across endpoints and difficult to correlate.
Choose one approach:
- Option A: Rely solely on Spring Boot's auto-configuration by removing OtelConfig and letting Spring Boot handle all tracing.
- Option B: Disable Spring Boot's auto-configuration (
spring.autoconfigure.exclude) and keep only OtelConfig for manual control.
🤖 Prompt for AI Agents
In `@cim-forward-route/src/main/resources/application.yaml` around lines 73 - 94,
The YAML and manual SDK both configure tracing causing spans to be exported to
different endpoints; consolidate to one approach by either removing the custom
OtelConfig and its OpenTelemetrySdk/SdkTracerProvider/OtlpGrpcSpanExporter so
all spans use Spring Boot auto-config (management.otlp.tracing.endpoint ->
http://localhost:4318/v1/traces) and inject the auto-configured Tracer into
RouteController, or disable Spring Boot tracing auto-config (via
spring.autoconfigure.exclude) and keep OtelConfig
(otel.exporter.otlp.grpc.endpoint -> http://localhost:4317) as the sole
tracer—update RouteController to use whichever Tracer bean remains and remove
the unused tracing configuration to avoid dual export.
|
Sorry for the late reply, please check the CI. |
Spring Boot BOM does not manage micrometer-tracing-bridge-otel and micrometer-registry-otlp versions. Restored version properties with updated versions (micrometer-tracing 1.4.2, micrometer 1.14.3).
|
Please refer to the checkstyle fix here |
- Remove unused import io.opentelemetry.context.Scope in CIMServerHandle - Remove trailing whitespace in CIMServer Javadoc
done. |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
cim-server/src/main/java/com/crossoverjie/cim/server/server/CIMServer.java (1)
41-42: Consider constructor injection for better testability.Field injection with
@Autowiredmakes the class harder to unit test and hides dependencies. Constructor injection is generally preferred in Spring applications.♻️ Suggested refactor to constructor injection
`@Component` `@Slf4j` public class CIMServer { private EventLoopGroup boss = new NioEventLoopGroup(); private EventLoopGroup work = new NioEventLoopGroup(); - `@Autowired` - private Tracer tracer; + private final Tracer tracer; + + public CIMServer(Tracer tracer) { + this.tracer = tracer; + } `@Value`("${cim.server.port}") private int nettyPort;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cim-server/src/main/java/com/crossoverjie/cim/server/server/CIMServer.java` around lines 41 - 42, The CIMServer class currently uses field injection for the Tracer (private Tracer tracer) which reduces testability; change to constructor injection by removing the `@Autowired` field and adding a constructor (e.g., public CIMServer(Tracer tracer, ...) or overloaded constructor) that accepts Tracer and assigns it to the instance field, update any `@Autowired` annotations accordingly, and ensure tests and Spring configuration instantiate CIMServer via its constructor so the Tracer dependency can be injected/mocked.cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java (1)
47-52: Surface tracer lookup failures instead of failing silentlyLine 50 currently swallows initialization errors, which can disable tracing with no operational signal. Log at least once when bean lookup fails.
Proposed fix
+ private final java.util.concurrent.atomic.AtomicBoolean tracerLookupWarned = + new java.util.concurrent.atomic.AtomicBoolean(false); + private Tracer getTracer() { if (cachedTracer != null) { return cachedTracer; } try { cachedTracer = SpringBeanFactory.getBean(Tracer.class); return cachedTracer; } catch (Exception e) { + if (tracerLookupWarned.compareAndSet(false, true)) { + log.warn("Tracer bean unavailable; login tracing is disabled.", e); + } return null; } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java` around lines 47 - 52, The current try/catch around SpringBeanFactory.getBean(Tracer.class) in CIMServerHandle swallows exceptions; change the catch to log the failure (include the exception) before returning null so tracer lookup failures are visible; use the class logger (e.g., the CIMServerHandle logger) and, if you want to avoid noisy repeated logs, gate the log with a static flag/AtomicBoolean (e.g., tracerLookupLogged) so the failure is logged at least once, then return null as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java`:
- Around line 110-113: The span currently exports a raw username via
span.setAttribute("userName", msg.getReqMsg()) in CIMServerHandle; replace that
exposure with an anonymized value (e.g., a deterministic hash or truncated
token) or remove the attribute entirely: compute a non-reversible identifier
from msg.getReqMsg() and set that with span.setAttribute("userNameHash",
<hashedValue>) or only emit a boolean/role attribute instead, keeping
span.setAttribute("userId", msg.getRequestId()) and
span.setStatus(StatusCode.OK) unchanged.
In `@cim-server/src/main/java/com/crossoverjie/cim/server/server/CIMServer.java`:
- Around line 93-139: The span is ended and marked OK before the async write
completes; update the logic so the ChannelFutureListener handles span status,
exception recording, counter increment and span.end() based on the write result:
keep creating the Span (variable span) as-is, but remove setting
span.setStatus(StatusCode.OK) and the
OtelConfig.getMessagePushedCounter().increment() from the try block and also
remove span.end() from the finally; instead, in the ChannelFutureListener
attached to socketChannel.writeAndFlush(protocol) inspect
channelFuture.isSuccess() and on success set span attributes
(targetUserId/messageType), set StatusCode.OK, increment
OtelConfig.getMessagePushedCounter(), and call span.end(); on failure set
StatusCode.ERROR, call span.recordException(channelFuture.cause()) and then
span.end(); preserve the existing catch block behavior for synchronous
exceptions thrown before write.
---
Nitpick comments:
In
`@cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java`:
- Around line 47-52: The current try/catch around
SpringBeanFactory.getBean(Tracer.class) in CIMServerHandle swallows exceptions;
change the catch to log the failure (include the exception) before returning
null so tracer lookup failures are visible; use the class logger (e.g., the
CIMServerHandle logger) and, if you want to avoid noisy repeated logs, gate the
log with a static flag/AtomicBoolean (e.g., tracerLookupLogged) so the failure
is logged at least once, then return null as before.
In `@cim-server/src/main/java/com/crossoverjie/cim/server/server/CIMServer.java`:
- Around line 41-42: The CIMServer class currently uses field injection for the
Tracer (private Tracer tracer) which reduces testability; change to constructor
injection by removing the `@Autowired` field and adding a constructor (e.g.,
public CIMServer(Tracer tracer, ...) or overloaded constructor) that accepts
Tracer and assigns it to the instance field, update any `@Autowired` annotations
accordingly, and ensure tests and Spring configuration instantiate CIMServer via
its constructor so the Tracer dependency can be injected/mocked.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.javacim-server/src/main/java/com/crossoverjie/cim/server/server/CIMServer.java
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #186 +/- ##
============================================
- Coverage 50.34% 47.65% -2.69%
Complexity 320 320
============================================
Files 79 80 +1
Lines 1738 1836 +98
Branches 134 157 +23
============================================
Hits 875 875
- Misses 822 920 +98
Partials 41 41 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
- Add @ConditionalOnProperty(cim.otel.enabled) to OtelConfig in both cim-server and cim-forward-route, so OTel beans are only created when explicitly enabled - Use @Autowired(required=false) with null guards for Tracer/Counter in RouteController; CIMServerHandle already has null protection - Add cim.otel.enabled=${OTEL_ENABLED:false} to both application.yaml files, defaulting to disabled - Parameterize OTLP endpoints with environment variables - Dockerfile: use ARG ENABLE_OTEL=false to conditionally install Jaeger - supervisord: use %(ENV_ENABLE_OTEL)s for Jaeger autostart To enable OTel, build with --build-arg ENABLE_OTEL=true and run with OTEL_ENABLED=true.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
docker/allin1-ubuntu.Dockerfile (1)
17-23:⚠️ Potential issue | 🟠 MajorHandle unsupported
TARGETARCHexplicitly before downloading Jaeger.Line 18 currently does a no-op transform, so unsupported values (for example
arm/arm/v7) pass through and produce invalid artifact URLs whenENABLE_OTEL=true. Add an explicit architecture map and fail fast with a clear error.🐛 Proposed fix
RUN if [ "$ENABLE_OTEL" = "true" ]; then \ - JAEGER_ARCH=$(echo ${TARGETARCH:-amd64} | sed 's/arm64/arm64/;s/amd64/amd64/') \ + case "${TARGETARCH:-amd64}" in \ + amd64) JAEGER_ARCH="amd64" ;; \ + arm64) JAEGER_ARCH="arm64" ;; \ + *) echo "Unsupported TARGETARCH='${TARGETARCH}' for Jaeger v1.57.0 (supported: amd64, arm64)" >&2; exit 1 ;; \ + esac \ && wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \ && tar -xzf jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \ && mv jaeger-1.57.0-linux-${JAEGER_ARCH}/jaeger-all-in-one /usr/local/bin/ \ && rm -rf jaeger-1.57.0-linux-${JAEGER_ARCH}*; \ fiUse this read-only verification script to confirm the current mapping problem and Jaeger asset support:
#!/bin/bash set -euo pipefail echo "== Current Dockerfile mapping line ==" fd -t f allin1-ubuntu.Dockerfile --exec rg -n 'JAEGER_ARCH|TARGETARCH|jaeger-1\.57\.0-linux-\$\{JAEGER_ARCH\}' {} echo echo "== Simulate current sed mapping ==" for a in amd64 arm64 arm arm/v7; do mapped="$(printf '%s' "$a" | sed 's/arm64/arm64/;s/amd64/amd64/')" echo "$a -> $mapped" done echo echo "== Jaeger v1.57.0 Linux assets ==" python - <<'PY' import json, urllib.request url="https://api.github.com/repos/jaegertracing/jaeger/releases/tags/v1.57.0" data=json.load(urllib.request.urlopen(url)) assets=sorted(a["name"] for a in data.get("assets",[]) if "linux" in a.get("name","")) for a in assets: print(a) PYExpected result:
arm/arm/v7remain unmapped under current logic, while Jaeger v1.57.0 provides linux archives foramd64/arm64only.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker/allin1-ubuntu.Dockerfile` around lines 17 - 23, The current JAEGER_ARCH mapping (sed 's/arm64/arm64/;s/amd64/amd64/') silently passes unsupported TARGETARCH values and produces invalid Jaeger URLs when ENABLE_OTEL=true; update the Dockerfile block that computes JAEGER_ARCH to implement an explicit architecture map (e.g., switch/case or if/elif) that translates known TARGETARCH values (amd64 -> amd64, arm64 -> arm64) and otherwise aborts with a clear error message (exit 1) before attempting the wget; ensure you reference and validate the TARGETARCH variable, set JAEGER_ARCH only on supported values, and fail fast with a descriptive message so the subsequent wget/tar/mv steps are never run for unsupported architectures.cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java (1)
203-216:⚠️ Potential issue | 🟡 Minor
p2pRouteonly catchesCIMException— non-CIM exceptions leave span with UNSET status.Unlike
groupRouteandloginwhich catch allExceptiontypes,p2pRouteonly catchesCIMException. If a different exception occurs (e.g.,NullPointerExceptionfromaccountService), the span status remains UNSET and the exception is not recorded, making troubleshooting harder in distributed traces.🔧 Proposed fix — add a general catch block
} catch (CIMException e) { if (span != null) { span.setStatus(StatusCode.ERROR, e.getErrorMessage()); span.recordException(e); } res.setCode(e.getErrorCode()); res.setMessage(e.getErrorMessage()); + } catch (Exception e) { + if (span != null) { + span.setStatus(StatusCode.ERROR, e.getMessage()); + span.recordException(e); + } + throw e; } finally {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java` around lines 203 - 216, The p2pRoute handler currently only catches CIMException so non-CIM exceptions leave the OpenTelemetry span UNSET and unrecorded; add a general catch (Exception e) after the existing catch(CIMException e) in the p2pRoute method to set span.setStatus(StatusCode.ERROR, e.getMessage()), call span.recordException(e), and populate the response (res) with a generic error code/message (or map to existing error contract) so unexpected exceptions are logged similarly to CIMException; keep the existing finally block that closes scope and ends span.
🧹 Nitpick comments (2)
cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java (2)
75-75: Prefer dynamic tracer version over hardcoded1.0.0.Line 75 hardcodes instrumentation version, which can become stale after releases and make trace metadata less reliable for debugging.
Proposed refactor
- return openTelemetry.getTracer("cim-server", "1.0.0"); + String version = OtelConfig.class.getPackage().getImplementationVersion(); + return openTelemetry.getTracer("cim-server", version != null ? version : "unknown");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java` at line 75, The tracer version is hardcoded in the getTracer call; change OtelConfig to compute a dynamic version string (e.g., read from the package implementation version via OtelConfig.class.getPackage().getImplementationVersion() or from project/Maven properties like pom.properties/manifest) and pass that variable into openTelemetry.getTracer("cim-server", version) with a safe fallback (e.g., "unknown" or project version) if the lookup returns null. Update the code path that currently returns openTelemetry.getTracer("cim-server", "1.0.0") to use this computed version value.
32-32: Avoid exposing nullable static counters when OTel is disabled.Because Line 32 disables this config by default unless enabled, counters at Lines 43-48 are never initialized unless
registerMetrics()runs (Lines 85-95). Static getters therefore expose nullable counters, which is a runtime NPE hazard for any direct.increment()call path.Proposed safer pattern (null-safe increment API)
`@Getter` private static volatile Counter loginCounter; `@Getter` private static volatile Counter heartbeatCounter; `@Getter` private static volatile Counter messagePushedCounter; +public static void incrementLoginCounter() { + Counter c = loginCounter; + if (c != null) { + c.increment(); + } +} + +public static void incrementHeartbeatCounter() { + Counter c = heartbeatCounter; + if (c != null) { + c.increment(); + } +} + +public static void incrementMessagePushedCounter() { + Counter c = messagePushedCounter; + if (c != null) { + c.increment(); + } +}Then switch call sites from
get...Counter().increment()to these helpers.Also applies to: 43-48, 85-95
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java` at line 32, OtelConfig currently exposes nullable static counter fields via get...Counter() that are only set in registerMetrics(), creating NPE risk when cim.otel.enabled is false; change to provide null-safe behavior by replacing direct static counter exposure with either (a) initialize counters to a Noop/NoopDoubleCounter placeholder or (b) remove public static getters and add null-safe helper methods on OtelConfig such as incrementXCounter(), incrementYCounter() that check the real counter and perform a no-op if null; update all call sites to use these new incrementXCounter() helpers instead of get...Counter().increment(), and keep registerMetrics() responsible only for swapping in real counters when enabled. Ensure symbols referenced: class OtelConfig, registerMetrics(), all get...Counter() methods and the new incrementXCounter()/incrementYCounter() helpers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cim-forward-route/src/main/resources/application.yaml`:
- Line 100: Remove the extra blank line at the end of application.yaml so the
file ends immediately after the last YAML content (no trailing empty line); open
application.yaml and delete the final empty line so YAMLlint no longer flags a
trailing blank line.
---
Duplicate comments:
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java`:
- Around line 203-216: The p2pRoute handler currently only catches CIMException
so non-CIM exceptions leave the OpenTelemetry span UNSET and unrecorded; add a
general catch (Exception e) after the existing catch(CIMException e) in the
p2pRoute method to set span.setStatus(StatusCode.ERROR, e.getMessage()), call
span.recordException(e), and populate the response (res) with a generic error
code/message (or map to existing error contract) so unexpected exceptions are
logged similarly to CIMException; keep the existing finally block that closes
scope and ends span.
In `@docker/allin1-ubuntu.Dockerfile`:
- Around line 17-23: The current JAEGER_ARCH mapping (sed
's/arm64/arm64/;s/amd64/amd64/') silently passes unsupported TARGETARCH values
and produces invalid Jaeger URLs when ENABLE_OTEL=true; update the Dockerfile
block that computes JAEGER_ARCH to implement an explicit architecture map (e.g.,
switch/case or if/elif) that translates known TARGETARCH values (amd64 -> amd64,
arm64 -> arm64) and otherwise aborts with a clear error message (exit 1) before
attempting the wget; ensure you reference and validate the TARGETARCH variable,
set JAEGER_ARCH only on supported values, and fail fast with a descriptive
message so the subsequent wget/tar/mv steps are never run for unsupported
architectures.
---
Nitpick comments:
In `@cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java`:
- Line 75: The tracer version is hardcoded in the getTracer call; change
OtelConfig to compute a dynamic version string (e.g., read from the package
implementation version via
OtelConfig.class.getPackage().getImplementationVersion() or from project/Maven
properties like pom.properties/manifest) and pass that variable into
openTelemetry.getTracer("cim-server", version) with a safe fallback (e.g.,
"unknown" or project version) if the lookup returns null. Update the code path
that currently returns openTelemetry.getTracer("cim-server", "1.0.0") to use
this computed version value.
- Line 32: OtelConfig currently exposes nullable static counter fields via
get...Counter() that are only set in registerMetrics(), creating NPE risk when
cim.otel.enabled is false; change to provide null-safe behavior by replacing
direct static counter exposure with either (a) initialize counters to a
Noop/NoopDoubleCounter placeholder or (b) remove public static getters and add
null-safe helper methods on OtelConfig such as incrementXCounter(),
incrementYCounter() that check the real counter and perform a no-op if null;
update all call sites to use these new incrementXCounter() helpers instead of
get...Counter().increment(), and keep registerMetrics() responsible only for
swapping in real counters when enabled. Ensure symbols referenced: class
OtelConfig, registerMetrics(), all get...Counter() methods and the new
incrementXCounter()/incrementYCounter() helpers.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.javacim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.javacim-forward-route/src/main/resources/application.yamlcim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.javacim-server/src/main/resources/application.yamldocker/allin1-ubuntu.Dockerfiledocker/supervisord.conf
🚧 Files skipped from review as they are similar to previous changes (3)
- cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.java
- cim-server/src/main/resources/application.yaml
- docker/supervisord.conf
| otlp: | ||
| grpc: | ||
| endpoint: ${OTEL_EXPORTER_OTLP_GRPC_ENDPOINT:http://localhost:4317} | ||
|
|
There was a problem hiding this comment.
Remove trailing blank line.
YAMLlint flags an extra blank line at the end of the file. Remove it to satisfy the linter.
🧰 Tools
🪛 YAMLlint (1.38.0)
[warning] 100-100: too many blank lines (1 > 0)
(empty-lines)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cim-forward-route/src/main/resources/application.yaml` at line 100, Remove
the extra blank line at the end of application.yaml so the file ends immediately
after the last YAML content (no trailing empty line); open application.yaml and
delete the final empty line so YAMLlint no longer flags a trailing blank line.
- Replace raw userName with userNameHash in span attributes to avoid privacy/compliance risk (CIMServerHandle) - Move span lifecycle management into ChannelFutureListener callback to correctly reflect async write results (CIMServer)
What
Integrate OpenTelemetry into
cim-serverandcim-forward-routemodules to enable distributed tracing and business metrics collection, as listed in the TODO.Why
As CIM is a distributed IM system with multiple services (server, route), it's essential to have end-to-end tracing and metrics to monitor message flow, identify bottlenecks, and debug issues in production.
Changes
Dependencies (pom.xml)
micrometer-tracing-bridge-otel,opentelemetry-exporter-otlp,opentelemetry-api,opentelemetry-sdk,micrometer-registry-otlpto root and module POMs.cim-server
LOGIN_REQUESTandPINGcommands with metrics counters.cim-forward-route
Docker
Custom Metrics
cim.server.online.userscim.server.messages.pushedcim.server.heartbeat.countcim.server.login.countcim.route.messages.groupcim.route.messages.p2pcim.route.login.countcim.route.register.countHow to verify
http://localhost:16686.cim-serverorcim-forward-routeto view traces.http://localhost:8081/actuator/metrics/cim.server.online.users.Summary by CodeRabbit
New Features
Documentation
Chores