feat(webinar): add metrics #4942

Open
Tianhui-Han wants to merge 5 commits into webex:next from Tianhui-Han:feat/webianr_metrics

Conversation

Tianhui-Han (Contributor) commented Apr 30, 2026

This pull request addresses https://jira-eng-gpk2.cisco.com/jira/browse/SPARK-774846 by making the following changes:

  • Sync metrics use version-based correlation: only LLM messages with dataSet.version >= pending.dataSetVersion can complete pending metrics, preventing stale messages from producing incorrect latencies.
  • Pending metrics are only stored when a POST /sync request is actually issued (no false events when leaves already match).
  • totalTime measures from GET /hashtree through sync completion (excludes random backoff, which is reported separately).
  • Initialization syncs at join time are excluded; only in-meeting syncs emit metrics.
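The version-based correlation described above can be sketched as follows. This is an illustrative model, not the SDK's actual code: `SyncMetricsTracker`, `recordSyncRequest`, and `tryComplete` are hypothetical names, and timestamps are passed in explicitly to keep the logic testable.

```typescript
// Hypothetical sketch of version-gated sync-metric completion.
// A pending entry is stored only when a POST /sync is actually issued,
// and only messages with version >= the pending version may complete it.
interface PendingSyncMetric {
  dataSetVersion: number;
  syncResponseReceivedAt: number;
}

class SyncMetricsTracker {
  private pending = new Map<string, PendingSyncMetric>();

  // Call only when a POST /sync request was really sent for this dataset.
  recordSyncRequest(name: string, version: number, now: number): void {
    this.pending.set(name, {dataSetVersion: version, syncResponseReceivedAt: now});
  }

  // Returns the latency if this message completes the pending sync,
  // or undefined for stale or unrelated messages.
  tryComplete(name: string, messageVersion: number, now: number): number | undefined {
    const entry = this.pending.get(name);
    if (!entry || messageVersion < entry.dataSetVersion) {
      return undefined; // stale message: ignore, keep waiting
    }
    this.pending.delete(name);
    return now - entry.syncResponseReceivedAt;
  }
}
```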

Change Type

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Tooling change
  • Internal code refactor

The following scenarios were tested

< ENUMERATE TESTS PERFORMED, WHETHER MANUAL OR AUTOMATED >

The GAI Coding Policy And Copyright Annotation Best Practices

  • GAI was not used (or, no additional notation is required)
  • Code was generated entirely by GAI
  • GAI was used to create a draft that was subsequently customized or modified
  • Coder created a draft manually that was non-substantively modified by GAI (e.g., refactoring was performed by GAI on manually written code)
  • Tool used for AI assistance (GitHub Copilot / Other - specify)
    • Github Copilot
    • Other - Please Specify
  • This PR is related to
    • Feature
    • Defect fix
    • Tech Debt
    • Automation

I certify that

  • I have read and followed contributing guidelines
  • I discussed changes with code owners prior to submitting this pull request
  • I have not skipped any automated checks
  • All existing and new tests passed
  • I have updated the documentation accordingly

Make sure to have followed the contributing guidelines before submitting.

Tianhui-Han marked this pull request as ready for review May 6, 2026 01:04
Tianhui-Han requested review from a team as code owners May 6, 2026 01:04

chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e185ad3534

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +1168 to +1170
for (const dataSet of message.dataSets) {
  if (this.pendingSyncMetrics.has(dataSet.name)) {
    this.completeSyncMetrics(dataSet.name, messageReceivedAt);

P1: Correlate sync completion to the request, not just the dataset name

Completing pending metrics by matching only dataSet.name allows unrelated hash-tree messages to close a pending sync and emit client.locus.sync.complete with incorrect latencies. If a normal main update arrives before the sync-triggered broadcast, syncMessageReceiveTime/totalTime are attributed to the wrong message and the real sync completion is effectively lost. This is especially risky because tracked datasets like main can receive frequent independent updates.


Comment on lines +1390 to +1395
if (shouldCollectMetrics) {
  const randomBackoffTime = dataSet.lastBackoffTime ?? 0;
  dataSet.lastBackoffTime = undefined;

  this.pendingSyncMetrics.set(dataSet.name, {
    syncResponseReceivedAt: performance.now(),

P2: Create pending metrics only when a sync request is issued

Pending sync metrics are inserted unconditionally for tracked datasets, even when no POST /sync call was made (for example, when diffHashes yields no mismatched leaves and Object.keys(leavesData).length is 0). In that path, the next message for the dataset can still emit client.locus.sync.complete, producing a fabricated "sync complete" event for a sync that never happened.



chatgpt-codex-connector (Bot) left a comment


💡 Codex Review


Reviewed commit: e7c479708c


Comment on lines +6653 to +6657
if (isInitialJoin && timings) {
  this.sendLLMConnectMetric(timings);
}

return Promise.resolve(timings);

P1: Use the register result instead of the undefined timings variable

Replace timings with the value returned by registerAndConnect in this callback. As written, timings is never defined in scope, so any successful registerAndConnect call will hit a ReferenceError when evaluating if (isInitialJoin && timings), causing updateLLMConnection() to reject after connecting and preventing both listener setup completion and the new LLM latency metric emission during join.
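A minimal sketch of the suggested fix, with stand-in implementations (`registerAndConnect` and `sendLLMConnectMetric` are simplified placeholders): bind the value resolved by the connect call to a local name so it is in scope for the metric check.

```typescript
interface LLMTimings {
  clientLLMDatachannelResponseTime: number;
}

// Placeholder for the real register-and-connect call.
async function registerAndConnect(): Promise<LLMTimings> {
  return {clientLLMDatachannelResponseTime: 42};
}

const sentMetrics: LLMTimings[] = [];
function sendLLMConnectMetric(timings: LLMTimings): void {
  sentMetrics.push(timings);
}

async function updateLLMConnection(isInitialJoin: boolean): Promise<LLMTimings> {
  // Use the resolved value; referencing an out-of-scope `timings`
  // variable here would throw a ReferenceError at runtime.
  const timings = await registerAndConnect();
  if (isInitialJoin && timings) {
    sendLLMConnectMetric(timings);
  }
  return timings;
}
```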



chatgpt-codex-connector (Bot) left a comment


💡 Codex Review


Reviewed commit: 9dfc6a666f


Comment on lines +1417 to +1419
this.pendingSyncMetrics.set(dataSet.name, {
  syncResponseReceivedAt: performance.now(),
  totalStartTime: totalStart,

P1: Clear stale pending sync metrics before reusing the parser

When a new pending metric is stored here, it can survive parser lifecycle transitions because pendingSyncMetrics is never cleared in stop(), resumeFromMessage(), or resumeFromApiResponse(). In LocusInfo, stopped parsers are explicitly resumed and reused, so a stale entry from a previous locus session can be completed by the next unrelated hash-tree message and emit a bogus client.locus.sync.complete latency. Please clear pending entries when stopping/resuming (or when replacing datasets) so metrics cannot leak across parser sessions.
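One way to implement the suggested cleanup, sketched with hypothetical names (`HashTreeParserModel` and its lifecycle hooks mirror the description, not the actual parser class):

```typescript
// Minimal model of a reusable parser whose lifecycle hooks clear
// pending sync metrics so entries cannot leak across locus sessions.
class HashTreeParserModel {
  private pendingSyncMetrics = new Map<string, number>();

  trackPendingSync(name: string, startedAt: number): void {
    this.pendingSyncMetrics.set(name, startedAt);
  }

  get pendingCount(): number {
    return this.pendingSyncMetrics.size;
  }

  stop(): void {
    // Drop pending entries: a reused parser must not complete a metric
    // left over from a previous session.
    this.pendingSyncMetrics.clear();
  }

  resume(): void {
    // Same reasoning on the resume paths.
    this.pendingSyncMetrics.clear();
  }
}
```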



chatgpt-codex-connector (Bot) left a comment


💡 Codex Review


Reviewed commit: 9446a31d85


// Store pending sync metrics only when a sync request was actually issued.
// If no leaves were mismatched, no sync was needed, so no metrics should be emitted.
if (shouldCollectMetrics && syncRequestMade) {
  this.pendingSyncMetrics.set(dataSet.name, {

P1: Preserve pending sync metrics across overlapping sync attempts

Avoid overwriting an existing pending entry for the same dataset here. pendingSyncMetrics is keyed only by dataset name, and set() replaces the previous request state unconditionally; because runSyncAlgorithm() is restarted on every sync, a delayed broadcast can let a second sync run first, so the first request’s latencies are lost and the eventual completion is attributed to the wrong attempt. This skews client.locus.sync.complete timing data under normal retry/latency conditions.
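A small sketch of the non-overwriting behavior this comment asks for (`recordIfAbsent` is a hypothetical helper, not SDK code): only the first in-flight attempt for a dataset keeps its timing state.

```typescript
const pendingSyncMetrics = new Map<string, {syncResponseReceivedAt: number}>();

// Store timing state only if no attempt is already in flight for this
// dataset, so a restarted sync cannot clobber the earlier request.
function recordIfAbsent(name: string, syncResponseReceivedAt: number): boolean {
  if (pendingSyncMetrics.has(name)) {
    return false; // keep the first attempt's state
  }
  pendingSyncMetrics.set(name, {syncResponseReceivedAt});
  return true;
}
```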


error?: any
) {
// @ts-ignore
const llmWebsocketUrl = this.webex.internal.llm.getDatachannelUrl?.() || undefined;

P2: Use the actual WebSocket URL for the llmWebsocketUrl metric field

This metric populates llmWebsocketUrl using getDatachannelUrl(), but that getter returns the registration/datachannel endpoint (sessionData.datachannelUrl), not the connected socket URL (sessionData.webSocketUrl). As a result, both client.llm.connect.response and client.locus.sync.complete emit a mislabeled identifier, which breaks downstream correlation that expects the websocket URL value.



chatgpt-codex-connector (Bot) left a comment


💡 Codex Review


Reviewed commit: 6656ad6f23


Comment on lines +1598 to +1599
// Record the actual backoff time for sync metrics
dataSet.lastBackoffTime = Math.round(performance.now() - timerSetAt);

P2: Record only the jitter in randomBackoffTime

randomBackoffTime is populated from elapsed timer duration here, but this timer waits for idleMs + backoff, not just the random backoff portion. As a result every emitted sync metric overstates random backoff by at least idleMs, which skews latency analysis and makes dataset-to-dataset comparisons inaccurate.
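The correction could look like this, assuming `idleMs` is the fixed idle portion of the timer (names are illustrative): subtract the idle wait so only the random jitter is reported.

```typescript
// Given the elapsed time of a timer that waited idleMs + jitter,
// report only the jitter portion, clamped at zero for early fires.
function randomBackoffPortion(elapsedMs: number, idleMs: number): number {
  return Math.max(0, Math.round(elapsedMs - idleMs));
}
```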


Comment on lines +6318 to +6319
// Send LLM connect metric with error on failure
this.sendLLMConnectMetric({clientLLMDatachannelResponseTime: 0}, error);

P2: Preserve the measured latency on LLM connect failures

This failure path always emits clientLLMDatachannelResponseTime: 0, even when registerAndConnect() may have already spent non-zero time in the datachannel POST and then failed during websocket connect. That introduces systematically false latency values for failed joins and degrades diagnostic signal in client.llm.connect.response metrics.
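A hedged sketch of reporting the measured elapsed time on failure rather than a hard-coded 0 (`connectWithMetric` and the injected clock are illustrative, not the actual SDK API):

```typescript
// Measure from the start of the connect attempt; on failure, emit the
// real elapsed time together with the error rather than 0.
async function connectWithMetric(
  connect: () => Promise<void>,
  now: () => number,
  send: (responseTimeMs: number, error?: unknown) => void
): Promise<void> {
  const start = now();
  try {
    await connect();
    send(now() - start);
  } catch (error) {
    send(now() - start, error); // preserve the measured latency
    throw error;
  }
}
```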
