llo: scaling improvements and safeguard channel definition version management #17435

Merged
jmank88 merged 6 commits into develop from bm/llo-observation-cache on May 6, 2025

Conversation

@brunotm (Collaborator) commented Apr 24, 2025

  • cache observations between rounds (within 750ms) so we don't DDoS the EAs when dealing with thousands of streams (~3.5k rps -> ~1.7k rps with 1k streams)
  • pace and batch transmit commits so we don't DDoS the node database (CPU usage 100% -> ~20% with 1k streams)
  • ensure we don't store invalid block numbers for channel definitions
  • update wsrpc to avoid a transmit panic in the telemetry client
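
To illustrate the first bullet, here is a minimal sketch of a short-lived observation cache. It is not the code from this PR: `StreamID`, `ObservationCache`, `Observe` and `simulateRounds` are hypothetical names, and treating "within 750ms" as a per-entry TTL is an assumption made for the example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// StreamID, ObservationCache and simulateRounds are hypothetical names used
// for illustration only; they are not taken from the PR.
type StreamID uint32

type cacheEntry struct {
	value     float64
	expiresAt time.Time
}

// ObservationCache keeps the last observed value per stream for a short TTL.
type ObservationCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	now     func() time.Time // injectable clock so the demo is deterministic
	entries map[StreamID]cacheEntry
}

func NewObservationCache(ttl time.Duration, now func() time.Time) *ObservationCache {
	return &ObservationCache{ttl: ttl, now: now, entries: make(map[StreamID]cacheEntry)}
}

// Observe returns the cached value while it is still fresh; otherwise it
// calls fetch (standing in for an EA request) and stores the result.
// Errors are never cached. The lock is held across fetch for simplicity.
func (c *ObservationCache) Observe(id StreamID, fetch func(StreamID) (float64, error)) (float64, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[id]; ok && c.now().Before(e.expiresAt) {
		return e.value, nil
	}
	v, err := fetch(id)
	if err != nil {
		return 0, err
	}
	c.entries[id] = cacheEntry{value: v, expiresAt: c.now().Add(c.ttl)}
	return v, nil
}

// simulateRounds observes one stream once per round on a fake clock and
// returns how many times the upstream fetch was actually invoked.
func simulateRounds(rounds, deltaRoundMs, ttlMs int) int {
	clock := time.Unix(0, 0)
	cache := NewObservationCache(time.Duration(ttlMs)*time.Millisecond, func() time.Time { return clock })
	fetches := 0
	fetch := func(StreamID) (float64, error) { fetches++; return 42.0, nil }
	for i := 0; i < rounds; i++ {
		_, _ = cache.Observe(1, fetch)
		clock = clock.Add(time.Duration(deltaRoundMs) * time.Millisecond)
	}
	return fetches
}

func main() {
	// 4 rounds at 250ms with a 750ms TTL: fetches happen at t=0 and t=750ms.
	fmt.Println("fetches:", simulateRounds(4, 250, 750)) // prints "fetches: 2"
}
```

The injected clock only exists to keep the demo deterministic; a real cache would likely use `time.Now` and would also need some eviction of stale entries, which is omitted here.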

@brunotm added the build-test-image label (Will build the e2e test image in integration-tests workflow for PRs) Apr 24, 2025
@github-actions (Contributor) commented

I see you updated files related to core. Please run pnpm changeset in the root directory to add a changeset, and include at least one of the following tags in its text:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

🎖️ No JIRA issue number found in: PR title, commit message, or branch name. Please include the issue ID in one of these.

@brunotm changed the title from "llo: add short lived observation cache to deamplify EA requests" to "llo: add short lived observation cache to deamplify EA requests and commit batch" Apr 29, 2025
@smartcontractkit smartcontractkit deleted a comment from github-actions Bot May 1, 2025
@smartcontractkit smartcontractkit deleted a comment from github-actions Bot May 1, 2025
@smartcontractkit smartcontractkit deleted a comment from github-actions Bot May 1, 2025
@brunotm changed the title from "llo: add short lived observation cache to deamplify EA requests and commit batch" to "llo: improving scaling and safeguard channel definition version management" May 2, 2025
@brunotm changed the title from "llo: improving scaling and safeguard channel definition version management" to "llo: improve scaling and safeguard channel definition version management" May 2, 2025
@brunotm changed the title from "llo: improve scaling and safeguard channel definition version management" to "llo: scaling improvements and safeguard channel definition version management" May 2, 2025
Comment thread package.json Outdated
Comment thread core/services/llo/channeldefinitions/onchain_channel_definition_cache.go Outdated
Comment on lines 113 to 116
@fmunshi commented May 2, 2025

does this mean we have a max of 1 observation for each round?

@brunotm (Collaborator, Author) replied

we have a max of 1 transmit per second and up to 4 rounds per second (deltaRound = 250ms)
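
As a rough illustration of how several rounds can feed a single paced transmit, the sketch below collects reports from a channel and flushes them as one batch per tick. It is an assumption-laden example, not the transmitter from this PR: `Report`, `commitLoop` and `runDemo` are hypothetical names, and the demo drives the tick channel by hand instead of using a real one-second ticker.

```go
package main

import (
	"fmt"
	"time"
)

// Report, commitLoop and runDemo are hypothetical names for illustration.
type Report struct{ Seq int }

// commitLoop accumulates reports from commits and hands them to flush as one
// batch each time tick fires. When commits is closed it flushes whatever is
// left and returns.
func commitLoop(commits <-chan Report, tick <-chan time.Time, flush func([]Report)) {
	var batch []Report
	for {
		select {
		case r, ok := <-commits:
			if !ok {
				if len(batch) > 0 {
					flush(batch)
				}
				return
			}
			batch = append(batch, r)
		case <-tick:
			if len(batch) > 0 {
				flush(batch)
				batch = nil // start a fresh batch for the next interval
			}
		}
	}
}

// runDemo simulates two seconds of work: four rounds (one report each) per
// second, followed by one tick. It returns the size of every flushed batch.
func runDemo() []int {
	commits := make(chan Report)
	tick := make(chan time.Time)
	done := make(chan struct{})
	var sizes []int

	go func() {
		defer close(done)
		commitLoop(commits, tick, func(b []Report) { sizes = append(sizes, len(b)) })
	}()

	seq := 0
	for sec := 0; sec < 2; sec++ {
		for round := 0; round < 4; round++ { // deltaRound = 250ms -> 4 rounds/s
			commits <- Report{Seq: seq}
			seq++
		}
		tick <- time.Time{} // stands in for a 1s ticker firing
	}
	close(commits)
	<-done // sizes is safe to read once the loop goroutine has exited
	return sizes
}

func main() {
	fmt.Println(runDemo()) // prints "[4 4]"
}
```

In a long-running service the tick channel would typically be the `C` field of a `time.NewTicker(time.Second)`, so the database sees at most one batched write per second regardless of how many rounds completed.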

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR implements scaling improvements: it caches observation results to reduce the load on external systems, batches commit transmissions to lower CPU usage, and improves robustness in channel definition management and telemetry reporting. Key changes include:

  • Introducing an observation cache in data sources for reducing duplicate processing and DDoS risk.
  • Updating the mercury transmitter to use a commit channel and batch processing for transmitting reports.
  • Refining channel definition cache handling by introducing a wrapped log construct and updating error logging and persistence logic.

Reviewed Changes

Copilot reviewed 16 out of 23 changed files in this pull request and generated 1 comment.

File | Description
core/services/llo/observation/data_source_test.go | Added test cases to verify caching behavior and error handling in stream observations.
core/services/llo/observation/data_source.go | Introduced caching functionality with atomic flag controls to prevent redundant observations.
core/services/llo/mercurytransmitter/transmitter_test.go | Modified tests to adapt to the updated transmitter commit queue API.
core/services/llo/mercurytransmitter/transmitter.go | Refactored transmitter to batch transmissions using a commit channel and commit loop.
core/services/llo/mercurytransmitter/queue_test.go | Updated tests to check queue length via a new Len() method.
core/services/llo/mercurytransmitter/queue.go | Added a thread-safe Len() method for the transmit queue.
core/services/llo/channeldefinitions/onchain_channel_definition_cache_test.go | Updated tests to work with the new wrapped log structure.
core/services/llo/channeldefinitions/onchain_channel_definition_cache.go | Refactored channel definition cache to use a wrapped log and improved logging in persistence.
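
For readers unfamiliar with the pattern behind the "thread-safe Len() method" entry above, a minimal sketch follows. The `transmitQueue` type here is a hypothetical stand-in with an assumed slice-backed layout; the real queue in `queue.go` may be structured quite differently.

```go
package main

import (
	"fmt"
	"sync"
)

// transmitQueue is a hypothetical, simplified queue used only to show how a
// length accessor can be made safe for concurrent use.
type transmitQueue struct {
	mu    sync.RWMutex
	items []int
}

// Push appends an item under the write lock.
func (q *transmitQueue) Push(v int) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, v)
}

// Len reads the length under the read lock, so it never races with Push.
func (q *transmitQueue) Len() int {
	q.mu.RLock()
	defer q.mu.RUnlock()
	return len(q.items)
}

// pushConcurrently pushes n items from n goroutines and reports the final length.
func pushConcurrently(n int) int {
	q := &transmitQueue{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			q.Push(v)
		}(i)
	}
	wg.Wait()
	return q.Len()
}

func main() {
	fmt.Println(pushConcurrently(100)) // prints "100"
}
```

Exposing only `Len()` rather than the underlying slice lets tests assert on queue size without reaching into internals or taking the lock themselves.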
Files not reviewed (7)
  • core/scripts/go.mod: Language not supported
  • deployment/go.mod: Language not supported
  • go.mod: Language not supported
  • integration-tests/go.mod: Language not supported
  • integration-tests/load/go.mod: Language not supported
  • system-tests/lib/go.mod: Language not supported
  • system-tests/tests/go.mod: Language not supported
Comments suppressed due to low confidence (1)

core/services/llo/channeldefinitions/onchain_channel_definition_cache.go:121

  • [nitpick] The name 'wrappedLog' might be made more descriptive (e.g. 'channelDefinitionLogWrapper') to clarify its intent when used with channel definitions.
type wrappedLog struct {

Comment thread core/services/llo/mercurytransmitter/transmitter.go Outdated
jmank88 previously approved these changes May 2, 2025
fmunshi previously approved these changes May 2, 2025

@fmunshi left a comment

lgtm

jmank88 previously approved these changes May 3, 2025
mjk90 previously approved these changes May 5, 2025
@jmank88 closed this May 6, 2025
6 participants