
Improve helm execution performance #23402

@fm1ck3y

Summary

Hi ArgoCD Team!

This issue is related to the discussion in:

We are adopting ArgoCD as the deployer in our company. We chose a helm http repository instead of OCI because:

  • OCI does not support wildcard in dependencies
  • OCI does not support alias in dependencies
  • OCI uses only direct links

We have 2 regions, and one proxy link that redirects to the nearest region.
When executing helm dependency build & helm pull, we encounter 2 problems:

1. Network Issue

Description:

  • We have 100 clusters, each running ArgoCD
  • Each ArgoCD instance has between 300 and 800 applications
  • Using reposerver.parallelism.limit: '30'
  • Each ArgoCD instance uses a common helm http repository
  • index.yaml file size is 40MB

Based on these parameters, we can derive the following formula for maximum network load:

Max Network Load = (Number of Clusters) × (index.yaml size) × (Number of Parallel Requests)

Where:
- Number of Clusters = 100
- Max Applications per Cluster = 800
- index.yaml size = 40MB
- Number of Parallel Requests = min(reposerver.parallelism.limit, Max Applications per Cluster) = min(30, 800) = 30

Therefore:
Max Network Load = 100 × 40MB × 30 = 120,000MB ~= 117GB
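
The same arithmetic as a small Python sketch, in case it helps to plug in other values. The function and parameter names are made up for illustration and are not ArgoCD settings, except that `parallelism_limit` stands for reposerver.parallelism.limit. It assumes the worst case where every cluster hits its parallelism limit at the same moment and every request downloads the full index.yaml. Dividing by 1024, 120,000MB comes out at roughly 117GB.

```python
def max_network_load_mb(clusters: int, index_size_mb: float,
                        parallelism_limit: int, max_apps_per_cluster: int) -> float:
    """Worst-case index.yaml traffic in MB for one wave of parallel requests."""
    # A cluster cannot run more parallel requests than it has applications.
    parallel_requests = min(parallelism_limit, max_apps_per_cluster)
    return clusters * index_size_mb * parallel_requests


load_mb = max_network_load_mb(clusters=100, index_size_mb=40,
                              parallelism_limit=30, max_apps_per_cluster=800)
print(f"{load_mb:,.0f} MB = {load_mb / 1024:.1f} GB")
```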

2. High RAM Usage Issue

Description:

  • Using reposerver.parallelism.limit: '30'
  • Through experimental testing, it was found that helm pull consumes ~900MB for a repository with an index.yaml file of 40MB.

We encountered a performance issue with the repo-server when the helm repository (http) is very large.
Helm has to download the index for the helm pull command. With a 40MB index.yaml in the helm repository, a single helm pull consumes ~900MB.

There are related tickets in Helm:

They mention that helm will consume less memory if the index is stored in .json format. In any case, once the helm repository is large enough, it puts a significant load on the repo-server.

Based on these parameters, we can derive a formula for the load on argocd-repo-server purely from helm execution:

Max RAM Usage Repo Server = (Helm RAM Usage) * (Number of Parallel Requests) + (argocd-repo-server RAM usage)

Where 
 - Number of Parallel Requests = 30
 - Helm RAM Usage = 900MB
 - argocd-repo-server RAM usage = ~500MB

Therefore: 
Max RAM Usage Repo Server = 30 * 900MB + 500MB = 27,500MB ~= 27GB
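
A short Python sketch of this formula, plus the inverse question of how many parallel helm executions fit under a given memory limit. The function names and the 8192MB limit are assumptions for illustration only. The 900MB and 500MB figures are the measurements quoted above.

```python
def max_repo_server_ram_mb(helm_ram_mb: int, parallel_requests: int,
                           base_ram_mb: int) -> int:
    """Peak repo-server memory if every parallel slot runs helm at once."""
    return helm_ram_mb * parallel_requests + base_ram_mb


def max_parallelism_for_limit(limit_mb: int, helm_ram_mb: int,
                              base_ram_mb: int) -> int:
    """Largest parallelism whose peak memory stays within limit_mb."""
    return max((limit_mb - base_ram_mb) // helm_ram_mb, 0)


peak_mb = max_repo_server_ram_mb(helm_ram_mb=900, parallel_requests=30,
                                 base_ram_mb=500)
print(f"peak: {peak_mb:,} MB")

# Hypothetical 8192MB pod memory limit, not a value from this issue.
print("parallelism that fits:", max_parallelism_for_limit(8192, 900, 500))
```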

Motivation


Settings in argocd-cmd-params-cm:

reposerver.parallelism.limit: '30'
reposerver.exec.timeout: '180s'
controller.operation.processors: '10'
controller.status.processors: '20'

Number of replicas: in this case, it doesn't matter since we're looking at the maximum load per pod.

Example of helm pull execution for a chart:

time="2025-05-22T05:21:48Z" level=info msg=Trace args="[helm pull --destination /tmp/67aa0bad-e931-4c59-8b58-715bfdeac363 --version 0.1.0 --repo https://<registry>/helm.charts my-app]" dir= operation_name="exec helm" time_ms=90285.762594

From the logs, we can see that the operation takes 90s, which is quite long for a helm pull.

Proposal

1. Direct Helm Pull

We propose adding the ability to run helm pull against a direct chart URL, bypassing the index.yaml file. It would be enabled by a parameter in the argocd-cmd-params-cm ConfigMap, named for example: reposerver.helm.pull.direct.

helm pull --destination /tmp/67aa0bad-e931-4c59-8b58-715bfdeac363 --version 0.1.0 --repo https://<registry>/helm.charts my-app

->

helm pull --destination /tmp/67aa0bad-e931-4c59-8b58-715bfdeac363 https://<registry>/helm.charts/my-app-0.1.0.tgz

Such a download takes only about 1s and consumes almost no resources. At the same time, helm dependency build & helm repo update will keep working normally if needed.
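
A minimal Python sketch of how the two command lines could be chosen by such a flag. It assumes the repository serves charts at `<repo>/<chart>-<version>.tgz`, as in the example above. That layout is not guaranteed for every helm repository, because index.yaml may point to chart archives at arbitrary URLs. The function names and the `direct` argument are hypothetical, and `example.invalid` stands in for the real registry.

```python
def direct_chart_url(repo_url: str, chart: str, version: str) -> str:
    """Build the chart archive URL, assuming the <chart>-<version>.tgz layout."""
    return f"{repo_url.rstrip('/')}/{chart}-{version}.tgz"


def helm_pull_args(repo_url: str, chart: str, version: str,
                   destination: str, direct: bool) -> list[str]:
    """Return the helm pull command line, with or without the index lookup."""
    if direct:
        # Direct URL: helm fetches the archive without reading index.yaml.
        return ["helm", "pull", "--destination", destination,
                direct_chart_url(repo_url, chart, version)]
    # Default form: helm downloads index.yaml to resolve the chart first.
    return ["helm", "pull", "--destination", destination,
            "--version", version, "--repo", repo_url, chart]


print(" ".join(helm_pull_args("https://example.invalid/helm.charts",
                              "my-app", "0.1.0", "/tmp/charts", direct=True)))
```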

Conclusions

  • This will reduce the load on the network because helm will not download the index.yaml file
  • This will help with RAM usage because helm will not search & load index.yaml into memory

2. Using Local Cache

Local Cache Within Container

argocd-repo-server already uses a helm cache shared between containers. There is another problem associated with this:

This mode assumes a separate thread that runs helm repo update every n seconds. helm dependency build is then executed with the --skip-refresh flag.
This would reduce the load on argocd-repo-server many times over, but it makes the helm concurrency problem more pronounced.
This can be solved with a lock that blocks every helm dependency build --skip-refresh while a helm repo update is running.

We suggest adding a parameter to enable this mode in the argocd-cmd-params-cm ConfigMap, named for example: reposerver.helm.background.sync.index.
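
The locking idea could look roughly like the Python sketch below. It is only an illustration of the pattern, not how argocd-repo-server (written in Go) works today. All names are hypothetical, and `refresh` and `build` are stand-ins for running helm repo update and helm dependency build --skip-refresh. Many builds may hold the lock together, while the refresh needs it exclusively.

```python
import threading


class IndexLock:
    """Readers-writer lock: many builds read the cache, one refresh writes it."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    def acquire_read(self) -> None:
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self) -> None:
        with self._cond:
            while self._writing or self._readers:
                self._cond.wait()
            self._writing = True

    def release_write(self) -> None:
        with self._cond:
            self._writing = False
            self._cond.notify_all()


def background_refresh(lock, refresh, interval_s, stop) -> None:
    """Call refresh() every interval_s seconds until the stop event is set."""
    while not stop.wait(interval_s):
        lock.acquire_write()
        try:
            refresh()  # stand-in for `helm repo update`
        finally:
            lock.release_write()


def build_dependencies(lock, build):
    """Run build() while holding the shared side of the lock."""
    lock.acquire_read()
    try:
        return build()  # stand-in for `helm dependency build --skip-refresh`
    finally:
        lock.release_read()
```

One caveat of this simple variant: a steady stream of builds can delay the refresh indefinitely, so a real implementation would probably give the waiting refresh priority.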

Risks:

  • The helm concurrency problem occurs when many applications try to use the cache while another process is updating it.

Local cache using local helm repository

For example: use ChartMuseum as a proxy server that absorbs most of the load from the main repository and serves a minified index.yaml file to argocd-repo-server
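
How the index is minified is not specified here. One possible reading, sketched below in Python, is to keep only the entries for charts that the cluster actually uses. The function name and the `wanted_charts` set are assumptions. The index is represented as an already parsed dict with the usual top-level `entries` map of chart name to version list.

```python
def minify_index(index: dict, wanted_charts: set[str]) -> dict:
    """Return a copy of a parsed index.yaml with only the wanted chart entries."""
    entries = index.get("entries", {})
    slim = {name: versions for name, versions in entries.items()
            if name in wanted_charts}
    # Keep the other top-level keys (apiVersion, generated) unchanged.
    return {**index, "entries": slim}


full_index = {
    "apiVersion": "v1",
    "entries": {
        "my-app": [{"name": "my-app", "version": "0.1.0"}],
        "unused-chart": [{"name": "unused-chart", "version": "2.0.0"}],
    },
}
print(sorted(minify_index(full_index, {"my-app"})["entries"]))
```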

Conclusions

  • This will greatly reduce the load on the network because the index.yaml download will be done per cluster instead of per ArgoCD
  • This will reduce the load on RAM, because the index.yaml file will be much smaller

    Labels

    Stale (no activity for over 90 days), enhancement (new feature or request)