Improve performance of helm execution (#23402)

# Summary

Hi ArgoCD Team!

This issue relates to the following discussions:
- https://github.com/argoproj/argo-cd/issues/12902
- https://github.com/argoproj/argo-cd/discussions/12877
- https://github.com/helm/helm/issues/10735
- https://github.com/helm/helm/issues/30983

We are implementing ArgoCD as a deployer in our company. We chose a Helm HTTP repository instead of OCI because:
- OCI does not support wildcards in dependencies
- OCI does not support aliases in dependencies
- OCI uses only direct links

We have 2 regions and one proxy link that redirects to the nearest region.
When running `helm dependency build` and `helm pull`, we encounter 2 problems:

## 1. Network Issue

Description:
- We have 100 clusters, each running ArgoCD
- Each ArgoCD instance has between 300 to 800 applications
- Using `reposerver.parallelism.limit: '30'`
- Each ArgoCD instance uses a common helm http repository
- index.yaml file size is 40MB

Based on these parameters, we can derive the following formula for maximum network load:

```
Max Network Load = (Number of Clusters) × (index.yaml size) × (Number of Parallel Requests)

Where:
- Number of Clusters = 100
- Max Applications per Cluster = 800
- index.yaml size = 40MB
- Number of Parallel Requests = min(reposerver.parallelism.limit, Max Applications per Cluster) = min(30, 800) = 30

Therefore:
Max Network Load = 100 × 40MB × 30 = 120,000MB ≈ 117GiB
```
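To re-run this estimate with other inputs, the formula can be written as a small Python helper. This is only a sketch: the function names are our own, and the conversion treats 1 GiB as 1024 MB.

```python
def max_network_load_mb(clusters: int, index_size_mb: float,
                        parallelism_limit: int, max_apps_per_cluster: int) -> float:
    """Worst-case index.yaml traffic if every parallel helm call downloads the index."""
    # Concurrent helm executions per repo-server are capped both by
    # reposerver.parallelism.limit and by the number of applications.
    parallel_requests = min(parallelism_limit, max_apps_per_cluster)
    return clusters * index_size_mb * parallel_requests


def mb_to_gib(mb: float) -> float:
    # Treats 1 GiB as 1024 MB.
    return mb / 1024


load = max_network_load_mb(clusters=100, index_size_mb=40,
                           parallelism_limit=30, max_apps_per_cluster=800)
print(load, round(mb_to_gib(load), 1))  # 120000 117.2

# If each cluster fetched the index once instead of 30 times in parallel:
print(max_network_load_mb(100, 40, 1, 800))  # 4000
```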

## 2. High RAM Usage Issue

Description:
- Using `reposerver.parallelism.limit: '30'`
- Through experimental testing, it was found that `helm pull` consumes ~900MB for a repository with an `index.yaml` file of 40MB.

We encountered a performance issue with the repo-server when the Helm (HTTP) repository is too large.
Helm must download the index to run `helm pull`: with a 40MB `index.yaml`, a single `helm pull` consumes ~900MB.

There are related tickets in Helm:
- https://github.com/helm/helm/issues/9931
- https://github.com/helm/helm/issues/10542

They mention that Helm uses less memory when the index is stored in JSON format. Either way, a large enough Helm repository puts a significant load on the repo-server.

Based on these parameters, we can derive a formula for the load on argocd-repo-server purely from helm execution:

```
Max RAM Usage Repo Server = (Helm RAM Usage) × (Number of Parallel Requests) + (argocd-repo-server RAM usage)

Where:
 - Number of Parallel Requests = 30
 - Helm RAM Usage = 900MB
 - argocd-repo-server RAM usage = ~500MB

Therefore:
Max RAM Usage Repo Server = 900MB × 30 + 500MB = 27,500MB ≈ 27GiB
```
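The RAM formula can be sketched the same way, and inverting it shows which `reposerver.parallelism.limit` fits a given memory limit. The helper names and the 8 GiB budget in the usage line are hypothetical; the per-process figures are the ones from the estimate above.

```python
def max_repo_server_ram_mb(parallel_requests: int, helm_ram_mb: float,
                           base_ram_mb: float) -> float:
    # Each concurrent helm process is assumed to use helm_ram_mb,
    # on top of the repo-server's own baseline usage.
    return helm_ram_mb * parallel_requests + base_ram_mb


def max_parallelism_for_budget(budget_mb: float, helm_ram_mb: float,
                               base_ram_mb: float) -> int:
    # Inverse of the formula: the largest parallelism limit whose worst
    # case still fits in budget_mb (0 if the baseline alone does not fit).
    return max(0, int((budget_mb - base_ram_mb) // helm_ram_mb))


print(max_repo_server_ram_mb(30, 900, 500))            # 27500
print(max_parallelism_for_budget(8 * 1024, 900, 500))  # 8
```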

# Motivation

![Image](https://github.com/user-attachments/assets/5158d656-dd99-47dc-95ad-c4aac23f3010)
![Image](https://github.com/user-attachments/assets/87cb7c16-6201-4c07-9b62-b090be714ec9)

Settings in `argocd-cmd-params-cm`:
```yaml
reposerver.parallelism.limit: '30'
reposerver.exec.timeout: '180s'
controller.operation.processors: '10'
controller.status.processors: '20'
```

Number of replicas: in this case, it doesn't matter since we're looking at the maximum load per pod.

Example of helm pull execution for a chart:

```
time="2025-05-22T05:21:48Z" level=info msg=Trace args="[helm pull --destination /tmp/67aa0bad-e931-4c59-8b58-715bfdeac363 --version 0.1.0 --repo https://<registry>/helm.charts my-app]" dir= operation_name="exec helm" time_ms=90285.762594
```

From the logs, we can see that the operation takes 90s, which is quite long for a helm pull.
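To find such slow calls across the whole repo-server log rather than one line at a time, a small filter over the `Trace` entries is enough. A sketch in Python, assuming the `key=value` layout of the line above stays stable; the regular expressions and the 10 s default threshold are our own choices.

```python
import re

# Field patterns for repo-server "Trace" lines like the one above.
ARGS_RE = re.compile(r'args="\[(?P<args>[^\]]*)\]"')
TIME_RE = re.compile(r'\btime_ms=(?P<ms>[0-9.]+)')


def slow_helm_calls(lines, threshold_s=10.0):
    """Yield (seconds, command) for 'exec helm' traces slower than threshold_s."""
    for line in lines:
        if 'operation_name="exec helm"' not in line:
            continue
        args = ARGS_RE.search(line)
        took = TIME_RE.search(line)
        if args is None or took is None:
            continue
        seconds = float(took.group("ms")) / 1000
        if seconds >= threshold_s:
            yield round(seconds, 1), args.group("args")
```

Run over a log that contains the `helm pull` line above, it returns a single entry with a duration of `90.3` seconds.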

# Proposal

## 1. Direct Helm Pull

We propose adding an option to run `helm pull` against the chart's direct URL, bypassing `index.yaml`. It could be a parameter in the `argocd-cmd-params-cm` ConfigMap, named, for example, `reposerver.helm.pull.direct`.

`helm pull --destination /tmp/67aa0bad-e931-4c59-8b58-715bfdeac363 --version 0.1.0 --repo https://<registry>/helm.charts my-app`

->

`helm pull --destination /tmp/67aa0bad-e931-4c59-8b58-715bfdeac363 https://<registry>/helm.charts/my-app-0.1.0.tgz`

Downloading this way takes only ~1s and consumes almost no resources. Meanwhile, `helm dependency build` and `helm repo update` continue to work normally when needed.
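A sketch of how the proposed switch could choose between the two command forms, written in Python for brevity (ArgoCD itself is written in Go). It assumes the tarball is published at `<repo>/<chart>-<version>.tgz`, the file name `helm package` produces; since `index.yaml` may list other URLs for a chart, a real implementation would need a fallback to the `--repo` form. `helm_pull_args` and its `direct` flag (standing in for the proposed `reposerver.helm.pull.direct`) are hypothetical.

```python
def helm_pull_args(repo_url: str, chart: str, version: str,
                   destination: str, direct: bool) -> list[str]:
    """Build the argv for helm pull, either via index.yaml or via a direct URL."""
    if not direct:
        # Current form: helm fetches <repo>/index.yaml to resolve
        # chart + version to a tarball URL.
        return ["helm", "pull", "--destination", destination,
                "--version", version, "--repo", repo_url, chart]
    # Direct form: assumes <repo>/<chart>-<version>.tgz exists, so the
    # index is never downloaded or parsed.
    tarball = f"{repo_url.rstrip('/')}/{chart}-{version}.tgz"
    return ["helm", "pull", "--destination", destination, tarball]


args = helm_pull_args("https://<registry>/helm.charts", "my-app", "0.1.0",
                      "/tmp/67aa0bad-e931-4c59-8b58-715bfdeac363", direct=True)
print(" ".join(args))
```

The printed command matches the direct form shown above, and the list could be passed unchanged to `subprocess.run(args, check=True)`.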

### Conclusions
- This will reduce network load, because Helm will no longer download the `index.yaml` file
- This will reduce RAM usage, because Helm will no longer load and search `index.yaml` in memory

## 2. Using Local Cache

### Local Cache Within Container

argocd-repo-server already uses a Helm cache shared between containers. This comes with another problem:
* https://github.com/helm/helm/issues/30983
* https://github.com/argoproj/argo-cd/issues/12902

This mode assumes a separate thread that runs `helm repo update` every n seconds, while `helm dependency build` is executed with the `--skip-refresh` flag.
This would reduce the load on argocd-repo-server many times over, but it makes the Helm concurrency problem more pronounced.
That can be solved by having every `helm dependency build --skip-refresh` wait on a lock while `helm repo update` is running.

We suggest adding a parameter to enable this mode in the `argocd-cmd-params-cm` ConfigMap, named, for example, `reposerver.helm.background.sync.index`.
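A minimal sketch of this mode, written in Python for readability and assuming all helm calls are issued from a single process: a readers-writer lock lets many `helm dependency build --skip-refresh` calls share the cache, while the periodic `helm repo update` gets exclusive access. `RWLock`, `HelmIndexSync` and the interval are hypothetical names, not ArgoCD code; a Go implementation would more likely use `sync.RWMutex`, and a cache shared across containers would need a file lock instead.

```python
import subprocess
import threading


class RWLock:
    """Many readers (dependency builds) or a single writer (repo update)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self._writers_waiting = 0

    def acquire_read(self):
        with self._cond:
            # A pending refresh goes first, so readers cannot starve it.
            while self._writer or self._writers_waiting:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            self._writers_waiting += 1
            while self._writer or self._readers:
                self._cond.wait()
            self._writers_waiting -= 1
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class HelmIndexSync:
    """Refreshes the shared Helm cache in the background (hypothetical)."""

    def __init__(self, interval_s: float, run=subprocess.run):
        self._lock = RWLock()
        self._interval_s = interval_s
        self._run = run  # injectable so the sketch can be tested without helm
        self._stop = threading.Event()

    def refresh_once(self):
        self._lock.acquire_write()
        try:
            self._run(["helm", "repo", "update"], check=True)
        finally:
            self._lock.release_write()

    def dependency_build(self, chart_dir: str):
        self._lock.acquire_read()
        try:
            self._run(["helm", "dependency", "build", "--skip-refresh", chart_dir],
                      check=True)
        finally:
            self._lock.release_read()

    def _loop(self):
        # Event.wait returns True once stop() is called, ending the loop.
        while not self._stop.wait(self._interval_s):
            try:
                self.refresh_once()
            except subprocess.CalledProcessError:
                pass  # keep serving the previous index until the next tick

    def start(self):
        threading.Thread(target=self._loop, daemon=True).start()

    def stop(self):
        self._stop.set()
```

Writer preference is a deliberate choice here: without it, a steady stream of dependency builds could postpone the index refresh indefinitely.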

Risks:
- Helm concurrency problems: many applications may try to use the cache while another process is updating it.

### Local cache using local helm repository

For example, use [chartmuseum](https://github.com/helm/chartmuseum) as a proxy server that absorbs the load from the main repository and serves a minified `index.yaml` to argocd-repo-server.
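How the index would be minified is left open here. One possible approach, sketched below on an `index.yaml` already parsed into a dict (for example with a YAML loader), keeps only the charts a cluster actually uses and the newest few versions of each. `minify_index`, `keep_latest` and `charts` are hypothetical; trimming old versions can also break wildcard dependency ranges that only old versions satisfy.

```python
def minify_index(index: dict, keep_latest: int = 5, charts=None) -> dict:
    """Return a copy of a parsed index.yaml with fewer entries.

    Assumes each chart's version list is ordered newest first, which is
    the order `helm repo index` writes; other generators may differ.
    """
    entries = {}
    for name, versions in index.get("entries", {}).items():
        if charts is not None and name not in charts:
            continue  # chart not used by any Application in this cluster
        entries[name] = versions[:keep_latest]
    minified = dict(index)  # keep apiVersion, generated, etc.
    minified["entries"] = entries
    return minified


index = {"apiVersion": "v1", "entries": {
    "my-app": [{"version": "0.3.0"}, {"version": "0.2.0"}, {"version": "0.1.0"}],
    "unused": [{"version": "1.0.0"}],
}}
print(minify_index(index, keep_latest=2, charts={"my-app"})["entries"])
# {'my-app': [{'version': '0.3.0'}, {'version': '0.2.0'}]}
```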


### Conclusions
- This will greatly reduce network load, because `index.yaml` will be downloaded once per cluster instead of by every parallel helm execution in ArgoCD
- This will reduce RAM usage, because the served `index.yaml` file will be much smaller
