[ROB-1923] amp irsa support#1907
Conversation
WalkthroughDocumentation switches AWS AMP guidance to IRSA-first with an Access Keys alternative; code defaults AWS env vars, gates AWSPrometheusConfig on AWS_REGION, avoids emitting AWS credential secrets unless both keys exist, bumps prometrix to 0.2.2, and updates a Helm dependency for holmes. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Env as Env Vars
participant Utils as Prometheus Utils
participant Config as AWSPrometheusConfig
participant Client as Prometheus Client
Env->>Utils: Read AWS_* env vars (ACCESS_KEY=None, SECRET=None, SERVICE_NAME="aps", REGION=?)
alt AWS_REGION is set (new gating)
Utils->>Config: Instantiate with (access_key, secret_access_key, service_name, aws_region)
Config-->>Client: Return AWS AMP config
else AWS_REGION not set
Utils-->>Client: Fall back to other providers/paths
end
note over Utils,Config: Gating switched to AWS_REGION instead of AWS_ACCESS_KEY
sequenceDiagram
autonumber
participant KRR as KRR Secret Generator
participant Cfg as AWSPrometheusConfig
participant Secrets as K8s Secrets Spec
KRR->>Cfg: Retrieve access_key, secret_access_key
alt Both keys present
KRR->>Secrets: Append `AWS_KEY` and `AWS_SECRET` entries
else Missing either key (e.g., IRSA)
KRR-->>Secrets: Do not append AWS credential entries
end
note over KRR,Secrets: Prevents invalid secret values when using IRSA
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
src/robusta/integrations/prometheus/utils.py (2)
60-67: AWS provider selection may trigger incorrectly; prefer AMP-URL-aware gatingSwitching the gate from “have AWS keys” to “have AWS_REGION” risks mis-detecting AWS AMP when:
- Running on AWS with AWS_REGION set but using non-AMP Prometheus
- Using other providers (e.g., Coralogix or Azure) alongside AWS_REGION in the environment
Because this branch executes before Coralogix/Azure checks, it can shadow those providers. Please gate AWSPrometheusConfig on the URL being an AMP endpoint (aps-workspaces..amazonaws.com) or on explicit AWS credentials, and only then return.
Apply this focused diff to avoid false positives while preserving IRSA support:
@@ - # aws config - if AWS_REGION: - return AWSPrometheusConfig( - access_key=AWS_ACCESS_KEY, - secret_access_key=AWS_SECRET_ACCESS_KEY, - service_name=AWS_SERVICE_NAME, - aws_region=AWS_REGION, - **baseconfig, - ) + # aws config (only when URL looks like AMP or explicit keys provided) + from urllib.parse import urlparse + host = urlparse(url).netloc if url else "" + is_amp_url = "aps-workspaces." in host and host.endswith(".amazonaws.com") + if (AWS_REGION and is_amp_url) or (AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY): + return AWSPrometheusConfig( + access_key=AWS_ACCESS_KEY, + secret_access_key=AWS_SECRET_ACCESS_KEY, + service_name=AWS_SERVICE_NAME, + aws_region=AWS_REGION, + **baseconfig, + )Optionally, consider moving the explicit-provider branches (Coralogix/Azure) above the AWS branch to prefer explicit signals over environment heuristics.
43-45: Potential bug: VictoriaMetrics detection always flips to TrueThis line makes VictoriaMetrics mode True regardless of the original value because any bool is “not None”. This can lead to returning a VictoriaMetricsPrometheusConfig unexpectedly if later branches don’t short-circuit.
Proposed fix:
- url = PrometheusDiscovery.find_vm_url() - is_victoria_metrics = is_victoria_metrics is not None + vm_url = PrometheusDiscovery.find_vm_url() + if vm_url: + url = vm_url + is_victoria_metrics = Trueplaybooks/robusta_playbooks/krr.py (2)
5-5: Likely bug: importing pickle.NONE and using it as a sentinel
from pickle import NONEimports the pickle opcode constant (string "N"), not Python’sNone. This can mask errors and leak into user output.Apply this diff and use
Noneexplicitly:-from pickle import NONEAnd at the call site (see Line 448):
- _publish_krr_finding(..., config_json=NONE) + _publish_krr_finding(..., config_json=None)
405-405: Default value is a type, not NoneThe function parameter sets
config_json = Optional[str]as the default, which assigns the typing object instead ofNone. This will break consumers relying on a falsy default.Fix the signature:
-def _publish_krr_finding(event: ExecutionBaseEvent, krr_json: Dict[str, Any],scan_id: str, start_time: str, metadata: Dict[str, Any], timeout: Optional[str] = None, config_json = Optional[str]): +def _publish_krr_finding( + event: ExecutionBaseEvent, + krr_json: Dict[str, Any], + scan_id: str, + start_time: str, + metadata: Dict[str, Any], + timeout: Optional[str] = None, + config_json: Optional[str] = None, +):
🧹 Nitpick comments (2)
playbooks/robusta_playbooks/krr.py (1)
279-295: Consider optional AWS session token (STS) supportIf users supply temporary credentials, you’ll also need AWS_SESSION_TOKEN. Suggest adding it when present to avoid auth failures.
Proposed minimal change:
@@ - if isinstance(prom_config, AWSPrometheusConfig): - if prom_config.access_key and prom_config.secret_access_key: + if isinstance(prom_config, AWSPrometheusConfig): + if prom_config.access_key and prom_config.secret_access_key: krr_secrets.extend( [ KRRSecret( env_var_name="AWS_KEY", secret_key="aws-key", secret_value=prom_config.access_key, command_flag="--eks-access-key", ), KRRSecret( env_var_name="AWS_SECRET", secret_key="aws-secret", secret_value=prom_config.secret_access_key, command_flag="--eks-secret-key", ), ] ) + session_token = getattr(prom_config, "session_token", None) + if session_token: + krr_secrets.append( + KRRSecret( + env_var_name="AWS_SESSION_TOKEN", + secret_key="aws-session-token", + secret_value=session_token, + command_flag="--eks-session-token", + ) + )docs/configuration/metric-providers-aws.rst (1)
100-101: Call out when to set AWS_REGION to avoid accidental provider selectionGiven the code path now considers AWS settings when AWS_REGION is present, please explicitly note that AWS_REGION should be set only when using AMP (IRSA or Access Keys).
Proposed note:
3. :ref:`Update Robusta <Simple Upgrade>` +.. note:: + Set ``AWS_REGION`` only when using Amazon Managed Prometheus. If you use another Prometheus provider, + omit ``AWS_REGION`` to prevent accidental AWS provider selection.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
poetry.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
docs/configuration/metric-providers-aws.rst(3 hunks)playbooks/robusta_playbooks/krr.py(1 hunks)pyproject.toml(1 hunks)src/robusta/integrations/prometheus/utils.py(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Deploy docs
- GitHub Check: Deploy docs
🔇 Additional comments (4)
pyproject.toml (1)
62-62: Pre-release dependency: verify export pin & runtime compatibility
poetry.lockcorrectly pins prometrix to 0.2.2a2.Please run in your local/CI environment to ensure the exported requirements also lock to 0.2.2a2:
poetry export -f requirements.txt --without-hashes \ | grep -E '^prometrix==0\.2\.2a2$'Confirm runtime compatibility with
httpx==0.27.2and your supported Python versions.Plan to upgrade to the next stable prometrix release ASAP to minimize supply-chain volatility.
src/robusta/integrations/prometheus/utils.py (1)
28-31: Sane defaults for AWS envsDefaulting AWS access keys to None and AWS service name to "aps" is correct and aligns with IRSA-first usage.
playbooks/robusta_playbooks/krr.py (1)
279-295: Good: avoid emitting partial AWS secretsConditioning AWS_KEY/AWS_SECRET emission on both values prevents malformed secrets under IRSA. Nice.
docs/configuration/metric-providers-aws.rst (1)
176-181: Nice: required envs match code defaults and expectationsThis aligns with the new defaults (SERVICE_NAME "aps") and AWS region. Good call-out on PROMETHEUS_SSL_ENABLED.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
docs/configuration/metric-providers-aws.rst (1)
14-18: Clarify prerequisites to avoid implying Access Keys are required with IRSABullet 3 still states AWS access credentials are required. That’s misleading for IRSA where no keys are needed. Reword to scope credentials only to the Access Keys path and keep IRSA as keyless.
Apply this diff:
2. **For IRSA**: Your EKS cluster must have an OIDC identity provider configured. See: `Associate IAM OIDC Provider <https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html>`_. -3. **AWS access credentials (Access Key and Secret Key)** - With sufficient permissions to query AMP (for example, the ``AmazonPrometheusQueryAccess`` policy). +3. **For Access Keys path only**: AWS access credentials (Access Key and Secret Key) + Ensure sufficient permissions to query AMP (for example, the ``AmazonPrometheusQueryAccess`` policy). + With IRSA, no access keys are required.
🧹 Nitpick comments (5)
docs/configuration/metric-providers-aws.rst (5)
42-58: Harden the IRSA trust policy: include the OIDC audience conditionAWS recommends constraining on the OIDC audience "sts.amazonaws.com". Add it alongside the subject condition.
"Condition": { - "StringEquals": { - "oidc.eks.<REGION>.amazonaws.com/id/<OIDC_ID>:sub": "system:serviceaccount:<NAMESPACE>:<SERVICE_ACCOUNT_NAME>" - } + "StringEquals": { + "oidc.eks.<REGION>.amazonaws.com/id/<OIDC_ID>:sub": "system:serviceaccount:<NAMESPACE>:<SERVICE_ACCOUNT_NAME>", + "oidc.eks.<REGION>.amazonaws.com/id/<OIDC_ID>:aud": "sts.amazonaws.com" + } }Optional: if you need multiple service accounts, use either multiple statements or a StringLike pattern (e.g., system:serviceaccount::robusta-*) instead of a single exact match.
128-131: Storing Access Keys in a Secret is correctGood use of a dedicated Secret for credentials. Consider also labeling/annotating the Secret to restrict and track usage, but that’s optional.
175-180: Mark AWS_SERVICE_NAME as optional (defaults to "aps")Code changes in this PR default AWS_SERVICE_NAME to "aps". Reflect that here to prevent unnecessary overrides.
- - ``AWS_SERVICE_NAME``: Always ``"aps"`` for Amazon Prometheus Service + - ``AWS_SERVICE_NAME``: ``"aps"`` for Amazon Prometheus Service (defaults to ``"aps"``; override only if required for a non-standard partition)
186-191: Call out mutual exclusivity of IRSA vs Access KeysHelp users avoid setting both methods simultaneously, which can cause confusing auth behavior.
- AlertManager URL is not needed (AWS handles alerting separately) - Ensure your IAM role or AWS credentials have permissions to query the AMP workspace - The workspace must be in the same region specified in AWS_REGION + - Use a single authentication method: IRSA or Access Keys, not both. When using IRSA, do not set ``AWS_ACCESS_KEY``/``AWS_SECRET_ACCESS_KEY``.
4-4: Consistency: use one product name throughoutThe doc mixes “Amazon Managed Prometheus (AMP)” and “Amazon Managed Service for Prometheus”. Choose one and stick with it for clarity (the short form “Amazon Managed Prometheus (AMP)” is fine after first mention).
Also applies to: 165-165
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
docs/configuration/metric-providers-aws.rst(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy docs
🔇 Additional comments (4)
docs/configuration/metric-providers-aws.rst (4)
20-25: Good call making IRSA the primary pathClear recommendation and rationale. This should reduce key management and improve security posture.
86-98: Configuration block looks correct for AMPprometheus_url format, check_prometheus_flags=false, and the three env vars are consistent with AMP requirements.
134-158: Access Keys env-var wiring is soundUsing valueFrom/secretKeyRef is the right approach. Ensure this section is not used together with IRSA (mutually exclusive).
63-68: The script above will confirm how theholmesserviceAccount is defined in both the chart’svalues.yamland its templates, ensuring that the documentation snippet aligns with the actual key structure and naming. Once we have its output, we can finalize the review comment.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
docs/configuration/metric-providers-aws.rst (1)
17-18: Clarify that access keys are only needed for the Access Keys path (not IRSA).As written, the prerequisite implies access keys are always required. This conflicts with the IRSA guidance above and can mislead users.
Apply this wording update:
-3. **AWS access credentials (Access Key and Secret Key)** - With sufficient permissions to query AMP (for example, the ``AmazonPrometheusQueryAccess`` policy). +3. **For Access Keys path only**: AWS access credentials (Access Key and Secret Key) + Ensure sufficient permissions to query AMP (for example, the ``AmazonPrometheusQueryAccess`` policy).
🧹 Nitpick comments (4)
docs/configuration/metric-providers-aws.rst (4)
101-147: Make explicit that both access key vars must be present, and note naming differs from AWS SDK defaults.The code path only uses access keys if both are set; documenting this prevents partial misconfigurations. Also, the env var names (AWS_ACCESS_KEY, AWS_SECRET_ACCESS_KEY) are Robusta-specific and differ from AWS SDK defaults (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY), which avoids accidental provider-side pickup but may confuse users.
Add a short note:
3. **Configure Robusta** - add to ``generated_values.yaml``: @@ - name: AWS_ACCESS_KEY @@ - name: AWS_SECRET_ACCESS_KEY @@ - name: AWS_SERVICE_NAME value: "aps" - name: AWS_REGION value: "us-east-1" # Your workspace region + + .. note:: + Both ``AWS_ACCESS_KEY`` and ``AWS_SECRET_ACCESS_KEY`` must be provided; if only one is set, Robusta will ignore access-key auth. + These variable names are specific to Robusta and intentionally differ from the AWS SDK default ``AWS_ACCESS_KEY_ID``.
166-169: Document Access Keys-only env vars in “Required Environment Variables”.Currently this list is global. To prevent IRSA users from adding unused secrets, split the list into “Always required” and “Access Keys only”.
-**Required Environment Variables**: +**Required Environment Variables**: @@ - ``PROMETHEUS_SSL_ENABLED``: Always ``"true"`` for AMP - ``AWS_SERVICE_NAME``: Always ``"aps"`` for Amazon Prometheus Service - ``AWS_REGION``: The AWS region where your workspace is located + +**Access Keys path only**: + +- ``AWS_ACCESS_KEY`` and ``AWS_SECRET_ACCESS_KEY``: Only if you choose the Access Keys authentication alternative.
66-68: Clarify when the Holmes service account is needed.If Holmes is optional in some installs, add “if deployed” to reduce confusion for users who do not run Holmes.
- - ``robusta-holmes-service-account`` + - ``robusta-holmes-service-account`` (if Holmes is deployed)
173-179: Strengthen Important Notes with an IRSA precedence hint.Call out that IRSA removes the need to mount access keys and that keys should not be set alongside IRSA to avoid confusion.
- AlertManager URL is not needed (AWS handles alerting separately) - Ensure your IAM role or AWS credentials have permissions to query the AMP workspace - The workspace must be in the same region specified in AWS_REGION + - When using IRSA, do not set access key environment variables; IRSA-provided credentials are sufficient and preferred.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
docs/configuration/metric-providers-aws.rst(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy docs
arikalon1
left a comment
There was a problem hiding this comment.
nice work
I think we need a new krr image for this to work, right?
No description provided.