Skip to content

[SVLS-8757] Improve cloud environment detection logic#39

Open
kathiehuang wants to merge 4 commits intomainfrom
kathie.huang/update-cloud-env-detection-logic
Open

[SVLS-8757] Improve cloud environment detection logic#39
kathiehuang wants to merge 4 commits intomainfrom
kathie.huang/update-cloud-env-detection-logic

Conversation

@kathiehuang
Copy link
Copy Markdown
Contributor

@kathiehuang kathiehuang commented Apr 17, 2026

What does this PR do?

  • Updates cloud environment detection logic
  • Adds coverage for Google Cloud Functions with env vars FUNCTION_TARGET and K_SERVICE
  • Return None and error when multiple conflicting environments are detected instead of silently returning the first match

Motivation

https://datadoghq.atlassian.net/browse/SVLS-8757

Follow-up for DataDog/libdatadog#1857

I deployed a Google Cloud Function Gen 1 and saw it had the env vars FUNCTION_TARGET and K_SERVICE interestingly. Java Google Cloud Function Gen 1 doesn't work when instrumented with serverless compat because of this. This adds coverage for these two env vars.

  • We expect these env vars in Gen 2 functions, not Gen 1, but I contacted Google and they said:

Your function is still a 1st Gen function and has not been automatically converted to a 2nd Gen function. It retains all the capabilities of the 1st Gen runtime, and that is why it correctly appears under the 1st Gen page in the Google Cloud console.

Let me clarify the distinction between the function itself and the environment it runs in. As part of a broad infrastructure upgrade, Google is migrating the underlying environment for Cloud Functions to run on the same modern infrastructure as Cloud Run.

So, while your function's code and capabilities remain 1st Gen, the environment it executes in has been updated. This new environment uses a different set of automatically populated environment variables (e.g., K_SERVICE instead of FUNCTION_NAME). This is why you are seeing the new variables, even though your function is still 1st Gen.

As a result, we should ensure that, whenever we're detecting a gen 1 function, we should also cover the "new environment's" environment variables - the gen 2 environment variables.

Additional Notes

Describe how to test/QA your changes

Deployed a Java Azure Function and confirmed the agent was working with traces.

Also deployed a Google Cloud Function Gen 1:

Without this change, deploying a Google Cloud Function Gen 1:
[main] ERROR com.datadog.ServerlessCompatAgent - UNKNOWN environment detected, will not start the Datadog Serverless Compatibility Layer

With the change:

DEFAULT 2026-04-20T17:56:13.438670Z [resource.labels.functionName: kathie-java-gen1-fix] [main] DEBUG com.datadog.ServerlessCompatAgent - Environment detected: GOOGLE_CLOUD_RUN_FUNCTION_2ND_GEN

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves ServerlessCompatAgent cloud environment detection to recognize additional Google Cloud Functions signals and to avoid silently selecting a first match when multiple platforms are detected.

Changes:

  • Adds detection for Google Cloud Functions via K_SERVICE + FUNCTION_TARGET and introduces a new GOOGLE_CLOUD_RUN_FUNCTION_2ND_GEN environment.
  • Refactors environment detection to collect all matches and return UNKNOWN when none or multiple environments are detected (with an error log on multiple).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 55 to 66
public static CloudEnvironment getEnvironment() {
Map<String, String> env = System.getenv();
List<CloudEnvironment> detected = new ArrayList<>();

if (env.get("FUNCTIONS_EXTENSION_VERSION") != null &&
env.get("FUNCTIONS_WORKER_RUNTIME") != null) {
return CloudEnvironment.AZURE_FUNCTION;
if (env.get("FUNCTIONS_EXTENSION_VERSION") != null
&& env.get("FUNCTIONS_WORKER_RUNTIME") != null) {
detected.add(CloudEnvironment.AZURE_FUNCTION);
}

if (env.get("ASCSVCRT_SPRING__APPLICATION__NAME") != null) {
return CloudEnvironment.AZURE_SPRING_APP;
detected.add(CloudEnvironment.AZURE_SPRING_APP);
}
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description mentions "error when a platform is detected but its name variable is missing", but this change only adds multi-detection handling and a new Google env signature; there’s no new validation/error for a detected platform missing its required name variable. Either implement that validation in getEnvironment()/premain() (with a clear error message) or update the PR description to match the actual behavior.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR description

Comment on lines 55 to +61
public static CloudEnvironment getEnvironment() {
Map<String, String> env = System.getenv();
List<CloudEnvironment> detected = new ArrayList<>();

if (env.get("FUNCTIONS_EXTENSION_VERSION") != null &&
env.get("FUNCTIONS_WORKER_RUNTIME") != null) {
return CloudEnvironment.AZURE_FUNCTION;
if (env.get("FUNCTIONS_EXTENSION_VERSION") != null
&& env.get("FUNCTIONS_WORKER_RUNTIME") != null) {
detected.add(CloudEnvironment.AZURE_FUNCTION);
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getEnvironment() behavior changed significantly (multi-detection handling + new Google signature), but there are no unit tests covering these branches. Since the repo already has JUnit tests for ServerlessCompatAgent, consider extracting the detection logic into a package-private helper that accepts an env Map<String,String> so tests can cover each detection path and the "multiple environments" case.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added unit tests in c359922

return CloudEnvironment.UNKNOWN;
}
if (detected.size() > 1) {
log.error("Multiple cloud environments detected: {}. Returning UNKNOWN.", detected);
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When multiple environments are detected, getEnvironment() logs an error and returns UNKNOWN, and then premain() will also log an UNKNOWN environment detected error. This results in duplicate error logs for the same condition and can be noisy/confusing in production. Consider logging the error in one place (e.g., let premain() handle it, or return a distinct status that preserves the reason).

Suggested change
log.error("Multiple cloud environments detected: {}. Returning UNKNOWN.", detected);

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the "Returning UNKNOWN" section of the multiple environments detected log in c359922

Comment on lines +69 to +72
// Set by Google Cloud Functions for newer runtimes
detected.add(CloudEnvironment.GOOGLE_CLOUD_RUN_FUNCTION_2ND_GEN);
} else if (env.get("FUNCTION_NAME") != null && env.get("GCP_PROJECT") != null) {
// Set by Google Cloud Functions for older runtimes
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic/comments treat K_SERVICE + FUNCTION_TARGET as indicating a "newer runtime" / GOOGLE_CLOUD_RUN_FUNCTION_2ND_GEN, but the PR description notes these env vars can also appear on Cloud Functions Gen 1. This makes the enum value/comment misleading and may confuse future behavior keyed off the detected environment. Consider renaming the enum value to reflect what’s actually detected (e.g., a more general Cloud Functions/Cloud Run signature) and/or adjust the comment to avoid claiming it’s specific to newer runtimes.

Suggested change
// Set by Google Cloud Functions for newer runtimes
detected.add(CloudEnvironment.GOOGLE_CLOUD_RUN_FUNCTION_2ND_GEN);
} else if (env.get("FUNCTION_NAME") != null && env.get("GCP_PROJECT") != null) {
// Set by Google Cloud Functions for older runtimes
// This Cloud Functions/Cloud Run-style signature is not exclusive to 2nd gen.
detected.add(CloudEnvironment.GOOGLE_CLOUD_RUN_FUNCTION_2ND_GEN);
} else if (env.get("FUNCTION_NAME") != null && env.get("GCP_PROJECT") != null) {
// Legacy Google Cloud Functions markers used as a fallback when the above signature is absent.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reaching out to Google to confirm why these environment variables were injected for a Gen 1 function!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They said:

Your function is still a 1st Gen function and has not been automatically converted to a 2nd Gen function. It retains all the capabilities of the 1st Gen runtime, and that is why it correctly appears under the 1st Gen page in the Google Cloud console.

Let me clarify the distinction between the function itself and the environment it runs in. As part of a broad infrastructure upgrade, Google is migrating the underlying environment for Cloud Functions to run on the same modern infrastructure as Cloud Run.

So, while your function's code and capabilities remain 1st Gen, the environment it executes in has been updated. This new environment uses a different set of automatically populated environment variables (e.g., K_SERVICE instead of FUNCTION_NAME). This is why you are seeing the new variables, even though your function is still 1st Gen.

@kathiehuang kathiehuang marked this pull request as ready for review April 20, 2026 19:49
@kathiehuang kathiehuang requested a review from a team as a code owner April 20, 2026 19:49
@kathiehuang kathiehuang requested review from Lewis-E and removed request for a team April 20, 2026 19:49
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c359922cb2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +71 to +73
if (env.get("K_SERVICE") != null && env.get("FUNCTION_TARGET") != null) {
// Set by Google Cloud Functions for newer runtimes
detected.add(CloudEnvironment.GOOGLE_CLOUD_RUN_FUNCTION_2ND_GEN);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid treating the new GCP signal as Gen2 support

This new branch marks K_SERVICE + FUNCTION_TARGET as GOOGLE_CLOUD_RUN_FUNCTION_2ND_GEN, so a real Cloud Run Functions 2nd gen deployment with this Java agent now proceeds through premain and starts the embedded compatibility process instead of staying UNKNOWN. The package is still documented as supporting Google Cloud Run Functions 1st gen only (README.md:3), so if the goal is to cover 1st-gen runtimes that now expose these vars, this path should be classified or guarded as that supported case rather than enabling gen2 here.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other compatibility layers also allow functions with K_SERVICE and FUNCTION_TARGET to proceed:

Node.js
Go
Python

if (env.get("FUNCTION_NAME") != null &&
env.get("GCP_PROJECT") != null) {
return CloudEnvironment.GOOGLE_CLOUD_RUN_FUNCTION_1ST_GEN;
if (env.get("K_SERVICE") != null && env.get("FUNCTION_TARGET") != null) {
Copy link
Copy Markdown

@Lewis-E Lewis-E Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the compared python logic checks 1st gen and returns 1st gen if all four gen1 + gen2 env vars are present due to order of if/else. Would it make more sense to follow that behavior? Python could be detecting multiple environments as well, it just isn't doing anything about it.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, does compat-go also expect all four variables or do you think this line is a typo? https://github.com/DataDog/datadog-serverless-compat-go/blob/main/datadogserverlesscompat/datadog_serverless_compat.go#L56

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm true, it would be better to return 1st gen if all four env vars are present. I'm not sure if that ever happens, but if it does, it would be less confusing if the customer looks at the debug log. I'll update the order to check 1st gen first! 7e122c5

To your second question, it does seem like a typo (unless someone tested and saw those env vars in a 1st gen function but the observation wasn't documented), but the outcome would be the same in all of these - as long as the environment isn't UNKNOWN, the serverless compatibility layer will start. The only weird thing is the debug/info log that prints the detected environment, but we can't determine the generation of the cloud function given the env vars now (see Google's explanation above) 🫠

return CloudEnvironment.UNKNOWN;
}
if (detected.size() > 1) {
log.error("Multiple cloud environments detected: {}", detected);
Copy link
Copy Markdown

@Lewis-E Lewis-E Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it'd make sense to track gen2/unknown uses of the compat layer, similarly to the telemetry we are thinking about adding for serverless-init? https://datadoghq.atlassian.net/browse/SVLS-8970 (possibility is a separate question)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be interesting if we are able to track that! I can bring it up in the next standup or ask Piyali

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants