[SVLS-8757] Improve cloud environment detection logic#39
[SVLS-8757] Improve cloud environment detection logic#39kathiehuang wants to merge 4 commits intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Improves ServerlessCompatAgent cloud environment detection to recognize additional Google Cloud Functions signals and to avoid silently selecting a first match when multiple platforms are detected.
Changes:
- Adds detection for Google Cloud Functions via
K_SERVICE+FUNCTION_TARGETand introduces a newGOOGLE_CLOUD_RUN_FUNCTION_2ND_GENenvironment. - Refactors environment detection to collect all matches and return
UNKNOWNwhen none or multiple environments are detected (with an error log on multiple).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| public static CloudEnvironment getEnvironment() { | ||
| Map<String, String> env = System.getenv(); | ||
| List<CloudEnvironment> detected = new ArrayList<>(); | ||
|
|
||
| if (env.get("FUNCTIONS_EXTENSION_VERSION") != null && | ||
| env.get("FUNCTIONS_WORKER_RUNTIME") != null) { | ||
| return CloudEnvironment.AZURE_FUNCTION; | ||
| if (env.get("FUNCTIONS_EXTENSION_VERSION") != null | ||
| && env.get("FUNCTIONS_WORKER_RUNTIME") != null) { | ||
| detected.add(CloudEnvironment.AZURE_FUNCTION); | ||
| } | ||
|
|
||
| if (env.get("ASCSVCRT_SPRING__APPLICATION__NAME") != null) { | ||
| return CloudEnvironment.AZURE_SPRING_APP; | ||
| detected.add(CloudEnvironment.AZURE_SPRING_APP); | ||
| } |
There was a problem hiding this comment.
The PR description mentions "error when a platform is detected but its name variable is missing", but this change only adds multi-detection handling and a new Google env signature; there’s no new validation/error for a detected platform missing its required name variable. Either implement that validation in getEnvironment()/premain() (with a clear error message) or update the PR description to match the actual behavior.
There was a problem hiding this comment.
Updated the PR description
| public static CloudEnvironment getEnvironment() { | ||
| Map<String, String> env = System.getenv(); | ||
| List<CloudEnvironment> detected = new ArrayList<>(); | ||
|
|
||
| if (env.get("FUNCTIONS_EXTENSION_VERSION") != null && | ||
| env.get("FUNCTIONS_WORKER_RUNTIME") != null) { | ||
| return CloudEnvironment.AZURE_FUNCTION; | ||
| if (env.get("FUNCTIONS_EXTENSION_VERSION") != null | ||
| && env.get("FUNCTIONS_WORKER_RUNTIME") != null) { | ||
| detected.add(CloudEnvironment.AZURE_FUNCTION); |
There was a problem hiding this comment.
getEnvironment() behavior changed significantly (multi-detection handling + new Google signature), but there are no unit tests covering these branches. Since the repo already has JUnit tests for ServerlessCompatAgent, consider extracting the detection logic into a package-private helper that accepts an env Map<String,String> so tests can cover each detection path and the "multiple environments" case.
| return CloudEnvironment.UNKNOWN; | ||
| } | ||
| if (detected.size() > 1) { | ||
| log.error("Multiple cloud environments detected: {}. Returning UNKNOWN.", detected); |
There was a problem hiding this comment.
When multiple environments are detected, getEnvironment() logs an error and returns UNKNOWN, and then premain() will also log an UNKNOWN environment detected error. This results in duplicate error logs for the same condition and can be noisy/confusing in production. Consider logging the error in one place (e.g., let premain() handle it, or return a distinct status that preserves the reason).
| log.error("Multiple cloud environments detected: {}. Returning UNKNOWN.", detected); |
There was a problem hiding this comment.
Removed the "Returning UNKNOWN" section of the multiple environments detected log in c359922
| // Set by Google Cloud Functions for newer runtimes | ||
| detected.add(CloudEnvironment.GOOGLE_CLOUD_RUN_FUNCTION_2ND_GEN); | ||
| } else if (env.get("FUNCTION_NAME") != null && env.get("GCP_PROJECT") != null) { | ||
| // Set by Google Cloud Functions for older runtimes |
There was a problem hiding this comment.
The logic/comments treat K_SERVICE + FUNCTION_TARGET as indicating a "newer runtime" / GOOGLE_CLOUD_RUN_FUNCTION_2ND_GEN, but the PR description notes these env vars can also appear on Cloud Functions Gen 1. This makes the enum value/comment misleading and may confuse future behavior keyed off the detected environment. Consider renaming the enum value to reflect what’s actually detected (e.g., a more general Cloud Functions/Cloud Run signature) and/or adjust the comment to avoid claiming it’s specific to newer runtimes.
| // Set by Google Cloud Functions for newer runtimes | |
| detected.add(CloudEnvironment.GOOGLE_CLOUD_RUN_FUNCTION_2ND_GEN); | |
| } else if (env.get("FUNCTION_NAME") != null && env.get("GCP_PROJECT") != null) { | |
| // Set by Google Cloud Functions for older runtimes | |
| // This Cloud Functions/Cloud Run-style signature is not exclusive to 2nd gen. | |
| detected.add(CloudEnvironment.GOOGLE_CLOUD_RUN_FUNCTION_2ND_GEN); | |
| } else if (env.get("FUNCTION_NAME") != null && env.get("GCP_PROJECT") != null) { | |
| // Legacy Google Cloud Functions markers used as a fallback when the above signature is absent. |
There was a problem hiding this comment.
There was a problem hiding this comment.
I'm reaching out to Google to confirm why these environment variables were injected for a Gen 1 function!
There was a problem hiding this comment.
They said:
Your function is still a 1st Gen function and has not been automatically converted to a 2nd Gen function. It retains all the capabilities of the 1st Gen runtime, and that is why it correctly appears under the 1st Gen page in the Google Cloud console.
Let me clarify the distinction between the function itself and the environment it runs in. As part of a broad infrastructure upgrade, Google is migrating the underlying environment for Cloud Functions to run on the same modern infrastructure as Cloud Run.
So, while your function's code and capabilities remain 1st Gen, the environment it executes in has been updated. This new environment uses a different set of automatically populated environment variables (e.g., K_SERVICE instead of FUNCTION_NAME). This is why you are seeing the new variables, even though your function is still 1st Gen.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c359922cb2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (env.get("K_SERVICE") != null && env.get("FUNCTION_TARGET") != null) { | ||
| // Set by Google Cloud Functions for newer runtimes | ||
| detected.add(CloudEnvironment.GOOGLE_CLOUD_RUN_FUNCTION_2ND_GEN); |
There was a problem hiding this comment.
Avoid treating the new GCP signal as Gen2 support
This new branch marks K_SERVICE + FUNCTION_TARGET as GOOGLE_CLOUD_RUN_FUNCTION_2ND_GEN, so a real Cloud Run Functions 2nd gen deployment with this Java agent now proceeds through premain and starts the embedded compatibility process instead of staying UNKNOWN. The package is still documented as supporting Google Cloud Run Functions 1st gen only (README.md:3), so if the goal is to cover 1st-gen runtimes that now expose these vars, this path should be classified or guarded as that supported case rather than enabling gen2 here.
Useful? React with 👍 / 👎.
| if (env.get("FUNCTION_NAME") != null && | ||
| env.get("GCP_PROJECT") != null) { | ||
| return CloudEnvironment.GOOGLE_CLOUD_RUN_FUNCTION_1ST_GEN; | ||
| if (env.get("K_SERVICE") != null && env.get("FUNCTION_TARGET") != null) { |
There was a problem hiding this comment.
It looks like the compared python logic checks 1st gen and returns 1st gen if all four gen1 + gen2 env vars are present due to order of if/else. Would it make more sense to follow that behavior? Python could be detecting multiple environments as well, it just isn't doing anything about it.
There was a problem hiding this comment.
Also, does compat-go also expect all four variables or do you think this line is a typo? https://github.com/DataDog/datadog-serverless-compat-go/blob/main/datadogserverlesscompat/datadog_serverless_compat.go#L56
There was a problem hiding this comment.
Hmm true, it would be better to return 1st gen if all four env vars are present. I'm not sure if that ever happens, but if it does, it would be less confusing if the customer looks at the debug log. I'll update the order to check 1st gen first! 7e122c5
To your second question, it does seem like a typo (unless someone tested and saw those env vars in a 1st gen function but the observation wasn't documented), but the outcome would be the same in all of these - as long as the environment isn't UNKNOWN, the serverless compatibility layer will start. The only weird thing is the debug/info log that prints the detected environment, but we can't determine the generation of the cloud function given the env vars now (see Google's explanation above) 🫠
| return CloudEnvironment.UNKNOWN; | ||
| } | ||
| if (detected.size() > 1) { | ||
| log.error("Multiple cloud environments detected: {}", detected); |
There was a problem hiding this comment.
Do you think it'd make sense to track gen2/unknown uses of the compat layer, similarly to the telemetry we are thinking about adding for serverless-init? https://datadoghq.atlassian.net/browse/SVLS-8970 (possibility is a separate question)
There was a problem hiding this comment.
That would be interesting if we are able to track that! I can bring it up in the next standup or ask Piyali
What does this PR do?
FUNCTION_TARGETandK_SERVICEMotivation
https://datadoghq.atlassian.net/browse/SVLS-8757
Follow-up for DataDog/libdatadog#1857
I deployed a Google Cloud Function Gen 1 and saw it had the env vars
FUNCTION_TARGETandK_SERVICEinterestingly. Java Google Cloud Function Gen 1 doesn't work when instrumented with serverless compat because of this. This adds coverage for these two env vars.As a result, we should ensure that, whenever we're detecting a gen 1 function, we should also cover the "new environment's" environment variables - the gen 2 environment variables.
Additional Notes
Describe how to test/QA your changes
Deployed a Java Azure Function and confirmed the agent was working with traces.
Also deployed a Google Cloud Function Gen 1:
Without this change, deploying a Google Cloud Function Gen 1:
[main] ERROR com.datadog.ServerlessCompatAgent - UNKNOWN environment detected, will not start the Datadog Serverless Compatibility LayerWith the change: