Merge pull request #14 from ivanmkc/tk-safety

ToniCorinne · web-flow · commit d60eb2548731 · 2025-11-06T09:31:30.000-07:00
Feat: Adding go snippets for safety guidelines
diff --git a/docs/safety/index.md b/docs/safety/index.md
@@ -9,12 +9,12 @@ As AI agents grow in capability, ensuring they operate safely, securely, and ali
 1. **Identity and Authorization**: Control who the agent **acts as** by defining agent and user auth.
 2. **Guardrails to screen inputs and outputs:** Control your model and tool calls precisely.
 
-    * *In-Tool Guardrails:* Design tools defensively, using developer-set tool context to enforce policies (e.g., allowing queries only on specific tables).  
-    * *Built-in Gemini Safety Features:* If using Gemini models, benefit from content filters to block harmful outputs and system Instructions to guide the model's behavior and safety guidelines  
+    * *In-Tool Guardrails:* Design tools defensively, using developer-set tool context to enforce policies (e.g., allowing queries only on specific tables).
+    * *Built-in Gemini Safety Features:* If using Gemini models, benefit from content filters that block harmful outputs and from system instructions that guide the model's behavior and safety guidelines.
     * *Callbacks and Plugins:* Validate model and tool calls before or after execution, checking parameters against agent state or external policies.
     * *Using Gemini as a safety guardrail:* Implement an additional safety layer using a cheap and fast model (like Gemini Flash Lite) configured via callbacks  to screen inputs and outputs.
 
-3. **Sandboxed code execution:** Prevent model-generated code to cause security issues by sandboxing the environment  
+3. **Sandboxed code execution:** Prevent model-generated code from causing security issues by sandboxing the environment.
 4. **Evaluation and tracing**: Use evaluation tools to assess the quality, relevance, and correctness of the agent's final output. Use tracing to gain visibility into agent actions to analyze the steps an agent takes to reach a solution, including its choice of tools, strategies, and the efficiency of its approach.
 5. **Network Controls and VPC-SC:** Confine agent activity within secure perimeters (like VPC Service Controls) to prevent data exfiltration and limit the potential impact radius.
 
@@ -25,20 +25,20 @@ Before implementing safety measures, perform a thorough risk assessment specific
 ***Sources*** **of risk** include:
 
 * Ambiguous agent instructions
-* Prompt injection and jailbreak attempts from adversarial users  
+* Prompt injection and jailbreak attempts from adversarial users
 * Indirect prompt injections via tool use
 
 **Risk categories** include:
 
-* **Misalignment & goal corruption**  
-    * Pursuing unintended or proxy goals that lead to harmful outcomes ("reward hacking")  
-    * Misinterpreting complex or ambiguous instructions  
+* **Misalignment & goal corruption**
+    * Pursuing unintended or proxy goals that lead to harmful outcomes ("reward hacking")
+    * Misinterpreting complex or ambiguous instructions
 * **Harmful content generation, including brand safety**
-    * Generating toxic, hateful, biased, sexually explicit, discriminatory, or illegal content  
-    * Brand safety risks such as Using language that goes against the brand’s values or off-topic conversations  
-* **Unsafe actions**  
+    * Generating toxic, hateful, biased, sexually explicit, discriminatory, or illegal content
+    * Brand safety risks, such as using language that goes against the brand’s values or engaging in off-topic conversations
+* **Unsafe actions**
     * Executing commands that damage systems
-    * Making unauthorized purchases or financial transactions.  
+    * Making unauthorized purchases or financial transactions
     * Leaking sensitive personal data (PII)
     * Data exfiltration
 
@@ -78,16 +78,16 @@ For example, a query tool can be designed to expect a policy to be read from the
     # Conceptual example: Setting policy data intended for tool context
     # In a real ADK app, this might be set in InvocationContext.session.state
     # or passed during tool initialization, then retrieved via ToolContext.
-    
+
     policy = {} # Assuming policy is a dictionary
     policy['select_only'] = True
     policy['tables'] = ['mytable1', 'mytable2']
-    
+
     # Conceptual: Storing policy where the tool can access it via ToolContext later.
     # This specific line might look different in practice.
     # For example, storing in session state:
     invocation_context.session.state["query_tool_policy"] = policy
-    
+
     # Or maybe passing during tool init:
     query_tool = QueryTool(policy=policy)
     # For this example, we'll assume it gets stored somewhere accessible.
@@ -98,20 +98,43 @@ For example, a query tool can be designed to expect a policy to be read from the
     // Conceptual example: Setting policy data intended for tool context
     // In a real ADK app, this might be set in InvocationContext.session.state
     // or passed during tool initialization, then retrieved via ToolContext.
-    
+
     policy = new HashMap<String, Object>(); // Assuming policy is a Map
     policy.put("select_only", true);
     policy.put("tables", new ArrayList<>("mytable1", "mytable2"));
-    
+
     // Conceptual: Storing policy where the tool can access it via ToolContext later.
     // This specific line might look different in practice.
     // For example, storing in session state:
     invocationContext.session().state().put("query_tool_policy", policy);
-    
+
     // Or maybe passing during tool init:
     query_tool = QueryTool(policy);
     // For this example, we'll assume it gets stored somewhere accessible.
     ```
+=== "Go"
+
+    ```go
+    // Conceptual example: Setting policy data intended for tool context
+    // In a real ADK app, this might be set using the session state service.
+    // `ctx` is an `agent.Context` available in callbacks or custom agents.
+
+    policy := map[string]interface{}{
+    	"select_only": true,
+    	"tables":      []string{"mytable1", "mytable2"},
+    }
+
+    // Conceptual: Storing policy where the tool can access it via ToolContext later.
+    // This specific line might look different in practice.
+    // For example, storing in session state:
+    if err := ctx.Session().State().Set("query_tool_policy", policy); err != nil {
+    	// Handle error, e.g., log it.
+    }
+
+    // Or maybe passing during tool init:
+    // queryTool := NewQueryTool(policy)
+    // For this example, we'll assume it gets stored somewhere accessible.
+    ```
 
 During the tool execution, [**`Tool Context`**](../tools/index.md#tool-context) will be passed to the tool:
 
@@ -121,60 +144,60 @@ During the tool execution, [**`Tool Context`**](../tools/index.md#tool-context)
     def query(query: str, tool_context: ToolContext) -> str | dict:
       # Assume 'policy' is retrieved from context, e.g., via session state:
       # policy = tool_context.invocation_context.session.state.get('query_tool_policy', {})
-    
+
       # --- Placeholder Policy Enforcement ---
       policy = tool_context.invocation_context.session.state.get('query_tool_policy', {}) # Example retrieval
       actual_tables = explainQuery(query) # Hypothetical function call
-    
+
       if not set(actual_tables).issubset(set(policy.get('tables', []))):
         # Return an error message for the model
         allowed = ", ".join(policy.get('tables', ['(None defined)']))
         return f"Error: Query targets unauthorized tables. Allowed: {allowed}"
-    
+
       if policy.get('select_only', False):
            if not query.strip().upper().startswith("SELECT"):
                return "Error: Policy restricts queries to SELECT statements only."
       # --- End Policy Enforcement ---
-    
+
       print(f"Executing validated query (hypothetical): {query}")
       return {"status": "success", "results": [...]} # Example successful return
     ```
 
 === "Java"
 
     ```java
-    
+
     import com.google.adk.tools.ToolContext;
     import java.util.*;
-    
+
     class ToolContextQuery {
-    
+
       public Object query(String query, ToolContext toolContext) {
 
         // Assume 'policy' is retrieved from context, e.g., via session state:
         Map<String, Object> queryToolPolicy =
             toolContext.invocationContext.session().state().getOrDefault("query_tool_policy", null);
         List<String> actualTables = explainQuery(query);
-    
+
         // --- Placeholder Policy Enforcement ---
         if (!queryToolPolicy.get("tables").containsAll(actualTables)) {
           List<String> allowedPolicyTables =
               (List<String>) queryToolPolicy.getOrDefault("tables", new ArrayList<String>());
 
           String allowedTablesString =
               allowedPolicyTables.isEmpty() ? "(None defined)" : String.join(", ", allowedPolicyTables);
-          
+
           return String.format(
               "Error: Query targets unauthorized tables. Allowed: %s", allowedTablesString);
         }
-    
+
         if (!queryToolPolicy.get("select_only")) {
           if (!query.trim().toUpperCase().startswith("SELECT")) {
             return "Error: Policy restricts queries to SELECT statements only.";
           }
         }
         // --- End Policy Enforcement ---
-    
+
         System.out.printf("Executing validated query (hypothetical) %s:", query);
         Map<String, Object> successResult = new HashMap<>();
         successResult.put("status", "success");
@@ -183,14 +206,69 @@ During the tool execution, [**`Tool Context`**](../tools/index.md#tool-context)
       }
     }
     ```
+=== "Go"
+
+    ```go
+    import (
+    	"fmt"
+    	"strings"
+
+    	"google.golang.org/adk/tool"
+    )
+
+    func query(query string, toolContext tool.Context) (any, error) {
+    	// Assume 'policy' is retrieved from context, e.g., via session state:
+    	policyAny, err := toolContext.State().Get("query_tool_policy")
+    	if err != nil {
+    		return nil, fmt.Errorf("could not retrieve policy: %w", err)
+    	}
+    	policy, _ := policyAny.(map[string]interface{})
+    	actualTables := explainQuery(query) // Hypothetical function call
+    	// --- Placeholder Policy Enforcement ---
+    	if tables, ok := policy["tables"].([]string); ok {
+    		if !isSubset(actualTables, tables) {
+    			// Return an error to signal failure
+    			allowed := strings.Join(tables, ", ")
+    			if allowed == "" {
+    				allowed = "(None defined)"
+    			}
+    			return nil, fmt.Errorf("query targets unauthorized tables. Allowed: %s", allowed)
+    		}
+    	}
+
+    	if selectOnly, _ := policy["select_only"].(bool); selectOnly {
+    		if !strings.HasPrefix(strings.ToUpper(strings.TrimSpace(query)), "SELECT") {
+    			return nil, fmt.Errorf("policy restricts queries to SELECT statements only")
+    		}
+    	}
+    	// --- End Policy Enforcement ---
+
+    	fmt.Printf("Executing validated query (hypothetical): %s\n", query)
+    	return map[string]interface{}{"status": "success", "results": []string{"..."}}, nil
+    }
+
+    // Helper function to check if a is a subset of b
+    func isSubset(a, b []string) bool {
+    	set := make(map[string]bool)
+    	for _, item := range b {
+    		set[item] = true
+    	}
+    	for _, item := range a {
+    		if _, found := set[item]; !found {
+    			return false
+    		}
+    	}
+    	return true
+    }
+    ```
 
 #### Built-in Gemini Safety Features
 
 Gemini models come with in-built safety mechanisms that can be leveraged to improve content and brand safety.
 
-* **Content safety filters**:  [Content filters](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-attributes) can help block the output of harmful content. They function independently from Gemini models as part of a layered defense against threat actors who attempt to jailbreak the model. Gemini models on Vertex AI use two types of content filters:  
-* **Non-configurable safety filters** automatically block outputs containing prohibited content, such as child sexual abuse material (CSAM) and personally identifiable information (PII).  
-* **Configurable content filters** allow you to define blocking thresholds in four harm categories (hate speech, harassment, sexually explicit, and dangerous content,) based on probability and severity scores. These filters are default off but you can configure them according to your needs.  
+* **Content safety filters**: [Content filters](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-attributes) can help block the output of harmful content. They function independently from Gemini models as part of a layered defense against threat actors who attempt to jailbreak the model. Gemini models on Vertex AI use two types of content filters:
+    * **Non-configurable safety filters** automatically block outputs containing prohibited content, such as child sexual abuse material (CSAM) and personally identifiable information (PII).
+    * **Configurable content filters** allow you to define blocking thresholds in four harm categories (hate speech, harassment, sexually explicit, and dangerous content) based on probability and severity scores. These filters are off by default, but you can configure them according to your needs.
 * **System instructions for safety**: [System instructions](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/safety-system-instructions) for Gemini models in Vertex AI provide direct guidance to the model on how to behave and what type of content to generate. By providing specific instructions, you can proactively steer the model away from generating undesirable content to meet your organization’s unique needs. You can craft system instructions to define content safety guidelines, such as prohibited and sensitive topics, and disclaimer language, as well as brand safety guidelines to ensure the model's outputs align with your brand's voice, tone, values, and target audience.
 
 While these measures are robust against content safety, you need additional checks to reduce agent misalignment, unsafe actions, and brand safety risks.
@@ -211,22 +289,22 @@ When modifications to the tools to add guardrails aren't possible, the [**`Befor
         args: Dict[str, Any],
         tool_context: ToolContext
         ) -> Optional[Dict]: # Correct return type for before_tool_callback
-    
+
       print(f"Callback triggered for tool: {tool.name}, args: {args}")
-    
+
       # Example validation: Check if a required user ID from state matches an arg
       expected_user_id = callback_context.state.get("session_user_id")
       actual_user_id_in_args = args.get("user_id_param") # Assuming tool takes 'user_id_param'
-    
+
       if actual_user_id_in_args != expected_user_id:
           print("Validation Failed: User ID mismatch!")
           # Return a dictionary to prevent tool execution and provide feedback
           return {"error": f"Tool call blocked: User ID mismatch."}
-    
+
       # Return None to allow the tool call to proceed if validation passes
       print("Callback validation passed.")
       return None
-    
+
     # Hypothetical Agent setup
     root_agent = LlmAgent( # Use specific agent type
         model='gemini-2.0-flash',
@@ -251,22 +329,22 @@ When modifications to the tools to add guardrails aren't possible, the [**`Befor
       ToolContext toolContext) {
 
     System.out.printf("Callback triggered for tool: %s, Args: %s", baseTool.name(), input);
-    
+
     // Example validation: Check if a required user ID from state matches an input parameter
     Object expectedUserId = callbackContext.state().get("session_user_id");
     Object actualUserIdInput = input.get("user_id_param"); // Assuming tool takes 'user_id_param'
-    
+
     if (!actualUserIdInput.equals(expectedUserId)) {
       System.out.println("Validation Failed: User ID mismatch!");
       // Return to prevent tool execution and provide feedback
       return Optional.of(Map.of("error", "Tool call blocked: User ID mismatch."));
     }
-    
+
     // Return to allow the tool call to proceed if validation passes
     System.out.println("Callback validation passed.");
     return Optional.empty();
     }
-    
+
     // Hypothetical Agent setup
     public void runAgent() {
     LlmAgent agent =
@@ -279,6 +357,70 @@ When modifications to the tools to add guardrails aren't possible, the [**`Befor
             .build();
     }
     ```
+=== "Go"
+
+    ```go
+    import (
+    	"fmt"
+    	"reflect"
+
+    	"google.golang.org/adk/agent/llmagent"
+    	"google.golang.org/adk/tool"
+    )
+
+    // Hypothetical callback function
+    func validateToolParams(
+    	ctx tool.Context,
+    	t tool.Tool,
+    	args map[string]any,
+    ) (map[string]any, error) {
+    	fmt.Printf("Callback triggered for tool: %s, args: %v\n", t.Name(), args)
+
+    	// Example validation: Check if a required user ID from state matches an arg
+    	expectedUserIDVal, err := ctx.State().Get("session_user_id")
+    	if err != nil {
+    		// This is an unexpected failure, return an error.
+    		return nil, fmt.Errorf("internal error: session_user_id not found in state: %w", err)
+    	}
+    	expectedUserID, ok := expectedUserIDVal.(string)
+    	if !ok {
+    		return nil, fmt.Errorf("internal error: session_user_id in state is not a string, got %T", expectedUserIDVal)
+    	}
+
+    	actualUserIDInArgs, exists := args["user_id_param"]
+    	if !exists {
+    		// Handle case where user_id_param is not in args
+    		fmt.Println("Validation Failed: user_id_param missing from arguments!")
+    		return map[string]any{"error": "Tool call blocked: user_id_param missing from arguments."}, nil
+    	}
+
+    	actualUserID, ok := actualUserIDInArgs.(string)
+    	if !ok {
+    		// Handle case where user_id_param is not a string
+    		fmt.Println("Validation Failed: user_id_param is not a string!")
+    		return map[string]any{"error": "Tool call blocked: user_id_param is not a string."}, nil
+    	}
+
+    	if actualUserID != expectedUserID {
+    		fmt.Println("Validation Failed: User ID mismatch!")
+    		// Return a map to prevent tool execution and provide feedback to the model.
+    		// This is not a Go error, but a message for the agent.
+    		return map[string]any{"error": "Tool call blocked: User ID mismatch."}, nil
+    	}
+    	// Return nil, nil to allow the tool call to proceed if validation passes
+    	fmt.Println("Callback validation passed.")
+    	return nil, nil
+    }
+
+    // Hypothetical Agent setup
+    // rootAgent, err := llmagent.New(llmagent.Config{
+    // 	Model: "gemini-2.0-flash",
+    // 	Name: "root_agent",
+    // 	Instruction: "...",
+    // 	BeforeToolCallbacks: []llmagent.BeforeToolCallback{validateToolParams},
+    // 	Tools: []tool.Tool{queryToolInstance},
+    // })
+    ```
 
 However, when adding security guardrails to your agent applications, plugins are the recommended approach for implementing policies that are not specific to a single agent. Plugins are designed to be self-contained and modular, allowing you to create individual plugins for specific security policies, and apply them globally at the runner level. This means that a security plugin can be configured once and applied to every agent that uses the runner, ensuring consistent security guardrails across your entire application without repetitive code.