# OCI Generative AI Safety Benchmark

Benchmark suite for testing LLM safety features and OCI Guardrails SDK efficacy.

## Overview

This benchmark tests:
1. **Model refusal behavior** - How well models refuse harmful prompts
2. **OCI Guardrails SDK** - Detection of harmful content, PII, and prompt injection

## Quick Start

```bash
# 1. Set up environment
cp .env.example .env
# Edit .env with your OCI model OCIDs and compartment ID

# 2. Install dependencies
python -m venv venv
source venv/bin/activate
pip install oci pandas python-dotenv openpyxl matplotlib

# 3. Run benchmarks
./run_generic.sh   # Llama, Grok, Gemini, GPT models
./run_cohere.sh    # Cohere models (with STRICT/CONTEXTUAL modes)

# 4. Analyze results
python analyze_results.py
```

## Project Structure

```
.
├── cohere_benchmark.py    # Cohere models (STRICT/CONTEXTUAL modes)
├── generic_benchmark.py   # All other models (Llama, Grok, Gemini, GPT)
├── run_cohere.sh          # Run all Cohere models
├── run_generic.sh         # Run all generic models
├── analyze_results.py     # Unified analysis: charts + summary for refusal & guardrails
├── .env                   # Model IDs and configuration (not committed)
├── .env.example           # Template for .env
├── prompts/               # Test prompt sets
│   ├── harmful_prompts.py
│   ├── pii_prompts.py
│   ├── promptinjection_prompts.py
│   ├── ambiguous_prompts.py
│   └── edge_cases_prompts.py
├── results/               # Benchmark results (CSV)
├── results_v2/            # Benchmark results v2 (CSV)
└── charts*/               # Generated visualizations
```

## Configuration

### .env File

```bash
# OCI Configuration
COMPARTMENT_ID=ocid1.compartment.oc1..xxxxx
OCI_PROFILE=DEFAULT

# Endpoints
ENDPOINT_EU=https://inference.generativeai.eu-frankfurt-1.oci.oraclecloud.com
ENDPOINT_US=https://inference.generativeai.us-chicago-1.oci.oraclecloud.com

# Model OCIDs
MODEL_COHERE_COMMAND_R_PLUS=ocid1.generativeaimodel.oc1...
MODEL_LLAMA_3_3=ocid1.generativeaimodel.oc1...
MODEL_GROK_3=ocid1.generativeaimodel.oc1...
# ... etc
```
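
How the scripts consume these values is an implementation detail; the usual `python-dotenv` pattern looks roughly like the sketch below. Variable names are taken from `.env.example`; this is illustrative, not the exact code in the benchmark scripts.

```python
# Illustrative sketch of loading .env values with python-dotenv;
# variable names match .env.example, but this is not the benchmarks' actual code.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

compartment_id = os.environ["COMPARTMENT_ID"]      # required
oci_profile = os.getenv("OCI_PROFILE", "DEFAULT")  # optional, defaults to DEFAULT
endpoint_eu = os.getenv("ENDPOINT_EU")
model_llama_3_3 = os.getenv("MODEL_LLAMA_3_3")     # None if that model is not configured
```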

## Running Benchmarks

### Full Benchmark
```bash
./run_generic.sh   # All generic models
./run_cohere.sh    # All Cohere models
```

### Test Mode (2 prompts per set)
```bash
./run_generic.sh --test
./run_cohere.sh --test
```

### Single Model
```bash
python generic_benchmark.py \
  --model-name "my-model" \
  --model-id "ocid1.generativeaimodel..." \
  --compartment-id "$COMPARTMENT_ID" \
  --endpoint "https://inference.generativeai.eu-frankfurt-1.oci.oraclecloud.com"
```

### Options

| Flag | Description |
|------|-------------|
| `--test` | Run only 2 prompts per set |
| `--overwrite` | Overwrite existing result files |
| `--skip-guardrails` | Skip OCI guardrails calls (model-only) |
| `--skip-model` | Skip model inference (guardrails-only) |
| `--output-dir DIR` | Output directory (default: results_v2) |

## Adding New Prompts

Create a new file in `prompts/` following this pattern:

```python
# prompts/my_prompts.py

my_prompts = [
    "First test prompt...",
    "Second test prompt...",
    # Add more prompts
]
```

The benchmark automatically discovers all `*_prompts.py` files and variables ending with `_prompts`.
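
One way such discovery could work (a sketch with `pathlib` and `importlib`, not necessarily the benchmark's actual implementation) is:

```python
# Illustrative sketch of prompt-set discovery; the real benchmark code may differ.
import importlib.util
from pathlib import Path


def discover_prompt_sets(prompts_dir: str = "prompts") -> dict[str, list[str]]:
    """Return {variable_name: prompts} for every *_prompts.py file in prompts_dir."""
    prompt_sets: dict[str, list[str]] = {}
    for path in sorted(Path(prompts_dir).glob("*_prompts.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Collect module-level variables whose names end with "_prompts".
        for name, value in vars(module).items():
            if name.endswith("_prompts") and isinstance(value, list):
                prompt_sets[name] = value
    return prompt_sets
```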

## Adding New Models

1. Add the model OCID to `.env`:
   ```bash
   MODEL_MY_NEW_MODEL=ocid1.generativeaimodel.oc1...
   ```

2. Add to the appropriate run script (`run_generic.sh` or `run_cohere.sh`):
   ```bash
   if [ -n "${MODEL_MY_NEW_MODEL:-}" ]; then
     python generic_benchmark.py \
       --model-name "my-new-model" \
       --model-id "$MODEL_MY_NEW_MODEL" \
       --compartment-id "$COMPARTMENT_ID" \
       --endpoint "$ENDPOINT_EU" \
       --output-dir "$OUTPUT_DIR" \
       "$@"
   fi
   ```

## Output Format

Results are saved as CSV with these columns (a loading sketch follows the table):

| Column | Description |
|--------|-------------|
| `Prompt` | The test prompt |
| `Model` | Model name |
| `Mode` | Safety mode (Cohere only: STRICT/CONTEXTUAL) |
| `Refused` | Did the model refuse? (yes/no/error) |
| `LatencyMs` | Response time in milliseconds |
| `ModelOutput` | Model's response |
| `Pre_OCIFlagged` | Guardrails flagged the prompt? (yes/no) |
| `Pre_FlaggedCategories` | Categories detected in the prompt |
| `Pre_DetectedPIITypes` | PII types found in the prompt |
| `Pre_PromptInjectionScore` | Prompt injection score (0-1) |
| `Post_OCIFlagged` | Guardrails flagged the response? (yes/no) |
| `Post_*` | Same fields as `Pre_*`, evaluated on the model's response |
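
If you want to analyze the CSVs outside of `analyze_results.py`, they load cleanly with pandas. A minimal sketch, assuming one CSV per benchmark run under `results_v2/` and the columns documented above:

```python
# Sketch: load every result CSV and compute per-model rates.
# Assumes one CSV per benchmark run under results_v2/ with the columns documented above.
from pathlib import Path

import pandas as pd

frames = [pd.read_csv(path) for path in Path("results_v2").glob("*.csv")]
df = pd.concat(frames, ignore_index=True)

# Share of prompts the model itself refused.
refusal_rate = df["Refused"].eq("yes").groupby(df["Model"]).mean()

# Share of prompts flagged by guardrails before they reach the model.
pre_flag_rate = df["Pre_OCIFlagged"].eq("yes").groupby(df["Model"]).mean()

print(refusal_rate.sort_values(ascending=False))
print(pre_flag_rate.sort_values(ascending=False))
```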

## Analyzing Results

A single script produces all charts and a printed summary:

```bash
python analyze_results.py                           # auto-detects results dir
python analyze_results.py --results-dir results_v2  # explicit dir
python analyze_results.py --output-dir my_charts    # custom output dir
```

It generates 6 charts (a minimal plotting sketch follows the list):
1. Model self-refusal rate by model
2. Guardrails detection rate by model (Guardrails ON only)
3. Guardrails detection rate by prompt type
4. Model refusal vs Guardrails vs Combined comparison
5. Pre (prompt) vs Post (response) guardrails detection
6. Combined blocked-rate heatmap by model and prompt type
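
As an illustration of what these charts contain, chart 1 is essentially a bar chart of per-model refusal rate; a self-contained matplotlib sketch (not the code inside `analyze_results.py`, and the output path is only an example):

```python
# Sketch of chart 1 (model self-refusal rate); illustrative only,
# not the implementation used by analyze_results.py.
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

df = pd.concat(
    [pd.read_csv(p) for p in Path("results_v2").glob("*.csv")], ignore_index=True
)
refusal_rate = df["Refused"].eq("yes").groupby(df["Model"]).mean()

ax = refusal_rate.sort_values().plot.barh()
ax.set_xlabel("Refusal rate")
ax.set_title("Model self-refusal rate by model")
plt.tight_layout()

Path("charts").mkdir(exist_ok=True)  # example output location
plt.savefig("charts/refusal_rate_by_model.png", dpi=150)
```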

## Key Findings

The OCI Guardrails SDK detects:
- **PII**: ~80-86% detection (names, addresses, emails)
- **Prompt Injection**: ~70-75% detection
- **Violence**: Explicit violence keywords only (~15%)
- **Other harmful content**: Limited detection for drugs, CSAM, terrorism, etc.

The guardrails add value on top of model refusals:
- Model refusal alone: ~20-25%
- Guardrails detection alone: ~45-55%
- Combined (either): ~50-65%

## Requirements

- Python 3.11+
- OCI CLI configured (`~/.oci/config`)
- OCI Generative AI access with model deployments

## Dependencies

```
oci
pandas
python-dotenv
openpyxl
matplotlib
```