| 1 | +# Test Scenarios for Code Interpreter Fix |
| 2 | + |
| 3 | +## Prerequisites |
| 4 | +1. Open WebUI is running with the updated Gemini pipeline |
| 5 | +2. Code interpreter feature is enabled in Open WebUI settings |
| 6 | +3. Google API key or Vertex AI credentials are configured |
| 7 | +4. A Gemini model is selected (e.g., gemini-2.5-pro, gemini-2.0-flash) |
| 8 | + |
| 9 | +## Test Scenario 1: Simple Code Execution |
| 10 | + |
| 11 | +**Prompt:** |
| 12 | +``` |
| 13 | +Write and execute Python code to calculate pi to 10 decimal places using the Leibniz formula. |
| 14 | +``` |
| 15 | + |
| 16 | +**Expected Behavior:** |
| 17 | +- Gemini should respond with an explanation of the code |
| 18 | +- Code should execute automatically |
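| 19 | +- Results should be displayed as an approximation of pi (3.1415926536 when rounded to 10 decimal places). The plain Leibniz series converges slowly, so the model may cap the number of terms or switch to a faster method; a result that matches only the first several digits is acceptable |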
| 20 | + |
| 21 | +**What to Check:** |
| 22 | +- No repeated text (the original bug symptom) |
| 23 | +- Code is displayed in a code block |
| 24 | +- Execution results are shown |
| 25 | +- No errors in console logs |
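As a reference for judging the output of this scenario, here is a minimal sketch of the plain Leibniz series. It is not the code Gemini is expected to produce; the function name `leibniz_pi` is chosen here for illustration. The error after N terms is roughly 1/N, so reaching 10 correct decimal places with the unaccelerated series would take on the order of 10^10 terms.

```python
import math


def leibniz_pi(terms: int) -> float:
    """Approximate pi with the first `terms` terms of the Leibniz series."""
    total = 0.0
    sign = 1.0
    for k in range(terms):
        # k-th term of 1 - 1/3 + 1/5 - 1/7 + ...
        total += sign / (2 * k + 1)
        sign = -sign
    return 4.0 * total


# Show how slowly the approximation approaches math.pi.
for terms in (1_000, 100_000):
    approx = leibniz_pi(terms)
    print(f"{terms:>7} terms: {approx:.10f} (error {abs(approx - math.pi):.1e})")
```

If the model reports all 10 decimals correctly, it most likely used many terms, an accelerated series, or `math.pi` itself, which is worth noting in the test result.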
| 26 | + |
| 27 | +## Test Scenario 2: Data Visualization |
| 28 | + |
| 29 | +**Prompt:** |
| 30 | +``` |
| 31 | +Create a simple bar chart showing the first 5 Fibonacci numbers using matplotlib. |
| 32 | +``` |
| 33 | + |
| 34 | +**Expected Behavior:** |
| 35 | +- Code generates a bar chart |
| 36 | +- Chart is displayed in the response |
| 37 | +- No repeated text errors |
| 38 | + |
| 39 | +**What to Check:** |
| 40 | +- Code executes successfully |
| 41 | +- Image/chart is visible |
| 42 | +- Proper error handling if matplotlib isn't available |
| 43 | + |
| 44 | +## Test Scenario 3: Error Handling |
| 45 | + |
| 46 | +**Prompt:** |
| 47 | +``` |
| 48 | +Execute this Python code: print(1/0) |
| 49 | +``` |
| 50 | + |
| 51 | +**Expected Behavior:** |
| 52 | +- Code attempts to execute |
| 53 | +- Division by zero error is caught and displayed |
| 54 | +- Error message is clear and doesn't break the UI |
| 55 | + |
| 56 | +**What to Check:** |
| 57 | +- Error is handled gracefully |
| 58 | +- No system crash or hung requests |
| 59 | +- Error message is visible to the user |
| 60 | + |
| 61 | +## Test Scenario 4: Multi-turn Conversation |
| 62 | + |
| 63 | +**Prompt 1:** |
| 64 | +``` |
| 65 | +Create a Python function to calculate factorial of a number. |
| 66 | +``` |
| 67 | + |
| 68 | +**Prompt 2:** |
| 69 | +``` |
| 70 | +Now use that function to calculate factorial of 10. |
| 71 | +``` |
| 72 | + |
| 73 | +**Expected Behavior:** |
| 74 | +- First response creates and shows the function |
| 75 | +- Second response uses the function and shows result (3628800) |
| 76 | +- Context is maintained between turns |
| 77 | + |
| 78 | +**What to Check:** |
| 79 | +- Multi-turn context works correctly |
| 80 | +- Variables/functions from previous turns are available |
| 81 | +- No repeated text issues |
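The "no repeated text" check recurs in several scenarios and can be tedious to do by eye on long responses. The sketch below is one possible way to flag duplicated paragraphs in a response copied from the chat; the helper name `find_repeated_blocks` and the `min_length` threshold are assumptions made for illustration, not part of the pipeline.

```python
def find_repeated_blocks(text: str, min_length: int = 20) -> list[str]:
    """Return paragraphs (whitespace-normalized) that occur more than once."""
    seen: set[str] = set()
    repeated: list[str] = []
    for block in text.split("\n\n"):
        normalized = " ".join(block.split())
        # Skip short blocks such as headings or one-word lines.
        if len(normalized) < min_length:
            continue
        if normalized in seen and normalized not in repeated:
            repeated.append(normalized)
        seen.add(normalized)
    return repeated


response = (
    "Here is a function that computes the factorial.\n\n"
    "The result of factorial(10) is 3628800.\n\n"
    "Here is a function that computes the factorial."
)
print(find_repeated_blocks(response))
```

An empty list suggests no paragraph-level repetition; it does not rule out repeats inside a single paragraph.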
| 82 | + |
| 83 | +## Test Scenario 5: Complex Calculation |
| 84 | + |
| 85 | +**Prompt:** |
| 86 | +``` |
| 87 | +Calculate the first 20 prime numbers using Python. |
| 88 | +``` |
| 89 | + |
| 90 | +**Expected Behavior:** |
| 91 | +- Code is generated and executed |
| 92 | +- List of primes is displayed: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71] |
| 93 | + |
| 94 | +**What to Check:** |
| 95 | +- Complex logic executes correctly |
| 96 | +- Results are accurate |
| 97 | +- No timeout errors |
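The numeric results expected in Scenarios 4, 5, and 7 can be confirmed locally before comparing them with the model's output. This is a small reference sketch; `first_primes` is a name introduced here, and the trial-division approach is just one simple way to produce the list.

```python
import math


def first_primes(count: int) -> list[int]:
    """Return the first `count` primes using trial division."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < count:
        # Only primes up to sqrt(candidate) need to be tested as divisors.
        if all(candidate % p != 0 for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


print(first_primes(20))       # Scenario 5: first 20 primes
print(math.factorial(10))     # Scenario 4: factorial of 10
print(sum(range(1, 101)))     # Scenario 7: sum of 1 to 100
```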
| 98 | + |
| 99 | +## Test Scenario 6: Streaming Mode |
| 100 | + |
| 101 | +Enable streaming responses and test: |
| 102 | + |
| 103 | +**Prompt:** |
| 104 | +``` |
| 105 | +Generate a simple "Hello, World!" program in Python and execute it. |
| 106 | +``` |
| 107 | + |
| 108 | +**Expected Behavior:** |
| 109 | +- Response streams in real-time |
| 110 | +- Code execution still works |
| 111 | +- No "chunk too big" errors |
| 112 | + |
| 113 | +**What to Check:** |
| 114 | +- Streaming works smoothly |
| 115 | +- Tool calls are detected in streaming mode |
| 116 | +- No response corruption |
| 117 | + |
| 118 | +## Test Scenario 7: Non-Streaming Mode |
| 119 | + |
| 120 | +Disable streaming responses and test: |
| 121 | + |
| 122 | +**Prompt:** |
| 123 | +``` |
| 124 | +Calculate the sum of numbers from 1 to 100 using Python. |
| 125 | +``` |
| 126 | + |
| 127 | +**Expected Behavior:** |
| 128 | +- Response arrives all at once |
| 129 | +- Code executes and shows result (5050) |
| 130 | +- No repeated text |
| 131 | + |
| 132 | +**What to Check:** |
| 133 | +- Non-streaming mode works |
| 134 | +- Tool calls are detected and emitted |
| 135 | +- Response format is correct (tool calls follow the OpenAI tool call structure; see Debugging Tips) |
| 136 | + |
| 137 | +## Debugging Tips |
| 138 | + |
| 139 | +If tests fail, check: |
| 140 | + |
| 141 | +1. **Browser Console**: Look for JavaScript errors or failed API calls |
| 142 | +2. **Open WebUI Logs**: Check for Python exceptions or warnings |
| 143 | +3. **Network Tab**: Inspect the API request/response format |
| 144 | +4. **Event Emitter**: Verify that tool call events are emitted (see Log Monitoring for the expected messages) |
| 145 | + |
| 146 | +Key indicators of success: |
| 147 | +- ✅ No repeated text in responses |
| 148 | +- ✅ Code blocks are properly formatted |
| 149 | +- ✅ Execution results are displayed |
| 150 | +- ✅ Tool call events appear in logs |
| 151 | +- ✅ Format matches OpenAI tool call structure |
| 152 | + |
| 153 | +Key indicators of issues: |
| 154 | +- ❌ Text repeats multiple times |
| 155 | +- ❌ Code doesn't execute |
| 156 | +- ❌ "function_call" errors in logs |
| 157 | +- ❌ Missing tool_calls in API response |
| 158 | +- ❌ Malformed JSON in arguments |
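To check the "format matches OpenAI tool call structure" and "malformed JSON in arguments" indicators on a tool call copied from the Network tab, a small validator can help. The sketch below assumes the OpenAI chat completions shape, where each entry has `id`, `type: "function"`, and a `function` object whose `arguments` field is a JSON-encoded string. The helper name `validate_tool_call` and the sample function name `execute_code` are hypothetical.

```python
import json


def validate_tool_call(call: dict) -> list[str]:
    """Return a list of problems found in one OpenAI-style tool call entry."""
    problems: list[str] = []
    if not isinstance(call.get("id"), str):
        problems.append("missing string 'id'")
    if call.get("type") != "function":
        problems.append("'type' should be 'function'")
    function = call.get("function")
    if not isinstance(function, dict):
        problems.append("missing 'function' object")
        return problems
    if not isinstance(function.get("name"), str):
        problems.append("missing string 'function.name'")
    arguments = function.get("arguments")
    if not isinstance(arguments, str):
        problems.append("'function.arguments' should be a JSON string")
    else:
        try:
            json.loads(arguments)
        except json.JSONDecodeError:
            problems.append("'function.arguments' is not valid JSON")
    return problems


# Hypothetical sample; the real function name depends on the Open WebUI setup.
sample = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "execute_code", "arguments": '{"code": "print(1)"}'},
}
print(validate_tool_call(sample))
```

An empty list means the entry has the expected shape; it says nothing about whether the arguments are meaningful for the tool.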
| 159 | + |
| 160 | +## Log Monitoring |
| 161 | + |
| 162 | +Watch for these log messages (set log level to DEBUG): |
| 163 | + |
| 164 | +**Success indicators:** |
| 165 | +``` |
| 166 | +Detected tool call: <function_name> with args: <args> |
| 167 | +Emitted tool call: <function_name> with args: <args> |
| 168 | +``` |
| 169 | + |
| 170 | +**Error indicators:** |
| 171 | +``` |
| 172 | +Error processing content part: ... |
| 173 | +Failed to access content parts: ... |
| 174 | +``` |
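Scanning a long DEBUG log for these messages can be scripted. The following is a minimal sketch that counts the success and error messages listed above in an iterable of log lines; the function name `summarize_log` is an assumption, and where the Open WebUI logs are stored depends on the deployment.

```python
from collections import Counter

# Message prefixes taken from the success and error indicators above.
SUCCESS_MARKERS = ("Detected tool call:", "Emitted tool call:")
ERROR_MARKERS = ("Error processing content part:", "Failed to access content parts:")


def summarize_log(lines) -> Counter:
    """Count how often each known marker appears in the given log lines."""
    counts: Counter = Counter()
    for line in lines:
        for marker in SUCCESS_MARKERS + ERROR_MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


# Made-up sample lines for illustration only.
sample_lines = [
    "DEBUG Detected tool call: execute_code with args: {'code': 'print(1)'}",
    "DEBUG Emitted tool call: execute_code with args: {'code': 'print(1)'}",
    "ERROR Error processing content part: unexpected type",
]
for marker, count in summarize_log(sample_lines).items():
    print(f"{count} x {marker}")
```

To scan a real log, pass an open file object instead of `sample_lines`, since iterating over a file yields its lines.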
| 175 | + |
| 176 | +## Comparison with Azure Pipeline |
| 177 | + |
| 178 | +To verify the fix matches Azure's behavior, test the same prompts with both: |
| 179 | +1. Azure AI pipeline (known working) |
| 180 | +2. Gemini pipeline (after fix) |
| 181 | + |
| 182 | +Both should execute code successfully and show results. |