Fix: Enable LLM-as-Judge base model evaluation integration tests, Add cleanup mechanism for MC dataset integ test #291
| Job | Run time |
|---|---|
| | 5s |
| | 3s |
| | 9s |
| | 30m 31s |
| | 6m 29s |
| | 29m 56s |
| | 21m 41s |
| | 6m 40s |
| | 5m 41s |
| | 42m 57s |
| | 1h 1m 41s |
| | 32m 6s |
| | 5s |
| | 9s |
| | 3s |
| | 1h 2m 22s |
| | 21m 41s |
| | 6m 40s |
| | 5m 41s |
| | 6m 29s |
| | 42m 57s |
| | 32m 6s |
| | 30m 31s |
| | 29m 56s |
| | 3s |
| | 1h 20m 22s |
| | 9s |
| | 30m 31s |
| | 32m 6s |
| | 29m 56s |
| | 42m 57s |
| | 6m 40s |
| | 6m 29s |
| | 5m 41s |
| | 21m 41s |
| | 5s |
| Total | 12h 13m 19s |