@@ -1090,37 +1090,38 @@ Build up complexity gradually:
 ## Summary

 <Tip>
- **Key takeaways for advanced eval testing:**
+ **Key takeaways for advanced eval testing:**

- **Testing strategy:**
+ **Testing strategy:**

- - Use smoke tests before comprehensive suites
- - Build regression tests when fixing bugs
- - Cover edge cases systematically
+ - Use smoke tests before comprehensive suites
+ - Build regression tests when fixing bugs
+ - Cover edge cases systematically

- **Validation selection:**
+ **Validation selection:**

- - Exact match for critical data
- - Regex for pattern matching
- - AI judge for semantic evaluation
+ - Exact match for critical data
+ - Regex for pattern matching
+ - AI judge for semantic evaluation

- **Performance:**
+ **Performance:**

- - Exit early on critical failures
- - Keep conversations focused (5-10 turns)
- - Batch related tests together
+ - Exit early on critical failures
+ - Keep conversations focused (5-10 turns)
+ - Batch related tests together

- **Maintenance:**
+ **Maintenance:**

- - Version control evaluations
- - Review failures promptly
- - Update tests with features
- - Document test purpose clearly
+ - Version control evaluations
+ - Review failures promptly
+ - Update tests with features
+ - Document test purpose clearly

- **CI/CD:**
+ **CI/CD:**

- - Automate critical tests in pipelines
- - Use staging for full suite validation
- - Set quality gate thresholds
- - Run regression suites regularly
- </Tip>
+ - Automate critical tests in pipelines
+ - Use staging for full suite validation
+ - Set quality gate thresholds
+ - Run regression suites regularly
+
+ </Tip>