Commit a5a7f5d
fix: Replace hard-coded precision thresholds with std-based bounds (#1864)
* Update coordinator guide: run only relevant tests, not full suite
Worker agents were running the full test suite (10+ min) which is
wasteful when only a small area of code changed. Updated the completion
workflow to instruct agents to run only relevant test files/functions.
The full suite will be run separately later.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: Replace hard-coded precision thresholds with std-based bounds
Precision tests were flaky because thresholds were set too close to the
empirical mean error, leaving insufficient margin for GPU architecture
differences. For example, test_4bit_quant for fp4/blocksize=256 used a
threshold of 0.2908 + 0.001 = 0.2918, but Blackwell GPUs observed values
around 0.2909 — only ~5 sigma from the mean, causing sporadic failures.
Collected (mean, std) statistics from 200 samples per configuration on
RTX 4090. Thresholds are now set at mean + 7*std, giving ~7 sigma of
headroom for the measured GPU and enough margin to accommodate
cross-architecture mean shifts (e.g., T4, Blackwell, XPU).
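
As a rough illustration of the mean + 7*std rule described above, a minimal sketch follows. The names `N_SIGMA`, `upper_bound`, and `error_stats` are hypothetical and not taken from the test files. The mean 0.2908 is the figure quoted above; the std value is a placeholder chosen only for illustration, not a measured statistic.

```python
# Sketch only: names and the std value are assumptions, not the repository's code.
N_SIGMA = 7  # number of standard deviations of headroom above the mean


def upper_bound(mean: float, std: float, n_sigma: float = N_SIGMA) -> float:
    """Return the largest error a test run may report before it fails."""
    return mean + n_sigma * std


# Hypothetical table keyed by (quant_type, blocksize) -> (mean, std).
# 0.2908 is the mean quoted in the commit message; 0.0002 is a placeholder std.
error_stats = {("fp4", 256): (0.2908, 0.0002)}

mean, std = error_stats[("fp4", 256)]
threshold = upper_bound(mean, std)

# An observed value of 0.2909 (the Blackwell figure quoted above) sits well
# inside this bound.
observed = 0.2909
assert observed < threshold
```

Storing (mean, std) pairs rather than precomputed thresholds keeps the headroom factor in one place, so it could be changed without re-measuring anything.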
Changes in test_functional.py:
- test_4bit_quant: error_dict now stores (mean, std) tuples instead of
  bare means. Removed ad-hoc errtol/reltol special-casing for CPU fp32.
- test_gemv_4bit: Replaced complex if/elif threshold tree (with
  GPU-specific carve-outs like T4 compute cap checks and XPU conditionals)
  with a clean per-dtype/dim-range (mean, std) table. Individual-sample
  std is used (not divided by sqrt(iters)) so thresholds naturally
  accommodate architecture-specific kernel behavior.
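
The distinction between individual-sample std and std divided by sqrt(iters) can be sketched as below. This is an illustration under assumptions, not the code in test_gemv_4bit: the function name `sigma_bounds` and the sample values are hypothetical.

```python
import math
import statistics


def sigma_bounds(samples, n_sigma=7.0):
    """Return (wide, narrow) upper bounds for a list of per-sample errors.

    wide   uses the individual-sample standard deviation.
    narrow uses the standard error of the mean, i.e. std / sqrt(len(samples)).
    """
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples)          # spread of individual samples
    sem = std / math.sqrt(len(samples))      # spread of the averaged value
    return mean + n_sigma * std, mean + n_sigma * sem


# Made-up error samples, for illustration only.
samples = [1.0, 1.2, 0.8, 1.1, 0.9]
wide, narrow = sigma_bounds(samples)

# The individual-sample bound is always the looser of the two, which leaves
# room for a shifted mean on a different architecture.
assert wide > narrow > statistics.fmean(samples)
```

The narrow bound shrinks as the number of iterations grows, so a small systematic shift in the mean on another GPU would eventually exceed it; the wide bound does not tighten with more iterations.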
Changes in test_parametrize.py:
- test_replace_parameter_4bit: Same (mean, std) approach as test_4bit_quant.
- test_moe_parameter_shape: Replaced flat 0.085/0.25 bounds with measured
  MoE-tensor-specific (mean, std).
- test_different_blocksizes: Same (mean, std) approach as test_4bit_quant.
- test_parametrization_forward_method: Replaced flat 0.08/0.25 bounds with
  small-tensor-specific (mean, std); small 64x64 tensors have ~16x higher
  relative std than 1024x1024 due to fewer quantization blocks.
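
The ~16x figure is consistent with a simple scaling argument, sketched below. The sketch assumes that the std of an error averaged over independent quantization blocks scales as 1/sqrt(number of blocks); that model, the helper names, and the blocksize of 64 are assumptions made for illustration, not something stated in the test code.

```python
import math


def n_blocks(rows: int, cols: int, blocksize: int) -> int:
    """Number of quantization blocks needed to cover a rows x cols tensor."""
    return math.ceil(rows * cols / blocksize)


def relative_std_ratio(small_shape, large_shape, blocksize: int = 64) -> float:
    """Ratio of relative std (small / large), assuming std ~ 1/sqrt(n_blocks)."""
    small = n_blocks(*small_shape, blocksize)
    large = n_blocks(*large_shape, blocksize)
    return math.sqrt(large / small)


# 1024x1024 has 256x as many elements (and blocks) as 64x64, and sqrt(256) = 16.
ratio = relative_std_ratio((64, 64), (1024, 1024))
assert ratio == 16.0
```

Under this model the ratio does not depend on the blocksize as long as both tensors divide evenly into blocks, since the blocksize cancels in the quotient.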
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: Fix ruff lint and format violations
- Replace ambiguous unicode multiplication sign with ASCII x
- Apply ruff format to long assert lines
- Fix test_linear4bit.py pre-existing format violation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent d77e01c commit a5a7f5d
2 files changed, +146 -119 lines changed