Commit fe11a6b
kubernetes: decode pod logs with errors=replace to survive torn UTF-8 bytes
The Kubernetes client's default read_namespaced_pod_log path does a strict
.decode('utf8') over the full log payload before checking HTTP status.
When a pod with high-volume tqdm progress bars (block glyphs █▉▊▋▌▍▎▏,
3-byte UTF-8) runs with num_proc>1, concurrent writes to the same fd can
split a multi-byte glyph across a chunk boundary, leaving an orphaned
continuation byte. The strict decode throws UnicodeDecodeError, which
bubbles through the log-upload retry wrapper and marks an otherwise-healthy
training run as SYSTEM_ERROR.
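
The failure mode can be reproduced without a cluster. The sketch below is illustrative only: the payload is hand-built to mimic two writers interleaving on one fd, and the helper names (make_torn_payload, strict_decode_fails) are hypothetical, not part of the patched code.

```python
def make_torn_payload() -> bytes:
    """Build a log payload in which a 3-byte block glyph is split by another write."""
    block = "█".encode("utf-8")  # three bytes: one lead byte, two continuation bytes
    assert len(block) == 3
    # First two bytes of the glyph, then another worker's line, then the
    # orphaned continuation byte that completes nothing.
    return b"10%|" + block[:2] + b"worker-1: step 5\n" + block[2:] + b"| 1/10\n"


def strict_decode_fails(payload: bytes) -> bool:
    """Return True if a strict UTF-8 decode of the payload raises."""
    try:
        payload.decode("utf8")
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    payload = make_torn_payload()
    print("strict decode fails:", strict_decode_fails(payload))
    # errors="replace" substitutes U+FFFD for the undecodable bytes and keeps going.
    print(payload.decode("utf8", errors="replace"))
```

The intact ASCII text on both sides of the torn glyph survives the lenient decode, which is why losing one progress-bar character is preferable to failing the whole log upload.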
Fix: pass _preload_content=False to get the raw urllib3 response and decode
manually with errors="replace". This is applied to both the single-pod
(LaunchedKubernetesContainer.get_log) and multi-pod
(LaunchedKubernetesJob._get_log_by_pod_key) log-read paths.
A warning is logged whenever replacement characters are injected, so the
next occurrence is observable in Observe without requiring a separate
debug build.
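
A minimal sketch of the approach described above, not the actual patch (the diff body is not reproduced here). The helper names decode_log_bytes and read_pod_log are hypothetical; core_api is assumed to be a kubernetes CoreV1Api-like object whose read_namespaced_pod_log returns a raw urllib3 response exposing .data when called with _preload_content=False.

```python
import logging

logger = logging.getLogger(__name__)


def decode_log_bytes(raw: bytes, pod_name: str) -> str:
    """Decode pod log bytes, tolerating torn multi-byte UTF-8 sequences."""
    try:
        # Fast path: most payloads are valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Log a warning so the occurrence is visible, then fall back to a
        # lenient decode that substitutes U+FFFD for undecodable bytes.
        logger.warning(
            "Pod %s log has invalid UTF-8 at byte %d; decoding with errors='replace'",
            pod_name,
            exc.start,
        )
        return raw.decode("utf-8", errors="replace")


def read_pod_log(core_api, name: str, namespace: str) -> str:
    """Fetch a pod log as raw bytes and decode it leniently (hypothetical wrapper)."""
    # _preload_content=False asks the client to skip its own strict decode
    # and hand back the underlying response object instead of a str.
    resp = core_api.read_namespaced_pod_log(
        name=name, namespace=namespace, _preload_content=False
    )
    return decode_log_bytes(resp.data, name)
```

Trying the strict decode first, rather than counting U+FFFD in the output, avoids a false warning when a log legitimately contains a correctly encoded replacement character.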
The existing "Bad Request" catch for PodInitializing is unaffected:
the kubernetes client's status check runs outside the _preload_content
block and still raises ApiException with the correct reason phrase.
1 parent decae68 commit fe11a6b
1 file changed
Lines changed: 26 additions & 3 deletions
| Original file lines | New file lines | Additions | Deletions |
|---|---|---|---|
| 922-928 | 922-929 | 2 | 1 |
| 931-937 | 932-948 | 10 | 0 |
| 1490-1503 | 1501-1526 | 14 | 2 |