Commit db2050f
kubernetes: suppress tqdm progress bars in all container pods
Inject TQDM_DISABLE=1 and HF_DATASETS_DISABLE_PROGRESS_BARS=1 into every
Kubernetes container's environment unless the component has already set
those keys explicitly (user values take precedence).
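The precedence rule above can be sketched as a small merge over a container's `env` list. This is a minimal sketch that assumes env entries are plain dicts shaped like a pod manifest (`{"name": ..., "value": ...}` or `{"name": ..., "valueFrom": ...}`). The real code may use the Kubernetes client's `V1EnvVar` objects instead. The helper name `with_progress_bar_defaults` is hypothetical and does not come from the diff.

```python
from typing import Any, Dict, List

EnvVar = Dict[str, Any]

# Defaults injected into every container; variable names are from the commit message.
PROGRESS_BAR_DEFAULTS: Dict[str, str] = {
    "TQDM_DISABLE": "1",
    "HF_DATASETS_DISABLE_PROGRESS_BARS": "1",
}


def with_progress_bar_defaults(env: List[EnvVar]) -> List[EnvVar]:
    """Return a copy of a container's env list with the defaults appended.

    Any key the component already set (via "value" or "valueFrom") is left
    untouched, so user-supplied values take precedence.
    """
    already_set = {var["name"] for var in env}
    merged = [dict(var) for var in env]  # copy so the caller's list is not mutated
    for name, value in PROGRESS_BAR_DEFAULTS.items():
        if name not in already_set:
            merged.append({"name": name, "value": value})
    return merged


if __name__ == "__main__":
    user_env = [{"name": "TQDM_DISABLE", "value": "0"}]
    print(with_progress_bar_defaults(user_env))
```

In this example `TQDM_DISABLE` stays `"0"` because the component set it explicitly. Only `HF_DATASETS_DISABLE_PROGRESS_BARS` is appended.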
High-volume tqdm block-glyph output (█▉▊▋▌▍▎▏, 3-byte UTF-8) from
concurrent HF datasets workers (num_proc>1) is the dominant source of
non-ASCII bytes in pod log streams. Eliminating the glyphs at the source
makes the log stream pure ASCII for tokenization/packing phases, so torn
multi-byte sequences cannot reach the Kubernetes API read path at all,
independently of the defensive decode added in the previous commit.
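A torn multi-byte sequence looks like this in practice. Each block glyph is three bytes in UTF-8. A reader that decodes fixed-size chunks of a byte stream independently can therefore split a glyph across two chunks and fail, whereas pure ASCII can never be split mid-character. The 4-byte chunk size and the helper names are illustrative assumptions. This sketch is not the defensive decode from the previous commit, and it does not model how the Kubernetes API actually chunks log reads. The incremental decoder shows one standard way to carry partial sequences across chunk boundaries.

```python
import codecs
from typing import Iterable, List

GLYPHS = "█▉▊▋▌▍▎▏"

# Every tqdm block glyph encodes to three bytes in UTF-8.
assert all(len(ch.encode("utf-8")) == 3 for ch in GLYPHS)


def split_chunks(data: bytes, size: int) -> List[bytes]:
    """Cut a byte stream into fixed-size chunks, ignoring character boundaries."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def naive_decode(chunks: Iterable[bytes]) -> List[str]:
    """Decode each chunk on its own; raises if a chunk ends mid-character."""
    return [chunk.decode("utf-8") for chunk in chunks]


def incremental_decode(chunks: Iterable[bytes]) -> str:
    """Decode chunk by chunk, buffering incomplete sequences until the next chunk."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = "".join(decoder.decode(chunk) for chunk in chunks)
    return text + decoder.decode(b"", final=True)


if __name__ == "__main__":
    bar = " 50%|█████▌    | 500/1000".encode("utf-8")
    chunks = split_chunks(bar, 4)
    try:
        naive_decode(chunks)
    except UnicodeDecodeError as exc:
        print("torn sequence:", exc)
    print(incremental_decode(chunks))
```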
Side effect: log sizes for heavy tokenization jobs drop significantly
(observed ~6 MB → tens of KB), since tqdm progress bars account for the
bulk of the raw byte volume.
1 parent 7a5ff1b
1 file changed
Lines changed: 14 additions & 0 deletions
Hunk 1: 10 lines added after original line 71 (new lines 72-81).
Hunk 2: 4 lines added after original line 354 (new lines 365-368).