| 1 | +--- |
| 2 | +title: Egg-bin Windows shell probe hotspot |
| 3 | +type: workflow |
| 4 | +summary: How PR #6014 diagnosed hosted-Windows egg-bin startup slowness and why the final fix only presets SHELL. |
| 5 | +source_files: |
| 6 | + - tools/egg-bin/bin/run.js |
| 7 | + - tools/egg-bin/test/fixtures/my-egg-bin/bin/run.js |
| 8 | + - https://github.com/eggjs/egg/pull/6014 |
| 9 | + - https://github.com/eggjs/egg/actions/runs/28316987317 |
| 10 | + - https://github.com/eggjs/egg/actions/runs/28317525249 |
| 11 | +updated_at: 2026-06-28 |
| 12 | +status: active |
| 13 | +--- |
| 14 | + |
| 15 | +## Finding |
| 16 | + |
| 17 | +The Windows `test-egg-bin` hotspot in PR #6014 was repeated child-process CLI |
| 18 | +startup, not Vitest sharding, globby, or `egg-bin test` config construction. |
| 19 | +Hosted Windows runners started spawned `egg-bin` children without `SHELL`, which |
| 20 | +made oclif synchronously probe the parent process shell through PowerShell/CIM. |
| 21 | +Files such as `tools/egg-bin/test/commands/test.test.ts` amplify that cost |
| 22 | +because they spawn many child `egg-bin` processes. |
| 23 | + |
| 24 | +Inference: the hosted-runner slowdown looked like a long test file, but the |
| 25 | +dominant repeated cost happened before the command's own test work. Temporary |
| 26 | +CI-only timing showed normal `test` command globby/config work was tiny, while |
| 27 | +pre-command oclif startup dominated. |
| 28 | + |
| 29 | +## Final Fix |
| 30 | + |
| 31 | +The final code fix is deliberately small: |
| 32 | + |
| 33 | +- in `tools/egg-bin/bin/run.js`, preset `process.env.SHELL` on Windows when it is |
| 34 | + missing, deriving the shell name from `COMSPEC`/`ComSpec` or falling back to |
| 35 | + `cmd.exe` |
| 36 | +- do the same in `tools/egg-bin/test/fixtures/my-egg-bin/bin/run.js`, because the |
| 37 | + custom CLI fixture has its own oclif entrypoint |
| 38 | +- dynamically import `@oclif/core` after the preset, so oclif observes `SHELL` |
| 39 | + before it initializes |
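The preset described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `presetWindowsShell` is hypothetical, and taking the last path segment of `COMSPEC`/`ComSpec` is an assumption about what "deriving the shell name" means. The `env` and `platform` parameters exist only so the sketch can be exercised on any host OS.

```javascript
// Hypothetical helper; the real bin/run.js may inline this logic differently.
function presetWindowsShell(env = process.env, platform = process.platform) {
  // Only act on Windows, and never override a SHELL the caller already set.
  if (platform !== 'win32' || env.SHELL) return env.SHELL;

  // Windows exposes the command interpreter path as COMSPEC (often cased ComSpec).
  const comspec = env.COMSPEC || env.ComSpec || '';

  // Assumption: the shell name is the last path segment, e.g. "cmd.exe".
  const name = comspec.split(/[\\/]/).pop();

  // Fall back to cmd.exe when neither variable yields a usable name.
  env.SHELL = name || 'cmd.exe';
  return env.SHELL;
}

// Usage sketch for an ESM entrypoint (assumes the standard oclif `execute` bootstrap):
//   presetWindowsShell();
//   const { execute } = await import('@oclif/core');
//   await execute({ dir: import.meta.url });
```

The dynamic `import()` matters in an ES module because static `import` declarations are evaluated before any statement in the module body runs, so a static import of `@oclif/core` would load before the preset executes.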
| 40 | + |
| 41 | +The diagnostic instrumentation used during the investigation was removed from |
| 42 | +the final PR. The CI workflow stays as one Windows `test-egg-bin` leg; sharding |
| 43 | +is not needed after the shell-probe fix. |
| 44 | + |
| 45 | +## Evidence |
| 46 | + |
| 47 | +Observed on CI while diagnosing PR #6014: |
| 48 | + |
| 49 | +- Before the fix, `test/commands/test.test.ts` on Windows took about 1,077s. |
| 50 | +- After the main entrypoint preset, `test/commands/test.test.ts` dropped to about |
| 51 | + 42-48s. |
| 52 | +- After applying the same preset to the custom CLI fixture, `test/my-egg-bin.test.ts` |
| 53 | + dropped from about 299s to about 7.6s. |
| 54 | +- After removing the diagnostic code and keeping the minimal fix, the single |
| 55 | + Windows `test-egg-bin` job passed in about 2m41s; file-level timings included |
| 56 | + `test/commands/test.test.ts` at 44.2s, `test/my-egg-bin.test.ts` at 7.0s, and |
| 57 | + `test/commands/cov.test.ts` at 21.9s. |
| 58 | + |
| 59 | +Remaining slow cases are real test/app startup work, mainly TypeScript fixture |
| 60 | +cases around 9-10s, not repeated oclif shell probing. |