
Commit e6c9dd4

ThomasK33 and claude authored
fix(server): retry port binding on EADDRINUSE for parallel instances (#283) (#284)
## Summary

Fixes #283: a second Neovim instance failing to start the WebSocket server with `Failed to listen on port 48811: EADDRINUSE: address already in use` when another instance is already running.

## Root cause

Three contributing factors; only the first is new:

1. **Regression (introduced in #282).** `find_available_port` was rewritten from a shuffle (`utils.shuffle_array`, which seeded the PRNG via `math.randomseed(os.time())` as a side effect) into a bare, **unseeded** `math.random`. Under LuaJIT's fixed default seed, every fresh Neovim process draws the identical offset and selects the *identical* port, so parallel instances always collide.
2. **Broken probe (pre-existing).** The throwaway bind probe cannot detect an active listener: libuv's `uv_tcp_bind` defers `EADDRINUSE` into `delayed_error` and returns success, so the error only surfaces at `listen()`, which is why the message reads "Failed to listen", not "Failed to bind".
3. **No retry (pre-existing).** `create_server` committed to a single pre-probed port and gave up on failure.

The lost seeding turned a previously-rare same-second collision into a deterministic, every-time failure.

## Fix

- Seed the PRNG once per process in `tcp.lua` (`os.time` plus a sub-second `hrtime` jitter; `hrtime` is guarded because some test stubs omit it), restoring per-process port spread.
- `create_server` now binds **and** listens on a single fresh handle per candidate port and advances to the next candidate on failure, so racing instances recover instead of erroring out. The bind-only `find_available_port` is retained as a best-effort pre-filter only (it cannot detect a live listener).
- Removed the now-orphaned `utils.shuffle_array` (dead code since #282).
- The candidate iterator stays a closure rather than materializing the ~55k-entry range, preserving the startup-cost improvement from #282.

## Verification

- `mise run all`: 658 tests pass, luacheck clean (0 warnings), formatted.
- New regression tests in `tests/unit/server/tcp_spec.lua`: retry past a busy candidate, range exhaustion (clear error, every handle closed), the bind-success/`listen()`-`EADDRINUSE` case, and listening-handle identity.
- `scripts/repro_issue_283.sh` (added in the first commit) now **exits 0**: two parallel instances start on *different* ports instead of colliding; it exited 1 before the fix.

A full AI-generated triage with the mechanism analysis (including libuv `delayed_error` source references) is posted on #283.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Thomas Kosiewski <tk@coder.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
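The first root cause can be reproduced outside Neovim. The following is a minimal sketch in plain Lua, not the plugin's code: `first_offset` and `mix_seed` are hypothetical helper names, and the fixed seed `42` stands in for whatever seed a process happens to start with. It shows that two processes seeding identically draw the same starting offset, and one way a wall-clock value and a sub-second jitter could be combined into a single seed, as the fix describes.

```lua
-- Hypothetical helper: seed the PRNG, then draw a start offset the same way
-- the port selection does (math.random(port_count) - 1).
local function first_offset(seed, port_count)
  math.randomseed(seed)
  return math.random(port_count) - 1
end

-- Hypothetical helper: combine whole seconds with a microsecond-scale jitter
-- so that two launches within the same second still get different seeds.
local function mix_seed(seconds, jitter)
  return (seconds * 1000000) + (jitter % 1000000)
end

local port_count = 65535 - 10000 + 1 -- default range: 55536 candidates

-- Same seed in both "processes" means the same offset, hence the same port.
local offset_a = first_offset(42, port_count)
local offset_b = first_offset(42, port_count)
print(offset_a == offset_b) -- true

-- Same second, different sub-second jitter: the seeds differ.
local seed_a = mix_seed(1700000000, 123456)
local seed_b = mix_seed(1700000000, 654321)
print(seed_a ~= seed_b) -- true
```

Distinct seeds make a collision unlikely rather than impossible, which is why the fix pairs seeding with a retry in `create_server`.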
1 parent 7644451 commit e6c9dd4

6 files changed

Lines changed: 658 additions & 59 deletions


fixtures/issue-283/init.lua

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
+-- Fixture for issue #283:
+-- "find_available_port probe-then-rebind races; create_server has no retry ->
+-- EADDRINUSE with parallel Neovim instances (regression in #282)"
+-- https://github.com/coder/claudecode.nvim/issues/283
+--
+-- This fixture starts the REAL claudecode WebSocket server on launch and prints
+-- a big banner showing whether THIS instance got a listening port or failed.
+--
+-- Reproduction (from repo root), in TWO terminals:
+-- source fixtures/nvim-aliases.sh
+-- vv issue-283 # terminal 1 -> "LISTENING on port 48811"
+-- vv issue-283 # terminal 2 -> "FAILED ... Failed to listen on port 48811: EADDRINUSE"
+--
+-- Because #282 dropped the per-process RNG seeding, every fresh Neovim picks the
+-- SAME port (48811 with the default 10000-65535 range), so the second instance
+-- always collides. The probe in find_available_port cannot notice the first
+-- instance's listener (libuv defers EADDRINUSE to listen()), and create_server
+-- does not retry, so the integration never starts in instance 2.
+--
+-- :ReproStatus re-print this instance's server status
+-- :ReproStop stop this instance's server (frees the port / lockfile)
+
+local config_dir = vim.fn.stdpath("config")
+local repo_root = vim.fn.fnamemodify(config_dir, ":h:h")
+vim.opt.rtp:prepend(repo_root)
+
+vim.g.mapleader = " "
+vim.g.maplocalleader = "\\"
+vim.o.showtabline = 0
+vim.o.laststatus = 2
+
+local ok, claudecode = pcall(require, "claudecode")
+assert(ok, "Failed to load claudecode.nvim from repo root: " .. tostring(claudecode))
+
+-- auto_start = false so we can call start() ourselves and capture its result.
+claudecode.setup({
+  auto_start = false,
+  log_level = "info",
+  terminal = {
+    provider = "native",
+    auto_close = false,
+  },
+})
+
+local started_ok, started_info = claudecode.start(false)
+
+local function status_lines()
+  local running = claudecode.state and claudecode.state.server ~= nil
+  local port = claudecode.state and claudecode.state.port or nil
+  local lines = {
+    "claudecode.nvim -- issue #283 reproduction fixture",
+    "",
+    "Run `vv issue-283` in a SECOND terminal while this one is open.",
+    "",
+  }
+  if started_ok and running then
+    lines[#lines + 1] = "THIS INSTANCE: ✅ server LISTENING on port " .. tostring(port)
+    lines[#lines + 1] = ""
+    lines[#lines + 1] = "Now open a second instance: it should FAIL on the same port"
+    lines[#lines + 1] = "with EADDRINUSE, because every fresh Neovim deterministically"
+    lines[#lines + 1] = "selects this same port (lost RNG seeding in #282)."
+  else
+    lines[#lines + 1] = "THIS INSTANCE: ❌ server FAILED to start"
+    lines[#lines + 1] = ""
+    lines[#lines + 1] = " " .. tostring(started_info)
+    lines[#lines + 1] = ""
+    lines[#lines + 1] = "This is #283: another Neovim already holds this port, the probe"
+    lines[#lines + 1] = "could not detect it, and create_server did not retry."
+  end
+  lines[#lines + 1] = ""
+  lines[#lines + 1] = ":ReproStatus re-print status :ReproStop stop this server"
+  return lines, (started_ok and running)
+end
+
+local function show_banner()
+  local lines, good = status_lines()
+  vim.bo.modifiable = true
+  vim.api.nvim_buf_set_lines(0, 0, -1, false, lines)
+  vim.bo.modifiable = false
+  vim.bo.modified = false
+  -- Keep the echo SHORT (port only) so it stays below the hit-enter threshold;
+  -- the full error text lives in the banner buffer above.
+  local msg = good and ("issue283: LISTENING on port " .. tostring(claudecode.state.port))
+    or "issue283: FAILED -- port in use (EADDRINUSE); see buffer above"
+  vim.api.nvim_echo({ { msg, good and "MoreMsg" or "ErrorMsg" } }, false, {})
+end
+
+vim.api.nvim_create_user_command("ReproStatus", show_banner, { desc = "Re-print #283 server status" })
+vim.api.nvim_create_user_command("ReproStop", function()
+  claudecode.stop()
+  vim.api.nvim_echo({ { "issue283: server stopped", "MoreMsg" } }, false, {})
+end, { desc = "Stop this instance's server (#283)" })
+
+-- Populate the buffer synchronously at load time so it is already non-empty when
+-- startup finishes -- this suppresses Neovim's intro screen without depending on
+-- a deferred redraw (a hit-enter prompt from the plugin's own error log can
+-- otherwise block a scheduled callback). The plugin's native error message still
+-- appears in the message area, exactly as a real user sees it.
+show_banner()

lua/claudecode/server/tcp.lua

Lines changed: 115 additions & 40 deletions
@@ -13,7 +13,58 @@ local M = {}
 ---@field on_disconnect function Callback for client disconnections
 ---@field on_error fun(err_msg: string) Callback for errors
 
----Find an available port by attempting to bind
+-- Seed Lua's PRNG exactly once per process. #282 removed the implicit seeding
+-- that used to happen via utils.shuffle_array (math.randomseed(os.time())), which
+-- left LuaJIT's fixed default seed in place -- so every fresh Neovim picked the
+-- *same* starting port and parallel instances always collided (#283). Mixing in a
+-- sub-second source avoids two instances launched in the same second seeding
+-- identically. hrtime is guarded because some test stubs omit it.
+local rng_seeded = false
+local function ensure_rng_seeded()
+  if rng_seeded then
+    return
+  end
+  local jitter
+  local ok, hr = pcall(function()
+    return vim.loop and vim.loop.hrtime and vim.loop.hrtime()
+  end)
+  if ok and type(hr) == "number" then
+    jitter = hr % 1000000
+  else
+    jitter = math.floor((os.clock() % 1) * 1000000)
+  end
+  math.randomseed((os.time() * 1000000) + jitter)
+  rng_seeded = true
+end
+
+-- Iterate the port range exactly once, starting from a random offset and wrapping
+-- around. Returns a closure rather than materializing the range: the default
+-- 10000-65535 range is ~55k entries, and building/shuffling it on every startup
+-- was the cost #282 set out to remove.
+local function port_iterator(min_port, max_port)
+  local port_count = max_port - min_port + 1
+  if port_count <= 0 then
+    return function()
+      return nil
+    end
+  end
+  ensure_rng_seeded()
+  local start_offset = math.random(port_count) - 1
+  local checked = -1
+  return function()
+    checked = checked + 1
+    if checked >= port_count then
+      return nil
+    end
+    return min_port + ((start_offset + checked) % port_count)
+  end
+end
+
+---Find an available port using a best-effort bind probe.
+---NOTE: this is only a pre-filter. A successful throwaway bind does NOT guarantee
+---the port is free: libuv's bind() defers EADDRINUSE to listen()/connect(), so a
+---port another process is actively listening on still passes this probe. The
+---authoritative check is create_server's bind+listen with retry.
 ---@param min_port number Minimum port to try
 ---@param max_port number Maximum port to try
 ---@return number|nil port Available port number, or nil if none found
@@ -25,14 +76,7 @@ function M.find_available_port(min_port, max_port)
     return nil
   end
 
-  local port_count = max_port - min_port + 1
-  local start_offset = math.random(port_count) - 1
-
-  -- Pick a random starting point, then scan the range once. This keeps the
-  -- selection spread across the configured range without building and shuffling
-  -- a 55k-entry table for the default 10000-65535 range on every startup.
-  for checked = 0, port_count - 1 do
-    local port = min_port + ((start_offset + checked) % port_count)
+  for port in port_iterator(min_port, max_port) do
     local test_server = vim.loop.new_tcp()
     if test_server then
       local success = test_server:bind("127.0.0.1", port)
@@ -47,27 +91,58 @@ function M.find_available_port(min_port, max_port)
   return nil
 end
 
+---Bind AND listen on a single fresh TCP handle, returning that same handle.
+---Binding then listening on one socket (instead of probing a throwaway socket and
+---re-binding) is what makes a busy port detectable: libuv's bind() defers
+---EADDRINUSE to listen(), so the listen() call is the real test, and keeping the
+---handle we listened on removes the probe/rebind TOCTOU window.
+---@param server TCPServer The server object whose connection handler to wire up
+---@param port number Port to bind and listen on
+---@return table|nil handle The bound+listening TCP handle, or nil on failure
+---@return string|nil error Error message if failed
+function M._bind_and_listen(server, port)
+  local handle = vim.loop.new_tcp()
+  if not handle then
+    return nil, "Failed to create TCP server"
+  end
+
+  local bind_success, bind_err = handle:bind("127.0.0.1", port)
+  if not bind_success then
+    handle:close()
+    return nil, "Failed to bind to port " .. port .. ": " .. (bind_err or "unknown error")
+  end
+
+  local listen_success, listen_err = handle:listen(128, function(err)
+    if err then
+      server.on_error("Listen error: " .. err)
+      return
+    end
+
+    M._handle_new_connection(server)
+  end)
+
+  if not listen_success then
+    handle:close()
+    return nil, "Failed to listen on port " .. port .. ": " .. (listen_err or "unknown error")
+  end
+
+  return handle, nil
+end
+
 ---Create and start a TCP server
 ---@param config ClaudeCodeConfig Server configuration
 ---@param callbacks table Callback functions
 ---@param auth_token string|nil Authentication token for validating connections
 ---@return TCPServer|nil server The server object, or nil on error
 ---@return string|nil error Error message if failed
 function M.create_server(config, callbacks, auth_token)
-  local port = M.find_available_port(config.port_range.min, config.port_range.max)
-  if not port then
-    return nil, "No available ports in range " .. config.port_range.min .. "-" .. config.port_range.max
-  end
-
-  local tcp_server = vim.loop.new_tcp()
-  if not tcp_server then
-    return nil, "Failed to create TCP server"
-  end
+  local min_port = config.port_range.min
+  local max_port = config.port_range.max
 
-  -- Create server object
+  -- Build the server object up front so the listen callback can close over it.
   local server = {
-    server = tcp_server,
-    port = port,
+    server = nil,
+    port = nil,
     auth_token = auth_token,
     clients = {},
     on_message = callbacks.on_message or function() end,
@@ -76,28 +151,28 @@ function M.create_server(config, callbacks, auth_token)
     on_error = callbacks.on_error or function() end,
   }
 
-  local bind_success, bind_err = tcp_server:bind("127.0.0.1", port)
-  if not bind_success then
-    tcp_server:close()
-    return nil, "Failed to bind to port " .. port .. ": " .. (bind_err or "unknown error")
-  end
-
-  -- Start listening
-  local listen_success, listen_err = tcp_server:listen(128, function(err)
-    if err then
-      callbacks.on_error("Listen error: " .. err)
-      return
+  -- Walk candidate ports and bind+listen on each until one succeeds. Retrying
+  -- here (rather than committing to a single pre-probed port) is what fixes #283:
+  -- when several Neovim instances race for the same port, the losers just advance
+  -- to the next candidate instead of giving up with EADDRINUSE.
+  local last_err
+  local tried_any = false
+  for port in port_iterator(min_port, max_port) do
+    tried_any = true
+    local handle, err = M._bind_and_listen(server, port)
+    if handle then
+      server.server = handle
+      server.port = port
+      return server, nil
     end
-
-    M._handle_new_connection(server)
-  end)
-
-  if not listen_success then
-    tcp_server:close()
-    return nil, "Failed to listen on port " .. port .. ": " .. (listen_err or "unknown error")
+    last_err = err
   end
 
-  return server, nil
+  if not tried_any then
+    return nil, "No available ports in range " .. min_port .. "-" .. max_port
+  end
+  return nil,
+    "Failed to bind to any port in range " .. min_port .. "-" .. max_port .. ": " .. (last_err or "unknown error")
 end
 
 ---Handle a new client connection
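
The retry behaviour in this file can be exercised without libuv. The sketch below is an illustration in plain Lua under stated assumptions, not the plugin's code: `fake_bind_and_listen` and `pick_port` are hypothetical names, the `busy` table stands in for ports held by other processes, and `start_offset` is passed in explicitly (the real iterator draws it from `math.random`) so the walk is reproducible.

```lua
-- Wrap-around iterator over [min_port, max_port], visiting each port once.
-- start_offset is a parameter here only to keep the example deterministic.
local function port_iterator(min_port, max_port, start_offset)
  local port_count = max_port - min_port + 1
  if port_count <= 0 then
    return function()
      return nil
    end
  end
  local checked = -1
  return function()
    checked = checked + 1
    if checked >= port_count then
      return nil
    end
    return min_port + ((start_offset + checked) % port_count)
  end
end

-- Hypothetical stand-in for bind+listen: ports listed in `busy` fail.
local function fake_bind_and_listen(busy, port)
  if busy[port] then
    return nil, "Failed to listen on port " .. port .. ": EADDRINUSE"
  end
  return { port = port }, nil
end

-- Try each candidate until one succeeds; report the last error on exhaustion.
local function pick_port(min_port, max_port, start_offset, busy)
  local last_err
  for port in port_iterator(min_port, max_port, start_offset) do
    local handle, err = fake_bind_and_listen(busy, port)
    if handle then
      return handle.port, nil
    end
    last_err = err
  end
  return nil, last_err or "empty port range"
end

-- Walk order is 10003, 10004, 10000, ...; the first two are busy.
local port = pick_port(10000, 10004, 3, { [10003] = true, [10004] = true })
print(port) -- 10000

-- Every candidate busy: the caller gets nil plus the last error seen.
local none, err = pick_port(10000, 10001, 0, { [10000] = true, [10001] = true })
print(none, err) -- nil   Failed to listen on port 10001: EADDRINUSE
```

Because the failing candidate is simply skipped, an instance that loses a race for one port ends up on the next free one instead of surfacing `EADDRINUSE` to the user.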

lua/claudecode/server/utils.lua

Lines changed: 0 additions & 19 deletions
@@ -386,25 +386,6 @@ function M.apply_mask(data, mask)
   return table.concat(result)
 end
 
-local rng_seeded = false
-
----Shuffle an array in place using Fisher-Yates algorithm
----@param tbl table The array to shuffle
-function M.shuffle_array(tbl)
-  -- Seed the PRNG once per process so port selection order varies across editor
-  -- starts. Seeding lazily on first use (rather than on every call, as a prior
-  -- version did with os.time()) avoids identical orderings within the same
-  -- second while still giving each process a distinct sequence.
-  if not rng_seeded then
-    math.randomseed(os.time())
-    rng_seeded = true
-  end
-  for i = #tbl, 2, -1 do
-    local j = math.random(i)
-    tbl[i], tbl[j] = tbl[j], tbl[i]
-  end
-end
-
 ---Compare two strings in constant time relative to their length.
 ---Returns false immediately on a length mismatch; otherwise every byte is
 ---examined so total work does not depend on the matching-prefix length.
