| 1 | +# Systematic Troubleshooting: Port 26654 (IPLD Resolver) Not Listening |
| 2 | + |
| 3 | +## Diagnostic Results (from your run) |
| 4 | + |
| 5 | +| Check | Result | |
| 6 | +|-------|--------| |
| 7 | +| Config `listen_addr` | ✓ `/ip4/0.0.0.0/tcp/26654` | |
| 8 | +| Config `subnet_id` | ✓ `/r314159/t410fjzsmxroshdmvdq5bg4zwqxx5lznwxaga4h7zgqa` | |
| 9 | +| Config `[resolver] enabled` | ✓ `true` | |
| 10 | +| Start script | ✓ Has correct env vars | |
| 11 | +| Manual `env FM_... ipc-cli node start` | ✗ Port 26654 still not listening | |
| 12 | +| Logs: "IPLD Resolver disabled" or "starting..." | ✗ **Neither appears** | |
| 13 | +| Logs: "snapshots disabled" source line in node.rs | ✗ Line **142** (remote binary) vs. line **243** (current code) | |
| 14 | + |
| 15 | +**ROOT CAUSE:** The remote binary was most likely built from a different branch (e.g. f3-lifecycle). The "snapshots disabled" message comes from node.rs line 142 in the remote binary but from line 243 in the current code, so the two sources differ, and neither resolver log message appears at all. The config and env vars are correct; the binary appears to lack the resolver startup code, or structures it differently. |
| 16 | + |
| 17 | +--- |
| 18 | + |
| 19 | +## Fix |
| 20 | + |
| 21 | +Rebuild the binary on validators from the branch that has the resolver code: |
| 22 | + |
| 23 | +```bash |
| 24 | +./ipc-manager update-binaries --branch feature/subnet-bootstrapping |
| 25 | +./ipc-manager restart --yes |
| 26 | +``` |
| 27 | + |
| 28 | +Then verify: |
| 29 | + |
| 30 | +```bash |
| 31 | +./ipc-manager check |
| 32 | +ssh philip@34.16.93.183 "ss -tuln | grep 26654" |
| 33 | +``` |
| 34 | + |
| 35 | +--- |
| 36 | + |
| 37 | +## Root Cause Logic (from fendermint) |
| 38 | + |
| 39 | +The resolver starts only when `resolver_enabled()` returns true: |
| 40 | +```rust |
| 41 | +// fendermint/app/settings/src/lib.rs:523-527 |
| 42 | +pub fn resolver_enabled(&self) -> bool { |
| 43 | + !self.resolver.connection.listen_addr.is_empty() |
| 44 | + && self.ipc.subnet_id != *ipc_api::subnet_id::UNDEF |
| 45 | +} |
| 46 | +``` |
| 47 | + |
| 48 | +**Both conditions must be true:** |
| 49 | +1. `resolver.connection.listen_addr` must be non-empty (e.g. `/ip4/0.0.0.0/tcp/26654`) |
| 50 | +2. `ipc.subnet_id` must not be `UNDEF` (the placeholder subnet ID with root 0 and no children) |
| 51 | + |
| 52 | +- If disabled, logs show: `"IPLD Resolver disabled."` |
| 53 | +- If enabled, logs show: `"starting the IPLD Resolver Service..."` |
| 54 | + |
| 55 | +--- |
| 56 | + |
| 57 | +## Step 1: Check Config on Remote |
| 58 | + |
| 59 | +SSH to validator-1 and inspect the fendermint config: |
| 60 | + |
| 61 | +```bash |
| 62 | +ssh philip@34.16.93.183 "sudo -u ipc cat /home/ipc/.ipc-node/fendermint/config/default.toml" |
| 63 | +``` |
| 64 | + |
| 65 | +**Look for:** |
| 66 | +- `[resolver]` or `[resolver.connection]` section |
| 67 | +- `listen_addr = "/ip4/0.0.0.0/tcp/26654"` (or similar) |
| 68 | +- `[ipc]` section with `subnet_id = "/r314159/t410fjzsmxroshdmvdq5bg4zwqxx5lznwxaga4h7zgqa"` |
| 69 | + |
| 70 | +**Grep for key sections:** |
| 71 | +```bash |
| 72 | +ssh philip@34.16.93.183 "sudo -u ipc grep -A5 '\[resolver\]' /home/ipc/.ipc-node/fendermint/config/default.toml" |
| 73 | +ssh philip@34.16.93.183 "sudo -u ipc grep -A2 '\[ipc\]' /home/ipc/.ipc-node/fendermint/config/default.toml" |
| 74 | +ssh philip@34.16.93.183 "sudo -u ipc grep listen_addr /home/ipc/.ipc-node/fendermint/config/default.toml" |
| 75 | +``` |
| 76 | + |
| 77 | +--- |
| 78 | + |
| 79 | +## Step 2: Check Logs for Resolver Decision (CRITICAL) |
| 80 | + |
| 81 | +```bash |
| 82 | +# Resolver decision |
| 83 | +ssh philip@34.16.93.183 "sudo -u ipc grep -E 'IPLD Resolver|resolver' /home/ipc/.ipc-node/logs/*.log 2>/dev/null | tail -20" |
| 84 | + |
| 85 | +# Also check startup logs |
| 86 | +ssh philip@34.16.93.183 "sudo -u ipc tail -n 100 /home/ipc/.ipc-node/logs/*.app.log 2>/dev/null | grep -E 'Resolver|resolver|listen|26654'" |
| 87 | +``` |
| 88 | + |
| 89 | +**Interpretation:** |
| 90 | +- `"IPLD Resolver disabled."` → resolver_enabled() returned false (listen_addr empty and/or subnet_id UNDEF) |
| 91 | +- `"starting the IPLD Resolver Service..."` → resolver started (port issue may be elsewhere) |
| 92 | + |
| 93 | +**If logs show "disabled":** The binary is loading config but resolver_enabled() is false. Possible causes: |
| 94 | +- `validator.toml` or `local.toml` overrides and clears listen_addr |
| 95 | +- Config parsing bug (e.g. Multiaddr type) |
| 96 | +- Different binary (f3-lifecycle) with different logic |
| 97 | + |
| 98 | +**If logs show "starting...":** Resolver runs but port doesn't bind. Check for "IPLD Resolver Service failed" or bind errors. |
| 99 | + |
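
The interpretation above can be sketched as a small shell helper that reduces a log file to one of three states. This is a minimal sketch, not part of `ipc-manager` or `ipc-cli`: the function name `resolver_log_state` is hypothetical, and it assumes the two messages appear in the log exactly as quoted in this document. It reports the most recent matching line, so an older startup in the same file does not mask the latest one.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustrative only): classify a fendermint log file by
# the last resolver-related startup message it contains.
# Prints one of: disabled | starting | neither
resolver_log_state() {
  local file="$1" last
  # Take the last matching line; "|| true" keeps the helper usable under "set -e".
  last="$(grep -E 'IPLD Resolver disabled|starting the IPLD Resolver Service' "$file" 2>/dev/null | tail -n 1 || true)"
  case "$last" in
    *"IPLD Resolver disabled"*)             echo disabled ;;
    *"starting the IPLD Resolver Service"*) echo starting ;;
    *)                                      echo neither ;;
  esac
}
```

A result of `neither` corresponds to the case recorded in the diagnostic results, where no resolver message is logged at all.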
| 100 | +--- |
| 101 | + |
| 102 | +## Step 3: Check Start Script (What Actually Runs) |
| 103 | + |
| 104 | +```bash |
| 105 | +ssh philip@34.16.93.183 "sudo -u ipc cat /home/ipc/.ipc-node/start-node.sh 2>/dev/null || echo 'File not found'" |
| 106 | +``` |
| 107 | + |
| 108 | +**Verify:** Does it contain `export FM_RESOLVER__CONNECTION__LISTEN_ADDR` and `export FM_IPC__SUBNET_ID`? |
| 109 | + |
| 110 | +--- |
| 111 | + |
| 112 | +## Step 4: Check How Node Is Currently Running |
| 113 | + |
| 114 | +```bash |
| 115 | +ssh philip@34.16.93.183 "ps aux | grep 'ipc-cli node start' | grep -v grep" |
| 116 | +``` |
| 117 | + |
| 118 | +**Check:** Is the process started by start-node.sh or by a direct nohup command? (env vars only apply if set before the process starts) |
| 119 | + |
| 120 | +--- |
| 121 | + |
| 122 | +## Step 5: Manual Test – Run With Explicit Env Vars |
| 123 | + |
| 124 | +Stop the node, then run manually with env vars to isolate whether config or env is the issue: |
| 125 | + |
| 126 | +```bash |
| 127 | +# On validator-1 (34.16.93.183) |
| 128 | +ssh philip@34.16.93.183 |
| 129 | + |
| 130 | +# Stop existing node |
| 131 | +sudo pkill -f "ipc-cli node start" || true |
| 132 | +sleep 3 |
| 133 | + |
| 134 | +# Run as ipc user with explicit env vars (no wrapper script) |
| 135 | +sudo -u ipc env \ |
| 136 | + FM_RESOLVER__CONNECTION__LISTEN_ADDR=/ip4/0.0.0.0/tcp/26654 \ |
| 137 | + FM_IPC__SUBNET_ID=/r314159/t410fjzsmxroshdmvdq5bg4zwqxx5lznwxaga4h7zgqa \ |
| 138 | + /home/ipc/ipc/target/release/ipc-cli node start --home /home/ipc/.ipc-node |
| 139 | + |
| 140 | +# Let it run 15-20 seconds, then Ctrl+C to stop |
| 141 | +# In another terminal, check port: |
| 142 | +# ssh philip@34.16.93.183 "ss -tuln | grep 26654" |
| 143 | +``` |
| 144 | + |
| 145 | +- **If port 26654 appears:** Env vars work; the wrapper script or how it's invoked is the problem. |
| 146 | +- **If port 26654 does NOT appear:** Env vars are loaded last and override the config files, so the binary (e.g. built from the f3-lifecycle branch) is the more likely cause. |
| 147 | + |
| 148 | +--- |
| 149 | + |
| 150 | +## Step 6: Check for Override Configs |
| 151 | + |
| 152 | +Config load order: default.toml → validator.toml → local.toml → env. Later overrides can clear earlier values. |
| 153 | + |
| 154 | +```bash |
| 155 | +ssh philip@34.16.93.183 "sudo -u ipc ls -la /home/ipc/.ipc-node/fendermint/config/" |
| 156 | +ssh philip@34.16.93.183 "sudo -u ipc cat /home/ipc/.ipc-node/fendermint/config/validator.toml 2>/dev/null || echo 'No validator.toml'" |
| 157 | +ssh philip@34.16.93.183 "sudo -u ipc cat /home/ipc/.ipc-node/fendermint/config/local.toml 2>/dev/null || echo 'No local.toml'" |
| 158 | +``` |
| 159 | + |
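
To see which file layer wins for a given key, the load order above can be walked in a short shell function. This is a rough sketch under stated assumptions: `last_setter` is a hypothetical name, it matches plain `key = value` lines without tracking which TOML section they belong to, and it does not account for env vars, which are applied after all three files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustrative only): print the last config file, in load
# order, that assigns the given key. Prints "none" if no file sets it.
# Usage: last_setter <config-dir> <key>
last_setter() {
  local dir="$1" key="$2" winner="none" name
  for name in default.toml validator.toml local.toml; do
    # A later file that assigns the key overrides any earlier one.
    if [ -f "$dir/$name" ] && grep -qE "^[[:space:]]*${key}[[:space:]]*=" "$dir/$name"; then
      winner="$name"
    fi
  done
  echo "$winner"
}
```

Run on the validator as the `ipc` user, for example `last_setter /home/ipc/.ipc-node/fendermint/config listen_addr`. If the winner is `validator.toml` or `local.toml`, inspect that file for an empty value.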
| 160 | +## Step 7: Check Binary / Branch |
| 161 | + |
| 162 | +```bash |
| 163 | +# Fix safe.directory first, then check branch |
| 164 | +ssh philip@34.16.93.183 "sudo -u ipc git -C /home/ipc/ipc config --global --add safe.directory /home/ipc/ipc 2>/dev/null; sudo -u ipc bash -c 'cd /home/ipc/ipc && git branch -v && git log -1 --oneline'" |
| 165 | +``` |
| 166 | + |
| 167 | +**Note:** If validators run `f3-lifecycle` (or another branch), resolver logic may differ from `feature/subnet-bootstrapping`. |
| 168 | + |
| 169 | +--- |
| 170 | + |
| 171 | +## Step 8: Check Default Config Template |
| 172 | + |
| 173 | +If the node was initialized with a different node-init configuration, the default.toml may have been generated without resolver settings: |
| 174 | + |
| 175 | +```bash |
| 176 | +ssh philip@34.16.93.183 "sudo -u ipc head -100 /home/ipc/.ipc-node/fendermint/config/default.toml" |
| 177 | +``` |
| 178 | + |
| 179 | +--- |
| 180 | + |
| 181 | +## Summary: Decision Tree |
| 182 | + |
| 183 | +| Config has listen_addr? | Config has subnet_id? | Log says "disabled"? | Likely cause | |
| 184 | +|-------------------------|----------------------|----------------------|--------------| |
| 185 | +| No / empty | - | Yes | Config missing resolver.connection.listen_addr | |
| 186 | +| Yes | No / UNDEF | Yes | Config missing ipc.subnet_id | |
| 187 | +| Yes | Yes | Yes | Env override not applied (script/quoting) or binary differs | |
| 188 | +| Yes | Yes | No ("starting...") | Resolver starts but port bind fails (e.g. permission, conflict) | |
| 189 | + |
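
The decision table above can also be written as a lookup function, which makes the branches explicit. This is a minimal sketch: `likely_cause` and its three arguments are hypothetical names chosen for illustration. It adds one case the table does not list, taken from the diagnostic results at the top of this document: neither log message appears, which points at the binary.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustrative only): map three observations to the
# likely cause from the decision table.
#   $1 has_listen_addr : yes | no
#   $2 has_subnet_id   : yes | no
#   $3 log_state       : disabled | starting | neither
likely_cause() {
  local has_listen_addr="$1" has_subnet_id="$2" log_state="$3"
  case "$log_state" in
    neither)
      # Not a table row; this is the case from the diagnostic results.
      echo "binary lacks resolver code (built from a different branch)"
      return 0 ;;
    starting)
      echo "resolver starts but port bind fails"
      return 0 ;;
    disabled) ;;
    *)
      echo "unknown log state: $log_state" >&2
      return 2 ;;
  esac
  if [ "$has_listen_addr" != yes ]; then
    echo "config missing resolver.connection.listen_addr"
  elif [ "$has_subnet_id" != yes ]; then
    echo "config missing ipc.subnet_id"
  else
    echo "env override not applied or binary differs"
  fi
}
```

For the situation recorded in the diagnostic results, the call would be `likely_cause yes yes neither`.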
| 190 | +--- |
| 191 | + |
| 192 | +## After Finding Root Cause |
| 193 | + |
| 194 | +1. **If config is wrong:** Fix default.toml (or re-run node init with correct node-init.yml) |
| 195 | +2. **If env vars are not applied:** Fix the start script invocation (wrapper script, quoting, or use systemd with Environment=) |
| 196 | +3. **If binary/branch differs:** Build from feature/subnet-bootstrapping or adapt to that branch's config |