Merged
23 commits
8dcebcb
feat(ensrainbow): implement background database bootstrapping and new…
djstrong Apr 20, 2026
3c23870
Merge branch 'main' into 1610-start-ensrainbow-server-immediately-and…
djstrong Apr 20, 2026
5ec830f
feat(ensrainbow): enhance readiness checks and API response handling
djstrong Apr 20, 2026
243667f
feat(ensrainbow): implement graceful shutdown during bootstrap
djstrong Apr 20, 2026
1f9d822
refactor(ensrainbow): implement closeHttpServer utility for graceful …
djstrong Apr 20, 2026
d3f9c0d
fix(ensrainbow): improve database handling during bootstrap failure
djstrong Apr 20, 2026
74cf754
fix(ensrainbow): improve database closure handling in the close method
djstrong Apr 20, 2026
89ac55f
fix(ensrainbow): enhance shutdown handling and error management
djstrong Apr 20, 2026
a438a6a
fix(ensrainbow): enhance signal handling and database extraction cleanup
djstrong Apr 24, 2026
ac844ce
fix(ensrainbow): improve logging and error handling during database o…
djstrong Apr 24, 2026
2b498f9
fix(ensrainbow): refine readiness checks and documentation updates
djstrong Apr 24, 2026
cac82ed
fix(ensrainbow): enhance public config readiness handling
djstrong Apr 24, 2026
83a4453
Merge branch 'main' into 1610-start-ensrainbow-server-immediately-and…
djstrong Apr 24, 2026
66be6f7
Merge branch 'main' into 1610-start-ensrainbow-server-immediately-and…
djstrong Apr 29, 2026
a9c46de
refactor: enhance entrypoint command to include DB config in bootstra…
djstrong Apr 29, 2026
762036a
feat: introduce new /ready endpoint and enhance Docker entrypoint for…
djstrong Apr 29, 2026
c181652
refactor: remove unused comments and streamline code in entrypoint co…
djstrong Apr 29, 2026
f05bfe7
refactor: update test setup to use temporary directories for entrypoi…
djstrong Apr 29, 2026
1453be2
feat: implement structured error handling in EnsRainbowApiClient and …
djstrong Apr 29, 2026
6410edd
Merge branch 'main' into 1610-start-ensrainbow-server-immediately-and…
djstrong Apr 29, 2026
73848f8
refactor: simplify ENSRainbow readiness check by removing retry logic…
djstrong Apr 29, 2026
816bba5
refactor: replace ErrorCode.ServiceUnavailable with HTTP status code …
djstrong Apr 29, 2026
8dbe6df
Merge branch 'main' into 1610-start-ensrainbow-server-immediately-and…
djstrong Apr 29, 2026
16 changes: 16 additions & 0 deletions .changeset/ready-endpoint-bg-bootstrap.md
@@ -0,0 +1,16 @@
---
"ensrainbow": minor
"@ensnode/ensrainbow-sdk": minor
"ensindexer": patch
---

ENSRainbow now starts its HTTP server immediately and downloads/validates its database in the background, instead of blocking container startup behind a netcat placeholder.

- **New `GET /ready` endpoint**: returns `200 { status: "ok" }` once the database is attached, or `503 Service Unavailable` while ENSRainbow is still bootstrapping. `/health` is now a pure liveness probe that succeeds as soon as the HTTP server is listening.
- **503 responses for API routes during bootstrap**: `/v1/heal`, `/v1/labels/count`, and `/v1/config` return a structured `ServiceUnavailableError` (`errorCode: 503`) until the database is ready.
- **New Docker entrypoint**: the container now runs `pnpm run entrypoint` from the `apps/ensrainbow` working directory (implemented in Node via `tsx src/cli.ts entrypoint`), which replaces `scripts/entrypoint.sh` and the `netcat` workaround.
- **Graceful shutdown during bootstrap**: SIGTERM/SIGINT now abort an in-flight bootstrap. Spawned `download`/`tar` child processes are terminated (SIGTERM → SIGKILL after a 5s grace period) and any partially-opened LevelDB handle is closed before the HTTP server and DB-backed server shut down, so the container exits promptly without leaking child processes or LevelDB locks.
- **SDK client**: added `EnsRainbowApiClient.ready()`, plus `EnsRainbow.ReadyResponse` / `EnsRainbow.ServiceUnavailableError` types and `ErrorCode.ServiceUnavailable`. The client now throws a typed `EnsRainbowHttpError` (with structured `status` / `statusText` properties) from `ready()`, `health()`, and `config()` whenever the service responds with a non-2xx HTTP status, so callers can branch their retry/abort logic on the status without parsing message strings.
- **ENSIndexer**: `waitForEnsRainbowToBeReady` now polls `/ready` (via `ensRainbowClient.ready()`) instead of `/health`, so it correctly waits for the database to finish bootstrapping. It also aborts retries immediately on non-503 HTTP responses (e.g. `404` from a misconfigured `ENSRAINBOW_URL`, `500` from a broken instance) instead of blocking startup for ~1h, while still retrying on `503 Service Unavailable` and on transient network errors.
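
The SIGTERM-then-SIGKILL escalation in the graceful-shutdown bullet can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: `Killable`, `terminateWithGrace`, and the injectable `schedule` parameter are hypothetical names, and only the 5s grace period is taken from the text above.

```typescript
// Hypothetical minimal shape of a child process. Node's ChildProcess is
// intended to be structurally compatible (kill(), exitCode, signalCode).
interface Killable {
  kill(signal: "SIGTERM" | "SIGKILL"): boolean;
  readonly exitCode: number | null;
  readonly signalCode: string | null;
}

// Injectable timer so the escalation can be exercised without real delays.
type Schedule = (fn: () => void, ms: number) => void;

const GRACE_PERIOD_MS = 5000; // 5s grace period, as stated in the changeset

function hasExited(child: Killable): boolean {
  return child.exitCode !== null || child.signalCode !== null;
}

function terminateWithGrace(
  child: Killable,
  graceMs: number = GRACE_PERIOD_MS,
  schedule: Schedule = (fn, ms) => {
    setTimeout(fn, ms);
  },
): void {
  // Nothing to do for a child that has already exited.
  if (hasExited(child)) return;

  // Ask politely first.
  child.kill("SIGTERM");

  // Escalate only if the child is still alive once the grace period ends.
  schedule(() => {
    if (!hasExited(child)) child.kill("SIGKILL");
  }, graceMs);
}
```

Passing a fake `schedule` makes the escalation path checkable in a unit test; callers that omit it fall back to the plain `setTimeout` wrapper.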

**Migration**: if you previously polled `GET /health` to gate traffic on database readiness, switch to `GET /ready` (or `client.ready()`). `/health` is still available and still returns `200`, but it now indicates liveness only.
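
For callers that gate traffic without the SDK, the migration could look roughly like the sketch below. It assumes only what the changeset states (`GET /ready` answers `200` when the database is attached and `503` while bootstrapping); `Readiness`, `interpretReadyStatus`, `FetchLike`, and `probeReady` are hypothetical names introduced for illustration.

```typescript
type Readiness = "ready" | "bootstrapping" | "unexpected";

// Maps the HTTP status of `GET /ready` to a coarse readiness state.
function interpretReadyStatus(status: number): Readiness {
  if (status === 200) return "ready";
  if (status === 503) return "bootstrapping";
  // Anything else (404, 500, ...) is likely misrouting or a broken instance.
  return "unexpected";
}

// Minimal fetch-like signature so the probe can be driven by a stub in tests.
type FetchLike = (url: string) => Promise<{ status: number }>;

async function probeReady(baseUrl: string, fetchImpl: FetchLike): Promise<Readiness> {
  // Strip trailing slashes so a base URL with or without one works the same.
  const url = `${baseUrl.replace(/\/+$/, "")}/ready`;
  const response = await fetchImpl(url);
  return interpretReadyStatus(response.status);
}
```

On Node 18+ the global `fetch` can be passed as `fetchImpl`, for example `probeReady("http://localhost:3223", fetch)`, where the port is the Dockerfile default and the host is a placeholder.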
57 changes: 57 additions & 0 deletions apps/ensindexer/src/lib/ensrainbow/singleton.test.ts
@@ -0,0 +1,57 @@
import { describe, expect, it } from "vitest";

import "@/lib/__test__/mockLogger";

import { setupConfigMock } from "@/lib/__test__/mockConfig";

setupConfigMock();

import { EnsRainbowHttpError } from "@ensnode/ensrainbow-sdk";

import { shouldRetryReadinessCheck } from "./singleton";

/**
* `shouldRetryReadinessCheck` is the heart of the readiness-check retry policy used by
* `waitForEnsRainbowToBeReady`. The integration with `p-retry` is a thin wiring (passing this
* predicate into `pRetry({ shouldRetry })`), so we exhaustively unit-test the predicate here
* rather than running the full retry loop with fake timers (which is fragile against `p-retry`
* internals and module-cache resets).
*/
describe("shouldRetryReadinessCheck", () => {
it("retries on EnsRainbowHttpError with status 503 (still bootstrapping)", () => {
const error = new EnsRainbowHttpError("not ready", 503, "Service Unavailable");
expect(shouldRetryReadinessCheck(error)).toBe(true);
});

it("aborts on EnsRainbowHttpError with status 404 (likely misconfigured base URL)", () => {
const error = new EnsRainbowHttpError("not found", 404, "Not Found");
expect(shouldRetryReadinessCheck(error)).toBe(false);
});

it("aborts on EnsRainbowHttpError with status 500 (server error)", () => {
const error = new EnsRainbowHttpError("boom", 500, "Internal Server Error");
expect(shouldRetryReadinessCheck(error)).toBe(false);
});

it("aborts on EnsRainbowHttpError with status 502 (bad gateway)", () => {
const error = new EnsRainbowHttpError("bad gateway", 502, "Bad Gateway");
expect(shouldRetryReadinessCheck(error)).toBe(false);
});

it("aborts on EnsRainbowHttpError with status 401 (auth misconfiguration)", () => {
const error = new EnsRainbowHttpError("unauthorized", 401, "Unauthorized");
expect(shouldRetryReadinessCheck(error)).toBe(false);
});

it("retries on plain Error (network/DNS/ECONNREFUSED), since these are transient during cold start", () => {
expect(shouldRetryReadinessCheck(new TypeError("fetch failed"))).toBe(true);
expect(shouldRetryReadinessCheck(new Error("connect ECONNREFUSED 127.0.0.1:3223"))).toBe(true);
});

it("retries on non-Error rejection values (defensive fallback)", () => {
expect(shouldRetryReadinessCheck("string error")).toBe(true);
expect(shouldRetryReadinessCheck(undefined)).toBe(true);
expect(shouldRetryReadinessCheck(null)).toBe(true);
expect(shouldRetryReadinessCheck({ message: "weird" })).toBe(true);
});
});
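
To make the policy under test concrete without pulling in `p-retry` or the SDK, here is a dependency-free sketch. `HttpStatusError` is a hypothetical stand-in for `EnsRainbowHttpError` (only a numeric `status` is assumed), and `retryWhile` is a hypothetical helper that approximates how a retry loop consults the predicate; it is not how `p-retry` is implemented.

```typescript
// Hypothetical stand-in for EnsRainbowHttpError: an Error carrying an HTTP status.
class HttpStatusError extends Error {
  constructor(message: string, readonly status: number) {
    super(message);
    this.name = "HttpStatusError";
    // Keeps `instanceof` working even when compiled down to ES5.
    Object.setPrototypeOf(this, HttpStatusError.prototype);
  }
}

// Same policy as the predicate under test: retry 503 and non-HTTP failures,
// abort on any other HTTP status.
function shouldRetry(error: unknown): boolean {
  if (error instanceof HttpStatusError) return error.status === 503;
  return true;
}

interface RetryOptions {
  retries: number; // retries after the first attempt
  delayMs: number; // fixed delay between attempts
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

async function retryWhile<T>(attempt: () => Promise<T>, options: RetryOptions): Promise<T> {
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attemptNumber = 1; ; attemptNumber++) {
    try {
      return await attempt();
    } catch (error) {
      const retriesLeft = options.retries + 1 - attemptNumber;
      // Give up when the budget is spent or the error is not retryable.
      if (retriesLeft <= 0 || !shouldRetry(error)) throw error;
      await sleep(options.delayMs);
    }
  }
}
```

With `retries: 60` and `delayMs: 60_000` this loop mirrors the roughly one-hour budget described in the source, while a 404 or 500 ends it on the first attempt.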
96 changes: 66 additions & 30 deletions apps/ensindexer/src/lib/ensrainbow/singleton.ts
@@ -3,7 +3,7 @@ import config from "@/config";
import { secondsToMilliseconds } from "date-fns";
import pRetry from "p-retry";

import { EnsRainbowApiClient } from "@ensnode/ensrainbow-sdk";
import { EnsRainbowApiClient, EnsRainbowHttpError } from "@ensnode/ensrainbow-sdk";

import { logger } from "@/lib/logger";

@@ -96,17 +96,44 @@ export function waitForEnsRainbowToBeHealthy(): Promise<void> {
*/
let waitForEnsRainbowToBeReadyPromise: Promise<void> | undefined;

/**
* Determine whether a readiness-check failure should be retried.
*
* Retry policy:
* - HTTP 503 (`ServiceUnavailable`) — ENSRainbow is still bootstrapping. Retryable.
* - Any other `EnsRainbowHttpError` (e.g. 404, 500) — almost certainly indicates a
* misconfigured `ENSRAINBOW_URL`, a broken instance, or routing/ingress issue. These do
* not fix themselves over the course of an hour, so we abort fast to surface the
* configuration/outage problem instead of stalling startup for ~60 minutes.
* - Anything else (network errors like `ECONNREFUSED`/DNS failures, JSON parse errors,
* etc.) — retryable. These are common during cold start, before the ENSRainbow HTTP
* server has bound its port.
*
* Exported for testing.
*/
export function shouldRetryReadinessCheck(error: unknown): boolean {
if (error instanceof EnsRainbowHttpError) {
return error.status === 503;
}
return true;
}

/**
* Wait for ENSRainbow to be ready
*
* Blocks execution until the ENSRainbow instance is ready to serve requests.
*
* Note: It may take 30+ minutes for the ENSRainbow instance to become ready in
* a cold start scenario. We use retries with a fixed interval between attempts
* for the ENSRainbow health check to allow for ample time for ENSRainbow to
* become ready.
* for the ENSRainbow readiness check to allow for ample time for bootstrap to
* complete.
*
* Non-503 HTTP failures (e.g. 404 misrouting, 500 server errors) abort retries
* immediately via {@link shouldRetryReadinessCheck}, so configuration/outage
* problems surface quickly instead of being masked by an hour of retries.
*
* @throws When ENSRainbow fails to become ready after all configured retry attempts.
* @throws When ENSRainbow fails to become ready after all configured retry attempts,
* or as soon as a non-retryable error (e.g. non-503 HTTP status) is encountered.
* This error will trigger termination of the ENSIndexer process.
*/
export function waitForEnsRainbowToBeReady(): Promise<void> {
@@ -119,45 +146,54 @@ export function waitForEnsRainbowToBeReady(): Promise<void> {
ensRainbowInstance: ensRainbowUrl.href,
});

waitForEnsRainbowToBeReadyPromise = pRetry(
// TODO: replace this count check with an explicit `ready()` method in ENSRainbow Client.
async () => {
const { count } = await ensRainbowClient.count();

if (count === 0) {
throw new Error("ENSRainbow instance is not ready yet.");
}
},
{
retries: 60, // This allows for a total of over 1 hour of retries with 1 minute between attempts.
minTimeout: secondsToMilliseconds(60),
maxTimeout: secondsToMilliseconds(60),
onFailedAttempt: ({ attemptNumber, retriesLeft }) => {
logger.warn({
msg: `ENSRainbow readiness check failed`,
attempt: attemptNumber,
retriesLeft,
ensRainbowInstance: ensRainbowUrl.href,
advice: `This might be due to ENSRainbow having a cold start, which can take 30+ minutes.`,
});
},
waitForEnsRainbowToBeReadyPromise = pRetry(async () => ensRainbowClient.ready(), {
retries: 60, // This allows for a total of over 1 hour of retries with 1 minute between attempts.
minTimeout: secondsToMilliseconds(60),
maxTimeout: secondsToMilliseconds(60),
shouldRetry: ({ error }) => shouldRetryReadinessCheck(error),
onFailedAttempt: ({ error, attemptNumber, retriesLeft }) => {
const willAbort = !shouldRetryReadinessCheck(error);
const isHttpError = error instanceof EnsRainbowHttpError;
logger.warn({
msg: willAbort
? `ENSRainbow readiness check failed with a non-retryable error; aborting retries`
: `ENSRainbow readiness check failed`,
attempt: attemptNumber,
retriesLeft,
// Always surface the error on abort or final attempt; otherwise keep logs concise.
error: willAbort || retriesLeft === 0 ? error : undefined,
httpStatus: isHttpError ? error.status : undefined,
ensRainbowInstance: ensRainbowUrl.href,
advice: willAbort
? `This usually indicates a misconfigured ENSRAINBOW_URL, a broken ENSRainbow instance, or an ingress/routing issue. Verify the URL points at a healthy ENSRainbow server.`
: `This might be due to ENSRainbow still bootstrapping its database, which can take 30+ minutes during a cold start.`,
});
},
)
})
.then(() => {
logger.info({
msg: `ENSRainbow instance is ready`,
ensRainbowInstance: ensRainbowUrl.href,
});
})
.catch((error) => {
const errorMessage = error instanceof Error ? error.message : "Unknown error";
const isHttpError = error instanceof EnsRainbowHttpError;
const isAbort = isHttpError && error.status !== 503;

logger.error({
msg: `ENSRainbow readiness check failed after multiple attempts`,
msg: isAbort
? `ENSRainbow readiness check aborted due to non-retryable HTTP error`
: `ENSRainbow readiness check failed after multiple attempts`,
error,
httpStatus: isHttpError ? error.status : undefined,
ensRainbowInstance: ensRainbowUrl.href,
});

// Throw the error to terminate the ENSIndexer process due to the failed health check of a critical dependency
throw error;
// Throw the error to terminate the ENSIndexer process due to the failed readiness check of a critical dependency
throw new Error(errorMessage, {
cause: error instanceof Error ? error : undefined,
});
});

return waitForEnsRainbowToBeReadyPromise;
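
The module-level `waitForEnsRainbowToBeReadyPromise` variable suggests that concurrent callers share one retry loop; the guard that returns the cached promise sits in the collapsed part of this hunk, so the following is only a sketch of that pattern under that assumption. `memoizePromise` is a hypothetical helper name.

```typescript
// Wraps an async factory so it runs at most once; every caller receives the
// same promise, whether it is still pending, fulfilled, or rejected.
function memoizePromise<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;

  return () => {
    if (cached === undefined) {
      cached = factory();
    }
    return cached;
  };
}
```

Note that a rejected promise stays cached in this sketch. That fits a design where a failed readiness check terminates the process, but it would need a reset path if callers were expected to try again later.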
Original file line number Diff line number Diff line change
@@ -230,9 +230,49 @@ describe("PublicConfigBuilder", () => {
expect(result).toBe(customConfig);
expect(result.isSubgraphCompatible).toBe(false);
});

it("awaits readiness before fetching ENSRainbow config", async () => {
const callOrder: string[] = [];
const ensRainbowClientMock = {
config: vi.fn().mockImplementation(async () => {
callOrder.push("config");
return mockEnsRainbowConfig;
}),
} as unknown as EnsRainbow.ApiClient;
const waitForReady = vi.fn().mockImplementation(async () => {
callOrder.push("wait");
});

setupStandardMocks();
const mockPublicConfig = createMockPublicConfig();
vi.mocked(validateEnsIndexerPublicConfig).mockReturnValue(mockPublicConfig);

const builder = new PublicConfigBuilder(ensRainbowClientMock, waitForReady);
const result = await builder.getPublicConfig();

expect(waitForReady).toHaveBeenCalledTimes(1);
expect(ensRainbowClientMock.config).toHaveBeenCalledTimes(1);
expect(callOrder).toEqual(["wait", "config"]);
expect(result).toBe(mockPublicConfig);
});
});

describe("getPublicConfig() - error handling", () => {
it("throws when readiness check fails and does not call config()", async () => {
const readinessError = new Error("ENSRainbow not ready");
const ensRainbowClientMock = {
config: vi.fn(),
} as unknown as EnsRainbow.ApiClient;
const waitForReady = vi.fn().mockRejectedValue(readinessError);

const builder = new PublicConfigBuilder(ensRainbowClientMock, waitForReady);

await expect(builder.getPublicConfig()).rejects.toThrow(readinessError);
expect(waitForReady).toHaveBeenCalledTimes(1);
expect(ensRainbowClientMock.config).not.toHaveBeenCalled();
expect(validateEnsIndexerPublicConfig).not.toHaveBeenCalled();
});

it("throws when ENSRainbow client config() fails", async () => {
// Arrange
const ensRainbowError = new Error("ENSRainbow service unavailable");
Original file line number Diff line number Diff line change
@@ -19,6 +19,15 @@ export class PublicConfigBuilder {
*/
private ensRainbowClient: EnsRainbow.ApiClient;

/**
* One-time async readiness hook awaited before the first
* `ensRainbowClient.config()` invocation, so callers don't race ENSRainbow's
* background bootstrap. Defaults to a no-op for callers that don't need to
* gate on readiness (e.g. tests or environments where ENSRainbow is already
* known to be ready).
*/
private waitForEnsRainbowReady: () => Promise<void>;

/**
* Immutable ENSIndexer Public Config
*
@@ -29,9 +38,15 @@ export class PublicConfigBuilder {

/**
* @param ensRainbowClient ENSRainbow Client instance used to fetch ENSRainbow Public Config
* @param waitForEnsRainbowReady One-time async readiness hook awaited before the first
* `ensRainbowClient.config()` invocation. Defaults to a no-op.
*/
constructor(ensRainbowClient: EnsRainbow.ApiClient) {
constructor(
ensRainbowClient: EnsRainbow.ApiClient,
waitForEnsRainbowReady: () => Promise<void> = async () => {},
) {
this.ensRainbowClient = ensRainbowClient;
this.waitForEnsRainbowReady = waitForEnsRainbowReady;
}

/**
@@ -47,6 +62,8 @@ export class PublicConfigBuilder {
*/
async getPublicConfig(): Promise<EnsIndexerPublicConfig> {
if (typeof this.immutablePublicConfig === "undefined") {
await this.waitForEnsRainbowReady();

const [versionInfo, ensRainbowPublicConfig] = await Promise.all([
this.getEnsIndexerVersionInfo(),
// TODO: remove dependency on ENSRainbow by dropping `ensRainbowPublicConfig` from `EnsIndexerPublicConfig`.
7 changes: 5 additions & 2 deletions apps/ensindexer/src/lib/public-config-builder/singleton.ts
@@ -1,4 +1,7 @@
import { ensRainbowClient } from "@/lib/ensrainbow/singleton";
import { ensRainbowClient, waitForEnsRainbowToBeReady } from "@/lib/ensrainbow/singleton";
import { PublicConfigBuilder } from "@/lib/public-config-builder/public-config-builder";

export const publicConfigBuilder = new PublicConfigBuilder(ensRainbowClient);
export const publicConfigBuilder = new PublicConfigBuilder(
ensRainbowClient,
waitForEnsRainbowToBeReady,
);
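
The wiring above injects the readiness wait as a constructor argument with a no-op default. A stripped-down sketch of that pattern, using the hypothetical names `CachedConfigBuilder`, `fetchConfig`, and `waitForReady` rather than the real `PublicConfigBuilder` internals, might look like this:

```typescript
class CachedConfigBuilder<Config> {
  private cached: Config | undefined;

  constructor(
    // Fetches the config from the dependency (stands in for `client.config()`).
    private readonly fetchConfig: () => Promise<Config>,
    // Readiness hook; the no-op default suits tests and already-ready setups.
    private readonly waitForReady: () => Promise<void> = async () => {},
  ) {}

  async getConfig(): Promise<Config> {
    if (this.cached === undefined) {
      // Gate the first fetch on readiness so it cannot race the bootstrap.
      await this.waitForReady();
      this.cached = await this.fetchConfig();
    }
    return this.cached;
  }
}
```

Because the hook is just a function, production code can pass a memoized wait while tests pass a spy or rely on the default.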
26 changes: 12 additions & 14 deletions apps/ensrainbow/Dockerfile
@@ -1,14 +1,10 @@
# Runtime image for ENSRainbow
FROM node:24-slim AS runtime

# Install only essential system dependencies for runtime
# netcat-openbsd: Used during container initialization to keep the service port open
# while the database is being downloaded and validated (which can take up to 20 minutes).
# Without a listener on the port during this phase, Render's health checks fail and orchestration
# systems may mark the container as unhealthy or restart it prematurely. See scripts/entrypoint.sh for implementation details.
# Note: The netcat listener only keeps the port open and accepts connections; it does not respond
# to HTTP requests, so it will not work with Docker HEALTHCHECK commands that expect HTTP responses. See https://github.com/namehash/ensnode/issues/1610
RUN apt-get update && apt-get install -y wget tar netcat-openbsd && rm -rf /var/lib/apt/lists/*
# Install only essential system dependencies for runtime.
# `wget` and `tar` are required by scripts/download-prebuilt-database.sh, which the in-process
# entrypoint spawns to fetch the pre-built database archive.
RUN apt-get update && apt-get install -y wget tar && rm -rf /var/lib/apt/lists/*

# Set up pnpm
ENV PNPM_HOME="/pnpm"
@@ -34,16 +30,18 @@ COPY apps/ensrainbow/tsconfig.json apps/ensrainbow/
COPY apps/ensrainbow/vitest.config.ts apps/ensrainbow/

# Make scripts executable
RUN chmod +x /app/apps/ensrainbow/scripts/entrypoint.sh
RUN chmod +x /app/apps/ensrainbow/scripts/download-prebuilt-database.sh

# Set environment variables
ENV NODE_ENV=production
# PORT will be used by entrypoint.sh, defaulting to 3223 if not set at runtime
# DB_SCHEMA_VERSION, LABEL_SET_ID, LABEL_SET_VERSION must be provided at runtime to the entrypoint
# PORT is consumed by the entrypoint command, defaulting to 3223 if not set at runtime.
# DB_SCHEMA_VERSION, LABEL_SET_ID, LABEL_SET_VERSION must be provided at runtime to the entrypoint.

# Default port, can be overridden by PORT env var for the entrypoint/serve command
# Default port, can be overridden by PORT env var for the entrypoint command
EXPOSE 3223

# Set the entrypoint
ENTRYPOINT ["/app/apps/ensrainbow/scripts/entrypoint.sh"]
# The entrypoint binds the HTTP server immediately (so /health and /ready respond while the
# database is still being downloaded) and runs download + validation in the background.
# See src/commands/entrypoint-command.ts for implementation details.
WORKDIR /app/apps/ensrainbow
ENTRYPOINT ["pnpm", "run", "entrypoint"]
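
The behaviour described in the entrypoint comment (bind the HTTP server first, bootstrap in the background) can be approximated with the sketch below. It is not the code in `src/commands/entrypoint-command.ts`: `BootstrapState`, `gateStatus`, `startServer`, and the response bodies for non-200 statuses are assumptions made for illustration; the route list and status codes follow the changeset.

```typescript
import { createServer } from "node:http";

// Hypothetical in-memory flag flipped once the database has been attached.
interface BootstrapState {
  dbAttached: boolean;
}

// Routes the changeset lists as returning 503 during bootstrap.
const DB_BACKED_PREFIXES = ["/v1/heal", "/v1/labels/count", "/v1/config"];

// Returns the status a probe or gated route should get, or undefined when the
// request is not gated and a real route handler should take over.
function gateStatus(path: string, state: BootstrapState): number | undefined {
  if (path === "/health") return 200; // liveness: the HTTP server is up
  if (path === "/ready") return state.dbAttached ? 200 : 503;
  const dbBacked = DB_BACKED_PREFIXES.some((prefix) => path.startsWith(prefix));
  if (dbBacked && !state.dbAttached) return 503;
  return undefined;
}

function startServer(port: number, bootstrap: () => Promise<void>) {
  const state: BootstrapState = { dbAttached: false };

  const server = createServer((req, res) => {
    const path = (req.url ?? "/").replace(/\?.*$/, "");
    // This sketch has no real routes, so anything ungated falls back to 404.
    const status = gateStatus(path, state) ?? 404;
    res.writeHead(status, { "content-type": "application/json" });
    // Placeholder bodies; the real error payload may carry more fields.
    res.end(JSON.stringify(status === 200 ? { status: "ok" } : { errorCode: status }));
  });

  // Bind first so probes get answers while the bootstrap is still running.
  server.listen(port);

  bootstrap().then(
    () => {
      state.dbAttached = true;
    },
    (error) => {
      // Assumption: on failure this sketch just logs and stays not ready.
      console.error("bootstrap failed; staying not ready", error);
    },
  );

  return server;
}
```

Keeping `gateStatus` pure separates the readiness policy from the HTTP plumbing, which makes the probe semantics easy to check without opening a socket.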
1 change: 1 addition & 0 deletions apps/ensrainbow/package.json
@@ -13,6 +13,7 @@
"homepage": "https://github.com/namehash/ensnode/tree/main/apps/ensrainbow",
"scripts": {
"serve": "tsx src/cli.ts serve",
"entrypoint": "tsx src/cli.ts entrypoint",
"ingest": "tsx src/cli.ts ingest",
"ingest-ensrainbow": "tsx src/cli.ts ingest-ensrainbow",
"validate": "tsx src/cli.ts validate",