Merged
4 changes: 3 additions & 1 deletion .gitignore
@@ -38,4 +38,6 @@ dist
.nostr

# Docker Compose overrides
docker-compose.overrides.yml
docker-compose.overrides.yml
# Export output
*.jsonl
11 changes: 11 additions & 0 deletions README.md
@@ -543,6 +543,17 @@ To see the integration test coverage report open `.coverage/integration/lcov-rep
open .coverage/integration/lcov-report/index.html
```

## Export Events

Export all stored events to a [JSON Lines](https://jsonlines.org/) (`.jsonl`) file. Each line is a valid NIP-01 Nostr event JSON object. The export streams rows from the database using cursors, so it works safely on relays with millions of events without loading them into memory.

```
npm run export # writes to events.jsonl
npm run export -- backup-2024-01-01.jsonl # custom filename
```

The script reads the same `DB_*` environment variables used by the relay (see [CONFIGURATION.md](CONFIGURATION.md)).
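
As a rough illustration of consuming the export, the sketch below reads a `.jsonl` file one line at a time and tallies events per kind, so memory use stays flat on large files. The helper names `readLines` and `countEventsByKind` are hypothetical and not part of this change; the `NostrEvent` shape assumes the NIP-01 fields described above.

```typescript
import { createReadStream } from 'node:fs'
import { createInterface } from 'node:readline'

// Assumed shape of one exported line (NIP-01 event fields).
interface NostrEvent {
  id: string
  pubkey: string
  created_at: number
  kind: number
  tags: string[][]
  content: string
  sig: string
}

// Hypothetical helper: yields the file's lines without loading the whole file.
function readLines(filePath: string): AsyncIterable<string> {
  return createInterface({
    input: createReadStream(filePath, { encoding: 'utf8' }),
    crlfDelay: Infinity,
  })
}

// Hypothetical helper: counts events per kind from any async source of lines.
async function countEventsByKind(
  lines: AsyncIterable<string>,
): Promise<Map<number, number>> {
  const counts = new Map<number, number>()
  for await (const line of lines) {
    if (line.trim() === '') continue // tolerate blank lines
    const event = JSON.parse(line) as NostrEvent
    counts.set(event.kind, (counts.get(event.kind) ?? 0) + 1)
  }
  return counts
}
```

For example, `countEventsByKind(readLines('events.jsonl'))` resolves to a map from event kind to the number of exported events of that kind.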

## Configuration

You can change the default folder by setting the `NOSTR_CONFIG_DIR` environment variable to a different path.
1 change: 1 addition & 0 deletions package.json
@@ -42,6 +42,7 @@
"pretest:integration": "mkdir -p .test-reports/integration",
"test:integration": "cucumber-js",
"cover:integration": "nyc --report-dir .coverage/integration npm run test:integration -- -p cover",
"export": "node -r ts-node/register src/scripts/export-events.ts",
"docker:compose:start": "./scripts/start",
"docker:compose:stop": "./scripts/stop",
"docker:compose:clean": "./scripts/clean",
99 changes: 99 additions & 0 deletions src/scripts/export-events.ts
@@ -0,0 +1,99 @@
import 'pg-query-stream'
import dotenv from 'dotenv'
dotenv.config()

import fs from 'fs'
import knex from 'knex'
import path from 'path'
import { pipeline } from 'stream/promises'
import { Transform } from 'stream'

const getDbConfig = () => ({
  client: 'pg',
  connection: process.env.DB_URI || {
    host: process.env.DB_HOST ?? 'localhost',
    port: Number(process.env.DB_PORT ?? 5432),
    user: process.env.DB_USER ?? 'postgres',
    password: process.env.DB_PASSWORD ?? 'postgres',
    database: process.env.DB_NAME ?? 'nostream',
  },
})

Copilot AI Apr 18, 2026


getDbConfig() duplicates the repo’s Knex configuration logic and doesn’t honor several DB_* settings the relay supports (e.g. pool sizing / acquire timeout). It also introduces default host/user/password/db values, which can make the script silently export from an unexpected database when env vars are missing. Consider reusing src/database/client.ts (or factoring out a shared config helper) so the export command uses the same connection behavior as the relay and fails fast when required env vars aren’t set.

Suggested change
const getDbConfig = () => ({
  client: 'pg',
  connection: process.env.DB_URI || {
    host: process.env.DB_HOST ?? 'localhost',
    port: Number(process.env.DB_PORT ?? 5432),
    user: process.env.DB_USER ?? 'postgres',
    password: process.env.DB_PASSWORD ?? 'postgres',
    database: process.env.DB_NAME ?? 'nostream',
  },
})
const parseOptionalNumber = (name: string): number | undefined => {
  const value = process.env[name]
  if (value == null || value === '') {
    return undefined
  }
  const parsed = Number(value)
  if (!Number.isFinite(parsed)) {
    throw new Error(`Invalid ${name}: expected a number`)
  }
  return parsed
}
const requireEnv = (name: string): string => {
  const value = process.env[name]
  if (value == null || value === '') {
    throw new Error(`Missing required environment variable: ${name}`)
  }
  return value
}
const getDbConfig = () => {
  const acquireConnectionTimeout = parseOptionalNumber('DB_ACQUIRE_TIMEOUT')
  const poolMin = parseOptionalNumber('DB_POOL_MIN')
  const poolMax = parseOptionalNumber('DB_POOL_MAX')
  return {
    client: 'pg',
    connection: process.env.DB_URI
      ? process.env.DB_URI
      : {
          host: requireEnv('DB_HOST'),
          port: parseOptionalNumber('DB_PORT') ?? 5432,
          user: requireEnv('DB_USER'),
          password: requireEnv('DB_PASSWORD'),
          database: requireEnv('DB_NAME'),
        },
    ...(acquireConnectionTimeout === undefined
      ? {}
      : { acquireConnectionTimeout }),
    ...((poolMin === undefined && poolMax === undefined)
      ? {}
      : {
          pool: {
            ...(poolMin === undefined ? {} : { min: poolMin }),
            ...(poolMax === undefined ? {} : { max: poolMax }),
          },
        }),
  }
}

Copilot uses AI. Check for mistakes.
async function exportEvents(): Promise<void> {
  const filename = process.argv[2] || 'events.jsonl'
  const outputPath = path.resolve(filename)
  const db = knex(getDbConfig())

Comment on lines +22 to +43

Copilot AI Apr 18, 2026


PR description mentions cleaning up the DB connection “on exit”, but the script doesn’t currently trap SIGINT/SIGTERM. If the process is interrupted mid-export, the transaction/stream and file descriptor may not be closed cleanly. Consider adding signal handlers to destroy the db stream, close the output stream, and db.destroy() before exiting.
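
A minimal sketch of the signal handling described in this comment, assuming a hypothetical helper `installShutdownHandlers` that is not part of the PR. It runs a list of cleanup steps once on SIGINT or SIGTERM and then exits with the conventional `128 + signal number` code. The `source` and `exit` parameters are injected only to keep the sketch testable.

```typescript
import { EventEmitter } from 'node:events'

type CleanupStep = () => Promise<void> | void

// Conventional exit codes: 128 + signal number (SIGINT = 2, SIGTERM = 15).
const EXIT_CODES: Record<string, number> = { SIGINT: 130, SIGTERM: 143 }

// Hypothetical helper: runs every cleanup step once, in order, then exits.
function installShutdownHandlers(
  steps: CleanupStep[],
  source: EventEmitter = process,
  exit: (code: number) => void = (code) => process.exit(code),
): void {
  let shuttingDown = false
  for (const signal of Object.keys(EXIT_CODES)) {
    source.on(signal, async () => {
      if (shuttingDown) return // ignore repeated signals during cleanup
      shuttingDown = true
      console.error(`Received ${signal}, cleaning up...`)
      for (const step of steps) {
        try {
          await step()
        } catch (err) {
          // Keep going so later steps still get a chance to run.
          console.error('Cleanup step failed:', err)
        }
      }
      exit(EXIT_CODES[signal])
    })
  }
}
```

In the export script, the steps would plausibly be destroying the database stream, closing the output stream, and awaiting `db.destroy()`, registered once both streams exist.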

  try {
    const [{ count }] = await db('events')
      .whereNull('deleted_at')
      .count('* as count')
    const total = Number(count)

    if (total === 0) {
      console.log('No events to export.')
      return
    }

    console.log(`Exporting ${total} events to ${outputPath}`)

    const output = fs.createWriteStream(outputPath)
    let exported = 0

    const trx = await db.transaction(null, { isolationLevel: 'repeatable read' })

Copilot AI Apr 18, 2026


The count(*) is executed outside the repeatable read transaction, so total may not match the snapshot being streamed (new inserts/soft-deletes between the count and BEGIN can cause progress to be misleading and exported !== total). If you want a consistent snapshot, run the count inside the same read-only transaction before starting the stream; otherwise consider dropping the transaction / total and just log exported rows.
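
The second option in this comment (drop `total` and log only exported rows) could look roughly like the sketch below. `createProgressCounter` and `demo` are hypothetical names, and the in-memory `Readable.from` source stands in for the database stream.

```typescript
import { Readable, Transform, Writable } from 'node:stream'
import { pipeline } from 'node:stream/promises'

// Hypothetical helper: pass-through transform that counts rows and calls
// `report` every `every` rows, without needing a precomputed total.
function createProgressCounter(every: number, report: (count: number) => void) {
  let count = 0
  const stream = new Transform({
    objectMode: true,
    transform(chunk, _encoding, callback) {
      count++
      if (count % every === 0) report(count)
      callback(null, chunk) // forward the row unchanged
    },
  })
  return { stream, getCount: () => count }
}

// Demo with an in-memory source and a sink that discards rows.
async function demo(): Promise<number> {
  const counter = createProgressCounter(2, (n) => console.log(`Exported ${n} events...`))
  const sink = new Writable({
    objectMode: true,
    write(_chunk, _encoding, callback) {
      callback()
    },
  })
  await pipeline(Readable.from([{ id: 'a' }, { id: 'b' }, { id: 'c' }]), counter.stream, sink)
  return counter.getCount()
}
```

With this shape the final log line can report `getCount()` directly, so it cannot disagree with a count taken outside the transaction.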


Copilot AI Apr 18, 2026


Running a long-lived repeatable read transaction for a multi-million-row export can hold an old MVCC snapshot for the duration of the export, which can increase bloat and interfere with vacuum on busy relays. Consider using READ COMMITTED (still READ ONLY) or avoiding an explicit transaction unless a consistent snapshot is strictly required; alternatively document this operational impact and recommend running against a read replica.

Suggested change
    const trx = await db.transaction(null, { isolationLevel: 'repeatable read' })
    const trx = await db.transaction(null, { isolationLevel: 'read committed' })

    try {
      await trx.raw('SET TRANSACTION READ ONLY')

      const dbStream = trx('events')
        .select(
          'event_id',
          'event_pubkey',
          'event_kind',
          'event_created_at',
          'event_content',
          'event_tags',
          'event_signature',
        )
        .whereNull('deleted_at')
        .orderBy('event_created_at', 'asc')

Copilot AI Apr 18, 2026


orderBy('event_created_at', 'asc') does not guarantee deterministic ordering when multiple events share the same event_created_at value, so repeated exports can legitimately produce different line orders. If stable output is desired, add a secondary tie-breaker (e.g. event_id or the PK id) to the ORDER BY.

Suggested change
        .orderBy('event_created_at', 'asc')
        .orderBy('event_created_at', 'asc')
        .orderBy('event_id', 'asc')
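
To illustrate why the tie-breaker matters, here is a small in-memory sketch (not the SQL itself). The `ExportRow` type and `compareForExport` comparator are hypothetical, and `id` stands in for the hex-encoded `event_id`. With the secondary key, two different input orders sort to the same output even when timestamps collide.

```typescript
interface ExportRow {
  id: string
  created_at: number
}

// Orders by created_at first, then by id, so rows sharing a timestamp
// always land in the same relative order.
function compareForExport(a: ExportRow, b: ExportRow): number {
  if (a.created_at !== b.created_at) return a.created_at - b.created_at
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0
}

const sampleRows: ExportRow[] = [
  { id: 'bb', created_at: 100 },
  { id: 'aa', created_at: 100 }, // same timestamp as 'bb'
  { id: 'cc', created_at: 99 },
]

// Both input orders produce the ids in the order: cc, aa, bb.
const sortedForward = [...sampleRows].sort(compareForExport).map((r) => r.id)
const sortedReversed = [...sampleRows].reverse().sort(compareForExport).map((r) => r.id)
```

Without the `id` comparison, the relative order of `aa` and `bb` would depend on the order in which the rows arrived, which is the situation the comment describes for the database query.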

        .stream()

      const toJsonLine = new Transform({
        objectMode: true,
        transform(row: any, _encoding, callback) {
          const event = {
            id: row.event_id.toString('hex'),
            pubkey: row.event_pubkey.toString('hex'),
            created_at: row.event_created_at,
            kind: row.event_kind,
            tags: row.event_tags || [],
            content: row.event_content,
            sig: row.event_signature.toString('hex'),
          }

          exported++
          if (exported % 10000 === 0) {
            console.log(`Exported ${exported}/${total} events...`)
          }

          callback(null, JSON.stringify(event) + '\n')
        },
      })

      await pipeline(dbStream, toJsonLine, output)
      await trx.commit()
    } catch (err) {
      await trx.rollback()
      throw err
    }

    console.log(`Export complete: ${exported} events written to ${outputPath}`)
  } finally {
    await db.destroy()
  }
}

exportEvents().catch((error) => {
  console.error('Export failed:', error.message)
Comment thread
cameri marked this conversation as resolved.
  process.exit(1)
})