Creating Generators

This guide explains how to create new documentation generators for @node-core/doc-kit.

Generator Concepts

Generators in doc-kit transform API documentation through a pipeline. Each generator:

Takes input from a previous generator or raw files
Processes the data into a different format
Yields output for the next generator or final output

Generator Pipeline

Raw Markdown Files
    ↓
  [ast] - Parse to MDAST
    ↓
  [metadata] - Extract structured metadata
    ↓
  [jsx-ast] - Convert to JSX AST
    ↓
  [web] - Generate HTML/CSS/JS bundles

Each generator declares its dependency using the dependsOn export, allowing automatic pipeline construction.
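To make the idea of automatic pipeline construction concrete, here is a minimal sketch of how a chain of dependsOn declarations could be resolved into a run order. The registry object and the resolveOrder helper are hypothetical and are not part of doc-kit; the '@doc-kittens/react/web' specifier is assumed from the package groupings used in this guide.

```javascript
// Hypothetical registry keyed by import specifier. Real generators are
// modules; plain objects keep this sketch self-contained.
const registry = {
  '@doc-kittens/internal/ast': { name: 'ast' },
  '@doc-kittens/internal/metadata': {
    name: 'metadata',
    dependsOn: '@doc-kittens/internal/ast',
  },
  '@doc-kittens/react/jsx-ast': {
    name: 'jsx-ast',
    dependsOn: '@doc-kittens/internal/metadata',
  },
  '@doc-kittens/react/web': {
    name: 'web',
    dependsOn: '@doc-kittens/react/jsx-ast',
  },
};

/**
 * Walk dependsOn links from the target back to the root and return the
 * generator names in the order they would need to run.
 */
function resolveOrder(registry, target) {
  const order = [];
  const seen = new Set();

  for (let spec = target; spec !== undefined; spec = registry[spec].dependsOn) {
    if (!(spec in registry)) throw new Error(`Unknown generator: ${spec}`);
    if (seen.has(spec)) throw new Error(`Circular dependency at: ${spec}`);
    seen.add(spec);
    order.unshift(registry[spec].name);
  }

  return order;
}

console.log(resolveOrder(registry, '@doc-kittens/react/web'));
// [ 'ast', 'metadata', 'jsx-ast', 'web' ]
```

Because each generator names at most one dependency, the walk is a simple linked-list traversal rather than a full topological sort.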

Generator Structure

A generator is a single module (index.mjs) that exports its metadata and logic as named exports:

name - The generator's short name (used for config keys and logging)
generate - The main generation function (required)
processChunk - Worker thread processing function (optional — presence enables parallel processing)
dependsOn - Import specifier of the dependency generator (optional)
defaultConfiguration - Default config values (optional)
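The export contract above can be sketched as a small check over a module's exports. The describeGenerator helper below is hypothetical and only illustrates which exports are required and which are optional. Treating name as mandatory is an assumption, since the list above marks only generate as required.

```javascript
/**
 * Hypothetical helper: summarise which parts of the generator contract a
 * module namespace object provides.
 */
function describeGenerator(mod) {
  // Assumption: a generator without a name cannot be configured or logged.
  if (typeof mod.name !== 'string' || mod.name === '') {
    throw new TypeError('Generator must export a non-empty "name" string');
  }
  if (typeof mod.generate !== 'function') {
    throw new TypeError(`Generator "${mod.name}" must export "generate"`);
  }

  return {
    name: mod.name,
    // The presence of processChunk is what enables parallel processing.
    parallel: typeof mod.processChunk === 'function',
    dependsOn: mod.dependsOn ?? null,
    defaultConfiguration: mod.defaultConfiguration ?? {},
  };
}

const summary = describeGenerator({
  name: 'my-format',
  dependsOn: '@doc-kittens/internal/metadata',
  async generate(input) {
    return input;
  },
});

console.log(summary.parallel); // false
```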

Creating a Basic Generator

Step 1: Create the Generator Directory

Generators live inside one of the workspace packages under packages/<package>/src/generators/. The existing groupings are:

@doc-kittens/legacy — historical JSON/HTML formats
@doc-kittens/internal — ast, ast-js, metadata (foundational generators consumed by everything else)
@doc-kittens/react — React/JSX-based generators (web, orama-db, jsx-ast)
@doc-kittens/website — public website outputs (sitemap, llms-txt, api-links)
@doc-kittens/extras — specialised one-offs (addon-verify, json-simple, man-page)

Place your new generator in the package whose theme it matches, or create a new workspace package if none fit.

packages/<package>/src/generators/my-format/
├── index.mjs         # Generator entry point (required)
├── constants.mjs     # Constants (optional)
├── types.d.ts        # TypeScript types (required)
└── utils/            # Utility functions (optional)
    └── formatter.mjs

Step 2: Define Types

Create a types.d.ts file that exports a Generator type. The JSDoc @type annotations in index.mjs reference this type to type-check generate and processChunk.

export type Generator = GeneratorMetadata<
  {
    // If your generator supports a custom configuration,
    // define it here
    myCustomOption: string;
  },
  Generate<InputToMyGenerator, Promise<OutputOfMyGenerator>>,
  // If your generator supports parallel processing:
  ProcessChunk<
    InputToMyParallelProcessor,
    OutputOfMyParallelProcessor,
    DependenciesOfMyParallelProcessor
  >
>;

Step 3: Implement the Generator

Create index.mjs with your generator's metadata and logic:

// packages/<package>/src/generators/my-format/index.mjs
'use strict';

import { writeFile } from 'node:fs/promises';
import { join } from 'node:path';

import getConfig from '@node-core/doc-kit/src/utils/configuration/index.mjs';

export const name = 'my-format';
export const dependsOn = '@doc-kittens/internal/metadata';
export const defaultConfiguration = {
  myCustomOption: 'myDefaultValue',
};

/**
 * Main generation function
 *
 * @type {import('./types').Generator['generate']}
 */
export async function generate(input, worker) {
  const config = getConfig('my-format');

  // Transform input to your format
  const result = transformToMyFormat(input, config.version);

  // Write to file if output directory specified
  if (config.output) {
    await writeFile(
      join(config.output, 'documentation.myformat'),
      result,
      'utf-8'
    );
  }

  return result;
}

/**
 * Transform metadata entries to MyFormat
 * @param {Array<MetadataEntry>} entries
 * @param {import('semver').SemVer} version
 * @returns {string}
 */
function transformToMyFormat(entries, version) {
  // Your transformation logic here
  return entries
    .map(entry => `${entry.api}: ${entry.heading.data.name}`)
    .join('\n');
}
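To show what the mapping in transformToMyFormat produces, here is a small standalone run. The sample entries are hypothetical and contain only the two fields the mapping reads (api and heading.data.name); real metadata entries carry more.

```javascript
// Same mapping as transformToMyFormat above, repeated so this runs alone.
function transformToMyFormat(entries) {
  return entries
    .map(entry => `${entry.api}: ${entry.heading.data.name}`)
    .join('\n');
}

// Hypothetical entries with only the fields the mapping reads.
const entries = [
  { api: 'fs', heading: { data: { name: 'File system' } } },
  { api: 'http', heading: { data: { name: 'HTTP' } } },
];

console.log(transformToMyFormat(entries));
// fs: File system
// http: HTTP
```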

Step 4: Register the Generator

Add a short subpath entry to the exports map in your package's package.json, e.g. "./my-format": "./src/generators/my-format/index.mjs". The ./src/* wildcard already exposes utilities and types under the longer path form for cross-package imports.
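The difference between the short subpath and the longer ./src/* form can be illustrated with a simplified lookup. This is only a sketch: resolveSubpath is a hypothetical helper, not Node's full resolution algorithm, and the target of the "./src/*" entry is an assumption because this guide quotes only the "./my-format" entry.

```javascript
// Assumed exports map. Only the "./my-format" entry is quoted in this guide;
// the "./src/*" target is a guess at what the wildcard maps to.
const exportsMap = {
  './my-format': './src/generators/my-format/index.mjs',
  './src/*': './src/*',
};

/**
 * Simplified subpath lookup: exact keys win, then single-"*" patterns.
 * Returns the mapped file path, or null when nothing matches.
 */
function resolveSubpath(exportsMap, subpath) {
  if (Object.hasOwn(exportsMap, subpath)) return exportsMap[subpath];

  for (const [pattern, target] of Object.entries(exportsMap)) {
    const star = pattern.indexOf('*');
    if (star === -1) continue;

    const prefix = pattern.slice(0, star);
    const suffix = pattern.slice(star + 1);
    const fits =
      subpath.length > prefix.length + suffix.length &&
      subpath.startsWith(prefix) &&
      subpath.endsWith(suffix);

    if (fits) {
      // The text matched by "*" is substituted into the target.
      const matched = subpath.slice(
        prefix.length,
        subpath.length - suffix.length
      );
      return target.split('*').join(matched);
    }
  }

  return null;
}

console.log(resolveSubpath(exportsMap, './my-format'));
// ./src/generators/my-format/index.mjs
console.log(
  resolveSubpath(exportsMap, './src/generators/my-format/utils/formatter.mjs')
);
// ./src/generators/my-format/utils/formatter.mjs
```

Under these assumptions, other generators would use the short form in dependsOn, while the longer form stays available for importing helpers and types.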

Parallel Processing with Workers

For generators processing large datasets, implement parallel processing using worker threads. Export a processChunk function from your index.mjs — its presence automatically enables parallel processing.

Implementing Worker-Based Processing

// packages/<package>/src/generators/parallel-generator/index.mjs
import getConfig from '@node-core/doc-kit/src/utils/configuration/index.mjs';

export const name = 'parallel-generator';
export const dependsOn = '@doc-kittens/internal/metadata';

/**
 * Process a chunk of items in a worker thread.
 * This function runs in isolated worker threads.
 *
 * @type {import('./types').Generator['processChunk']}
 */
export async function processChunk(fullInput, itemIndices, deps) {
  const results = [];

  // Process only the items at the specified indices
  for (const idx of itemIndices) {
    const item = fullInput[idx];
    // processItem is a placeholder for your own per-item logic
    const result = await processItem(item, deps);
    results.push(result);
  }

  return results;
}

/**
 * Main generation function that orchestrates worker threads
 *
 * @type {import('./types').Generator['generate']}
 */
export async function* generate(input, worker) {
  // Configuration for this generator is based on its name
  const config = getConfig('parallel-generator');

  // Prepare serializable dependencies
  const deps = {
    version: config.version,
    // ...other config
  };

  // Stream chunks as they complete
  for await (const chunkResult of worker.stream(input, deps)) {
    // Process chunk result if needed
    yield chunkResult;
  }
}

Key Points for Worker Processing

processChunk executes in worker threads - it has no access to main-thread state
Only serializable data can be passed to workers (no functions, class instances, etc.)
fullInput and itemIndices - each worker receives the full input but processes only the specified indices
deps must be serializable - pass only JSON-compatible data
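The points above can be sketched without real threads. The example below runs chunks sequentially on the main thread, purely to show the data flow: the index split, the processChunk contract, and what a JSON-compatible deps object survives. chunkIndices and runInChunks are hypothetical helpers, and the JSON round trip only approximates the thread boundary; doc-kit's actual scheduling may differ.

```javascript
/** Split the indices 0..total-1 into consecutive groups of at most `size`. */
function chunkIndices(total, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }

  const chunks = [];
  for (let start = 0; start < total; start += size) {
    const end = Math.min(start + size, total);
    chunks.push(Array.from({ length: end - start }, (_, i) => start + i));
  }
  return chunks;
}

/** Same signature as processChunk in this guide; upper-casing stands in for real work. */
async function processChunk(fullInput, itemIndices, deps) {
  return itemIndices.map(
    idx => `${deps.prefix}${fullInput[idx].toUpperCase()}`
  );
}

/** Run every chunk in turn and concatenate the results in input order. */
async function runInChunks(input, deps, size) {
  // Approximate the thread boundary: anything that is not JSON-compatible
  // (for example a function) is dropped by the round trip.
  const sentDeps = JSON.parse(JSON.stringify(deps));

  const results = [];
  for (const indices of chunkIndices(input.length, size)) {
    results.push(...(await processChunk(input, indices, sentDeps)));
  }
  return results;
}

runInChunks(['fs', 'http', 'path'], { prefix: 'node:' }, 2).then(results => {
  console.log(results); // [ 'node:FS', 'node:HTTP', 'node:PATH' ]
});
```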

When to Use Workers

Use parallel processing when:

Processing many independent items (files, modules, entries)
Each item takes significant time to process
Operations are CPU-intensive

Don't use workers when:

Items depend on each other
Output must be in a specific order
The operation is I/O-bound rather than CPU-bound

Streaming Results

Generators can yield results as they're produced using async generators. Export processChunk to enable parallel processing, then use async function* for generate:

// packages/<package>/src/generators/streaming-generator/index.mjs
export const name = 'streaming-generator';
export const dependsOn = '@doc-kittens/internal/metadata';

/**
 * Process a chunk of data
 *
 * @type {import('./types').Generator['processChunk']}
 */
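export async function processChunk(fullInput, itemIndices, deps) {
  // Process only the items assigned to this chunk.
  // Replace the identity mapping with your own per-item logic.
  const results = itemIndices.map(idx => fullInput[idx]);

  return results;
}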

/**
 * Generator function that yields results incrementally
 *
 * @type {import('./types').Generator['generate']}
 */
export async function* generate(input, worker) {
  // Stream results as workers complete chunks
  for await (const chunkResult of worker.stream(input, {})) {
    // Yield immediately - downstream can start processing
    yield chunkResult;
  }
}

Benefits of Streaming

Reduced memory usage - Process data in chunks
Earlier downstream starts - Next generator can begin before this one finishes
Better parallelism - Multiple generators can work simultaneously
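The "earlier downstream starts" benefit can be seen in a small sketch with two chained async generators. The upstream and downstream functions are hypothetical stand-ins for two pipeline stages; no worker threads are involved. The log shows that the downstream stage handles the first chunk before the upstream stage has produced the second.

```javascript
/** Upstream stage: yields one chunk of results at a time. */
async function* upstream(items, chunkSize) {
  for (let i = 0; i < items.length; i += chunkSize) {
    yield items.slice(i, i + chunkSize);
  }
}

/** Downstream stage: transforms each chunk as soon as it arrives. */
async function* downstream(chunks, log) {
  for await (const chunk of chunks) {
    log.push(`received ${chunk.length}`);
    yield chunk.map(item => item.toUpperCase());
  }
}

/** Drive the two stages and record the order in which things happen. */
async function main() {
  const log = [];
  const output = [];

  for await (const chunk of downstream(upstream(['a', 'b', 'c'], 2), log)) {
    log.push('emitted');
    output.push(...chunk);
  }

  return { log, output };
}

main().then(({ log, output }) => {
  console.log(log.join(' -> '));
  // received 2 -> emitted -> received 1 -> emitted
  console.log(output); // [ 'A', 'B', 'C' ]
});
```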

Non-Streaming Generators

Some generators must collect all input before processing:

// packages/<package>/src/generators/batch-generator/index.mjs
export const name = 'batch-generator';
export const dependsOn = '@doc-kittens/react/jsx-ast';

/**
 * Non-streaming - returns Promise instead of AsyncGenerator
 *
 * @type {import('./types').Generator['generate']}
 */
export async function generate(input, worker) {
  // Collect all input (if the dependency is streaming, this waits for completion).
  // collectAll and processBatch are placeholders for your own helpers.
  const allData = await collectAll(input);

  // Process everything together
  const result = processBatch(allData);

  return result;
}

Use non-streaming when:

You need all data to make decisions (e.g., code splitting, global analysis)
The output format requires the complete dataset
Cross-references between items need to be resolved
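The collectAll call in the batch generator above is not defined in this guide. One possible shape for it is sketched below, assuming a streaming dependency yields arrays of items and a non-streaming dependency provides a plain array. This is a hypothetical helper, not doc-kit's own implementation.

```javascript
/**
 * Hypothetical collectAll: accepts either a plain array or an (async)
 * iterable of chunks and resolves to one flat array of items.
 */
async function collectAll(input) {
  if (Array.isArray(input)) return input;

  const all = [];
  for await (const chunk of input) {
    // Assumption: each chunk is an array of items; a bare item is kept as-is.
    const items = Array.isArray(chunk) ? chunk : [chunk];
    for (const item of items) all.push(item);
  }
  return all;
}

// Usage with a stand-in for a streaming dependency
async function* streamingDependency() {
  yield ['fs', 'http'];
  yield ['path'];
}

collectAll(streamingDependency()).then(all => {
  console.log(all); // [ 'fs', 'http', 'path' ]
});
```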

Generator Dependencies

Declaring Dependencies

// packages/<package>/src/generators/my-generator/index.mjs
export const name = 'my-generator';
export const dependsOn = '@doc-kittens/internal/metadata';

export async function generate(input, worker) {
  // input contains the output from the metadata generator
}

Dependency Chain Example

// Step 1: Parse markdown to AST (no dependency)
// packages/<package>/src/generators/ast/index.mjs
export const name = 'ast';
// No dependsOn: processes raw markdown files

// Step 2: Extract metadata from AST
// packages/<package>/src/generators/metadata/index.mjs
export const name = 'metadata';
export const dependsOn = '@doc-kittens/internal/ast';

// Step 3: Generate HTML from metadata
// packages/<package>/src/generators/html-generator/index.mjs
export const name = 'html-generator';
export const dependsOn = '@doc-kittens/internal/metadata';

Multiple Consumers

Multiple generators can depend on the same generator:

    metadata
    ↙  ↓  ↘
  html json man-page

The framework ensures metadata runs once and its output is cached for all consumers.
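One way such run-once behaviour could work is to memoise each generator's result promise, as in the sketch below. createRunner is hypothetical and is not the framework's actual scheduler. For brevity, dependsOn here refers to another key in the same object rather than to an import specifier.

```javascript
/**
 * Hypothetical runner that memoises each generator's result promise by key,
 * so a shared dependency executes once however many consumers request it.
 */
function createRunner(generators) {
  const cache = new Map();

  function run(key) {
    if (!cache.has(key)) {
      const generator = generators[key];
      if (!generator) throw new Error(`Unknown generator: ${key}`);

      // Resolve the dependency first (from the cache when available).
      const input = generator.dependsOn
        ? run(generator.dependsOn)
        : Promise.resolve(undefined);

      cache.set(key, input.then(data => generator.generate(data)));
    }
    return cache.get(key);
  }

  return run;
}

let metadataRuns = 0;

const run = createRunner({
  metadata: {
    generate: async () => {
      metadataRuns += 1;
      return ['fs', 'http'];
    },
  },
  html: {
    dependsOn: 'metadata',
    generate: async entries => entries.map(entry => `${entry}.html`),
  },
  json: {
    dependsOn: 'metadata',
    generate: async entries => entries.map(entry => `${entry}.json`),
  },
});

Promise.all([run('html'), run('json')]).then(([html, json]) => {
  console.log(html, json, metadataRuns);
  // [ 'fs.html', 'http.html' ] [ 'fs.json', 'http.json' ] 1
});
```

Caching the promise rather than the resolved value means a second consumer that asks while metadata is still running shares the in-flight work instead of starting it again.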

File Output

Writing Output Files

import { mkdir, writeFile } from 'node:fs/promises';
import { join } from 'node:path';

import getConfig from '@node-core/doc-kit/src/utils/configuration/index.mjs';

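export async function generate(input, worker) {
  const config = getConfig('my-format');

  // `result`, `content`, and `items` are placeholders for values your
  // generator has already derived from `input`.

  if (!config.output) {
    // Return data without writing
    return result;
  }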

  // Ensure directory exists
  await mkdir(config.output, { recursive: true });

  // Write single file
  await writeFile(join(config.output, 'output.txt'), content, 'utf-8');

  // Write multiple files
  for (const item of items) {
    await writeFile(
      join(config.output, `${item.name}.txt`),
      item.content,
      'utf-8'
    );
  }

  return result;
}

Copying Assets

import { cp } from 'node:fs/promises';
import { join } from 'node:path';

import getConfig from '@node-core/doc-kit/src/utils/configuration/index.mjs';

export async function generate(input, worker) {
  const config = getConfig('my-format');

  if (config.output) {
    // Copy asset directory
    await cp(
      new URL('./assets', import.meta.url),
      join(config.output, 'assets'),
      { recursive: true }
    );
  }

  // `result` is a placeholder for the data this generator produces
  return result;
}

Output Structure

Organize output clearly:

output/
├── index.html
├── api/
│   ├── fs.html
│   ├── http.html
│   └── path.html
├── assets/
│   ├── style.css
│   └── script.js
└── data/
    └── search-index.json
