Validation Rules and Error Messages

This document defines all validation rules for the manifest-driven content generator and their associated error messages.

Validation Phases

Validation occurs in four phases:

  1. Manifest Structure Validation - Check manifest file format and structure
  2. File Existence Validation - Verify all referenced files exist
  3. Configuration Validation - Validate extraction configurations
  4. Template Matching Validation - Ensure template placeholders match extractions

Phase 1: Manifest Structure Validation

Rule 1.1: Valid YAML Syntax

Check: The manifest file must be valid YAML.

Error Message:

ERROR: Invalid YAML syntax in manifest file
  File: manifest.yml
  Details: [YAML parser error message]

Fix: Use a YAML validator to identify syntax errors.


Rule 1.2: Required Top-Level Keys

Check: The manifest must contain a top-level documents key.

Error Message:

ERROR: Missing required key 'documents' in manifest
  File: manifest.yml
  
  Expected structure:
    documents:
      output/file.md:
        ...

Fix: Add the documents section to your manifest.


Rule 1.3: Valid Document Structure

Check: Each document must have template and extractions keys.

Error Message:

ERROR: Invalid document configuration for 'output/website.md'
  Missing required key: 'template'
  
  Expected structure:
    documents:
      output/website.md:
        template: website.md
        extractions:
          ...

Fix: Ensure each document has both template and extractions defined.
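Rules 1.2 and 1.3 can be sketched as a check over an already-parsed manifest object. This is a minimal illustration, not the generator's actual code: the function name validateManifestStructure and the shape of the returned error strings are assumptions chosen to mirror the messages above.

```javascript
// Hypothetical helper (not part of the documented tool): checks a manifest
// that has already been parsed from YAML into a plain object.
function validateManifestStructure(manifest, manifestFile = "manifest.yml") {
  const errors = [];

  // Rule 1.2: the top-level 'documents' key is required.
  if (!manifest || typeof manifest.documents !== "object" || manifest.documents === null) {
    errors.push(`ERROR: Missing required key 'documents' in manifest\n  File: ${manifestFile}`);
    return errors; // nothing further can be checked without documents
  }

  // Rule 1.3: every document needs both 'template' and 'extractions'.
  for (const [docPath, config] of Object.entries(manifest.documents)) {
    const cfg = config && typeof config === "object" ? config : {};
    for (const key of ["template", "extractions"]) {
      if (!(key in cfg)) {
        errors.push(
          `ERROR: Invalid document configuration for '${docPath}'\n  Missing required key: '${key}'`
        );
      }
    }
  }
  return errors;
}

// Usage: a document that lacks 'template' produces one error.
const structureErrors = validateManifestStructure({
  documents: { "output/website.md": { extractions: {} } },
});
console.log(structureErrors.join("\n"));
```

Collecting errors in an array rather than throwing on the first one lets a single run report every structural problem.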


Phase 2: File Existence Validation

Rule 2.1: Primary Source Exists

Check: If primary_source is specified, the file must exist.

Error Message:

ERROR: Primary source file not found
  File: ../merged/Guide Template.md
  
  Please verify:
    1. The file path is correct
    2. The file exists at the specified location
    3. You have read permissions

Fix: Verify the path to your primary source file.


Rule 2.2: Template File Exists

Check: Template file specified for each document must exist.

Error Message:

ERROR: Template file not found for document 'output/website.md'
  Template: website.md
  Searched in: tools/content-generator/
  
  Please verify:
    1. Template file exists
    2. Template path is correct (relative to manifest)

Fix: Create the template file or correct the path.


Rule 2.3: Extraction Source File Exists

Check: If an extraction specifies a custom source, that file must exist.

Error Message:

ERROR: Source file not found for extraction 'objectives'
  File: custom-source.md
  Document: output/website.md
  
  Either:
    1. Create the file at the specified path, or
    2. Remove the 'source' key to use primary_source, or
    3. Correct the file path

Fix: Verify the source file exists or use the primary source.
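The three Phase 2 rules share one shape: resolve a path, then test whether it exists. The sketch below uses Node's built-in fs and path modules. It assumes that primary_source is a top-level manifest key and that every path resolves relative to the manifest's directory; the name findMissingFiles is hypothetical.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: returns one record per referenced file that is missing.
// Assumption: all paths are resolved relative to the manifest's directory.
function findMissingFiles(manifest, manifestDir) {
  const missing = [];
  const check = (kind, file, extra = {}) => {
    if (!fs.existsSync(path.resolve(manifestDir, file))) {
      missing.push({ kind, file, ...extra });
    }
  };

  // Rule 2.1: primary_source is optional, but must exist when specified.
  if (manifest.primary_source) check("primary_source", manifest.primary_source);

  for (const [document, config] of Object.entries(manifest.documents || {})) {
    // Rule 2.2: each document's template must exist.
    if (config && config.template) check("template", config.template, { document });

    // Rule 2.3: a per-extraction 'source' override must exist.
    const extractions = (config && config.extractions) || {};
    for (const [extraction, opts] of Object.entries(extractions)) {
      if (opts && opts.source) check("source", opts.source, { document, extraction });
    }
  }
  return missing;
}
```

Each returned record carries the document and extraction names, which is enough to fill in the error messages shown above.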


Phase 3: Configuration Validation

Rule 3.1: Required Parameters - Pattern

Check: Each extraction must have a pattern parameter.

Error Message:

ERROR: Missing required parameter 'pattern'
  Document: output/website.md
  Extraction: objectives
  
  Required structure:
    objectives:
      pattern: "#### Objectives"
      scope: bullets

Fix: Add the pattern parameter to your extraction.


Rule 3.2: Required Parameters - Scope

Check: Each extraction must have a scope parameter.

Error Message:

ERROR: Missing required parameter 'scope'
  Document: output/website.md
  Extraction: topics
  
  Required structure:
    topics:
      pattern: "## "
      scope: self_plain

Fix: Add the scope parameter to your extraction.


Rule 3.3: Valid Scope Value

Check: Scope must be one of the supported types.

Valid Values:

  • self, self_plain
  • bullets, bullets_plain
  • text, text_plain
  • code, code_plain
  • all, all_plain

Error Message:

ERROR: Invalid scope value 'bullet' for extraction 'objectives'
  Document: output/website.md
  
  Valid scope values:
    - self, self_plain
    - bullets, bullets_plain
    - text, text_plain
    - code, code_plain
    - all, all_plain
  
  Did you mean: bullets?

Fix: Use a valid scope value from the list above.
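One plausible way to produce the "Did you mean" hint is to pick the valid value with the smallest edit distance to the input. The tool's actual suggestion logic is not documented here, so treat the following as a sketch: checkScope and editDistance are hypothetical names, and Levenshtein distance is an assumed technique.

```javascript
const VALID_SCOPES = [
  "self", "self_plain", "bullets", "bullets_plain", "text", "text_plain",
  "code", "code_plain", "all", "all_plain",
];

// Levenshtein edit distance, computed with a single rolling row.
function editDistance(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // value from the previous row, previous column
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // value from the previous row, same column
      row[j] = Math.min(
        above + 1,                                  // deletion
        row[j - 1] + 1,                             // insertion
        diagonal + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
      diagonal = above;
    }
  }
  return row[b.length];
}

// Rule 3.3: accept a known scope, otherwise suggest the closest valid one.
function checkScope(scope) {
  const value = String(scope);
  if (VALID_SCOPES.includes(value)) return { valid: true };
  let suggestion = VALID_SCOPES[0];
  for (const candidate of VALID_SCOPES) {
    if (editDistance(value, candidate) < editDistance(value, suggestion)) {
      suggestion = candidate;
    }
  }
  return { valid: false, suggestion };
}

console.log(checkScope("bullet")); // suggestion is "bullets" (distance 1)
```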


Rule 3.4: Valid Format Value

Check: If specified, format must be one of the supported types.

Valid Values:

  • markdown-list (default)
  • plain-list
  • inline

Error Message:

ERROR: Invalid format value 'bullet-list' for extraction 'objectives'
  Document: output/website.md
  
  Valid format values:
    - markdown-list (default)
    - plain-list
    - inline
  
  Did you mean: markdown-list?

Fix: Use a valid format value or omit for default.


Rule 3.5: Non-Empty Pattern

Check: Pattern cannot be an empty string.

Error Message:

ERROR: Empty pattern value for extraction 'topics'
  Document: output/website.md
  
  Pattern must be a non-empty string that matches line content.
  Examples:
    - "## "
    - "#### Objectives"
    - "# Module"

Fix: Provide a valid pattern string.
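Rules 3.1, 3.2, 3.4, and 3.5 can be combined into a single pass over one extraction's configuration, as sketched below. The function name validateExtraction and the abbreviated message strings are assumptions; the scope-value check from Rule 3.3 is left out for brevity.

```javascript
const VALID_FORMATS = ["markdown-list", "plain-list", "inline"];

// Hypothetical helper: returns a list of problems for one extraction config.
function validateExtraction(name, config) {
  const cfg = config && typeof config === "object" ? config : {};
  const errors = [];

  // Rules 3.1 and 3.2: 'pattern' and 'scope' are both required.
  for (const param of ["pattern", "scope"]) {
    if (!(param in cfg)) errors.push(`Missing required parameter '${param}'`);
  }

  // Rule 3.5: a pattern that is present must not be the empty string.
  if (cfg.pattern === "") {
    errors.push(`Empty pattern value for extraction '${name}'`);
  }

  // Rule 3.4: 'format' is optional, but must be a supported value if given.
  if ("format" in cfg && !VALID_FORMATS.includes(cfg.format)) {
    errors.push(`Invalid format value '${cfg.format}' for extraction '${name}'`);
  }
  return errors;
}

// Usage: an empty pattern and an unsupported format yield two errors.
console.log(validateExtraction("topics", { pattern: "", scope: "self_plain", format: "bullet-list" }));
```

Checking for a missing key separately from an empty string keeps the two messages distinct, matching Rules 3.1 and 3.5.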


Phase 4: Template Matching Validation

Rule 4.1: Template Has Placeholders

Check: Template file should contain placeholder markers.

Warning Message:

WARNING: Template file contains no placeholders
  Template: website.md
  Document: output/website.md
  
  Expected format in template:
    //Generated by content-generator
    //extraction_key
  
  This document will be created but no content will be inserted.

Fix: Add placeholder markers to your template.


Rule 4.2: Placeholder Format

Check: Placeholders must follow the two-line comment format.

Error Message:

ERROR: Invalid placeholder format in template
  Template: website.md
  Line: 27
  
  Found: //objectives
  Expected format:
    //Generated by content-generator
    //objectives
  
  Placeholders must be preceded by '//Generated by content-generator'

Fix: Ensure placeholders follow the two-line format.
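The two-line format can be checked by scanning the template line by line and looking at what precedes each placeholder candidate. The sketch below assumes that any line beginning with // other than the marker itself is intended as a placeholder; the real tool may be stricter or looser. The name scanPlaceholders is hypothetical.

```javascript
const MARKER = "//Generated by content-generator";

// Hypothetical scanner: separates well-formed placeholders from malformed ones.
// Assumption: every '//' line that is not the marker is a placeholder candidate.
function scanPlaceholders(templateText) {
  const lines = templateText.split(/\r?\n/);
  const placeholders = [];
  const errors = [];

  lines.forEach((raw, index) => {
    const line = raw.trim();
    if (!line.startsWith("//") || line === MARKER) return;

    const key = line.slice(2).trim();
    if (!key) return; // a bare '//' carries no key

    const previous = index > 0 ? lines[index - 1].trim() : "";
    if (previous === MARKER) {
      placeholders.push(key);
    } else {
      // Rule 4.2: placeholder is not preceded by the marker line.
      errors.push({ line: index + 1, found: line });
    }
  });
  return { placeholders, errors };
}

const template = [
  "# Title",
  "//Generated by content-generator",
  "//objectives",
  "",
  "//agenda",
].join("\n");
console.log(scanPlaceholders(template));
```

In this example, objectives is accepted and //agenda on line 5 is reported, because the line before it is blank rather than the marker.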


Rule 4.3: Extraction Keys Match Placeholders

Check: All extraction keys should have corresponding placeholders in the template.

Warning Message:

WARNING: Extraction defined but no matching placeholder in template
  Document: output/website.md
  Extraction: 'objectives'
  Template: website.md
  
  Expected to find in template:
    //Generated by content-generator
    //objectives
  
  This extraction will be processed but not inserted into output.

Fix: Add matching placeholder to template or remove unused extraction.


Rule 4.4: Placeholders Have Matching Extractions

Check: All template placeholders should have matching extraction configs.

Warning Message:

WARNING: Placeholder in template has no matching extraction
  Template: website.md
  Placeholder: //agenda
  Document: output/website.md
  
  Either:
    1. Add extraction configuration for 'agenda', or
    2. Remove the placeholder from template
  
  The placeholder will remain unchanged in output.

Fix: Add extraction config or remove placeholder.
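Rules 4.3 and 4.4 are two directions of the same set comparison. A minimal sketch, assuming the extraction keys and placeholder keys have already been collected into arrays (comparePlaceholders is a hypothetical name):

```javascript
// Hypothetical helper: compares extraction keys against template placeholders.
function comparePlaceholders(extractionKeys, placeholderKeys) {
  const extractions = new Set(extractionKeys);
  const placeholders = new Set(placeholderKeys);
  return {
    // Rule 4.3: extractions that no placeholder will consume.
    unusedExtractions: [...extractions].filter((key) => !placeholders.has(key)),
    // Rule 4.4: placeholders that no extraction will fill.
    unmatchedPlaceholders: [...placeholders].filter((key) => !extractions.has(key)),
  };
}

console.log(comparePlaceholders(["objectives", "topics"], ["topics", "agenda"]));
// unusedExtractions: ["objectives"], unmatchedPlaceholders: ["agenda"]
```

Both results map to warnings rather than errors, so a caller would report them and continue.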


Validation Summary Output

When validation completes, a summary is displayed:

Success (No Errors)

✓ Validation passed
  Documents: 1
  Extractions: 2
  Templates verified: 1
  
Ready to generate content.

With Warnings

✓ Validation passed with warnings
  Documents: 1
  Extractions: 2
  Warnings: 1
  
See warnings above. Content generation will continue.

With Errors

✗ Validation failed
  Documents: 1
  Errors: 3
  
Fix the errors above before generating content.
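The three summaries differ only in which counts they show, and errors take precedence over warnings. The sketch below illustrates that ordering; the function name formatSummary and the field names of the counts object are assumptions made for the example.

```javascript
// Hypothetical formatter: picks the summary variant from the error/warning counts.
function formatSummary({ documents, extractions, templates, errors, warnings }) {
  if (errors > 0) {
    return [
      "✗ Validation failed",
      `  Documents: ${documents}`,
      `  Errors: ${errors}`,
      "",
      "Fix the errors above before generating content.",
    ].join("\n");
  }
  if (warnings > 0) {
    return [
      "✓ Validation passed with warnings",
      `  Documents: ${documents}`,
      `  Extractions: ${extractions}`,
      `  Warnings: ${warnings}`,
      "",
      "See warnings above. Content generation will continue.",
    ].join("\n");
  }
  return [
    "✓ Validation passed",
    `  Documents: ${documents}`,
    `  Extractions: ${extractions}`,
    `  Templates verified: ${templates}`,
    "",
    "Ready to generate content.",
  ].join("\n");
}

console.log(formatSummary({ documents: 1, extractions: 2, templates: 1, errors: 0, warnings: 1 }));
```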

Validation Command

To validate the manifest without generating content:

node generator.js --validate

This runs all validation phases and reports errors/warnings without processing.


Error Categories

Fatal Errors (Stop Execution)

  • Invalid YAML syntax
  • Missing required keys
  • Missing required parameters
  • Invalid parameter values
  • File not found errors

Warnings (Continue with Caution)

  • Missing placeholders for extractions
  • Missing extractions for placeholders
  • Template file contains no placeholders
  • No patterns found in source

Best Practices

  1. Always validate before committing - Run with the --validate flag
  2. Fix warnings - They often indicate configuration issues
  3. Use descriptive extraction keys - Match them to template placeholders
  4. Test with small documents first - Verify behavior before scaling up
  5. Keep manifest organized - Group related extractions together

Debugging Tips

No Content Generated

Check:

  1. Pattern matches lines in source (case-sensitive!)
  2. Scope is appropriate for content type
  3. Collection boundary isn't immediately after pattern
  4. Placeholder exists in template

Wrong Content Extracted

Check:

  1. Pattern is specific enough (e.g., "## " matches ALL H2)
  2. Scope type matches content (bullets vs text vs code)
  3. Collection stops at next heading as expected
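Pattern specificity and case sensitivity are easy to check by hand against a few source lines. The sketch below assumes a pattern matches when a line starts with it, compared case-sensitively; the generator's exact matching rule is not specified in this document, and matchingLines is a hypothetical helper.

```javascript
// Hypothetical helper. Assumption: a pattern matches a line that starts with it
// (case-sensitive prefix match).
function matchingLines(sourceText, pattern) {
  return sourceText.split(/\r?\n/).filter((line) => line.startsWith(pattern));
}

const source = [
  "# Module 1",
  "## Overview",
  "#### Objectives",
  "- Learn the basics",
  "## Topics",
  "#### objectives",
].join("\n");

// A broad pattern matches every H2 heading.
console.log(matchingLines(source, "## "));
// → [ '## Overview', '## Topics' ]

// A specific pattern matches one heading; the lowercase variant is skipped.
console.log(matchingLines(source, "#### Objectives"));
// → [ '#### Objectives' ]
```

Running a candidate pattern through a check like this before adding it to the manifest shows whether it is too broad or fails on letter case.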

Content Not Inserted

Check:

  1. Placeholder key matches extraction key exactly
  2. Placeholder follows two-line format
  3. Template file is being used for correct document