Scope Types Reference

This document describes all available scope types for content extraction in the manifest-driven content generator.

Overview

The scope parameter defines what type of content to collect, starting at each pattern match and ending at the next heading. Collection automatically stops at a heading of any level (# through ######) or at the end of the document. The self and self_plain scopes are the exception: they collect only the matched line.

Core Concept

Pattern Match → Collect content by type → Stop at next heading → Combine all results
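The pipeline above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the generator's actual code. The names collect_sections and fence_mask are invented here. The sketch assumes a heading is 1 to 6 # characters followed by a space, and that lines inside fenced code blocks count as neither headings nor matches. Each returned body would then be filtered by the chosen scope, and the per-match results concatenated in document order.

```python
import re

# Assumption: a heading is 1-6 "#" characters followed by a space (CommonMark style).
HEADING = re.compile(r"^#{1,6} ")


def fence_mask(lines: list[str]) -> list[bool]:
    """Mark fence lines and lines inside fenced code, so '# comments' there are ignored."""
    mask: list[bool] = []
    inside = False
    for line in lines:
        if line.lstrip().startswith("```"):
            mask.append(True)
            inside = not inside
        else:
            mask.append(inside)
    return mask


def collect_sections(document: str, pattern: str) -> list[list[str]]:
    """For every line matching `pattern`, collect the lines after it up to the next heading."""
    lines = document.splitlines()
    fenced = fence_mask(lines)
    sections: list[list[str]] = []
    for i, line in enumerate(lines):
        # Pattern match: literal, case-sensitive prefix check after trimming leading whitespace.
        if fenced[i] or not line.lstrip().startswith(pattern):
            continue
        body: list[str] = []
        for j in range(i + 1, len(lines)):
            # Stop at the next heading of any level; the end of the document also ends the loop.
            if not fenced[j] and HEADING.match(lines[j].lstrip()):
                break
            body.append(lines[j])
        sections.append(body)
    return sections


doc = "#### Objectives\n- Learn content modeling\n- Understand GraphQL\n\n## Next Section\n"
print(collect_sections(doc, "#### Objectives"))
# [['- Learn content modeling', '- Understand GraphQL', '']]
```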

Scope Types

self

Collects: Only the matched line, keeping all markdown formatting.

Use Case: Extract headings with their markdown markers intact.

Example:

Source:

## Introduction to AEM
## Advanced Topics

Configuration:

pattern: "## "
scope: self

Output:

- ## Introduction to AEM
- ## Advanced Topics

self_plain

Collects: Only the matched line, removing markdown formatting.

Use Case: Extract heading text without markdown markers.

Example:

Source:

## Introduction to AEM
## Advanced Topics
### Subtopic Name

Configuration:

pattern: "## "
scope: self_plain

Output:

- Introduction to AEM
- Advanced Topics

Note: ### Subtopic Name is not matched because the pattern is "## " (H2 only).
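The difference between self and self_plain can be sketched as follows. The sketch assumes self_plain only needs to strip the leading # marker shown in these examples, so inline formatting inside a heading is not handled. The function names are hypothetical.

```python
import re


def scope_self(line: str) -> str:
    """self: the matched line with its markdown kept (surrounding whitespace trimmed)."""
    return line.strip()


def scope_self_plain(line: str) -> str:
    """self_plain: the matched line with the leading heading marker removed."""
    return re.sub(r"^#{1,6}\s+", "", line.strip())


for heading in ["## Introduction to AEM", "## Advanced Topics"]:
    print(scope_self(heading), "|", scope_self_plain(heading))
# ## Introduction to AEM | Introduction to AEM
# ## Advanced Topics | Advanced Topics
```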


bullets

Collects: All bullet points (lines starting with - or *) from the match until the next heading.

Use Case: Extract list items, objectives, prerequisites, etc.

Example:

Source:

#### Objectives
- Learn content modeling
- Understand GraphQL
- Build queries

## Next Section

Configuration:

pattern: "#### Objectives"
scope: bullets

Output:

- Learn content modeling
- Understand GraphQL
- Build queries
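Below is a minimal sketch of the bullets filter. It runs on the lines already collected between the match and the next heading. It assumes a bullet is - or * followed by a space, possibly indented. scope_bullets is a hypothetical name. A line that starts with **bold** text is not treated as a bullet, because its first * is not followed by a space.

```python
import re

# Assumption: a bullet is "-" or "*" followed by a space, possibly indented.
BULLET = re.compile(r"^\s*[-*] ")


def scope_bullets(body: list[str]) -> list[str]:
    """bullets: keep the bullet lines from a collected section, markers included."""
    return [line.strip() for line in body if BULLET.match(line)]


body = ["- Learn content modeling", "- Understand GraphQL", "- Build queries", ""]
print(scope_bullets(body))
# ['- Learn content modeling', '- Understand GraphQL', '- Build queries']
```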

Complex Example with Multiple Headings:

Source:

# Module 1
#### Objectives
- Learn X
- Learn Y

## Activity 1-1
- Step 1
- Step 2

# Module 2
#### Objectives
- Learn Z

Configuration:

pattern: "#### Objectives"
scope: bullets

Output (bullets from BOTH matches combined):

- Learn X
- Learn Y
- Learn Z

Note: Collection for the first match stops at ## Activity 1-1, because collection stops at a heading of any level. Step 1 and Step 2 are therefore not collected.


bullets_plain

Collects: All bullet points with the bullet marker (- or *) removed.

Use Case: Get bullet content as plain text without markers.

Example:

Source:

#### Prerequisites
- Experience with JavaScript
- Basic knowledge of React

Configuration:

pattern: "#### Prerequisites"
scope: bullets_plain

Output:

- Experience with JavaScript
- Basic knowledge of React

Note: The output is still formatted as a markdown list by default (see Format Options); only the original bullet markers in the source content are stripped.
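bullets_plain differs from bullets only in removing the marker. The hypothetical sketch below assumes a bullet is - or * followed by one or more spaces. It then applies the default markdown-list format, which adds its own "- " to each stripped item.

```python
import re

# Assumption: a bullet is "-" or "*" followed by one or more spaces, possibly indented.
BULLET = re.compile(r"^\s*[-*] +")


def scope_bullets_plain(body: list[str]) -> list[str]:
    """bullets_plain: bullet lines with the leading - or * marker stripped."""
    items: list[str] = []
    for line in body:
        match = BULLET.match(line)
        if match:
            items.append(line[match.end():].strip())
    return items


items = scope_bullets_plain(["- Experience with JavaScript", "* Basic knowledge of React"])
# The default markdown-list format then adds its own "- " to each stripped item.
print("\n".join(f"- {item}" for item in items))
# - Experience with JavaScript
# - Basic knowledge of React
```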


text

Collects: Only paragraph text (excludes headings, bullets, code blocks).

Use Case: Extract descriptive paragraphs, introductions, explanations.

Example:

Source:

## Overview

This module introduces Adobe Experience Manager. It covers the basics of content management and digital asset organization.

- Some bullet
- Another bullet

More descriptive text here.

## Next Section

Configuration:

pattern: "## Overview"
scope: text

Output:

- This module introduces Adobe Experience Manager. It covers the basics of content management and digital asset organization.
- More descriptive text here.
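The text scope can be sketched as grouping paragraph lines and skipping everything else. The sketch makes two assumptions: blank lines, bullets, and fenced code end a paragraph, and consecutive paragraph lines are joined with a single space. scope_text is a hypothetical name. The example body splits the first paragraph across two lines to show the joining.

```python
import re

BULLET = re.compile(r"^\s*[-*] ")


def scope_text(body: list[str]) -> list[str]:
    """text: paragraphs only; bullets, fenced code, and blank lines end a paragraph."""
    paragraphs: list[str] = []
    current: list[str] = []
    in_fence = False

    def flush() -> None:
        # Join the lines of the paragraph in progress into one item.
        if current:
            paragraphs.append(" ".join(current))
            current.clear()

    for line in body:
        stripped = line.strip()
        if stripped.startswith("```"):
            flush()
            in_fence = not in_fence
        elif in_fence or not stripped or BULLET.match(line):
            flush()
        else:
            current.append(stripped)
    flush()
    return paragraphs


body = [
    "",
    "This module introduces Adobe Experience Manager.",
    "It covers the basics of content management and digital asset organization.",
    "",
    "- Some bullet",
    "- Another bullet",
    "",
    "More descriptive text here.",
]
print(scope_text(body))
```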

text_plain

Collects: Paragraph text with all markdown formatting removed (no bold, italic, links, etc.).

Use Case: Extract plain text without any markdown styling.

Example:

Source:

## Description

This guide covers **AEM Sites** and [Edge Delivery Services](https://example.com). It includes *important* concepts.

## Next Section

Configuration:

pattern: "## Description"
scope: text_plain

Output:

- This guide covers AEM Sites and Edge Delivery Services. It includes important concepts.
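One way to remove the formatting is a few regular-expression substitutions on each paragraph that text would collect. This sketch handles the constructs in the example (links, **bold**, *italic*) plus inline code in backticks. It deliberately leaves underscore emphasis alone so that identifiers such as self_plain survive. The function name and the exact set of constructs are assumptions, not documented behavior of the generator.

```python
import re


def strip_inline_markdown(text: str) -> str:
    """Remove links, bold, *italic*, and backtick code markers, keeping the visible text."""
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)  # [label](url) -> label
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)           # **bold** -> bold
    text = re.sub(r"\*(.+?)\*", r"\1", text)               # *italic* -> italic
    return re.sub(r"`([^`]*)`", r"\1", text)               # `code` -> code


paragraph = (
    "This guide covers **AEM Sites** and "
    "[Edge Delivery Services](https://example.com). It includes *important* concepts."
)
print(strip_inline_markdown(paragraph))
# This guide covers AEM Sites and Edge Delivery Services. It includes important concepts.
```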

code

Collects: Fenced code blocks (content between ``` markers), keeping the ``` markers.

Use Case: Extract code examples, snippets, configurations.

Example:

Source:

## Implementation

```javascript
function init() {
  console.log('Hello');
}
```

Some text here.

```css
.button { color: red; }
```

## Next Section

Configuration:

pattern: "## Implementation"
scope: code

Output:

- ```javascript
function init() {
  console.log('Hello');
}
```
- ```css
.button { color: red; }
```


code_plain

Collects: Code blocks without the ``` markers.

Use Case: Extract raw code content.

Example:

Source:

## Example

```javascript
const x = 10;
```

## Next Section

Configuration:

pattern: "## Example"
scope: code_plain

Output:

- const x = 10;
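Both code scopes can be sketched with one hypothetical function. It walks the collected lines, tracks whether it is inside a ``` fence, and emits one item per complete block. The keep_fences flag is an invented parameter that switches between code and code_plain. Dropping an unclosed fence at the end of a section is also an assumption of this sketch.

```python
def scope_code(body: list[str], keep_fences: bool = True) -> list[str]:
    """code (keep_fences=True) or code_plain (keep_fences=False): one item per fenced block."""
    blocks: list[str] = []
    current: list[str] = []
    in_fence = False
    for line in body:
        if line.lstrip().startswith("```"):
            if keep_fences:
                current.append(line.strip())
            if in_fence:
                # Closing fence: the block is complete.
                blocks.append("\n".join(current))
                current = []
            in_fence = not in_fence
        elif in_fence:
            current.append(line)  # keep indentation inside the block
    # An unclosed fence at the end of the section is dropped in this sketch.
    return blocks


body = ["", "```javascript", "const x = 10;", "```", ""]
print(scope_code(body, keep_fences=False))  # code_plain
# ['const x = 10;']
print(scope_code(body))                     # code
# ['```javascript\nconst x = 10;\n```']
```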

all

Collects: All content between the match and the next heading (text, bullets, and code).

Use Case: Extract complete sections including all content types.

Example:

Source:

## Setup Instructions

Follow these steps carefully.

- Install Node.js
- Clone repository

```bash
npm install
```

That's it!

## Next Section

Configuration:

pattern: "## Setup Instructions"
scope: all

Output:

- Follow these steps carefully.
- Install Node.js
- Clone repository
- ```bash
npm install
```
- That's it!


all_plain

Collects: All content with markdown formatting removed.

Use Case: Extract complete sections as plain text.

Example:

Source:

## Summary

This covers **important** topics:

- Topic 1
- Topic 2

See [documentation](url) for more.

## Next

Configuration:

pattern: "## Summary"
scope: all_plain

Output:

- This covers important topics:
- Topic 1
- Topic 2
- See documentation for more.
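The all and all_plain scopes combine the rules above. In this hypothetical sketch, each paragraph, each bullet, and each fenced block becomes one item, in document order. With plain=True, the sketch removes bullet markers, fence lines, links, **bold**, and *italic*, and leaves code content untouched. Whether the real generator splits items exactly this way is an assumption based on the examples.

```python
import re

BULLET = re.compile(r"^\s*[-*] +")


def strip_inline(text: str) -> str:
    """Remove link syntax, **bold**, and *italic* markers (as in the all_plain example)."""
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)
    return re.sub(r"\*(.+?)\*", r"\1", text)


def scope_all(body: list[str], plain: bool = False) -> list[str]:
    """all (plain=False) or all_plain (plain=True): paragraphs, bullets, and code blocks in order."""
    items: list[str] = []
    paragraph: list[str] = []
    code: list[str] = []
    in_fence = False

    def flush_paragraph() -> None:
        if paragraph:
            text = " ".join(paragraph)
            items.append(strip_inline(text) if plain else text)
            paragraph.clear()

    for line in body:
        stripped = line.strip()
        if stripped.startswith("```"):
            flush_paragraph()
            if not plain:
                code.append(stripped)  # all keeps the fence lines
            if in_fence:  # closing fence: emit the whole block as one item
                items.append("\n".join(code))
                code.clear()
            in_fence = not in_fence
        elif in_fence:
            code.append(line)  # code content is never reformatted
        elif BULLET.match(line):
            flush_paragraph()
            items.append(strip_inline(BULLET.sub("", line).strip()) if plain else stripped)
        elif stripped:
            paragraph.append(stripped)
        else:
            flush_paragraph()
    flush_paragraph()
    return items


body = ["", "This covers **important** topics:", "", "- Topic 1", "- Topic 2", "",
        "See [documentation](url) for more."]
print(scope_all(body, plain=True))
# ['This covers important topics:', 'Topic 1', 'Topic 2', 'See documentation for more.']
```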

Format Options

The format parameter controls how collected items are output (default: markdown-list):

markdown-list (default)

Outputs each item as a markdown bullet:

- Item 1
- Item 2
- Item 3

plain-list

Outputs each item on its own line without bullets:

Item 1
Item 2
Item 3

inline

Outputs items comma-separated on one line:

Item 1, Item 2, Item 3
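The three formats amount to different ways of joining the collected items. Below is a minimal sketch. format_items is a hypothetical name, and raising an error for an unknown format is an assumed behavior.

```python
def format_items(items: list[str], fmt: str = "markdown-list") -> str:
    """Render collected items in one of the three documented formats."""
    if fmt == "markdown-list":
        return "\n".join(f"- {item}" for item in items)
    if fmt == "plain-list":
        return "\n".join(items)
    if fmt == "inline":
        return ", ".join(items)
    # Assumed behavior: reject anything else instead of guessing.
    raise ValueError(f"unknown format: {fmt!r}")


items = ["Item 1", "Item 2", "Item 3"]
print(format_items(items))            # markdown-list (default)
print(format_items(items, "inline"))  # Item 1, Item 2, Item 3
```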

Collection Boundaries

Collection ALWAYS stops at:

  • Next heading of ANY level (#, ##, ###, ####, #####, ######)
  • End of document

You cannot specify custom boundaries in v1.


Pattern Matching

  • Matches at the start of the line, after trimming whitespace
  • Case-sensitive
  • Literal string match (no regex in v1)
  • Prefix matches work: the line only has to begin with the pattern (e.g., "## " matches any H2)

Examples:

  • "## " → Matches all H2 headings
  • "## Activity" → Matches only H2 starting with "Activity"
  • "#### Objectives" → Matches the H4 heading "Objectives" (and any H4 whose text begins with "Objectives")
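These rules reduce to a prefix check. In the sketch below, matches is a hypothetical name. The description does not say which whitespace is trimmed, so the sketch assumes only leading whitespace.

```python
def matches(line: str, pattern: str) -> bool:
    """Literal, case-sensitive prefix match; leading whitespace on the line is trimmed."""
    return line.lstrip().startswith(pattern)


print(matches("## Activity 1-1", "## "))               # True: any H2
print(matches("### Subtopic Name", "## "))             # False: "###" does not start with "## "
print(matches("## activity 1-1", "## Activity"))       # False: case-sensitive
print(matches("  #### Objectives", "#### Objectives"))  # True: leading whitespace trimmed
```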

Best Practices

  1. Use self_plain for titles: Extract heading text without markdown markers
  2. Use bullets for lists: Extract objectives, prerequisites, steps
  3. Use text for descriptions: Extract paragraph content
  4. Use all for complete sections: When you need everything
  5. Add _plain suffix: When you want content without formatting

Common Patterns

Extract Module Titles

pattern: "# "
scope: self_plain

Extract Activity Titles

pattern: "## Activity"
scope: self_plain

Extract All Objectives from All Modules

pattern: "#### Objectives"
scope: bullets

Extract Prerequisites

pattern: "## Prerequisites"
scope: bullets

Extract Introductions

pattern: "## Introduction"
scope: text