Commit fc95bf5

feat: relative links, TOC generation, and link validation (#1)
* feat: transform relative links to absolute paths

  Fix <base href="/"> issue where #anchor links navigate to /#anchor.
  The build now transforms relative links to absolute paths:
  - #section → /blog/slug#section
  - ../other-slug → /blog/other-slug
  - ../other-slug#section → /blog/other-slug#section

  Uses path.posix.resolve() for cross-platform path resolution.
  linkBasePath is derived from folder structure (folder = URL path).

  Also fixes: mailto:, tel:, ftp:// links are now correctly preserved
  (isAbsoluteUrl uses /^\w+:/ regex to match all protocols).

  Includes comprehensive documentation of URL transformation system.

* feat: add automatic TOC generation with [[toc]] marker

  - Add TOC_MARKER constant for [[toc]] placeholder
  - Add generateToc() using getHeadingList() from marked-gfm-heading-id
  - Add decodeHtmlEntities() for proper heading text display
  - Include h2 and h3 headings, skip headings before marker
  - Generate nested markdown list that gets processed normally

  Also includes:
  - Tests for mailto:, tel:, ftp:// links (not transformed)
  - Tests for TOC generation (6 new tests)

* fix: add missing type exports and update test parameters

  - Add HeadingData, getHeadingList, and resetHeadings exports to types.d.ts
  - Add missing linkBasePath parameter to markdownToEntry test calls

* docs: comprehensive README with feature documentation

  - Document image URL transformation (placeholder system)
  - Document link transformation (relative to absolute)
  - Document TOC generation with [[toc]] marker
  - Add YAML frontmatter reference
  - Add architecture overview
  - Add submodule warning

* docs: clarify Angular website reference

* docs: translate README to English

* refactor: replace marked-gfm-heading-id with inline fork

  - Fork marked-gfm-heading-id to shared/gfm-heading-id/
  - Convert to TypeScript with improved types
  - Simplified API (removed unused globalSlugs option)
  - Better entity handling (decode before slugging)
  - Replace marked-gfm-heading-id dependency with github-slugger
  - Remove now-empty types.d.ts
  - Update README with github-slugger reference

  The fork is only 100 lines and gives us full control over heading ID
  generation. All 131 tests pass.

* refactor: simplify TOC generation using h.raw from fork

  - Remove decodeHtmlEntities() from JekyllMarkdownParser (now in fork)
  - Use h.raw directly instead of decoding h.text
  - Update comment to reference our gfm-heading-id fork

  The fork provides heading.raw which is already decoded and stripped of
  HTML tags, eliminating duplicate code.

* refactor: major cleanup and simplification

  Code Quality Improvements:
  - Extract shared utilities to html.utils.ts (stripHtmlTags, decodeHtmlEntities, escapeHtml)
  - Precompile regex patterns for better performance
  - Simplify parse() return type (remove unused yaml/markdown fields)
  - Update documentation to reference our gfm-heading-id fork

  TOC Generation Simplification:
  - Replace complex position-tracking algorithm with simple split approach
  - Split markdown at [[toc]] marker, parse only content after it
  - Reduces generateToc from 60+ lines to 30 lines
  - More robust: no regex matching of heading text needed

  Net result: -40 lines of code, cleaner architecture, same functionality.
  All 131 tests pass.

* feat: preserve HTML formatting in TOC links + add comprehensive tests

  - TOC links now preserve inline formatting (<code>, <strong>, <em>)
  - Add warning for duplicate headings (known limitation)
  - Move gfm-heading-id.ts out of subdirectory
  - Add html.utils.spec.ts (45 tests)
  - Add gfm-heading-id.spec.ts (24 tests)
  - Add 5 TOC formatting tests
  - Document HeadingData text vs raw separation
  - Clarify slug terminology in base.types.ts

  Total: 205 tests passing

* feat: add anchor link validation with fuzzy "did you mean?" suggestions

  Build-time validation for broken anchor links:
  - Collects all heading IDs during parsing
  - Extracts all anchor links from HTML
  - Validates links point to existing anchors
  - Suggests similar anchors using Levenshtein distance (typo detection)
  - Warnings only, does not fail the build

  New files:
  - string.utils.ts: Levenshtein distance + findSimilar (27 tests)
  - link-validator.ts: Anchor validation logic (14 tests)

  Changes:
  - jekyll-markdown-parser.ts: Now returns headingIds
  - base.utils.ts: Registers anchors and links during parsing
  - build.ts: Runs validation at end of build

  Total: 246 tests passing

* fix(README): indentation

* refactor: improve code quality based on audit findings

  - link-validator: use matchAll() instead of exec() loop with global regex
  - html.utils: add single quote escaping for complete HTML safety
  - base.utils: replace string concat sort key with proper multi-field comparison

* fix: skip external URLs in anchor link validation

* fix: URL-decode anchors before validation

* feat: include h4 headings in [[toc]] generation

---------

Co-authored-by: Ferdinand Malcher <ferdinand@malcher.media>
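
The commit message mentions "did you mean?" suggestions based on Levenshtein distance. The following is a minimal sketch of how such a helper could look. The function names, signatures, and the default threshold of 2 edits are assumptions for illustration and may differ from what `string.utils.ts` actually contains.

```typescript
// Sketch only: names, signatures and the default threshold are assumptions,
// not necessarily those used in string.utils.ts.

/** Classic dynamic-programming edit distance, keeping only two rows in memory. */
function levenshtein(a: string, b: string): number {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  // previous[j] = distance between the first i-1 chars of a and the first j chars of b
  let previous: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1,       // deletion
        current[j - 1] + 1,    // insertion
        previous[j - 1] + cost // substitution
      );
    }
    previous = current;
  }
  return previous[b.length];
}

/** Returns the candidates within `maxDistance` edits of `target`, closest first. */
function findSimilar(target: string, candidates: string[], maxDistance = 2): string[] {
  return candidates
    .map((candidate) => ({ candidate, distance: levenshtein(target, candidate) }))
    .filter(({ distance }) => distance <= maxDistance)
    .sort((x, y) => x.distance - y.distance)
    .map(({ candidate }) => candidate);
}
```

A validator could call `findSimilar(brokenAnchor, knownAnchors)` and print the first result as the suggestion.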
1 parent 7b9f23b commit fc95bf5

19 files changed

Lines changed: 2387 additions & 154 deletions

README.md

Lines changed: 263 additions & 35 deletions
@@ -1,58 +1,286 @@
11
# website-articles-build
22

3-
Shared build scripts for processing Markdown articles into JSON.
3+
Build system for blog and material articles. Transforms Markdown to JSON for Angular websites.
44

5-
Used as a git subtree in:
5+
Used as a Git submodule in:
66
- [angular-buch/website-articles](https://github.com/angular-buch/website-articles)
77
- [angular-schule/website-articles](https://github.com/angular-schule/website-articles)
88

9-
## Usage
9+
## Setup
1010

1111
```bash
1212
npm install
13-
npm run build
13+
npm run build # Single build
14+
npm run watch # Watch mode for development
15+
npm test # Run tests
16+
npm run typecheck # TypeScript check
1417
```
1518

16-
## Scripts
17-
18-
| Script | Description |
19-
|--------------|--------------------------------------|
20-
| `build` | Build blog and material entries |
21-
| `test` | Run tests |
22-
| `test:watch` | Run tests in watch mode |
23-
| `typecheck` | TypeScript type checking |
24-
| `watch` | Watch mode for development |
25-
26-
## Folder Structure
19+
## Project Structure
2720

2821
```
29-
├── build.ts # Main entry point
22+
website-articles-build/
23+
├── build.ts # Main build script
3024
├── blog/
31-
│ ├── blog.types.ts # Blog-specific types
32-
│ └── blog.utils.ts # Blog list utilities
25+
│ ├── blog.types.ts # Blog-specific types
26+
│ └── blog.utils.ts # Blog-specific utilities
3327
├── material/
34-
│ └── material.types.ts # Material-specific types
28+
│ └── material.types.ts # Material-specific types
3529
└── shared/
36-
├── base.types.ts # Shared base types
37-
├── base.utils.ts # File/folder utilities
38-
├── list.utils.ts # List extraction utilities
39-
└── jekyll-markdown-parser.ts # Markdown parser
30+
├── jekyll-markdown-parser.ts # Markdown parser
31+
├── base.utils.ts # Shared utilities
32+
└── list.utils.ts # List utilities
33+
```
34+
35+
## Output
36+
37+
The build generates for each article:
38+
39+
| Output | Description |
40+
|--------|-------------|
41+
| `dist/blog/{slug}/entry.json` | Full article with HTML |
42+
| `dist/blog/list.json` | List of all articles (light version) |
43+
| `dist/material/{slug}/entry.json` | Full material entry |
44+
| `dist/material/list.json` | List of all material entries |
45+
46+
---
47+
48+
## Features for Markdown Authors
49+
50+
### 1. Images
51+
52+
Relative image paths are automatically transformed:
53+
54+
```markdown
55+
![Screenshot](screenshot.png)
56+
![Logo](./images/logo.png)
57+
```
58+
59+
**Build output:**
60+
```html
61+
<img src="%%MARKDOWN_BASE_URL%%/blog/my-article/screenshot.png">
62+
```
63+
64+
The placeholder `%%MARKDOWN_BASE_URL%%` is replaced at runtime by the Angular app (CDN on prod, proxy in dev).
65+
66+
**Not transformed:**
67+
- Absolute URLs: `https://example.com/image.png`
68+
- Protocol-relative URLs: `//cdn.example.com/image.png`
69+
- Asset paths: `assets/img/icon.svg`
70+
- Absolute paths: `/images/logo.png`
71+
- Data URIs: `data:image/png;base64,...`
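
To make the image rules above concrete, here is a minimal sketch of the build-time rewrite and the runtime replacement. `transformImageSrc`, `isUntouchedImageSrc`, and `resolvePlaceholder` are hypothetical helpers written for this illustration, and the `basePath` argument (for example `/blog/my-article`) is an assumption about how the article folder is passed in.

```typescript
// Sketch only: the helper names below are hypothetical, not the parser's real API.
const MARKDOWN_BASE_URL_PLACEHOLDER = '%%MARKDOWN_BASE_URL%%';

/** True for the kinds of URLs listed under "Not transformed". */
function isUntouchedImageSrc(src: string): boolean {
  return (
    /^\w+:/.test(src) ||      // any scheme: https:, data:, ...
    src.startsWith('//') ||   // protocol-relative URL
    src.startsWith('/') ||    // absolute path
    src.startsWith('assets/') // asset path of the consuming app
  );
}

/** Build time: prefix relative image paths with the placeholder and article path. */
function transformImageSrc(src: string, basePath: string): string {
  if (isUntouchedImageSrc(src)) return src;
  const relative = src.replace(/^\.\//, ''); // "./images/logo.png" -> "images/logo.png"
  return `${MARKDOWN_BASE_URL_PLACEHOLDER}${basePath}/${relative}`;
}

/** Runtime: the consuming app swaps the placeholder for its actual base URL. */
function resolvePlaceholder(html: string, baseUrl: string): string {
  return html.split(MARKDOWN_BASE_URL_PLACEHOLDER).join(baseUrl);
}
```

Keeping the placeholder in the generated JSON means the same build output can be served from a CDN in production and through a proxy in development.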
72+
73+
### 2. Links
74+
75+
Relative links are transformed to absolute paths. This is necessary because our Angular website uses `<base href="/">`.
76+
77+
#### Anchor Links (TOC)
78+
79+
```markdown
80+
[Introduction](#introduction)
81+
```
82+
83+
**Build output:**
84+
```html
85+
<a href="/blog/my-article#introduction">Introduction</a>
4086
```
4187

42-
## URL Placeholder
88+
#### Cross-Article Links
4389

44-
Generated URLs use `%%MARKDOWN_BASE_URL%%` as a placeholder:
45-
- `%%MARKDOWN_BASE_URL%%/blog/2024-post/image.png`
46-
- `%%MARKDOWN_BASE_URL%%/material/chapter-1/diagram.svg`
90+
```markdown
91+
[Other Article](../other-article)
92+
[Other Article with Anchor](../other-article#setup)
93+
```
94+
95+
**Build output:**
96+
```html
97+
<a href="/blog/other-article">Other Article</a>
98+
<a href="/blog/other-article#setup">Other Article with Anchor</a>
99+
```
100+
101+
**Not transformed:**
102+
- Absolute URLs: `https://angular.io/docs`
103+
- Already absolute paths: `/blog/other-article`
104+
- mailto: `mailto:team@example.com`
105+
- tel: `tel:+49123456`
106+
- ftp: `ftp://files.example.com/file.zip`
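
The link rules above can be sketched in a few lines. This is an illustration under assumptions, not the parser's actual code: `transformLink` is a hypothetical name, while `isAbsoluteUrl`, its `/^\w+:/` regex, `linkBasePath`, and the use of `path.posix.resolve()` are taken from the commit message.

```typescript
import * as path from 'node:path';

// Sketch only: `transformLink` is a hypothetical helper. `linkBasePath` is the
// article's URL path, e.g. "/blog/my-article".

/** Matches any scheme prefix: https:, mailto:, tel:, ftp:, ... */
function isAbsoluteUrl(href: string): boolean {
  return /^\w+:/.test(href);
}

function transformLink(href: string, linkBasePath: string): string {
  // Absolute URLs and already-absolute paths stay as they are.
  if (isAbsoluteUrl(href) || href.startsWith('/')) return href;

  // Pure anchor: prefix with the article's own path.
  if (href.startsWith('#')) return `${linkBasePath}${href}`;

  // Relative path, optionally followed by an anchor.
  const hashIndex = href.indexOf('#');
  const relativePath = hashIndex === -1 ? href : href.slice(0, hashIndex);
  const anchor = hashIndex === -1 ? '' : href.slice(hashIndex);
  // posix.resolve keeps forward slashes on every platform, including Windows.
  return path.posix.resolve(linkBasePath, relativePath) + anchor;
}
```

Because the first argument to `path.posix.resolve()` is already absolute, the result does not depend on the current working directory.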
107+
108+
### 3. Automatic Table of Contents (TOC)
109+
110+
Place `[[toc]]` in your Markdown to generate an automatic table of contents.
111+
112+
#### Example
113+
114+
```markdown
115+
---
116+
title: My Article
117+
published: 2024-01-15
118+
---
119+
120+
## Contents
121+
122+
[[toc]]
123+
124+
## Introduction
47125

48-
The consuming website replaces this placeholder with the actual base URL at runtime.
126+
Lorem ipsum...
49127

50-
## Input/Output
128+
### Subchapter
51129

52-
**Input:** `../blog/` and `../material/` folders with Markdown READMEs
130+
More text...
53131

54-
**Output:** `../dist/` folder (parent directory) with:
55-
- `../dist/blog/list.json` - Light blog list for overview
56-
- `../dist/blog/{slug}/entry.json` - Full blog entry
57-
- `../dist/material/list.json` - Light material list
58-
- `../dist/material/{slug}/entry.json` - Full material entry
132+
## Conclusion
133+
134+
End.
135+
```
136+
137+
#### Generated Output
138+
139+
```html
140+
<h2 id="contents">Contents</h2>
141+
<ul>
142+
<li><a href="/blog/my-article#introduction">Introduction</a></li>
143+
<li>
144+
<ul>
145+
<li><a href="/blog/my-article#subchapter">Subchapter</a></li>
146+
</ul>
147+
</li>
148+
<li><a href="/blog/my-article#conclusion">Conclusion</a></li>
149+
</ul>
150+
```
151+
152+
#### Rules
153+
154+
| Rule | Description |
155+
|------|-------------|
156+
| **Only h2 and h3** | h1 and h4+ are ignored |
157+
| **After the marker** | Headings before `[[toc]]` are skipped |
158+
| **Automatic IDs** | Heading IDs follow [GitHub's algorithm](https://github.com/Flet/github-slugger) |
159+
| **Special characters** | Umlauts preserved (`Über uns` → `#über-uns`), `&` removed |
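
The rules above can be pictured with a small sketch that splits the Markdown at the marker and builds a nested Markdown list from the headings that follow. It is not the project's `generateToc()`: `buildTocList` and `simpleSlug` are hypothetical names, the slug function is a simplified stand-in for github-slugger, and headings are found with a regex rather than a Markdown parser, so headings inside fenced code blocks would be picked up as well.

```typescript
// Sketch only: hypothetical helpers illustrating the "split at marker" idea.
const TOC_MARKER = '[[toc]]';

/** Simplified stand-in for github-slugger (an assumption, not the real algorithm). */
function simpleSlug(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s-]/gu, '') // drop punctuation such as "&", keep umlauts
    .trim()
    .replace(/\s/g, '-');
}

/** Builds a nested Markdown list from the headings (h2..maxLevel) after the marker. */
function buildTocList(markdown: string, maxLevel = 3): string {
  const markerIndex = markdown.indexOf(TOC_MARKER);
  if (markerIndex === -1) return '';

  // Only content after the marker is scanned, so earlier headings are skipped.
  const afterMarker = markdown.slice(markerIndex + TOC_MARKER.length);
  const lines: string[] = [];
  for (const match of Array.from(afterMarker.matchAll(/^(#{2,6})[ \t]+(.+)$/gm))) {
    const level = match[1].length;
    if (level > maxLevel) continue;
    const text = match[2].trim();
    const indent = '  '.repeat(level - 2); // h2 at top level, h3 nested once
    lines.push(`${indent}- [${text}](#${simpleSlug(text)})`);
  }
  return lines.join('\n');
}
```

The resulting list is ordinary Markdown, so the `#anchor` links in it can go through the same link transformation as any other link in the article.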
160+
161+
### 4. Syntax Highlighting
162+
163+
Code blocks are automatically formatted with highlight.js:
164+
165+
````markdown
166+
```typescript
167+
const greeting = 'Hello World';
168+
console.log(greeting);
169+
```
170+
````
171+
172+
### 5. Raw HTML
173+
174+
HTML in Markdown is passed through unchanged:
175+
176+
```markdown
177+
<div class="custom-box">
178+
<p>Custom styled content</p>
179+
</div>
180+
181+
<iframe src="https://stackblitz.com/edit/angular" width="100%"></iframe>
182+
```
183+
184+
**Security note:** This is intentional. We trust our own repository. There is no user-generated content.
185+
186+
### 6. Emojis
187+
188+
Emoji shortcodes are converted to Unicode:
189+
190+
```markdown
191+
Hello :smile: World :rocket:
192+
```
193+
194+
**Output:** Hello 😄 World 🚀
195+
196+
---
197+
198+
## YAML Frontmatter
199+
200+
Every article requires YAML frontmatter:
201+
202+
```yaml
203+
---
204+
title: "Article Title"
205+
author: John Doe
206+
mail: john@example.com
207+
published: 2024-01-15
208+
language: en
209+
header: header.jpg
210+
keywords:
211+
- Angular
212+
- TypeScript
213+
# Optional:
214+
lastModified: 2024-02-01
215+
hidden: false # Don't show article in list
216+
sticky: false # Pin article to top
217+
darkenHeader: false
218+
author2: Co-Author
219+
mail2: co@example.com
220+
bio: Short author bio
221+
---
222+
```
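
As a rough illustration of how the `hidden` and `sticky` flags might affect the generated list, the sketch below filters hidden entries and sorts with a multi-field comparison (sticky first, then newest first). The `EntryMeta` shape, the function names, and the exact ordering are assumptions; the source only states that `hidden` keeps an article out of the list and `sticky` pins it to the top.

```typescript
// Sketch only: the interface and the ordering rules are assumptions for illustration.
interface EntryMeta {
  title: string;
  published: string; // ISO date string
  sticky?: boolean;
  hidden?: boolean;
}

/** Multi-field comparison: sticky entries first, then newest first. */
function compareEntries(a: EntryMeta, b: EntryMeta): number {
  const stickyDiff = Number(b.sticky ?? false) - Number(a.sticky ?? false);
  if (stickyDiff !== 0) return stickyDiff;
  // ISO strings in the same format compare chronologically as plain strings.
  if (a.published < b.published) return 1;
  if (a.published > b.published) return -1;
  return 0;
}

/** Drops hidden entries and returns the rest in display order. */
function visibleSorted(entries: EntryMeta[]): EntryMeta[] {
  return entries.filter((entry) => !entry.hidden).sort(compareEntries);
}
```

Comparing field by field avoids the pitfalls of concatenating values into a single string key, where the separator and field widths can silently change the order.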
223+
224+
### Date Formats
225+
226+
Both formats are supported:
227+
228+
```yaml
229+
published: 2024-01-15 # Converted to ISO string
230+
published: "2024-01-15T10:00:00Z" # Stays as string
231+
```
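
The difference between the two formats comes from YAML parsing: an unquoted date is typically loaded as a `Date` object (js-yaml does this with its default schema), while a quoted value stays a string. A minimal sketch of the normalization, with `normalizeDate` as a hypothetical helper name:

```typescript
// Sketch only: `normalizeDate` is a hypothetical helper, not the parser's real API.

/** Converts a parsed YAML date to an ISO string; strings are passed through. */
function normalizeDate(value: Date | string): string {
  return value instanceof Date ? value.toISOString() : value;
}
```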
232+
233+
---
234+
235+
## Development
236+
237+
### Tests
238+
239+
```bash
240+
npm test # Single run
241+
npm run test:watch # Watch mode
242+
```
243+
244+
131 tests cover:
245+
- Markdown parsing and HTML generation
246+
- Image and link transformation
247+
- TOC generation
248+
- Edge cases (mailto, tel, CRLF, etc.)
249+
250+
### TypeScript
251+
252+
```bash
253+
npm run typecheck # Type check
254+
```
255+
256+
### Architecture
257+
258+
```
259+
Markdown (README.md)
260+
261+
JekyllMarkdownParser
262+
├── YAML Frontmatter → parsedYaml
263+
├── Markdown → marked → HTML
264+
├── Image URLs → transformed with placeholder
265+
├── Links → transformed to absolute paths
266+
└── TOC → generated from headings
267+
268+
entry.json
269+
```
270+
271+
---
272+
273+
## Submodule Warning
274+
275+
This repository is included as a Git submodule in `website-articles`.
276+
277+
**Always make changes here**, not in the `build/` folder of the parent repo!
278+
279+
```bash
280+
# CORRECT: Work here
281+
cd website-articles-build
282+
git checkout -b feature/xyz
283+
284+
# WRONG: Don't work in the submodule
285+
cd website-articles/build #
286+
```

build.ts

Lines changed: 6 additions & 1 deletion
@@ -8,6 +8,7 @@ import { copyEntriesToDist, getEntryList } from './shared/base.utils';
88
import { makeLightBlogList } from './blog/blog.utils';
99
import { makeLightList } from './shared/list.utils';
1010
import { MARKDOWN_BASE_URL_PLACEHOLDER } from './shared/jekyll-markdown-parser';
11+
import { printValidationResults } from './shared/link-validator';
1112

1213
const DIST_FOLDER = '../dist';
1314
const BLOG_FOLDER = '../blog';
@@ -65,7 +66,11 @@ async function build(): Promise<void> {
6566
await buildBlog();
6667
await buildMaterial();
6768

68-
console.log('Build complete!');
69+
// Validate all anchor links (warnings only, does not fail build)
70+
console.log('\nValidating anchor links...');
71+
printValidationResults();
72+
73+
console.log('\nBuild complete!');
6974
}
7075

7176
build().catch((error) => {
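
The validation step wired into `build()` above is described in the commit message as: extract anchor links from the HTML, skip external URLs, URL-decode the anchor, and check it against the collected heading IDs. The sketch below shows one way this could work. `findBrokenAnchorLinks` and the `anchorsByPath` map are hypothetical and do not reflect the actual API of `link-validator.ts`.

```typescript
// Sketch only: function name and data shapes are assumptions for illustration.

/**
 * Returns the hrefs of internal anchor links whose target anchor is unknown.
 * `anchorsByPath` maps a page path (e.g. "/blog/my-article") to its heading IDs.
 */
function findBrokenAnchorLinks(html: string, anchorsByPath: Map<string, Set<string>>): string[] {
  const broken: string[] = [];
  // matchAll avoids the lastIndex pitfalls of an exec() loop with a global regex.
  for (const match of Array.from(html.matchAll(/<a\s[^>]*href="([^"]*)"/g))) {
    const href = match[1];
    const hashIndex = href.indexOf('#');
    // Skip links without an anchor and external URLs (scheme or protocol-relative).
    if (hashIndex === -1 || /^\w+:/.test(href) || href.startsWith('//')) continue;

    const pagePath = href.slice(0, hashIndex);
    let anchor = href.slice(hashIndex + 1);
    try {
      anchor = decodeURIComponent(anchor); // "%C3%BCber-uns" -> "über-uns"
    } catch {
      // Malformed escape sequence: fall back to the raw anchor.
    }

    const known = anchorsByPath.get(pagePath);
    if (!known || !known.has(anchor)) broken.push(href);
  }
  return broken;
}
```

Since the build only prints warnings, a caller would log the returned hrefs rather than throw.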

package-lock.json

Lines changed: 3 additions & 15 deletions
Some generated files are not rendered by default.

package.json

Lines changed: 1 addition & 1 deletion
@@ -16,8 +16,8 @@
1616
"highlight.js": "^11.10.0",
1717
"image-size": "^2.0.2",
1818
"js-yaml": "^4.1.0",
19+
"github-slugger": "^2.0.0",
1920
"marked": "^17.0.1",
20-
"marked-gfm-heading-id": "^4.1.3",
2121
"marked-highlight": "^2.2.3",
2222
"node-emoji": "^2.1.3"
2323
},
