
Commit 1306a05

Add RubyKaigi 2026 talk prep: outline, research, and slides
- Slide outline with 33-slide structure across 3 sections: caught up (Aliki, server mode, Prism), getting there (Markdown, RBS), and getting ahead (AI evaluation)
- HTML presentation in Swiss Modern style with inline editing
- Research files: PR history for ruby/rdoc and ruby/ruby, llms.txt adoption data, call-seq vs RBS comparison, Markdown/RDoc coupling analysis, tompng's Prism work
- Google Slides Apps Script as alternative (populate_slides.js)
1 parent 2d74d02 commit 1306a05

11 files changed

Lines changed: 2229 additions & 0 deletions

populate_slides.js

Lines changed: 236 additions & 0 deletions
Large diffs are not rendered by default.

research/callseq_vs_rbs.md

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
1+
# call-seq vs RBS: Can We Replace call-seq?
2+
3+
## The Overlap
4+
5+
Both express method signatures for documentation purposes:
6+
```
# call-seq:
#   readlines(sep=$/) -> array
#   readlines(limit) -> array
#   readlines(sep, limit) -> array
```

```ruby
#: (?String sep) -> Array[String]
#: (Integer limit) -> Array[String]
#: (String sep, Integer limit) -> Array[String]
```
19+
| Aspect | call-seq | RBS `#:` |
|--------|----------|----------|
| Multiple signatures/overloads | Yes (multiple lines) | Yes (multiple `#:` lines) |
| Argument names | Yes (`sep`, `limit`) | Yes (named params in RBS) |
| Return type | Yes (arrow notation) | Yes (formal type) |
| Parameter types | Implicit/prose | Explicit (String, Integer) |
| Default values | **YES** (`sep=$/`) | **NO** |
| Block/yield | Can describe | Formal block signature |
| Machine-readable | No (free-form text) | Yes (parseable by RBS tools) |
29+
30+
## The Gap: Default Values
31+
32+
call-seq from Ruby core (C extensions):
```c
/* call-seq:
 *   commercial(cwyear, cweek=1, cwday=1, sg=nil) -> Date
 */
```
38+
39+
RBS cannot express `cweek=1` — only that the parameter is optional:
```ruby
#: (Integer cwyear, ?Integer cweek, ?Integer cwday, ?Symbol? sg) -> Date
```
43+
44+
The default value `1` is lost.
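
To make the gap concrete, here is a minimal sketch of pulling the defaults out of a call-seq line. `call_seq_defaults` is a hypothetical helper written for these notes, not part of RDoc, and the comma split is deliberately naive (it would break on a default that itself contains a comma). It only illustrates that the information exists in call-seq text and has no slot in the RBS signature.

```ruby
# Hypothetical helper (not an RDoc API): collect `name=default` pairs
# from a single call-seq line.
def call_seq_defaults(line)
  # Text between the first "(" and the ")" that precedes "->".
  args = line[/\((.*)\)\s*->/, 1]
  return {} unless args

  args.split(",").each_with_object({}) do |arg, defaults|
    name, default = arg.strip.split("=", 2)
    defaults[name] = default if default
  end
end

call_seq_defaults("commercial(cwyear, cweek=1, cwday=1, sg=nil) -> Date")
# => {"cweek"=>"1", "cwday"=>"1", "sg"=>"nil"}
```

The returned values are plain strings: call-seq is free-form text, so nothing guarantees that a default is a valid Ruby expression.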
45+
46+
## Why This Matters
47+
48+
- In ruby/ruby, C extension methods have no Ruby source — call-seq is the ONLY way to document their signatures
49+
- call-seq is free-form text (hard to parse, easy to get wrong, inconsistent across contributors)
50+
- RBS is structured and machine-readable (enables type linking, validation, tooling)
51+
- For a language that "doesn't want typing," using type signatures for documentation is a significant philosophical shift
52+
53+
## The Open Questions
54+
55+
1. Can RBS syntax be extended to support default values? (e.g., `?Integer cweek = 1`)
56+
2. What's the migration path for thousands of existing call-seq entries in ruby/ruby?
57+
3. Should call-seq remain for cases RBS can't express (version-specific overloads, prose descriptions)?
58+
59+
## Sources
60+
61+
- call-seq parsing: lib/rdoc/comment.rb `extract_call_seq` (lines 95-120)
62+
- call-seq storage: lib/rdoc/code_object/any_method.rb `call_seq=`
63+
- Real examples: test/rdoc/rdoc_comment_test.rb (ARGF.readlines, Date.commercial)
64+
- RBS inline syntax: https://sorbet.org/docs/rbs-comments

research/current_features.md

Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
1+
# RDoc Current Feature State (branch: stan-talk-prep)
2+
3+
Research for RubyKaigi talk preparation. Based on code at `/Users/hung-wulo/src/github.com/Shopify/rdoc-talk-prep`.
4+
5+
---
6+
7+
## 1. Aliki Theme
8+
9+
**What it is:** A modern, from-scratch HTML theme that subclasses `RDoc::Generator::Darkfish`. Authored by Stan Lo. Located at `lib/rdoc/generator/aliki.rb` with templates in `lib/rdoc/generator/template/aliki/`.
10+
11+
### How it differs from Darkfish
12+
| Feature | Darkfish | Aliki |
|---------|----------|-------|
| Layout | Two-column (sidebar + content) | Three-column (sidebar + content + right-side TOC) |
| Dark mode | None | Full dark mode with `data-theme` toggle, localStorage persistence, and system preference detection |
| CSS size | 702 lines, ships embedded fonts (Lato, SourceCodePro) | 1994 lines, uses system font stack (no embedded fonts = lighter output) |
| CSS architecture | Flat styles | Design system with CSS custom properties (tokens for colors, spacing, typography, shadows, transitions, z-index) |
| JS files | 2 files (260 lines total: darkfish.js, search.js) | 7 files: aliki.js, theme-toggle.js, search_controller.js, search_navigation.js, search_ranker.js, c_highlighter.js, bash_highlighter.js |
| Search | Basic search.js | Custom ranked search with fuzzy matching, namespace/method-aware queries, tiered scoring (exact > prefix > substring > fuzzy), type-aware priority, search snippets, type badges |
| Search index | Uses JsonIndex generator (separate pass) | Built-in `write_search_index` / `build_search_index` (no extra generator needed), writes `js/search_data.js` |
| Mobile | No specific mobile support | Responsive grid layout, mobile search modal, hamburger sidebar toggle, viewport-aware JS |
| Syntax highlighting | None for C or shell code | Client-side C highlighter (keywords, types, macros, strings, preprocessor directives, Ruby C API types like VALUE/ID) and bash/shell highlighter (prompts, commands, options, strings, env vars, comments) |
| TOC | Server-side sidebar TOC only | Auto-generated right-sidebar "On This Page" TOC from headings with IntersectionObserver scroll-spy, smooth scrolling |
| Code blocks | Plain `<pre>` | Copy-to-clipboard buttons dynamically added to all `<pre>` elements |
| Header/Footer | Minimal | Top navbar with brand, search bar, theme toggle; customizable footer via `footer_content` option in `.rdoc_options` |
| Open Graph / SEO | None | Full Open Graph and Twitter Card meta tags, canonical URL support, rich `<meta>` descriptions |
| Icons | Silk icon sprites (images/) | Inline SVG symbol sprites (no image files) |
| Breadcrumbs | Yes | Yes (same approach) |
| Method entries | Standard list | Styled as "signature cards" (commit dc7a1679) |
| Ancestor tree | Shows parent class only | Full ancestor chain with recursive nested `<ul>` |
| Source language | `<pre>` (no class) | `<pre class="c">` or `<pre class="ruby">` via `method.source_language` - enables language-specific highlighting |
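
The tiered scoring in the Search row (exact > prefix > substring > fuzzy) can be illustrated with a small sketch. This is written in Ruby for consistency with the rest of these notes; the real ranker is `search_ranker.js`, and its weights, tie-breaking, and fuzzy algorithm are not recorded here. The `TIER_SCORES` values and the subsequence-based fuzzy match are assumptions made only for illustration.

```ruby
# Assumed tier weights: only their relative order comes from the notes above.
TIER_SCORES = { exact: 400, prefix: 300, substring: 200, fuzzy: 100 }.freeze

# True when every character of `query` appears in `name` in the same order.
def subsequence?(query, name)
  position = 0
  query.each_char do |char|
    found = name.index(char, position)
    return false unless found
    position = found + 1
  end
  true
end

# Classify a candidate name into the best tier it qualifies for.
def match_tier(query, name)
  q = query.downcase
  n = name.downcase
  return :exact if n == q
  return :prefix if n.start_with?(q)
  return :substring if n.include?(q)
  return :fuzzy if subsequence?(q, n)
  nil
end

# Rank names by tier; ties go to the shorter name, then alphabetical order.
def rank(query, names)
  scored = names.filter_map do |name|
    tier = match_tier(query, name)
    [name, TIER_SCORES[tier]] if tier
  end
  scored.sort_by { |name, score| [-score, name.length, name] }.map(&:first)
end

rank("map", ["flat_map", "map", "map!", "Marshal.dump", "each"])
# => ["map", "map!", "flat_map", "Marshal.dump"]
```

Type-aware priority and namespace-aware queries, which the table also mentions, would layer additional score adjustments on top of these tiers.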
33+
34+
### Key design decisions
35+
- Sidebar is hidden by default with `hidden` attribute, shown by JS on large viewports. This avoids sidebar flicker on mobile page load.
36+
- Search data is written as `.js` (not `.json`) to avoid CORS issues when viewing generated docs via `file://` protocol.
37+
- `resolve_url(rel_prefix, url)` helper ensures absolute URLs pass through unchanged while relative URLs get prefixed correctly.
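
The behaviour described for `resolve_url` can be sketched as follows. This is a re-creation from the description above, not the code in `aliki.rb`; in particular, the rule for what counts as "absolute" (a URI scheme or a protocol-relative `//`) is an assumption.

```ruby
# Sketch of the described behaviour; the real helper may differ in detail.
def resolve_url(rel_prefix, url)
  # Assumption: a scheme such as "https:" or "mailto:", or a leading "//",
  # marks the URL as absolute, so it is returned untouched.
  return url if url.match?(%r{\A(?:[a-z][a-z0-9+.-]*:|//)}i)

  prefix = rel_prefix.to_s
  return url if prefix.empty? || prefix == "."

  "#{prefix.chomp('/')}/#{url}"
end

resolve_url("..", "css/rdoc.css")               # => "../css/rdoc.css"
resolve_url("../..", "https://example.com/doc") # => "https://example.com/doc"
```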
38+
39+
---
40+
41+
## 2. Markdown Support (GFM)
42+
43+
**Parser:** `lib/rdoc/markdown.kpeg` (PEG grammar compiled to `lib/rdoc/markdown.rb`).
44+
45+
### Default extensions enabled
46+
- `definition_lists` - PHP Markdown Extra style
47+
- `github` - fenced code blocks, syntax highlighting, tables, strikethrough
48+
- `html` - raw HTML blocks
49+
- `notes` - footnotes
50+
- `strike` - `~~strikethrough~~`
51+
52+
### GFM features supported
53+
- Fenced code blocks (triple backtick with optional language tag)
54+
- Tables (header, alignment, body rows with inline markdown in cells)
55+
- Strikethrough (`~~text~~`)
56+
- Auto-linking of bare URLs
57+
- Underscores in words are never treated as emphasis
58+
59+
### `break_on_newline` extension
60+
- Converts every newline into a hard line break (GFM-style). Commit `d62b0321` is titled as enabling this by default. **Note:** the extension is defined in the kpeg source but is NOT listed in `DEFAULT_EXTENSIONS` there, so either the default is applied somewhere else or that commit landed on a different branch.
61+
62+
### Recent markdown improvements (from git log)
63+
- `bd0e544f` - Fix blockquote lazy continuation parsing
64+
- `d62b0321` - Enable `break_on_newline` by default
65+
- `c59a7a89` - Fix table parser consuming lines without pipes
66+
- `52b24c2d` - Implement escapes in Markdown-to-RDoc conversion
67+
- `0602d13b` - Align strikethrough with GitHub Markdown spec
68+
- `39f5a2d9` - Fix backslash handling in table cell code spans
69+
- `eaac67d3` - Support markdown syntax in table cells
70+
- `393c0e87` - Add comparison with GitHub Flavored Markdown spec
71+
72+
### Markdown output (`lib/rdoc/markup/to_markdown.rb`)
73+
- `RDoc::Markup::ToMarkdown` converts RDoc's internal markup tree back to Markdown format
74+
- Subclasses `ToRdoc`, overrides heading markers to `#`/`##`/etc.
75+
- Handles lists (bullet, numbered, definition/label)
76+
77+
### What is NOT supported
78+
- Task lists / checkboxes
79+
- GitHub-style alerts (`> [!NOTE]`, `> [!WARNING]`)
80+
- The GFM autolink extension itself (bare URLs are linked, as noted above, but through RDoc's own auto-linking rather than an implementation of the GFM autolink rules)
81+
82+
---
83+
84+
## 3. RBS Integration
85+
86+
**Status: Not present in this branch.**
87+
88+
Grep for `rbs`, `RBS`, `type_sig`, `rbs_` across `lib/` returned zero matches. There is no RBS type signature parsing, display, or integration in the generator, parser, or templates.
89+
90+
The memory file mentions prior investigation into "RBS Integration Phase 1" with inline `#:` type sigs in HTML output, but this code is not on the current `stan-talk-prep` branch.
91+
92+
---
93+
94+
## 4. LLM/AI Support (llms.txt)
95+
96+
**Status: Not present.**
97+
98+
A grep for `llms`, `llm`, `LLM`, and `llms.txt` across the entire codebase (first `.rb` files only, then all files) returned no relevant matches. There is no `llms.txt` generator, no LLM-friendly output format, and no AI-related features.
99+
100+
---
101+
102+
## 5. Server Mode (Live Reload)
103+
104+
**File:** `lib/rdoc/server.rb` (394 lines)
105+
106+
### Architecture
107+
- Invoked via `rdoc --server` (added in commit `3c6f5f6f`)
108+
- Uses Ruby's built-in `TCPServer` - no WEBrick, no external dependencies
109+
- Binds to `127.0.0.1:<port>`
110+
- Multi-threaded: one thread per client connection, plus a background file watcher thread
111+
- Uses the Aliki generator exclusively (`RDoc::Generator::Aliki`)
112+
113+
### Live reload mechanism
114+
1. Background watcher thread polls source file mtimes every 1 second
115+
2. Detects modified, new, and deleted files
116+
3. On change: re-parses only changed files via `@rdoc.parse_file(f)`, clears stale contributions, refreshes generator data, invalidates page cache
117+
4. Injects a `<script>` polling snippet before `</body>` in every HTML response
118+
5. Browser polls `/__status` every 1 second, comparing `last_change` timestamp
119+
6. If timestamp changed, browser does `location.reload()`
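
Steps 1 and 2 (polling mtimes and detecting modified, new, and deleted files) can be sketched like this. The method names `snapshot`, `diff_snapshots`, and `watch` are hypothetical and are not taken from `lib/rdoc/server.rb`; the sketch only shows the snapshot-and-compare technique.

```ruby
# Record the current mtime of every existing file in `paths`.
def snapshot(paths)
  paths.each_with_object({}) do |path, seen|
    seen[path] = File.mtime(path) if File.file?(path)
  end
end

# Compare two {path => mtime} snapshots.
def diff_snapshots(before, after)
  {
    added: after.keys - before.keys,
    removed: before.keys - after.keys,
    modified: (before.keys & after.keys).select { |path| before[path] != after[path] }
  }
end

# Poll forever, yielding a change set whenever something differs.
# Intended to run in a background thread.
def watch(glob, interval: 1)
  previous = snapshot(Dir.glob(glob))
  loop do
    sleep interval
    current = snapshot(Dir.glob(glob))
    changes = diff_snapshots(previous, current)
    yield changes unless changes.values.all?(&:empty?)
    previous = current
  end
end
```

A caller might start it with `Thread.new { watch("lib/**/*.rb") { |changes| reparse(changes) } }`, where `reparse` stands in for whatever re-parses the changed files and bumps the `last_change` timestamp.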
120+
121+
### Request routing
122+
- `/__status` - returns JSON `{last_change: <float>}` for live reload
123+
- `/css/*`, `/js/*` - serves static assets from Aliki template directory (with path traversal protection)
124+
- `/js/search_data.js` - dynamically generated search index
125+
- `/index.html` - calls `@generator.generate_index`
126+
- `/table_of_contents.html` - calls `@generator.generate_table_of_contents`
127+
- `/ClassName.html` - looks up class/module in store, renders via generator
128+
- `/filename.html` - looks up text page in store
129+
- 404s rendered through `generate_servlet_not_found`
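
The path traversal protection mentioned for `/css/*` and `/js/*` is not spelled out in these notes. One common way to do it, shown here as an assumption rather than as the server's actual code, is to expand the requested path against the asset root and reject anything that escapes it. `safe_asset_path` is a hypothetical name.

```ruby
# Hypothetical helper: resolve a request path to a file under `root`,
# or return nil if the resolved path would land outside `root`.
def safe_asset_path(root, request_path)
  root = File.expand_path(root)
  # Strip leading slashes so the path is always resolved relative to root.
  candidate = File.expand_path(request_path.sub(%r{\A/+}, ""), root)
  # Require the separator so "/srv/aliki-evil" does not pass for "/srv/aliki".
  candidate.start_with?(root + File::SEPARATOR) ? candidate : nil
end

safe_asset_path("/srv/aliki", "/css/rdoc.css")          # => "/srv/aliki/css/rdoc.css"
safe_asset_path("/srv/aliki", "/css/../../etc/passwd")  # => nil
```

This sketch does not decode percent-encoded paths; a real server would need to decode before checking.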
130+
131+
### Page caching
132+
- Pages are cached in `@page_cache` (hash)
133+
- Cache is fully invalidated on any file change (entire hash cleared)
134+
- Mutex protects all shared state (`@page_cache`, `@last_change_time`, store operations)
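
A minimal sketch of this caching scheme, assuming a single mutex guards both the page hash and the change timestamp. The `PageCache` class and its method names are illustrative only; in the server these are instance variables (`@page_cache`, `@last_change_time`) rather than a separate class.

```ruby
# Illustrative wrapper around the cache-and-invalidate pattern.
class PageCache
  def initialize
    @mutex = Mutex.new
    @pages = {}
    @last_change_time = Time.now.to_f
  end

  # Return the cached page for `path`, rendering it with the block on a miss.
  def fetch(path)
    @mutex.synchronize { @pages[path] ||= yield }
  end

  # Drop every cached page and record when the change happened.
  def invalidate!
    @mutex.synchronize do
      @pages.clear
      @last_change_time = Time.now.to_f
    end
  end

  def last_change_time
    @mutex.synchronize { @last_change_time }
  end
end

cache = PageCache.new
cache.fetch("/index.html") { "<html>rendered once</html>" }
cache.invalidate! # after any source file changes
```

Rendering inside the lock keeps the sketch simple but serializes page renders; clearing the whole hash trades some redundant rendering for never serving a stale page.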
135+
136+
### Terminal output
137+
- Prints clickable hyperlink to terminal using OSC 8 escape sequences
138+
- Logs `<status> <path> (<duration>ms)` for page requests
139+
- Logs re-parse timing: `Re-parsed <files> (<duration>ms)`
140+
- Status/asset requests are not logged (to reduce noise)
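
For reference, an OSC 8 hyperlink wraps the visible text between an opening sequence that carries the URL and a closing sequence with an empty URL. The sketch below shows the escape format only; `terminal_link` is a hypothetical helper name, and the port in the example is made up.

```ruby
# OSC 8 format: ESC ] 8 ; ; URL ST  text  ESC ] 8 ; ; ST   (ST is ESC \)
def terminal_link(url, text = url)
  "\e]8;;#{url}\e\\#{text}\e]8;;\e\\"
end

puts terminal_link("http://127.0.0.1:4000/", "Open docs")
```

Terminals that do not understand OSC 8 generally ignore the sequences and show just the text.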
141+
142+
### Recent fixes
143+
- `e4e332f2` - Print timing for page requests and re-parsing
144+
- `237f113d` - Fix deadlock on Ctrl+C
145+
- `78325e18` - Fix live reload for C files
146+
- `8323a434` - Fix page links returning 404
147+
148+
---
149+
150+
## 6. Markup Generator (`lib/rdoc/generator/markup.rb`)
151+
152+
This is NOT a standalone generator - it is a mixin module (`RDoc::Generator::Markup`) included into `RDoc::CodeObject` and `RDoc::Context::Section`. It provides HTML rendering helpers used by the Darkfish/Aliki generators:
153+
154+
- `description` - renders a CodeObject's comment as HTML
155+
- `formatter` - creates an `RDoc::Markup::ToHtmlCrossref` formatter for cross-reference linking
156+
- `aref_to(target_path)` / `as_href(from_path)` - relative URL generation between pages
157+
- `cvs_url(url, full_path)` - web repository link construction
158+
- `canonical_url` - builds canonical URL using `@store.options.canonical_root`
159+
- `RDoc::MethodAttr#markup_code` - converts token stream to HTML with optional line numbers
160+
- `RDoc::ClassModule#description` - renders from `@comment_location` (multi-file comment support)
161+
162+
The separate `RDoc::Markup::ToMarkdown` class in `lib/rdoc/markup/to_markdown.rb` converts RDoc markup tree back to Markdown text output (headings, lists, etc.) but is not used by any generator for documentation output.
163+
164+
---
165+
166+
## Summary for Talk
167+
168+
### Shipping / Ready
169+
1. **Aliki theme** - Complete modern theme with dark mode, responsive layout, three-column design, advanced search, C/bash syntax highlighting, copy-to-clipboard, SVG icons, Open Graph metadata, customizable footer
170+
2. **Server mode** - Zero-dependency live-reload server using TCPServer, file watching with incremental re-parsing, page caching
171+
3. **GFM improvements** - Tables with inline markdown, strikethrough aligned with GFM spec, blockquote fixes, fenced code blocks, GFM spec comparison tests
172+
173+
### Not present
174+
4. **RBS integration** - No type signature display in generated docs
175+
5. **LLM support** - No `llms.txt` generation or LLM-friendly output
176+
6. **Markdown output generator** - `ToMarkdown` exists as a formatter but is not wired to any documentation output generator

research/llms_txt_adoption.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
1+
# llms.txt Adoption Research (for RubyKaigi talk)
2+
3+
## Key Takeaway
4+
5+
In the SE Ranking study cited below, llms.txt showed **no measurable impact** on AI citations. From an RDoc maintainer's perspective, the cost (longer generation time and more code to maintain) outweighs that negligible benefit.
6+
7+
## Adoption Numbers
8+
9+
- **10.13%** of ~300,000 domains analyzed have an llms.txt file (SE Ranking study)
10+
- **~9 out of 10 websites** haven't adopted it
11+
- **0.3%** adoption among the top 1,000 most visited websites globally
12+
- Adoption is flat across traffic tiers: low (9.88%), mid (10.54%), high (8.27%)
13+
14+
## Effectiveness: No Measurable Impact
15+
16+
- **No correlation** between having llms.txt and AI citation frequency
17+
- The XGBoost model's accuracy **improved when the llms.txt variable was removed**
18+
- Statistical methods used: Spearman correlation, XGBoost regression, SHAP analysis
19+
- From mid-Aug to late Oct 2025: llms.txt pages received **zero visits** from Google-Extended, GPTBot, PerplexityBot, or ClaudeBot
20+
21+
## Platform Support
22+
23+
- **No major LLM provider** currently supports llms.txt (not OpenAI, not Anthropic, not Google)
24+
- Google AI Overviews rely on traditional SEO signals, not llms.txt
25+
- OpenAI/Anthropic have not officially recognized it as a ranking signal
26+
27+
## Sources
28+
29+
- [SE Ranking: "LLMs.txt: Why Brands Rely On It and Why It Doesn't Work"](https://seranking.com/blog/llms-txt/)
30+
- [Search Engine Journal: "LLMs.txt Shows No Clear Effect On AI Citations, Based On 300k Domains"](https://www.searchenginejournal.com/llms-txt-shows-no-clear-effect-on-ai-citations-based-on-300k-domains/561542/)
31+
- [Rankability LLMS.txt Adoption Report](https://www.rankability.com/llms-report/)
32+
- [PPC.land: "llms.txt adoption stalls as major AI platforms ignore proposed standard"](https://ppc.land/llms-txt-adoption-stalls-as-major-ai-platforms-ignore-proposed-standard/)
33+
34+
## Talk Framing
35+
36+
This is a good example of **responsible maintainership** — evaluating hype vs. real impact before adding complexity. The data shows:
37+
38+
1. Sites publish the file, but the major AI crawlers do not fetch it (zero bot visits in the measured period)
39+
2. Having it doesn't help (no citation correlation)
40+
3. No major LLM provider currently supports it
41+
4. Adding it to RDoc would increase generation time and code complexity for all users
42+
43+
**Better alternative**: Focus on making documentation itself better (Markdown migration, better structure, RBS types) — these improve docs for both humans AND AI consumption naturally.

research/markdown_rdoc_coupling.md

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
1+
# Why Markdown Improvements Are Slow: The Shared IR Problem
2+
3+
## The One-Line Summary
4+
5+
Markdown doesn't have its own intermediate representation — it converts inline syntax to **RDoc markup strings**, which then get re-parsed by a shared inline parser. Fixing Markdown means not breaking RDoc markup, and vice versa.
6+
7+
## The Architecture
8+
```
                    Block-level parsing                                    Inline parsing   Rendering
                    ─────────────────                                      ──────────────   ─────────
Markdown source ──► RDoc::Markdown.parse ──┐
                    (markdown.kpeg)        │
                                           ├──► RDoc::Markup::Document ──► InlineParser ──► ToHtml
                                           │    (Paragraph, Heading,       (shared!)        (shared!)
RDoc markup     ──► RDoc::Markup.parse ────┘    List, Verbatim, etc.)
                    (parser.rb)                 SAME NODE TYPES
```
19+
20+
## The Core Problem: Two-Phase Parsing
21+
22+
When the Markdown parser encounters `**bold text**`, it does NOT produce a structured "bold" node. Instead:
23+
```ruby
# lib/rdoc/markdown.kpeg, line 303
def strong(text)
  if text =~ /\A[a-z\d.\/-]+\z/i
    "*#{text}*" # → RDoc word-pair markup string
  else
    "<b>#{text}</b>" # → RDoc HTML tag string
  end
end
```
34+
35+
The Markdown parser outputs **RDoc-formatted strings** inside `Paragraph` nodes. These strings are then re-parsed by `RDoc::Markup::InlineParser` — the same parser that handles RDoc markup's inline formatting.
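
To see the two branches in action, the helper quoted above can be run on its own. The method body is copied from the excerpt; only the two sample inputs are new.

```ruby
# Same logic as the excerpt from markdown.kpeg, run standalone.
def strong(text)
  if text =~ /\A[a-z\d.\/-]+\z/i
    "*#{text}*"        # single "word": RDoc word-pair markup
  else
    "<b>#{text}</b>"   # anything else: RDoc HTML-style tag
  end
end

strong("bold")       # => "*bold*"
strong("bold text")  # => "<b>bold text</b>"
```

Either way the result is a plain string, so the "bold" structure has to be rediscovered later by the inline parser.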
36+
37+
## Why This Makes Fixes Hard
38+
39+
### 1. Shared InlineParser
40+
Any change to `InlineParser` (lib/rdoc/markup/inline_parser.rb) affects both Markdown and RDoc markup. Adding strikethrough support for Markdown required modifying the shared parser, which could break RDoc markup rendering.
41+
42+
### 2. Escape Rules Are Coupled
43+
Markdown must escape RDoc-special characters when generating strings:
44+
```ruby
# lib/rdoc/markdown.kpeg, line 309
def rdoc_escape(text)
  text.gsub(/[*+<\\_]/) {|s| "\\#{s}" }
end
```
51+
52+
If InlineParser's escape handling changes, Markdown's escape generation breaks.
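
A quick standalone run of the escape helper shows what the Markdown side hands to the inline parser. The method body is copied from the excerpt above; the sample input is made up.

```ruby
# Same logic as the excerpt from markdown.kpeg, run standalone.
def rdoc_escape(text)
  text.gsub(/[*+<\\_]/) { |s| "\\#{s}" }
end

puts rdoc_escape("snake_case * 2 < 10")
# prints: snake\_case \* 2 \< 10
```

Every backslash in that output is an instruction to the shared inline parser, which is why the two sides have to agree on exactly which characters are special.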
53+
54+
### 3. Shared Formatters
55+
All rendering goes through the same `RDoc::Markup::Formatter` subclasses (ToHtml, ToHtmlCrossref, ToRdoc, etc.). Changes to formatters must work for documents produced by both parsers.
56+
57+
### 4. Cross-References Are Unified
58+
Both formats share the same cross-reference linking system via `regexp_handling` in `ToHtmlCrossref`. Link resolution behavior can't diverge between formats.
59+
60+
## Concrete Examples of Cross-Format Breakage
61+
| Fix | What happened |
|-----|---------------|
| **Strikethrough** (Jan 2026) | Markdown parsed `~~text~~` correctly but InlineParser didn't recognize `~` as delimiter or `<del>` tags. Had to modify shared InlineParser. |
| **Backtick quoting** (Jan 2026) | Extended backtick support had to work in both Markdown and RDoc markup contexts |
| **Table parsing** (Nov 2024) | Markdown table parser's special behavior affected general parsing |
| **Escape handling** (Aug 2024) | Markdown escapes had to align with InlineParser's escape rules |
68+
69+
## What Would Fix This
70+
71+
The ideal fix: give Markdown its own structured inline representation instead of outputting RDoc strings. But this would require:
72+
- A parallel inline node system or tagged nodes that carry format origin
73+
- Separate formatter paths for Markdown-originated vs RDoc-originated content
74+
- Massive test infrastructure changes
75+
76+
This is a fundamental architectural debt from RDoc's original design, where Markdown was bolted on as a second input format that feeds into a pipeline designed for one format.
77+
78+
## tompng's InlineParser Rewrite (Jan 2026)
79+
80+
The replacement of `AttributeManager` with `InlineParser` (PR #1559) was a step forward — it moved from string-replacing macros to structured inline nodes. But the Markdown parser still outputs RDoc-formatted strings that feed into this shared parser, so the coupling remains.
81+
82+
## Talk Framing
83+
84+
This explains why:
85+
1. **Markdown improvements take so long** — every fix must be tested against both formats
86+
2. **The fix space is constrained** — sometimes the "right" Markdown fix would break RDoc markup
87+
3. **It's a roughly 15-year-old architectural decision** (Markdown was added to RDoc around 2011-2012, reusing the existing pipeline rather than building a parallel one)
88+
4. **Progress is real but incremental** — GFM spec comparison (#1550), strikethrough, heading anchors, table fixes all chipped away at this
