feat(ipynb): CommonMark ipynb export + image attachment embedding#16

Open
mmcky wants to merge 31 commits into main from feature/myst-to-ipynb

mmcky commented May 7, 2026

Note: This PR replaces the auto-closed #1, which was administratively closed by GitHub when its source branch was renamed myst-to-ipynb → feature/myst-to-ipynb to fit the new fork-maintenance branching scheme (quantecon/README.md, #14). No code changes vs. #1.

Summary

Adds myst build --ipynb support that produces notebooks with plain CommonMark markdown cells — compatible with vanilla Jupyter Notebook, JupyterLab (without jupyterlab-myst), and Google Colab. Also adds an option to embed images as base64 cell attachments for fully self-contained notebooks.

This branch is a QuantEcon carry-patch built on top of upstream PR jupyter-book/mystmd#1882. It is consumed by quantecon/build.sh to produce the integration quantecon branch and is registered in quantecon/features.txt (see #15). It will eventually be opened against jupyter-book/mystmd:main once the upstream maintainers have bandwidth to review.


What's Changed

Bug fixes (built on top of upstream PR jupyter-book/mystmd#1882)

  • Kernelspec from frontmatter — the frontmatter parameter was accepted but never used; it now populates metadata.kernelspec and metadata.language_info correctly
  • Language detection — no longer hardcoded to "python"; derives from frontmatter kernelspec
  • Log message — says "Exported IPYNB" instead of "Exported MD"
  • +++ markers — stripped globally from markdown cells (was only matching start of string)
  • Package homepage — corrected URL from myst-to-md to myst-to-ipynb
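
Two of the fixes above (kernelspec metadata and global +++ stripping) can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the `KernelSpec` and `NotebookMetadata` shapes are simplified, the function signatures are hypothetical, and the fallbacks (defaulting to `python`, using the kernel name as the language) are assumptions.

```typescript
// Hypothetical, simplified shapes for illustration; the real frontmatter and
// nbformat types carry more fields.
interface KernelSpec {
  name: string;
  display_name?: string;
  language?: string;
}

interface NotebookMetadata {
  kernelspec?: { name: string; display_name: string; language: string };
  language_info: { name: string };
}

// Derive notebook metadata from the frontmatter kernelspec instead of
// hardcoding "python".
function notebookMetadata(kernelspec?: KernelSpec): NotebookMetadata {
  if (!kernelspec) {
    // Assumed default for this sketch when no kernelspec is declared.
    return { language_info: { name: 'python' } };
  }
  // Assumption: fall back to the kernel name when no language is given.
  const language = kernelspec.language ?? kernelspec.name;
  return {
    kernelspec: {
      name: kernelspec.name,
      display_name: kernelspec.display_name ?? kernelspec.name,
      language,
    },
    language_info: { name: language },
  };
}

// Remove every "+++" block-marker line, not only one at the start of the
// string. The "m" flag lets ^ and $ match at each line boundary.
function stripBlockMarkers(md: string): string {
  return md.replace(/^\+\+\+[^\n]*(\n|$)/gm, '').trim();
}
```

The sketch does not protect a `+++` line that happens to sit inside a fenced code block; a real implementation would need to account for that.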

Additionally fixes two issues discovered during real-world validation against QuantEcon lectures:

  • Epigraph/pull-quote directives silently dropped during export (#7)
  • Cross-references with empty URLs in resolved nodes (#8)

CommonMark serialization mode

New markdown: commonmark option in export config triggers an AST pre-transform that converts MyST-specific nodes to CommonMark equivalents before writeMd serialization.

exports:
  - format: ipynb
    markdown: commonmark

18 directive/role mappings implemented:

| MyST Node | CommonMark Output |
| --- | --- |
| math block | $$...$$ |
| inlineMath | $...$ |
| admonition | > **Title** ... blockquote |
| exercise | **Exercise N** ... |
| solution | **Solution** ... (or dropped via dropSolutions) |
| proof/theorem/lemma | **Theorem N (Title)** ... |
| tabSet | Bold tab titles + content |
| container (figure) | ![alt](url) + italic caption |
| container (table) | Bold caption + GFM table |
| card / grid / details / aside | Appropriate blockquote/bold fallbacks |
| mystDirective / mystRole | Unwrapped children |
| mystTarget / comment | Dropped |
| code blocks | Stripped MyST-specific attributes |
| Node identifier/label | Stripped to prevent (id)= prefixes |

Key design: uses { type: 'html' } AST nodes for math to prevent mdast-util-to-markdown from escaping LaTeX special characters.
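
To illustrate the design choice, here is a minimal sketch of swapping math nodes for raw html nodes before serialization. The `Node` shape and the `mathToHtml` name are hypothetical and simplified; the actual transform in commonmark.ts handles many more node types.

```typescript
// Minimal node shape for this sketch; real mdast/MyST nodes carry more fields.
interface Node {
  type: string;
  value?: string;
  children?: Node[];
}

// Replace math nodes with raw 'html' nodes so that the serializer emits the
// delimited LaTeX verbatim instead of escaping characters such as '_'.
function mathToHtml(node: Node): Node {
  if (node.type === 'math') {
    return { type: 'html', value: '$$\n' + (node.value ?? '') + '\n$$' };
  }
  if (node.type === 'inlineMath') {
    return { type: 'html', value: '$' + (node.value ?? '') + '$' };
  }
  if (node.children) {
    // Recurse without mutating the input tree.
    return { ...node, children: node.children.map(mathToHtml) };
  }
  return node;
}
```

Because html nodes are written out as-is by the markdown serializer, `x_t` stays `x_t` rather than becoming `x\_t`.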

Image attachment embedding

New images: attachment option embeds images as base64 cell attachments for self-contained notebooks.

exports:
  - format: ipynb
    markdown: commonmark
    images: attachment

Two-phase hybrid architecture:

  1. Phase 1 (myst-cli): collectImageData() walks AST image nodes, resolves filesystem paths, reads & base64-encodes into Record<url, ImageData>
  2. Phase 2 (myst-to-ipynb): embedImagesAsAttachments() post-serialization regex rewrites ![alt](url) → ![alt](attachment:name) and adds the cell attachments field
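
A rough sketch of the Phase 2 rewrite, assuming Phase 1 has already produced a map from image URL to base64 data. The `ImageData` fields (`mime`, `data`), the `embedImages` name, and the simplified regex are assumptions for illustration; the sketch ignores escaped brackets and parentheses and does not resolve basename collisions.

```typescript
// Hypothetical shape; the real ImageData interface lives in types.ts.
interface ImageData {
  mime: string; // e.g. 'image/png'
  data: string; // base64-encoded file contents
}

// Jupyter cell attachments: { filename: { mimeType: base64 } }
type Attachments = Record<string, Record<string, string>>;

function basename(url: string): string {
  const parts = url.split('/');
  return parts[parts.length - 1] || url;
}

// Rewrite ![alt](url) to ![alt](attachment:name) for every URL that has
// collected image data, in a single pass over the serialized markdown.
function embedImages(
  md: string,
  imageData: Record<string, ImageData>,
): { md: string; attachments: Attachments } {
  const attachments: Attachments = {};
  const rewritten = md.replace(
    /!\[([^\]]*)\]\(([^)\s]+)\)/g,
    (match: string, alt: string, url: string) => {
      const image = imageData[url];
      if (!image) return match; // leave remote or unknown images untouched
      const name = basename(url);
      attachments[name] = { [image.mime]: image.data };
      return `![${alt}](attachment:${name})`;
    },
  );
  return { md: rewritten, attachments };
}
```

The returned `attachments` object would then be placed on the markdown cell alongside its rewritten source.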

Validated on 24 QuantEcon lectures: 50 images across 12 notebooks embedded, 0 external references remaining.

Epigraph / pull-quote serialization (fixes #7)

The myst-to-md container handler only handled figure, table, and code kinds — quote containers (produced by {epigraph} and {pull-quote} directives) were silently dropped.

Fix: Added kind === 'quote' branch that:

  • Finds the blockquote child and serializes it via the default blockquote handler
  • Appends optional caption as > — Attribution line
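
The two steps above can be sketched as below. This is a simplified stand-in for the handler in directives.ts: the `Node` shape, the `quoteContainerToMarkdown` name, and the assumption that the attribution arrives as a `caption` child are all illustrative, and the real code delegates to the default blockquote handler rather than building the lines by hand.

```typescript
// Minimal node shape for this sketch.
interface Node {
  type: string;
  kind?: string;
  value?: string;
  children?: Node[];
}

// Flatten a node to plain text (enough for single-line paragraphs).
function toText(node: Node): string {
  return node.value ?? (node.children ?? []).map(toText).join('');
}

// Serialize a container of kind 'quote' as a markdown blockquote with an
// optional attribution line.
function quoteContainerToMarkdown(container: Node): string {
  if (container.type !== 'container' || container.kind !== 'quote') return '';
  const children = container.children ?? [];
  const quote = children.find((c) => c.type === 'blockquote');
  if (!quote) return ''; // nothing to serialize
  const parts = (quote.children ?? []).map((p) => `> ${toText(p)}`);
  const caption = children.find((c) => c.type === 'caption');
  if (caption) parts.push(`> — ${toText(caption)}`);
  // Separate paragraphs (and the attribution) with an empty quoted line.
  return parts.join('\n>\n');
}
```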

Cross-reference URL fallback (fixes #8)

When MyST resolves same-page cross-references (e.g., {ref}, {eq}), the addChildrenFromTargetNode transform sets html_id on the node but doesn't always propagate identifier or label. The myst-to-md serializer was only checking urlSource → #label → #identifier, resulting in empty [text]() links.

Fix: Extended the URL fallback chain to: urlSource → #label → #identifier → #html_id → url → ''
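
A minimal sketch of the extended fallback chain, assuming a simplified `CrossReference` shape and a hypothetical `resolveUrl` helper. It uses `||` rather than `??` so that an empty-string urlSource or url counts as missing and the chain keeps going.

```typescript
// Simplified view of the fields the fallback chain consults.
interface CrossReference {
  urlSource?: string;
  label?: string;
  identifier?: string;
  html_id?: string;
  url?: string;
}

// Turn an id into a same-page fragment, or undefined when the id is absent.
function withHash(id?: string): string | undefined {
  return id ? `#${id}` : undefined;
}

// urlSource → #label → #identifier → #html_id → url → ''
function resolveUrl(node: CrossReference): string {
  return (
    node.urlSource ||
    withHash(node.label) ||
    withHash(node.identifier) ||
    withHash(node.html_id) ||
    node.url ||
    ''
  );
}
```

For a same-page reference where only html_id is set, `resolveUrl({ html_id: 'section-7' })` yields `#section-7` instead of an empty link target.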

Note: A third case was identified where {ref} roles with multi-line bodies fail to parse entirely. This is an upstream parser limitation filed as jupyter-book/mystmd#2724.

Exercise / solution code cell lifting

Exercises and solutions can contain code-cell nodes that should become executable notebook cells. These were being serialized as markdown instead of being lifted to top-level code cells.

Fix: liftCodeCellsFromGatedNodes() pre-processes the AST to extract code cells from exercise/solution containers, splitting surrounding markdown content into separate cells. Handles both direct gated nodes and blocks wrapping gated nodes.
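
The core of the splitting step might look like the sketch below. It is not the implementation in index.ts: the `Node` shape and `liftCodeCells` name are hypothetical, identifying code cells as blocks with kind `notebook-code` is an assumption, and keeping the exercise wrapper only on the first markdown group is a choice made for this example.

```typescript
// Minimal node shape for this sketch.
interface Node {
  type: string;
  kind?: string;
  value?: string;
  children?: Node[];
}

// Assumption: code cells appear as blocks tagged with kind 'notebook-code'.
function isCodeCell(node: Node): boolean {
  return node.type === 'block' && node.kind === 'notebook-code';
}

// Split an exercise/solution node into alternating markdown and code groups,
// preserving document order.
function liftCodeCells(gated: Node): Node[] {
  const cells: Node[] = [];
  let pending: Node[] = [];
  let first = true;
  const flush = (): void => {
    if (pending.length === 0) return;
    // Keep the gated wrapper (and its title) only on the first text group.
    cells.push(first ? { ...gated, children: pending } : { type: 'block', children: pending });
    first = false;
    pending = [];
  };
  for (const child of gated.children ?? []) {
    if (isCodeCell(child)) {
      flush(); // close the markdown group before the code cell
      cells.push(child);
    } else {
      pending.push(child);
    }
  }
  flush();
  return cells;
}
```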

Documentation

  • New docs/creating-notebooks.md — comprehensive ipynb export guide
  • Updated docs/documents-exports.md — added ipynb to format table
  • Updated docs/frontmatter.md — added ipynb to format values
  • Updated docs/myst.yml — TOC entry for creating-notebooks
  • Updated packages/myst-to-ipynb/README.md — features and usage

Test suite — 154 tests passing

myst-to-ipynb (55 tests)

| File | Tests | Coverage |
| --- | --- | --- |
| basic.yml | 13 | Core: styles, headings, code, links, images, block markers |
| frontmatter.yml | 4 | Kernelspec: Python, Julia, Python3, R |
| commonmark.yml | 18 | All directive/role mappings, identifier stripping |
| attachments.yml | 5 | Integration: single/multi image, dedup, no-match, no-data |
| attachments.spec.ts | 7 | Unit: basename, embed/skip/dedup logic |
| run.spec.ts | 8 | Exercise/solution code cell lifting |

myst-to-md (99 tests, 6 new)

| File | New Tests | Coverage |
| --- | --- | --- |
| directives.yml | 3 | Epigraph, epigraph with attribution, pull-quote |
| references.yml | 3 | URL fallback for remote refs, html_id heading, html_id equation |

Files Changed (36 files, +3122 / -11)

New files

  • packages/myst-to-ipynb/src/commonmark.ts — AST pre-transform (~500 lines, 18 node types)
  • packages/myst-to-ipynb/src/attachments.ts — Post-serialization image embedding (~100 lines)
  • packages/myst-to-ipynb/src/types.ts — Shared ImageData interface
  • packages/myst-to-ipynb/tests/commonmark.yml — 18 CommonMark mode tests
  • packages/myst-to-ipynb/tests/attachments.yml — 5 attachment integration tests
  • packages/myst-to-ipynb/tests/attachments.spec.ts — 7 attachment unit tests
  • docs/creating-notebooks.md — New documentation page

Modified files

  • packages/myst-to-ipynb/src/index.ts — IpynbOptions, attachment wiring, empty cell filter, exercise lifting
  • packages/myst-to-ipynb/tests/run.spec.ts — Test runner supporting frontmatter + options in YAML
  • packages/myst-to-ipynb/tests/basic.yml — Expanded to 13 tests
  • packages/myst-to-ipynb/tests/frontmatter.yml — 4 kernelspec tests
  • packages/myst-to-ipynb/package.json — Fixed homepage URL
  • packages/myst-to-ipynb/README.md — Updated documentation
  • packages/myst-cli/src/build/ipynb/index.ts — collectImageData(), options passthrough, log fix
  • packages/myst-to-md/src/directives.ts — Epigraph/pull-quote container handler
  • packages/myst-to-md/src/references.ts — Cross-reference URL fallback chain with html_id
  • packages/myst-to-md/tests/directives.yml — 3 epigraph/pull-quote tests
  • packages/myst-to-md/tests/references.yml — 3 cross-reference fallback tests
  • docs/documents-exports.md — ipynb in format table
  • docs/frontmatter.md — ipynb in format values
  • docs/myst.yml — TOC entry

Status

  • Upstream: not yet submitted; will open against jupyter-book/mystmd:main when the upstream team has bandwidth.
  • Fork integration: consumed by quantecon/build.sh once registered in features.txt (#15). The combined build is published on the quantecon branch.
  • Replaces: #1 (auto-closed during branch rename).

Tracking issue: QuantEcon/mystmd#2 (full PLAN)
Related: QuantEcon/meta#292 · jupyter-book/mystmd#1882
Real-world validation: QuantEcon/lecture-python-programming.myst#363 — 24 lectures, 0 MyST syntax leaks

agoose77 and others added 30 commits February 27, 2026 14:36
- Use frontmatter kernelspec to populate notebook metadata (name,
  display_name, language) instead of ignoring the frontmatter parameter
- Derive language_info.name from frontmatter instead of hardcoding 'python'
- Strip leading +++ block markers from markdown cells (MyST-specific
  separators that have no meaning in notebooks)
- Fix log message from 'Exported MD' to 'Exported IPYNB'
- Fix package.json homepage URL to point to myst-to-ipynb (not myst-to-md)

Ref: QuantEcon/meta#292
Add AST pre-transform that converts MyST-specific nodes to CommonMark
equivalents before markdown serialization, producing notebooks compatible
with vanilla Jupyter Notebook and Google Colab.

New option: markdown: 'commonmark' (default: 'myst')

Transforms implemented:
- math block directive to $$ delimiters
- inline math role to $ delimiters
- admonition to blockquote with bold title
- exercise to bold header with content
- solution to bold header with content (or dropped via option)
- proof/theorem/lemma to bold header with content
- tab-set to bold tab titles with tab content
- figure to image + italic caption
- table container to bold caption + table
- card/grid to unwrapped content
- details to blockquote with summary title
- aside/sidebar to blockquote
- mystDirective/mystRole to unwrapped content or plain text

Uses html-type AST nodes for math content to prevent the markdown
serializer from escaping LaTeX special characters (underscores, etc).

CLI wiring: reads 'markdown: commonmark' from export config in myst.yml.

Ref: QuantEcon/meta#292
- Rewrite basic.yml with 13 proper YAML-object test cases (was 2 active)
- Add frontmatter.yml with 4 kernelspec/metadata test cases
- Add commonmark.yml with 13 CommonMark-mode test cases covering:
  inline math, math blocks, admonitions, exercises, theorems,
  tabSets, solutions (kept/dropped), underscore preservation
- Update run.spec.ts to support frontmatter and options fields in
  YAML test cases, enabling CommonMark and metadata tests

Ref: QuantEcon/meta#292
…er empty cells

Real-world validation with QuantEcon lecture content revealed:
- myst-to-md labelWrapper was adding (identifier)= prefixes to headings,
  paragraphs, blockquotes, and lists with identifier/label properties
- mystTarget nodes need to be dropped in CommonMark mode
- comment nodes (% syntax) need to be dropped in CommonMark mode
- code blocks with extra MyST attributes rendered as code-block directives
- +++ block markers appearing mid-cell (not just leading)
- Empty markdown cells from dropped nodes should be filtered out

Changes:
- commonmark.ts: strip identifier/label from all transformed children,
  add mystTarget and comment handlers, add code handler
- index.ts: filter empty markdown cells, fix stripBlockMarkers /gm regex
- commonmark.yml: add 5 new tests, update solution-dropped test
Standalone {image} directives with class/width/align properties were
being serialized as ```{image} directives by myst-to-md. Added
transformImage handler that strips directive-specific properties so
they render as plain ![alt](url) markdown syntax.

Found during full-project validation against lecture-python-programming.myst
(24 lectures, all clean after this fix).
Add 'images: attachment' option that embeds local images as base64
cell attachments in exported notebooks, producing self-contained
.ipynb files that don't depend on external image files.

Architecture (two-phase hybrid):
- Phase 1 (myst-cli): collectImageData() walks AST image nodes,
  resolves filesystem paths, reads files, and base64-encodes them
- Phase 2 (myst-to-ipynb): embedImagesAsAttachments() rewrites
  serialized markdown image refs to attachment: references

Usage in frontmatter:
  exports:
    - format: ipynb
      images: attachment

New files:
- packages/myst-to-ipynb/src/attachments.ts
- packages/myst-to-ipynb/tests/attachments.spec.ts (7 tests)
- packages/myst-to-ipynb/tests/attachments.yml (5 tests)

47/47 tests passing.
- Fix prettier formatting in commonmark.ts, index.ts, and myst-cli ipynb/index.ts
- Add docs/creating-notebooks.md with full ipynb export documentation
  (CommonMark markdown, image attachments, export options table)
- Add ipynb to export format table in docs/documents-exports.md
- Add --ipynb CLI example to docs/documents-exports.md
- Add ipynb to format list in docs/frontmatter.md
- Add creating-notebooks.md to docs/myst.yml TOC
- Update packages/myst-to-ipynb/README.md with features and usage
Move ImageData interface to shared types.ts so attachments.ts and
index.ts no longer import from each other. Fixes madge lint:circular
check.
Address Copilot review comments:
- Strip leading '/' from image URLs before path.join in collectImageData()
  so project-root URLs like '/_static/img/foo.png' resolve correctly.
- Fix misleading 'reverse order' comment in embedImagesAsAttachments().
Include nodes that have been resolved by includeDirectiveTransform
retain type 'include', causing myst-to-md to serialize them back as fenced
{include} directive syntax. Add an 'include' case to
transformNode() that unwraps resolved children into the parent,
so the included content (e.g. admonitions) is emitted as plain
CommonMark in notebook cells.
…b export

When gated syntax ({exercise-start}/{exercise-end}, {solution-start}/
{solution-end}) is used, joinGatesTransform nests all content between
the gates, including {code-cell} blocks, as children of the
exercise/solution node. During ipynb export these were absorbed into a
single markdown cell, silently dropping executable code cells.

Add liftCodeCellsFromGatedNodes() preprocessing step in writeIpynb that
detects exercise/solution nodes containing code-cell blocks and splits
them into alternating top-level markdown and code cells, preserving
document order. When dropSolutions is true, solution nodes are left
intact for transformToCommonMark to drop entirely.

Also fix stripBlockMarkers regex to handle +++ at end-of-string without
trailing newline, preventing empty markdown cells.

Closes #5
The previous fix (05bdc24) assumed exercise/solution nodes would be the
sole child of a block. In reality, blockNestingTransform groups all
consecutive non-block siblings into a single wrapper block, so the AST is:

  root > block { para, exercise {...}, solution {..., block{code}}, para }

The fix now scans inside each block's children for exercise/solution
nodes containing code-cell blocks, and splits the block accordingly.
Extracted helper functions for clarity:

- isGatedNodeWithCodeCells: identifies target nodes
- liftFromExerciseSolution: splits a single node's children
- splitBlockWithGatedNodes: processes a block with mixed children

Added tests for the shared-block structure (exercise + solution + other
content in the same block) and for dropSolutions with shared blocks.

Refs #5, #6
…k/ipynb export

The container handler in myst-to-md only handled figure, table, and code
kinds. Containers with kind 'quote' (produced by the epigraph, pull-quote,
and blockquote directives) fell through and returned empty string, silently
dropping all content during ipynb export.

Add a 'quote' branch that serializes the blockquote child as a standard
markdown blockquote, with optional attribution rendered as an em-dash line.

Closes #7
…back

Add MYST_DEBUG_XREF env var to dump full AST node details when a
crossReference resolves with an empty URL during CommonMark serialization.
This helps diagnose #8 where some {ref} roles produce
[text]() links in ipynb export.

Also add a defensive fallback to use node.url (set by
MultiPageReferenceResolver for cross-page refs) when urlSource, label,
and identifier are all missing. This prevents empty URLs for resolved
remote references without changing behaviour for any existing case.
…port

The reference resolver (addChildrenFromTargetNode) marks crossReferences as
resolved and sets html_id + kind, but for same-page targets the identifier
and label fields end up undefined. The CommonMark serializer then generates
empty URLs like [Section 7]().

Add html_id to the URL fallback chain:
  urlSource → #label → #identifier → #html_id → url → ''

This fixes all 23 unique empty-URL crossReferences found in the QuantEcon
lectures (headings, equations, exercises, code blocks, paragraphs).

Closes #8
The ad-hoc debug logging served its purpose for diagnosing the html_id
fallback issue and is no longer needed. A system-wide debug infrastructure
should be designed separately.
- Update regex to handle escaped brackets in alt text and escaped
  parentheses in URLs produced by mdast-util-to-markdown
- Unescape URLs before looking up in imageData dictionary
- Refactor to single-pass replacement using md.replace(regex, callback)
- Add tests for escaped parentheses in URLs and escaped brackets in alt text
Copilot AI review requested due to automatic review settings May 7, 2026 00:35

Copilot AI left a comment

Pull request overview

Adds first-class Jupyter Notebook (.ipynb) export support to myst build, including an optional CommonMark serialization mode for broad notebook compatibility and an optional mode to embed images as base64 cell attachments.

Changes:

  • Introduces a new myst-to-ipynb package implementing MyST→ipynb conversion, plus CommonMark AST pre-transform and image-attachment embedding.
  • Wires ipynb into myst-cli (CLI flag, export dispatch, allowed extensions) and myst-frontmatter (export format + .ipynb extension).
  • Fixes/extends myst-to-md serialization for quote containers and cross-reference URL fallback, with added tests and new documentation pages.

Reviewed changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| packages/myst-to-md/tests/references.yml | Adds regression tests for cross-reference URL fallback behavior. |
| packages/myst-to-md/tests/directives.yml | Adds tests for epigraph/pull-quote (quote container) markdown serialization. |
| packages/myst-to-md/src/references.ts | Extends cross-reference URL fallback chain (incl. html_id and url). |
| packages/myst-to-md/src/directives.ts | Adds support for container.kind === 'quote' serialization/validation. |
| packages/myst-to-ipynb/tsconfig.json | Adds TypeScript configuration for the new package build. |
| packages/myst-to-ipynb/tests/run.spec.ts | Adds YAML-driven test runner for ipynb serialization. |
| packages/myst-to-ipynb/tests/frontmatter.yml | Adds kernelspec/frontmatter → notebook metadata tests. |
| packages/myst-to-ipynb/tests/example.ipynb | Adds a fixture notebook example for testing/reference. |
| packages/myst-to-ipynb/tests/commonmark.yml | Adds extensive CommonMark-mode conversion tests. |
| packages/myst-to-ipynb/tests/basic.yml | Adds basic ipynb conversion tests (markdown + code cells). |
| packages/myst-to-ipynb/tests/attachments.yml | Adds integration tests for image attachment embedding. |
| packages/myst-to-ipynb/tests/attachments.spec.ts | Adds unit tests for attachment rewriting/dedup behavior. |
| packages/myst-to-ipynb/src/types.ts | Introduces shared ImageData type for attachment embedding. |
| packages/myst-to-ipynb/src/index.ts | Implements ipynb writer, CommonMark option wiring, cell filtering, and code-cell lifting. |
| packages/myst-to-ipynb/src/commonmark.ts | Adds AST pre-transform converting MyST nodes to CommonMark equivalents. |
| packages/myst-to-ipynb/src/attachments.ts | Adds markdown post-processing to embed images as Jupyter cell attachments. |
| packages/myst-to-ipynb/README.md | Documents ipynb export features and configuration options. |
| packages/myst-to-ipynb/package.json | Adds new package manifest and dependencies. |
| packages/myst-to-ipynb/CHANGELOG.md | Initializes changelog file for the new package. |
| packages/myst-to-ipynb/.eslintrc.cjs | Adds ESLint config for the new package. |
| packages/myst-frontmatter/src/exports/validators.ts | Registers .ipynb extension → ipynb export format. |
| packages/myst-frontmatter/src/exports/types.ts | Adds ipynb to ExportFormats. |
| packages/myst-cli/src/cli/options.ts | Adds --ipynb CLI option helper. |
| packages/myst-cli/src/cli/build.ts | Wires --ipynb into the myst build command options. |
| packages/myst-cli/src/build/utils/localArticleExport.ts | Routes ExportFormats.ipynb to the ipynb export runner. |
| packages/myst-cli/src/build/utils/collectExportOptions.ts | Allows .ipynb as an output extension for ipynb exports. |
| packages/myst-cli/src/build/ipynb/index.ts | Implements the ipynb export runner and image data collection for attachments. |
| packages/myst-cli/src/build/build.ts | Adds ipynb to requested/allowed export format selection logic. |
| packages/myst-cli/src/build/build.spec.ts | Updates build-format selection tests to include ipynb. |
| packages/myst-cli/package.json | Adds myst-to-ipynb dependency to CLI package. |
| docs/myst.yml | Adds the new notebook export doc page to the docs TOC. |
| docs/frontmatter.md | Documents ipynb as an allowed export format value. |
| docs/documents-exports.md | Updates export overview to include ipynb and --ipynb. |
| docs/creating-notebooks.md | Adds end-user documentation for ipynb export, CommonMark mode, and attachments. |
| .changeset/witty-tigers-hunt.md | Adds changeset for the new ipynb export format plumbing. |
| .changeset/config.json | Links versioning between myst-to-md and myst-to-ipynb for releases. |
Comments suppressed due to low confidence (1)

packages/myst-to-md/src/directives.ts:190

  • Quote containers that lack a blockquote child currently serialize to an empty string in container() (silently dropping content). containerValidator() was updated to accept kind === 'quote', but it doesn’t validate the presence of a blockquote child like it does for figure/table. Consider adding a validation error/warning for missing blockquote so broken ASTs don’t fail silently.

Comment thread packages/myst-to-md/src/references.ts
Comment thread packages/myst-to-ipynb/src/attachments.ts
Comment thread packages/myst-to-ipynb/src/commonmark.ts Outdated
- myst-to-md/references: treat empty-string urlSource/url as missing so the
  cross-ref fallback chain reaches #label / #identifier / #html_id (||
  instead of ??).
- myst-to-ipynb/commonmark: drop unused `selectAll` import.

Development

Successfully merging this pull request may close these issues.

  • Cross-references resolve with empty URLs in ipynb export
  • Epigraph directive content silently dropped in ipynb export

4 participants