Commit 91cc0ce

Add dastdown package: lossless textual serialization for dast
A markdown-flavored format that round-trips through dast, enabling programmatic edits to structured text via plain string manipulation (search/replace, regex, diff/merge) instead of walking the AST.
1 parent 611a3a8 commit 91cc0ce

21 files changed

Lines changed: 3820 additions & 0 deletions

packages/dastdown/.eslintrc.js

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
module.exports = require('../../.eslintrc.js');

packages/dastdown/LICENSE.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) [year] [fullname]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

packages/dastdown/README.md

Lines changed: 195 additions & 0 deletions
@@ -0,0 +1,195 @@
# `datocms-structured-text-dastdown`

Lossless textual serialization for [DatoCMS Structured Text (`dast`)](https://www.datocms.com/docs/structured-text/dast) documents, with a parser and a serializer.

`dastdown` is a markdown-flavored format that round-trips through `dast` without losing information. It exists so you can make programmatic edits to structured text via plain string manipulation (search/replace, regex, diff/merge) instead of walking the AST.

The full grammar is documented in [`SPEC.md`](./SPEC.md).

## When to use it

Best for **text-heavy content** (articles, docs, book chapters) where edits are textual and may cross node boundaries: bulk find/replace, regex refactors, LLM rewrites, meaningful `git diff`s.

Not for landing pages made of opaque blocks. Referenced blocks stay opaque: `dastdown` only lets you move, duplicate, or remove them.

## Installation

```sh
npm install datocms-structured-text-dastdown
```

## At a glance

```js
import { buildClient } from '@datocms/cma-client-node';
import { parse, serialize } from 'datocms-structured-text-dastdown';

// CMA client; assumes your API token is in the DATOCMS_API_TOKEN env variable
const client = buildClient({ apiToken: process.env.DATOCMS_API_TOKEN });

// 1. fetch the record and turn its structured text field into dastdown
const record = await client.items.find('article-id');
const text = serialize(record.body);

// → # Title
//
// A paragraph about Acme Corp with **strong** text and a [link](https://example.com).
//
// > A quote.
// {attribution="Anon"}
//
// <block id="1234"/>

// 2. edit as plain text: search/replace, regex, diff/merge, LLM rewrite, …
const edited = text
  .replace(/Acme Corp/g, '**Acme Inc.**') // rename + bold every occurrence
  .replace(/^# (.+)$/m, '# $1 (2026 edition)'); // tweak the H1

// 3. parse back to dast and push the update
await client.items.update('article-id', { body: parse(edited) });
```

## Format cheat-sheet

| Construct | Syntax |
| --------------- | ----------------------------------------- |
| Heading | `# H1` to `###### H6` |
| Paragraph style | `{style="lead"}` on the line after |
| Bullet list | `- item` |
| Numbered list | `1. item` (numbers are not semantic) |
| Blockquote | `> line` plus `{attribution="…"}` trailer |
| Code block | `` ```lang `` … `` ``` `` (`{highlight=…}`) |
| Thematic break | `---` |
| Block reference | `<block id="…"/>` (root-level only) |
| Strong | `**text**` |
| Emphasis | `*text*` |
| Code | `` `text` `` |
| Strikethrough | `~~text~~` |
| Highlight | `==text==` |
| Underline | `++text++` |
| Custom mark | `<m k="footnote-ref">text</m>` |
| Link | `[label](url){meta="…"}` |
| Item link | `[label](dato:item/123){meta="…"}` |
| Inline item | `<inlineItem id="…"/>` |
| Inline block | `<inlineBlock id="…"/>` |
| Hard line break | `<br/>` inside a span |

Marks nest in canonical outer-to-inner order: `highlight → strikethrough → underline → strong → emphasis → code`, with custom marks innermost in alphabetical order.
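
A minimal sketch of that ordering, combined with the delimiters from the table, fits in two small helpers. `MARK_ORDER`, `sortMarks`, and `renderSpan` are hypothetical names, not exports of this package, and the sketch ignores escaping and the other details that `SPEC.md` governs.

```js
// Hypothetical helpers illustrating the mark order above; not the package's
// implementation, and no escaping of special characters is attempted.

// Built-in marks, outer to inner, as listed in this README.
const MARK_ORDER = ['highlight', 'strikethrough', 'underline', 'strong', 'emphasis', 'code'];

// Delimiters for the built-in marks, taken from the cheat-sheet.
const DELIMITERS = new Map([
  ['highlight', ['==', '==']],
  ['strikethrough', ['~~', '~~']],
  ['underline', ['++', '++']],
  ['strong', ['**', '**']],
  ['emphasis', ['*', '*']],
  ['code', ['`', '`']],
]);

// Built-ins in MARK_ORDER first, then custom marks in alphabetical order.
function sortMarks(marks) {
  const rank = (mark) => {
    const i = MARK_ORDER.indexOf(mark);
    return i === -1 ? MARK_ORDER.length : i;
  };
  return [...marks].sort((a, b) => rank(a) - rank(b) || (a < b ? -1 : a > b ? 1 : 0));
}

// Wrap the value starting from the innermost mark, so the first mark in
// canonical order ends up as the outermost delimiter pair.
function renderSpan(span) {
  const marks = sortMarks(span.marks ?? []);
  let out = span.value;
  for (let i = marks.length - 1; i >= 0; i--) {
    const [open, close] = DELIMITERS.get(marks[i]) ?? [`<m k="${marks[i]}">`, '</m>'];
    out = open + out + close;
  }
  return out;
}
```

For instance, `renderSpan({ type: 'span', value: 'hi', marks: ['emphasis', 'strong'] })` returns `'***hi***'`, because `strong` sorts outside `emphasis` regardless of the order the marks arrive in.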

## API

### `parse(input)`

```ts
parse(input: string | null | undefined): Document | null
```

Parses a `dastdown` source string into a `dast` document.

- `null` / `undefined` → `null` (so the return type matches `StructuredTextFieldValue` from `@datocms/cma-client` exactly).
- `''` or whitespace-only string → a document with a single empty paragraph.
- Otherwise → the parsed document, validated against the `dast` schema.

`block` / `inlineBlock` / `inlineItem` / `itemLink` references always come back with their `item` field as a string id, since `dastdown` text encodes only ids.

If the input is malformed, `parse` throws a `DastdownParseError` carrying `line` and `column` info:

```js
import { parse, DastdownParseError } from 'datocms-structured-text-dastdown';

try {
  parse('####### too many hashes');
} catch (err) {
  if (err instanceof DastdownParseError) {
    console.log(err.line, err.column, err.message);
  } else {
    throw err; // unrelated failure: let it propagate
  }
}
```

### `serialize(document)`

```ts
type SerializableBlockId = string | { id: string };

serialize<
  B extends SerializableBlockId = string,
  IB extends SerializableBlockId = string
>(document: Document<B, IB> | null | undefined): string
```

Serializes a `dast` document into a `dastdown` string.

- `null` / `undefined` → `''`.
- A document whose only content is an empty paragraph → `''` (so `serialize(parse(''))` round-trips).
- Any other invalid document → throws.
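
Both empty-input rules depend on what an "empty paragraph" document looks like. The sketch below assumes it is a root with one paragraph holding a single span whose value is `''`; `emptyDocument` and `isEmptyDocument` are hypothetical helpers, not part of the package API.

```js
// Hypothetical helpers; ASSUMPTION: an "empty paragraph" is a paragraph
// whose only child is a span with value ''.

// What parse('') or parse('   ') could return under that assumption.
function emptyDocument() {
  return {
    schema: 'dast',
    document: {
      type: 'root',
      children: [{ type: 'paragraph', children: [{ type: 'span', value: '' }] }],
    },
  };
}

// A check a serializer could use to map such a document (or null/undefined) to ''.
function isEmptyDocument(doc) {
  if (doc === null || doc === undefined) return true;
  const { children } = doc.document;
  return (
    children.length === 1 &&
    children[0].type === 'paragraph' &&
    children[0].children.every((child) => child.type === 'span' && child.value === '')
  );
}
```

With these two pieces, `isEmptyDocument(emptyDocument())` is `true`, which mirrors how `serialize(parse(''))` can come back as `''`.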

The signature accepts both the plain field-value shape (block items as string ids) and the `?nested=true` response shape (block items as full record-like objects). When `item` is an object, its `.id` is used.

```js
import { serialize } from 'datocms-structured-text-dastdown';

// works with plain ids:
serialize({
  schema: 'dast',
  document: {
    type: 'root',
    children: [{ type: 'block', item: 'abc-123' }],
  },
});
// → '<block id="abc-123"/>\n'

// works with nested-response items too: only `id` is read
serialize({
  schema: 'dast',
  document: {
    type: 'root',
    children: [
      { type: 'block', item: { id: 'abc-123' /* ...rest of Item */ } },
    ],
  },
});
// → '<block id="abc-123"/>\n'
```

The request shape (where new blocks may not yet have an id) is intentionally **not** supported: there is no way to render a reference to a block that has no id.
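
The id lookup described above, including the refusal to handle request-shape blocks, can be sketched as follows. `resolveItemId` and `renderBlockRef` are hypothetical helpers and the package's own serializer may differ; this sketch renders only the tag, without the trailing newline the document-level examples show.

```js
// Hypothetical helpers; not exported by datocms-structured-text-dastdown.

// Accept a plain string id or a nested-response object with a string `id`.
function resolveItemId(item) {
  if (typeof item === 'string') return item;
  if (item !== null && typeof item === 'object' && typeof item.id === 'string') {
    return item.id;
  }
  // Request-shape blocks without an id have nothing to reference.
  throw new Error('cannot serialize a block reference without an id');
}

// Render a root-level `block` node as a self-closing reference tag.
function renderBlockRef(node) {
  return `<block id="${resolveItemId(node.item)}"/>`;
}
```

Both `renderBlockRef({ type: 'block', item: 'abc-123' })` and `renderBlockRef({ type: 'block', item: { id: 'abc-123' } })` return `'<block id="abc-123"/>'`.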

### `canonicalize(document)`

```ts
canonicalize<B, IB>(document: Document<B, IB>): Document<B, IB>
```

Returns a structurally normalized copy of the document. It does not touch block items; only spans and marks are rewritten:

- Adjacent spans with identical mark sets are coalesced.
- Empty spans are dropped (except when removing them would leave a parent paragraph/heading/link with no children).
- Marks are sorted into the canonical outer-to-inner order; custom marks are placed innermost in alphabetical order.
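
A minimal sketch of these three rules, applied to the children of a single paragraph, heading, or link, might look like this. `canonicalizeSpans` is a hypothetical helper, not the package's code. It assumes only `span` nodes are merged while other inline nodes (links, inline items, and so on) act as boundaries, and it drops empty spans before coalescing, so two equally marked spans separated by an empty one merge.

```js
// Hypothetical helper; not the package's implementation.
const MARK_ORDER = ['highlight', 'strikethrough', 'underline', 'strong', 'emphasis', 'code'];

// Rule 3: built-in marks in MARK_ORDER, then custom marks alphabetically.
function sortMarks(marks) {
  const rank = (m) => (MARK_ORDER.includes(m) ? MARK_ORDER.indexOf(m) : MARK_ORDER.length);
  return [...marks].sort((a, b) => rank(a) - rank(b) || (a < b ? -1 : a > b ? 1 : 0));
}

const sameMarks = (a, b) => a.length === b.length && a.every((m, i) => m === b[i]);

function canonicalizeSpans(children) {
  const out = [];
  for (const child of children) {
    if (child.type !== 'span') {
      out.push(child); // non-span inline nodes are kept as-is and break runs
      continue;
    }
    if (child.value === '') continue; // Rule 2: drop empty spans
    const marks = sortMarks(child.marks ?? []);
    const prev = out[out.length - 1];
    if (prev && prev.type === 'span' && sameMarks(prev.marks ?? [], marks)) {
      prev.value += child.value; // Rule 1: merge into the previous span
    } else {
      const span = { type: 'span', value: child.value };
      if (marks.length > 0) span.marks = marks;
      out.push(span); // a fresh object, so input spans are never mutated
    }
  }
  // Rule 2 exception: never leave the parent without children.
  return out.length > 0 ? out : [{ type: 'span', value: '' }];
}
```

Running it on three spans, `'Hello '` marked `['emphasis', 'strong']`, an empty span, and `'world'` marked `['strong', 'emphasis']`, yields a single `'Hello world'` span marked `['strong', 'emphasis']`.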

Round-trip property:

```js
import {
  parse,
  serialize,
  canonicalize,
} from 'datocms-structured-text-dastdown';

// for any valid dast document `d`:
parse(serialize(d)); // ≡ canonicalize(d)
```

### `DastdownParseError`

Thrown by `parse` on malformed input. Exposes `line` (1-indexed) and `column` (1-indexed) properties.
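
A common approach is for a parser to track a 0-based character offset and convert it to a 1-indexed position only when it fails. A minimal sketch of that conversion, assuming `\n` line endings and counting columns in UTF-16 code units, is below; `offsetToPosition` and `PositionedParseError` are hypothetical stand-ins, not the real `DastdownParseError`.

```js
// Hypothetical stand-in for an error carrying 1-indexed line/column info.
class PositionedParseError extends Error {
  constructor(message, line, column) {
    super(`${message} (line ${line}, column ${column})`);
    this.name = 'PositionedParseError';
    this.line = line;
    this.column = column;
  }
}

// Convert a 0-based offset into 1-indexed line and column numbers.
// Assumes '\n' line endings; columns count UTF-16 code units.
function offsetToPosition(input, offset) {
  let line = 1;
  let column = 1;
  for (let i = 0; i < offset && i < input.length; i++) {
    if (input[i] === '\n') {
      line += 1;
      column = 1;
    } else {
      column += 1;
    }
  }
  return { line, column };
}
```

For example, `offsetToPosition('# Title\n\n####### x', 9)` returns `{ line: 3, column: 1 }`, the start of the third line.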

## Round-trip semantics

| `text` | `parse(text)` | `serialize(parse(text))` |
| -------------------- | ------------------------ | ------------------------ |
| `null` / `undefined` | `null` | `''` |
| `''` / whitespace | empty paragraph document | `''` |
| any valid `dastdown` | a `dast` document | the same text, once canonicalized |

For any valid document `d` and any valid `dastdown` string `text`:

- `parse(serialize(d)) ≡ canonicalize(d)`
- `serialize(parse(text)) ≡ text` for any `text` that has already been through one canonicalization pass
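
The two properties above lend themselves to a property-style test. The sketch takes `parse`, `serialize`, and `canonicalize` as arguments (so it can be fed this package's exports or any stand-in) and uses a small hand-written `deepEqual` so key order does not matter. `checkRoundTrip` is a hypothetical helper, and its second check assumes the output of `serialize` is already canonical.

```js
// Structural equality for plain JSON-like values (key order is ignored).
function deepEqual(a, b) {
  if (a === b) return true;
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return (
    keysA.length === keysB.length &&
    keysA.every((k) => Object.prototype.hasOwnProperty.call(b, k) && deepEqual(a[k], b[k]))
  );
}

// Hypothetical test helper: true when both round-trip properties hold for `doc`.
function checkRoundTrip({ parse, serialize, canonicalize }, doc) {
  const text = serialize(doc);
  // parse(serialize(d)) ≡ canonicalize(d)
  const documentsMatch = deepEqual(parse(text), canonicalize(doc));
  // Assumes serialize() emits canonical text, so re-serializing is a no-op.
  const textIsStable = serialize(parse(text)) === text;
  return documentsMatch && textIsStable;
}
```

In a test suite you might call `checkRoundTrip({ parse, serialize, canonicalize }, fixture)` for each fixture document.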

## Why not CommonMark?

`dastdown` extends markdown with constructs that vanilla CommonMark cannot represent: `==highlight==`, `++underline++`, `{attribute="trailer"}` on blocks, and self-closing XML tags for opaque references. It is not designed to be rendered by a generic markdown pipeline; for that, parse to `dast` and use one of the `to-html-string` / `to-dom-nodes` renderers.
