Skip to content

Commit 294158e

Browse files
JohnMcLearclaude
andauthored
harden: reject USER_CHANGES inserts without an author attribute (#7773)
* harden: reject USER_CHANGES inserts without an author attribute Insert ops MUST carry the author attribute reference so that pad.atext.text and pad.atext.attribs stay in lock-step. An accepted insert with empty attribs would grow text without contributing matching attribute markers, leaving the stored AText in a state where the two iterables disagree on length when reconstructed. Downstream clients then fail reconciliation in ace2_inner.ts:setDocAText with 'mismatch error setting raw text in setDocAText' on every subsequent pad load — making the affected pad effectively unloadable until manually repaired. This commit adds a single defensive check inside the existing per-op validation loop in handleUserChanges: when an op is a '+' (insert) and its attribs string doesn't yield an 'author' entry via AttributeMap.fromString, reject with badChangeset. The check piggybacks on the wireApool that was already constructed for the prior author-match validation, so no extra parsing. Test fixtures in messages.ts were updated to send proper author-attributed inserts plus the matching apool (mirroring what the JS web client always does). A new regression test 'insert without author attribute is rejected' locks in the new behaviour. * harden: also close the HTTP API / plugin path via stable system author The first commit closed the socket.io USER_CHANGES hole. This commit closes the parallel path through Pad.spliceText (used by API.setText, API.appendText, the import flow, and plugins like ep_post_data) where an unattributed insert would otherwise produce a malformed AText. Approach: instead of REJECTING (which would break ep_post_data and many existing tests that call setText/appendText without an authorId), substitute a stable system author when none is provided. The resulting changeset is properly attributed, the AText stays well-formed, and existing callers continue to work unchanged. Plugins that want named author attribution should still pass an explicit authorId (e.g., one allocated via authorManager.createAuthor). Pad.SYSTEM_AUTHOR_ID = 'a.etherpad-system' — a stable identifier that appears in the pad's attribute pool when internal callers (HTTP API, plugins, server-side imports) write text without naming an author. The existing 'attribute changes by another author' protections still apply to socket.io USER_CHANGES paths — a remote client can't impersonate the system author for inserts (their session author check fires first). Test: - Pad.ts spec adds 'spliceText with empty authorId attributes to the system author' — verifies pad text lands AND the pool contains the system-author binding. Existing tests that pass an authorId are unaffected. * harden: reject USER_CHANGES that would strand the trailing newline Etherpad's pad text always ends with '\n'. _handleUserChanges previously appended a separate `nlChangeset` correction revision whenever the applied USER_CHANGES left the pad without a trailing '\n'. The stored pad ended up well-formed, but the FIRST NEW_CHANGES broadcast (the malformed user revision itself) reached browsers BEFORE the correction did, and applyToAttribution's MergingOpAssembler aborts with "line assembler not finished" on a non-'\n'-terminated doc — the watching browser session then dropped the changeset and any subsequent edits silently no-op'd until the user reloaded. Replace the silent auto-correction with an explicit reject. Compute `applyToText(rebasedChangeset, prevText)` before appendRevision; if the result doesn't end with '\n', throw -> badChangeset disconnect. Clients must emit USER_CHANGES whose application preserves the invariant — this matches what the JS web client already does and forces non-JS clients (etherpad-pad, third-party integrations) to surface their bugs in their own logs instead of stranding the trailing newline in pad revision history. Also fixes a latent retransmission-detection bug surfaced by this PR's author-attrib changes: moveOpsToNewPool renumbers `*N` references to whatever slot the pad pool assigns, which can differ from the wire form's slot. Comparing the raw client wire against the stored revision form (`changeset === c`) then misses legitimate retransmissions and the same edit gets duplicated. Snapshot the post-pool-mapping form (`canonicalCs`) and compare that against `c` instead. Backend test additions: - 'changeset that would strand the trailing \\n is rejected' covers the new rejection path with wire `Z:6>1|1=6*0+1$X` against `hello\n`. - handleMessageSecurity test now captures roSocket's own authorId and uses it in the apool sent through roSocket, because the prior PR commit made `*0` referencing the wrong author a hard reject. All 1130 backend tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump etherpad-cli-client to ^4.0.3 4.0.3 sends author-attributed inserts and preserves the trailing newline, complying with this PR's tightened USER_CHANGES validation. The rate-limit CI workflow drives the test pad via this client, so without the bump the new server-side rejects fire on the very first \`pad.append()\` and the rate-limit disconnect never gets a chance to arrive — testlimits.sh exits 0 instead of 1 and the rate-limit job fails with "ratelimit was not triggered when sending every 99 ms". Refs ether/etherpad-cli-client#131 * harden: reject USER_CHANGES that name the reserved system author The session-author equality check already rejects wire `*N` that names a different real user, but `a.etherpad-system` is server- internal — it's only used when spliceText / setText is called with an empty authorId from HTTP API or plugin paths. A wire op that names it is either a confused client or an attempt to launder edits through a reserved attribution slot. Refuse. Backend test 'insert claiming the reserved system author is rejected' locks in the new behavior with wire `Z:1>5*0+5$hello` plus an apool that maps slot 0 to `a.etherpad-system`. All 1131 backend tests pass. Inline literal `'a.etherpad-system'` rather than importing the constant from `Pad.SYSTEM_AUTHOR_ID` — `require('../db/Pad')` at PadMessageHandler module scope returned a partially-initialized class via the padManager circular path, leaving the static-field access undefined at runtime. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 2a865a3 commit 294158e

6 files changed

Lines changed: 209 additions & 39 deletions

File tree

pnpm-lock.yaml

Lines changed: 14 additions & 13 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

src/node/db/Pad.ts

Lines changed: 23 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -93,6 +93,17 @@ exports.cleanText = (txt:string): string => txt.replace(/\r\n/g, '\n')
9393
.replace(/\t/g, ' ');
9494

9595
class Pad {
96+
/**
97+
* Stable author id used to attribute inserts coming from internal callers
98+
* (HTTP API setText/appendText with no authorId, plugins like ep_post_data,
99+
* server-side import flows). Without ANY author attribute, pad.atext.text
100+
* and pad.atext.attribs drift out of sync — clients then fail
101+
* setDocAText reconciliation in ace2_inner.ts when loading the pad. Using
102+
* a fixed system author keeps the AText well-formed without requiring
103+
* every plugin to allocate its own author up-front.
104+
*/
105+
static readonly SYSTEM_AUTHOR_ID = 'a.etherpad-system';
106+
96107
private db: Database;
97108
private atext: AText;
98109
private pool: AttributePool;
@@ -415,9 +426,19 @@ class Pad {
415426
(!ins && start > 0 && orig[start - 1] === '\n');
416427
if (!willEndWithNewline) ins += '\n';
417428
if (ndel === 0 && ins.length === 0) return;
418-
const attribs = authorId ? [['author', authorId] as [string, string]] : undefined;
429+
// An unattributed insert (empty authorId + non-empty ins) would produce
430+
// an AText where `text` and `attribs` disagree on length — clients fail
431+
// setDocAText reconciliation on load. Backward-compat fix: if the caller
432+
// didn't provide an authorId, attribute the insert to a stable system
433+
// author. ep_post_data and other plugins that want named attribution
434+
// should still pass an explicit authorId.
435+
const effectiveAuthorId =
436+
(ins.length > 0 && !authorId) ? Pad.SYSTEM_AUTHOR_ID : authorId;
437+
const attribs = effectiveAuthorId
438+
? [['author', effectiveAuthorId] as [string, string]]
439+
: undefined;
419440
const changeset = makeSplice(orig, start, ndel, ins, attribs, this.pool);
420-
await this.appendRevision(changeset, authorId);
441+
await this.appendRevision(changeset, effectiveAuthorId);
421442
}
422443

423444
/**

src/node/handler/PadMessageHandler.ts

Lines changed: 56 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@ import {MapArrayType} from "../types/MapType";
2424
import AttributeMap from '../../static/js/AttributeMap';
2525
const padManager = require('../db/PadManager');
2626
const padDeletionManager = require('../db/PadDeletionManager');
27-
import {checkRep, cloneAText, compose, deserializeOps, follow, identity, inverse, makeAText, makeSplice, moveOpsToNewPool, mutateAttributionLines, mutateTextLines, oldLen, prepareForWire, splitAttributionLines, splitTextLines, unpack} from '../../static/js/Changeset';
27+
import {applyToText, checkRep, cloneAText, compose, deserializeOps, follow, identity, inverse, makeAText, moveOpsToNewPool, mutateAttributionLines, mutateTextLines, oldLen, prepareForWire, splitAttributionLines, splitTextLines, unpack} from '../../static/js/Changeset';
2828
import ChatMessage from '../../static/js/ChatMessage';
2929
import AttributePool from '../../static/js/AttributePool';
3030
const AttributeManager = require('../../static/js/AttributeManager');
@@ -875,12 +875,49 @@ const handleUserChanges = async (socket:any, message: {
875875
`${opAuthorId} in changeset ${changeset}`);
876876
}
877877
}
878+
// Reject '+' ops that do not carry the author attribute. The standard
879+
// JS client always tags inserts with the author; rejecting unattributed
880+
// inserts here keeps pad.atext.text and pad.atext.attribs in lock-step.
881+
// Without this check, an insert op with empty attribs grows the text
882+
// without contributing matching markers to the attribs string, leaving
883+
// the stored AText in a state where the two iterables disagree on
884+
// length — applyToAText then desyncs and breaks reconciliation in
885+
// every client that later loads the pad.
886+
if (op.opcode === '+' && !opAuthorId) {
887+
throw new Error(`Author ${thisSession.author} submitted an insert without an ` +
888+
`author attribute in changeset ${changeset}`);
889+
}
890+
// Defense-in-depth: reject any wire-borne `*N` that resolves to the
891+
// system author. The session-author equality check above already
892+
// catches the case where `*N` claims a different real user as
893+
// author, but `Pad.SYSTEM_AUTHOR_ID` is server-internal — it's
894+
// only used when `spliceText` / `setText` are called with an empty
895+
// authorId from HTTP API or plugin paths. No legitimate
896+
// socket.io session ever writes as the system author, so a wire
897+
// op that names it is either a confused client or an attempt to
898+
// launder writes through a reserved attribution slot. Either way,
899+
// refuse.
900+
// Hardcoded mirror of `Pad.SYSTEM_AUTHOR_ID` from src/node/db/Pad.ts.
901+
// A `const {Pad} = require('../db/Pad')` at module scope returned
902+
// a partially-initialized class here (circular load via padManager),
903+
// so the static-field access ended up undefined and short-circuited
904+
// the check at runtime. Inline literal is the simplest fix.
905+
if (opAuthorId === 'a.etherpad-system') {
906+
throw new Error(`Author ${thisSession.author} attempted to submit changes as the ` +
907+
`reserved system author ${opAuthorId} in changeset ${changeset}`);
908+
}
878909
}
879910

880911
// ex. adoptChangesetAttribs
881912

882913
// Afaik, it copies the new attributes from the changeset, to the global Attribute Pool
883914
let rebasedChangeset = moveOpsToNewPool(changeset, wireApool, pad.pool);
915+
// Snapshot the post-pool-mapping form so the retransmission check below
916+
// can recognise our changeset against the stored revision form. Comparing
917+
// the raw client `changeset` against `c` would miss legitimate
918+
// retransmissions whenever moveOpsToNewPool renumbered an attribute
919+
// (e.g. `*0` -> `*1` because the pad pool already had something at slot 0).
920+
const canonicalCs = rebasedChangeset;
884921

885922
// ex. applyUserChanges
886923
let r = baseRev;
@@ -891,9 +928,9 @@ const handleUserChanges = async (socket:any, message: {
891928
while (r < pad.getHeadRevisionNumber()) {
892929
r++;
893930
const {changeset: c, meta: {author: authorId}} = await pad.getRevision(r);
894-
if (changeset === c && thisSession.author === authorId) {
931+
if (canonicalCs === c && thisSession.author === authorId) {
895932
// Assume this is a retransmission of an already applied changeset.
896-
rebasedChangeset = identity(unpack(changeset).oldLen);
933+
rebasedChangeset = identity(unpack(canonicalCs).oldLen);
897934
}
898935
// At this point, both "c" (from the pad) and "changeset" (from the
899936
// client) are relative to revision r - 1. The follow function
@@ -909,6 +946,22 @@ const handleUserChanges = async (socket:any, message: {
909946
`Can't apply changeset ${rebasedChangeset} with oldLen ` +
910947
`${oldLen(rebasedChangeset)} to document of length ${prevText.length}`);
911948
}
949+
// Defensive: reject any rebased changeset whose application would leave
950+
// the pad text not ending with '\n'. Previously the server silently
951+
// appended a separate `nlChangeset` correction revision; that worked
952+
// for the stored pad but the FIRST broadcast (the malformed user
953+
// revision) reached browsers BEFORE the correction did, and the
954+
// browser's line assembler asserts "line assembler not finished" on
955+
// a doc that doesn't end with '\n', taking the session out. Refuse to
956+
// accept such changesets — clients must always preserve the
957+
// trailing-newline invariant.
958+
const projectedText = applyToText(rebasedChangeset, prevText);
959+
if (!projectedText.endsWith('\n')) {
960+
throw new Error(
961+
`Rejected USER_CHANGES whose application would leave the pad ` +
962+
`without a trailing '\\n' (length ${projectedText.length}). ` +
963+
`Every USER_CHANGES must preserve the "doc ends with \\n" invariant.`);
964+
}
912965

913966
const newRev = await pad.appendRevision(rebasedChangeset, thisSession.author);
914967
// The head revision will either stay the same or increase by 1 depending on whether the
@@ -920,12 +973,6 @@ const handleUserChanges = async (socket:any, message: {
920973
await pad.appendRevision(correctionChangeset, thisSession.author);
921974
}
922975

923-
// Make sure the pad always ends with an empty line.
924-
if (pad.text().lastIndexOf('\n') !== pad.text().length - 1) {
925-
const nlChangeset = makeSplice(pad.text(), pad.text().length - 1, 0, '\n');
926-
await pad.appendRevision(nlChangeset, thisSession.author);
927-
}
928-
929976
// The client assumes that ACCEPT_COMMIT and NEW_CHANGES messages arrive in order. Make sure we
930977
// have already sent any previous ACCEPT_COMMIT and NEW_CHANGES messages.
931978
assert.equal(thisSession.rev, r);

src/package.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -125,7 +125,7 @@
125125
"chokidar": "^5.0.0",
126126
"eslint": "^10.4.0",
127127
"eslint-config-etherpad": "^4.0.5",
128-
"etherpad-cli-client": "^4.0.2",
128+
"etherpad-cli-client": "^4.0.3",
129129
"mocha": "^11.7.5",
130130
"mocha-froth": "^0.2.10",
131131
"nodeify": "^1.0.1",

src/tests/backend/specs/Pad.ts

Lines changed: 23 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -61,15 +61,36 @@ describe(__filename, function () {
6161
pad = await padManager.getPad(padId, '');
6262
// spliceText is an existing runtime Pad method; cast avoids
6363
// adding a type-only declaration to PadType in this PR.
64-
await (pad as any).spliceText(0, 0, '100\u00a0km');
64+
await (pad as any).spliceText(0, 0, '100\u00a0km', 'a.test');
6565
assert.equal(pad!.text(), '100\u00a0km\n');
6666
});
6767

6868
it('setText round-trips U+00A0', async function () {
6969
pad = await padManager.getPad(padId, '');
70-
await pad!.setText('a\u00a0b\n');
70+
await pad!.setText('a\u00a0b\n', 'a.test');
7171
assert.equal(pad!.text(), 'a\u00a0b\n');
7272
});
73+
74+
it('spliceText with empty authorId attributes to the system author', async function () {
75+
pad = await padManager.getPad(padId, '');
76+
// An unattributed insert (empty authorId, non-empty ins) used to
77+
// produce an AText where text and attribs disagreed on length \u2014
78+
// clients then failed setDocAText reconciliation on load. The
79+
// server now substitutes a stable system author so the AText stays
80+
// well-formed without forcing every caller to allocate one up-front.
81+
await (pad as any).spliceText(0, 0, 'plugin-text', '');
82+
assert.equal(pad!.text(), 'plugin-text\n');
83+
const pool: any = (pad as any).pool;
84+
let sawSystemAuthor = false;
85+
for (const k of Object.keys(pool.numToAttrib || {})) {
86+
const a = pool.numToAttrib[k];
87+
if (a[0] === 'author' && a[1] === 'a.etherpad-system') {
88+
sawSystemAuthor = true;
89+
break;
90+
}
91+
}
92+
assert(sawSystemAuthor, 'expected system-author binding in pad pool');
93+
});
7394
});
7495

7596
describe('padDefaultContent hook', function () {

0 commit comments

Comments
 (0)