Skip to content

Commit 3af0191

Browse files
authored
fix(pdf-server): form-field save robustness (#591)
* fix(pdf-server): one bad form field no longer aborts saving the rest #577 dropped the per-field try/catch in buildAnnotatedPdfBytes when it swapped to type-dispatch, so the first field whose pdf-lib write throws (max-length text, missing /Yes appearance, radio buttonValue mapping to neither label nor index, ...) bubbled to the outer catch and silently dropped every subsequent field. Compounded by getAnnotatedPdfBytes passing every baseline field, so even an untouched problematic field in the PDF poisoned the whole save. - Re-wrap each field write in its own try/catch (warn + continue). - getAnnotatedPdfBytes now only sends fields whose value differs from baseline (still includes explicit ''/false sentinels for cleared fields so pdf-lib overwrites the original /V). - Regression test: a maxLength=2 text field fed a long string, followed by a normal field; assert the second one still lands. * fix(pdf-server): radio groups misclassified as PDFCheckBox save the chosen widget Some PDFs (e.g. third-party forms like the IRS f1040 family, and the demo Form.pdf) omit the /Ff Radio flag bit on button fields, so pdf-lib classifies a multi-widget radio as PDFCheckBox. The viewer stores pdf.js's buttonValue (the widget's on-state name, e.g. '0'/'1') as a string. The PDFCheckBox branch did 'if (value) field.check()', which always sets the FIRST widget's on-state - so any choice saved as the first option. When the value is a string on a PDFCheckBox, treat it as a radio on-value: write /V and per-widget /AS directly via the low-level acroField (mirroring PDFAcroRadioButton.setValue minus its first-widget-only onValues guard). Booleans keep check()/uncheck(). Test: build a radio fixture, clear the Radio flag so reload sees PDFCheckBox, save with '1', assert /V = /1 and second widget /AS = /1. * fix(pdf-server): clearing a radio from the panel unchecks all widgets clearFieldInStorage wrote the same string clearValue to every widget's storage, hitting pdf.js's inverted string-coercion bug (the one setFieldInStorage already works around): the wrong widget rendered checked. For radio, write {value:false} to every widget instead. * fix(pdf-server): only treat multi-widget PDFCheckBox as misclassified radio A single-widget checkbox getting a string value (e.g. 'Yes'/'Off' if something upstream stores the export string instead of boolean) was being routed through setButtonGroupValue, which no-ops when no widget's on-state matches the value - leaving the box unchanged. Gate the radio-path on widgets.length > 1; single-widget falls through to check()/uncheck() with liberal truthy/'Off' semantics. * fix(pdf-server): generate missing field appearances on save PDFs whose checkbox/radio widgets lack an /AP/N/<onValue> stream: check() sets /V and /AS but Preview/Acrobat can't render a state with no appearance, so the field looks unchanged. Call form.updateFieldAppearances() after the write loop so pdf-lib synthesizes defaults for dirty fields. * feat(pdf-server): save PDFOptionList (multiselect listbox) values Language field in Form.pdf is a PDFOptionList; the save loop skipped it. Parse the viewer's comma-joined string, select() the subset that exists in getOptions() (pdf-lib throws on unknowns and there's no edit-mode fallback like Dropdown has). * fix(pdf-server): capture all selected options on <select multiple> target.value on a multiselect is only the first option; join selectedOptions so the save path gets the full set.
1 parent f3bf106 commit 3af0191

File tree

4 files changed

+218
-41
lines changed

4 files changed

+218
-41
lines changed

examples/pdf-server/src/annotation-panel.ts

Lines changed: 9 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -831,8 +831,15 @@ function clearFieldInStorage(name: string): void {
831831
meta?.[0]?.defaultValue ??
832832
"";
833833
const type = meta?.find((f) => f.type)?.type;
834-
const clearValue =
835-
type === "checkbox" || type === "radiobutton" ? (dv ?? "Off") : (dv ?? "");
834+
// Radio: per-widget BOOLEANS, never a string. pdf.js's
835+
// RadioButtonWidgetAnnotation render() has inverted string coercion (see
836+
// setFieldInStorage), so writing the same string to every widget checks
837+
// the wrong one. {value:false} on all = nothing selected.
838+
if (type === "radiobutton") {
839+
for (const id of ids) storage.setValue(id, { value: false });
840+
return;
841+
}
842+
const clearValue = type === "checkbox" ? (dv ?? "Off") : (dv ?? "");
836843
for (const id of ids) storage.setValue(id, { value: clearValue });
837844
}
838845

examples/pdf-server/src/mcp-app.ts

Lines changed: 20 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -2971,15 +2971,23 @@ async function getAnnotatedPdfBytes(): Promise<Uint8Array> {
29712971
}
29722972
}
29732973

2974-
// buildAnnotatedPdfBytes gates on formFields.size > 0 and only writes
2975-
// entries present in the map. After clearAllItems() the map is empty →
2976-
// zero setText/uncheck calls → pdf-lib leaves original /V intact →
2977-
// the "stripped PDF" we promised keeps all its form data. To actually
2978-
// clear, send an explicit sentinel for every baseline field the user
2979-
// dropped: "" for text, false for checkbox (matching baseline type).
2980-
const formFieldsOut = new Map(formFieldValues);
2974+
// Only write fields that actually changed vs. what's already in the PDF.
2975+
// Unchanged fields are no-ops at best, and at worst trip pdf-lib edge
2976+
// cases (max-length text, missing /Yes appearance, …) on fields the user
2977+
// never touched — which, before the per-field catch in
2978+
// buildAnnotatedPdfBytes, aborted every subsequent field.
2979+
//
2980+
// Fields the user cleared (present in baseline, absent from formFieldValues
2981+
// after clearAllItems()) still need an explicit "" / false so pdf-lib
2982+
// overwrites the original /V instead of leaving it intact.
2983+
const formFieldsOut = new Map<string, string | boolean>();
2984+
for (const [name, value] of formFieldValues) {
2985+
if (pdfBaselineFormValues.get(name) !== value) {
2986+
formFieldsOut.set(name, value);
2987+
}
2988+
}
29812989
for (const [name, baselineValue] of pdfBaselineFormValues) {
2982-
if (!formFieldsOut.has(name)) {
2990+
if (!formFieldValues.has(name)) {
29832991
formFieldsOut.set(name, typeof baselineValue === "boolean" ? false : "");
29842992
}
29852993
}
@@ -3499,6 +3507,10 @@ formLayerEl.addEventListener("input", (e) => {
34993507
if (!target.checked) return; // unchecking siblings — ignore
35003508
const wid = target.getAttribute("data-element-id");
35013509
value = (wid && radioButtonValues.get(wid)) ?? target.value;
3510+
} else if (target instanceof HTMLSelectElement && target.multiple) {
3511+
// .value on a <select multiple> is only the first option; join them all
3512+
// so save can select() the full set on a PDFOptionList.
3513+
value = Array.from(target.selectedOptions, (o) => o.value).join(",");
35023514
} else {
35033515
value = target.value;
35043516
}

examples/pdf-server/src/pdf-annotations.test.ts

Lines changed: 70 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1152,6 +1152,76 @@ describe("buildAnnotatedPdfBytes", () => {
11521152
const form = (await PDFDocument.load(out)).getForm();
11531153
expect(form.getTextField("name").getText()).toBe("kept");
11541154
});
1155+
1156+
it("a field that throws on write does not abort subsequent fields", async () => {
1157+
// Regression for #577: the per-field try/catch was dropped, so the
1158+
// first throwing field bubbled to the outer catch and silently dropped
1159+
// every field after it. setText() throws when value exceeds maxLength.
1160+
const doc = await PDFDocument.create();
1161+
const page = doc.addPage([612, 792]);
1162+
const form = doc.getForm();
1163+
const limited = form.createTextField("limited");
1164+
limited.setMaxLength(2);
1165+
limited.addToPage(page, { x: 10, y: 700 });
1166+
form.createTextField("after").addToPage(page, { x: 10, y: 660 });
1167+
const fixture = await doc.save();
1168+
1169+
const out = await buildAnnotatedPdfBytes(
1170+
fixture,
1171+
[],
1172+
new Map<string, string | boolean>([
1173+
["limited", "way too long"], // throws
1174+
["after", "kept"],
1175+
]),
1176+
);
1177+
1178+
const saved = (await PDFDocument.load(out)).getForm();
1179+
expect(saved.getTextField("after").getText()).toBe("kept");
1180+
// The throwing field is left at whatever pdf-lib could do with it —
1181+
// we only assert it didn't poison "after".
1182+
});
1183+
1184+
it("radio misclassified as PDFCheckBox: string value selects the matching widget", async () => {
1185+
// Some PDFs (e.g. IRS/third-party forms) omit the /Ff Radio bit, so
1186+
// pdf-lib hands us a PDFCheckBox. The viewer stored pdf.js's
1187+
// buttonValue ("0"/"1"), not a boolean — check() would always pick
1188+
// the first widget. setButtonGroupValue writes /V + per-widget /AS
1189+
// directly so the chosen widget sticks.
1190+
const doc = await PDFDocument.create();
1191+
const page = doc.addPage([612, 792]);
1192+
const form = doc.getForm();
1193+
const rg = form.createRadioGroup("Gender");
1194+
rg.addOptionToPage("Male", page, { x: 10, y: 700 });
1195+
rg.addOptionToPage("Female", page, { x: 60, y: 700 });
1196+
// pdf-lib's addOptionToPage writes widget on-values "0","1". Clear the
1197+
// Radio flag (bit 16) so the reloaded form classifies it as checkbox.
1198+
// eslint-disable-next-line @typescript-eslint/no-explicit-any
1199+
(rg.acroField as any).clearFlag(1 << 15);
1200+
const fixture = await doc.save();
1201+
1202+
// Sanity: reload sees it as PDFCheckBox now.
1203+
const reForm = (await PDFDocument.load(fixture)).getForm();
1204+
// eslint-disable-next-line @typescript-eslint/no-explicit-any
1205+
const reField = reForm.getFieldMaybe("Gender") as any;
1206+
expect(reField?.constructor?.name).toBe("PDFCheckBox");
1207+
1208+
const out = await buildAnnotatedPdfBytes(
1209+
fixture,
1210+
[],
1211+
new Map<string, string | boolean>([["Gender", "1"]]),
1212+
);
1213+
1214+
const saved = await PDFDocument.load(out);
1215+
// eslint-disable-next-line @typescript-eslint/no-explicit-any
1216+
const acro = (saved.getForm().getFieldMaybe("Gender") as any).acroField;
1217+
const v = acro.dict.get(PDFName.of("V"));
1218+
expect(v).toBeInstanceOf(PDFName);
1219+
expect((v as PDFName).decodeText()).toBe("1");
1220+
// Second widget /AS is the on-state, first is /Off.
1221+
const widgets = acro.getWidgets();
1222+
expect(widgets[0].getAppearanceState()?.decodeText()).toBe("Off");
1223+
expect(widgets[1].getAppearanceState()?.decodeText()).toBe("1");
1224+
});
11551225
});
11561226
});
11571227

examples/pdf-server/src/pdf-annotations.ts

Lines changed: 119 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,9 @@ import {
2121
PDFTextField,
2222
PDFCheckBox,
2323
PDFDropdown,
24+
PDFOptionList,
2425
PDFRadioGroup,
26+
type PDFForm,
2527
} from "pdf-lib";
2628

2729
// =============================================================================
@@ -800,6 +802,40 @@ export async function addAnnotationDicts(
800802
}
801803
}
802804

805+
/**
806+
* Select a radio-style button group by widget on-value, bypassing pdf-lib's
807+
* type-level guards. Used when pdf-lib classifies a radio as `PDFCheckBox`
808+
* (PDF lacks the /Ff Radio bit) — `check()` would always pick the first
809+
* widget. Mirrors `PDFAcroRadioButton.setValue` minus its `onValues` throw.
810+
*/
811+
function setButtonGroupValue(
812+
field: PDFCheckBox | PDFRadioGroup,
813+
onValue: string,
814+
): void {
815+
const acro = field.acroField;
816+
const off = PDFName.of("Off");
817+
const widgets = acro.getWidgets();
818+
// Match by PDFName identity (pdf-lib interns names) — the viewer stored
819+
// pdf.js's buttonValue, which IS the widget's /AP /N on-state name.
820+
let target = onValue && onValue !== "Off" ? PDFName.of(onValue) : off;
821+
if (
822+
target !== off &&
823+
!widgets.some((w: { getOnValue(): PDFName | undefined }) => {
824+
return w.getOnValue() === target;
825+
})
826+
) {
827+
// No widget has this on-state — leave as-is rather than corrupt /V.
828+
return;
829+
}
830+
acro.dict.set(PDFName.of("V"), target);
831+
for (const w of widgets) {
832+
const on = (w as { getOnValue(): PDFName | undefined }).getOnValue();
833+
(w as { setAppearanceState(s: PDFName): void }).setAppearanceState(
834+
on === target ? target : off,
835+
);
836+
}
837+
}
838+
803839
/**
804840
* Build annotated PDF bytes from the original document.
805841
* Applies user annotations and form fills, returns Uint8Array of the new PDF.
@@ -815,44 +851,96 @@ export async function buildAnnotatedPdfBytes(
815851
await addAnnotationDicts(pdfDoc, annotations);
816852

817853
// Apply form fills. Dispatch on actual field type — getTextField(name) throws
818-
// for dropdowns/radios, so the old try/catch silently dropped those on save.
854+
// for dropdowns/radios, so we look up the generic field and instanceof it.
855+
// Each field is wrapped in its own try/catch: pdf-lib can throw on
856+
// length-constrained text, radios whose buttonValue maps to neither label
857+
// nor index, checkboxes missing a /Yes appearance, etc. One bad field must
858+
// not abort the rest of the loop (regressed in #577 when the inner catch
859+
// was dropped along with the type-specific getters).
819860
if (formFields.size > 0) {
861+
let form: PDFForm | undefined;
820862
try {
821-
const form = pdfDoc.getForm();
863+
form = pdfDoc.getForm();
864+
} catch {
865+
// No AcroForm in this PDF
866+
}
867+
if (form) {
822868
for (const [name, value] of formFields) {
823-
const field = form.getFieldMaybe(name);
824-
if (!field) continue;
825-
826-
if (field instanceof PDFCheckBox) {
827-
if (value) field.check();
828-
else field.uncheck();
829-
} else if (field instanceof PDFRadioGroup) {
830-
// The viewer stores pdf.js's buttonValue, which for PDFs with an
831-
// /Opt array is a numeric index ("0","1","2") rather than the
832-
// option label pdf-lib's select() expects. Try the label first,
833-
// then fall back to indexing into getOptions().
834-
const opts = field.getOptions();
835-
const s = String(value);
836-
if (opts.includes(s)) {
837-
field.select(s);
838-
} else {
839-
const idx = Number(s);
840-
if (Number.isInteger(idx) && idx >= 0 && idx < opts.length) {
841-
field.select(opts[idx]);
869+
try {
870+
const field = form.getFieldMaybe(name);
871+
if (!field) continue;
872+
873+
if (field instanceof PDFCheckBox) {
874+
const widgets = field.acroField.getWidgets();
875+
if (typeof value === "string" && widgets.length > 1) {
876+
// Multi-widget "checkbox" with a string value = pdf-lib
877+
// misclassified a radio group (PDF lacks the /Ff Radio flag).
878+
// The viewer stored pdf.js's buttonValue (the widget's on-state
879+
// name, e.g. "0"/"1"); check()/uncheck() would hit the FIRST
880+
// widget regardless. Write /V and per-widget /AS directly.
881+
setButtonGroupValue(field, value);
882+
} else {
883+
// Single-widget (real) checkbox. Viewer normally stores boolean,
884+
// but be liberal: any truthy non-"Off" string counts as checked.
885+
const on =
886+
value === true ||
887+
(typeof value === "string" && value !== "" && value !== "Off");
888+
if (on) field.check();
889+
else field.uncheck();
842890
}
843-
// else: value is neither label nor index — leave unset
891+
} else if (field instanceof PDFRadioGroup) {
892+
// The viewer stores pdf.js's buttonValue, which for PDFs with an
893+
// /Opt array is a numeric index ("0","1","2") rather than the
894+
// option label pdf-lib's select() expects. Try the label first,
895+
// then fall back to indexing into getOptions().
896+
const opts = field.getOptions();
897+
const s = String(value);
898+
if (opts.includes(s)) {
899+
field.select(s);
900+
} else {
901+
const idx = Number(s);
902+
if (Number.isInteger(idx) && idx >= 0 && idx < opts.length) {
903+
field.select(opts[idx]);
904+
}
905+
// else: value is neither label nor index — leave unset
906+
}
907+
} else if (field instanceof PDFDropdown) {
908+
// select() auto-enables edit mode for values outside getOptions(),
909+
// so this works for both enumerated and free-text combos.
910+
field.select(String(value));
911+
} else if (field instanceof PDFOptionList) {
912+
// Viewer stores multiselect listboxes as a comma-joined string
913+
// (pdf.js's annotationStorage value for /Ch is string|string[];
914+
// our Map<string,string|boolean> flattens arrays). Only select
915+
// values that are actually in the option list — pdf-lib throws
916+
// on unknowns and there's no edit-mode fallback like Dropdown.
917+
const opts = field.getOptions();
918+
const wanted = String(value)
919+
.split(",")
920+
.map((s) => s.trim())
921+
.filter((s) => opts.includes(s));
922+
if (wanted.length > 0) field.select(wanted);
923+
else field.clear();
924+
} else if (field instanceof PDFTextField) {
925+
field.setText(String(value));
844926
}
845-
} else if (field instanceof PDFDropdown) {
846-
// select() auto-enables edit mode for values outside getOptions(),
847-
// so this works for both enumerated and free-text combos.
848-
field.select(String(value));
849-
} else if (field instanceof PDFTextField) {
850-
field.setText(String(value));
927+
// PDFButton, PDFSignature: no fill_form support yet
928+
} catch (err) {
929+
// Skip this field; carry on with the rest. Surfacing per-field
930+
// failures is the caller's job (see fill_form result), not save's.
931+
console.warn(`buildAnnotatedPdfBytes: skipped field "${name}":`, err);
851932
}
852-
// PDFButton, PDFOptionList, PDFSignature: no fill_form support yet
853933
}
854-
} catch {
855-
// pdfDoc.getForm() throws if the PDF has no AcroForm
934+
// Some PDFs ship widgets with no on-state appearance stream (no
935+
// /AP/N/<onValue>). pdf-lib's check()/select()/setText() set /V and /AS
936+
// but don't synthesize the missing appearance, so other readers
937+
// (Preview, Acrobat) show the field as if unchanged. This generates a
938+
// default appearance for any field marked dirty above.
939+
try {
940+
form.updateFieldAppearances();
941+
} catch (err) {
942+
console.warn("buildAnnotatedPdfBytes: updateFieldAppearances:", err);
943+
}
856944
}
857945
}
858946

0 commit comments

Comments
 (0)