Skip to content

feat(website): XLSX submission template with Data + Guidance sheets and dropdown enforcement#6637

Open
theosanderson wants to merge 12 commits into
mainfrom
excel-val
Open

feat(website): XLSX submission template with Data + Guidance sheets and dropdown enforcement#6637
theosanderson wants to merge 12 commits into
mainfrom
excel-val

Conversation

@theosanderson

@theosanderson theosanderson commented Jun 9, 2026

Copy link
Copy Markdown
Member

Implements #6636 — a richer XLSX submission template with enforced dropdown choices for controlled-vocabulary fields.

What it does

The XLSX template (/[organism]/submission/template?fileType=xlsx) is now a three-sheet workbook:

Sheet Visible Protected Purpose
Data yes no What the submitter fills in.
Guidance yes read-only Human-readable reference for every field.
_lists hidden read-only Lookup source backing the dropdowns.

Data sheet

  • All input fields as columns, priority (default/metadataTemplate) fields first, then the rest. Headers stay machine field names so the file round-trips through upload.
  • Enforced dropdowns on every field with a controlled vocabulary (options): a value outside the list is rejected (errorStyle=stop). Free-text columns get no validation, so they accept anything.
  • Dropdowns are per-column and header-driven: each validation reads its own column header and resolves that field's options in _lists (MATCH/OFFSET), so the dropdown follows the field if the column is renamed or moved.
  • Header hover notes carrying each field's definition, guidance and example.
  • Colour-coded headers by tier — required / priority / optional — with a matching legend on Guidance.
  • Frozen header row; date-typed columns formatted yyyy-mm-dd (what upload expects); columns sized to the field name.

Guidance sheet (read-only)

One row per field: Field name, Display name, Required, Definition, Guidance, Example, Allowed values — plus the colour legend.

Other changes

  • Upload coupling: the parser reads the first sheet (Data); its multi-sheet warning now ignores the Guidance/_lists reference sheets, so a freshly-downloaded template uploads without a spurious warning.
  • Library: generation uses ExcelJS (@lokalise/xlsx/SheetJS-CE cannot write data validations, hidden sheets or sheet protection); upload parsing is unchanged. TSV output is unchanged.
  • Upload help text now leads with the XLSX template (recommended) and labels the TSV one minimal.

Previews

🤖 Generated with Claude Code

🚀 Preview: https://excel-val.loculus.org

…nforcement

Rework the downloadable XLSX submission template into a richer workbook:

- `Data` sheet (first): every input field as a column, ordered template
  (default-enabled) fields first, then the remaining opt-in fields. Headers
  remain machine field names so the file round-trips through upload.
- `Config` sheet (read-only): human-readable reference listing every field with
  its display name, whether it is enabled by default, whether it is required,
  and its allowed values.
- `_lists` sheet (hidden): one column per field with a controlled vocabulary,
  the lookup source for dropdowns.

Dropdown enforcement is a single, header-driven data validation over the whole
Data grid: each cell reads its own column header and looks it up in `_lists`, so
the dropdown follows a column even if the user reorders or renames it. Free-text
columns are left unconstrained (the lookup errors, which Excel treats
permissively).

Generation switches to ExcelJS, which (unlike @lokalise/xlsx / SheetJS CE) can
write data validations, hidden sheets and sheet protection. Upload parsing is
unchanged and the multi-sheet upload warning now ignores the `Config`/`_lists`
reference sheets. TSV output is unchanged.

Refs #6636

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@theosanderson theosanderson added the preview Triggers a deployment to argocd label Jun 9, 2026
@claude claude Bot added the website Tasks related to the web application label Jun 9, 2026
theosanderson-agent and others added 3 commits June 9, 2026 15:50
Only fields with a controlled vocabulary now get a dropdown, each pointing
directly at that field's column in `_lists`. Free-text columns get no
validation, so they accept any value (a single grid-wide `list` rule rejected
free text, because list validation cannot express "allow anything").

The dropdowns are advisory rather than strict: a value typed outside the list
warns ("Yes" keeps it) instead of being rejected.

Also widen the Data and Config columns so headers and option values are readable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…amed columns

Each option field's dropdown now resolves its allowed values by looking its own
column header up in `_lists` (via MATCH/OFFSET) instead of referencing a fixed
`_lists` range. The header reference is column-relative to the validation range,
so the dropdown follows the field when the column is renamed or moved.

Still per-column and advisory: free-text columns get no validation (so they
never warn), and off-list values on option columns warn rather than reject.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eze, date format

Make the generated template friendlier to fill in:

- Header hover notes carrying each field's definition, guidance and example.
- Colour-coded headers by tier (required / included-by-default / opt-in), with a
  matching colour legend on the Config sheet.
- Frozen header row and an auto-filter on the Data sheet.
- Date-typed columns formatted as yyyy-mm-dd (the format upload expects), via a
  new `metadataType` on the ordered template fields.
- Config header row styled and frozen.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
theosanderson-agent and others added 4 commits June 9, 2026 16:26
…p filter

- Remove the Data sheet auto-filter (its header arrows clutter the sheet and
  are easily confused with the validation dropdowns); keep the frozen header.
- Rename the `Config` reference sheet to `Guidance` (and update the upload
  parser's reference-sheet allowlist accordingly).
- Drop the "Enabled by default" column; relabel the blue tier to "Priority
  fields" in the colour key.
- Add Definition, Guidance and Example columns to the Guidance sheet so the
  field descriptions are readable in the workbook, not only in header notes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Long option values (e.g. country names) were stretching columns to the 45-char
cap. Width is now based on the field name only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…st values)

Switch the per-column option dropdowns from advisory to strict (errorStyle
stop): a value outside a field's controlled vocabulary is rejected. Only columns
with options are validated, so free-text columns remain unrestricted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lead with the XLSX template (recommended) and call the TSV one a minimal
template, on the metadata upload page.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@theosanderson theosanderson changed the title feat(website): XLSX template with Data + Config sheets and dropdown enforcement feat(website): XLSX template with Data + Guidance sheets, hover notes, and dropdown enforcement Jun 9, 2026
@theosanderson theosanderson changed the title feat(website): XLSX template with Data + Guidance sheets, hover notes, and dropdown enforcement feat(website): XLSX submission template with Data + Guidance sheets and dropdown enforcement Jun 9, 2026
…rder

The XLSX upload parser read the first sheet by position, so reordering the
template's sheets (e.g. dragging Guidance to the front) would parse the
reference sheet as metadata. Select the 'Data' sheet by name when present,
falling back to the first sheet for non-template workbooks, and compute the
multi-sheet warning relative to the parsed sheet.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@theosanderson

Copy link
Copy Markdown
Member Author

@codex review

theosanderson-agent and others added 2 commits June 9, 2026 17:08
The upload parser now selects the Data sheet by name, so its position is no
longer load-bearing; it is still added first so the workbook opens on it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9b8d8fae91

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread website/src/components/Submission/FileUpload/fileProcessing.ts
@theosanderson theosanderson marked this pull request as ready for review June 9, 2026 16:15
@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Claude finished @theosanderson's task in 2m 37s —— View job


Code Review

  • Gather context and understand the PR
  • Read key changed files
  • Review template generation (index.ts)
  • Review file processing changes (fileProcessing.ts)
  • Review config changes (config.ts)
  • Review tests
  • Post inline comments and summary

This is a well-structured and thoroughly documented feature. The approach — a three-sheet workbook with enforced dropdowns, a hidden lookup sheet, and a read-only guidance sheet — is solid, and the test suite is comprehensive including raw-XML assertions to catch the one ExcelJS abstraction leak (the dataValidations.add internal API).

General observations

fileProcessing.tsData sheet name collision (acknowledged by author)
The updated logic prefers any sheet named Data over the first sheet (line 254–256). The Codex reviewer flagged that a user-supplied workbook with a coincidental Data tab could have the wrong sheet processed silently. The PR author accepted this tradeoff, but it's worth noting the original first-sheet fallback was predictable; the new heuristic is not. A comment at that branch explaining the known edge case would help future readers who might otherwise try to "fix" it.

config.tsgetOrderedTemplateInputFields silently drops unknown template field names
Fields listed in schema.metadataTemplate that don't exist in inputFields are silently dropped via .filter((field): field is InputField => field !== undefined) (line 199). The existing validateWebsiteConfig catches and throws on these at startup, so this is fine in production, but the silent drop could be confusing during local development before that validation runs. This is low-priority.

index.tsContent-Disposition filename sanitization
The filename is built with .replaceAll(' ', '_') on displayName (line 50). If a displayName ever contains a " character, the header value becomes malformed (filename="foo"bar_template.xlsx"). Since displayName comes from operator-controlled config this is low-severity, but adding .replaceAll('"', '') would be defensive.

index.ts — nested OFFSET formula untested in real clients
The PR honestly flags this in its description: the header-driven cross-sheet MATCH/OFFSET data validation source is "the flakiest corner of the format" and hasn't been eyeballed in Excel/LibreOffice. The logic is sound and unit-tested against raw XML, but the real-client gap is worth keeping in a follow-up issue.

Tests
Coverage is thorough: sheet structure, ordering, colour tiers, hover notes, date format, column widths, raw XML for the compact validation block, the Guidance contents and legend, the revision filename, and both the happy-path and the edge-case for the multi-sheet warning filter. Only missing scenario I'd note: a workbook where _lists is the first sheet and Data is second — ensuring the "parse Data by name" logic still works when _lists (not just Guidance) is dragged to the front. Not blocking.

See inline comments for two small fixable items (a comment typo and a missing intent comment on the empty-string sheet password).

add(sqref: string, validation: ExcelJS.DataValidation): void;
}

/** Visual tiers used to colour-code field headers (and explained by the legend on `Config`). */

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: Config should be Guidance.

Suggested change
/** Visual tiers used to colour-code field headers (and explained by the legend on `Config`). */
/** Visual tiers used to colour-code field headers (and explained by the legend on `Guidance`). */

listsSheet.getCell(optionIndex + 2, columnIndex + 1).value = option.name;
});
});
void listsSheet.protect('', { selectLockedCells: false, selectUnlockedCells: false });

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty-string password is functional (prevents accidental edits without locking out anyone), but it's worth a short comment to make the intent clear — otherwise it reads as an oversight.

Suggested change
void listsSheet.protect('', { selectLockedCells: false, selectUnlockedCells: false });
// Empty-string password: protection is cosmetic (guard against accidental edits), not a security boundary.
void listsSheet.protect('', { selectLockedCells: false, selectUnlockedCells: false });


if (optionFields.length > 0) {
addOptionsLookup(workbook, dataSheet, fields, optionFields);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sheet creation order is Data_listsGuidance, but the PR description table lists them as DataGuidance_lists. Not a bug (the tests use toContain not positional ordering for these two), but the doc/code ordering should agree to avoid confusion for future readers.

- Link text is 'XLSX template' with '(recommended)' outside the link.
- Make the revision page object click the template links by href (fileType
  query param) instead of exact 'TSV'/'XLSX' text, which broke when the copy
  changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

preview Triggers a deployment to argocd website Tasks related to the web application

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant