feat: add Escape Smart Characters operation #2391

Open
HarelKatz wants to merge 1 commit into gchq:master from HarelKatz:feat/escape-smart-characters

Description
Adds a new "Escape Smart Characters" operation that converts typographic Unicode characters into their plain ASCII equivalents — smart quotes, em/en dashes, ellipses, ©, ®, ™, arrows, guillemets, and common math symbols.

Characters with no obvious ASCII mapping are handled via an "Unmappable characters" option:

  • Include (default) — pass through unchanged
  • Remove — drop them
  • Replace with '.' — substitute with a period
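
The mapping and the three modes above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `SMART_MAP` table, the `escapeSmartCharacters` function name, and the specific ASCII replacements (for example `--` for an em dash) are hypothetical, and the sketch assumes "unmappable" means any non-ASCII character that is missing from the map.

```javascript
// Hypothetical, abbreviated lookup table. The PR's real map is larger and
// its chosen replacements may differ from the ones shown here.
const SMART_MAP = new Map([
    ["\u2018", "'"],    // left single quotation mark
    ["\u2019", "'"],    // right single quotation mark
    ["\u201C", "\""],   // left double quotation mark
    ["\u201D", "\""],   // right double quotation mark
    ["\u2013", "-"],    // en dash
    ["\u2014", "--"],   // em dash
    ["\u2026", "..."],  // horizontal ellipsis
    ["\u00A9", "(c)"],  // copyright sign
    ["\u00AE", "(r)"],  // registered sign
    ["\u2122", "(tm)"], // trade mark sign
    ["\u2192", "->"],   // rightwards arrow
    ["\u00AB", "<<"],   // left guillemet
    ["\u00BB", ">>"],   // right guillemet
    ["\u00A0", " "],    // no-break space
]);

/**
 * Sketch of the conversion. `unmappable` takes one of the three option
 * labels from the description: "Include", "Remove", or "Replace with '.'".
 */
function escapeSmartCharacters(input, unmappable = "Include") {
    let out = "";
    // for...of walks the string by code point, so astral characters
    // are handled as one unit rather than as two surrogates.
    for (const ch of input) {
        if (SMART_MAP.has(ch)) {
            out += SMART_MAP.get(ch);
            continue;
        }
        if (ch.codePointAt(0) < 0x80) {
            out += ch; // plain ASCII passes through untouched
            continue;
        }
        // Assumption: anything else counts as unmappable.
        if (unmappable === "Remove") continue;
        out += unmappable === "Replace with '.'" ? "." : ch;
    }
    return out;
}
```

Under these assumptions, `escapeSmartCharacters("caf\u00E9", "Remove")` drops the accented letter, while the default "Include" mode returns the input unchanged.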

Lives in the Default module (pure JS, no new dependencies) and sits in the Data format category next to "Normalise Unicode".

Existing Issue
Closes #419. (Prior attempt #1291 was closed for inactivity after the maintainer asked for a rebase; this is a fresh implementation against current master.)

Screenshots
N/A — no visual changes; this is a pure data-transformation operation.

AI disclosure
I used Claude Code to assist with implementation: drafting the character map, writing the test cases, and verifying the lint/test/build workflow. I have reviewed all code being submitted and can answer questions about it.

Test Coverage
11 new tests in tests/operations/tests/EscapeSmartCharacters.mjs covering: smart quotes, dashes & ellipsis, trademark symbols, arrows & guillemets, math symbols, NBSP, all three "Unmappable" modes (Include / Remove / Replace), ASCII passthrough, and empty input. npm run lint, npm test, and npm run testnodeconsumer all pass locally.

Converts smart (typographic) Unicode characters (smart quotes, em/en
dashes, ellipses, ©, ®, ™, arrows, guillemets, common math symbols)
into ASCII equivalents. Characters with no ASCII mapping are handled
via a user option: Include (default), Remove, or Replace with '.'.

Closes gchq#419

Development

Successfully merging this pull request may close these issues.

Operation request: Escape Smart Characters
