webviewer-redaction-ai sample #84

Merged
DavidEGutierrez merged 10 commits into main from new-webviewer-ai-redaction-as208
Apr 9, 2026

@Mohammed-AbdulRahman-Apryse
Collaborator

Description

A brand-new sample that identifies personal information in the provided PDF and applies redactions to it.

Resources

Checklist

  • [✓] I understand that this is a public repo and my changes will be publicly visible

If you are adding a new sample

  • [✓] I have added an entry to the root level README
  • [✓] The name of my sample is consistent with the other samples
  • [✓] I have added a README to my sample
  • [✓] The sample is fully functional
  • [✓] I have updated lerna.json with the new sample name

If you are removing an old sample

  • [ ] I have removed the entry from the root level README
  • [ ] I have removed the sample from lerna.json

Contributor

Copilot AI left a comment

Pull request overview

Adds a new webviewer-redaction-ai sample demonstrating AI-assisted PII detection (via LangChain/OpenAI) and redaction in Apryse WebViewer, plus E2E test scaffolding and repo-level registration.

Changes:

  • Introduces a new WebViewer client that can analyze document text for PII and create redaction annotations.
  • Adds an Express server that handles text ingestion and LLM-based analysis (with a client/server mocking mode).
  • Registers the sample in the monorepo (lerna.json) and root README.md, and adds Playwright E2E config/tests.

Reviewed changes

Copilot reviewed 25 out of 29 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| `webviewer-redaction-ai/tools/copy-webviewer-files.js` | Postinstall script to copy WebViewer assets into client/lib. |
| `webviewer-redaction-ai/server/serve.js` | Express dev server with MODE_ENV injection + static hosting. |
| `webviewer-redaction-ai/server/llmManager.js` | LangChain/OpenAI initialization and message execution helper. |
| `webviewer-redaction-ai/server/handler.js` | API endpoints for sending text, analyzing PII, and retrieving results. |
| `webviewer-redaction-ai/server/config.json` | Guardrail prompt/rules and PII classification config for the LLM prompt. |
| `webviewer-redaction-ai/playwright.config.js` | Playwright configuration for E2E tests. |
| `webviewer-redaction-ai/package.json` | Sample scripts (start/mock/e2e) and dependencies. |
| `webviewer-redaction-ai/package-lock.json` | Lockfile for the new sample’s Node dependencies. |
| `webviewer-redaction-ai/mainsamplesource.json` | Metadata file listing sample source paths. |
| `webviewer-redaction-ai/client/ui/functionMap.js` | Maps custom UI actions to analyzer + redaction functions. |
| `webviewer-redaction-ai/client/ui/custom.json` | Modular UI config adding an “AI PII Redaction” button to the toolbar. |
| `webviewer-redaction-ai/client/redaction.js` | Applies redaction annotations based on analysis results. |
| `webviewer-redaction-ai/client/index.js` | WebViewer initialization and modular UI import. |
| `webviewer-redaction-ai/client/index.html` | Client entry HTML loading WebViewer + app modules. |
| `webviewer-redaction-ai/client/globals.js` | Defines shared app state and initial document URLs. |
| `webviewer-redaction-ai/client/document/manager.js` | Loads and caches full document text from WebViewer. |
| `webviewer-redaction-ai/client/document/analyzer.js` | Client-side API calls (or mock responses) to run PII analysis. |
| `webviewer-redaction-ai/client/assets/favicon.svg` | Sample favicon asset. |
| `webviewer-redaction-ai/client/assets/ai-icon.svg` | Toolbar icon for the AI redaction button. |
| `webviewer-redaction-ai/__tests__/package.playwright.json` | Playwright-related package snippet/config (currently JSON). |
| `webviewer-redaction-ai/__tests__/e2e/webviewer-redaction-ai.spec.js` | Playwright E2E tests for the AI redaction workflow. |
| `webviewer-redaction-ai/__mocks__/webviewer-redaction-ai.mock.js` | Mock analysis/documentId responses + mocking-mode toggle. |
| `webviewer-redaction-ai/README.md` | Sample documentation and setup instructions. |
| `webviewer-redaction-ai/LICENSE` | Sample license file. |
| `webviewer-redaction-ai/.gitignore` | Ignores client/lib and other generated artifacts. |
| `webviewer-redaction-ai/.env.example` | Environment variable template for OpenAI + server port. |
| `lerna.json` | Adds webviewer-redaction-ai to the Lerna package list. |
| `README.md` | Adds an “Artificial Intelligence” section and lists the new sample. |

1. Expose the global declarations (loadedDocument, aiAnalysisResult, and files) via globalThis. Affected files:
  a. client\document\analyzer.js
  b. client\globals.js
  c. client\index.js
  d. client\redaction.js
2. Rename serve.js to server.js. Affected file: package.json
3. Replace "npm-run-all --parallel" with "run-p --race" in the test:e2e and test:e2e:ui scripts, so the server is torn down automatically once Playwright completes. Affected file: package.json
4. Remove the extra output-parsing code and use the parsed output returned by llmManager.executeMessages directly. Affected files:
  a. server\handler.js
  b. server\llmManager.js
5. Replace the weak PRNG (Math.random) with a CSPRNG (randomBytes) for security. Affected file: server\handler.js
6. Replace fixed time delays with Playwright's built-in auto-waiting assertions, to avoid flaky E2E tests across different machines and CI. Affected file: __tests__\e2e\webviewer-redaction-ai.spec.js
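
Item 5 above can be illustrated with a minimal sketch. It assumes the handler needs an unpredictable identifier; the helper name createDocumentId and the 16-byte length are illustrative choices, not taken from the PR.

```javascript
const { randomBytes } = require('node:crypto');

// Hypothetical helper: the name and the 16-byte default are assumptions.
function createDocumentId(byteLength = 16) {
  // randomBytes draws from the operating system's CSPRNG, unlike Math.random().
  return randomBytes(byteLength).toString('hex');
}

console.log(createDocumentId()); // 32 hexadecimal characters
```

Hex encoding doubles the byte length, so 16 random bytes give a 32-character identifier.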

1. __mocks__\webviewer-redaction-ai.mock.js: Expose the global declaration (window) via globalThis.
2. client\document\manager.js: Report the caught error message.
3. client\index.js: Use top-level await instead of a promise chain.
4. client\redaction.js: Use an optional chain expression for maintainability.
5. Import Node.js built-in modules using the "node:" prefix. Affected files:
  a. server\handler.js
  b. server\server.js
6. server\llmManager.js:
  a. Declare llm and parser as fields.
  b. Use Number static methods and properties instead of their global equivalents.
  c. Replace the if-then-else flow with a single return statement.
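
Items 6b and 6c can be sketched together. The example assumes llmManager reads a numeric setting from a string such as an environment variable; readTemperature and its fallback value are hypothetical and do not come from the PR.

```javascript
// Hypothetical helper: parses a numeric setting and falls back to a default.
function readTemperature(rawValue, fallback = 0) {
  const parsed = Number.parseFloat(rawValue);
  // Number.isNaN does not coerce its argument, unlike the global isNaN.
  // A single return replaces an if/else pair.
  return Number.isNaN(parsed) ? fallback : parsed;
}

console.log(readTemperature('0.2'));            // 0.2
console.log(readTemperature(undefined));        // 0
console.log(Number.isNaN('abc'), isNaN('abc')); // false true
```

The last line shows why the static method is preferred: the global isNaN coerces the string to a number first, so it reports true for any non-numeric string.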

This is round two of clearing SonarQubeCloud issues, since it reports new issues with every commit.

1. __mocks__\webviewer-redaction-ai.mock.js: Compare with `undefined` directly instead of using `typeof`.
2. client\index.js: Use top-level await instead of calling the async function importModularComponents.
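
A small sketch of the comparison style from item 1, combined with the earlier switch to globalThis. The helper name isBrowser is hypothetical; the mock file's actual check is not shown on this page.

```javascript
// Hypothetical helper illustrating the comparison style.
function isBrowser() {
  // Reading a missing property of globalThis yields undefined instead of throwing,
  // so a direct comparison is safe and no typeof check is needed.
  return globalThis.window !== undefined;
}

console.log(isBrowser()); // false under Node.js, true in a browser page
```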

This is round three of clearing SonarQubeCloud issues, since it reports new issues with every commit.

__mocks__\webviewer-redaction-ai.mock.js: Strict equality operators should not be used with dissimilar types.
@DavidEGutierrez DavidEGutierrez requested a review from Copilot March 19, 2026 13:17
Contributor

Copilot AI left a comment

Pull request overview

Adds a new WebViewer Redaction AI sample that detects Personally Identifiable Information (PII) in PDFs using an OpenAI/LangChain-backed server, then applies redaction annotations in the WebViewer UI. It also includes mocking + Playwright E2E coverage and registers the sample in the monorepo.

Changes:

  • Added an Express server with LangChain/OpenAI integration and configurable guard-rail prompt rules.
  • Added a WebViewer client with custom modular UI, PII analysis trigger, and redaction application logic.
  • Added mocking mode + Playwright E2E tests and wired the sample into root README + lerna.json.

Reviewed changes

Copilot reviewed 25 out of 29 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `webviewer-redaction-ai/tools/copy-webviewer-files.js` | Postinstall script to copy WebViewer assets into client/lib. |
| `webviewer-redaction-ai/server/server.js` | Express server entrypoint + static hosting + MODE_ENV injection for mocking. |
| `webviewer-redaction-ai/server/llmManager.js` | LangChain/OpenAI wrapper for invoking the model and parsing responses. |
| `webviewer-redaction-ai/server/handler.js` | API endpoints for sending text, analyzing PII, and retrieving results. |
| `webviewer-redaction-ai/server/config.json` | Guard-rail prompt template, PII categories, and response rules. |
| `webviewer-redaction-ai/playwright.config.js` | Playwright test configuration for the sample. |
| `webviewer-redaction-ai/package.json` | Sample scripts (start/mock/e2e) and dependencies. |
| `webviewer-redaction-ai/package-lock.json` | Locked dependency tree for the sample. |
| `webviewer-redaction-ai/mainsamplesource.json` | Metadata pointing to the sample’s key source files. |
| `webviewer-redaction-ai/client/ui/functionMap.js` | Maps custom UI actions to analysis + redaction workflow. |
| `webviewer-redaction-ai/client/ui/custom.json` | Custom modular UI configuration adding the AI PII redaction tool button. |
| `webviewer-redaction-ai/client/redaction.js` | Converts AI results into text searches and redaction annotations. |
| `webviewer-redaction-ai/client/index.js` | Bootstraps WebViewer, imports modular UI, initializes document manager. |
| `webviewer-redaction-ai/client/index.html` | Client HTML shell loading WebViewer + app scripts. |
| `webviewer-redaction-ai/client/globals.js` | Global shared state + sample document list. |
| `webviewer-redaction-ai/client/document/manager.js` | Extracts and caches document text across pages for analysis. |
| `webviewer-redaction-ai/client/document/analyzer.js` | Client-side orchestration of send/analyze/get-results + mocking support. |
| `webviewer-redaction-ai/client/assets/favicon.svg` | Sample favicon asset. |
| `webviewer-redaction-ai/client/assets/ai-icon.svg` | Icon asset for the AI toolbar button. |
| `webviewer-redaction-ai/__tests__/package.playwright.json` | Playwright package metadata for test execution. |
| `webviewer-redaction-ai/__tests__/e2e/webviewer-redaction-ai.spec.js` | Playwright E2E validating button presence + redaction flow. |
| `webviewer-redaction-ai/__mocks__/webviewer-redaction-ai.mock.js` | Mock responses and mocking-mode detection used by client/server. |
| `webviewer-redaction-ai/README.md` | Sample documentation and setup instructions. |
| `webviewer-redaction-ai/LICENSE` | Sample licensing terms. |
| `webviewer-redaction-ai/.gitignore` | Ignores node_modules and copied WebViewer assets. |
| `webviewer-redaction-ai/.env.example` | Example env vars for OpenAI + server port. |
| `lerna.json` | Registers webviewer-redaction-ai as a managed package. |
| `README.md` | Adds an “Artificial Intelligence” section and lists the new sample. |

1. Mock the connection to OpenAI by intercepting the three HTTP calls (/api/send-text, /api/analyze-pii, and /api/get-results). Affected file: __mocks__\webviewer-redaction-ai.mock.js
2. Rewrite the tests to use the mocks from __mocks__\webviewer-redaction-ai.mock.js when performing AI PII redaction. Affected file: __tests__\e2e\webviewer-redaction-ai.spec.js
3. Remove mocking from the original source code. Affected files:
  a. client\document\analyzer.js
  b. server\handler.js
4. Remove the mocking-mode injection. Affected files:
  a. package.json
  b. server\server.js
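
The interception in item 1 can be sketched with Playwright's page.route and route.fulfill. The three endpoint paths come from the commit description. The payloads and the helper name mockPiiApi are placeholders, since the real response shapes are not shown on this page.

```javascript
// Placeholder payloads: the real response shapes are assumptions.
const mockPayloads = {
  '**/api/send-text': { ok: true },
  '**/api/analyze-pii': { ok: true },
  '**/api/get-results': { results: [] },
};

// Registers one route per endpoint so no request reaches the server or OpenAI.
// `page` is a Playwright Page; the returned promise settles once all routes are registered.
function mockPiiApi(page) {
  const registrations = Object.entries(mockPayloads).map(([pattern, payload]) =>
    page.route(pattern, (route) =>
      route.fulfill({
        status: 200,
        contentType: 'application/json',
        body: JSON.stringify(payload),
      })
    )
  );
  return Promise.all(registrations);
}
```

In a spec this would typically be awaited before navigation, for example `await mockPiiApi(page)` followed by `await page.goto(...)`, so the routes are in place before the client issues its first request.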
@Mohammed-AbdulRahman-Apryse
Collaborator Author

Applied Logan Bittner's instructions in aedbbd3; the SonarQube report now shows 0 issues.

Contributor

Copilot AI left a comment

Pull request overview

Adds a new webviewer-redaction-ai sample that runs a small Express backend using LangChain/OpenAI to detect PII in PDF text, then applies WebViewer redactions in the client UI. This also updates the repo’s sample listings/monorepo config to include the new sample.

Changes:

  • New client + server sample for AI-assisted PII detection and redaction in WebViewer.
  • Adds Playwright E2E test coverage with mocked API routes.
  • Registers the sample in the root README and lerna.json.

Reviewed changes

Copilot reviewed 25 out of 29 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| `webviewer-redaction-ai/tools/copy-webviewer-files.js` | Postinstall helper to copy WebViewer lib assets into the sample. |
| `webviewer-redaction-ai/server/server.js` | Express server entrypoint to serve /client and mount API handlers. |
| `webviewer-redaction-ai/server/llmManager.js` | LangChain/OpenAI initialization + message execution helper. |
| `webviewer-redaction-ai/server/handler.js` | API routes for sending text, analyzing PII, and returning results. |
| `webviewer-redaction-ai/server/config.json` | Guardrail prompt/rules + PII classification config for the LLM. |
| `webviewer-redaction-ai/playwright.config.js` | Playwright setup to run E2E tests against the sample server. |
| `webviewer-redaction-ai/package.json` | Sample scripts (start, postinstall, Playwright) and dependencies. |
| `webviewer-redaction-ai/package-lock.json` | Lockfile for sample dependencies. |
| `webviewer-redaction-ai/mainsamplesource.json` | Metadata referencing key sample source files. |
| `webviewer-redaction-ai/client/ui/functionMap.js` | Wires the custom UI button to analysis + redaction flow. |
| `webviewer-redaction-ai/client/ui/custom.json` | Modular UI definition adding an AI PII redaction button/toolgroup. |
| `webviewer-redaction-ai/client/redaction.js` | Searches for detected PII strings and creates redaction annotations. |
| `webviewer-redaction-ai/client/index.js` | Bootstraps WebViewer, loads modular UI, initializes document manager. |
| `webviewer-redaction-ai/client/index.html` | Client HTML entrypoint loading WebViewer + globals + module script. |
| `webviewer-redaction-ai/client/globals.js` | Defines global state and default sample documents list. |
| `webviewer-redaction-ai/client/document/manager.js` | Extracts and caches full document text from the loaded PDF. |
| `webviewer-redaction-ai/client/document/analyzer.js` | Client-side API calls to send text/analyze/fetch results. |
| `webviewer-redaction-ai/client/assets/favicon.svg` | Sample favicon asset. |
| `webviewer-redaction-ai/client/assets/ai-icon.svg` | Icon for the AI toolbar button. |
| `webviewer-redaction-ai/__tests__/package.playwright.json` | Playwright test package metadata for the test harness. |
| `webviewer-redaction-ai/__tests__/e2e/webviewer-redaction-ai.spec.js` | E2E tests for the AI PII redaction flow. |
| `webviewer-redaction-ai/__mocks__/webviewer-redaction-ai.mock.js` | Route mocks + fixture data for the E2E flow. |
| `webviewer-redaction-ai/README.md` | Sample documentation and setup instructions. |
| `webviewer-redaction-ai/LICENSE` | Sample licensing file. |
| `webviewer-redaction-ai/.gitignore` | Ignores node_modules and copied WebViewer lib artifacts. |
| `webviewer-redaction-ai/.env.example` | Example env vars for OpenAI + server port. |
| `lerna.json` | Adds webviewer-redaction-ai to the Lerna package list. |
| `README.md` | Adds an “Artificial Intelligence” section and lists the new sample. |
Files not reviewed (1)
  • webviewer-redaction-ai/package-lock.json: Language not supported

…sues

1. Validate this.#instance before using it. Affected file: client\document\manager.js
2. The document type declaration was invalid; changed <!DOCTYPE> to <!DOCTYPE html>. Affected file: client\index.html
3. Combine the search modes into one numeric flag before calling textSearchInit. Affected file: client\redaction.js
4. Replace AssistantMessage with SystemMessage to avoid the alias, which was misleading when reading the handler. Affected file: server\handler.js
5. Return parsedResponse directly instead of parsedResponse.content. Affected file: server\llmManager.js
6. For testing purposes, prevent the server from opening the browser automatically when it starts. Affected file: server\server.js
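
Item 3 relies on the search modes being bit flags that can be joined with bitwise OR into one number. The sketch below uses a stand-in SearchMode object with made-up values so that it runs outside WebViewer. In the sample, the constants would come from WebViewer's own search-mode enumeration, and the combined value would be passed as the mode argument of textSearchInit.

```javascript
// Stand-in flag values for illustration only; the real constants come from WebViewer.
const SearchMode = {
  CASE_SENSITIVE: 1 << 0,
  WHOLE_WORD: 1 << 1,
  PAGE_STOP: 1 << 2,
  HIGHLIGHT: 1 << 3,
};

// Folds any number of mode flags into one numeric value with bitwise OR.
function combineModes(...modes) {
  return modes.reduce((combined, mode) => combined | mode, 0);
}

const mode = combineModes(SearchMode.PAGE_STOP, SearchMode.HIGHLIGHT);
console.log(mode);                                 // 12 with the stand-in values above
console.log((mode & SearchMode.HIGHLIGHT) !== 0);  // true
console.log((mode & SearchMode.WHOLE_WORD) !== 0); // false
```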
DavidEGutierrez previously approved these changes Apr 3, 2026
Collaborator

@DavidEGutierrez DavidEGutierrez left a comment

Tested the sample; it works as expected. The e2e test definitions also work as expected and validate the UI elements involved in finding PII data.

…from stores.

1. server/inMemoryStore.js: new inMemoryStore class, used by handler.js. It releases documents and their analysis results that were previously kept indefinitely in the in-memory Maps (documentStore / analysisStore), in response to the suggestion to consider adding a TTL/cleanup and a max-entries limit.
2. server\handler.js: use the new inMemoryStore class.
3. mainsamplesource.json: reference the new class in the files list.
4. __tests__\e2e\webviewer-redaction-ai.spec.js: new e2e test verifying that the document is released from memory after it has been analyzed for PII.
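
The TTL and max-entries idea mentioned in item 1 can be sketched as follows. This is not the PR's inMemoryStore, whose code is not shown on this page; the class name ExpiringStore, its option names, and its defaults are all assumptions.

```javascript
// Hypothetical sketch; the class name, option names, and defaults are assumptions.
class ExpiringStore {
  #entries = new Map();
  #ttlMs;
  #maxEntries;
  #now;

  constructor({ ttlMs = 60_000, maxEntries = 100, now = Date.now } = {}) {
    this.#ttlMs = ttlMs;
    this.#maxEntries = maxEntries;
    this.#now = now; // injectable clock, which keeps expiry testable
  }

  set(key, value) {
    this.#removeExpired();
    this.#entries.delete(key); // re-inserting moves the key to the newest position
    if (this.#entries.size >= this.#maxEntries) {
      // A Map iterates in insertion order, so the first key is the oldest entry.
      this.#entries.delete(this.#entries.keys().next().value);
    }
    this.#entries.set(key, { value, expiresAt: this.#now() + this.#ttlMs });
  }

  get(key) {
    this.#removeExpired();
    return this.#entries.get(key)?.value;
  }

  get size() {
    this.#removeExpired();
    return this.#entries.size;
  }

  #removeExpired() {
    const current = this.#now();
    for (const [key, entry] of this.#entries) {
      if (entry.expiresAt <= current) this.#entries.delete(key);
    }
  }
}

const documents = new ExpiringStore({ ttlMs: 5 * 60_000, maxEntries: 50 });
documents.set('doc-1', 'extracted text');
console.log(documents.get('doc-1')); // 'extracted text'
```

Expired entries are swept lazily on each access rather than by a timer, so the store needs no background work and leaves nothing to clean up when the server shuts down.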
Collaborator

@DavidEGutierrez DavidEGutierrez left a comment

Add a unit test for inMemoryStore that validates its memory handling and guards against memory leaks. On the new class, also validate both the passing condition within the limits and the case where the limit is exceeded.

A second issue I ran into: when the user clicks the button to create redactions multiple times, the redactions should be reset so that only the latest results remain, with nothing left over from previous requests. See the image in ticket AS 208.

@lbittner-pdftron
Contributor

Agree we should have unit tests here for core functionality, but we need to get this PR through; it's been open for almost a month.

Let's get any bugs fixed, make sure our e2e tests are solid, and get this merged. I want to see this merged by tomorrow, April 9

@Mohammed-AbdulRahman-Apryse
Collaborator Author

Mohammed-AbdulRahman-Apryse commented Apr 9, 2026

I have finished the last requests from David in #84 (review).

Both of his requests are covered in:

…s on the server

1. Eliminated the documentStore and analysisStore and their related functionality, so that repeated requests cannot grow memory without bound and potentially crash the server.
2. The InMemoryStore now contains a single validation that confirms the received document text and page count are below the limits.
3. As a result of eliminating the documentStore and analysisStore, a documentID is no longer needed.
4. The document manager now exposes pageCount so it can be used for the page-count validation.
5. Added two more Playwright e2e tests:
   a. 'Perform AI PII redaction for document text size and page count within limits'
   b. 'Expect an alert when document text exceeds 30000 characters with page count above 20'
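
The validation described in items 2 and 4 can be sketched as a single function. The two limits are taken from the e2e test name above. Treating them as inclusive upper bounds and rejecting when either one is exceeded are assumptions, as are the function name and the messages.

```javascript
// Limits taken from the e2e test name; treating them as inclusive bounds is an assumption.
const MAX_TEXT_LENGTH = 30000;
const MAX_PAGE_COUNT = 20;

// Hypothetical validator: returns a list of problems, empty when the document is acceptable.
function validateDocumentLimits(text, pageCount) {
  const errors = [];
  if (typeof text !== 'string' || text.length === 0) {
    errors.push('Document text is missing.');
  } else if (text.length > MAX_TEXT_LENGTH) {
    errors.push(`Document text exceeds ${MAX_TEXT_LENGTH} characters.`);
  }
  if (!Number.isInteger(pageCount) || pageCount < 1) {
    errors.push('Page count is invalid.');
  } else if (pageCount > MAX_PAGE_COUNT) {
    errors.push(`Document has more than ${MAX_PAGE_COUNT} pages.`);
  }
  return errors;
}

console.log(validateDocumentLimits('Jane Doe, 555-0100', 3));        // []
console.log(validateDocumentLimits('x'.repeat(30001), 21).length);   // 2
```

Returning every problem at once, rather than stopping at the first, lets the client show one alert that covers both the text size and the page count.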
This must apply before re-performing the PII identification on the same loaded document.
@sonarqubecloud

sonarqubecloud bot commented Apr 9, 2026

Collaborator

@DavidEGutierrez DavidEGutierrez left a comment

Changes look good. Completed review with team.

@DavidEGutierrez DavidEGutierrez merged commit a7c23e3 into main Apr 9, 2026
2 checks passed
6 participants