added gliner onnx support by aramikm · Pull Request #1 · ProjectLibertyLabs/presidio

aramikm · 2026-02-18T06:14:06Z

Change Description

Describe your changes

Issue reference

Fixes #XX

Checklist

I have reviewed the contribution guidelines
I have signed the CLA (if required)
My code includes unit tests
All unit tests and lint checks pass locally
My PR contains documentation updates / additions if required

github-actions · 2026-02-19T21:23:25Z

Coverage report (presidio-anonymizer)

This PR does not seem to contain any modification to coverable code.

github-actions · 2026-02-19T21:24:22Z

Coverage report (presidio-cli)

This PR does not seem to contain any modification to coverable code.

github-actions · 2026-02-19T21:24:30Z

Coverage report (presidio-structured)

This PR does not seem to contain any modification to coverable code.

github-actions · 2026-02-19T22:00:21Z

Coverage report (presidio-image-redactor)

This PR does not seem to contain any modification to coverable code.

Rename existing workflow to build-publish-image-pocs-stg-ecr.yaml for consistency. Create new build-publish-image-ics-stg-ecr.yaml that publishes presidio-analyzer image to ics-stg/cheo-pii-analyzer ECR repo using ICS staging AWS credentials. Closes ProjectLibertyLabs/guardian-infra#88

added claw defender endpoints

+                    try:
+                        item = self._validate_input_text(item)
+                    except ValueError as ve:
+                        return jsonify(error=str(ve)), 400


To fix the problem, the endpoint should not return the raw exception text (str(ve)) to the client. Instead, it should return a generic, controlled error message, while logging the exception server‑side for debugging. This aligns with the recommendation to avoid leaking stack traces or internal details.

Concretely, in presidio-analyzer/app.py, inside the /defender/detect route, the inner except ValueError as ve: block at lines 230–231 should be changed so that:

The exception is logged using self.logger (which already exists and is used in the outer handler).

The HTTP response body contains a generic validation error message that does not include ve content.

We can implement this by:

Replacing return jsonify(error=str(ve)), 400 with logging plus a generic message, e.g., return jsonify(error="Invalid input text."), 400.

Optionally including a short, non‑sensitive hint such as “Invalid input text.” or “Input validation failed.” which does not depend on ve.

No new imports, methods, or additional definitions are needed, because self.logger and jsonify are already available in this context.
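
The validation fix described above can be sketched without Flask. This is a minimal illustration, not the code in presidio-analyzer/app.py: `validate_input_text`, `detect_item`, and `MAX_LEN` are hypothetical stand-ins, and a plain dict takes the place of `jsonify(...)` so the snippet runs with the standard library only.

```python
import logging

logger = logging.getLogger("defender-detect-sketch")

MAX_LEN = 1000  # hypothetical limit, for illustration only


def validate_input_text(text):
    """Hypothetical stand-in for self._validate_input_text."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError(f"rejected input of type {type(text).__name__}")
    if len(text) > MAX_LEN:
        raise ValueError(f"input length {len(text)} exceeds internal limit {MAX_LEN}")
    return text.strip()


def detect_item(item):
    """Return (body, status); the dict stands in for jsonify(...)."""
    try:
        item = validate_input_text(item)
    except ValueError:
        # The exception details go to the server log only.
        logger.warning("Validation error in /detect", exc_info=True)
        # The client sees a fixed string that does not depend on the exception.
        return {"error": "Invalid input text."}, 400
    return {"text": item}, 200
```

Calling `detect_item("x" * 2000)` yields the generic 400 body, while the message about the internal limit appears only in the log output.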

+
+            except Exception as e:
+                self.logger.error(f"Error in /detect: {e}")
+                return jsonify(error=str(e)), 400


In general, the fix is to avoid returning raw exception information to the client. Instead, log the detailed exception server-side and send a generic, non-sensitive error message in the HTTP response. This preserves debugging capability via logs but prevents potential attackers from learning about internal implementation details.

Concretely for this file, the best minimal change is:

Keep the self.logger.error(f"Error in /detect: {e}") line as-is (or improve it later if desired).

Change the response in the except Exception as e: block of the /defender/detect route to return a fixed, generic error string instead of str(e). For example: return jsonify(error="An internal error occurred while processing the request."), 400.

We only need to edit presidio-analyzer/app.py inside the /defender/detect handler’s except block (around lines 252–254). No new methods or imports are required to implement this change, and existing functionality (HTTP status codes, JSON shape, etc.) will be preserved aside from hiding the internal error text.

+                    try:
+                        item = self._validate_input_text(item)
+                    except ValueError as ve:
+                        return jsonify(error=str(ve)), 400


In general, to fix this kind of issue you should avoid sending raw exception messages (which may contain stack traces or internal details) back to clients. Instead, log the detailed error server-side and return a generic, high-level description to the client, such as "Invalid input" or "An internal error occurred", possibly accompanied by a safe error code.

For this file, the best minimal fix is:

In the /defender/sanitize route, change the inner except ValueError as ve: to:

Log ve (ideally with exc_info=True so the stack trace is recorded in logs).

Return a generic validation error message, not str(ve).

Likewise, in the outer except Exception as e: blocks for both /defender/detect and /defender/sanitize, avoid returning str(e) to the client; instead return a generic internal error message while logging e with full details.

This preserves existing functionality (the routes and behavior remain largely the same: still 400 on validation failure and 400 on other failures) but removes the exposure of internal exception text. Concretely:

Around line 230–231 in detect, replace return jsonify(error=str(ve)), 400 with logging plus a generic "Invalid input" message.

Around line 252–254 in detect, replace the error return to use a generic "An internal error has occurred" message and enhance the logging call to capture the traceback.

Around line 272–273 in sanitize, make the same change as for detect’s validation block.

Around line 284–286 in sanitize, make the same change as for detect’s general exception block.

No new methods are strictly required; we can reuse self.logger which is already present. To improve logging, we can add exc_info=True to the self.logger.error calls so stack traces go to the logs, not the client. No new imports are needed.
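
To illustrate the effect of `exc_info=True`, here is a small standard-library sketch. The logger name, `risky`, and `sanitize_sketch` are assumptions made for the example; the real handler uses `self.logger` and `jsonify`. The point is only that the traceback ends up in the log stream and not in the response body.

```python
import io
import logging


def build_logger(stream):
    """Create an isolated logger that writes to the given stream."""
    log = logging.getLogger("sanitize-sketch")
    log.handlers.clear()
    log.propagate = False
    log.setLevel(logging.DEBUG)
    log.addHandler(logging.StreamHandler(stream))
    return log


def risky():
    # Hypothetical failure standing in for an error inside the handler.
    raise RuntimeError("model file /opt/models/x.onnx missing")


def sanitize_sketch(log):
    """Return (body, status); the dict stands in for jsonify(...)."""
    try:
        risky()
    except Exception:
        # exc_info=True attaches the active exception and its traceback to the log record.
        log.error("Error in /sanitize", exc_info=True)
        return {"error": "An internal error has occurred."}, 400


log_stream = io.StringIO()
response = sanitize_sketch(build_logger(log_stream))
log_text = log_stream.getvalue()
```

After this runs, `log_text` holds the message plus the formatted traceback, and `response` carries only the generic string.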

+
+            except Exception as e:
+                self.logger.error(f"Error in /sanitize: {e}")
+                return jsonify(error=str(e)), 400


In general, to fix information exposure through exceptions, you should avoid returning raw exception objects or messages directly to clients. Instead, log the detailed error on the server side and return a generic, user-safe message (e.g., “An internal error occurred”) or a controlled validation message that you construct yourself.

For this specific endpoint (/defender/sanitize in presidio-analyzer/app.py), we already log the exception with self.logger.error(f"Error in /sanitize: {e}"), which preserves information for diagnostics. The problematic part is returning jsonify(error=str(e)), 400. The best fix, without changing the existing behavior of successful or validation flows, is:

Keep the logging as-is (so developers still see the details).

Replace the response body in the generic except Exception as e: block with a fixed, non-sensitive error message, such as "Internal error while sanitizing text".

Optionally, keep the status code 400 if the API contract depends on it, but the main security issue is the message content, not the status.

Concretely:

In presidio-analyzer/app.py, within the /defender/sanitize route, replace return jsonify(error=str(e)), 400 with the generic message suggested above, for example return jsonify(error="Internal error while sanitizing text"), 400.

No new imports or helper methods are required; we reuse existing logging and jsonify.

+
+            except Exception as e:
+                self.logger.error(f"Error in /defender/scan: {e}")
+                return jsonify(error=str(e)), 400


In general, the fix is to avoid returning raw exception details to the client. Instead, log the full exception (ideally with traceback) on the server side and send a generic, non-sensitive error message in the HTTP response. This preserves debuggability while preventing information exposure.

For this file, the best minimally invasive fix is:

In /defender/sanitize and /defender/scan handlers, keep logging the exception but change the JSON response to use a generic message such as "An internal error has occurred" rather than str(e).

Optionally, upgrade the logging call to include the full stack trace (exc_info=True), but without changing the existing logging configuration or imports.

Concretely:

Around line 284–287, replace self.logger.error(f"Error in /sanitize: {e}") with self.logger.error("Error in /sanitize", exc_info=True) (or keep the message) and change return jsonify(error=str(e)), 400 to return jsonify(error="An internal error has occurred"), 500 (a server error is more appropriate for an unhandled exception).

Around line 325–327, similarly replace the logging call with one that records the stack trace, and change return jsonify(error=str(e)), 400 to return jsonify(error="An internal error has occurred"), 500.

No new methods or imports are strictly required; we rely on the existing self.logger and jsonify.
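
The split between client errors and server errors suggested above could look roughly like this. It is a sketch under assumptions: `scan_sketch` and its `scanner` argument are hypothetical, the payload shape is invented for the example, and a dict replaces `jsonify(...)` so no Flask install is needed.

```python
import logging

logger = logging.getLogger("defender-scan-sketch")


def scan_sketch(payload, scanner):
    """Return (body, status). `scanner` is a hypothetical callable doing the real work."""
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text:
        # Bad request from the caller: 400 with a fixed message.
        return {"error": "Invalid input."}, 400
    try:
        return {"findings": scanner(text)}, 200
    except Exception:
        # Unhandled failure on the server side: log the traceback, answer with 500.
        logger.error("Error in /defender/scan", exc_info=True)
        return {"error": "An internal error has occurred"}, 500
```

Keeping the two paths separate lets clients distinguish a request they can fix from a failure they cannot, without either response exposing exception text.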

added: readyz and livez endpoints

+
+                return jsonify({"status": "ok"}), 200
+            except Exception as e:
+                return jsonify({"status": "not ready", "reason": str(e)}), 503


In general, the fix is to avoid returning raw exception details to the client. Instead, log the full exception (optionally with stack trace) on the server side, and return a generic, non-sensitive error message in the HTTP response.

For this specific case, the best fix is to change the /readyz endpoint’s except Exception as e: block so that:

It logs the exception using self.logger.exception(...) or self.logger.error(..., exc_info=True) so developers still have full diagnostic information.

It returns a generic reason string such as "Internal error" or "Unexpected error while checking readiness" rather than str(e).

Concretely, in presidio-analyzer/app.py within the Server.__init__ method, locate the /readyz route and the except Exception as e: at lines 108–109. Replace the body of that except block so that it logs the exception and then returns jsonify({"status": "not ready", "reason": "Internal error"}) (or similar), without using e or str(e) in the response. No new imports are required; logging is already configured and jsonify is already imported.
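
A readiness handler following this advice might be shaped like the sketch below. The `check` callable and `readyz_sketch` are hypothetical; the actual route lives inside Server.__init__ and returns `jsonify(...)` responses. Here a dict is returned so the example stays runnable with the standard library.

```python
import logging

logger = logging.getLogger("readyz-sketch")


def readyz_sketch(check):
    """`check` is a hypothetical callable returning None when ready, or a reason string."""
    try:
        reason = check()
        if reason:
            # A reason produced by our own code is a controlled string and safe to return.
            return {"status": "not ready", "reason": reason}, 503
        return {"status": "ok"}, 200
    except Exception:
        # logger.exception logs at ERROR level and includes the traceback.
        logger.exception("Unexpected error while performing readiness check in /readyz")
        return {"status": "not ready", "reason": "Internal error"}, 503
```

The 503 status is preserved in both failure paths, so an orchestrator probing the endpoint behaves the same as before; only the reason text changes when an unexpected exception occurs.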

aramikm added 2 commits February 17, 2026 14:22

added gliner onnx support

117f4fa

committing poetry.lock

35c7227

demisx added 6 commits February 19, 2026 13:26

add build and publish image to ECR workflow

4380a74

add free disk space step

687091f

copy poetry lock to docker image

ce4e62d

build amd64 version only

bb23ea3

correct ECR repo name

c770018

Merge pull request #2 from ProjectLibertyLabs/add-publish-ecr-github-…

e3b59cc

…workflow Add publish ECR GitHub workflow

demisx and others added 7 commits February 19, 2026 14:09

correct branch name to build docker image from

7d19d37

improved credit card and dob and phone number

a58813c

Merge branch 'gliner_integration' of https://github.com/ProjectLibert…

fd40d57

…yLabs/presidio into gliner_integration

Merge pull request #3 from ProjectLibertyLabs/88-publish-pii-analyzer…

6b0ec0c

…-image-to-ics-staging-ecr Add ICS staging ECR workflow for PII Analyzer

added claw defender endpoints

1e64309

Merge pull request #4 from ProjectLibertyLabs/added_claw_defender_end…

25ea7c6

…points added claw defender endpoints

github-advanced-security AI found potential problems Mar 3, 2026

aramikm added 8 commits March 3, 2026 15:42

fix linting

8a9015c

Merge pull request #5 from ProjectLibertyLabs/fix_linting

89e99c6

fix linting

fix linting

d91922c

Merge pull request #6 from ProjectLibertyLabs/another_linting

8fb9806

fix linting

fix poetry

01701e8

Merge pull request #7 from ProjectLibertyLabs/fix_poetry_lock

b2d9ddf

fix poetry

added: readyz and livez endpoints

6f4778d

Merge pull request #8 from ProjectLibertyLabs/added_readyz_livez

134233c

added: readyz and livez endpoints

github-advanced-security AI found potential problems Mar 4, 2026

added graceful shutdowns

2e6d2b3

aramikm and others added 13 commits March 4, 2026 14:39

Merge pull request #9 from ProjectLibertyLabs/graceful_shutdowns

c80d4a7

added graceful shutdowns

Fix CI concurrency groups to prevent cross-workflow cancellation

a49a9f9

Use hardcoded group names instead of github.workflow to ensure ICS and POCs staging builds don't cancel each other.

Merge pull request #10 from ProjectLibertyLabs/fix-ci-concurrency

bf9343c

Fix CI concurrency groups to prevent cross-workflow cancellation

Add CI workflow to publish pii-analyzer image to prod ECR

69ab1b8

Adds build-publish-image-ics-prod-ecr.yaml workflow that triggers on _prod branch pushes, building the presidio-analyzer Docker image and pushing to the ics-prod ECR registry.

Merge pull request #11 from ProjectLibertyLabs/130-add-prod-ecr-workflow

b1d6acf

Add prod ECR CI workflow for pii-analyzer

Remove POCs staging ECR workflow (#12)

980f350

Images are now published to ICS staging ECR only. The POCs staging workflow is no longer needed.

Add GitHub Actions workflow to publish pii-analyzer image to test ECR (…

80b480c

…#13) Triggers on pushes to the _test branch, builds presidio-analyzer with Dockerfile.gliner-edge and publishes to ics-test/cheo-pii-analyzer ECR namespace.

remove openclaw references

f5567df

remove openclaw references

2474f01

Merge pull request #14 from ProjectLibertyLabs/remove_openclaw

11c847c

remove openclaw references

remove openclaw from paths

929c661

Merge pull request #15 from ProjectLibertyLabs/path_fixes

d46f4f2

remove openclaw from paths

Rename ECR repos from cheo-pii-analyzer to pii-analyzer (#16)

5971e67

Update all three environment workflows (test, staging, prod) to push images to the renamed pii-analyzer ECR repos per guardian-infra#379.

@@ -228,7 +228,8 @@
                                 try:
                                     item = self._validate_input_text(item)
                                 except ValueError as ve:
-                                    return jsonify(error=str(ve)), 400
+                                    self.logger.warning("Validation error in /detect", exc_info=ve)
+                                    return jsonify(error="Invalid input."), 400
                                 if check_type == "url":
                                     findings = claw_validate_url(item)
@@ -250,8 +251,8 @@
                             return jsonify(results if batch_request else results[0]), 200
                         except Exception as e:
-                            self.logger.error(f"Error in /detect: {e}")
-                            return jsonify(error=str(e)), 400
+                            self.logger.error("Error in /detect", exc_info=e)
+                            return jsonify(error="An internal error has occurred."), 400
                     @self.app.route("/defender/sanitize", methods=["POST"])
                     def sanitize() -> Tuple[str, int]:
@@ -270,7 +271,8 @@
                                 try:
                                     item = self._validate_input_text(item)
                                 except ValueError as ve:
-                                    return jsonify(error=str(ve)), 400
+                                    self.logger.warning("Validation error in /sanitize", exc_info=ve)
+                                    return jsonify(error="Invalid input."), 400
                                 san_result = claw_sanitize(item)
                                 result = {
                                     "text": item,
@@ -282,8 +284,8 @@
                             return jsonify(results if batch_request else results[0]), 200
                         except Exception as e:
-                            self.logger.error(f"Error in /sanitize: {e}")
-                            return jsonify(error=str(e)), 400
+                            self.logger.error("Error in /sanitize", exc_info=e)
+                            return jsonify(error="An internal error has occurred."), 400
                     @self.app.route("/defender/scan", methods=["POST"])
                     def scan() -> Tuple[str, int]:
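
The diff above passes the exception object itself as `exc_info` (for example `exc_info=ve`). The standard `logging` module accepts an exception instance there as well as `True`. The sketch below, with a made-up logger name and failure, shows that the traceback is still recorded even when the call happens after the `except` block has ended.

```python
import io
import logging

stream = io.StringIO()
log = logging.getLogger("exc-info-sketch")
log.handlers.clear()
log.propagate = False
log.addHandler(logging.StreamHandler(stream))

try:
    int("not a number")  # hypothetical validation failure
except ValueError as ve:
    saved = ve  # keep a reference; `ve` itself is unbound after the block

# Passing the instance works outside the except block because the
# traceback travels with the exception object.
log.warning("Validation error in /detect", exc_info=saved)
logged = stream.getvalue()
```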

added gliner onnx support #1

aramikm wants to merge 37 commits into main from gliner_integration

3 participants

@@ -105,8 +105,13 @@
                                 return jsonify({"status": "not ready", "reason": reason}), 503
                             return jsonify({"status": "ok"}), 200
-                        except Exception as e:
-                            return jsonify({"status": "not ready", "reason": str(e)}), 503
+                        except Exception:
+                            self.logger.exception(
+                                "Unexpected error while performing readiness check in /readyz"
+                            )
+                            return jsonify(
+                                {"status": "not ready", "reason": "Internal error"}
+                            ), 503
                     @self.app.route("/analyze", methods=["POST"])
                     def analyze() -> Tuple[str, int]:
