added gliner onnx support #1

Draft
aramikm wants to merge 37 commits into main from gliner_integration

Conversation

@aramikm
Collaborator

@aramikm aramikm commented Feb 18, 2026

Change Description

Describe your changes

Issue reference

Fixes #XX

Checklist

  • I have reviewed the contribution guidelines
  • I have signed the CLA (if required)
  • My code includes unit tests
  • All unit tests and lint checks pass locally
  • My PR contains documentation updates / additions if required

@github-actions

Coverage report (presidio-anonymizer)

This PR does not seem to contain any modification to coverable code.

@github-actions

Coverage report (presidio-cli)

This PR does not seem to contain any modification to coverable code.

@github-actions

Coverage report (presidio-structured)

This PR does not seem to contain any modification to coverable code.

@github-actions

Coverage report (presidio-image-redactor)

This PR does not seem to contain any modification to coverable code.

demisx and others added 7 commits February 19, 2026 14:09
Rename existing workflow to build-publish-image-pocs-stg-ecr.yaml
for consistency. Create new build-publish-image-ics-stg-ecr.yaml
that publishes presidio-analyzer image to ics-stg/cheo-pii-analyzer
ECR repo using ICS staging AWS credentials.

Closes ProjectLibertyLabs/guardian-infra#88
…-image-to-ics-staging-ecr

Add ICS staging ECR workflow for PII Analyzer
Comment thread presidio-analyzer/app.py
try:
    item = self._validate_input_text(item)
except ValueError as ve:
    return jsonify(error=str(ve)), 400

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI 2 months ago

To fix the problem, the endpoint should not return the raw exception text (str(ve)) to the client. Instead, it should return a generic, controlled error message, while logging the exception server‑side for debugging. This aligns with the recommendation to avoid leaking stack traces or internal details.

Concretely, in presidio-analyzer/app.py, inside the /defender/detect route, the inner except ValueError as ve: block at lines 230–231 should be changed so that:

  • The exception is logged using self.logger (which already exists and is used in the outer handler).
  • The HTTP response body contains a generic validation error message that does not include ve content.

We can implement this by:

  • Replacing return jsonify(error=str(ve)), 400 with logging plus a generic message, e.g., return jsonify(error="Invalid input text."), 400.
  • Optionally including a short, non‑sensitive hint such as “Invalid input text.” or “Input validation failed.” which does not depend on ve.

No new imports, methods, or additional definitions are needed, because self.logger and jsonify are already available in this context.
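
A minimal, framework-free sketch of the pattern described above, assuming a handler returns a (body, status) pair; in the Flask app the body would go through jsonify. The names validation_error_response and validate_input_text, the logger name, and the validation rule are hypothetical stand-ins, since the real _validate_input_text is not shown in this PR.

```python
import logging

logger = logging.getLogger("analyzer_sketch")  # hypothetical logger name


def validation_error_response(exc: ValueError, route: str):
    """Log the exception detail server-side and return a fixed client message."""
    # The exception text goes to the log only.
    logger.warning("Invalid input text in %s: %s", route, exc)
    # The body is a constant, so nothing from `exc` can reach the client.
    return {"error": "Invalid input text."}, 400


def validate_input_text(text):
    # Stand-in for self._validate_input_text; the real checks are an assumption.
    if not isinstance(text, str) or not text.strip():
        raise ValueError(f"rejected value of type {type(text).__name__}")
    return text


def detect(item):
    try:
        item = validate_input_text(item)
    except ValueError as ve:
        return validation_error_response(ve, "/defender/detect")
    return {"text": item}, 200
```

Because the response body never interpolates the exception, the message can change freely in the validator without affecting what clients see.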

Suggested changeset 1
presidio-analyzer/app.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/presidio-analyzer/app.py b/presidio-analyzer/app.py
--- a/presidio-analyzer/app.py
+++ b/presidio-analyzer/app.py
@@ -228,7 +228,8 @@
                     try:
                         item = self._validate_input_text(item)
                     except ValueError as ve:
-                        return jsonify(error=str(ve)), 400
+                        self.logger.warning(f"Invalid input text in /defender/detect: {ve}")
+                        return jsonify(error="Invalid input text."), 400
 
                     if check_type == "url":
                         findings = claw_validate_url(item)
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
Comment thread presidio-analyzer/app.py

except Exception as e:
    self.logger.error(f"Error in /detect: {e}")
    return jsonify(error=str(e)), 400

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI 2 months ago

In general, the fix is to avoid returning raw exception information to the client. Instead, log the detailed exception server-side and send a generic, non-sensitive error message in the HTTP response. This preserves debugging capability via logs but prevents potential attackers from learning about internal implementation details.

Concretely for this file, the best minimal change is:

  • Keep the self.logger.error(f"Error in /detect: {e}") line as-is (or improve it later if desired).
  • Change the response in the except Exception as e: block of the /defender/detect route to return a fixed, generic error string instead of str(e). For example: return jsonify(error="An internal error occurred while processing the request."), 400.

We only need to edit presidio-analyzer/app.py inside the /defender/detect handler’s except block (around lines 252–254). No new methods or imports are required to implement this change, and existing functionality (HTTP status codes, JSON shape, etc.) will be preserved aside from hiding the internal error text.

Suggested changeset 1
presidio-analyzer/app.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/presidio-analyzer/app.py b/presidio-analyzer/app.py
--- a/presidio-analyzer/app.py
+++ b/presidio-analyzer/app.py
@@ -251,7 +251,7 @@
 
             except Exception as e:
                 self.logger.error(f"Error in /detect: {e}")
-                return jsonify(error=str(e)), 400
+                return jsonify(error="An internal error occurred while processing the request."), 400
 
         @self.app.route("/defender/sanitize", methods=["POST"])
         def sanitize() -> Tuple[str, int]:
EOF
Comment thread presidio-analyzer/app.py
try:
    item = self._validate_input_text(item)
except ValueError as ve:
    return jsonify(error=str(ve)), 400

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI 2 months ago

In general, to fix this kind of issue you should avoid sending raw exception messages (which may contain stack traces or internal details) back to clients. Instead, log the detailed error server-side and return a generic, high-level description to the client, such as "Invalid input" or "An internal error occurred", possibly accompanied by a safe error code.

For this file, the best minimal fix is:

  • In the /defender/sanitize route, change the inner except ValueError as ve: to:
    • Log ve (ideally with exc_info=True so the stack trace is recorded in logs).
    • Return a generic validation error message, not str(ve).
  • Likewise, in the outer except Exception as e: blocks for both /defender/detect and /defender/sanitize, avoid returning str(e) to the client; instead return a generic internal error message while logging e with full details.

This preserves existing functionality (the routes and behavior remain largely the same: still 400 on validation failure and 400 on other failures) but removes the exposure of internal exception text. Concretely:

  • Around line 230–231 in detect, replace return jsonify(error=str(ve)), 400 with logging plus a generic "Invalid input" message.
  • Around line 252–254 in detect, replace the error return to use a generic "An internal error has occurred" message and enhance the logging call to capture the traceback.
  • Around line 272–273 in sanitize, make the same change as for detect’s validation block.
  • Around line 284–286 in sanitize, make the same change as for detect’s general exception block.

No new methods are strictly required; we can reuse self.logger which is already present. To improve logging, we can add exc_info=True to the self.logger.error calls so stack traces go to the logs, not the client. No new imports are needed.
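
To illustrate the difference between the f-string log call and exc_info, here is a small standard-library sketch. The logger name, the risky function, and the error message are made up for the example; only the logging calls themselves mirror the ones discussed above.

```python
import io
import logging

stream = io.StringIO()
logger = logging.getLogger("exc_info_sketch")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False  # keep the output in `stream` only


def risky():
    # Made-up failure that carries an internal detail in its message.
    raise RuntimeError("model file missing at /srv/models/x.onnx")


try:
    risky()
except Exception as e:
    # f-string form: only the exception message is logged, no traceback.
    logger.error(f"Error in /detect: {e}")
    message_only = stream.getvalue()

    # exc_info=True: the formatter appends the formatted traceback to the record.
    logger.error("Error in /detect", exc_info=True)
    with_traceback = stream.getvalue()[len(message_only):]
```

Passing the exception instance (exc_info=e), as the patch below does, is also accepted by the logging module and records the same traceback.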

Suggested changeset 1
presidio-analyzer/app.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/presidio-analyzer/app.py b/presidio-analyzer/app.py
--- a/presidio-analyzer/app.py
+++ b/presidio-analyzer/app.py
@@ -228,7 +228,8 @@
                     try:
                         item = self._validate_input_text(item)
                     except ValueError as ve:
-                        return jsonify(error=str(ve)), 400
+                        self.logger.warning("Validation error in /detect", exc_info=ve)
+                        return jsonify(error="Invalid input."), 400
 
                     if check_type == "url":
                         findings = claw_validate_url(item)
@@ -250,8 +251,8 @@
                 return jsonify(results if batch_request else results[0]), 200
 
             except Exception as e:
-                self.logger.error(f"Error in /detect: {e}")
-                return jsonify(error=str(e)), 400
+                self.logger.error("Error in /detect", exc_info=e)
+                return jsonify(error="An internal error has occurred."), 400
 
         @self.app.route("/defender/sanitize", methods=["POST"])
         def sanitize() -> Tuple[str, int]:
@@ -270,7 +271,8 @@
                     try:
                         item = self._validate_input_text(item)
                     except ValueError as ve:
-                        return jsonify(error=str(ve)), 400
+                        self.logger.warning("Validation error in /sanitize", exc_info=ve)
+                        return jsonify(error="Invalid input."), 400
                     san_result = claw_sanitize(item)
                     result = {
                         "text": item,
@@ -282,8 +284,8 @@
                 return jsonify(results if batch_request else results[0]), 200
 
             except Exception as e:
-                self.logger.error(f"Error in /sanitize: {e}")
-                return jsonify(error=str(e)), 400
+                self.logger.error("Error in /sanitize", exc_info=e)
+                return jsonify(error="An internal error has occurred."), 400
 
         @self.app.route("/defender/scan", methods=["POST"])
         def scan() -> Tuple[str, int]:
EOF
Comment thread presidio-analyzer/app.py

except Exception as e:
    self.logger.error(f"Error in /sanitize: {e}")
    return jsonify(error=str(e)), 400

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI 2 months ago

In general, to fix information exposure through exceptions, you should avoid returning raw exception objects or messages directly to clients. Instead, log the detailed error on the server side and return a generic, user-safe message (e.g., “An internal error occurred”) or a controlled validation message that you construct yourself.

For this specific endpoint (/defender/sanitize in presidio-analyzer/app.py), we already log the exception with self.logger.error(f"Error in /sanitize: {e}"), which preserves information for diagnostics. The problematic part is returning jsonify(error=str(e)), 400. The best fix, without changing the existing behavior of successful or validation flows, is:

  • Keep the logging as-is (so developers still see the details).
  • Replace the response body in the generic except Exception as e: block with a fixed, non-sensitive error message, such as "Internal error while sanitizing text".
  • Optionally, keep the status code 400 if the API contract depends on it, but the main security issue is the message content, not the status.

Concretely:

  • In presidio-analyzer/app.py, within the /defender/sanitize route, replace return jsonify(error=str(e)), 400 with a generic message like return jsonify(error="Internal error while processing sanitize request"), 400.
  • No new imports or helper methods are required; we reuse existing logging and jsonify.
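
One way to check a change like this is a small leak test: force the handler to fail with a recognizable string and confirm it never appears in the response. The sketch below is framework-free and hypothetical throughout: sanitize_request, failing_sanitizer, the "sanitized" key, and the fake secret are illustration only, with the sanitizer argument standing in for claw_sanitize.

```python
import logging

logger = logging.getLogger("sanitize_sketch")  # hypothetical logger name


def sanitize_request(text, sanitizer):
    """Hypothetical handler core; `sanitizer` stands in for claw_sanitize."""
    try:
        return {"text": text, "sanitized": sanitizer(text)}, 200
    except Exception as e:
        # Detail stays in the server log.
        logger.error(f"Error in /sanitize: {e}")
        # Fixed message: nothing derived from `e` is returned.
        return {"error": "Internal error while processing sanitize request"}, 400


def failing_sanitizer(_text):
    # Made-up sensitive detail used as a marker for the leak check.
    raise RuntimeError("db password=hunter2")


body, status = sanitize_request("hello", failing_sanitizer)
leaked = "hunter2" in str(body)
```

In a unit test, the assertion would be that leaked is False while the status code matches the API contract.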
Suggested changeset 1
presidio-analyzer/app.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/presidio-analyzer/app.py b/presidio-analyzer/app.py
--- a/presidio-analyzer/app.py
+++ b/presidio-analyzer/app.py
@@ -283,7 +283,7 @@
 
             except Exception as e:
                 self.logger.error(f"Error in /sanitize: {e}")
-                return jsonify(error=str(e)), 400
+                return jsonify(error="Internal error while processing sanitize request"), 400
 
         @self.app.route("/defender/scan", methods=["POST"])
         def scan() -> Tuple[str, int]:
EOF
Comment thread presidio-analyzer/app.py

except Exception as e:
    self.logger.error(f"Error in /defender/scan: {e}")
    return jsonify(error=str(e)), 400

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI 2 months ago

In general, the fix is to avoid returning raw exception details to the client. Instead, log the full exception (ideally with traceback) on the server side and send a generic, non-sensitive error message in the HTTP response. This preserves debuggability while preventing information exposure.

For this file, the best minimally invasive fix is:

  • In /defender/sanitize and /defender/scan handlers, keep logging the exception but change the JSON response to use a generic message such as "An internal error has occurred" rather than str(e).
  • Optionally, upgrade the logging call to include the full stack trace (exc_info=True), but without changing the existing logging configuration or imports.

Concretely:

  • Around line 284–287, replace self.logger.error(f"Error in /sanitize: {e}") with self.logger.error("Error in /sanitize", exc_info=True) (or keep the message) and change return jsonify(error=str(e)), 400 to return jsonify(error="An internal error has occurred"), 500 (a server error is more appropriate for an unhandled exception).
  • Around line 325–327, similarly replace the logging call with one that records the stack trace, and change return jsonify(error=str(e)), 400 to return jsonify(error="An internal error has occurred"), 500.

No new methods or imports are strictly required; we rely on the existing self.logger and jsonify.
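
The status-code distinction suggested here (400 for bad input, 500 for unhandled failures) can be sketched as a single mapping function. error_response is a hypothetical helper, not something present in app.py, and it returns a plain (body, status) pair rather than a Flask response.

```python
def error_response(exc: Exception):
    """Map an exception to a (body, status) pair with a fixed message."""
    if isinstance(exc, ValueError):
        # The client sent something invalid.
        return {"error": "Invalid input."}, 400
    # Anything else is treated as a server-side fault.
    return {"error": "An internal error has occurred"}, 500
```

A caveat with this sketch: a ValueError raised deep inside library code would also be reported as 400. The handlers in this PR avoid that by catching ValueError only around the validation call, which is the more precise approach.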

Suggested changeset 1
presidio-analyzer/app.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/presidio-analyzer/app.py b/presidio-analyzer/app.py
--- a/presidio-analyzer/app.py
+++ b/presidio-analyzer/app.py
@@ -282,8 +282,8 @@
                 return jsonify(results if batch_request else results[0]), 200
 
             except Exception as e:
-                self.logger.error(f"Error in /sanitize: {e}")
-                return jsonify(error=str(e)), 400
+                self.logger.error("Error in /sanitize", exc_info=True)
+                return jsonify(error="An internal error has occurred"), 500
 
         @self.app.route("/defender/scan", methods=["POST"])
         def scan() -> Tuple[str, int]:
@@ -323,8 +323,8 @@
                 return jsonify(response), 200
 
             except Exception as e:
-                self.logger.error(f"Error in /defender/scan: {e}")
-                return jsonify(error=str(e)), 400
+                self.logger.error("Error in /defender/scan", exc_info=True)
+                return jsonify(error="An internal error has occurred"), 500
 
         @self.app.errorhandler(HTTPException)
         def http_exception(e):
EOF
Comment thread presidio-analyzer/app.py

    return jsonify({"status": "ok"}), 200
except Exception as e:
    return jsonify({"status": "not ready", "reason": str(e)}), 503

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI 2 months ago

In general, the fix is to avoid returning raw exception details to the client. Instead, log the full exception (optionally with stack trace) on the server side, and return a generic, non-sensitive error message in the HTTP response.

For this specific case, the best fix is to change the /readyz endpoint’s except Exception as e: block so that:

  • It logs the exception using self.logger.exception(...) or self.logger.error(..., exc_info=True) so developers still have full diagnostic information.
  • It returns a generic reason string such as "Internal error" or "Unexpected error while checking readiness" rather than str(e).

Concretely, in presidio-analyzer/app.py within the Server.__init__ method, locate the /readyz route and the except Exception as e: at lines 108–109. Replace the body of that except block so that it logs the exception and then returns jsonify({"status": "not ready", "reason": "Internal error"}) (or similar), without using e or str(e) in the response. No new imports are required; logging is already configured and jsonify is already imported.
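
The readiness logic described above can be sketched without Flask as follows. The check callable is an assumption made for the example: it returns None when the service is ready and a reason string otherwise, which is not necessarily how the real /readyz route gathers its state.

```python
import logging

logger = logging.getLogger("readyz_sketch")  # hypothetical logger name


def readyz(check):
    """`check` is a hypothetical callable: None when ready, else a reason string."""
    try:
        reason = check()
        if reason:
            # Known, deliberately chosen reasons are safe to return.
            return {"status": "not ready", "reason": reason}, 503
        return {"status": "ok"}, 200
    except Exception:
        # logger.exception logs at ERROR level and includes the traceback.
        logger.exception(
            "Unexpected error while performing readiness check in /readyz"
        )
        return {"status": "not ready", "reason": "Internal error"}, 503
```

Reasons the code constructs itself remain visible to callers, while unexpected exceptions collapse to a fixed string.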

Suggested changeset 1
presidio-analyzer/app.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/presidio-analyzer/app.py b/presidio-analyzer/app.py
--- a/presidio-analyzer/app.py
+++ b/presidio-analyzer/app.py
@@ -105,8 +105,13 @@
                     return jsonify({"status": "not ready", "reason": reason}), 503
 
                 return jsonify({"status": "ok"}), 200
-            except Exception as e:
-                return jsonify({"status": "not ready", "reason": str(e)}), 503
+            except Exception:
+                self.logger.exception(
+                    "Unexpected error while performing readiness check in /readyz"
+                )
+                return jsonify(
+                    {"status": "not ready", "reason": "Internal error"}
+                ), 503
 
         @self.app.route("/analyze", methods=["POST"])
         def analyze() -> Tuple[str, int]:
EOF
aramikm and others added 13 commits March 4, 2026 14:39
Use hardcoded group names instead of github.workflow to ensure
ICS and POCs staging builds don't cancel each other.
Fix CI concurrency groups to prevent cross-workflow cancellation
Adds build-publish-image-ics-prod-ecr.yaml workflow that triggers on
_prod branch pushes, building the presidio-analyzer Docker image and
pushing to the ics-prod ECR registry.
Images are now published to ICS staging ECR only. The POCs staging
workflow is no longer needed.
…#13)

Triggers on pushes to the _test branch, builds presidio-analyzer with
Dockerfile.gliner-edge and publishes to ics-test/cheo-pii-analyzer
ECR namespace.
Update all three environment workflows (test, staging, prod) to push
images to the renamed pii-analyzer ECR repos per guardian-infra#379.

3 participants