added gliner onnx support #1
…workflow Add publish ECR GitHub workflow
…yLabs/presidio into gliner_integration
Rename existing workflow to build-publish-image-pocs-stg-ecr.yaml for consistency. Create new build-publish-image-ics-stg-ecr.yaml that publishes presidio-analyzer image to ics-stg/cheo-pii-analyzer ECR repo using ICS staging AWS credentials. Closes ProjectLibertyLabs/guardian-infra#88
…-image-to-ics-staging-ecr Add ICS staging ECR workflow for PII Analyzer
…points added claw defender endpoints
```python
try:
    item = self._validate_input_text(item)
except ValueError as ve:
    return jsonify(error=str(ve)), 400
```
Check warning (Code scanning / CodeQL): Information exposure through an exception, severity Medium
Copilot Autofix
AI 2 months ago
To fix the problem, the endpoint should not return the raw exception text (str(ve)) to the client. Instead, it should return a generic, controlled error message, while logging the exception server‑side for debugging. This aligns with the recommendation to avoid leaking stack traces or internal details.
Concretely, in `presidio-analyzer/app.py`, inside the `/defender/detect` route, the inner `except ValueError as ve:` block at lines 230–231 should be changed so that:
- The exception is logged using `self.logger` (which already exists and is used in the outer handler).
- The HTTP response body contains a generic validation error message that does not include `ve` content.

We can implement this by:
- Replacing `return jsonify(error=str(ve)), 400` with logging plus a generic message, e.g., `return jsonify(error="Invalid input text."), 400`.
- Optionally including a short, non-sensitive hint such as “Invalid input text.” or “Input validation failed.” which does not depend on `ve`.

No new imports, methods, or additional definitions are needed, because self.logger and jsonify are already available in this context.
```diff
@@ -228,7 +228,8 @@
             try:
                 item = self._validate_input_text(item)
             except ValueError as ve:
-                return jsonify(error=str(ve)), 400
+                self.logger.warning(f"Invalid input text in /defender/detect: {ve}")
+                return jsonify(error="Invalid input text."), 400

             if check_type == "url":
                 findings = claw_validate_url(item)
```
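Here is a minimal sketch of the suggested validation branch that does not need Flask. It models a view as a function that returns a `(body, status)` pair; the real handler wraps the body in `jsonify`. `validate_input_text` and `detect_one` are hypothetical stand-ins for `Server._validate_input_text` and the `/defender/detect` view, and the validation rule itself is invented for illustration.

```python
import logging
from typing import Any, Dict, Tuple

logger = logging.getLogger("defender.detect")


def validate_input_text(item: Any) -> str:
    """Hypothetical stand-in for Server._validate_input_text."""
    if not isinstance(item, str) or not item.strip():
        # This detailed message is useful in logs but should not reach clients.
        raise ValueError(f"expected a non-empty str, got {type(item).__name__}")
    return item


def detect_one(item: Any) -> Tuple[Dict[str, Any], int]:
    """Validate one item and return (json_body, status), like a Flask view."""
    try:
        item = validate_input_text(item)
    except ValueError as ve:
        # The detail goes to the server log only; %s defers formatting to logging.
        logger.warning("Invalid input text in /defender/detect: %s", ve)
        # The client always gets the same fixed string, independent of `ve`.
        return {"error": "Invalid input text."}, 400
    return {"text": item, "findings": []}, 200
```

The suggested diff interpolates `ve` with an f-string. This sketch passes `ve` as a logging argument instead, so the `logging` module formats the message only when the record is actually emitted.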
```python
except Exception as e:
    self.logger.error(f"Error in /detect: {e}")
    return jsonify(error=str(e)), 400
```
Check warning (Code scanning / CodeQL): Information exposure through an exception, severity Medium
Copilot Autofix
AI 2 months ago
In general, the fix is to avoid returning raw exception information to the client. Instead, log the detailed exception server-side and send a generic, non-sensitive error message in the HTTP response. This preserves debugging capability via logs but prevents potential attackers from learning about internal implementation details.
Concretely for this file, the best minimal change is:
- Keep the `self.logger.error(f"Error in /detect: {e}")` line as-is (or improve it later if desired).
- Change the response in the `except Exception as e:` block of the `/defender/detect` route to return a fixed, generic error string instead of `str(e)`. For example: `return jsonify(error="An internal error occurred while processing the request."), 400`.

We only need to edit presidio-analyzer/app.py inside the /defender/detect handler’s except block (around lines 252–254). No new methods or imports are required to implement this change, and existing functionality (HTTP status codes, JSON shape, etc.) will be preserved aside from hiding the internal error text.
```diff
@@ -251,7 +251,7 @@

     except Exception as e:
         self.logger.error(f"Error in /detect: {e}")
-        return jsonify(error=str(e)), 400
+        return jsonify(error="An internal error occurred while processing the request."), 400

 @self.app.route("/defender/sanitize", methods=["POST"])
 def sanitize() -> Tuple[str, int]:
```
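The suggested change still logs `e` but returns a constant string. Below is a sketch of the same idea that uses `logger.exception`, so the full traceback (not just `str(e)`) lands in the server logs. `detect_view` and its `process` callable are hypothetical, and a plain `(body, status)` tuple stands in for `jsonify`.

```python
import logging
from typing import Any, Callable, Dict, Tuple

logger = logging.getLogger("defender.detect")

GENERIC_ERROR = "An internal error occurred while processing the request."


def detect_view(process: Callable[[], Any]) -> Tuple[Dict[str, Any], int]:
    """Run `process`; turn any unexpected exception into a fixed error body."""
    try:
        return {"results": process()}, 200
    except Exception:
        # logger.exception logs at ERROR level and attaches the active traceback,
        # so hiding str(e) from the client loses no diagnostic detail.
        logger.exception("Error in /detect")
        # Status code 400 is kept to match the suggested diff.
        return {"error": GENERIC_ERROR}, 400
```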
```python
try:
    item = self._validate_input_text(item)
except ValueError as ve:
    return jsonify(error=str(ve)), 400
```
Check warning (Code scanning / CodeQL): Information exposure through an exception, severity Medium
Copilot Autofix
AI 2 months ago
In general, to fix this kind of issue you should avoid sending raw exception messages (which may contain stack traces or internal details) back to clients. Instead, log the detailed error server-side and return a generic, high-level description to the client, such as "Invalid input" or "An internal error occurred", possibly accompanied by a safe error code.
For this file, the best minimal fix is:
- In the `/defender/sanitize` route, change the inner `except ValueError as ve:` to:
  - Log `ve` (ideally with `exc_info=True` so the stack trace is recorded in logs).
  - Return a generic validation error message, not `str(ve)`.
- Likewise, in the outer `except Exception as e:` blocks for both `/defender/detect` and `/defender/sanitize`, avoid returning `str(e)` to the client; instead return a generic internal error message while logging `e` with full details.

This preserves existing functionality (the routes and behavior remain largely the same: still 400 on validation failure and 400 on other failures) but removes the exposure of internal exception text. Concretely:
- Around lines 230–231 in `detect`, replace `return jsonify(error=str(ve)), 400` with logging plus a generic "Invalid input" message.
- Around lines 252–254 in `detect`, replace the error return with a generic "An internal error has occurred" message and enhance the logging call to capture the traceback.
- Around lines 272–273 in `sanitize`, make the same change as for `detect`'s validation block.
- Around lines 284–286 in `sanitize`, make the same change as for `detect`'s general exception block.

No new methods are strictly required; we can reuse `self.logger`, which is already present. To improve logging, we can pass `exc_info` (either `True` or the caught exception instance) to the `self.logger` calls so stack traces go to the logs, not the client. No new imports are needed.
```diff
@@ -228,7 +228,8 @@
             try:
                 item = self._validate_input_text(item)
             except ValueError as ve:
-                return jsonify(error=str(ve)), 400
+                self.logger.warning("Validation error in /detect", exc_info=ve)
+                return jsonify(error="Invalid input."), 400

             if check_type == "url":
                 findings = claw_validate_url(item)
@@ -250,8 +251,8 @@
         return jsonify(results if batch_request else results[0]), 200

     except Exception as e:
-        self.logger.error(f"Error in /detect: {e}")
-        return jsonify(error=str(e)), 400
+        self.logger.error("Error in /detect", exc_info=e)
+        return jsonify(error="An internal error has occurred."), 400

 @self.app.route("/defender/sanitize", methods=["POST"])
 def sanitize() -> Tuple[str, int]:
@@ -270,7 +271,8 @@
             try:
                 item = self._validate_input_text(item)
             except ValueError as ve:
-                return jsonify(error=str(ve)), 400
+                self.logger.warning("Validation error in /sanitize", exc_info=ve)
+                return jsonify(error="Invalid input."), 400
             san_result = claw_sanitize(item)
             result = {
                 "text": item,
@@ -282,8 +284,8 @@
         return jsonify(results if batch_request else results[0]), 200

     except Exception as e:
-        self.logger.error(f"Error in /sanitize: {e}")
-        return jsonify(error=str(e)), 400
+        self.logger.error("Error in /sanitize", exc_info=e)
+        return jsonify(error="An internal error has occurred."), 400

 @self.app.route("/defender/scan", methods=["POST"])
 def scan() -> Tuple[str, int]:
```
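This autofix mentions returning a generic description "possibly accompanied by a safe error code". One way to do that is sketched here as a hypothetical helper that is not part of the PR. It generates a random ID, logs the ID together with the traceback, and returns only the ID and a fixed message. The `error_id` field name is an assumption, and status 400 mirrors the suggested diff.

```python
import logging
import uuid
from typing import Any, Dict, Tuple

logger = logging.getLogger("defender")


def internal_error_response(route: str, exc: BaseException) -> Tuple[Dict[str, Any], int]:
    """Log `exc` with its traceback; return a generic body plus an opaque error ID."""
    # A random ID reveals nothing about the failure but lets operators find the log line.
    error_id = uuid.uuid4().hex
    # exc_info accepts an exception instance (Python 3.5+), as the suggested diff uses.
    logger.error("Error in %s [error_id=%s]", route, error_id, exc_info=exc)
    return {"error": "An internal error has occurred.", "error_id": error_id}, 400
```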
```python
except Exception as e:
    self.logger.error(f"Error in /sanitize: {e}")
    return jsonify(error=str(e)), 400
```
Check warning (Code scanning / CodeQL): Information exposure through an exception, severity Medium
Copilot Autofix
AI 2 months ago
In general, to fix information exposure through exceptions, you should avoid returning raw exception objects or messages directly to clients. Instead, log the detailed error on the server side and return a generic, user-safe message (e.g., “An internal error occurred”) or a controlled validation message that you construct yourself.
For this specific endpoint (/defender/sanitize in presidio-analyzer/app.py), we already log the exception with self.logger.error(f"Error in /sanitize: {e}"), which preserves information for diagnostics. The problematic part is returning jsonify(error=str(e)), 400. The best fix, without changing the existing behavior of successful or validation flows, is:
- Keep the logging as-is (so developers still see the details).
- Replace the response body in the generic `except Exception as e:` block with a fixed, non-sensitive error message, such as "Internal error while sanitizing text".
- Optionally, keep the status code 400 if the API contract depends on it, but the main security issue is the message content, not the status.

Concretely:
- In `presidio-analyzer/app.py`, within the `/defender/sanitize` route, replace `return jsonify(error=str(e)), 400` with a generic message like `return jsonify(error="Internal error while processing sanitize request"), 400`.
- No new imports or helper methods are required; we reuse existing logging and `jsonify`.

```diff
@@ -283,7 +283,7 @@

     except Exception as e:
         self.logger.error(f"Error in /sanitize: {e}")
-        return jsonify(error=str(e)), 400
+        return jsonify(error="Internal error while processing sanitize request"), 400

 @self.app.route("/defender/scan", methods=["POST"])
 def scan() -> Tuple[str, int]:
```
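The fixed message never depends on `e`, so a quick regression check can raise exceptions with recognizable text and confirm that none of it reaches the response. In this sketch, `sanitize_view` and `leaks_exception_text` are hypothetical helpers (not part of the PR), and a `(body, status)` tuple stands in for `jsonify`.

```python
import logging
from typing import Any, Callable, Dict, List, Tuple

logger = logging.getLogger("defender.sanitize")

Response = Tuple[Dict[str, Any], int]
SANITIZE_ERROR = "Internal error while processing sanitize request"


def sanitize_view(process: Callable[[], Any]) -> Response:
    """Hypothetical view body mirroring the suggested except block."""
    try:
        return {"result": process()}, 200
    except Exception as e:
        # Logging is unchanged from the suggestion; only the response is fixed.
        logger.error(f"Error in /sanitize: {e}")
        return {"error": SANITIZE_ERROR}, 400


def leaks_exception_text(view: Callable[[Callable[[], Any]], Response], messages: List[str]) -> bool:
    """Return True if any raised exception message appears in the view's error body."""
    for msg in messages:
        def failing() -> Any:
            raise RuntimeError(msg)
        body, _ = view(failing)
        if msg in str(body):
            return True
    return False
```

In the real service, the same check would go through Flask's test client against `/defender/sanitize`. That wiring is omitted here.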
```python
except Exception as e:
    self.logger.error(f"Error in /defender/scan: {e}")
    return jsonify(error=str(e)), 400
```
Check warning (Code scanning / CodeQL): Information exposure through an exception, severity Medium
Copilot Autofix
AI 2 months ago
In general, the fix is to avoid returning raw exception details to the client. Instead, log the full exception (ideally with traceback) on the server side and send a generic, non-sensitive error message in the HTTP response. This preserves debuggability while preventing information exposure.
For this file, the best minimally invasive fix is:
- In the `/defender/sanitize` and `/defender/scan` handlers, keep logging the exception but change the JSON response to use a generic message such as "An internal error has occurred" rather than `str(e)`.
- Optionally, upgrade the logging call to include the full stack trace (`exc_info=True`), without changing the existing logging configuration or imports.

Concretely:
- Around lines 284–287, replace `self.logger.error(f"Error in /sanitize: {e}")` with `self.logger.error("Error in /sanitize", exc_info=True)` (or keep the message) and change `return jsonify(error=str(e)), 400` to `return jsonify(error="An internal error has occurred"), 500` (a server error is more appropriate for an unhandled exception).
- Around lines 325–327, similarly replace the logging call with one that records the stack trace, and change `return jsonify(error=str(e)), 400` to `return jsonify(error="An internal error has occurred"), 500`.

No new methods or imports are strictly required; we rely on the existing self.logger and jsonify.
```diff
@@ -282,8 +282,8 @@
         return jsonify(results if batch_request else results[0]), 200

     except Exception as e:
-        self.logger.error(f"Error in /sanitize: {e}")
-        return jsonify(error=str(e)), 400
+        self.logger.error("Error in /sanitize", exc_info=True)
+        return jsonify(error="An internal error has occurred"), 500

 @self.app.route("/defender/scan", methods=["POST"])
 def scan() -> Tuple[str, int]:
@@ -323,8 +323,8 @@
         return jsonify(response), 200

     except Exception as e:
-        self.logger.error(f"Error in /defender/scan: {e}")
-        return jsonify(error=str(e)), 400
+        self.logger.error("Error in /defender/scan", exc_info=True)
+        return jsonify(error="An internal error has occurred"), 500

 @self.app.errorhandler(HTTPException)
 def http_exception(e):
```
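The same `except` block now appears in both `/defender/sanitize` and `/defender/scan`. This sketch factors it into one place using a hypothetical `generic_500` decorator that is not part of the PR. A `(body, status)` tuple stands in for `jsonify`, and the decorator returns 500 as this suggestion proposes.

```python
import functools
import logging
from typing import Any, Callable, Dict, Tuple

logger = logging.getLogger("defender")

Response = Tuple[Dict[str, Any], int]


def generic_500(route: str) -> Callable[[Callable[..., Response]], Callable[..., Response]]:
    """Hypothetical decorator: log unhandled exceptions, return a fixed 500 body."""

    def decorator(view: Callable[..., Response]) -> Callable[..., Response]:
        @functools.wraps(view)  # keep __name__, which Flask uses as the default endpoint name
        def wrapper(*args: Any, **kwargs: Any) -> Response:
            try:
                return view(*args, **kwargs)
            except Exception:
                # exc_info=True attaches the traceback of the exception being handled.
                logger.error("Error in %s", route, exc_info=True)
                return {"error": "An internal error has occurred"}, 500

        return wrapper

    return decorator


@generic_500("/defender/scan")
def scan(payload: Dict[str, Any]) -> Response:
    # Hypothetical body: a missing "text" key raises KeyError, handled by the decorator.
    return {"scanned": payload["text"]}, 200
```

With Flask, `@self.app.route(...)` would go above `@generic_500(...)` so that the wrapped function is the one registered. The wrapper would also need to re-raise `werkzeug.exceptions.HTTPException`, so that `abort()` responses still reach the existing `@self.app.errorhandler(HTTPException)` handler.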
added: readyz and livez endpoints
```python
    return jsonify({"status": "ok"}), 200
except Exception as e:
    return jsonify({"status": "not ready", "reason": str(e)}), 503
```
Check warning (Code scanning / CodeQL): Information exposure through an exception, severity Medium
Copilot Autofix
AI 2 months ago
In general, the fix is to avoid returning raw exception details to the client. Instead, log the full exception (optionally with stack trace) on the server side, and return a generic, non-sensitive error message in the HTTP response.
For this specific case, the best fix is to change the `/readyz` endpoint's `except Exception as e:` block so that:
- It logs the exception using `self.logger.exception(...)` or `self.logger.error(..., exc_info=True)` so developers still have full diagnostic information.
- It returns a generic reason string such as "Internal error" or "Unexpected error while checking readiness" rather than `str(e)`.

Concretely, in presidio-analyzer/app.py within the Server.__init__ method, locate the /readyz route and the except Exception as e: at lines 108–109. Replace the body of that except block so that it logs the exception and then returns jsonify({"status": "not ready", "reason": "Internal error"}) (or similar), without using e or str(e) in the response. No new imports are required; logging is already configured and jsonify is already imported.
```diff
@@ -105,8 +105,13 @@
             return jsonify({"status": "not ready", "reason": reason}), 503

         return jsonify({"status": "ok"}), 200
-    except Exception as e:
-        return jsonify({"status": "not ready", "reason": str(e)}), 503
+    except Exception:
+        self.logger.exception(
+            "Unexpected error while performing readiness check in /readyz"
+        )
+        return jsonify(
+            {"status": "not ready", "reason": "Internal error"}
+        ), 503

 @self.app.route("/analyze", methods=["POST"])
 def analyze() -> Tuple[str, int]:
```
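This is a probe-shaped sketch of the suggested `/readyz` change. It assumes readiness is decided by a hypothetical `check` callable that raises on failure. The real endpoint's checks are not shown, nor is its earlier branch that returns a controlled `reason`. HTTP probes such as Kubernetes readiness probes judge success by status code, so a fixed `reason` string should not change probe behavior.

```python
import logging
from typing import Any, Callable, Dict, Tuple

logger = logging.getLogger("presidio-analyzer")


def readyz(check: Callable[[], None]) -> Tuple[Dict[str, Any], int]:
    """Readiness probe sketch; `check` is a hypothetical callable that raises when not ready."""
    try:
        check()
        return {"status": "ok"}, 200
    except Exception:
        # Equivalent to logger.error(..., exc_info=True); only meaningful inside an except block.
        logger.exception("Unexpected error while performing readiness check in /readyz")
        return {"status": "not ready", "reason": "Internal error"}, 503
```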
added graceful shutdowns
Use hardcoded group names instead of github.workflow to ensure ICS and POCs staging builds don't cancel each other.
Fix CI concurrency groups to prevent cross-workflow cancellation
Adds build-publish-image-ics-prod-ecr.yaml workflow that triggers on _prod branch pushes, building the presidio-analyzer Docker image and pushing to the ics-prod ECR registry.
Add prod ECR CI workflow for pii-analyzer
Images are now published to ICS staging ECR only. The POCs staging workflow is no longer needed.
…#13) Triggers on pushes to the _test branch, builds presidio-analyzer with Dockerfile.gliner-edge and publishes to ics-test/cheo-pii-analyzer ECR namespace.
remove openclaw references
remove openclaw from paths
Update all three environment workflows (test, staging, prod) to push images to the renamed pii-analyzer ECR repos per guardian-infra#379.
Change Description
Describe your changes
Issue reference
Fixes #XX
Checklist