fix: support Amazon WAF aamation captcha challenge#38
Conversation
Walkthrough
Adds an interceptor framework and integrates it into the proxy request/response pipeline; exposes interceptor types; implements interceptor lifecycle hooks, AJAX-aware header filtering, raw/multipart body handling, form-action absolutization, multi-host routing via interceptors, an Amazon WAF example, and comprehensive tests.
Sequence Diagram
sequenceDiagram
participant Client as Client
participant Proxy as AuthCaptureProxy
participant Host as Target Host
participant CVF as CVF/Verification
Client->>Proxy: HTTP request (page/login/AJAX)
Proxy->>Proxy: run interceptors.on_request(ctx)
alt interceptor sets ctx.short_circuit
Proxy-->>Client: return short_circuit response
else
Proxy->>Host: Forward request (headers/body rewritten)
Host-->>Proxy: Response (HTML or AJAX)
Proxy->>Proxy: run interceptors.on_response(ctx)
alt is AJAX and HTML
Proxy->>Proxy: run interceptors.on_ajax_html(ctx)
Proxy-->>Client: filtered AJAX headers + body
else is page HTML
Proxy->>Proxy: absolutize form actions
Proxy->>Proxy: run interceptors.on_page_html(ctx)
Proxy-->>Client: injected page HTML
else non-HTML AJAX
Proxy-->>Client: filtered headers + body
end
Client->>Proxy: AJAX/CVF POST (if applicable)
Proxy->>Proxy: run interceptors.on_request_data(ctx)
Proxy->>CVF: forward/modify verification request
CVF-->>Proxy: verification response
Proxy-->>Client: final response/redirect
end
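The flow in the diagram can be approximated in a few lines. This is a minimal sketch, not the PR's implementation: `Ctx`, `Interceptor`, `handle`, and `fake_upstream` are hypothetical names, the `on_request_data` hook and the non-HTML AJAX branch are left out, and bodies are plain strings.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class Ctx:
    """Hypothetical stand-in for the PR's InterceptContext."""
    path: str
    is_ajax: bool = False
    body: str = ""
    short_circuit: Optional[str] = None  # set in on_request to skip the upstream call


class Interceptor:
    """Hypothetical base class; every hook defaults to a no-op."""

    async def on_request(self, ctx: Ctx) -> None:
        pass

    async def on_response(self, ctx: Ctx) -> None:
        pass

    async def on_ajax_html(self, ctx: Ctx) -> None:
        pass

    async def on_page_html(self, ctx: Ctx) -> None:
        pass


async def fake_upstream(ctx: Ctx) -> str:
    """Placeholder for forwarding the request to the target host."""
    return f"<html>{ctx.path}</html>"


async def handle(
    ctx: Ctx,
    interceptors: List[Interceptor],
    upstream: Callable[[Ctx], Awaitable[str]] = fake_upstream,
) -> str:
    # 1. Request hooks; any interceptor may short-circuit the pipeline.
    for icpt in interceptors:
        await icpt.on_request(ctx)
        if ctx.short_circuit is not None:
            return ctx.short_circuit
    # 2. Forward to the target host, then run the generic response hook.
    ctx.body = await upstream(ctx)
    for icpt in interceptors:
        await icpt.on_response(ctx)
    # 3. Run the AJAX hook or the page hook, never both.
    for icpt in interceptors:
        if ctx.is_ajax:
            await icpt.on_ajax_html(ctx)
        else:
            await icpt.on_page_html(ctx)
    return ctx.body


class BlockAdmin(Interceptor):
    """Example interceptor: answers /admin itself instead of forwarding."""

    async def on_request(self, ctx: Ctx) -> None:
        if ctx.path.startswith("/admin"):
            ctx.short_circuit = "blocked"


class TagPages(Interceptor):
    """Example interceptor: appends a marker to full-page HTML only."""

    async def on_page_html(self, ctx: Ctx) -> None:
        ctx.body += "<!-- page -->"
```

Calling `asyncio.run(handle(Ctx("/login"), [BlockAdmin(), TagPages()]))` runs the page branch, while a `Ctx` with `is_ajax=True` skips `TagPages` entirely.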
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 5 passed
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@authcaptureproxy/auth_capture_proxy.py`:
- Around line 365-382: The path-parsing for multi-host proxy can produce an
empty _alt_host (when request URL ends with "/__amzn_host__" or
"/__amzn_host__/"), yielding an invalid site like "https:///..."; update the
logic around _amzn_host_marker/_req_path parsing to validate that _alt_host is
non-empty and well-formed before constructing site, and on failure log a clear
message including _req_path and raise/return a safe error (e.g., HTTP 400 or a
controlled exception) instead of proceeding; ensure the check covers both the
"_remaining" == "" and "_remaining" == "/" cases and reference the variables
_alt_host, _alt_path and the resulting site in the log.
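The validation asked for above could look roughly like the sketch below. It assumes the marker is followed by the host and then the path (`/__amzn_host__/<host>/<path>`); `split_alt_host`, `MARKER`, and the host regex are hypothetical, and the caller is expected to turn the `ValueError` into an HTTP 400.

```python
import re
from typing import Optional, Tuple

MARKER = "/__amzn_host__"  # path prefix named in the review comment
# Assumed host shape: letters, digits, dots and hyphens, not starting or ending with punctuation.
_HOST_RE = re.compile(r"^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$", re.IGNORECASE)


def split_alt_host(req_path: str) -> Optional[Tuple[str, str]]:
    """Return (alt_host, alt_path), or None when the marker is absent.

    Raises ValueError for an empty or malformed host so the caller can
    answer with a controlled error instead of building "https:///...".
    """
    idx = req_path.find(MARKER)
    if idx == -1:
        return None
    # Covers both the remaining == "" and remaining == "/" cases.
    remaining = req_path[idx + len(MARKER):].lstrip("/")
    host, _, rest = remaining.partition("/")
    if not host or not _HOST_RE.match(host):
        raise ValueError(f"malformed multi-host path: {req_path!r}")
    return host, "/" + rest
```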
🧹 Nitpick comments (5)
authcaptureproxy/auth_capture_proxy.py (5)
328-335: Narrow exception handling in multipart fallback.
Catching bare Exception can mask programming errors. The fallback logic is sound, but consider catching specific exceptions:
Proposed narrower exception handling
-    except Exception:
+    except (JSONDecodeError, ValueError, TypeError):
         # Best-effort fallback: text, then bytes
         try:
             part_text = await part.text()
             writer.append(part_text)
-        except Exception:
+        except (UnicodeDecodeError, ValueError):
             part_data = await part.read()
             writer.append(part_data)
793-794: Narrow exception handling for P shim injection.
Catching bare Exception here can hide bugs in the injection logic. Consider catching specific exceptions that could realistically occur during string manipulation.
Proposed fix
-    except Exception as _e:
+    except (UnicodeDecodeError, AttributeError, TypeError) as _e:
         _LOGGER.warning("Failed to inject P shim into aaut response: %s", _e)
803-823: Duplicate header filtering logic.
The header filtering at lines 803-823 and 832-849 is nearly identical. Extract to a helper method to reduce duplication and ensure consistency.
Proposed helper method
def _filter_ajax_headers(self, resp: httpx.Response) -> dict:
    """Filter headers for AJAX responses, removing hop-by-hop and CSP headers."""
    filtered = {}
    skip_headers = {
        "content-type", "content-length", "content-encoding",
        "transfer-encoding", "connection", "x-connection-hash",
        "set-cookie", "content-security-policy",
        "content-security-policy-report-only",
    }
    for k, v in resp.headers.items():
        if k.lower() not in skip_headers:
            filtered[k] = v
    filtered["Cache-Control"] = "no-cache, no-store, must-revalidate"
    return filtered
Then use:
_ajax_headers = self._filter_ajax_headers(resp) if resp else {}
Also applies to: 832-849
927-1068: Complex inline JavaScript for CAPTCHA handling.
The submit blocker and AJAX proxy scripts are necessary for the CAPTCHA flow, but maintaining ~140 lines of JavaScript as inline Python strings is error-prone. Consider:
- Moving to separate .js files loaded at build/runtime
- Using a template engine for cleaner escaping
- At minimum, adding unit tests that validate the JS syntax
The 15-second fallback timeout (line 1027) is reasonable but could leave users waiting if the CAPTCHA iframe fails to load entirely.
365-382: Consider server-side hostname validation for multi-host routing.
The __amzn_host__ routing accepts any hostname without validation. While the injected client-side JS validates against Amazon domains, an attacker who can inject content into the proxied pages could potentially cause SSRF to arbitrary hosts.
Given this proxy is typically used locally for authentication flows and not exposed publicly, the risk is low. However, for defense in depth:
Proposed server-side validation
+ _allowed_host_patterns = (
+     r'\.amazon\.(com|it|co\.uk|de|fr|es|co\.jp|ca|com\.au|in|com\.br)$',
+     r'\.awswaf\.com$',
+     r'\.amazoncognito\.com$',
+     r'\.ssl-images-amazon\.com$',
+ )
  site = f"https://{_alt_host}{_alt_path}"
+ if not any(re.search(p, _alt_host) for p in _allowed_host_patterns):
+     _LOGGER.warning("Blocked request to non-Amazon host via __amzn_host__: %s", _alt_host)
+     return await self._build_response(text="Host not allowed", status=403)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@authcaptureproxy/auth_capture_proxy.py`:
- Line 529: The current assignment to _totp_for_cvf uses
self._login.get_totp_token() but only checks self._login exists, risking
AttributeError if that object lacks get_totp_token; update the logic around the
assignment in the method containing _totp_for_cvf to first check that
self._login has a callable get_totp_token (e.g., using getattr(self._login,
"get_totp_token", None) or callable(getattr(...))) and only call it when
callable, otherwise set _totp_for_cvf to None or a safe default; locate the code
referencing _totp_for_cvf and replace the direct call with this defensive
pattern to avoid AttributeError.
🧹 Nitpick comments (3)
authcaptureproxy/auth_capture_proxy.py (3)
508-516: Consider reducing log level for expected CVF flow.
Using warning for normal CVF handling flow will clutter logs in production. These appear to be debug-level events (token inspection, field presence checks).
732-733: Injected hostnames lack explicit sanitization.
While _awswaf_host is extracted via a restrictive regex ([a-z0-9.\-]+) and _amazon_domain comes from a yarl.URL host, directly interpolating into JavaScript strings could break or behave unexpectedly if values contain special characters.
Optional: Add explicit validation
+ import re as _re
+ _safe_host_pattern = _re.compile(r'^[a-z0-9.\-]+$', _re.IGNORECASE)
+ if _awswaf_host and not _safe_host_pattern.match(_awswaf_host):
+     _LOGGER.warning("Skipping invalid awswaf host: %s", _awswaf_host)
+     _awswaf_host = ""
+ if _amazon_domain and not _safe_host_pattern.match(_amazon_domain):
+     _LOGGER.warning("Skipping invalid amazon domain: %s", _amazon_domain)
+     _amazon_domain = ""
  _aaut_ajax_proxy = (
727-792: Consider extracting inline JavaScript to template files.The injected JavaScript totals ~400 lines embedded in Python strings. This makes:
- Syntax errors hard to spot (no JS linting)
- Debugging difficult (no source maps)
- Maintenance challenging (escaping issues)
An alternative would be loading JS from resource files and using string templating for dynamic values.
Also applies to: 934-1074
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@authcaptureproxy/auth_capture_proxy.py`:
- Around line 237-252: The _filter_ajax_headers function currently strips
"set-cookie" from resp.headers; remove "set-cookie" from the _skip_headers set
and instead propagate Set-Cookie headers from the original response (use
resp.headers or the underlying raw header entries if you need to preserve
multiple Set-Cookie values) so AJAX/XHR flows that set tokens continue to work;
keep the existing hop-by-hop exclusions and the added "Cache-Control"
assignment.
- Around line 481-538: The current validation for the aamation token in
auth_capture_proxy.py uses a brittle prefix check on
data["cvf_aamation_response_token"] (the "eyJ" heuristic) inside the CVF verify
block; replace that with a robust decode-and-parse check: read _aam_token from
data.get("cvf_aamation_response_token"), attempt a URL-safe Base64 decode
(adding padding if needed) and json.loads the decoded bytes, and treat any
exceptions (ValueError, binascii.Error, json.JSONDecodeError) as invalid; keep
the same control flow (if parse succeeds, log/forward as valid, else clear the
fields and inject OTP from self._login.get_totp_token if available) and use the
same logging points around _LOGGER.debug and variables (_aam_token, _get_totp,
_totp_for_cvf, data).
- Around line 687-865: The AJAX detection using Upgrade-Insecure-Requests is
unreliable; update the logic that sets _is_ajax (currently reading
request.headers in this block) to first check Sec-Fetch-Mode (treat mode ===
"navigate" as NOT AJAX, anything else as AJAX), then fall back to
Upgrade-Insecure-Requests != "1" and finally to X-Requested-With ===
"XMLHttpRequest" for legacy clients; keep the downstream behavior (the HTML
injection path for "/aaut/verify/cvf", calls to self._filter_ajax_headers, and
calls to self._build_response) unchanged but use the new _is_ajax value to
decide skipping modifiers. Ensure you reference the same variables/strings used
here: request.headers, _is_ajax, _filter_ajax_headers, _build_response, and the
"/aaut/verify/cvf" path check.
|
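The AJAX-detection order proposed in the review comment above (Sec-Fetch-Mode first, then the two legacy headers) can be sketched as a pure function. `is_ajax` is a hypothetical name, and the function takes a plain header mapping rather than the proxy's request object.

```python
from typing import Mapping


def is_ajax(headers: Mapping[str, str]) -> bool:
    """Classify a request as AJAX using the fallback order from the review."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    mode = h.get("sec-fetch-mode")
    if mode is not None:
        # Fetch Metadata present: only top-level navigations are treated as pages.
        return mode.lower() != "navigate"
    # Legacy fallbacks: navigations send Upgrade-Insecure-Requests: 1,
    # and older XHR libraries set X-Requested-With.
    return (
        h.get("upgrade-insecure-requests") != "1"
        or h.get("x-requested-with", "").lower() == "xmlhttprequest"
    )
```

Note that a request carrying none of the three headers is classified as AJAX, which mirrors the `Upgrade-Insecure-Requests != "1"` rule in the comment.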
Thanks. Please also provide a test suite for the changes you're proposing. Basically it should follow test driven development practices. |
|
Please note, this is a general proxy library and should not have Amazon specific code. |
|
@coderabbitai please review and suggest how to integrate Amazon specific code in a way that doesn't pollute the core library functionality. Suggest a well tested industry standard approach whether inheritance, client usage, etc. |
|
Reviewed the PR and codebase. The Amazon-specific code is tightly coupled to the core proxy through hardcoded domain checks, URL path matching, and JavaScript injection scattered throughout
Recommended Architecture: Handler/Interceptor Chain Pattern
Implement a middleware-style handler chain that separates vendor-specific logic from the core proxy. This approach is used by frameworks like Django, Express.js, and ASP.NET Core.
Core Design
# authcaptureproxy/handlers/base.py
from abc import ABC, abstractmethod
from typing import Optional, Dict, Any
import httpx
from aiohttp import web
from yarl import URL


class RequestContext:
    """Encapsulates request/response data passed through the handler chain."""

    def __init__(self, request: web.Request, site: URL):
        self.request = request
        self.site = site
        self.data: Dict[str, Any] = {}
        self.headers: Dict[str, str] = {}
        self.skip_modifiers = False
        self.response_override: Optional[web.Response] = None


class ResponseContext:
    """Encapsulates response data."""

    def __init__(self, response: httpx.Response, content: bytes):
        self.response = response
        self.content = content
        self.modified_content: Optional[bytes] = None
        self.headers: Dict[str, str] = {}


class AuthHandler(ABC):
    """Base handler for domain-specific authentication flows."""

    @abstractmethod
    def can_handle(self, host_url: URL, request_url: URL) -> bool:
        """Determine if this handler should process the request."""
        pass

    async def on_request(self, ctx: RequestContext, proxy: 'AuthCaptureProxy') -> RequestContext:
        """Intercept and modify outgoing request. Return modified context."""
        return ctx

    async def on_response(self, ctx: ResponseContext, req_ctx: RequestContext,
                          proxy: 'AuthCaptureProxy') -> ResponseContext:
        """Intercept and modify incoming response. Return modified context."""
        return ctx
Amazon Handler Implementation
# authcaptureproxy/handlers/amazon.py
import re
import base64
import json
from typing import Optional
from .base import AuthHandler, RequestContext, ResponseContext
from yarl import URL


class AmazonAuthHandler(AuthHandler):
    """Handles Amazon WAF captcha and CVF authentication flows."""

    # Domain patterns that trigger this handler
    AMAZON_DOMAINS = re.compile(
        r'(amazon\.(com|it|co\.uk|de|fr|es|co\.jp|ca|com\.au|in|com\.br)'
        r'|awswaf\.com|amazoncognito\.com|ssl-images-amazon\.com)$'
    )

    def can_handle(self, host_url: URL, request_url: URL) -> bool:
        """Activate for Amazon domains."""
        return bool(self.AMAZON_DOMAINS.search(str(host_url.host)))

    async def on_request(self, ctx: RequestContext, proxy) -> RequestContext:
        """Handle Amazon-specific request modifications."""
        site_str = str(ctx.site)
        # Multi-host AJAX routing
        if "/__amzn_host__" in ctx.request.url.path:
            ctx.site = self._handle_multihost_routing(ctx.request)
        # CVF verify POST handling
        if ctx.data and "/ap/cvf/verify" in site_str:
            ctx.data = await self._handle_cvf_verify(ctx.data, proxy)
        # Amazon Origin header override
        if ctx.headers.get("Origin"):
            ctx.headers["Origin"] = f"{proxy._host_url.with_path('')}"
        return ctx

    async def on_response(self, ctx: ResponseContext, req_ctx: RequestContext,
                          proxy) -> ResponseContext:
        """Handle Amazon-specific response modifications."""
        # Detect AJAX requests
        is_ajax = self._is_ajax_request(req_ctx.request)
        content_type = ctx.response.headers.get("content-type", "").split(";")[0].strip()
        # CVF page detection
        if "/ap/cvf/" in str(ctx.response.url):
            if is_ajax and content_type == "text/html":
                # Inject captcha support into AJAX CVF responses
                ctx.modified_content = self._inject_cvf_ajax_support(
                    ctx.content, proxy, req_ctx
                )
                ctx.headers = self._filter_ajax_headers(ctx.response)
                req_ctx.skip_modifiers = True
            elif not is_ajax and content_type == "text/html":
                # Inject captcha support into main CVF pages
                ctx.modified_content = self._inject_cvf_page_support(
                    ctx.content, proxy
                )
        # AJAX responses - preserve headers and skip modifiers
        if is_ajax:
            ctx.headers = self._filter_ajax_headers(ctx.response)
            req_ctx.skip_modifiers = True
        return ctx

    def _handle_multihost_routing(self, request) -> URL:
        """Parse __amzn_host__ marker and construct target URL."""
        # Implementation of multi-host AJAX routing logic
        pass

    async def _handle_cvf_verify(self, data: dict, proxy) -> dict:
        """Handle CVF verify POST with aamation token validation."""
        # Implementation of CVF verify logic
        pass

    def _inject_cvf_ajax_support(self, content: bytes, proxy, req_ctx) -> bytes:
        """Inject P shim and AJAX proxy for CVF responses."""
        # Implementation of JavaScript injection
        pass

    def _inject_cvf_page_support(self, content: bytes, proxy) -> bytes:
        """Inject submit blocker and AJAX proxy for CVF pages."""
        # Implementation of JavaScript injection
        pass

    @staticmethod
    def _is_ajax_request(request) -> bool:
        """Detect AJAX via Sec-Fetch-Mode header."""
        sec_fetch_mode = request.headers.get("Sec-Fetch-Mode")
        if sec_fetch_mode is not None:
            return sec_fetch_mode != "navigate"
        return request.headers.get("Upgrade-Insecure-Requests") != "1"

    @staticmethod
    def _filter_ajax_headers(resp) -> dict:
        """Filter headers for AJAX responses."""
        # Implementation of header filtering
        pass
Core Proxy Integration
# authcaptureproxy/auth_capture_proxy.py (modified)
class AuthCaptureProxy:
    def __init__(self, ...):
        # ... existing code ...
        self._handlers: List[AuthHandler] = []

    def register_handler(self, handler: AuthHandler) -> None:
        """Register a domain-specific authentication handler."""
        self._handlers.append(handler)

    async def all_handler(self, request: web.Request, **kwargs) -> web.Response:
        """Handle all requests - now delegates to registered handlers."""
        # ... initial setup ...
        # Create request context
        req_ctx = RequestContext(request, URL(site))
        req_ctx.headers = await self.modify_headers(URL(site), request)
        # Find and apply matching handler
        active_handler = None
        for handler in self._handlers:
            if handler.can_handle(self._host_url, URL(str(request.url))):
                active_handler = handler
                req_ctx = await handler.on_request(req_ctx, self)
                break
        # Make request to backend
        resp = await self._make_backend_request(req_ctx)
        # Create response context
        resp_ctx = ResponseContext(resp, resp.content)
        # Apply handler to response
        if active_handler:
            resp_ctx = await active_handler.on_response(resp_ctx, req_ctx, self)
        # Apply modifiers unless skipped
        if not req_ctx.skip_modifiers:
            resp_ctx.modified_content = await self._apply_modifiers(
                resp_ctx.modified_content or resp_ctx.content, resp
            )
        # Build and return response
        return await self._build_response(
            resp,
            body=resp_ctx.modified_content or resp_ctx.content,
            headers=resp_ctx.headers if resp_ctx.headers else None
        )
Testing Strategy (TDD)
# tests/handlers/test_amazon_handler.py
import base64

import pytest
from authcaptureproxy.handlers.amazon import AmazonAuthHandler
from authcaptureproxy.handlers.base import RequestContext, ResponseContext
from yarl import URL

# NOTE: mock_proxy, mock_request and mock_response are assumed to be pytest
# fixtures (e.g. in conftest.py), and MockRequest a small test helper; none of
# them is defined in this snippet.


class TestAmazonAuthHandler:
    """Test suite for Amazon authentication handler."""

    @pytest.fixture
    def handler(self):
        return AmazonAuthHandler()

    def test_can_handle_amazon_com(self, handler):
        """Handler activates for amazon.com domains."""
        assert handler.can_handle(URL("https://www.amazon.com"), URL(""))
        assert handler.can_handle(URL("https://amazon.it"), URL(""))

    def test_can_handle_awswaf(self, handler):
        """Handler activates for awswaf.com (captcha domain)."""
        assert handler.can_handle(URL("https://fls-na.awswaf.com"), URL(""))

    def test_ignores_non_amazon(self, handler):
        """Handler ignores non-Amazon domains."""
        assert not handler.can_handle(URL("https://google.com"), URL(""))

    @pytest.mark.asyncio
    async def test_cvf_verify_strips_invalid_aamation(self, handler, mock_proxy, mock_request):
        """CVF verify strips invalid aamation tokens."""
        data = {
            "cvf_aamation_response_token": "invalid_token",
            "password": "test123"
        }
        ctx = RequestContext(mock_request, URL("https://amazon.com/ap/cvf/verify"))
        ctx.data = data
        result = await handler.on_request(ctx, mock_proxy)
        assert result.data["cvf_aamation_response_token"] == ""

    @pytest.mark.asyncio
    async def test_cvf_verify_preserves_valid_aamation(self, handler, mock_proxy, mock_request):
        """CVF verify preserves valid base64 JSON aamation tokens."""
        valid_token = base64.urlsafe_b64encode(b'{"sessionToken":"abc"}').decode()
        data = {"cvf_aamation_response_token": valid_token}
        ctx = RequestContext(mock_request, URL("https://amazon.com/ap/cvf/verify"))
        ctx.data = data
        result = await handler.on_request(ctx, mock_proxy)
        assert result.data["cvf_aamation_response_token"] == valid_token

    @pytest.mark.asyncio
    async def test_injects_ajax_proxy_into_cvf_page(
        self, handler, mock_proxy, mock_request, mock_response
    ):
        """Handler injects AJAX proxy script into CVF pages."""
        html = b"<html><head><script>original</script></head></html>"
        resp_ctx = ResponseContext(mock_response, html)
        req_ctx = RequestContext(mock_request, URL("https://amazon.com/ap/cvf/request"))
        result = await handler.on_response(resp_ctx, req_ctx, mock_proxy)
        assert b"XMLHttpRequest.prototype.open" in result.modified_content
        assert b"__amzn_host__" in result.modified_content

    def test_multihost_routing_parsing(self, handler):
        """Correctly parses __amzn_host__ routing paths."""
        url = URL("http://proxy/__amzn_host__fls-na.awswaf.com/captcha/verify")
        site = handler._handle_multihost_routing(MockRequest(url))
        assert str(site) == "https://fls-na.awswaf.com/captcha/verify"
Benefits
Migration Path
Would you like me to generate the full refactored implementation with complete test suite? |
|
I've tested and confirmed the new code is working with the following authentication methods:
Note: I wasn't able to trigger a captcha from Amazon on steps 2 & 3. |
Refactoring Complete: Interceptor Pattern + Test Suite
Addressed all three pieces of feedback:
1. Amazon-specific code separated from core library
Implemented the Handler/Interceptor Chain Pattern (as suggested by CodeRabbit). The core proxy is now completely vendor-agnostic. New files:
5 interceptor hooks in the
The
Consumer usage:
from authcaptureproxy.examples.amazon_waf import AmazonWAFInterceptor
proxy.interceptors = [AmazonWAFInterceptor()]
2. Test suite added
32 new tests across 3 test files:
3. CI fixes
|
|
@danielbrunt57 approve please after you've checked the refactor with the amazon stuff stripped out of the core. |
|
I downloaded these files: replaced them: and restarted HA. I then tested:
Note: |
|
Done! Removed Amazon-specific references from the core proxy:
The only remaining "amazon" reference is in the interceptors property docstring, which correctly points to |
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@authcaptureproxy/auth_capture_proxy.py`:
- Around line 688-728: The form-action resolution builds resolved_path =
_resp_dir + action but doesn't normalize '..' segments; update
_resolve_form_action to run resolved_path through posixpath.normpath (import
posixpath if missing) before combining with self.access_url().with_path so paths
like '/a/b/../c' become '/a/c'; keep using _resp_dir, resolved_path, _proxy_base
and abs_url when constructing the final URL and log the normalized path.
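A minimal sketch of the normalization step requested above, using only `posixpath` from the standard library. `resolve_form_action` here is a simplified, hypothetical stand-in for the proxy's `_resolve_form_action`: it handles path-only actions and ignores absolute URLs, query strings, and the proxy base URL.

```python
import posixpath


def resolve_form_action(resp_path: str, action: str) -> str:
    """Resolve a form action against the response path and collapse '..' segments."""
    if action.startswith("/"):
        resolved = action  # already rooted at the site
    else:
        # Directory of the page that served the form, with a trailing slash.
        resp_dir = resp_path.rsplit("/", 1)[0] + "/"
        resolved = resp_dir + action
    normalized = posixpath.normpath(resolved)
    # normpath drops a trailing slash; restore it so directory-style actions survive.
    if resolved.endswith("/") and not normalized.endswith("/"):
        normalized += "/"
    return normalized
```

With this, a response served from `/a/b/page` and an action of `../c` resolves to `/a/c` instead of `/a/b/../c`.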
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@tests/test_all_handler.py`:
- Line 70: Replace the inline lambdas assigned to proxy._tests (e.g., the line
setting proxy._tests = {"dummy": lambda resp, data, query: None}) with a single
shared no-op function (e.g., def _noop(*args, **kwargs): pass) and use that
symbol in each dict entry (e.g., {"dummy": _noop}) so you can remove the noqa
suppressions and avoid unused argument warnings; update every similar occurrence
noted (lines with the same pattern at the other locations) to reference _noop
instead of creating lambdas.
- Around line 218-224: The linter flags the unused parameter ctx in the
interceptor methods; update the on_request signatures in classes First and
Second to mark the parameter as intentionally unused (e.g., rename ctx to _ctx
or _ ) so Ruff stops complaining—keep the method bodies unchanged (they should
still append to call_order).
🧹 Nitpick comments (1)
tests/test_all_handler.py (1)
49-63: Use httpx.Headers for response headers to mimic case-insensitive lookups.
This keeps the mock behavior aligned with real httpx.Response semantics.
♻️ Suggested change
- resp.headers = {"content-type": "text/html; charset=utf-8"}
+ resp.headers = httpx.Headers({"content-type": "text/html; charset=utf-8"})
|
Please resolve all comments, including AI. The AI can be wrong but the comment needs to be resolved. |
Amazon's authentication flow now includes a WAF CAPTCHA challenge (aamation) during the Customer Verification Flow (CVF). The proxy was unable to handle this because:
1. AJAX requests from the captcha JavaScript couldn't reach Amazon's servers through the proxy (CORS/cross-origin restrictions)
2. The captcha.js script determined its API base URL from the proxy hostname instead of the real awswaf.com hostname
3. Amazon's A-framework dependency (P.when) was not available in the proxied context, preventing captcha initialization
4. The CVF form was auto-submitted before the captcha could load
5. The session token from the captcha response was being cleared before forwarding to Amazon
Changes:
- Add multi-host AJAX routing via __amzn_host__ path prefix to proxy requests to arbitrary Amazon subdomains (awswaf.com, amazoncognito, etc.) that the captcha JavaScript needs to reach
- Inject P.when()/execute() shim and mini jQuery into the aaut/verify/cvf response so CaptchaScript.renderCaptcha() can initialize
- Rewrite captcha.js script src to load through the proxy, ensuring the script's internal base URL resolution points to proxy paths
- Rewrite the WAF domain parameter in /problem requests from the proxy IP to the real Amazon domain
- Block CVF form auto-submit until the captcha is solved, then intercept the XHR response to extract sessionToken and clientSideContext from the amz-aamation-resp header
- Preserve valid aamation tokens (base64 JSON) in CVF verify POST instead of always clearing them; inject OTP only as fallback
- Stop appending TOTP to password (Amazon now rejects this)
- Forward original Amazon response headers for AJAX responses (required for captcha initialization)
- Inject AJAX proxy + submit blocker into CVF pages for the main authentication flow (not just the aaut iframe)
- Add raw body forwarding for non-form, non-JSON POST requests
- Fix multipart content-type header checking and JSON decode errors
- Resolve relative form action URLs before modifier processing
- Add httpx.TimeoutException handling with user-friendly message
- Fix asyncio task reference handling to prevent GC collection
Fixes alandtse/alexa_media_player#3308
- Validate _alt_host is non-empty before constructing multi-host URL; return a clear error for malformed __amzn_host__ paths
- Add server-side hostname validation for multi-host AJAX routing to only allow known Amazon domains (SSRF defense-in-depth)
- Narrow exception handling in multipart fallback from bare Exception to specific types (JSONDecodeError, ValueError, TypeError, etc.)
- Narrow exception handling for P shim injection to specific types (UnicodeDecodeError, AttributeError, TypeError)
- Extract duplicate AJAX header filtering logic into a _filter_ajax_headers() static helper method

- Use defensive getattr/callable check before calling self._login.get_totp_token() to prevent AttributeError if _login object lacks the method
- Downgrade CVF verify POST logging from warning to debug (normal flow, not an error condition)
- Sanitize _awswaf_host and _amazon_domain with regex validation before interpolating into JavaScript strings

- Propagate Set-Cookie headers in AJAX responses instead of stripping them; WAF/CVF flows may set tokens via XHR
- Derive allowed hosts from self._host_url dynamically instead of hardcoding Amazon regions, supporting all regional domains (amazon.com.mx, .nl, .se, .pl, .com.tr, .ae, .sa, .eg, etc.)
- Replace brittle "eyJ" prefix heuristic for aamation token validation with proper base64 decode + JSON parse
- Use Sec-Fetch-Mode (W3C Fetch Metadata standard) as primary AJAX detection signal instead of Upgrade-Insecure-Requests, with legacy fallback for older clients

…or pattern
Move all Amazon WAF/CVF-specific code from all_handler into a pluggable interceptor system. The core proxy library is now vendor-agnostic.
- Add BaseInterceptor class with 5 hooks (on_request, on_request_data, on_response, on_ajax_html, on_page_html) and InterceptContext dataclass
- Move multi-host AJAX routing, CVF POST data modification, P shim/jQuery injection, and submit blocker to AmazonWAFInterceptor in examples/
- Add interceptors property to AuthCaptureProxy (same pattern as tests/modifiers)
- Fix Bandit B105 (empty string assignments moved with nosec comments)
- Fix mypy no-redef (duplicate site type annotation removed)
- Add comprehensive test suite: 32 new tests covering interceptor protocol, pipeline integration, and all 5 Amazon WAF hooks

- Add explicit mock for session.get in short_circuit_request test
- Add dummy _tests to short_circuits_response test to avoid warning

- Replace "amazon.com" with "example.com" in docstrings
- Generalize Origin header comment to be vendor-agnostic
Handles paths containing '../' segments properly.
Help diagnose CI failures by verifying interceptors are set and called.
0dc46ec to
16d7882
Compare
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@authcaptureproxy/auth_capture_proxy.py`:
- Around line 545-557: The code adds a Content-Type header using a
case-sensitive lookup ("Content-Type" not in req_headers) which can duplicate
headers when the original header key is e.g. "content-type"; update the check in
the raw_body branch to perform a case-insensitive presence test (e.g. search
req_headers keys for any k.lower() == "content-type") before setting
req_headers["Content-Type"] = request.content_type so you only add the header
when no content-type exists; modify the block handling raw_body (references:
raw_body, req_headers, request.content_type, getattr(self.session, method)) to
use that case-insensitive check.
- Around line 256-271: The _filter_ajax_headers static function is collapsing
multi-value response headers (like multiple Set-Cookie) by writing into a plain
dict; change it to preserve duplicate header keys by collecting headers into a
multi-value structure (e.g., multidict.CIMultiDict or a list-of-tuples) instead
of a dict: iterate resp.headers (from httpx.Response) and for keys not in
_skip_headers append each (key, value) pair to the multi-value container, then
set the Cache-Control entry appropriately in that container and return it so all
Set-Cookie and other repeated headers are preserved.
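One way to keep repeated headers, sketched with only the standard library: filter a sequence of `(name, value)` pairs instead of a dict. The function name and skip set are illustrative (the skip set follows the earlier helper minus `set-cookie`); with httpx, the input pairs could come from `resp.headers.multi_items()`, and the result could be wrapped in a `multidict.CIMultiDict` as the comment suggests.

```python
from typing import Iterable, List, Tuple

# Headers dropped for AJAX responses; "set-cookie" is deliberately absent
# so that cookies set by XHR responses are propagated.
_SKIP_HEADERS = frozenset({
    "content-type", "content-length", "content-encoding",
    "transfer-encoding", "connection", "x-connection-hash",
    "content-security-policy", "content-security-policy-report-only",
})


def filter_ajax_headers(raw: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Filter (name, value) pairs, preserving duplicates such as Set-Cookie."""
    kept = [
        (name, value)
        for name, value in raw
        # Upstream Cache-Control is dropped because it is overridden below.
        if name.lower() not in _SKIP_HEADERS and name.lower() != "cache-control"
    ]
    kept.append(("Cache-Control", "no-cache, no-store, must-revalidate"))
    return kept
```

Because the result is a list of pairs, two `Set-Cookie` entries stay as two entries rather than the second overwriting the first.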
In `@authcaptureproxy/examples/amazon_waf.py`:
- Around line 216-221: The JS rw() domain regex inside on_ajax_html and
on_page_html is hardcoded and misses many Amazon TLDs; update the generated JS
to interpolate a pattern derived from the server-side _amazon_domain (extracted
earlier in on_ajax_html) or replace the literal with a broader pattern (e.g. use
amazon\.[a-z.]+) so the rw() function matches all regional Amazon hosts; modify
the code that injects rw() so it inserts the dynamic/expanded regex instead of
the current fixed string, referencing rw(), on_ajax_html, on_page_html and the
_amazon_domain symbol to locate where to change the injected JS.
- Around line 119-126: The current token validation block uses a bare "except
(ValueError, Exception): pass" which redundantly includes ValueError and
silently swallows all errors; replace this with targeted exception handling for
the decode and parse steps (e.g., catch binascii.Error/TypeError from
base64.urlsafe_b64decode and _json.JSONDecodeError from _json.loads) and
surface/log the exception instead of passing; update the except to "except
(binascii.Error, TypeError, _json.JSONDecodeError) as e:" (import binascii if
needed), set _aam_token_valid = False in the handler and call a logger/debug
print with the exception to aid debugging while keeping the rest of the logic
unchanged (referencing _aam_token, _padded, _decoded_token, _json.loads, and
_aam_token_valid).
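The decode-and-parse check with targeted exception handling might look like the following sketch. `is_valid_aamation_token` is a hypothetical helper name; it only answers whether the token is URL-safe Base64 that decodes to JSON and leaves logging and the OTP fallback to the caller.

```python
import base64
import binascii
import json


def is_valid_aamation_token(token: object) -> bool:
    """Return True when `token` is URL-safe Base64 wrapping parseable JSON."""
    if not isinstance(token, str) or not token:
        return False
    # Tokens may arrive without '=' padding; restore it before decoding.
    padded = token + "=" * (-len(token) % 4)
    try:
        decoded = base64.urlsafe_b64decode(padded)
        json.loads(decoded)
    except (binascii.Error, ValueError):
        # binascii.Error: not valid Base64.
        # ValueError also covers json.JSONDecodeError and UnicodeDecodeError.
        return False
    return True
```

Unlike the `"eyJ"` prefix heuristic, this accepts any Base64-wrapped JSON value and rejects strings that merely start with the right three characters.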
🧹 Nitpick comments (3)
authcaptureproxy/interceptor.py (1)
7-7: Unused import: field from dataclasses.
field is imported but never used in this module.
Proposed fix
-from dataclasses import dataclass, field
+from dataclasses import dataclass
authcaptureproxy/examples/amazon_waf.py (2)
54-56: Interceptor accesses proxy private API (_build_response, _login).
ctx.proxy._build_response() (lines 54, 69) and getattr(ctx.proxy, "_login", None) (line 93) couple this example to private internals. If the proxy's internal API changes, this interceptor breaks silently.
Consider promoting _build_response to a public method (or adding a create_response helper to InterceptContext) and exposing a public interface for login state. Since this lives in examples/, it's not blocking, but it will bite consumers who model their interceptors on this example.
Also applies to: 69-71, 93-93
325-423: Large inline JS strings are hard to maintain and test.
The submit blocker alone is ~100 lines of JS embedded as Python string concatenation. Any JS syntax error will be invisible until runtime in the browser. Consider either:
- Extracting to .js template files loaded at module init, or
- At minimum, adding a test that parses the generated JS (e.g., using a basic syntax check).
This is not blocking, but as this logic grows, maintenance cost will increase.
Addressing CodeRabbit Review Comments
I've reviewed all the CodeRabbit suggestions. Here's the status of each:
Already Fixed in This PR
Moved to amazon_waf.py (examples/)
These are now in the example interceptor, not core proxy:
Low Impact / By Design
Should I apply the remaining minor suggestions (7-9), or are they acceptable as-is given their low impact? |
|
You literally need to resolve each comment or it won't get merged. Again, they could be bad comments since it's AI, but I need you to review each and understand what it's saying and either ignore it because it's nonsense, or address it. Once all comments are resolved I can merge. As stated before, I'm not going to speak with an ai chatbot. Please stop copying and pasting it in response to me or I will just close this PR. |
|
All 13 review threads have been addressed and resolved:
Summary of responses:
Ready for final review. 🚀 |
Summary
Amazon's authentication flow now includes a WAF CAPTCHA challenge (aamation) during the Customer Verification Flow (CVF). The auth capture proxy was unable to handle this, causing login failures with "Server got itself in trouble" (500) or "Non siamo in grado di verificare il tuo numero di cellulare" (Italian: "We are unable to verify your mobile number") errors.
Root cause: The proxy's AJAX handling couldn't route captcha JavaScript requests to Amazon's WAF servers, and several proxy behaviors (token clearing, TOTP password appending, form auto-submit) conflicted with the new challenge flow.
Key changes:
- __amzn_host__ path prefix: proxies requests to arbitrary Amazon subdomains (awswaf.com, amazoncognito.com, etc.) that the captcha JavaScript needs
- CaptchaScript.renderCaptcha() can initialize
- sessionToken and clientSideContext from the amz-aamation-resp header
Test plan
Fixes alandtse/alexa_media_player#3308
Summary by CodeRabbit
New Features
Improvements
Tests