feat(channels/zalo): OA OAuth + webhook transport (#966)#1048
feat(channels/zalo): OA OAuth + webhook transport (#966)#1048
Conversation
…ode flow Phase 01 of plans/260419-2128-zalo-oa-oauth: introduces a third Zalo transport (phone-number-tied Official Account via OAuth v4) alongside the existing Bot OA and personal QR variants. New package internal/channels/zalo/oauth/ implements the authorization-code exchange against oauth.zaloapp.com; channel lifecycle is a stub until phase 02 wires lazy refresh and phase 03 wires Send. Also fixes a pre-existing bug where WS-side isValidChannelType was missing facebook + pancake (HTTP side already had them). Refs: #966
…fety ticker Phase 02 of plans/260419-2128-zalo-oa-oauth: adds the lazy-refresh token source mirroring DBTokenSource (single mutex serializes both cache reads and HTTP refresh — Zalo refresh tokens are single-use, so racing goroutines must not be allowed to issue concurrent refreshes). Adds a 30-min safety ticker so idle channels still keep their tokens alive between Sends. Loader now injects channel_instances.id via duck-typed SetInstanceID, reusing the same pattern as SetTenantID — needed by the refresh path to persist rotated credentials back to the same row. Refs: #966
Phase 03 of plans/260419-2128-zalo-oa-oauth: implements Channel.Send for the OAuth-based Zalo OA channel. Text goes straight to /oa/message/cs; image/file follow Zalo's 2-step upload-then-attach pattern with the upload `token` referenced in the message attachment payload. The send wrapper retries exactly once on auth-class API errors after calling tokens.ForceRefresh, so a token rotated externally between Send calls recovers transparently without an infinite loop. Caption + Content alongside an attachment ride as a separate trailing text message; if that trailing send fails after the attachment was delivered, returns ErrPartialSend so callers can distinguish from a full failure. Refs: #966
Phase 04 of plans/260419-2128-zalo-oa-oauth: implements the inbound path. A 15s ticker fans out to /v3.0/oa/listrecentchat then per-thread /conversation, deduplicates against an in-memory pollCursor (LRU at 500 entries), and publishes new messages via BaseChannel.HandleMessage with peerKind="direct" — Zalo OA has no groups. Cursor is debounced-persisted (60s) into channel_instances.config under a poll_cursor key, surviving channel restarts without replaying old messages. Loader-side instance ID injection (added in phase 02) is what makes the persistence path actually addressable. Top-K fan-out cap (20 threads/cycle) keeps the API request budget bounded under high-volume OAs. HTTP 429 trips a 30s backoff via ticker.Reset and flips channel health to Degraded; the next clean cycle restores Healthy and the original interval. Halt-on-reauth guard skips the API entirely when health is Failed/Auth so we don't hammer a dead token. Refs: #966
…ero-byte) Phase 05 of plans/260419-2128-zalo-oa-oauth: hardens the file outbound path with sanitizeFilename (strips path, falls back on dot-only/empty, caps length at 200, preserves unicode), zero-byte rejection, and an admin-opt-in MIME deny list (FlexibleStringSlice on the channel config, exact case-insensitive match — no glob). All three rejections fire BEFORE the HTTP upload, so a denied call never burns API quota or rotates the token. Sticker / quoted-reply / request_user_info template sends remain deferred — they need a channel-agnostic extension to bus.OutboundMessage that's tracked in the phase doc as follow-up. Refs: #966
Phase 06 of plans/260419-2128-zalo-oa-oauth: data-driven schema entries
for zalo_oauth credentials + config (rendered automatically by the
existing ChannelFields component) plus a new paste-code dialog that
drives the two-step OAuth flow:
1. Calls channels.instances.zalo_oauth.consent_url to fetch the
pre-built Zalo authorization URL (server-side keeps app_id
out of the masked instance payload).
2. User opens the URL, approves on Zalo, copies the redirect's
`code` query param, pastes it back, and the dialog calls
channels.instances.zalo_oauth.exchange_code which validates
the CSRF state token and persists the rotated credentials.
Registered as a reauthDialog so the row-level Connect / Re-auth
button is wired by the existing channel-table machinery — no new
button code in the page itself.
Strings are English literals with TODO i18n markers; phase 07
wires translations.
Refs: #966
Phase 07 of plans/260419-2128-zalo-oa-oauth: front-end i18n roll-up
for the paste-code dialog (15 zaloOauth.* keys translated to en/vi/zh,
verified key-set parity via jq) and dialog rewired to use
useTranslation("channels") + t() instead of the placeholder English
literals it shipped with in phase 06.
Backend i18n parity already enforced by internal/i18n/i18n_test.go;
11 MsgZaloOAuth* constants present in all 3 catalogs as expected
(10 from plan + MsgZaloOAuthInvalidState added during the phase-01
audit fix).
Migration evaluation: no schema delta. Tokens live in the existing
channel_instances.credentials BLOB; the cursor lives in the existing
config TEXT; the new "zalo_oauth" channel_type is just data in the
existing channel_type column. RequiredSchemaVersion stays at 55,
SQLite SchemaVersion stays at 24.
Refs: #966
Phase 08 of plans/260419-2128-zalo-oa-oauth: most unit tests already
shipped via TDD across phases 01-06. Fills 2 gaps:
- TestForceRefresh_ClearsCache: verifies tokenSource.ForceRefresh
zero-time math triggers refresh on next Access.
- TestPollOnce_AllowlistBlocksNonAllowedSender: proves the inbound
path goes through BaseChannel.HandleMessage's allowlist check,
not direct bus.Publish (phase-04 audit C1 verification).
Adds tests/integration/zalo_oauth_lifecycle_test.go covering the
full feature against real Postgres + a mocked Zalo API: row create
→ store-encrypts → Get round-trip → factory + Start → SendText →
ForceRefresh + send → invalid_grant → health flips Failed/Auth →
Stop bounded.
Two test-only methods (SetTestEndpointsForTest / ForceRefreshForTest)
are named with the *ForTest suffix per Go convention so accidental
production usage is grep-visible.
Also wires zalo_oauth into ui/web/src/constants/channels.ts so the
Add Channel dropdown actually shows the option — the dropdown is
hard-coded there, not derived from credentialsSchema keys. The
existing zalo_oa entry is relabeled "Zalo OA (Bot)" to disambiguate
the two channel types in the UI.
Refs: #966
Zalo enforces redirect_uri match against the dev-console-registered callback (error_code=-14003 "Invalid redirect uri"). The hardcoded placeholder shipped in phase 06 will never satisfy that check, so any operator who didn't pre-register https://oa.local/zalo_oauth_callback (i.e. everyone) was blocked at the consent screen. Adds redirect_uri to ChannelCreds + the credentials wizard schema as a required text field. The consent_url WS handler now reads it from creds with the old hardcoded URL as a fallback (so existing rows without the field don't break, even though they'll still hit the -14003 error until the operator fills it in). Refs: #966 (post-smoke fix)
Previously the zalo_oauth consent flow was only reachable by clicking the auth icon on the row after Create. Users expect the same flow as whatsapp / zalo_personal where Create transitions directly into the authentication step of the wizard. Splits the paste-code state machine into a shared hook (use-zalo-oauth-connect) + a shared view (zalo-oauth-connect-body), so both entry points (wizardAuthSteps + reauthDialogs) reuse the same logic and can diverge only in their action-button layout. wizardConfig.zalo_oauth.steps = ["auth"] wires the transition; the reauth dialog is unchanged for operators who dismiss the wizard and want to complete authentication later from the row. Refs: #966 (post-smoke UX fix)
Two fixes surfaced during the first live Connect attempt:
1. Zalo's live OAuth endpoint returns expires_in as a quoted string
("3600"), not a number. This broke the exchange with a JSON
unmarshal error. Introduces a flexSeconds type whose custom
UnmarshalJSON strips surrounding quotes before ParseInt, so
both shapes decode correctly. TestTokenResponseShape gains a
second assertion for the string form.
2. Zalo redirects to the operator's callback with a long URL of
the form https://example.com/zalo-callback?oa_id=...&code=iYP...
&state=db8... — operators shouldn't have to extract code by
hand. Adds an extractCode helper that accepts either a raw
code or the full URL; when a URL is pasted, we pull out the
code param and opportunistically compare the URL's state to
the one we stashed from consent_url. Placeholder + help text
updated in all 3 locales.
Refs: #966 (post-smoke fixes)
Zalo's POST /v4/oa/access_token returns tokens but not oa_id — that rides back in the redirect URL query string alongside `code` and `state`. Without it, creds.OAID stayed empty after a successful exchange, so the post-reload Channel.Start() kept marking the instance Degraded with "awaiting consent" / "no oa_id yet — paste consent code to authorize" — even though the tokens WERE valid. extractCode now returns oaID alongside code. The hook forwards it as an optional oa_id param on the exchange_code WS call. The handler persists it onto creds before Marshal → store.Update, so the reloaded Channel sees a non-empty OAID and goes Healthy. Refs: #966 (post-smoke fix)
Every poll cycle was 404-ing against Zalo with "You are accessing an
empty or invalid API". Two root causes, fixed together because they
rhyme:
1. The phase-04 plan assumed listrecentchat/getconversation lived
at /v3.0/oa/* per the ChickenAI SDK note that "v2.0 is
discontinued". Live Zalo OA disagrees — those read endpoints
remain on /v2.0/* (nh4ttruong/zalo-oa-api-wrapper confirms).
Paths moved to /v2.0/oa/getlistrecentchat and /v2.0/oa/getconversation
with the v2.0 convention of packing GET params into ?data={json}.
2. access_token was riding as a URL query param; Zalo OA's OpenAPI
expects it in an `access_token` header. Query-form token gets
the 404 "empty or invalid API" response. Moved to the header
everywhere (apiGet, apiPost, apiPostMultipart). Side benefit:
the token no longer appears in any URL, eliminating the
phase-03 L4 concern about *url.Error leaking the token through
DNS/TLS errors.
Send (/v3.0/oa/message/cs) and upload (/v2.0/oa/upload/*) paths are
unchanged — those already matched the wrapper.
Refs: #966 (post-smoke fix — surfaced by repeated poll_failed 404s
in docker logs)
The phase-04 plan said `listrecentchat` / `conversation` live at `/v3.0/*`,
and my first post-smoke fix flipped to `/v2.0/*` AND added a `get` prefix
(as nh4ttruong/zalo-oa-api-wrapper uses for `user/getlist`). Both guesses
were wrong.
Empirically probed all four variants against live Zalo — unprefixed v2.0
paths return `error:-216 Access token is invalid` (endpoint exists, just
wants a valid token) while the other three return the generic `error:404`
endpoint-not-found response. So the correct paths are:
GET /v2.0/oa/listrecentchat?data={json}
GET /v2.0/oa/conversation?data={json}
No get prefix. Send (/v3.0/oa/message/cs) and upload (/v2.0/oa/upload/*)
paths remain unchanged — those match the wrapper.
Refs: #966 (post-smoke fix, verified against live API)
Live smoke found the poll path looping forever on -216 "Access token has expired (-155)" because the ticker just logged and retried with the same stale token. Channel.post already does a retry- once-on-auth via ForceRefresh; poll now mirrors the same pattern for the listRecentChat call. Zalo's documented token lifetime is 24h but operators have seen the token rejected well inside that window (likely revoked app-side on certain operations). Refresh-on-reject keeps the channel alive without waiting for the 30min safety ticker to catch up. If the second attempt also auth-fails, the existing downstream path (ErrAuthExpired → markAuthFailedIfNeeded) flips health to Failed/Auth so the operator sees a clear re-auth prompt instead of silent looping. Refs: #966 (post-smoke fix)
Empirically verified via Zalo's API explorer that /v2.0/oa/listrecentchat
returns the last N messages across all users, with each data[] item
shaped as:
{from_id, from_display_name, from_avatar,
to_id, to_display_name, to_avatar,
message_id, type, message, time}
Two bugs in the phase-04 design:
1. The `message` Go struct read the text field as "text" — Zalo's
actual field name is "message". Every inbound was silently dropped
at the `if m.Text == ""` guard in dispatchInbound, even when the
HTTP call succeeded.
2. Modeled as "threads → per-thread getconversation fan-out", but
listrecentchat already returns messages. getConversation + thread
type + topKThreads cap are all dead. Simplified pollOnce to iterate
messages directly, filter OA echoes (from_id == oa_id), dedup per-
user by time.
Also:
- Added display-name passthrough in metadata.sender_display_name.
- DM chatID derives from from_id (Zalo OA is DM-only).
- Removed /v2.0/oa/conversation test routing and per-thread call tracking.
- Removed TestPollOnce_TopKThreadsCap (concept no longer applies).
Refs: #966 (post-smoke fix, verified via Zalo API explorer 2026-04-20)
…ll auth errors
Live smoke caught a silent drift: Zalo was rejecting the access_token
(-216 / -155 "Access token has expired") but the channel dashboard
stayed green because MarkHealthy was set at Start() and nothing ever
downgraded it. Operator saw "healthy" + grep-able WARN logs but no
visible re-auth prompt.
runPollLoop now calls markAuthFailedIfNeeded after every poll failure.
The helper was extended to treat *APIError.isAuth() as a Failed/Auth
trigger in addition to ErrAuthExpired:
- ErrAuthExpired: refresh-token dead (existing behavior)
- *APIError.isAuth() surviving retry-once-on-auth: access-token
rejected even after ForceRefresh, which usually means the OA-app
association is broken and needs operator re-consent
Either path now flips the row to red "Re-authenticate" in the UI, so
operators get the visible cue they were missing.
Refs: #966 (post-smoke fix, spotted by operator: "logs showed expired
but dashboard still green")
Operator's agent tried to reply with an image attachment and saw "⚠️ Failed to deliver message." The image upload 404'd against /v3.0/oa/upload/image; Channel.Send bubbled the error and upstream sent the fallback text. Empirical probe confirmed only /v2.0/oa/upload/image and /v2.0/oa/upload/file exist; the v3.0 variants are 404. The message- send endpoint /v3.0/oa/message/cs stays unchanged (it's the one path that genuinely IS on v3.0). Moved uploadImagePath + uploadFilePath constants to /v2.0/*. Test fixtures updated accordingly. Refs: #966 (post-smoke fix — bot "Failed to deliver" after agent generated an image and tried to attach it)
…l cap
Zalo OA's upload endpoint hard-rejects files over 1MB with
error -210 "file is invalid. The file must be smaller than or equal 1MB".
The previous default of 10MB let uploads through to the HTTP call
before Zalo bounced them — wasted API budget and confusing error
for operators.
Changing the default to 1MB rejects oversized files BEFORE the upload
attempt, with a message that explicitly names Zalo's -210 cap so the
operator understands this isn't a bug in our code.
Operators who want to send bigger media have two options, both out
of scope for this commit:
(a) compress images client-side before upload
(b) use Zalo's URL-attachment template (image-by-URL instead of
file upload) — requires a public HTTPS endpoint serving the
workspace files
Refs: #966 (post-smoke fix — "⚠️ Failed to deliver" was a 1.36MB
AI-generated PNG rejected by Zalo's 1MB cap)
… routing
Zalo OA enforces distinct format + size caps per upload endpoint
(confirmed against the official docs the operator pointed us to):
- /v2.0/oa/upload/image : JPG/PNG only, ≤1MB
- /v2.0/oa/upload/gif : GIF only, ≤5MB
- /v2.0/oa/upload/file : PDF/DOC/DOCX only, ≤5MB
AI-generated PNGs routinely exceed 1MB, so the outbound path now
auto-compresses images before the upload call. Pipeline:
1. Decode input bytes (png / jpeg / webp).
2. Resize progressively (longest side 1600 → 1200 → 900 → 600).
3. At each size, JPEG-encode with quality ladder 85 → 35.
4. Return the first encoding that fits under the cap.
5. Fail loudly with original + compressed dims if even the lowest
quality at the smallest size exceeds the cap.
Channel.Send routes by MIME before upload:
- image/gif → SendGIF (new) → /upload/gif
- image/* → compressForZaloImage + SendImage → /upload/image
- PDF/DOC/DOCX → SendFile → /upload/file (size pre-check at 5MB)
- anything else → reject at the dispatcher
Also tightens the stat-first check to 50MB (absolute OOM guard) so
compression gets a chance to run on mid-sized files. Operators who
set cfg.MediaMaxMB still see it honored on the file path — images
always compress-to-fit.
Adds github.com/disintegration/imaging + golang.org/x/image/webp for
the compression pipeline (already pulled in via agent/media_sanitize).
Three new unit tests cover the compressor's passthrough, shrink-over-
cap, and decode-error paths.
Refs: #966 (post-smoke fix — bot "Failed to deliver" on 1.36MB PNG)
Live smoke caught periodic poll_failed with "Client.Timeout exceeded
while awaiting headers" — the 15s default wasn't enough for Zalo's
tail latency under load, and the default transport was letting
idle connections go stale between poll cycles.
Client.Timeout bumped to 30s. Custom http.Transport with:
- bounded MaxIdleConns/MaxIdleConnsPerHost (10/4) so we don't
keep a connection pool bigger than the poll cadence needs
- IdleConnTimeout = 60s so stale connections get evicted before
the next 15s poll can try to reuse a dead one
- TLSHandshakeTimeout = 10s so a slow TLS dial aborts fast
- ForceAttemptHTTP2 = true (matches Zalo's actual offering)
- ProxyFromEnvironment so HTTPS_PROXY honored if set
Also bumped the runPollLoop per-cycle ctx to 45s so it always
outlives the 30s client timeout — otherwise the ctx would fire
first and mask the real "slow upstream" signal with a generic
"context deadline exceeded".
Refs: #966 (post-smoke fix — "Client.Timeout exceeded while awaiting
headers" recurring in poll cycles)
…aw on empty token Live smoke showed compression working (1.26MB PNG → 110KB JPEG) but the upload itself came back with HTTP 200 + empty data.token. Digging in: my uploadImage passed multipart filename "image" (no extension), and Zalo's /v2.0/oa/upload/image seems to use the filename extension to validate payload type — strip the extension and the endpoint accepts the bytes but returns no token. Fix: SendImage now forwards the real MIME through to uploadImage, which picks "image.jpg" for image/jpeg and "image.png" for image/png. Also: parseUploadToken now embeds a 500-char prefix of the raw response in its error when data.token is missing, so the next-time triage sees what Zalo actually returned instead of the generic "missing data.token". Refs: #966 (post-smoke fix — "upload response missing data.token" after successful compression)
Live smoke captured Zalo's actual upload response:
{"data":{"attachment_id":"1I5sCR-..."}, "error":0, "message":"Success"}
Both the plan and our code used "token" (inherited from the ChickenAI
SDK documentation). The real wire name is `attachment_id`, which is
why uploads succeeded (HTTP 200 + error=0) but our parser returned
"missing data.token" and the send never happened.
Fix threads the correct name through:
- parseUploadToken renamed to parseUploadAttachmentID. Primary
field is `attachment_id`; keep `token` as a legacy fallback for
defensive forward-compat if Zalo ever adds an alias.
- SendImage, SendFile, SendGIF outbound payloads now carry
`{"attachment_id": id}` instead of `{"token": id}` — matches
what the upload endpoint returned and what Zalo's own send
endpoint almost certainly expects.
Test fixtures updated in bulk (8 replacements).
Refs: #966 (post-smoke fix — raw response added by previous commit
made this immediately diagnosable)
Zalo returns -201 "Params is invalid" on the simple
{"attachment":{"type":"image","payload":{"attachment_id":"..."}}} shape.
Confirmed against the nh4ttruong/zalo-oa-api-wrapper Python reference
and the esms.vn integration docs: /v3.0/oa/message/cs wants the
template/media shape for uploaded media:
{
"recipient": {"user_id": "..."},
"message": {
"attachment": {
"type": "template",
"payload": {
"template_type": "media",
"elements": [{
"media_type": "image", // or "gif"
"attachment_id": "<upload-id>" // or "url" for URL-based
}]
}
}
}
}
Extracted the common construction into buildMediaAttachmentBody so
image + gif reuse the shape. SendFile stays on its original simple
form {"type":"file","payload":{"attachment_id":"..."}} — that's what
the file endpoint wants (template/media is image-only).
Test assertions updated to walk the template→payload→elements→
media_type/attachment_id path.
Refs: #966 (post-smoke fix — -201 after attachment_id wire-up)
Sources:
nh4ttruong/zalo-oa-api-wrapper/dependencies/messages.py
developers.esms.vn zalo send-with-photo docs
…/GIF/File TDD anchor for Phase 02 Workstream A consolidation. Locks exact JSON bytes each Send* function produces against /v3.0/oa/message/cs so the upcoming builder unification (A3) cannot silently drift. JSON-canonicalizes captured request body + testdata expected-shape via unmarshal + remarshal (Go sorts map keys deterministically) so field order differences don't cause false failures but structural differences do. Fixtures cover text, image (template/media shape), gif (template/media shape, media_type=gif), file (plain type=file shape).
…nts.go Consolidate scattered URL literals + path constants into one file. Before: defaultAPIBase/defaultOAuthBase in api.go, sendMessagePath in send.go, uploadImagePath/uploadFilePath/uploadGIFPath in upload.go, and /v2.0/oa/listrecentchat + /oa/access_token inlined in poll.go/auth.go. Now: all eight constants live in endpoints.go grouped by API family (v3.0 send, v2.0 read+upload, v4 OAuth) with comments explaining why each version prefix is load-bearing. Pure consolidation — no endpoint values changed. A5 fixture test continues to pass, confirming no wire-shape drift.
Introduce errors.go registering 5 observed code families with exact Zalo semantics preserved: access-token-invalid (216/-216/401/-401 family), invalid_grant (-118), params-invalid (-201), file-size-exceeded (-210), invalid-redirect-uri (-14003). Current-use: - api.go isAuth() switches from inline `case 216, -216, 401, -401:` to the named helper isAccessTokenInvalid(code). Same semantics, named. - Other codes registered for future code-based routing; today several are detected via substring (classifyRefreshError) or appear only in error-message strings. Documented in each constant's comment. NOT introducing -155 per audit — it is absent from the live codebase.
Extract the 4 body-construction shapes from SendText/SendImage/SendGIF/ SendFile into three named builders living together in send.go: - buildTextBody — text-only shape - buildMediaAttachmentBody — template/media shape (image + gif) - buildFileAttachmentBody — plain type=file shape Each Send* now reads as a one-liner: upload (when applicable) → build → post. Drift between the 4 shapes is obvious on read. No byte change — A5 wire-shape fixture test (send_fixture_test.go) stays green, confirming every outbound request is still byte-identical to pre-refactor. NOTE: caption+attachment single-request flow (merge the current "trailing text as second message" round-trip) is out of scope — plan defers to a follow-up. Builders intentionally don't accept text params so no dead code is introduced.
Zalo's per-endpoint upload caps (image 1MB, file 5MB, gif 5MB) are
hard-enforced by Zalo itself with error -210. The config.MediaMaxMB
override was never read after the New() clamp — pure dead code.
FileDenyMIME was a belt-and-suspenders duplicate of the narrower
isZaloSupportedFileMIME allow-list; removing it collapses two layers
into the single authoritative source.
Removed:
- ZaloOAuthConfig.MediaMaxMB + FileDenyMIME fields
- oauth/channel.go defaultMediaMaxMB const + clamp
- oauth/send.go isMIMEDenied helper + SendFile denial branch
- SendFile's mime param (caller did the supported-type check already)
- MsgZaloOAuthFileDenied i18n key + 3 catalog entries
- TestNew_DefaultMediaMaxMB + TestSendFile_{RejectsDeniedMIME,PassesAllowedMIME}
- file_deny_mime + media_max_mb from the zalo_oauth frontend schema
Config backcompat: existing channel_instances JSON payloads carrying
stale file_deny_mime / media_max_mb keys load cleanly — Go json
silently ignores unknown fields.
A5 fixture test still green.
Single env-gated slog.Debug line in doRequest + postForm captures the raw Zalo response body for every API call. Default off (zero runtime cost — just a bool check). Set GOCLAW_ZALO_OA_TRACE=1 to turn on. Rationale: Zalo has shipped field-name changes mid-release before (token → attachment_id) and debugging required live smoke. With this trace, 5 minutes of production logs give definitive triage data. SECURITY: response bodies may contain PII (user IDs, display names, phone numbers). Documented inline. Do NOT enable in production without scrubbing review. Cached at init — flipping the env live requires a restart, which keeps the hot path allocation-free.
…ation Create dialog flagged "Required: Webhook Path" even when Ingestion Mode was Polling because the required filter ignored showWhen visibility. Extracted isFieldVisible helper, applied to both creds and config submit checks, and reused it in the renderer to drop duplicated showWhen logic. Credentials check now also pulls config values into the visibility context so cross-schema dependencies (e.g. webhook_secret on transport) resolve correctly.
Polling works out of the box without a public endpoint and has been the reliable path while Zalo Bot Platform's webhook delivery remains opaque. Mark Polling (recommended) and reorder so it's selected by default in the Create dialog.
Zalo Bot Platform rejects getUpdates with HTTP 400 while a webhook URL is registered. Channels that switch from webhook→polling (or polling channels created against a bot that previously had its webhook set out- of-band) loop on the 400 forever. Call deleteWebhook on Start before spinning up the polling loop. Best-effort: log and continue on error; the polling loop's first call would surface the same condition anyway.
…ten lifecycle - webhook_signature_mode default flips strict→disabled so a freshly created OA channel processes events before the operator pastes the Khoá bí mật OA. Operators opt into log_only (migration) or strict. - Reaction tombstone replaced time.AfterFunc with a goroutine guarded by reactionWG and stopCh so Stop drains pending tombstones. - Token source / webhook attachments / signature tests cleaned up alongside. - i18n help text in en/vi/zh updated to match the new defaults and reflect Zalo's listrecentchat page-size cap (10).
- Replace string-grep error checking with errors.As pattern matching - Add APIError struct with Code field for semantic error handling - Update poll.go to use isAPIErrCode() instead of substring search
- Add NewSSRFSafeClient() with DialContext IP validation + redirect checks - Prevents DNS rebind TOCTOU and 3xx-to-private-IP bypasses - Use in zalo OA media download instead of plain http.Client
- poll_count: 50 → 10 (Zalo hard-caps at 10; error -210 if exceeded) - poll_burndown_max_pages: 5 → 10 (allow longer burndown cycles) - Update message ceiling math: 250 → 100 messages/cycle
- Privacy cleanup: dataplanelabs.com → example.com in test fixtures
- channel-advanced-dialog: filter unknown keys (e.g. removed webhook_url) out of existingConfig before MergeConfig so legacy DB rows stop being re-posted - channels-status-utils + contacts-page: add zalo_bot to label map and channel-type filter list - use-zalo-oa-connect: per-effect cancelled flag on consent fetch closes open→close→open race that aliveRef alone didn't cover
Replace hardcoded 169.254.169.254 with parameterized checks in SSRF validation. CIDR-based link-local and .internal suffix matching remain unchanged, preserving attack surface coverage.
This reverts commit 2efae27. 169.254.169.254 is the public, universally-documented cloud-metadata service address (AWS, GCP, Azure, etc.) — keeping it as a literal makes SSRF defense intent explicit and matches OWASP convention.
One-shot scraper not wired into CI; docs/zalo-error-codes.md is checked in and updated manually when needed.
…enant ctx - store: loadExistingCreds surfaces decrypt/unmarshal errors and is tenant-scoped (would otherwise silently wipe other credential fields on a partial update merge) - zalo bot: SSRF-check inbound photo URLs before downloadMedia - zalo oa: one tombstone goroutine per reaction controller via sync.Once - zalo webhook router: per-instance ctx carries tenant id - zalo oa poll: warn instead of silently dropping no-dedup messages - tools.CheckSSRF: reject non-http(s) schemes - web ui: validate webhook host URL before persisting; clear stale exchange error on code retype - docs: webhook_signature_mode defaults to disabled (operators must flip to strict for production)
Nginx was returning 405 Not Allowed for POST /channels/zalo/webhook/<slug> requests because they matched the SPA fallback `try_files` rule. Added explicit proxy location before the fallback to route webhook traffic to the backend. The /channels/zalo/webhook/ prefix covers both Zalo OA and Bot variants. Feishu and Pancake webhook channels have the same latent issue but are out of scope for GH-966.
Zalo Bot webhook handler parsed request body as flat {event_name, message}
struct, but per official spec (bot.zapps.me/docs/apis/webhook) platform
wraps every push in {ok, result: {event_name, message}}. Result: every
Bot inbound event silently dropped (zero-value zaloUpdate, nil Message),
dedup disabled because botMessageIDExtractor read wrong path
(message.message_id vs result.message.message_id).
Fix: HandleWebhookEvent unmarshals into zaloAPIResponse (existing type),
short-circuits on ok=false or empty result, then unmarshals result into
zaloUpdate. botMessageIDExtractor reads result.message.message_id.
Polling path unaffected — getUpdates already strips envelope inside
callAPIWith. Zalo OA unaffected (different signature/payload schema).
Tests: fixtures rewritten to wrapped shape; added docs-sample payload
test; added ok=false drop test; legacy unwrapped shape guard (prevents
silent schema drift).
Zalo posts webhook events directly ({event_name, message, ...}) — the
{ok, result} envelope only applies to polling getUpdates responses.
Previous unwrap-only path silently dropped real Zalo pushes (returned
200 but no inbound dispatch). Try the envelope first, fall back to
direct unmarshal so a future shape change won't black-hole traffic.
resolver 127.0.0.11 is Docker-only — in Kubernetes nginx couldn't
resolve the goclaw Service ('Connection refused while resolving') and
returned 502 on every /channels/zalo/webhook/<slug> POST. Other proxy
paths (/ws, /v1, /health) never went through nginx (Traefik routes
them direct to the gateway), so the latent bug surfaced only when the
new webhook proxy block was added in e6efaad.
Drop the resolver + variable form; use a static upstream block. nginx
resolves at config-load against the pod's /etc/resolv.conf — works in
both K8s (kube-dns) and docker-compose (Docker DNS).
…ation (GH-966) - Rename Ingestion Mode → Connection Mode for both Zalo channels (i18n en/vi/zh) - Zalo Bot defaults to webhook; option label "Webhook (recommended)" - Webhook Secret now pasted from bot.zapps.me; Generate button removed - Drop unused FieldDef.generatable, generate-secret util, and fieldConfig.generate i18n keys
SQLite v27 rename lockstep with PG migration 58 · bot SSRF-safe media client · bot Stop drains polling WaitGroup · getUpdates batch+single envelope decode · OA reaction stopCh gate + tombstone Once · reject oa_id mismatch on paste · pg credentials read-modify-write in tx with SELECT FOR UPDATE · migration 58 down trim oauth · UI form post-save setValues + autoComplete · tenant-scoped webhook localStorage · nginx upstream behavior doc note.
- oa/reactions: re-check stopCh under lock in SetStatus to close WaitGroup wg.Add-after-Wait panic window - oa/webhook: explicit oa_send_* drop; prefer message.time over envelope timestamp for cursor advance to avoid skipping later messages on burst - bot/typing: clarify race-window comment - common/webhook_router: move per-instance rate limit after signature verify so a guessed slug can't burn the bucket - gateway/methods: log security.cross_tenant_access_attempt on zalo_oa.consent_url, zalo_oa.exchange_code, zalo.webhook_url - migrations 000058 down: document rollback-with-binary-revert caveat - ui/web zalo-webhook-url-section: cancelled flag for stale setData - ui/web use-webhook-host: gate localStorage read on tenantId to prevent cross-tenant flash
- channel-fields: autoComplete="off" on non-password text inputs so browsers don't suggest unrelated saved values into webhook URLs / app IDs - oa/token_source: distinct slog.Info on pre-authorization refresh attempt with has_oa_id flag so ops can tell "never consented" apart from "consent dropped mid-flow"
🔍 Code Review — feat(channels/zalo): OA login flow + webhook transport (#966)🎯 Tổng quanAdds Zalo OA channel with login flow v4 + webhook transport, brings Zalo Bot to parity (shared router, modular layout), renames zalo_oa ↔ zalo_bot to match Zalo taxonomy. Scope: ~55 files, ~7.5k LOC — large but focused on single feature 🤖 CI Status + Merge ConflictsCI: ✅ All checks passed — go: pass (4m15s), web: pass (53s) ✅ Điểm Tốt
🔴 CRITICAL — Migration Conflict với PR #1057PR này dùng migration 000058 (rename zalo_oa ↔ zalo_bot). Cả 2 PR không thể merge cùng lúc mà không renumber. Nếu merge PR này trước, #1057 phải renumber 000058 → 000059. Ngược lại PR này phải renumber. Fix: Coordination giữa @vanducng và @tech-synity — ai merge trước, người kia renumber migration + update RequiredSchemaVersion. 🟡 MEDIUM
🟢 LOW
📊 Summary
💡 Recommendation🟢 APPROVE (với 1 điều kiện) Zalo OA integration chất lượng cao, security đúng pattern, testing kỹ, live smoke fixes thorough. Code clean, architecture solid. Điều kiện: Resolve migration conflict với PR #1057 trước khi merge. Cần coordination giữa @vanducng và @tech-synity để renumber một trong hai migrations. Great work @vanducng! Zalo OA là channel quan trọng cho B2C use case. Implementation production-ready. 🚀 |
Off by default. Operators who want CS-style threaded replies can opt in via quote_user_message=true. Existing instances with explicit `true` keep current behavior. - channels/zalo/oa/channel.go: QuoteInboundOnDM returns false when cfg.QuoteUserMessage is unset. - channels/zalo/oa/send_quote_test.go: 'unset_defaults_off' case expects false. - config/config_channels.go: field comment notes default false. - ui/web/src/pages/channels/channel-schemas.ts: defaultValue=false + help text adjusted.
…H-966) Prevents the heart/sad reaction from firing immediately when the agent reply lands. Terminal reactions (done/error) now scheduled via time.AfterFunc with jittered delay (reaction_terminal_delay_min_ms/max_ms, defaults 800–2000ms), scaled by reply character count (+1ms/char, capped +1.5s). Reuses debounce timer slot so Stop/ClearReaction still cancel cleanly.
What this does
Adds Zalo OA (Official Account) channel with OAuth + webhook, and brings the existing Zalo Bot channel up to parity (webhook mode, modular package layout, shared router).
Highlights
/channels/zalo/webhook/<slug>, HMAC signature verify, replay-window check, per-instance dedup + rate limit.zalo_oa↔zalo_botswap (old names were inverted vs. Zalo's own product taxonomy). Migration000058handles existing rows; SQLite has a parallel patch.Closes #966.