feat: Add owner response extraction for reviews #228
Pull request overview
Adds support for extracting business/owner replies to Google reviews across both the API-based parser and the DOM fallback scraper.
Changes:
- Extended Review/DOMReview models with owner response text + time fields
- Added API parsing at specific nested array paths for owner response data
- Added DOM-side JS extraction + mapping of owner response fields into Review
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| gmaps/reviews.go | Adds DOM extraction for owner responses and maps new DOM fields into Review. |
| gmaps/entry.go | Extends Review and parses owner response fields from the API response structure. |

    ownerResponse := getNthElementAndCast[string](el, 3, 14, 0, 0)
    ownerResponseTime := getNthElementAndCast[string](el, 3, 3)

ownerResponseTime is being cast to string from [3][3], but this position in these Google payloads is commonly a numeric timestamp or an array (similar to how When is formatted from a date tuple). With the current getNthElementAndCast[string] it will silently become an empty string when the underlying type isn’t string. Consider extracting it as any (or the expected tuple type) and normalizing/formatting it consistently (e.g., handle []any date tuples or numeric timestamps via fmt.Sprint / dedicated conversion).
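
The normalization suggested in this comment could look roughly like the sketch below. It is illustrative only: `normalizeResponseTime` is a hypothetical helper that does not exist in the PR, and both the float64 branch and the "join the tuple with dashes" rule are assumptions about what the payload might contain, not confirmed facts about Google's format.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// normalizeResponseTime is a hypothetical helper (not part of the PR).
// It accepts the raw value found at the response-time path and returns a
// string regardless of whether the payload held a string, a number, or a
// nested array such as a date tuple.
func normalizeResponseTime(v any) string {
	switch t := v.(type) {
	case string:
		return t
	case float64:
		// encoding/json decodes every JSON number into float64 when the
		// target is `any`, so a numeric timestamp arrives in this branch.
		return strconv.FormatFloat(t, 'f', -1, 64)
	case []any:
		// Assumption: an array is a tuple of date parts; join what we can.
		parts := make([]string, 0, len(t))
		for _, p := range t {
			if s := normalizeResponseTime(p); s != "" {
				parts = append(parts, s)
			}
		}
		return strings.Join(parts, "-")
	default:
		// nil, bool, objects: nothing usable.
		return ""
	}
}

func main() {
	fmt.Println(normalizeResponseTime("2 months ago"))
	fmt.Println(normalizeResponseTime(float64(1714262400)))
	fmt.Println(normalizeResponseTime([]any{float64(2026), float64(4), float64(28)}))
}
```

The point of the design is that a type mismatch no longer collapses silently to an empty string; only values with no usable representation do.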

    const responseSelectors = [
      '.CDe7pd', // Owner response container
      '.wiI7pd.xwPlne', // Alternative response text
      '.review-response', // Generic
      '.owner-response', // Generic
    ];
    for (const sel of responseSelectors) {
      const responseEl = element.querySelector(sel);
      if (responseEl) {
        ownerResponse = responseEl.textContent?.trim() || '';
        // Try to find response time nearby
        const responseTimeEl = responseEl.closest('.review-response-container')?.querySelector('.rsqaWe') ||
          responseEl.parentElement?.querySelector('.rsqaWe, .dehysf');
        if (responseTimeEl) {
          ownerResponseTime = responseTimeEl.textContent?.trim() || '';
        }

The selector list includes classes that are frequently used for the review body itself (not an owner reply) in Google Maps markup (notably .wiI7pd..., and often .CDe7pd as a general container). This can cause ownerResponse to be populated with the reviewer’s text (duplicating text) even when no owner response exists. Tighten the DOM targeting to an owner-response-specific container (e.g., anchor off a "Response from the owner" label/role, or require the presence of the response metadata/time element within the same container before accepting it).

    // Prefer containers explicitly marked as an owner response (via ARIA/labels)
    let ownerResponseContainer =
      element.querySelector('[aria-label="Response from the owner"]') ||
      (Array.from(element.querySelectorAll('[role="region"]')).find((region) => {
        const label = (region.getAttribute('aria-label') || '').toLowerCase();
        return label.includes('response from the owner') || label.includes('owner response');
      }) || null);
    if (ownerResponseContainer) {
      const responseTextEl = ownerResponseContainer.querySelector('.review-response, .owner-response');
      if (responseTextEl) {
        ownerResponse = responseTextEl.textContent?.trim() || '';
      }
      const responseTimeEl = ownerResponseContainer.querySelector('.rsqaWe, .dehysf');
      if (responseTimeEl) {
        ownerResponseTime = responseTimeEl.textContent?.trim() || '';
      }
    } else {
      // Conservative fallback: only treat as owner response if text + time metadata coexist
      const possibleResponseEls = element.querySelectorAll('.review-response, .owner-response');
      for (const responseEl of possibleResponseEls) {
        const container = responseEl.closest('.review-response-container') || responseEl.parentElement;
        if (!container) continue;
        const responseTimeEl = container.querySelector('.rsqaWe, .dehysf');
        if (!responseTimeEl) continue;
        ownerResponse = responseEl.textContent?.trim() || '';
        ownerResponseTime = responseTimeEl.textContent?.trim() || '';


    let ownerResponseTime = '';
    const responseSelectors = [
      '.CDe7pd', // Owner response container
      '.wiI7pd.xwPlne', // Alternative response text

I would love to merge this feature, but I have noticed that these types of CSS selectors are very fragile because they change too often.
Can you confirm that these are still stable?
Hi @gosom,

Thanks for flagging the selector fragility; your concern was correct. I verified it on live pages on 2026-04-28 and confirmed that several of the original class selectors had already started rotating.

I refactored the JS extractor in [...]

- Container: div[data-review-id] (deduped: Google nests an empty [...] inside each outer wrapper, which would otherwise double-count every review)
- Author: a[href*="/maps/contrib/"], button[data-href*="/contrib/"], [aria-label*="Photo of"]
- Profile pic: img[src*="googleusercontent"], img[src*="lh3.google"]
- Rating: [role="img"][aria-label*="star" i] FIRST, then text-pattern fallback ^N/5$ (catches the new plain-text format)
- Time: regex extract (\d+ <unit> ago | a <unit> ago | just now | yesterday | today | il y a N <unit>) with negative lookbehind (?<![\d\/]) so a rating glued to the time ("4/53 months ago") doesn't poison the match
- Review text: [data-expandable-section] span, span[jsname], then class fallback (.wiI7pd, .MyEned span, etc.)
- "More" button: button[aria-expanded="false"], button[aria-label*="More" i], button[jsaction*="expand"]
- Owner response: TEXT-CONTENT match: a localized "Response from the owner" header regex covering English / French / Spanish / German / Italian / Portuguese / Dutch / Polish / Danish / Swedish / Turkish, then the response body is the longest sibling text inside the same container. Class names (.CDe7pd etc.) are kept only as Strategy-2 fallback.

One important behavioural fix I had to ship inside the JS: clicking the "More" button on Google Maps schedules an animated DOM update; reading [...]

Live verification, 6 places (April 28, 2026, before the cleanup):

- Hilton Times Square (NYC, EN): 10 reviews via DOM: 10 names / 9 ratings / 10 times / 9 owner responses [the 1 zero is a Priceline-syndicated review with no GMaps rating]

End-to-end production-CLI runs (RPC path, primary code path) on 2 places to sanity-check the bundled changes:

- Hilton Times Square: 1000/1000 with text, 952/1000 with rating, 662/1000 with owner reply, longest reply 2432 chars, no truncation, ~14s.

Two unrelated correctness fixes I bundled in this PR since I was already in the file: [...]

Happy to split (1) and (2) out into their own PR if you'd prefer to keep this one focused on the owner-response feature; just say the word.

The selectors I marked STABLE in the comments are anchors Google would have to break their own product schema URL conventions or accessibility model to invalidate, so they should be sturdier than the obfuscated class names. Class fallbacks remain so we still match cases where Google moves things around but keeps [...]

Let me know if you'd prefer: [...]

Otherwise the code is ready for re-review.
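
The text-content strategy described in this reply (find a localized owner-response header, then take the longest sibling text as the reply body) can be sketched as follows. The real extractor is JavaScript running in the page; this version is written in Go only to match the rest of the repository, and everything in it is hypothetical: the `ownerHeader` pattern, the `ownerReply` function, and the French phrase, which is an assumed localization rather than one confirmed by the PR.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ownerHeader matches the header that introduces an owner reply.
// Only two languages are listed, and the French wording is an assumption;
// the PR describes a longer list whose exact phrases are not shown here.
var ownerHeader = regexp.MustCompile(`(?i)^(response from the owner|réponse du propriétaire)`)

// ownerReply takes the trimmed text of each child node in one candidate
// container. It reports a reply only when a header is present, and picks
// the longest non-header text (measured in bytes) as the reply body.
func ownerReply(texts []string) (string, bool) {
	headerFound := false
	longest := ""
	for _, t := range texts {
		t = strings.TrimSpace(t)
		if ownerHeader.MatchString(t) {
			headerFound = true
			continue
		}
		if len(t) > len(longest) {
			longest = t
		}
	}
	if !headerFound || longest == "" {
		return "", false
	}
	return longest, true
}

func main() {
	container := []string{
		"Response from the owner 2 months ago",
		"Thank you for staying with us, we hope to see you again.",
		"Like",
	}
	reply, ok := ownerReply(container)
	fmt.Println(ok, reply)
}
```

Requiring the header before accepting any text is what keeps the reviewer's own text from being reported as an owner reply when no reply exists.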
This adds the ability to capture business owner responses to reviews.

Changes:
- Added OwnerResponse and OwnerResponseTime fields to Review struct
- Updated parseReviews() to extract owner response from JSON paths [3][14][0][0] (text) and [3][3] (time)
- Updated DOMReview struct and DOM extraction JavaScript for fallback method
- Updated ConvertDOMReviewsToReviews() to map new fields

The owner response data is extracted from Google Maps' review JSON structure where:
- Owner response text is at index [3][14][0][0]
- Owner response time (e.g., "2 months ago") is at index [3][3]

This enables users to track which reviews have already been responded to by the business owner, useful for review management workflows.

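
To make the two index paths concrete, here is a minimal sketch of walking a decoded JSON array by index path. `nthAs` is a hypothetical stand-in written for this example; it is not the repository's getNthElementAndCast, whose exact signature and failure behaviour are not shown in this thread. The sample payload is invented and only mimics the shape implied by the paths.

```go
package main

import "fmt"

// nthAs walks nested []any values along path and casts the final element
// to T. Unlike a helper that returns only the zero value, it also reports
// whether the walk and the cast succeeded.
func nthAs[T any](root any, path ...int) (T, bool) {
	var zero T
	cur := root
	for _, i := range path {
		arr, ok := cur.([]any)
		if !ok || i < 0 || i >= len(arr) {
			return zero, false
		}
		cur = arr[i]
	}
	v, ok := cur.(T)
	return v, ok
}

func main() {
	// Invented payload: index 3 holds an array whose element 3 is the
	// response time and whose element 14 nests the response text.
	inner := make([]any, 15)
	inner[3] = "2 months ago"
	inner[14] = []any{[]any{"Thank you for your visit!"}}
	review := []any{nil, nil, nil, inner}

	text, okText := nthAs[string](review, 3, 14, 0, 0)
	when, okWhen := nthAs[string](review, 3, 3)
	fmt.Println(text, okText)
	fmt.Println(when, okWhen)
}
```

Returning the boolean lets a caller tell "no owner response" apart from "the value was there but had an unexpected type".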
- Lead with structural/semantic anchors (data-review-id, ARIA roles,
contributor URLs) before obfuscated class names, several of which
(.CDe7pd, .wiI7pd.xwPlne, .WMbnJf, .bwb7ce, .dehysf) have rotated.
- Detect owner response via localized "Response from the owner" header
regex (en/fr/es/de/it/pt/nl/pl/da/sv/tr) instead of class selectors.
- Click "More"/"Voir plus" expanders globally and await 800ms for the
animated DOM update before reading textContent. Synchronous reads
were silently capturing truncated text for long reviews/replies.
- Dedupe nested empty <div data-review-id> markers that double-counted
every review.
- Rating extraction adds plain-text "N/5" fallback for the new format
Google now uses; star aria-label remains primary.
- Time extraction uses regex with negative lookbehind so a rating glued
to the time string ("4/53 months ago") no longer poisons the match.
- Fix uninitialized patterns map in extractPlaceID that would panic on
the first call.
- Add ReviewId field to Review struct, populated from RPC JSON-array
paths when available.
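
The "uninitialized patterns map" item refers to a general Go behaviour: writing to a nil map panics at run time, while reading from one does not. The sketch below reproduces that failure mode in isolation. The names `register`, `unsafePatterns`, and `patterns` are hypothetical; the actual shape of extractPlaceID is not shown in this thread.

```go
package main

import "fmt"

// Declared but never initialized, so this is a nil map.
var unsafePatterns map[string]string

// Initialized with a literal, so writes are safe.
var patterns = map[string]string{}

// register stores expr under name and converts the runtime panic caused
// by a nil map into an error so the example can keep running.
func register(m map[string]string, name, expr string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	m[name] = expr // panics when m is nil
	return nil
}

func main() {
	fmt.Println(register(unsafePatterns, "cid", `0x[0-9a-f]+`))
	fmt.Println(register(patterns, "cid", `0x[0-9a-f]+`))
}
```

Initializing the map at declaration, or with make before the first write, removes the panic without needing recover.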
ea7b4f8 to a16653e
Summary
This PR adds the ability to extract owner/business responses to Google reviews. Currently, the scraper only captures reviewer information but misses whether the business has responded to reviews.
Changes
gmaps/entry.go:
- Added OwnerResponse and OwnerResponseTime fields to the Review struct
- Updated parseReviews() to capture owner response data from the Google Maps API response
- Owner response text is read from [3][14][0][0] and response time at [3][3]

gmaps/reviews.go:
- Added OwnerResponse and OwnerResponseTime fields to DOMReview struct
- Updated ConvertDOMReviewsToReviews() to map the new fields

Use Case
This feature enables users to:
Testing
Tested with multiple businesses:
Backward Compatibility
This change is fully backward compatible - it only adds new optional fields to the output.
🤖 Generated with Claude Code