feat: Add owner response extraction for reviews #228

Open
rakhaxor wants to merge 2 commits into gosom:main from rakhaxor:feature/owner-response-extraction

@rakhaxor rakhaxor commented Feb 6, 2026

Summary

This PR adds the ability to extract owner/business responses to Google reviews. Currently, the scraper only captures reviewer information but misses whether the business has responded to reviews.

Changes

gmaps/entry.go:

  • Added OwnerResponse and OwnerResponseTime fields to the Review struct
  • Added extraction logic in parseReviews() to capture owner response data from the Google Maps API response
  • Owner response text is at path [3][14][0][0] and response time at [3][3]
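The path lookup described in the bullets above can be pictured with a small generic helper. This is only a sketch under stated assumptions: `nthAs` is a hypothetical stand-in written for this illustration (it is not the repository's helper), and the payload returned by `sampleElement` is invented so that it has values at `[3][3]` and `[3][14][0][0]`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nthAs walks nested JSON arrays along path and casts the final value to T.
// It returns the zero value of T when an index is out of range or a type
// does not match. Hypothetical helper, written only for this illustration.
func nthAs[T any](root any, path ...int) T {
	var zero T
	cur := root
	for _, idx := range path {
		arr, ok := cur.([]any)
		if !ok || idx < 0 || idx >= len(arr) {
			return zero
		}
		cur = arr[idx]
	}
	v, ok := cur.(T)
	if !ok {
		return zero
	}
	return v
}

// sampleElement decodes an invented review element. Only the shape matters:
// index [3][3] holds a time string and [3][14][0][0] holds the reply text.
func sampleElement() any {
	raw := `[null, null, null,
	  [null, null, null, "2 months ago",
	   null, null, null, null, null, null, null, null, null, null,
	   [["Thanks for visiting!"]]]]`
	var el any
	if err := json.Unmarshal([]byte(raw), &el); err != nil {
		panic(err)
	}
	return el
}

func main() {
	el := sampleElement()
	fmt.Printf("response: %q\n", nthAs[string](el, 3, 14, 0, 0))
	fmt.Printf("time:     %q\n", nthAs[string](el, 3, 3))
	fmt.Printf("missing:  %q\n", nthAs[string](el, 3, 99))
}
```

Returning the zero value on any mismatch keeps the call sites short, but it also hides type surprises: a non-string value at the path is indistinguishable from an absent one.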

gmaps/reviews.go:

  • Added OwnerResponse and OwnerResponseTime fields to DOMReview struct
  • Added DOM extraction JavaScript to capture owner responses from the rendered page
  • Updated ConvertDOMReviewsToReviews() to map the new fields

Use Case

This feature enables users to:

  • Track which reviews have been responded to by the business
  • Analyze response times and patterns
  • Calculate response rates for reputation management
  • Monitor customer service engagement
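As an illustration of the response-rate use case listed above, a consumer of the scraper output might do something like the following. It is a minimal sketch: the `Review` type here is a cut-down stand-in carrying only the two fields this PR adds (assumed to be strings, since the PR reports empty strings when no reply exists), and `responseRate` is a hypothetical helper, not part of the project.

```go
package main

import (
	"fmt"
	"strings"
)

// Review is a reduced stand-in for the scraper's review type.
// Only the two fields added by this PR are modelled.
type Review struct {
	OwnerResponse     string
	OwnerResponseTime string
}

// responseRate returns the fraction of reviews that carry a non-empty
// owner response. It returns 0 for an empty slice.
func responseRate(reviews []Review) float64 {
	if len(reviews) == 0 {
		return 0
	}
	answered := 0
	for _, r := range reviews {
		if strings.TrimSpace(r.OwnerResponse) != "" {
			answered++
		}
	}
	return float64(answered) / float64(len(reviews))
}

func main() {
	// Invented sample data.
	reviews := []Review{
		{OwnerResponse: "Thank you!", OwnerResponseTime: "2 months ago"},
		{OwnerResponse: "", OwnerResponseTime: ""},
		{OwnerResponse: "Sorry to hear that.", OwnerResponseTime: "a week ago"},
		{OwnerResponse: "We appreciate it.", OwnerResponseTime: "3 days ago"},
	}
	fmt.Printf("response rate: %.0f%%\n", responseRate(reviews)*100)
}
```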

Testing

Tested with multiple businesses:

  • Successfully extracts owner responses where present
  • Returns empty strings when no owner response exists
  • Works with both API extraction and DOM fallback methods

Backward Compatibility

This change is fully backward compatible - it only adds new optional fields to the output.


🤖 Generated with Claude Code


Copilot AI left a comment

Pull request overview

Adds support for extracting business/owner replies to Google reviews across both the API-based parser and the DOM fallback scraper.

Changes:

  • Extended Review / DOMReview models with owner response text + time fields
  • Added API parsing at specific nested array paths for owner response data
  • Added DOM-side JS extraction + mapping of owner response fields into Review

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
--- | ---
gmaps/reviews.go | Adds DOM extraction for owner responses and maps new DOM fields into Review.
gmaps/entry.go | Extends Review and parses owner response fields from the API response structure.

Comment thread gmaps/entry.go
Comment on lines +514 to +515
ownerResponse := getNthElementAndCast[string](el, 3, 14, 0, 0)
ownerResponseTime := getNthElementAndCast[string](el, 3, 3)
Copilot AI Mar 7, 2026

ownerResponseTime is being cast to string from [3][3], but this position in these Google payloads is commonly a numeric timestamp or an array (similar to how When is formatted from a date tuple). With the current getNthElementAndCast[string] it will silently become an empty string when the underlying type isn’t string. Consider extracting it as any (or the expected tuple type) and normalizing/formatting it consistently (e.g., handle []any date tuples or numeric timestamps via fmt.Sprint / dedicated conversion).
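One way to follow the suggestion above is to read the value as `any` and normalize it by type. The sketch below assumes the payload was decoded with `encoding/json` (so numbers arrive as `float64` and arrays as `[]any`). `normalizeResponseTime` is a hypothetical name, and joining tuple parts with "-" is an arbitrary choice, because the actual layout of any date tuple at `[3][3]` is not documented in this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// normalizeResponseTime turns whatever sits at the time position into a
// string instead of silently dropping non-string values.
func normalizeResponseTime(v any) string {
	switch t := v.(type) {
	case nil:
		return ""
	case string:
		return strings.TrimSpace(t)
	case float64:
		// encoding/json decodes every JSON number as float64.
		return strconv.FormatFloat(t, 'f', -1, 64)
	case []any:
		// Assumption: a tuple of scalar parts; join the non-empty ones.
		parts := make([]string, 0, len(t))
		for _, p := range t {
			if s := normalizeResponseTime(p); s != "" {
				parts = append(parts, s)
			}
		}
		return strings.Join(parts, "-")
	default:
		return fmt.Sprint(t)
	}
}

func main() {
	fmt.Println(normalizeResponseTime("2 months ago"))
	fmt.Println(normalizeResponseTime(float64(1700000000)))
	fmt.Println(normalizeResponseTime([]any{float64(2026), float64(3), float64(7)}))
}
```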

Comment thread gmaps/reviews.go Outdated
Comment on lines +593 to +608
const responseSelectors = [
  '.CDe7pd', // Owner response container
  '.wiI7pd.xwPlne', // Alternative response text
  '.review-response', // Generic
  '.owner-response', // Generic
];
for (const sel of responseSelectors) {
  const responseEl = element.querySelector(sel);
  if (responseEl) {
    ownerResponse = responseEl.textContent?.trim() || '';
    // Try to find response time nearby
    const responseTimeEl = responseEl.closest('.review-response-container')?.querySelector('.rsqaWe') ||
      responseEl.parentElement?.querySelector('.rsqaWe, .dehysf');
    if (responseTimeEl) {
      ownerResponseTime = responseTimeEl.textContent?.trim() || '';
    }
Copilot AI Mar 7, 2026

The selector list includes classes that are frequently used for the review body itself (not an owner reply) in Google Maps markup (notably .wiI7pd..., and often .CDe7pd as a general container). This can cause ownerResponse to be populated with the reviewer’s text (duplicating text) even when no owner response exists. Tighten the DOM targeting to an owner-response-specific container (e.g., anchor off a "Response from the owner" label/role, or require the presence of the response metadata/time element within the same container before accepting it).

Suggested change
- const responseSelectors = [
-   '.CDe7pd', // Owner response container
-   '.wiI7pd.xwPlne', // Alternative response text
-   '.review-response', // Generic
-   '.owner-response', // Generic
- ];
- for (const sel of responseSelectors) {
-   const responseEl = element.querySelector(sel);
-   if (responseEl) {
-     ownerResponse = responseEl.textContent?.trim() || '';
-     // Try to find response time nearby
-     const responseTimeEl = responseEl.closest('.review-response-container')?.querySelector('.rsqaWe') ||
-       responseEl.parentElement?.querySelector('.rsqaWe, .dehysf');
-     if (responseTimeEl) {
-       ownerResponseTime = responseTimeEl.textContent?.trim() || '';
-     }
+ // Prefer containers explicitly marked as an owner response (via ARIA/labels)
+ let ownerResponseContainer =
+   element.querySelector('[aria-label="Response from the owner"]') ||
+   (Array.from(element.querySelectorAll('[role="region"]')).find((region) => {
+     const label = (region.getAttribute('aria-label') || '').toLowerCase();
+     return label.includes('response from the owner') || label.includes('owner response');
+   }) || null);
+ if (ownerResponseContainer) {
+   const responseTextEl = ownerResponseContainer.querySelector('.review-response, .owner-response');
+   if (responseTextEl) {
+     ownerResponse = responseTextEl.textContent?.trim() || '';
+   }
+   const responseTimeEl = ownerResponseContainer.querySelector('.rsqaWe, .dehysf');
+   if (responseTimeEl) {
+     ownerResponseTime = responseTimeEl.textContent?.trim() || '';
+   }
+ } else {
+   // Conservative fallback: only treat as owner response if text + time metadata coexist
+   const possibleResponseEls = element.querySelectorAll('.review-response, .owner-response');
+   for (const responseEl of possibleResponseEls) {
+     const container = responseEl.closest('.review-response-container') || responseEl.parentElement;
+     if (!container) continue;
+     const responseTimeEl = container.querySelector('.rsqaWe, .dehysf');
+     if (!responseTimeEl) continue;
+     ownerResponse = responseEl.textContent?.trim() || '';
+     ownerResponseTime = responseTimeEl.textContent?.trim() || '';

Comment thread gmaps/reviews.go Outdated
let ownerResponseTime = '';
const responseSelectors = [
'.CDe7pd', // Owner response container
'.wiI7pd.xwPlne', // Alternative response text
Owner

I would love to merge this feature, but I have noticed that this type of CSS selector is very fragile because the class names change often.

Can you confirm that these selectors are still stable?

@rakhaxor
Author

Hi @gosom,

Thanks for flagging the selector fragility; your concern was correct. I checked live pages on 2026-04-28 and confirmed that several of the original class selectors had already started rotating: .CDe7pd and .wiI7pd.xwPlne are missing on some places, .WMbnJf / .bwb7ce / .dehysf no longer match, and Google has changed the per-review rating from a [role="img"] star aria-label to the plain text "4/5" inside .DU9Pgb. As originally written, the PR would have started silently dropping owner responses and ratings.

I refactored the JS extractor in extractReviewsFromPage (gmaps/reviews.go) to lead with stable structural / semantic anchors and keep the obfuscated class names only as fallbacks. Per-field rationale:

  • Container: div[data-review-id], deduped (Google nests an empty <div data-review-id> inside each outer wrapper, which would otherwise double-count every review)
  • Author: a[href*="/maps/contrib/"], button[data-href*="/contrib/"], [aria-label*="Photo of"]
  • Profile pic: img[src*="googleusercontent"], img[src*="lh3.google"]
  • Rating: [role="img"][aria-label*="star" i] FIRST, then text-pattern fallback ^N/5$ (catches the new plain-text format)
  • Time: regex extract (\d+ <unit> ago | a <unit> ago | just now | yesterday | today | il y a N <unit>) with negative lookbehind (?<![\d\/]) so a rating glued to the time ("4/53 months ago") doesn't poison the match
  • Review text: [data-expandable-section] span, span[jsname], then class fallback (.wiI7pd, .MyEned span, etc.)
  • "More" button: button[aria-expanded="false"], button[aria-label*="More" i], button[jsaction*="expand"]
  • Owner response: TEXT-CONTENT match: a localized "Response from the owner" header regex covering English / French / Spanish / German / Italian / Portuguese / Dutch / Polish / Danish / Swedish / Turkish, then the response body is the longest sibling text inside the same container. Class names (.CDe7pd etc.) are kept only as Strategy-2 fallback.
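The rating and time rules above run as JavaScript in the page, but the same logic can be sketched in Go for readers who want to test it offline. Two caveats: Go's `regexp` package (RE2 syntax) has no lookbehind, so the `(?<![\d\/])` guard is emulated here by inspecting the byte before each match, and the unit list in `relTimeRe` is a simplified guess rather than the pattern used in the PR. `extractRelTime` and `parseTextRating` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// relTimeRe is a simplified relative-time pattern (assumed unit list).
var relTimeRe = regexp.MustCompile(
	`(?i)\d+ (?:second|minute|hour|day|week|month|year)s? ago` +
		`|an? (?:second|minute|hour|day|week|month|year) ago` +
		`|just now|yesterday|today|il y a \d+ \pL+`)

// ratingRe matches the plain-text "N/5" rating format.
var ratingRe = regexp.MustCompile(`^([1-5])/5$`)

// extractRelTime returns the first relative-time phrase that is not
// immediately preceded by a digit or a slash, or "" if there is none.
func extractRelTime(text string) string {
	for _, loc := range relTimeRe.FindAllStringIndex(text, -1) {
		if loc[0] > 0 {
			prev := text[loc[0]-1]
			if prev == '/' || (prev >= '0' && prev <= '9') {
				continue // emulated negative lookbehind
			}
		}
		return text[loc[0]:loc[1]]
	}
	return ""
}

// parseTextRating parses "N/5" and reports whether the text matched.
func parseTextRating(text string) (int, bool) {
	m := ratingRe.FindStringSubmatch(strings.TrimSpace(text))
	if m == nil {
		return 0, false
	}
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false
	}
	return n, true
}

func main() {
	for _, s := range []string{"3 months ago", "4/53 months ago", "4/5 3 months ago", "il y a 3 mois"} {
		fmt.Printf("%q -> %q\n", s, extractRelTime(s))
	}
	n, ok := parseTextRating("4/5")
	fmt.Println(n, ok)
}
```

In this sketch a rating glued to the time is rejected outright rather than repaired, so the caller would need to look for the time in another element.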

One important behavioural fix I had to ship inside the JS: clicking the "More" button on Google Maps schedules an animated DOM update — reading textContent synchronously after btn.click() returns the still-truncated string. The original code did exactly this, so reviews longer than ~150 chars and replies longer than ~500 chars were captured truncated. The fix is to do a single global pre-pass clicking every "More"/"Voir plus"/etc. button across the panel, await new Promise(r => setTimeout(r, 800)) for the expansion to settle, then iterate and read. The page.Eval callback is now an async () => {…}. Per-element click loops were removed since they're now redundant.

Live verification, 6 places (April 28, 2026, before the cleanup):

Place | Reviews via DOM | Names | Ratings | Times | Owner responses | Notes
--- | --- | --- | --- | --- | --- | ---
Hilton Times Square (NYC, EN) | 10 | 10 | 9 | 10 | 9 | the 1 zero is a Priceline-syndicated review with no GMaps rating
Eleven Madison Park (NYC, EN) | 10 | 10 | 10 | 10 | 0 | restaurant doesn't reply on Google
Mr. Tire Auto Service (Baltimore) | 10 | 10 | 10 | 10 | 10 |
Prabha Krishnan DDS (NYC) | 10 | 10 | 10 | 10 | 9 | 1 reviewer legit no reply
Le Bristol Paris (Paris, FR) | 10 | 10 | 10 | 10 | 10 | validates "Réponse du propriétaire" + "il y a N mois"
Blue Bottle Coffee (San Francisco) | 10 | 10 | 10 | 10 | 0 | chain location doesn't reply

End-to-end production-CLI runs (RPC path, primary code path) on 2 places to sanity-check the bundled changes:

Hilton Times Square: 1000/1000 with text, 952/1000 with rating, 662/1000 with owner reply, longest reply 2432 chars, no truncation, ~14s.
Taj Chandigarh: 1000/1000 with text, 992/1000 with rating, 557/1000 with owner reply, longest reply 1698 chars, no truncation, ~14s.

Two unrelated correctness fixes I bundled in this PR since I was already in the file:

  1. Container dedupe in the DOM extractor: a <div data-review-id="..."> outer wrapper has a nested empty <div data-review-id="..."> inside it (a marker), which double-counts every review. Fixed by filtering elements whose ancestor matches the same selector.
  2. extractPlaceID was writing into patterns map[string]*regexp.Regexp inside patternsOnce.Do(...) without ever calling make(...) first — would panic on the first call. One-line fix.
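Fix (2) above is the classic nil-map write. The sketch below shows the shape of the repair under assumptions: the variable names follow the description in the list, while `loadPatterns`, the `"hex"` key, and the regular expression are invented for illustration and are not taken from the repository.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

var (
	patterns     map[string]*regexp.Regexp // nil until initialized
	patternsOnce sync.Once
)

// loadPatterns lazily builds the pattern table exactly once.
func loadPatterns() map[string]*regexp.Regexp {
	patternsOnce.Do(func() {
		// Without this make call the assignment below would panic with
		// "assignment to entry in nil map".
		patterns = make(map[string]*regexp.Regexp)
		// Hypothetical entry, for illustration only.
		patterns["hex"] = regexp.MustCompile(`0x[0-9a-f]+:0x[0-9a-f]+`)
	})
	return patterns
}

func main() {
	re := loadPatterns()["hex"]
	fmt.Println(re.MatchString("0x89c259a9:0x1f2e3d"))
}
```

Reading from a nil map is safe in Go and only writes panic, which is why a bug like this stays hidden until the first code path that populates the map.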

Happy to split (1) and (2) out into their own PR if you'd prefer to keep this one focused on the owner-response feature — just say the word.

The selectors I marked STABLE in the comments are anchors that Google could invalidate only by breaking its own product schema, URL conventions, or accessibility model, so they should be sturdier than the obfuscated class names. The class fallbacks remain so that we still match cases where Google moves things around but keeps .wiI7pd / .d4r55 / etc., which has been the historical pattern.

Let me know if you'd prefer:
(a) drop the class-name fallbacks entirely and rely only on the semantic anchors,
(b) split the dedupe + extractPlaceID-panic fixes into separate PRs,
(c) anything else.

Otherwise the code is ready for re-review.

This adds the ability to capture business owner responses to reviews.

Changes:
- Added OwnerResponse and OwnerResponseTime fields to Review struct
- Updated parseReviews() to extract owner response from JSON paths [3][14][0][0] (text) and [3][3] (time)
- Updated DOMReview struct and DOM extraction JavaScript for fallback method
- Updated ConvertDOMReviewsToReviews() to map new fields

The owner response data is extracted from Google Maps' review JSON structure where:
- Owner response text is at index [3][14][0][0]
- Owner response time (e.g., "2 months ago") is at index [3][3]

This enables users to track which reviews have already been responded to
by the business owner, useful for review management workflows.

- Lead with structural/semantic anchors (data-review-id, ARIA roles,
  contributor URLs) before obfuscated class names, several of which
  (.CDe7pd, .wiI7pd.xwPlne, .WMbnJf, .bwb7ce, .dehysf) have rotated.
- Detect owner response via localized "Response from the owner" header
  regex (en/fr/es/de/it/pt/nl/pl/da/sv/tr) instead of class selectors.
- Click "More"/"Voir plus" expanders globally and await 800ms for the
  animated DOM update before reading textContent. Synchronous reads
  were silently capturing truncated text for long reviews/replies.
- Dedupe nested empty <div data-review-id> markers that double-counted
  every review.
- Rating extraction adds plain-text "N/5" fallback for the new format
  Google now uses; star aria-label remains primary.
- Time extraction uses regex with negative lookbehind so a rating glued
  to the time string ("4/53 months ago") no longer poisons the match.
- Fix uninitialized patterns map in extractPlaceID that would panic on
  the first call.
- Add ReviewId field to Review struct, populated from RPC JSON-array
  paths when available.
@rakhaxor rakhaxor force-pushed the feature/owner-response-extraction branch from ea7b4f8 to a16653e Compare April 28, 2026 09:04
3 participants