feat(gdocs): extract smart chips (date, person, rich link) when reading documents by argo49 · Pull Request #635 · taylorwilsdon/google_workspace_mcp

argo49 · 2026-03-30T02:23:47Z

Description

Date chips, person chips, and rich link chips were silently dropped when reading Google Docs because only textRun elements were handled. Add support for dateElement, person, and richLink paragraph elements across markdown conversion, plain text extraction, and structure parsing. Consolidate duplicated paragraph extraction logic into a shared extract_paragraph_text() function.

Type of Change

Bug fix (non-breaking change which fixes an issue)
[ x ] New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

Testing

[ x ] I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
[ x ] I have tested this change manually

Checklist

My code follows the style guidelines of this project
[ x ] I have performed a self-review of my own code
[ x ] I have commented my code, particularly in hard-to-understand areas
[ x ] My changes generate no new warnings
[ x ] I have enabled "Allow edits from maintainers" for this pull request

Additional Notes

Add any other context about the pull request here.

Tests were failing on a clean main branch run before my changes.

⚠️ IMPORTANT: This repository requires that you enable "Allow edits from maintainers" when creating your pull request. This allows maintainers to make small fixes and improvements directly to your branch, speeding up the review process.

To enable this setting:

When creating the PR, check the "Allow edits from maintainers" checkbox
If you've already created the PR, you can enable this in the PR sidebar under "Allow edits from maintainers"

Summary by CodeRabbit

New Features
- Google Docs rich links, person references, and date elements are now recognized and converted into appropriate Markdown/text representations within paragraphs and table cells.
Tests
- Added comprehensive tests covering rich links, person chips, date elements, mixed paragraphs, and table cell handling to validate Markdown output and structure extraction.
Chores
- Paragraph text extraction was made a public/shared utility (resulting in an API surface change).

…ng documents Date chips, person chips, and rich link chips were silently dropped when reading Google Docs because only textRun elements were handled. Add support for dateElement, person, and richLink paragraph elements across markdown conversion, plain text extraction, and structure parsing. Consolidate duplicated paragraph extraction logic into a shared extract_paragraph_text() function.

coderabbitai · 2026-03-30T02:24:06Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fac27a90-ca3f-4d9b-97ef-00c287dd9a5c

📥 Commits

Reviewing files that changed from the base of the PR and between 1427764 and 5200d31.

📒 Files selected for processing (1)

gdocs/docs_structure.py

🚧 Files skipped from review as they are similar to previous changes (1)

gdocs/docs_structure.py

📝 Walkthrough

Walkthrough

This PR adds support for richLink, person, and dateElement inline elements to paragraph extraction and Markdown conversion, refactors the paragraph extractor into a public extract_paragraph_text function, updates call sites, and expands tests covering these smart-chip scenarios.

Changes

Cohort / File(s)	Summary
Core element handling `gdocs/docs_markdown.py`, `gdocs/docs_structure.py`	Add handling for `richLink` → Markdown link (title/uri fallback), `person` → `name (email)` or fallback, and `dateElement` → `displayText`.
Paragraph extractor refactor `gdocs/docs_structure.py`, `gdocs/docs_tools.py`	Rename `_extract_paragraph_text` → `extract_paragraph_text` (public) and replace inline paragraph-element logic with calls to the new extractor.
Tests & fixtures `tests/gdocs/test_docs_markdown.py`	Add fixtures and tests for richLink variants, person chips, dateElement cases, mixed paragraphs, and table-cell chip handling; assert both Markdown output and parsed structure text.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

feat(docs): add get_doc_as_markdown tool with comment context #490: Prior work introducing paragraph-parsing logic that this change extends with smart-chip handling and extractor refactor.

Suggested labels

enhancement

Poem

🐰 I nibble through docs with a twitchy nose,

richLinks gleam where the link-text grows,
people appear as “Name (email)” bright,
dates hop in, displayed just right—
a carrot of markdown tucked snug for the night.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 45.83% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The pull request title clearly and specifically summarizes the main change: adding extraction support for smart chips (date, person, and rich link elements) in Google Docs document reading.
Description check	✅ Passed	The pull request description follows the required template structure with all main sections completed: clear description of changes, type of change selected (new feature), testing details provided, and most checklist items checked including enabling maintainer edits.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

gdocs/docs_structure.py (1)

202-216: ⚠️ Potential issue | 🔴 Critical

Critical: _extract_paragraph_text is undefined — function was renamed but this call site was not updated.

Line 208 calls _extract_paragraph_text() which no longer exists. The function was renamed to extract_paragraph_text() (public, without underscore prefix) but this call site in _parse_segment() was not updated. This will raise a NameError at runtime when parsing document headers or footers.

🐛 Proposed fix

 def _parse_segment(segment_data: dict[str, Any]) -> dict[str, Any]:
     """Parse a document segment (header/footer)."""
     content = segment_data.get("content", [])
     text_parts = []
     for element in content:
         if "paragraph" in element:
-            text_parts.append(_extract_paragraph_text(element["paragraph"]))
+            text_parts.append(extract_paragraph_text(element["paragraph"]))
 
     return {
         "content": content,
         "start_index": content[0].get("startIndex", 0) if content else 0,
         "end_index": content[-1].get("endIndex", 0) if content else 0,
         "text_preview": "".join(text_parts)[:100],
         "element_count": len(content),
     }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@gdocs/docs_structure.py` around lines 202 - 216, The call in _parse_segment
still uses the old private name _extract_paragraph_text which no longer exists;
update the call to use the renamed public function extract_paragraph_text (i.e.,
replace _extract_paragraph_text(...) with extract_paragraph_text(...)) and
ensure the function is imported or defined in the same module so no NameError
occurs at runtime.

🧹 Nitpick comments (1)

tests/gdocs/test_docs_markdown.py (1)
734-748: Consider moving DATE_ELEMENT_DOC fixture before TestStructureWithChips for clarity.

The DATE_ELEMENT_DOC fixture (line 753) is defined after the TestStructureWithChips class that references it (line 746). While Python resolves names at call time so this works correctly, placing fixtures before their usage improves readability and follows the pattern established by other fixtures in this file.

Also applies to: 751-781
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/gdocs/test_docs_markdown.py` around lines 734 - 748, The
TestStructureWithChips class references the DATE_ELEMENT_DOC fixture which is
defined later in the file; move the DATE_ELEMENT_DOC fixture definition so it
appears before the TestStructureWithChips class (and similarly reorder any other
fixtures in the 751-781 range that are used by tests) to improve readability and
follow the file's fixture-before-usage pattern — update the file so
DATE_ELEMENT_DOC (and related fixtures) are declared prior to
TestStructureWithChips while keeping names and contents unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@gdocs/docs_structure.py`:
- Around line 202-216: The call in _parse_segment still uses the old private
name _extract_paragraph_text which no longer exists; update the call to use the
renamed public function extract_paragraph_text (i.e., replace
_extract_paragraph_text(...) with extract_paragraph_text(...)) and ensure the
function is imported or defined in the same module so no NameError occurs at
runtime.

---

Nitpick comments:
In `@tests/gdocs/test_docs_markdown.py`:
- Around line 734-748: The TestStructureWithChips class references the
DATE_ELEMENT_DOC fixture which is defined later in the file; move the
DATE_ELEMENT_DOC fixture definition so it appears before the
TestStructureWithChips class (and similarly reorder any other fixtures in the
751-781 range that are used by tests) to improve readability and follow the
file's fixture-before-usage pattern — update the file so DATE_ELEMENT_DOC (and
related fixtures) are declared prior to TestStructureWithChips while keeping
names and contents unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4903999c-1651-400b-9f0c-55e720cc92d5

📥 Commits

Reviewing files that changed from the base of the PR and between 45a5f93 and 1427764.

📒 Files selected for processing (4)

gdocs/docs_markdown.py
gdocs/docs_structure.py
gdocs/docs_tools.py
tests/gdocs/test_docs_markdown.py

…_segment

taylorwilsdon · 2026-04-04T00:56:46Z

I think we may have resolved this in #649 - can you confirm?

coderabbitai Bot reviewed Mar 30, 2026

View reviewed changes

fix(gdocs): fix remaining _extract_paragraph_text reference in _parse…

5200d31

…_segment

taylorwilsdon self-assigned this Apr 1, 2026

taylorwilsdon added the enhancement New feature or request label Apr 1, 2026

argo49 closed this Apr 8, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(gdocs): extract smart chips (date, person, rich link) when reading documents#635

feat(gdocs): extract smart chips (date, person, rich link) when reading documents#635
argo49 wants to merge 2 commits into
taylorwilsdon:mainfrom
argo49:feat/chip-extraction

argo49 commented Mar 30, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Mar 30, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

taylorwilsdon commented Apr 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

argo49 commented Mar 30, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of Change

Testing

Checklist

Additional Notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

taylorwilsdon commented Apr 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

argo49 commented Mar 30, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 30, 2026 •

edited

Loading