This guide addresses the issue reported in #2547 where Google Docs API returns date elements and other smart chips as Unicode placeholder characters.
When using the Google Docs API to retrieve document content, date elements, smart chips, and other non-text elements are returned as Unicode Private Use Area characters, specifically \ue907 (U+E907).
# This is what you see in the API response:
{'startIndex': 1, 'endIndex': 5, 'textRun': {'content': '\ue907 | ', 'textStyle': {}}}Instead of getting readable text like "Jan 13, 2025", you get the Unicode character \ue907.
This is intentional behavior by the Google Docs API, not a bug in the client library. According to the official API documentation:
TextRun.content: "The text of this run. Any non-text elements in the run are replaced with the Unicode character U+E907."
Google Docs uses Private Use Area Unicode characters to represent:
- Smart chips (dates, people, places)
- Code blocks
- Special embedded elements
- Rich content that isn't plain text
The actual date/chip information is typically available in richLink properties within the same document elements.
from googleapiclient.discovery import build
# Unicode placeholder character used by Google Docs API
SMART_CHIP_PLACEHOLDER = '\ue907'
# When processing document elements
for element in elements:
if 'textRun' in element:
content = element['textRun']['content']
# Check for smart chip placeholder
if SMART_CHIP_PLACEHOLDER in content:
print("Found smart chip/date placeholder!")
# Make readable for display
readable = content.replace(SMART_CHIP_PLACEHOLDER, '[DATE/CHIP]')
print(f"Display version: {readable}")
# IMPORTANT: Look for actual data in rich links
if 'richLink' in element:
props = element['richLink'].get('richLinkProperties', {})
actual_date = props.get('title', '') # Real date is here!
calendar_uri = props.get('uri', '')
print(f"Actual date: {actual_date}")def has_smart_chips(text):
"""Check if text contains Google Docs smart chip placeholders."""
return '\ue907' in text
def make_readable(text):
"""Replace Unicode placeholders with readable text."""
return text.replace('\ue907', '[DATE/CHIP]')While the text content shows \ue907, you can find the actual date/chip data in:
# Look for richLink elements with the actual data
for element in elements:
if 'richLink' in element:
rich_link = element['richLink']
if 'richLinkProperties' in rich_link:
props = rich_link['richLinkProperties']
actual_date = props.get('title', '') # "Jan 13, 2025"
calendar_uri = props.get('uri', '') # Calendar event link
print(f"Found actual date: {actual_date}")
if 'calendar.google.com' in calendar_uri:
print("This is a Google Calendar event!")Analyze surrounding elements for clues:
def find_date_context(elements, chip_index):
"""Look at adjacent elements for date context."""
before = elements[chip_index - 1] if chip_index > 0 else None
after = elements[chip_index + 1] if chip_index < len(elements) - 1 else None
# Examine textRun content in adjacent elementsFor calendar events, use the Calendar API:
from googleapiclient.discovery import build
def get_event_details(calendar_uri, credentials):
"""Extract full event details from calendar URI."""
calendar_service = build('calendar', 'v3', credentials=credentials)
# Parse event ID from URI and fetch complete event dataSee examples/docs_unicode_dates_example.py for a working example that demonstrates:
- Detecting Unicode placeholder characters
- Extracting actual dates from rich link properties
- Analyzing document structure
- Best practices for handling smart chips
- Always check for rich links - They often contain the actual data
- Use context analysis - Surrounding text can provide clues
- Don't rely solely on textRun content - It's designed to be a fallback
- Consider the document structure - Smart chips are part of larger semantic elements
- Use appropriate APIs - For calendar events, use Calendar API; for contacts, use People API
- This is not a bug - it's intentional API design
- The client library correctly returns what the API provides
- Focus on extracting semantic meaning rather than display text
- Different smart chip types may require different extraction strategies