This repository was archived by the owner on Apr 1, 2026. It is now read-only.

Commit e325b5d

docs: replace dynamic sitemap with static sitemap and enrich docstrings
Removes the sphinx_sitemap extension and its configuration in docs/conf.py. Adds a static docs/sitemap.xml with the core URLs requested for Google Search indexing, and copies it to the root using html_extra_path.

Also enriches index.rst, user_guide/index.rst, reference/index.rst, bigframes/pandas/__init__.py, bigframes/bigquery/__init__.py, and bigframes/bigquery/ai.py with targeted keywords emphasizing use-cases for data scientists, data engineers, and data analysts. This addresses potential "thin" content concerns by making the pages more informative and relevant.

Co-authored-by: tswast <247555+tswast@users.noreply.github.com>
1 parent 96597f0 commit e325b5d
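
For illustration only, and not part of this commit: a static sitemap of the kind described above could be produced with Python's standard library. This is a minimal sketch; the ``build_sitemap`` helper is a hypothetical name, and the only values taken from the commit are the sitemaps.org namespace and the example URLs.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (same value as in docs/sitemap.xml).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML text with one <url><loc> entry per URL (hypothetical helper)."""
    # Serialize the sitemap namespace as the default xmlns rather than an ns0: prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        "https://dataframes.bigquery.dev/",
        "https://dataframes.bigquery.dev/user_guide/index.html",
    ]))
```

A hand-written file, as in this commit, is equally valid; a generator like this mainly helps if the URL list grows.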

File tree

8 files changed

+90
-63
lines changed

bigframes/bigquery/__init__.py

Lines changed: 14 additions & 14 deletions
@@ -16,30 +16,30 @@
 Access BigQuery-specific operations and namespaces within BigQuery DataFrames.
 
 This module provides specialized functions and sub-modules that expose BigQuery's
-advanced capabilities to DataFrames and Series. It acts as a bridge between the
-pandas-compatible API and the full power of BigQuery SQL.
+advanced analytics capabilities directly to DataFrames and Series. Designed for data scientists,
+data engineers, and data analysts, it acts as a bridge between the intuitive
+pandas-compatible API and the massive scale and power of BigQuery SQL.
 
 Key sub-modules include:
 
-* :mod:`bigframes.bigquery.ai`: Generative and predictive AI functions (Gemini, BQML).
-* :mod:`bigframes.bigquery.ml`: Direct access to BigQuery ML model operations.
-* :mod:`bigframes.bigquery.obj`: Support for BigQuery object tables.
+* :mod:`bigframes.bigquery.ai`: Generative and predictive AI functions (Gemini, LLMs, BQML) for AI developers and data scientists.
+* :mod:`bigframes.bigquery.ml`: Direct access to BigQuery ML model operations for building scalable ML pipelines.
+* :mod:`bigframes.bigquery.obj`: Support for BigQuery object tables, essential for handling unstructured data like images and PDFs.
 
-This module also provides direct access to optimized BigQuery functions for:
+This module also provides direct access to optimized BigQuery functions tailored for data engineering and advanced analytics workflows:
 
 * **JSON Processing:** High-performance functions like ``json_extract``, ``json_value``,
-and ``parse_json`` for handling semi-structured data.
+and ``parse_json`` for transforming semi-structured log data.
 * **Geospatial Analysis:** Comprehensive geographic functions such as ``st_area``,
-``st_distance``, and ``st_centroid`` (``ST_`` prefixed functions).
+``st_distance``, and ``st_centroid`` (``ST_`` prefixed functions) to unlock location-based insights.
 * **Array Operations:** Tools for working with BigQuery arrays, including ``array_agg``
-and ``array_length``.
+and ``array_length``, handling nested repeated fields efficiently.
 * **Vector Search:** Integration with BigQuery's vector search and indexing
-capabilities for high-dimensional data.
-* **Custom SQL:** The ``sql_scalar`` function allows embedding raw SQL snippets for
-advanced operations not yet directly mapped in the API.
+capabilities for high-dimensional data, semantic search, and RAG architectures.
+* **Custom SQL:** The ``sql_scalar`` function allows embedding raw SQL snippets, giving data engineers an escape hatch for complex, custom BigQuery operations.
 
-By using these functions, you can leverage BigQuery's high-performance engine for
-domain-specific tasks while maintaining a Python-centric development experience.
+By using these functions, data professionals can leverage BigQuery's distributed compute engine for
+domain-specific tasks at petabyte scale, while maintaining a productive Python-centric development experience.
 
 For the full list of BigQuery standard SQL functions, see:
 https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-reference

bigframes/bigquery/ai.py

Lines changed: 18 additions & 16 deletions
@@ -15,28 +15,30 @@
 """
 Integrate BigQuery built-in AI functions into your BigQuery DataFrames workflow.
 
-The ``bigframes.bigquery.ai`` module provides a Pythonic interface to leverage BigQuery ML's
-generative AI and predictive functions directly on BigQuery DataFrames and Series objects.
-These functions enable you to perform advanced AI tasks at scale without moving data
-out of BigQuery.
+The ``bigframes.bigquery.ai`` module provides a powerful, Pythonic interface for data scientists
+and data engineers to leverage BigQuery ML's Generative AI, Large Language Models (LLMs),
+and predictive functions directly on big data via BigQuery DataFrames and Series objects.
+These functions enable AI developers to construct scalable MLOps pipelines and perform advanced AI
+tasks—such as automated text generation and semantic search—without moving data out of BigQuery's
+secure perimeter.
 
-Key capabilities include:
+Key capabilities for AI workflows include:
 
-* **Generative AI:** Use :func:`bigframes.bigquery.ai.generate` (Gemini) to
-perform text analysis, translation, or
-content generation. Specialized versions like
+* **Generative AI & LLMs (Gemini):** Use :func:`bigframes.bigquery.ai.generate`
+to orchestrate Gemini models for text analysis, translation, summarization, or
+content generation directly on big data. Specialized versions like
 :func:`~bigframes.bigquery.ai.generate_bool`,
 :func:`~bigframes.bigquery.ai.generate_int`, and
 :func:`~bigframes.bigquery.ai.generate_double` are available for structured
-outputs.
-* **Embeddings:** Generate vector embeddings for text using
-:func:`~bigframes.bigquery.ai.generate_embedding`, which are essential for
-semantic search and retrieval-augmented generation (RAG) workflows.
-* **Classification and Scoring:** Apply machine learning models to your data for
-predictive tasks with :func:`~bigframes.bigquery.ai.classify` and
-:func:`~bigframes.bigquery.ai.score`.
+outputs, perfect for data pipelines.
+* **Embeddings & Semantic Search:** Generate vector embeddings for text using
+:func:`~bigframes.bigquery.ai.generate_embedding`. Essential for modern data science,
+enabling robust semantic search and Retrieval-Augmented Generation (RAG) architectures.
+* **Classification and Scoring:** Apply robust machine learning models to your data for
+predictive analytics with :func:`~bigframes.bigquery.ai.classify` and
+:func:`~bigframes.bigquery.ai.score`, accelerating the time-to-insight for data analysts.
 * **Forecasting:** Predict future values in time-series data using
-:func:`~bigframes.bigquery.ai.forecast`.
+:func:`~bigframes.bigquery.ai.forecast` for advanced analytics and business intelligence.
 
 **Example usage:**
 

bigframes/pandas/__init__.py

Lines changed: 15 additions & 13 deletions
@@ -17,24 +17,26 @@
 
 **BigQuery DataFrames** provides a Pythonic DataFrame and machine learning (ML) API
 powered by the BigQuery engine. The ``bigframes.pandas`` module implements a large
-subset of the pandas API, allowing you to perform large-scale data analysis
-using familiar pandas syntax while the computations are executed in the cloud.
+subset of the pandas API, allowing you to perform large-scale data analysis,
+data engineering, and AI/ML workflows using familiar pandas syntax while the computations
+are seamlessly executed in the cloud.
 
-**Key Features:**
+**Key Features for Data Scientists, Data Engineers, and Data Analysts:**
 
-* **Petabyte-Scale Scalability:** Handle datasets that exceed local memory by
-offloading computation to the BigQuery distributed engine.
+* **Petabyte-Scale Scalability:** Handle huge datasets that exceed local memory limits by
+offloading big data computation directly to the BigQuery distributed engine.
 * **Pandas Compatibility:** Use common pandas methods like
 :func:`~bigframes.pandas.DataFrame.groupby`,
 :func:`~bigframes.pandas.DataFrame.merge`,
 :func:`~bigframes.pandas.DataFrame.pivot_table`, and more on BigQuery-backed
-:class:`~bigframes.pandas.DataFrame` objects.
+:class:`~bigframes.pandas.DataFrame` objects without rewriting existing pandas pipelines.
 * **Direct BigQuery Integration:** Read from and write to BigQuery tables and
 queries with :func:`bigframes.pandas.read_gbq` and
-:func:`bigframes.pandas.DataFrame.to_gbq`.
-* **User-defined Functions (UDFs):** Effortlessly deploy Python functions
-functions using the :func:`bigframes.pandas.remote_function` and
-:func:`bigframes.pandas.udf` decorators.
+:func:`bigframes.pandas.DataFrame.to_gbq`. Perfect for data engineers constructing scalable ETL pipelines.
+* **Seamless AI and Machine Learning:** Rapidly train models or use Generative AI (like Gemini) directly on large datasets, reducing data movement and time-to-insight for data scientists.
+* **User-defined Functions (UDFs):** Effortlessly deploy custom Python functions
+using the :func:`bigframes.pandas.remote_function` and
+:func:`bigframes.pandas.udf` decorators for custom business logic.
 * **Data Ingestion:** Support for various formats including CSV, Parquet, JSON,
 and Arrow via :func:`bigframes.pandas.read_csv`,
 :func:`bigframes.pandas.read_parquet`, etc., which are automatically uploaded
@@ -66,9 +68,9 @@
 
 >>> local_df = top_names.to_pandas() # doctest: +SKIP
 
-BigQuery DataFrames is designed for data scientists and analysts who need the
-power of BigQuery with the ease of use of pandas. It eliminates the "data
-movement bottleneck" by keeping your data in BigQuery for processing.
+BigQuery DataFrames is designed for data scientists, data engineers, and data analysts who need the
+power of BigQuery's distributed compute with the ease of use of pandas. It eliminates the "data
+movement bottleneck" by keeping your big data within BigQuery for secure, scalable processing.
 """
 
 from __future__ import annotations

docs/conf.py

Lines changed: 1 addition & 11 deletions
@@ -58,7 +58,6 @@
 "sphinx.ext.napoleon",
 "sphinx.ext.todo",
 "sphinx.ext.viewcode",
-"sphinx_sitemap",
 "myst_nb",
 ]
 
@@ -199,7 +198,7 @@
 # Add any extra paths that contain custom files (such as robots.txt or
 # .htaccess) here, relative to this directory. These files are copied
 # directly to the root of the documentation.
-# html_extra_path = []
+html_extra_path = ["sitemap.xml"]
 
 # If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
 # using the given strftime format.
@@ -259,15 +258,6 @@
 # Output file base name for HTML help builder.
 htmlhelp_basename = "bigframes-doc"
 
-# https://sphinx-sitemap.readthedocs.io/en/latest/getting-started.html#usage
-html_baseurl = "https://dataframes.bigquery.dev/"
-sitemap_locales = [None]
-
-# We don't have any immediate plans to translate the API reference, so omit the
-# language from the URLs.
-# https://sphinx-sitemap.readthedocs.io/en/latest/advanced-configuration.html#configuration-customizing-url-scheme
-sitemap_url_scheme = "{link}"
-
 # -- Options for warnings ------------------------------------------------------
 
 

docs/index.rst

Lines changed: 6 additions & 6 deletions
@@ -4,9 +4,9 @@ Scalable Python Data Analysis with BigQuery DataFrames (BigFrames)
 ==================================================================
 
 .. meta::
-:description: BigQuery DataFrames (BigFrames) provides a scalable, pandas-compatible Python API for data analysis and machine learning on petabyte-scale datasets using the BigQuery engine.
+:description: BigQuery DataFrames (BigFrames) provides a scalable, pandas-compatible Python API for data analysis and machine learning on petabyte-scale datasets using the BigQuery engine. Designed for data scientists, data engineers, and data analysts.
 
-**BigQuery DataFrames** (``bigframes``) is an open-source Python library that brings the power of **distributed computing** to your data science workflow. By providing a familiar **pandas** and **scikit-learn** compatible API, BigFrames allows you to analyze and model massive datasets where they live—directly in **BigQuery**.
+**BigQuery DataFrames** (``bigframes``) is an open-source Python library that brings the power of **distributed computing** to your data science and data engineering workflows. By providing a familiar **pandas** and **scikit-learn** compatible API, BigFrames allows data scientists, data engineers, and data analysts to analyze, transform, and model massive datasets where they live—directly in **BigQuery**.
 
 Why Choose BigQuery DataFrames?
 -------------------------------
@@ -15,17 +15,17 @@ BigFrames eliminates the "data movement bottleneck." Instead of downloading larg
 
 * **Petabyte-Scale Scalability:** Effortlessly process datasets that far exceed local memory limits.
 * **Familiar Python Ecosystem:** Use the same ``read_gbq``, ``groupby``, ``merge``, and ``pivot_table`` functions you already know from pandas.
-* **Integrated Machine Learning:** Access BigQuery ML's powerful algorithms via a scikit-learn-like interface (``bigframes.ml``), including seamless **Gemini AI** integration.
+* **Integrated Machine Learning:** Access BigQuery ML's powerful algorithms via a scikit-learn-like interface (``bigframes.ml``), including seamless **Gemini AI** integration for generative AI workflows and MLOps.
 * **Enterprise-Grade Security:** Maintain data governance and security by keeping your data within the BigQuery perimeter.
 * **Hybrid Flexibility:** Easily move between distributed BigQuery processing and local pandas analysis with ``to_pandas()``.
 
 Core Components of BigFrames
 ----------------------------
 
-BigQuery DataFrames is organized into specialized modules designed for the modern data stack:
+BigQuery DataFrames is organized into specialized modules designed for the modern data stack, empowering big data analytics, AI/ML pipelines, and data engineering:
 
-1. :mod:`bigframes.pandas`: A high-performance, pandas-compatible API for scalable data exploration, cleaning, and transformation.
-2. :mod:`bigframes.bigquery`: Specialized utilities for direct BigQuery resource management, including integrations with Gemini and other AI models in the :mod:`bigframes.bigquery.ai` submodule.
+1. :mod:`bigframes.pandas`: A high-performance, pandas-compatible API for scalable data exploration, cleaning, and transformation for data analysts.
+2. :mod:`bigframes.bigquery`: Specialized utilities for direct BigQuery resource management, including integrations with Gemini and other AI models in the :mod:`bigframes.bigquery.ai` submodule for data engineers and AI developers.
 
 
 Quickstart: Scalable Data Analysis in Seconds

docs/reference/index.rst

Lines changed: 11 additions & 3 deletions
@@ -1,8 +1,15 @@
 API Reference
 =============
 
-Refer to these pages for details about the public objects in the ``bigframes``
-packages.
+The **BigQuery DataFrames (BigFrames) API Reference** documents the pandas-compatible and scikit-learn-compatible Python interfaces powered by BigQuery's distributed compute engine.
+
+Designed to support the modern data stack, these APIs empower:
+
+* **Data Analysts** to write familiar pandas code for scalable data exploration, cleaning, and aggregation without hitting memory limits.
+* **Data Engineers** to build robust big data pipelines, leveraging advanced geospatial, array, and JSON functions native to BigQuery.
+* **Data Scientists** to train, evaluate, and deploy machine learning models directly on BigQuery using the ML modules, or integrate Generative AI via BigQuery ML and Gemini.
+
+Use this reference to discover the classes, methods, and functions that make up the BigQuery DataFrames ecosystem.
 
 .. autosummary::
 :toctree: api
@@ -33,7 +40,8 @@ ML APIs
 ~~~~~~~
 
 BigQuery DataFrames provides many machine learning modules, inspired by
-scikit-learn.
+scikit-learn, enabling data scientists to quickly build, train, and deploy models
+on large datasets natively within BigQuery.
 
 
 .. autosummary::

docs/sitemap.xml

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
+<url>
+<loc>https://dataframes.bigquery.dev/</loc>
+</url>
+<url>
+<loc>https://dataframes.bigquery.dev/user_guide/index.html</loc>
+</url>
+<url>
+<loc>https://dataframes.bigquery.dev/reference/index.html</loc>
+</url>
+<url>
+<loc>https://dataframes.bigquery.dev/reference/api/bigframes.pandas.html</loc>
+</url>
+<url>
+<loc>https://dataframes.bigquery.dev/reference/api/bigframes.bigquery.html</loc>
+</url>
+<url>
+<loc>https://dataframes.bigquery.dev/reference/api/bigframes.bigquery.ai.html</loc>
+</url>
+</urlset>
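
Because a static sitemap is no longer regenerated from the built pages, it can drift from the published site. As a hedged sketch (not part of this commit), a small standard-library check could list the ``<loc>`` entries and flag any that fall outside the expected base URL. The helper names ``sitemap_locs`` and ``urls_outside_base`` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map for the sitemaps.org schema used by docs/sitemap.xml.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text):
    """Return the text of every <url>/<loc> entry in a sitemap document."""
    # Parse from bytes so the XML declaration's encoding attribute is honored.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [(loc.text or "").strip() for loc in root.findall("sm:url/sm:loc", NS)]


def urls_outside_base(xml_text, base_url):
    """Return the sitemap URLs that do not start with base_url."""
    return [url for url in sitemap_locs(xml_text) if not url.startswith(base_url)]
```

Calling ``urls_outside_base(open("docs/sitemap.xml", encoding="utf-8").read(), "https://dataframes.bigquery.dev/")`` would be expected to return an empty list for the file added here, since every entry uses that host.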

docs/user_guide/index.rst

Lines changed: 4 additions & 0 deletions
@@ -1,6 +1,10 @@
 User Guide
 **********
 
+Welcome to the BigQuery DataFrames User Guide! This guide is designed to help data scientists, data engineers, and data analysts build scalable data pipelines, perform advanced analytics, and train machine learning models using BigQuery's distributed compute power, all while staying within the familiar pandas and scikit-learn Python ecosystem.
+
+Whether you're exploring big data, deploying an AI model, integrating with LLMs like Gemini, or architecting robust data engineering workflows, these tutorials and notebooks will provide the practical foundations you need.
+
 .. include:: ../README.rst
 
 .. toctree::
