Commit 4e8e8c3

docs: remove GA4 Data API data source
Reflects codelibs/recotem#109, which removed the `source.type: ga4` data source. Drops the GA4 source page, recipe `source.type: ga4` section, `recotem[ga4]` install extra, `RECOTEM_GA4_MAX_PAGES` env var, and the GA4 sidebar entry, in both English and Japanese v2 docs. BigQuery-based GA4 usage (`events_*` export via `type: bigquery`) is preserved. Also repoints a pre-existing dead link in environment-variables.md (`./data-sources/` -> `./recipe-reference#source`) so the build passes.
1 parent 6cbadb9 commit 4e8e8c3

11 files changed

Lines changed: 6 additions & 337 deletions

.vitepress/config.ts

Lines changed: 0 additions & 1 deletion
@@ -119,7 +119,6 @@ function v2DocsSidebar(lang: 'en' | 'ja'): DefaultTheme.SidebarItem[] {
 { text: 'CSV / Parquet', link: `${prefix}/data-sources/csv` },
 { text: 'BigQuery', link: `${prefix}/data-sources/bigquery` },
 { text: 'SQL', link: `${prefix}/data-sources/sql` },
-{ text: 'GA4', link: `${prefix}/data-sources/ga4` },
 { text: lang === 'ja' ? 'プラグイン' : 'Plugins', link: `${prefix}/data-sources/plugins` },
 ],
 },

docs/data-sources/ga4.md

Lines changed: 0 additions & 131 deletions
This file was deleted.

docs/environment-variables.md

Lines changed: 1 addition & 2 deletions
@@ -89,14 +89,13 @@ These variables configure storage paths, locking, metadata field filtering, and
 
 ## Data source
 
-These variables tune behaviour of specific data sources. They are read only by `recotem train` and only when the corresponding source is used. See the [Data sources](./data-sources/) reference for full context.
+These variables tune behaviour of specific data sources. They are read only by `recotem train` and only when the corresponding source is used. See the [Data sources](./recipe-reference#source) reference for full context.
 
 | Variable | Default | Scope | Clamping | Description |
 |---|---|---|---|---|
 | `RECOTEM_BQ_REQUIRE_STORAGE_API` | (unset) | train || Truthy values: `1`, `true`, `yes`, `on`. When set, the BigQuery source raises `DataSourceError` (exit 3) instead of silently falling back to the slower REST API when the BigQuery Storage Read API fails (e.g. missing `bigquery.readSessions.create` IAM permission). Use this to surface IAM gaps rather than accepting degraded throughput. |
 | `RECOTEM_MAX_SQL_ROWS` | `50_000_000` | train | [1_000, 500_000_000] | Hard cap on the number of rows returned by the SQL data source. Exceeding the cap raises `DataSourceError` (exit 3). Caps **row count**, not DataFrame resident memory — see [SQL source — memory bound caveat](./data-sources/sql#memory-bound-caveat). |
 | `RECOTEM_SQL_ALLOW_PRIVATE` | (unset) | train || Truthy values: `1`, `true`, `yes`, `on`. Opts the SQL source into accepting private/loopback DSN hosts (default deny, for SSRF). Covers every driver-routing form — netloc, `?host=`, `?hostaddr=`, `?service=`, `?unix_socket=`, absolute-path host, and network DSNs with no host info — all default-deny without this flag. Also disables the DNS-rebinding re-check before each probe/fetch — opting in means trusting the host end-to-end. |
-| `RECOTEM_GA4_MAX_PAGES` | `500` | train | [1, 10_000] | Hard ceiling on GA4 Data API pagination loops. Reached when a property is too large for the default; raise after confirming quota. |
 
 ## Recipe expansion
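
For readers unfamiliar with the "Truthy values" and "Clamping" columns in the table above, the following is a minimal sketch of how such variables could be interpreted. It is not Recotem's implementation: the helper names `env_flag` and `env_int_clamped` are hypothetical, and case-insensitive matching and the fallback to the default on an unparseable value are assumptions made for illustration.

```python
import os

# Truthy spellings listed in the table; matching case-insensitively is an assumption.
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, environ=None):
    """Return True when the variable is set to one of the truthy spellings."""
    env = os.environ if environ is None else environ
    return env.get(name, "").strip().lower() in TRUTHY


def env_int_clamped(name, default, low, high, environ=None):
    """Read an integer variable and clamp it into [low, high].

    Hypothetical behaviour: an unset or unparseable value yields the default.
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    try:
        value = int(raw.strip())  # int() also accepts underscores, e.g. "50_000_000"
    except ValueError:
        return default
    return max(low, min(high, value))


if __name__ == "__main__":
    demo = {"RECOTEM_SQL_ALLOW_PRIVATE": "yes", "RECOTEM_MAX_SQL_ROWS": "10"}
    print(env_flag("RECOTEM_SQL_ALLOW_PRIVATE", demo))
    print(env_int_clamped("RECOTEM_MAX_SQL_ROWS", 50_000_000, 1_000, 500_000_000, demo))
```

Passing an explicit `environ` mapping keeps the sketch testable without touching the real process environment.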

docs/index.md

Lines changed: 1 addition & 2 deletions
@@ -47,7 +47,7 @@ A recipe is the single source of truth for a model:
 ```
 
 The recipe captures:
-- **Where to get data** (`source` block — CSV, Parquet, BigQuery, SQL, GA4, or plugin)
+- **Where to get data** (`source` block — CSV, Parquet, BigQuery, SQL, or plugin)
 - **How to map columns** (`schema` block — user ID, item ID, optional timestamp)
 - **Data quality gates** (`cleansing` block — null-drop, dedup, minimum thresholds)
 - **What to train** (`training` block — algorithms, Optuna budget, split scheme)
@@ -152,7 +152,6 @@ This separation means:
 - [CSV / Parquet Source](./data-sources/csv) — local, object-storage, and HTTP source options
 - [BigQuery Source](./data-sources/bigquery) — authentication, parameter binding, GA4 patterns
 - [SQL Source](./data-sources/sql) — PostgreSQL / MySQL / MariaDB / SQLite via SQLAlchemy 2
-- [GA4 Source](./data-sources/ga4) — Google Analytics 4 Data API, skipping the BigQuery Export hop
 - [Plugin Data Sources](./data-sources/plugins) — extend `source.type` with custom plugins
 - Deployment guides — Docker, Kubernetes, cron scheduling
 - Operations — key rotation, recovery, sizing, troubleshooting

docs/recipe-reference.md

Lines changed: 1 addition & 32 deletions
@@ -11,7 +11,7 @@ A recipe is a YAML file that defines what data to fetch, how to train, and where
 | Field | Type | Required | Description |
 |-------|------|----------|-------------|
 | `name` | string | yes | Endpoint name. Pattern: `^[A-Za-z0-9_-]{1,64}$`. Used in endpoint paths such as `/v1/recipes/{name}:recommend`. |
-| `source` | object | yes | Data source config. `type` field is the discriminator (`csv`, `parquet`, `bigquery`, `sql`, `ga4`, or any plugin). Validated in two stages: the rest of the recipe is parsed first, then the source dict is dispatched to the plugin's `Config` class. As a result, errors in `source.*` surface *after* errors elsewhere in the recipe; an unknown `source.type` raises a `DataSourceError` listing all registered type names. |
+| `source` | object | yes | Data source config. `type` field is the discriminator (`csv`, `parquet`, `bigquery`, `sql`, or any plugin). Validated in two stages: the rest of the recipe is parsed first, then the source dict is dispatched to the plugin's `Config` class. As a result, errors in `source.*` surface *after* errors elsewhere in the recipe; an unknown `source.type` raises a `DataSourceError` listing all registered type names. |
 | `schema` | object | yes | Column mapping. |
 | `cleansing` | object | no | Data quality gates. |
 | `item_metadata` | object | no | Metadata joined into predict responses. |
@@ -100,37 +100,6 @@ source:
 
 Install one extra: `pip install "recotem[postgres]"`, `recotem[mysql]`, or `recotem[sqlite]`. Full reference: [SQL source](./data-sources/sql).
 
-### `source.type: ga4`
-
-```yaml
-source:
-  type: ga4
-  property_id: "123456789"
-  user_dimension: userPseudoId
-  item_dimension: itemId
-  time_dimension: date
-  event_names: [purchase, view_item, add_to_cart]
-  lookback_days: 90 # XOR with start_date + end_date
-  max_rows: 1_000_000
-  weight_column: event_count
-  api_timeout_seconds: 60
-```
-
-| Field | Type | Default | Notes |
-|-------|------|---------|-------|
-| `property_id` | string | required | Numeric only (`^\d+$`). Not the `G-XXXX` measurement ID. |
-| `user_dimension` | string | required | `userId` or `userPseudoId`. |
-| `item_dimension` | string | `itemId` | Any GA4 item-scoped dimension. |
-| `time_dimension` | string | `date` | `date` / `dateHour` / `dateHourMinute`. |
-| `event_names` | list[string] | required | 1–50 names; each matches `^[A-Za-z_][A-Za-z0-9_]{0,39}$`. |
-| `lookback_days` | int | XOR | 1–3650. Rolling window ending yesterday. |
-| `start_date` / `end_date` | string (ISO) | XOR | Both required if either is set. |
-| `max_rows` | int | required | Valid range `[1, 50_000_000]`. |
-| `weight_column` | string | `event_count` | Must not collide with the dimension keys or the literal `eventName`. |
-| `api_timeout_seconds` | int | `60` | Valid range `[5, 600]`. |
-
-Install the extra: `pip install "recotem[ga4]"`. Full reference: [GA4 source](./data-sources/ga4).
-
 ---
 
 ## `schema`
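
The `name` row in the first hunk above gives a pattern and an endpoint path template. As a minimal sketch of how a client might check a name before building the path, assuming nothing about Recotem's own validation code (the helper names `is_valid_recipe_name` and `recommend_path` are hypothetical):

```python
import re

# Same character class and length bound as the documented pattern
# ^[A-Za-z0-9_-]{1,64}$; fullmatch() supplies the anchoring.
NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_recipe_name(name):
    """Return True when the name satisfies the documented pattern."""
    return NAME_RE.fullmatch(name) is not None


def recommend_path(name):
    """Build the endpoint path from the documented template (hypothetical helper)."""
    if not is_valid_recipe_name(name):
        raise ValueError(f"invalid recipe name: {name!r}")
    return f"/v1/recipes/{name}:recommend"


if __name__ == "__main__":
    print(recommend_path("shop-items_v2"))
```

Using `fullmatch` rather than `match` with `$` avoids accepting a name that ends in a trailing newline, which `$` alone would let through in Python.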

guide/installation.md

Lines changed: 0 additions & 1 deletion
@@ -31,7 +31,6 @@ The core package ships with CSV and Parquet data sources. Install extras for add
 | PostgreSQL data source | `pip install "recotem[postgres]"` | Read interaction data from PostgreSQL via psycopg |
 | MySQL / MariaDB data source | `pip install "recotem[mysql]"` | Read interaction data from MySQL or MariaDB via PyMySQL |
 | SQLite data source | `pip install "recotem[sqlite]"` | Read interaction data from SQLite (uses stdlib `sqlite3`) |
-| Google Analytics 4 data source | `pip install "recotem[ga4]"` | Read interaction events from GA4 via the Data API |
 | Amazon S3 | `pip install "recotem[s3]"` | Read/write artifacts and data from S3 |
 | Google Cloud Storage | `pip install "recotem[gcs]"` | Read/write artifacts and data from GCS |
 | Azure Blob Storage | `pip install "recotem[azure]"` | Read/write artifacts and data from Azure |
