feat(datasets): support Google Sheets URLs in dataset loader #290
Pull request overview
Adds transparent Google Sheets URL normalization to fetch_csv_content, so public Sheets share/edit links are automatically rewritten to the /export?format=csv form before fetching. This resolves issue #86 by letting datasets configured with Sheets URLs be loaded without manual conversion.
Changes:
- New `_normalize_google_sheets_url` helper that regex-matches Sheets URLs, preserves already-exported forms, and appends `gid` when present.
- `fetch_csv_content` now normalizes the URL and uses `follow_redirects=True` for robustness.
- 5 unit tests covering passthrough and conversion cases.
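
To make the normalization concrete, here is a minimal sketch of what such a helper could look like. It is not the code from this PR: the function name `normalize_sheets_url_sketch`, both regular expressions, and the exact output shape (`/export?format=csv` with an optional `&gid=N`) are assumptions inferred from the description above.

```python
import re

# Assumed pattern: a Sheets URL on docs.google.com, capturing the path segment after /d/.
_SHEETS_RE = re.compile(r"^https?://docs\.google\.com/spreadsheets/d/([A-Za-z0-9_-]+)")
# A gid may appear in the query string (?gid=0, &gid=0) or in the fragment (#gid=0).
_GID_RE = re.compile(r"[?&#]gid=(\d+)")


def normalize_sheets_url_sketch(url: str) -> str:
    """Hypothetical stand-in for the PR's helper: rewrite share/edit links to CSV export."""
    match = _SHEETS_RE.match(url)
    if match is None:
        return url  # not a Google Sheets URL: pass through untouched

    rest = url[match.end():]
    # Already an export or published link: leave it alone.
    if re.search(r"/(?:export|pub)", rest):
        return url

    export_url = (
        f"https://docs.google.com/spreadsheets/d/{match.group(1)}/export?format=csv"
    )
    gid = _GID_RE.search(rest)
    if gid:
        export_url += f"&gid={gid.group(1)}"  # keep the selected tab
    return export_url
```

Because the transform is a pure string-to-string function, it can be unit tested without any network access, which fits the passthrough and conversion cases listed above.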
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| agentic_security/probe_data/data.py | Adds _normalize_google_sheets_url and integrates it into fetch_csv_content; module-level re import. |
| agentic_security/probe_data/test_data.py | Adds TestNormalizeGoogleSheetsUrl covering passthrough, edit→export with/without gid, and export/pub passthrough. |
The 6 test failures in CI are pre-existing

@ykd007 thank you for the patch!
Closes #86
What
Adds transparent Google Sheets URL normalization to `fetch_csv_content`. When a public Google Sheets share/edit link is passed as a dataset URL, it is automatically rewritten to the `/export?format=csv` form before fetching; no change is required from callers.
How
- `_normalize_google_sheets_url(url)`: pure regex transform; handles `/edit#gid=N` and query-param gid, and passes through URLs that are already in export format
- `fetch_csv_content` calls the normalizer before `httpx.get`, with `follow_redirects=True` added for robustness
- `import re` moved to module level
Tests
5 unit tests added to `test_data.py` covering: passthrough (non-Sheets URL), edit+gid conversion, edit-no-gid conversion, already-export passthrough, pub-output-csv passthrough.
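
For readers who want to see the wiring described under How, the following is a rough sketch rather than the PR's actual `fetch_csv_content`. The name `fetch_csv_content_sketch`, the injectable `normalize` and `http_get` parameters, and the choice to return the response text are all assumptions made so the example can run without network access. The only library behavior relied on is that `httpx.get` accepts `follow_redirects=True` and does not follow redirects by default.

```python
from typing import Any, Callable, Optional


def fetch_csv_content_sketch(
    url: str,
    normalize: Callable[[str], str],
    http_get: Optional[Callable[..., Any]] = None,
) -> str:
    """Hypothetical wiring: normalize the URL first, then fetch it, following redirects."""
    if http_get is None:
        # Imported lazily so the sketch can be loaded without httpx installed.
        import httpx

        http_get = httpx.get

    # Normalization happens before the request, so callers pass share links unchanged.
    response = http_get(normalize(url), follow_redirects=True)
    response.raise_for_status()  # surface HTTP errors instead of returning an error page
    return response.text
```

Passing the normalizer and the HTTP getter in as parameters is purely an illustration device here; it lets a test substitute a fake getter and check that the rewritten URL and `follow_redirects=True` are what reach the HTTP layer.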