skip invalid file types for gdrive #715

Merged: jordan-homan merged 5 commits into main from gdrive_stability_1 on May 26, 2026

@jordan-homan jordan-homan commented May 22, 2026

Skips invalid shortcut files in the Google Drive downloader. Example of the failure this addresses:

Unexpected job failure after 3 retries : most recent error 400: Error in downloader - Unsupported Google Drive native MIME type for file 1Tk0By3CY-NXRlkbcWhgyVtJcnFRLItTX: application/vnd.google-apps.shortcut. Docs Editors files must be exported before download.

Summary by cubic

Skip non-downloadable Google Drive files and empty files during indexing and downloading to prevent errors and reduce noise. Exportable Docs/Sheets/Slides still export; unknown native types still error.

  • Bug Fixes
    • Added GOOGLE_DRIVE_SKIP_MIME_TYPES and _should_skip_file to drop shortcuts/forms/maps/sites/jam/fusiontable, plus inode/x-empty and zero-byte files with no MIME.
    • Indexer filters these in get_paginated_results and count_files_recursively.
    • Downloader logs and skips; _download_file returns None and run returns [] for skipped files.
    • Tests updated to cover skip behavior.

Written for commit 4e8cef5.

@jordan-homan jordan-homan marked this pull request as ready for review May 22, 2026 17:16
@jordan-homan jordan-homan requested a review from a team as a code owner May 22, 2026 17:16

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 2 files

Shadow auto-approve: would not auto-approve because issues were found.

Comment thread unstructured_ingest/processes/connectors/google_drive.py

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 3 files (changes from recent commits).

Shadow auto-approve: would not auto-approve. Auto-approval blocked by 1 unresolved issue from previous reviews.


@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 4 files (changes from recent commits).

Shadow auto-approve: would auto-approve. This PR adds a targeted skip for non-downloadable Google Drive native files and empty files, preventing recurring download errors; the changes are isolated to the Google Drive connector, well-tested, and do not impact core business logic or other connectors.

@jordan-homan jordan-homan merged commit 293c3fb into main May 26, 2026
37 of 39 checks passed
@jordan-homan jordan-homan deleted the gdrive_stability_1 branch May 26, 2026 13:47
2 participants