Box - Get File Text #20507

Merged: GTFalcao merged 3 commits into master from issue-20469 on Apr 7, 2026


@michelle0927 (Collaborator) commented Apr 3, 2026

Resolves #20469

Summary by CodeRabbit

  • New Features

    • Added "Get File Text" action to extract text content from Box files.
  • Improvements

    • Enhanced Box app request and file retrieval behavior for more reliable file access.
  • Updates

    • Multiple Box actions and sources updated with bumped version metadata for consistency.
    • Package version updated to 0.6.0.

@vercel (bot) commented Apr 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project | Deployment | Actions | Updated (UTC)
pipedream-docs-redirect-do-not-edit | Ignored | Ignored | Apr 3, 2026 6:25pm


@coderabbitai (Contributor, bot) commented Apr 3, 2026

Walkthrough

Adds a new "Get File Text" action that extracts text via the Box Representations API, bumps the version fields of multiple Box actions and sources as well as the package version, and extends the Box app with a getFile helper and _makeRequest support for direct URLs.

Changes

Cohort / File(s) Summary
Version Bumps — Actions
components/box/actions/create-sign-request/create-sign-request.mjs, components/box/actions/download-file/download-file.mjs, components/box/actions/get-comments/get-comments.mjs, components/box/actions/search-content/search-content.mjs, components/box/actions/upload-file/upload-file.mjs, components/box/actions/upload-file-version/upload-file-version.mjs
Incremented exported action version fields; no logic changes.
Version Bumps — Sources
components/box/sources/new-event/new-event.mjs, components/box/sources/new-file/new-file.mjs, components/box/sources/new-folder/new-folder.mjs
Incremented source component version fields; no logic changes.
New Action — Get File Text
components/box/actions/get-file-text/get-file-text.mjs
Added new action box-get-file-text (v0.0.1) that requests representations with x-rep-hints: [extracted_text], locates a text representation URL, requests the text via _makeRequest, and returns the content or throws clear ConfigurationErrors.
App Updates
components/box/box.app.mjs
Modified _makeRequest to accept optional url param (uses url ?? this._getApiUrl(path)); added async getFile({ fileId, ...args } = {}) to fetch /files/{fileId} with representations.
Package Version
components/box/package.json
Bumped @pipedream/box package version from 0.5.5 to 0.6.0.
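As a rough illustration of the box.app.mjs change, the sketch below pulls the described `url ?? this._getApiUrl(path)` fallback into a standalone function and shows the request options the action sends when fetching representations. The `BOX_API_BASE` constant and the helper names `resolveRequestUrl` and `buildGetFileRequest` are hypothetical; only the fallback expression, the `fields: "representations"` parameter, and the `x-rep-hints` header come from this PR.

```javascript
// Sketch only: BOX_API_BASE and both helpers are assumptions for illustration,
// not the actual code in components/box/box.app.mjs.
const BOX_API_BASE = "https://api.box.com/2.0";

// Mirrors the described `url ?? this._getApiUrl(path)` behavior: an explicit
// absolute URL (such as a representation download URL) takes precedence;
// otherwise the path is resolved against the API base.
function resolveRequestUrl({ url, path } = {}) {
  return url ?? `${BOX_API_BASE}${path}`;
}

// Builds request options for fetching a file together with its
// representations, hinting that the extracted_text representation is wanted.
function buildGetFileRequest(fileId) {
  return {
    url: resolveRequestUrl({ path: `/files/${encodeURIComponent(fileId)}` }),
    params: { fields: "representations" },
    headers: { "x-rep-hints": "[extracted_text]" },
  };
}

const req = buildGetFileRequest("12345");
console.log(req.url); // https://api.box.com/2.0/files/12345
```

Letting an explicit `url` win means the same request helper can fetch content from the absolute download URL that a representation points to, not only from paths under the API base.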

Sequence Diagram

sequenceDiagram
    participant User as User/Workflow
    participant Action as Get File Text Action
    participant BoxApp as Box App
    participant BoxAPI as Box API

    User->>Action: Invoke with fileId
    Action->>BoxApp: getFile(fileId)
    BoxApp->>BoxAPI: GET /files/{fileId}?fields=representations
    BoxAPI-->>BoxApp: File metadata (representations)
    BoxApp-->>Action: representations array

    Action->>Action: Find representation with entry.content.url_template
    alt text representation found
        Action->>BoxApp: _makeRequest(url)
        BoxApp->>BoxAPI: GET extracted text URL
        BoxAPI-->>BoxApp: Text content
        BoxApp-->>Action: Text content
        Action-->>User: Return text content
    else no text representation
        Action-->>User: Throw ConfigurationError (no text representation)
    end
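The diagrammed flow can be sketched as a small async function with the Box app injected, so it runs without network access. The `app.getFile` and `app.makeRequest` names and the fake responses are hypothetical stand-ins for `this.app.getFile` and `this.app._makeRequest`. The sketch throws a plain `Error` where the action throws a `ConfigurationError`, and it assumes the `{+asset_path}` placeholder is replaced with an empty string for a single-file text representation. Like the diagram, it picks the first entry that has a `url_template`; the later review comments on this PR argue for filtering by representation name and status instead.

```javascript
// Sketch of the diagrammed flow with an injected, hypothetical `app`.
async function getFileText(app, fileId) {
  // User -> Action -> BoxApp: fetch the file's representations metadata.
  const file = await app.getFile({
    fileId,
    params: { fields: "representations" },
    headers: { "x-rep-hints": "[extracted_text]" },
  });
  const entries = file?.representations?.entries ?? [];

  // Find a representation that exposes a download URL template.
  const entry = entries.find((e) => e?.content?.url_template);
  if (!entry) {
    // "else no text representation" branch of the diagram.
    throw new Error("File does not have a text representation");
  }

  // Assumption: a single-file text representation uses an empty asset path.
  const url = entry.content.url_template.replace("{+asset_path}", () => "");

  // "alt text representation found" branch: fetch and return the text.
  return app.makeRequest({ url });
}

// Fake app returning canned responses instead of calling Box.
const fakeApp = {
  async getFile() {
    return {
      representations: {
        entries: [{
          representation: "extracted_text",
          content: { url_template: "https://dl.example.com/text/{+asset_path}" },
        }],
      },
    };
  },
  async makeRequest({ url }) {
    return url === "https://dl.example.com/text/" ? "Hello, Box" : "";
  },
};

getFileText(fakeApp, "12345").then((text) => console.log(text)); // Hello, Box
```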

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • PR #20213: Overlapping Box component version metadata updates across the same action and source files.

Suggested reviewers

  • GTFalcao
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Description check: ⚠️ Warning
  Explanation: The PR description only provides 'Resolves #20469' without completing the WHY section from the required template, making it largely incomplete.
  Resolution: Complete the description template by adding a WHY section that explains the purpose, benefits, and context of adding the Get File Text action to the Box integration.
Out of Scope Changes check: ⚠️ Warning
  Explanation: Multiple version bumps for existing actions and the package are out of scope relative to the issue requirement of adding a Get File Text action.
  Resolution: Separate the version bump changes into a distinct maintenance PR. Keep this PR focused solely on implementing the Get File Text action and its required box.app.mjs helper methods.
✅ Passed checks (3 passed)
Title check: ✅ Passed
  Explanation: The title 'Box - Get File Text' clearly and specifically describes the new action being added, directly matching the main change in the changeset.
Linked Issues check: ✅ Passed
  Explanation: The PR successfully implements the Get File Text action with proper fileId input handling, Box Representations API integration with the extracted_text representation, and error handling for unsupported files.
Docstring Coverage: ✅ Passed
  Explanation: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@michelle0927 marked this pull request as ready for review on April 3, 2026 18:11
@coderabbitai (Contributor, bot) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/box/actions/get-file-text/get-file-text.mjs`:
- Around line 60-64: Replace the fragile URL slicing logic that builds the
request URL from urlTemplate; instead locate the "{+asset_path}" placeholder in
urlTemplate and substitute it with the asset_path variable (preserving path
slashes as Box expects for {+asset_path}), then pass that resulting URL to
this.app._makeRequest; update the code around urlTemplate, asset_path and the
this.app._makeRequest call in the get-file-text flow to perform a direct
placeholder replacement rather than using urlTemplate.slice(...).
- Around line 40-57: The current getFile call catches all errors and always
throws ConfigurationError("File not found") and also assumes destructuring
always yields entries and a usable url_template; update the error handling
around this.app.getFile to rethrow or wrap the original error when it's not a
404/NotFound, and defensively check that the response has representations and
entries before using entries.find; inspect each representation entry for a
successful status (e.g., check entry.representation?.status or equivalent) and
only consider entries whose status is "success"; when deriving urlTemplate from
entries (the variable currently computed by
entries.find(...).content.url_template), validate existence and then construct
the download URL by replacing the {+asset_path} placeholder per Box docs instead
of slicing the string; use ConfigurationError only for
configuration/representation-missing conditions and preserve or include original
error info for API failures from this.app.getFile.
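Both findings can be sketched as two small helpers: one that treats `url_template` as opaque and only substitutes the `{+asset_path}` placeholder instead of slicing the string, and one that recognizes a 404 so only that case becomes a "File not found" configuration error. The helper names are hypothetical, and the error shape (`error.response.status` or `error.status`) is an assumption borrowed from the review's own suggestion, not a documented property of the Pipedream HTTP client.

```javascript
// Hypothetical helpers sketching the two review findings.

// Substitute only the {+asset_path} placeholder rather than slicing the
// template. A replacer function is used so "$" sequences in the asset path
// are inserted literally instead of being read as replacement patterns.
function expandRepresentationUrl(urlTemplate, assetPath = "") {
  if (typeof urlTemplate !== "string" || urlTemplate.length === 0) {
    throw new Error("Representation has no usable url_template");
  }
  return urlTemplate.replace("{+asset_path}", () => assetPath);
}

// Only a 404 means the file does not exist; auth (401/403), rate-limit (429),
// server, and network errors should be rethrown unchanged.
// Assumption: the HTTP client exposes the status as error.response.status
// or error.status.
function isNotFoundError(error) {
  return (error?.response?.status ?? error?.status) === 404;
}
```

In the action's catch block, the error would be converted to a `ConfigurationError` only when `isNotFoundError(error)` is true; otherwise it is rethrown as-is so the original API failure stays visible.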

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1a6535be-4caf-40a7-89c9-bddb600e92fd

📥 Commits

Reviewing files that changed from the base of the PR and between 222112e and b5d65c9.

📒 Files selected for processing (12)
  • components/box/actions/create-sign-request/create-sign-request.mjs
  • components/box/actions/download-file/download-file.mjs
  • components/box/actions/get-comments/get-comments.mjs
  • components/box/actions/get-file-text/get-file-text.mjs
  • components/box/actions/search-content/search-content.mjs
  • components/box/actions/upload-file-version/upload-file-version.mjs
  • components/box/actions/upload-file/upload-file.mjs
  • components/box/box.app.mjs
  • components/box/package.json
  • components/box/sources/new-event/new-event.mjs
  • components/box/sources/new-file/new-file.mjs
  • components/box/sources/new-folder/new-folder.mjs

Comment thread components/box/actions/get-file-text/get-file-text.mjs
Comment thread components/box/actions/get-file-text/get-file-text.mjs Outdated
@coderabbitai (Contributor, bot) left a comment

♻️ Duplicate comments (1)
components/box/actions/get-file-text/get-file-text.mjs (1)

39-57: ⚠️ Potential issue | 🟠 Major

Narrow error mapping and validate extracted_text readiness before download.

Line 51 maps every getFile failure to “File not found”, which hides auth, rate-limit, and transient API errors. Also, line 55 should not select by content.url_template alone; instead, filter for representation === "extracted_text" and gate on status.state (e.g., success or viewable) so the action can return a clear “not ready yet” message.

🔧 Proposed fix
-    let entries;
+    let entries = [];
     try {
-      ({ representations: { entries } } = await this.app.getFile({
+      const file = await this.app.getFile({
         $,
         fileId: this.fileId,
         params: {
           fields: "representations",
         },
         headers: {
           "x-rep-hints": "[extracted_text]",
         },
-      }));
+      });
+      entries = file?.representations?.entries ?? [];
     } catch (error) {
-      throw new ConfigurationError(`File not found: ${this.fileId}`);
+      if (error?.response?.status === 404) {
+        throw new ConfigurationError(`File not found: ${this.fileId}`);
+      }
+      throw error;
     }
 
-    const urlTemplate = entries.find((entry) => entry?.content?.url_template)?.content.url_template;
-    if (!urlTemplate) {
-      throw new ConfigurationError("File does not have a text representation");
+    const extractedText = entries.find((entry) => entry?.representation === "extracted_text");
+    const state = extractedText?.status?.state;
+    const urlTemplate = extractedText?.content?.url_template;
+    if (!urlTemplate || !["success", "viewable"].includes(state)) {
+      throw new ConfigurationError(
+        state && !["success", "viewable"].includes(state)
+          ? `Text representation is ${state}. Try again later.`
+          : "File does not have a text representation",
+      );
     }
In Box Representations API, which `status.state` values indicate an extracted_text representation is downloadable, and should `content.url_template` be treated as opaque with only `{+asset_path}` substitution?
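One way to implement the suggested selection is a pure function that looks up the `extracted_text` entry by name and gates on `status.state` before reading `content.url_template`. The set of ready states (`success`, `viewable`) is taken from the proposed fix above and should be checked against the Box Representations documentation; the function name and its `{ ok, reason }` result shape are hypothetical.

```javascript
// Hypothetical sketch of the suggested representation selection.
// Ready states follow the proposed fix; verify them against Box's docs.
const READY_STATES = new Set(["success", "viewable"]);

function selectExtractedText(file) {
  const entries = file?.representations?.entries ?? [];

  // Select by representation name, not "first entry with a url_template",
  // so an image or PDF representation is never mistaken for text.
  const entry = entries.find((e) => e?.representation === "extracted_text");
  if (!entry) {
    return { ok: false, reason: "File does not have a text representation" };
  }

  // Anything outside the ready states (e.g. still being generated) is
  // reported as "not ready" rather than "missing".
  const state = entry.status?.state;
  if (!READY_STATES.has(state)) {
    return {
      ok: false,
      reason: `Text representation is ${state ?? "unavailable"}. Try again later.`,
    };
  }

  const urlTemplate = entry.content?.url_template;
  if (!urlTemplate) {
    return { ok: false, reason: "File does not have a text representation" };
  }
  return { ok: true, urlTemplate };
}
```

The caller would turn an `{ ok: false }` result into a `ConfigurationError` carrying `reason`, and substitute `{+asset_path}` in `urlTemplate` before downloading the text.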
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/box/actions/get-file-text/get-file-text.mjs` around lines 39 - 57,
The catch block currently maps all errors from this.app.getFile to a generic
ConfigurationError("File not found: this.fileId"); change it to rethrow non-404
errors (inspect error.status or error.response?.status) and only convert
404/NotFound into ConfigurationError with this.fileId; after extracting entries
from the response, replace the urlTemplate lookup with logic that finds
entries.find(e => e.representation === "extracted_text") and then check its
status.state is one of the downloadable states (e.g., "success" or "viewable")
before reading e.content.url_template; if state is not downloadable, throw a
ConfigurationError("Extracted text not ready for file: this.fileId") and treat
content.url_template as opaque except for replacing the documented {+asset_path}
variable when constructing the final download URL.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b6fc0042-9078-42ca-b16f-f0c9cd0b4568

📥 Commits

Reviewing files that changed from the base of the PR and between b5d65c9 and 8860f51.

📒 Files selected for processing (1)
  • components/box/actions/get-file-text/get-file-text.mjs

@GTFalcao (Collaborator) left a comment

LGTM!

@GTFalcao merged commit 5eee18f into master on Apr 7, 2026
9 checks passed
@GTFalcao deleted the issue-20469 branch on April 7, 2026 19:57

Labels: None yet

Projects: None yet

Development

Successfully merging this pull request may close these issues.

Box — add text extraction from files

2 participants