[Feature] PaddleOCR: support Dify uploaded file variables while preserving URL/base64 compatibility #3181

@jimmyzhuu

Self Checks

  • I have read the Contributing Guide and Language Policy.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report, otherwise it will be closed.
  • Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

I would like to improve the official PaddleOCR plugin so it can consume Dify uploaded file variables directly.

Currently, the PaddleOCR tools expose the file parameter as type: string, and the descriptions say users should provide either:

  • a publicly accessible image/PDF URL, or
  • a base64-encoded image/PDF payload.

This works, but it is not very natural in Dify workflows. In many Chatflow / Workflow scenarios, the user already has a Dify uploaded file variable from an upload input, chat attachment, or upstream file-producing node. For those users, requiring a public URL or manual base64 conversion creates an extra integration step.

I am interested in contributing this feature, but before opening a PR I would like to confirm the preferred plugin interface design with maintainers, because there is a trade-off between a cleaner UX and backward compatibility.

2. Additional context or comments

I see two possible implementation paths.

Option A: keep one file parameter, change it from string to file

This would make the PaddleOCR UI cleaner:

  • file becomes type: file.
  • The Python implementation still accepts both File objects and legacy strings.
  • If file is a Dify File, the plugin reads file.blob, base64-encodes it, and sends it to the PaddleOCR API.
  • If file is a string, the plugin keeps the current behavior and passes it through as URL/base64.
  • When fileType is auto, the plugin can infer PDF/image from mime_type, extension, or filename.

This gives users a single obvious input field.
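A minimal sketch of the Python-side handling described for Option A, assuming the uploaded file exposes its bytes as `blob`. `UploadedFile` is a hypothetical stand-in for the Dify File object, and `normalize_file_input` is an illustrative name, not existing plugin code.

```python
import base64
from dataclasses import dataclass
from typing import Union


@dataclass
class UploadedFile:
    """Hypothetical stand-in for a Dify File object; only `blob` is used here."""
    blob: bytes
    filename: str = ""
    mime_type: str = ""


def normalize_file_input(value: Union[UploadedFile, str]) -> str:
    """Return the string payload (URL or base64) to send to the PaddleOCR API."""
    if isinstance(value, str):
        # Legacy path: a URL or base64 string is passed through unchanged.
        return value
    blob = getattr(value, "blob", None)
    if isinstance(blob, (bytes, bytearray)):
        # Uploaded-file path: base64-encode the raw bytes.
        return base64.b64encode(blob).decode("ascii")
    raise TypeError(f"unsupported file input: {type(value).__name__}")
```

With this shape, a workflow that still passes a URL string and one that passes an uploaded file both end up with a plain string payload, so the API call itself would not need to change.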

The concern is that changing the plugin schema from string to file changes the external parameter contract. Existing workflows that bind file to a text variable or constant URL/base64 may still work at the Python layer, but the Dify UI / workflow configuration layer may treat the parameter differently after the schema change. It may also affect Agent use cases, since file-type tool parameters are handled differently from string parameters.

Option B: keep the existing file string parameter and add a new file_upload parameter

This is more backward-compatible:

  • Keep file as type: string for existing URL/base64 users.
  • Add file_upload as type: file for Dify uploaded files.
  • Runtime behavior: prefer file_upload when provided, otherwise fall back to file.
  • Document file_upload as the recommended input for new Dify workflows, and file as the legacy URL/base64 input.

This avoids breaking existing workflows and Agent-style URL/base64 usage, but the UI will show two file-related inputs, which may be more confusing for new users.
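The runtime precedence in Option B could look roughly like the sketch below. It assumes the tool receives its inputs as a mapping (called `tool_parameters` here) and that an unset `file_upload` arrives as `None` or is absent; `select_file_input` is a hypothetical helper name.

```python
from typing import Any, Mapping


def select_file_input(tool_parameters: Mapping[str, Any]) -> Any:
    """Prefer `file_upload` when provided; otherwise fall back to legacy `file`."""
    uploaded = tool_parameters.get("file_upload")
    if uploaded is not None:
        return uploaded
    legacy = tool_parameters.get("file")
    if isinstance(legacy, str) and legacy.strip():
        return legacy
    # Neither input was supplied, so fail with a clear message.
    raise ValueError("either file_upload or file must be provided")
```

Raising when both are empty is one possible design choice; it would mean neither parameter can be marked as required in the schema, so the check has to happen at runtime.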

Proposed shared behavior for either option

For the implementation itself, I would keep the change local to the PaddleOCR plugin:

  • Normalize file input in a shared helper.
  • Convert Dify File.blob to base64 before calling the PaddleOCR API.
  • Preserve current URL/base64 string behavior.
  • Infer fileType only when the user leaves it as auto.
  • Add unit tests for uploaded file input, URL/base64 string input, file type inference, explicit fileType override, and all three PaddleOCR tools.
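As a rough illustration of the inference rule, the sketch below only infers when `fileType` is left as `auto`, then checks `mime_type`, extension, and filename in that order. The function name, the `"pdf"` / `"image"` return labels, and the extension set are assumptions made for illustration; the actual values would need to match what the PaddleOCR API expects.

```python
import os

# Assumed set of image extensions; not taken from the plugin.
IMAGE_EXTENSIONS = {"png", "jpg", "jpeg", "bmp", "gif", "tif", "tiff", "webp"}


def infer_file_type(requested: str, mime_type: str = "",
                    extension: str = "", filename: str = "") -> str:
    """Return the explicit fileType, or infer one when it is 'auto'."""
    if requested != "auto":
        # An explicit user choice always overrides inference.
        return requested
    mime = (mime_type or "").lower()
    if mime == "application/pdf":
        return "pdf"
    if mime.startswith("image/"):
        return "image"
    # Fall back to the extension, then to the filename's suffix.
    ext = (extension or os.path.splitext(filename or "")[1]).lower().lstrip(".")
    if ext == "pdf":
        return "pdf"
    if ext in IMAGE_EXTENSIONS:
        return "image"
    raise ValueError("cannot infer file type; set fileType explicitly")
```

The cases listed above map directly onto unit tests: one per inference source, one for the explicit override, and one for the failure path.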

My question for maintainers:

Which interface would you prefer for an official plugin?

  1. Option A: one clean file parameter, with the schema changed to type: file, while preserving string compatibility in Python where possible.
  2. Option B: keep file as string and add file_upload as a separate uploaded-file parameter for maximum compatibility.

I am happy to prepare the PR following the preferred direction.

3. Can you help us with this feature?

  • I am interested in contributing to this feature.
