Notion datasource times out enumerating pages on large workspaces #3170

@Kota-Maeda

Self Checks

  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues in Dify issues & Dify Official Plugins, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

1.14.1 (self-hosted, Docker)

Plugin version

langgenius/notion_datasource@0.1.18

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

  1. Have a Notion workspace where many pages / databases are shared with the integration (in our environment: several thousand items).
  2. In Dify, go to Settings → Data Source → Notion, add an Integration Token (API key).
  3. Go to Create Knowledge Base → Data Source → Notion → Sync.
  4. The page list never appears. The plugin-daemon kills the request after exactly 600 seconds (the SSE deadline), the API returns httpx.ReadTimeout, and the UI eventually falls back to the “Notion is not connected” screen.

The same workspace works fine on smaller integrations (a few hundred pages), so this is a scale problem, not a credentials problem.

Why it scales badly

Looking at datasources/notion_datasource/datasources/utils/notion_client.py, get_authorized_pages() does the following serially:

  1. Loop /v1/search with filter=page until has_more is false.
  2. Loop /v1/search with filter=database until has_more is false.
  3. For every result, call /v1/blocks/{id} once to resolve the parent page id.
  4. If the parent is itself a block_id, recurse into another /v1/blocks/{id} call (no memoization).

There is no concurrency, no caching, and the three call sites do not go through the retry path in _make_request, so any transient 429 / 5xx fails the entire enumeration.
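For illustration, here is a minimal sketch of what routing those calls through a retry wrapper could look like. It does not reproduce _make_request, whose implementation is not shown in this report; `TransientHTTPError`, `is_retryable`, and `call_with_retry` are hypothetical names, and the backoff parameters are assumptions.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientHTTPError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status: int) -> bool:
    # Rate limiting (429) and server errors (5xx) are treated as transient.
    return status == 429 or 500 <= status < 600


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 4,          # assumed value, not taken from the plugin
    base_delay: float = 0.5,        # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientHTTPError as exc:
            # Give up on non-transient statuses or when attempts are exhausted.
            if not is_retryable(exc.status) or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Demo with a fake call that fails twice (429, then 503) before succeeding.
pending_failures: List[int] = [429, 503]
delays: List[float] = []  # records backoff delays instead of really sleeping


def flaky_search() -> str:
    if pending_failures:
        raise TransientHTTPError(pending_failures.pop(0))
    return "ok"


result = call_with_retry(flaky_search, sleep=delays.append)
```

With a wrapper like this around each /v1/search and /v1/blocks/{id} call, a single 429 would cost one backoff delay rather than the whole enumeration. Note that retries add wall-clock time, so on their own they do not help with the 600-second deadline.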

For an N-item workspace with average parent depth K, the parent-resolution phase alone issues ~N × K serial HTTP calls. At a few thousand items this exceeds the plugin-daemon's 600-second SSE deadline and the request is killed.
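A rough sketch of how memoizing the parent lookup could reduce that call count, assuming many items share ancestor blocks. This is not the plugin's code: `resolve_parent_page`, `fetch_parent`, and the `(parent_type, parent_id)` tuple shape are hypothetical stand-ins for whatever the /v1/blocks/{id} handling in notion_client.py actually returns.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical lookup result: (parent_type, parent_id),
# where parent_type is e.g. "block_id" or "page_id".
Parent = Tuple[str, Optional[str]]


def resolve_parent_page(
    block_id: str,
    fetch_parent: Callable[[str], Parent],
    cache: Dict[str, Optional[str]],
) -> Optional[str]:
    """Walk up block parents until a non-block parent is found.

    Every block visited on the way is cached, so later lookups that
    pass through the same ancestors stop at the first cached hop.
    """
    chain: List[str] = []
    current = block_id
    while True:
        if current in cache:
            resolved = cache[current]
            break
        chain.append(current)
        # In a real client this would be one /v1/blocks/{id} request.
        parent_type, parent_id = fetch_parent(current)
        if parent_type != "block_id" or parent_id is None:
            resolved = parent_id
            break
        current = parent_id
    for visited in chain:
        cache[visited] = resolved
    return resolved


# Demo against an in-memory parent table instead of the Notion API.
PARENTS: Dict[str, Parent] = {
    "b1": ("block_id", "b2"),
    "b2": ("block_id", "b3"),
    "b3": ("page_id", "page-A"),
    "b4": ("block_id", "b2"),
}
calls: List[str] = []  # records every simulated HTTP call


def fake_fetch(block_id: str) -> Parent:
    calls.append(block_id)
    return PARENTS[block_id]


cache: Dict[str, Optional[str]] = {}
first = resolve_parent_page("b1", fake_fetch, cache)   # walks b1 -> b2 -> b3
second = resolve_parent_page("b4", fake_fetch, cache)  # stops at cached b2
```

In this toy example the two lookups issue four simulated calls in total, where an unmemoized walk would issue six (three per lookup). The real saving depends on how much of the block hierarchy is shared between items, which this report does not measure.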

✔️ Error log

# plugin_daemon
ERROR dify-plugin-daemon factory.go:28 PluginDaemonInternalServerError
  error="killed by timeout"
  service.baseSSEService(... 0x258 ...)              # 0x258 == 600 seconds
  service.DatasourceGetOnlineDocumentPages(...)
HTTP request method=POST
  path=/plugin/<tenant>/dispatch/datasource/get_online_document_pages
  status=200 latency_ms=600003   # ← killed exactly at the SSE deadline

# api
ERROR app.py:875 Exception on /console/api/notion/pre-import/pages [GET]
  httpcore.ReadTimeout: timed out
  httpx.ReadTimeout: timed out
  core.plugin.entities.plugin_daemon.PluginDaemonInnerError:
    Request to Plugin Daemon Service failed
