fix: limit chapter title length to 256 characters in pdf_split_handle.py by shaohuzhang1 · Pull Request #2803 · 1Panel-dev/MaxKB

shaohuzhang1 · 2025-04-07T02:53:38Z

fix: limit chapter title length to 256 characters in pdf_split_handle.py

--bug=1054363 --user=刘瑞斌 [Knowledge Base] When importing a PDF document, an overly long segment title is not automatically truncated https://www.tapd.cn/57709429/s/1681044

f2c-ci-robot · 2025-04-07T02:53:41Z

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

f2c-ci-robot · 2025-04-07T02:53:50Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.


Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

shaohuzhang1 · 2025-04-07T02:54:28Z

apps/common/handle/impl/pdf_split_handle.py

+                chapters.append({"title": real_chapter_title, "content": chapter_text if chapter_text else real_chapter_title})
            # Save the chapter content and chapter title
        return chapters



The code appears to be functioning correctly based on initial inspection, but here are a few suggestions and corrections:

Null Characters Check: The line chapter_text = chapter_text.replace('\0', '') is correct and will remove null characters from the chapter_text.

Title Length Limitation: Adding a character limit of 256 for real_chapter_title is redundant since it's already truncated to ensure length within the function, which should be sufficient unless longer titles are required.

Content Handling with Empty Chapter Titles: There seems to be an unnecessary condition in appending content when the content is empty. When chapter_text is empty, appending chapter_title if chapter_text else "" would work fine without explicitly checking its length again. However, there might be intended behavior where you want to append a placeholder or message if the content doesn't exist.

Code Consistency: Ensure that all similar lines follow the same format (e.g., spacing around operators), for better readability.

Overall, the code logic is sound, but these minor adjustments could improve clarity and maintainability. Here’s a slightly refined version of the relevant section:

    def handle_toc(doc, limit):
        # Null characters are not allowed.
        chapter_text = chapter_text.replace('\0', '')

        # Initialize real_chapter_title to avoid repetition
        if limit > 0:
            real_chapter_title = chapter_title[:limit]

        chapters = []

        # If the chapter title exists, add it as the key; otherwise, use 'Unknown'
        # For simplicity, let's assume that None or empty strings are meant to have no entry at all
        if chapter_title:
            # Split the text into paragraphs based on the specified limit
            split_text = PdfSplitHandle.split_text(chapter_text, limit)
            for text in split_text:
                chapters.append({
                    "title": real_chapter_title,
                    "content": text.strip()  # Strip leading/trailing whitespace from each paragraph
                })

        return chapters

These changes make the code cleaner and more consistent.
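For reference, the behavior this PR targets can be sketched in isolation. This is a minimal illustration, not the actual `handle_toc` implementation, which is not shown in full here: `build_chapter` and `MAX_TITLE_LENGTH` are hypothetical names, and the only things taken from the PR are the 256-character limit, the null-character removal, and the fallback to the title when the chapter text is empty.

```python
MAX_TITLE_LENGTH = 256  # limit named in the PR title; the constant name is hypothetical


def build_chapter(chapter_title: str, chapter_text: str,
                  limit: int = MAX_TITLE_LENGTH) -> dict:
    """Hypothetical helper: build one chapter entry with a bounded title."""
    # Remove null characters from the chapter text, as the review describes.
    chapter_text = chapter_text.replace('\0', '')
    # Slicing is safe for short titles: it returns the title unchanged.
    real_chapter_title = chapter_title[:limit]
    # Fall back to the (truncated) title when the chapter has no text,
    # mirroring the line added in the diff.
    content = chapter_text if chapter_text else real_chapter_title
    return {"title": real_chapter_title, "content": content}


chapters = [
    build_chapter("A" * 300, "body\0 text"),  # over-long title, text with a null
    build_chapter("Empty chapter", ""),       # no text, so content uses the title
]
```

Truncating before the fallback keeps the stored title within the limit in every case; whether the fallback content should also be bounded is a separate decision that this sketch does not address.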

fix: limit chapter title length to 256 characters in pdf_split_handle.py

de85864

--bug=1054363 --user=刘瑞斌 [Knowledge Base] When importing a PDF document, an overly long segment title is not automatically truncated https://www.tapd.cn/57709429/s/1681044

f2c-ci-robot bot added the do-not-merge/release-note-label-needed label Apr 7, 2025


liuruibin merged commit 560890f into main Apr 7, 2025
4 of 5 checks passed

liuruibin deleted the pr@main@fix_limit_title branch April 7, 2025 02:55
