Commit c25af28

fix: improve filtering logic to exclude empty titles and whitespace-only content
--bug=1066052@tapd-62980211 --user=刘瑞斌 [Knowledge base] Smart segmentation produces incorrect segments when it encounters an empty level-1 heading https://www.tapd.cn/62980211/s/1841052
1 parent a745f5a commit c25af28

1 file changed, +5 -1 lines changed

apps/common/utils/split_model.py

Lines changed: 5 additions & 1 deletion
@@ -165,7 +165,11 @@ def parse_level(text, pattern: str):
     :return: text matching the regex
     """
     level_content_list = list(map(to_tree_obj, [r[0:255] for r in re_findall(pattern, text) if r is not None]))
-    return list(map(filter_special_symbol, level_content_list))
+    # Filter out empty headings and headings that contain only '#' and whitespace
+    filtered_list = [item for item in level_content_list
+                     if item['content'].strip() and item['content'].replace('#', '').strip()]
+    return list(map(filter_special_symbol, filtered_list))
+
 
 
 def re_findall(pattern, text):
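
To illustrate the effect of the new filter in isolation, here is a minimal sketch. It assumes each item is a plain dict with a 'content' key, as the diff suggests. The helper name drop_empty_headings and the sample data are hypothetical and are not part of split_model.py.

```python
def drop_empty_headings(items):
    """Keep only items whose 'content' has text other than '#' and whitespace.

    Hypothetical helper: it mirrors the condition added in the commit,
    applied to plain dicts instead of the project's real tree objects.
    """
    return [item for item in items
            if item['content'].strip() and item['content'].replace('#', '').strip()]


# Assumed sample data: two real headings mixed with empty ones.
sample = [
    {'content': '# Introduction'},
    {'content': '#'},        # bare marker, no title text
    {'content': '   '},      # whitespace only
    {'content': '## '},      # markers followed by whitespace
    {'content': 'Plain text'},
]

kept = drop_empty_headings(sample)
print([item['content'] for item in kept])
```

In this sketch the second check implies the first: any string that is non-empty after removing '#' and stripping is also non-empty after stripping alone. The two-part condition is kept here only to match the commit.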
