Commit c0e2ce7

enh: language detector
Signed-off-by: thiswillbeyourgithub <26625900+thiswillbeyourgithub@users.noreply.github.com>
1 parent 2d928ab commit c0e2ce7

1 file changed: +4 -2 lines changed

wdoc/utils/misc.py

Lines changed: 4 additions & 2 deletions
@@ -86,8 +86,7 @@ def language_detector(text: str) -> float:
     f"Couldn't import optional package 'langdetect' either: '{err}'"
 )
 
-def language_detector(text: str) -> None:
-    return None
+language_detector = None
 
 
 if (
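This first hunk stops defining a stub `language_detector` that always returns `None` and instead binds the name itself to `None` when `langdetect` can't be imported. A minimal sketch of that optional-import pattern, assuming a generic helper: `optional_attr` is a hypothetical name that is not part of wdoc, and it catches only `ImportError`, whereas the surrounding wdoc code may catch errors more broadly.

```python
import importlib
from typing import Any, Optional


def optional_attr(module_name: str, attr: str) -> Optional[Any]:
    """Hypothetical helper: return module.attr if the module imports, else None.

    Mirrors the idea in the hunk: when an optional package is missing, bind
    the name to None instead of defining a stub function that returns None.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so it lands here too.
        return None
    # Also fall back to None if the module lacks the requested attribute.
    return getattr(module, attr, None)


# "langdetect" may or may not be installed; either way the name is bound.
detect_langs = optional_attr("langdetect", "detect_langs")
# A package that certainly does not exist falls back to None.
missing = optional_attr("surely_not_a_real_package_xyz", "detect")
```

Callers can then test the name itself (`if not detect_langs:`) instead of calling a stub and inspecting what it returns.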
@@ -813,6 +812,9 @@ def check_docs_tkn_length(
 
 # check if language check is above a threshold and cast as lowercase as it's apparently what it was trained on
 try:
+    if not language_detector:
+        # bypass if language_detector not defined
+        return 1.0
     probs = [language_detector(d.page_content.replace("\n", "<br>")) for d in docs]
     if not probs or probs[0] is None:
         # bypass if language_detector not defined
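The second hunk adds an early exit to `check_docs_tkn_length`: when `language_detector` is `None`, the language check returns `1.0` before any document is scored. The sketch below isolates that guard. The function name `mean_language_score`, passing the detector in as a parameter, and averaging the per-document scores are all assumptions, since the hunk doesn't show how `probs` is used afterwards.

```python
from typing import Callable, List, Optional


def mean_language_score(
    texts: List[str],
    language_detector: Optional[Callable[[str], float]],
) -> float:
    """Hypothetical stand-in for the language check in check_docs_tkn_length."""
    # Bypass: with no detector available, every text is treated as passing.
    if not language_detector:
        return 1.0
    # Newlines are replaced with "<br>" before scoring, as in the diff.
    probs = [language_detector(t.replace("\n", "<br>")) for t in texts]
    if not probs:
        # Nothing to score; assumed to pass as well.
        return 1.0
    # Assumption: combine the per-text scores with a plain mean.
    return sum(probs) / len(probs)


# No detector: short-circuits without touching the texts.
print(mean_language_score(["bonjour", "hello"], None))  # 1.0
```

Checking the name before the list comprehension means no document is processed when no detector is installed. The older `probs[0] is None` check, still present in the diff, only caught that case after the stub had already been called once per document.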