UniversalDetector(), when fed a file line by line with feed(), returns {'encoding': None, 'confidence': None} for a file on which cchardet.detect() produces the correct result.
import cchardet
from pathlib import Path

UniversalDetect = cchardet.UniversalDetector()
smi = Path('file.smi')

def cchardet_detect(input_file: Path):
    with input_file.open(mode="rb") as ifp:
        data = ifp.read()
    return cchardet.detect(data)

def cchardet_universal(input_file: Path):
    with input_file.open(mode="rb") as ifp:
        for line in ifp:
            UniversalDetect.feed(line)
            if UniversalDetect.done:
                break
    return UniversalDetect.result

cchardet_detect(smi)
cchardet_universal(smi)
The file in question is encoded in UHC (aka CP949).
Output:
Also of note that:
However, the linked issue attributes the problem to the input file being large. My guess is that this explanation is incorrect, since I get the same results here with a file of only about 100 KiB.
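One thing that may be worth ruling out: the reproduction reads result without ever calling close() on the detector. The sketch below is a variant of the incremental loop that finalizes the detector before reading the result. It assumes that cchardet's UniversalDetector follows the same feed() / done / close() / result protocol as chardet's, and that result may stay empty until close() is called; neither assumption is confirmed in this report. The helper name detect_incrementally is hypothetical, and the detector is passed in as a parameter so that a fresh instance is used for each file.

```python
from pathlib import Path
from typing import Any, Dict


def detect_incrementally(detector: Any, input_file: Path) -> Dict[str, Any]:
    """Feed a file to a detector line by line, then finalize it.

    `detector` is any object exposing feed(), done, close() and result,
    e.g. a freshly created cchardet.UniversalDetector() (assumed API).
    """
    with input_file.open(mode="rb") as ifp:
        for line in ifp:
            detector.feed(line)
            # Stop early once the detector reports it is confident enough.
            if detector.done:
                break
    # Assumption: close() flushes the detector's internal state so that
    # `result` reflects everything fed so far.
    detector.close()
    return detector.result
```

If calling detect_incrementally(cchardet.UniversalDetector(), smi) still yields None for both fields, the missing close() is not the cause and the problem lies elsewhere.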