GB18030-encoded text is being detected as utf_16, big5 and cp037, and of those only big5 can decode it.
Detection as utf_16 is very wrong, as text in that codec is expected to start with the UTF-16 BOM, so the library should be very cautious about that result. However, chardet has a patch to do exactly that: chardet/chardet#109.
The GB18030 BOM tends to result in detection as cp037. The BOM regularly causes problems in chardet-like libraries, cf. chardet/chardet#178.
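
A minimal sketch of how the candidates could be sanity-checked, assuming only the Python standard library. The helper names (`has_utf16_bom`, `has_gb18030_bom`, `decodes_cleanly`) and the sample text are hypothetical and not part of any detection library; the sample is not the file from this report, so its results may differ from the ones described above.

```python
import codecs

# U+FEFF encoded as GB18030; this four-byte prefix is the "GB18030 BOM".
GB18030_BOM = "\ufeff".encode("gb18030")


def has_utf16_bom(data: bytes) -> bool:
    """Return True if the data starts with a little- or big-endian UTF-16 BOM."""
    return data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))


def has_gb18030_bom(data: bytes) -> bool:
    """Return True if the data starts with the GB18030 encoding of U+FEFF."""
    return data.startswith(GB18030_BOM)


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if a strict decode with the given codec raises no error."""
    try:
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample: a GB18030 BOM followed by a short Chinese phrase.
sample = GB18030_BOM + "你好，世界".encode("gb18030")

print("UTF-16 BOM present:", has_utf16_bom(sample))
print("GB18030 BOM present:", has_gb18030_bom(sample))
for candidate in ("utf_16", "big5", "cp037", "gb18030"):
    print(candidate, "decodes cleanly:", decodes_cleanly(sample, candidate))
```

A clean strict decode only shows that a codec accepts the bytes, not that the result is the intended text, so a check like this can rule candidates out but cannot confirm one.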