@@ -15,7 +15,7 @@ Detection of character sets with a simple and redesigned interface.
 
 This package is based on [Ude](https://github.com/errepi/ude) and since version 2 also on [uchardet](https://gitlab.freedesktop.org/uchardet/uchardet),
 which are ports of the [Mozilla Universal Charset Detector](https://mxr.mozilla.org/mozilla/source/extensions/universalchardet/).
-
+
 The interface and other classes have been redesigned so they are easier to use and follow better object-oriented design (OOD). Unit tests and CI have been added.
 
 Features:
@@ -52,29 +52,48 @@ var result = CharsetDetector.DetectFromBytes(byteArray);
 
 The article "[A composite approach to language/encoding detection](https://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html)" describes the charsets detection algorithms implemented by the library.
 
-The following charsets are supported:
-
-* ASCII
-* UTF-8
-* UTF-16 (BE and LE)
-* UTF-32 (BE and LE)
-* windows-1252 (mostly equivalent to iso8859-1)
-* windows-1251 and ISO-8859-5 (cyrillic)
-* windows-1253 and ISO-8859-7 (greek)
-* windows-1255 (logical hebrew. Includes ISO-8859-8-I and most of x-mac-hebrew)
-* ISO-8859-8 (visual hebrew)
-* Big-5
-* gb18030 (superset of gb2312)
-* HZ-GB-2312
-* Shift-JIS
-* CP949
-* EUC-KR, EUC-JP, EUC-TW
-* ISO-2022-JP, ISO-2022-KR, ISO-2022-CN
-* KOI8-R
-* x-mac-cyrillic
-* IBM855 and IBM866
-* X-ISO-10646-UCS-4-3412 and X-ISO-10646-UCS-4-2413 (unusual BOM)
-
+<details>
+<summary>The following charsets are supported</summary>