Commit de0a0dd

update readme
1 parent d5d1e28 commit de0a0dd

1 file changed

Lines changed: 44 additions & 25 deletions

README.md

@@ -1,7 +1,7 @@
 [![Build status](https://ci.appveyor.com/api/projects/status/xr59ab52cav8vuph/branch/master?svg=true)](https://ci.appveyor.com/project/304NotModified/utf-unknown/branch/master)
 [![NuGet Pre Release](https://img.shields.io/nuget/vpre/UTF.Unknown.svg)](https://www.nuget.org/packages/UTF.Unknown/)
 
-<!--
+<!--
 [![codecov.io](https://codecov.io/github/UniversalCharsetDetector/ude/coverage.svg?branch=master)](https://codecov.io/github/UniversalCharsetDetector/ude?branch=master)
 -->
 
@@ -15,7 +15,7 @@ Detection of character sets with a simple and redesigned interface.
 
 This package is based on [Ude](https://github.com/errepi/ude) and since version 2 also on [uchardet](https://gitlab.freedesktop.org/uchardet/uchardet),
 which are ports of the [Mozilla Universal Charset Detector](https://mxr.mozilla.org/mozilla/source/extensions/universalchardet/).
-
+
 The interface and other classes have been redesigned so that they are easier to use and follow better object-oriented design (OOD). Unit tests and CI have been added.
 
 Features:
Features:
@@ -52,29 +52,48 @@ var result = CharsetDetector.DetectFromBytes(byteArray);
 
 The article "[A composite approach to language/encoding detection](https://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html)" describes the charset detection algorithms implemented by the library.
 
-The following charsets are supported:
-
-* ASCII
-* UTF-8
-* UTF-16 (BE and LE)
-* UTF-32 (BE and LE)
-* windows-1252 (mostly equivalent to iso8859-1)
-* windows-1251 and ISO-8859-5 (cyrillic)
-* windows-1253 and ISO-8859-7 (greek)
-* windows-1255 (logical hebrew. Includes ISO-8859-8-I and most of x-mac-hebrew)
-* ISO-8859-8 (visual hebrew)
-* Big-5
-* gb18030 (superset of gb2312)
-* HZ-GB-2312
-* Shift-JIS
-* CP949
-* EUC-KR, EUC-JP, EUC-TW
-* ISO-2022-JP, ISO-2022-KR, ISO-2022-CN
-* KOI8-R
-* x-mac-cyrillic
-* IBM855 and IBM866
-* X-ISO-10646-UCS-4-3412 and X-ISO-10646-UCS-4-2413 (unusual BOM)
-
+<details>
+<summary>The following charsets are supported</summary>
+
+| Language | Encodings |
+|-------------------------|-----------------------------------------------------------------------------------------------------|
+| International (Unicode) | UTF-8; UTF-16BE / UTF-16LE; UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-3412 / X-ISO-10646-UCS-4-2143 |
+| Arabic | ISO-8859-6; WINDOWS-1256 |
+| Bulgarian | ISO-8859-5; WINDOWS-1251 |
+| Chinese | ISO-2022-CN; BIG5; EUC-TW; GB18030; HZ-GB-2312 |
+| Croatian | ISO-8859-2; ISO-8859-13; ISO-8859-16; WINDOWS-1250; IBM852; MAC-CENTRALEUROPE |
+| Czech | WINDOWS-1250; ISO-8859-2; IBM852; MAC-CENTRALEUROPE |
+| Danish | ISO-8859-1; ISO-8859-15; WINDOWS-1252 |
+| English | ASCII |
+| Esperanto | ISO-8859-3 |
+| Estonian | ISO-8859-4; ISO-8859-13; WINDOWS-1252; WINDOWS-1257 |
+| Finnish | ISO-8859-1; ISO-8859-4; ISO-8859-9; ISO-8859-13; ISO-8859-15; WINDOWS-1252 |
+| French | ISO-8859-1; ISO-8859-15; WINDOWS-1252 |
+| German | ISO-8859-1; WINDOWS-1252 |
+| Greek | ISO-8859-7; WINDOWS-1253 |
+| Hebrew | ISO-8859-8; WINDOWS-1255 |
+| Hungarian | ISO-8859-2; WINDOWS-1250 |
+| Irish Gaelic | ISO-8859-1; ISO-8859-9; ISO-8859-15; WINDOWS-1252 |
+| Italian | ISO-8859-1; ISO-8859-3; ISO-8859-9; ISO-8859-15; WINDOWS-1252 |
+| Japanese | ISO-2022-JP; SHIFT_JIS; EUC-JP |
+| Korean | ISO-2022-KR; EUC-KR / UHC; WINDOWS-949 |
+| Lithuanian | ISO-8859-4; ISO-8859-10; ISO-8859-13 |
+| Latvian | ISO-8859-4; ISO-8859-10; ISO-8859-13 |
+| Maltese | ISO-8859-3 |
+| Polish | ISO-8859-2; ISO-8859-13; ISO-8859-16; WINDOWS-1250; IBM852; MAC-CENTRALEUROPE |
+| Portuguese | ISO-8859-1; ISO-8859-9; ISO-8859-15; WINDOWS-1252 |
+| Romanian | ISO-8859-2; ISO-8859-16; WINDOWS-1250; IBM852 |
+| Russian | ISO-8859-5; KOI8-R; WINDOWS-1251; MAC-CYRILLIC; IBM866; IBM855 |
+| Slovak | WINDOWS-1250; ISO-8859-2; IBM852; MAC-CENTRALEUROPE |
+| Slovene | ISO-8859-2; ISO-8859-16; WINDOWS-1250; IBM852; MAC-CENTRALEUROPE |
+| Spanish | ISO-8859-1; ISO-8859-15; WINDOWS-1252 |
+| Swedish | ISO-8859-1; ISO-8859-4; ISO-8859-9; ISO-8859-15; WINDOWS-1252 |
+| Thai | TIS-620; ISO-8859-11 |
+| Turkish | ISO-8859-3; ISO-8859-9 |
+| Vietnamese | VISCII; WINDOWS-1258 |
+| Others | WINDOWS-1252 |
+
+</details>
 
 
 