Commit b4624bb

docs(readme): rewrite in concise repo-style format
1 parent 2d4a671

1 file changed: 76 additions, 164 deletions

README.md

````diff
@@ -1,172 +1,84 @@
-# TTSTextNormalization - Normalize Text for TTS
-
-[![NuGet Version](https://img.shields.io/nuget/v/Agash.TTSTextNormalization.svg?style=flat-square)](https://www.nuget.org/packages/Agash.TTSTextNormalization/)
-[![Build Status](https://img.shields.io/github/actions/workflow/status/Agash/TTSTextNormalization/dotnet-publish.yml?branch=master&style=flat-square)](https://github.com/Agash/TTSTextNormalization/actions)
-
-A .NET 9 / C# 13 class library designed to normalize text containing emojis, currency symbols, numbers, URLs, abbreviations, and other non-standard elements, making it suitable for consistent and natural-sounding Text-to-Speech (TTS) synthesis across different engines (e.g., System.Speech, KokoroSharp). Specifically tailored for scenarios involving user-generated content like Twitch/YouTube chat and donations.
-
-## Problem Solved
-
-TTS engines often struggle with or produce inconsistent results when encountering:
-
-* Emojis (e.g., ✨, 👍, 🇬🇧)
-* Currency symbols and codes from various locales (e.g., $, £, €, USD, JPY, BRL)
-* Different number formats (cardinals, ordinals, decimals, version numbers)
-* Common chat/gaming abbreviations and slang (e.g., lol, brb, gg, afk)
-* URLs (e.g., https://example.com, www.test.org)
-* Excessive punctuation or letter repetitions (e.g., !!!, ???, sooooo)
-* Non-standard characters
-
-This library preprocesses input text using a configurable pipeline of rules to replace or adjust these elements *before* sending the text to the TTS engine, leading to a more predictable, consistent, and pleasant listening experience.
-
-## Features
-
-* **Emoji Normalization:** Replaces Unicode emojis (including flags, ZWJ sequences) with descriptive text (e.g., ✨ -> `sparkles`).
-* *Configurable:* Add optional prefix/suffix (e.g., "emoji sparkles", "sparkles emoji") via `EmojiRuleOptions`.
-* **Currency Normalization:** Detects currency symbols and ISO codes. Replaces amounts with spoken text using locale-aware mappings (e.g., `$10.50` -> `ten US dollars fifty cents`). Uses Humanizer.
-* **Number Normalization:** Handles standalone cardinals ("123" -> `one hundred and twenty-three`), ordinals ("1st" -> `first`), decimals ("1.5" -> `one point five`), and version-like numbers ("1.2.3" -> `one point two point three`). Uses Humanizer.
-* **URL Normalization:** Replaces detected URLs (http, https, www) with a placeholder (default: " link ").
-* *Configurable:* Specify custom placeholder text via `UrlRuleOptions`.
-* **Abbreviation/Acronym Expansion:** Expands a comprehensive list of common chat, gaming, and streaming abbreviations (e.g., `lol` -> `laughing out loud`). Case-insensitive and whole-word matching.
-* *Configurable:* Add custom abbreviations or completely replace the default list via `AbbreviationRuleOptions`.
-* **Basic Text Sanitization:** Normalizes line breaks, removes common problematic control/formatting characters, and replaces non-standard "fancy" punctuation with ASCII equivalents.
-* **Chat Text Cleanup:**
-* Reduces sequences of excessive punctuation (`!!!` -> `!`, `...` -> `.`, `???` -> `?`).
-* Reduces excessive letter repetitions (`soooo` -> `soo`).
-* **Whitespace Normalization:** Trims leading/trailing whitespace, collapses multiple internal whitespace characters to a single space, and normalizes spacing around common punctuation.
-* **Extensibility & Configuration:**
-* Designed around a pipeline of `ITextNormalizationRule` instances.
-* Easily configurable via Dependency Injection using `AddTextNormalization`.
-* Rule execution order can be overridden during registration.
-* Specific rules offer configuration via the standard .NET Options pattern (`IOptions<T>`).
-* Custom rules can be created by implementing the `ITextNormalizationRule` interface.
-* **Performance:** Optimized using modern .NET features like source generators (Regex, Emoji data), `FrozenDictionary` for lookups, `IOptions`, and efficient string handling where possible.
-
-## Technology
-
-* **C# 13 / .NET 9**: Leverages the latest language and runtime features.
-* **Source Generators:** Used for generating optimized Regex patterns and embedding up-to-date Emoji data at compile time.
-* **Humanizer:** Used for robust number-to-words and ordinal conversion.
-* **Core .NET Libraries:** `System.Text.RegularExpressions`, `System.Globalization`, `System.Collections.Frozen`, `System.Text.Json` (in generator), `Microsoft.Extensions.Options`.
-* **Dependency Injection:** Designed for easy integration using `Microsoft.Extensions.DependencyInjection`.
-
-## Getting Started
-
-### Installation
-
-```powershell
````
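The removed Features list describes abbreviation expansion as case-insensitive, whole-word matching whose defaults can be extended or fully replaced through `AbbreviationRuleOptions`. A minimal sketch of those semantics in plain C#, assuming a three-entry default table; the `BuildMap` and `Expand` helpers and the merge logic are hypothetical illustrations, not the library's code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical default table; the real library ships a much larger list.
var defaults = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["lol"] = "laughing out loud",
    ["brb"] = "be right back",
    ["gg"] = "good game",
};

// Merge custom entries over the defaults (custom wins on key clashes),
// or start from an empty map when replaceDefaults is true.
Dictionary<string, string> BuildMap(IReadOnlyDictionary<string, string> customEntries, bool replaceDefaults)
{
    var map = replaceDefaults
        ? new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        : new Dictionary<string, string>(defaults, StringComparer.OrdinalIgnoreCase);
    foreach (var (key, value) in customEntries)
    {
        map[key] = value;
    }
    return map;
}

// Expand whole words only (\b on both sides), matching keys case-insensitively.
string Expand(string text, IReadOnlyDictionary<string, string> map)
{
    if (map.Count == 0)
    {
        return text;
    }
    // Longest keys first so a shorter key never pre-empts a longer one.
    string alternation = string.Join("|", map.Keys.OrderByDescending(k => k.Length).Select(k => Regex.Escape(k)));
    // The map's OrdinalIgnoreCase comparer lets "GG" find the "gg" entry.
    return Regex.Replace(text, $@"\b(?:{alternation})\b", m => map[m.Value], RegexOptions.IgnoreCase);
}

var custom = new Dictionary<string, string> { ["cya"] = "see you", ["gg"] = "very good game" };
var merged = BuildMap(custom, replaceDefaults: false);
Console.WriteLine(Expand("GG everyone, brb! cya", merged));
```

With `replaceDefaults: false` the custom `gg` entry overrides the default one, similar in spirit to the `CustomAbbreviations` / `ReplaceDefaultAbbreviations` example in the removed Basic Usage section. The `\b` boundaries keep a word such as `lolly` from being expanded.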

````diff
+# TTSTextNormalization
+
+.NET library for normalizing user-generated text before Text-to-Speech playback (chat, donations, comments, alerts), so engines pronounce content more consistently.
+
+[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/Agash/TTSTextNormalization/dotnet-publish.yml?style=flat-square&logo=github&logoColor=white)](https://github.com/Agash/TTSTextNormalization/actions)
+[![NuGet Version](https://img.shields.io/nuget/v/Agash.TTSTextNormalization.svg?style=flat-square&logo=nuget&logoColor=white)](https://www.nuget.org/packages/Agash.TTSTextNormalization/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square)](https://opensource.org/licenses/MIT)
+
+## Targets
+
+- `net10.0` (primary)
+- `net9.0`
+
+## Install
+
+```bash
 dotnet add package Agash.TTSTextNormalization
 ```
-Or install `Agash.TTSTextNormalization` via the NuGet Package Manager in Visual Studio.
-
-### Basic Usage with Dependency Injection (Recommended)
-
-1. **Configure Services (e.g., in `Program.cs` or `Startup.cs`):**
-
-```csharp
-using Microsoft.Extensions.DependencyInjection;
-using TTSTextNormalization.Abstractions; // For ITextNormalizer
-using TTSTextNormalization.DependencyInjection; // For extension methods
-using TTSTextNormalization.Rules; // For rule options classes
-using System.Collections.Frozen; // For FrozenDictionary
-
-// ... other using statements
-
-var services = new ServiceCollection();
-
-// --- Configure Rule Options (Optional) ---
-services.Configure<AbbreviationRuleOptions>(options =>
-{
-// Example: Add custom abbreviations and override 'gg'
-var customMap = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
-{
-{ "cya", "see you" },
-{ "gg", "very good game" } // Overrides default
-};
-options.CustomAbbreviations = customMap.ToFrozenDictionary(StringComparer.OrdinalIgnoreCase);
-options.ReplaceDefaultAbbreviations = false; // Merge with defaults (default behavior)
-});
-
-services.Configure<UrlRuleOptions>(options =>
-{
-options.PlaceholderText = " website link "; // Use a custom placeholder
-});
-
-services.Configure<EmojiRuleOptions>(options =>
-{
-options.Suffix = "emoji"; // Append " emoji" to names, e.g., "thumbs up emoji"
-});
-
-
-// --- Configure the Normalization Pipeline ---
-services.AddTextNormalization(builder =>
-{
-// Add rules. Order is determined by default 'Order' property unless overridden.
-builder.AddBasicSanitizationRule(); // Order 10
-builder.AddEmojiRule(); // Order 100 (Uses configured options)
-builder.AddCurrencyRule(); // Order 200
-builder.AddAbbreviationNormalizationRule(); // Order 300 (Uses configured options)
-builder.AddNumberNormalizationRule(); // Order 400
-builder.AddExcessivePunctuationRule(); // Order 500
-builder.AddLetterRepetitionRule(); // Order 510
-builder.AddUrlNormalizationRule(); // Order 600 (Uses configured options)
-// Example: Add Whitespace rule but make it run earlier
-builder.AddWhitespaceNormalizationRule(orderOverride: 50); // Runs before Emoji now!
-// Add custom rules here: builder.AddRule<MyCustomRule>(orderOverride: 700);
-});
-
-// Register other services...
-// Add Logging if desired (pipeline logs information)
-// services.AddLogging(logBuilder => logBuilder.AddConsole());
-
-// Build the provider
-var serviceProvider = services.BuildServiceProvider();
-```
````
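The removed configuration sample annotates each rule with a default order (10, 100, ... 600) and moves the whitespace rule to 50 with `orderOverride`. The sketch below models only that idea: rules are sorted by their effective order and applied left to right. The tuple-based rule shape, the `AddRule` and `Normalize` helpers, and the whitespace rule's default order of 1000 are hypothetical; the members of the real `ITextNormalizationRule` interface are not shown in this README.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical rule shape (name, effective order, transform). This is NOT the
// library's ITextNormalizationRule; it only models "sort by order, then apply".
var rules = new List<(string Name, int Order, Func<string, string> Apply)>();

// Register a rule; an explicit override replaces the rule's default order.
void AddRule(string name, int defaultOrder, Func<string, string> apply, int? orderOverride = null) =>
    rules.Add((name, orderOverride ?? defaultOrder, apply));

// Apply rules in ascending order. OrderBy is a stable sort, so rules with equal
// orders keep their registration order.
string Normalize(string input) =>
    rules.OrderBy(r => r.Order).Aggregate(input, (text, rule) => rule.Apply(text));

// Orders 10 and 600 and the " link " placeholder come from the README; 1000 is made up.
AddRule("Sanitize", 10, s => s.Replace('\u201C', '"').Replace('\u201D', '"'));
AddRule("Url", 600, s => Regex.Replace(s, @"\b(?:https?://|www\.)\S+", " link "));
AddRule("Whitespace", 1000, s => Regex.Replace(s, @"\s+", " ").Trim(), orderOverride: 50);

Console.WriteLine(Normalize("  He said \u201Chello\u201D   see www.example.com  "));
```

With the override, the whitespace rule runs second and the URL rule runs last, so the padding of the `" link "` placeholder survives into the result (`see  link `). In this sketch, dropping the override moves whitespace after the URL rule and tidies that spacing, which shows how much the ordering affects the final text.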

````diff
-
-2. **Use the Normalizer:**
-
-```csharp
-// Get the normalizer instance from DI
-var normalizer = serviceProvider.GetRequiredService<ITextNormalizer>();
-
-// Example inputs
-string input1 = " OMG!!! That stream was 🔥🔥🔥!! CYA! Costs $10... Check www.example.com! ";
-string input2 = "He said “hello” ✨ gg.";
-
-// Normalize
-string normalized1 = normalizer.Normalize(input1);
-string normalized2 = normalizer.Normalize(input2);
-
-// Output (approximate, based on configured rules and options)
-Console.WriteLine(normalized1);
-// Output: oh my god! That stream was fire emoji fire emoji fire emoji! see you! Costs ten US dollars. Check website link!
-
-Console.WriteLine(normalized2);
-// Output: He said "hello" sparkles emoji very good game.
-
-// Pass the normalized text to your TTS engine
-// MyTTSEngine.Speak(normalized1);
-// MyTTSEngine.Speak(normalized2);
-```
-
-## Building
-
-Ensure you have the .NET 9 SDK installed.
-
-1. Clone the repository:
-```bash
-git clone https://github.com/Agash/TTSTextNormalization.git
-cd TTSTextNormalization
-```
-2. Build the solution:
-```bash
-dotnet build -c Release
-```
+
+## What You Get
+
+- Emoji normalization (including ZWJ sequences and flags)
+- Currency normalization (`$10.50`, `EUR 100`, etc.) to spoken forms
+- Number normalization (cardinal, ordinal, decimal, multi-dot/version-style)
+- URL replacement via configurable placeholder text
+- Abbreviation expansion for common chat/gaming terms (`lol`, `brb`, `gg`, ...)
+- Basic sanitization of control chars and punctuation variants
+- Cleanup for excessive punctuation and repeated letters
+- Final whitespace and punctuation spacing normalization
+- DI-first, ordered pipeline via `ITextNormalizationRule`
+
+## Quick Start (DI)
+
+```csharp
+using Microsoft.Extensions.DependencyInjection;
+using TTSTextNormalization.Abstractions;
+using TTSTextNormalization.DependencyInjection;
+using TTSTextNormalization.Rules;
+
+ServiceCollection services = new();
+
+services.Configure<UrlRuleOptions>(o => o.PlaceholderText = " website link ");
+services.Configure<EmojiRuleOptions>(o => o.Suffix = "emoji");
+
+services.AddTextNormalization(builder =>
+{
+builder.AddBasicSanitizationRule();
+builder.AddEmojiRule();
+builder.AddCurrencyRule();
+builder.AddAbbreviationNormalizationRule();
+builder.AddNumberNormalizationRule();
+builder.AddExcessivePunctuationRule();
+builder.AddLetterRepetitionRule();
+builder.AddUrlNormalizationRule();
+builder.AddWhitespaceNormalizationRule();
+});
+
+ServiceProvider provider = services.BuildServiceProvider();
+ITextNormalizer normalizer = provider.GetRequiredService<ITextNormalizer>();
+
+string input = "OMG!!! that stream was 🔥🔥🔥 $10.50 www.example.com";
+string output = normalizer.Normalize(input);
+```
+
+## Notes
+
+- This library normalizes text only. It does not provide TTS playback itself.
+- Rule ordering is configurable; defaults are designed for chat-like inputs.
+
+## Build
+
+```bash
+dotnet restore
+dotnet build -c Release
+dotnet test -c Release
+```
````
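Both README versions list cleanup of excessive punctuation and repeated letters; the old one gives `!!!` -> `!`, `...` -> `.`, `???` -> `?` and `soooo` -> `soo`. One way to reproduce those examples with two regular expressions is sketched below. The thresholds (collapse runs of an identical `!`, `?` or `.` to one; cap any letter at two consecutive copies) are inferred from those examples, and `CollapsePunctuation` / `CapLetterRepeats` are hypothetical names, not the library's rules:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-ins for the cleanup rules; behavior is inferred from the
// README examples, not taken from the library's source.

// Collapse a run of two or more identical '!', '?' or '.' characters to a single one.
string CollapsePunctuation(string text) => Regex.Replace(text, @"([!?.])\1+", "$1");

// Keep at most two copies of any letter repeated three or more times in a row.
// Ordinary doubles such as "oo" in "good" are left alone.
string CapLetterRepeats(string text) => Regex.Replace(text, @"(\p{L})\1{2,}", "$1$1");

Console.WriteLine(CapLetterRepeats(CollapsePunctuation("sooooo good!!! really??? wait...")));
```

Running punctuation collapse before letter capping is arbitrary here, since the two patterns touch disjoint character classes.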

````diff
 
 ## Contributing
 
-Contributions are welcome! Please open an issue first to discuss potential changes or bug fixes. If submitting a pull request, ensure tests pass and new features include corresponding tests.
+PRs are welcome. If behavior changes, include tests in `TTSTextNormalization.Tests`.
 
 ## License
 
-This project is licensed under the [MIT License](LICENSE.txt).
+MIT. See `LICENSE.txt`.
````
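The new README's feature list ends with "final whitespace and punctuation spacing normalization", which the old README described as trimming, collapsing internal whitespace to a single space, and normalizing spacing around common punctuation. A minimal sketch, assuming "normalizing spacing" means removing a space before `, . ! ? ; :` (the library may do more); `NormalizeSpacing` is a hypothetical helper, not the library's whitespace rule:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical final spacing pass; not the library's whitespace rule.
string NormalizeSpacing(string text)
{
    // Collapse every run of whitespace (spaces, tabs, newlines) to one space, then trim the ends.
    string collapsed = Regex.Replace(text, @"\s+", " ").Trim();

    // After collapsing, at most one space can precede punctuation; drop it.
    return Regex.Replace(collapsed, @" ([,.!?;:])", "$1");
}

Console.WriteLine(NormalizeSpacing("  see you !\n\tCosts   ten dollars .  "));
```

Running a pass like this last lets earlier rules pad their replacements with spaces (as the `" link "` URL placeholder does) without leaving doubled spaces in the text handed to the TTS engine.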
