# PatternFormatter

The `PatternFormatter` formats strings according to a pattern built from three kinds of elements. Filters select characters from the input. Transformation directives change the case of the selected characters or delete them. Literal characters are copied to the output as-is.

## Usage

### Basic Filtering

```php
use Respect\StringFormatter\PatternFormatter;

// Each "0" keeps one digit and "-" is inserted as-is; leftover input is dropped
$formatter = new PatternFormatter('000-0000');

echo $formatter->format('1234567890');
// Outputs: "123-4567"
```

### Case Transformations

```php
use Respect\StringFormatter\PatternFormatter;

// \l lowercases only the next character; \U uppercases every character after it
$formatter = new PatternFormatter('\\l###\\U###');

echo $formatter->format('ABcdef');
// Outputs: "aBcDEF"
```

### Repetition Quantifiers

```php
use Respect\StringFormatter\PatternFormatter;

// Match all digits
$formatter = new PatternFormatter('0{1,}');

echo $formatter->format('abc123def456');
// Outputs: "123456"

// Match up to 5 characters
$formatter = new PatternFormatter('#{,5}');

echo $formatter->format('abcdefghij');
// Outputs: "abcde"
```

## API

### `PatternFormatter::__construct`

- `__construct(string $pattern)`

Creates a new formatter instance with the specified pattern.

**Parameters:**

- `$pattern`: The pattern string defining the transformation rules

**Throws:** `InvalidFormatterException` when the pattern is empty

### `PatternFormatter::format`

- `format(string $input): string`

Formats the input string according to the pattern by applying its filters, transformations, and literals.

**Parameters:**

- `$input`: The string to format

**Returns:** The formatted string

## Pattern Syntax

### Filtering Patterns

| Pattern | Description                         |
| ------- | ----------------------------------- |
| `#`     | Any character                       |
| `0`     | Digits only (0-9)                   |
| `A`     | Uppercase letters only              |
| `a`     | Lowercase letters only              |
| `C`     | Letters (upper/lower) only          |
| `W`     | Word characters (alphanumeric) only |

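
The filter table can be read as a mapping from each filter character to a character class. The following is a minimal sketch of that idea and does not reproduce the library's implementation. `matchesFilter()` is a hypothetical helper. The PCRE Unicode properties it uses (`\p{Lu}`, `\p{Ll}`, `\p{L}`, `\p{N}`) are an assumption about how the filters treat non-ASCII characters.

```php
<?php

/**
 * Hypothetical helper: does one character satisfy a filter?
 * Assumption: the Unicode classes below approximate the documented
 * filters; the real PatternFormatter may define them differently.
 */
function matchesFilter(string $filter, string $char): bool
{
    $regex = match ($filter) {
        '#' => '/^.$/su',           // any single code point
        '0' => '/^[0-9]$/u',        // ASCII digits only
        'A' => '/^\p{Lu}$/u',       // uppercase letters
        'a' => '/^\p{Ll}$/u',       // lowercase letters
        'C' => '/^\p{L}$/u',        // letters of any case
        'W' => '/^[\p{L}\p{N}]$/u', // letters and digits
        default => throw new InvalidArgumentException("Unknown filter: {$filter}"),
    };

    return preg_match($regex, $char) === 1;
}

var_dump(matchesFilter('C', 'ñ')); // bool(true)
var_dump(matchesFilter('A', 'ñ')); // bool(false)
```

Under this assumption, `W` excludes the underscore, which PCRE's `\w` would accept.
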
### Repetition Quantifiers

A filter can be followed by a repetition quantifier to match multiple characters:

| Pattern  | Description                                                         |
| -------- | ------------------------------------------------------------------- |
| `X{n}`   | Match filter `X` exactly `n` times                                  |
| `X{n,m}` | Match filter `X` at least `n` and at most `m` times                 |
| `X{n,}`  | Match filter `X` at least `n` times (no upper limit)                |
| `X{,m}`  | Match filter `X` at most `m` times (zero minimum)                   |
| `X{,}`   | Match filter `X` any number of times (zero minimum, no upper limit) |

Here `X` is any filtering pattern (`#`, `0`, `A`, `a`, `C`, `W`).

**Examples:**

| Pattern   | Description                    |
| --------- | ------------------------------ |
| `C{3}`    | Exactly 3 letters              |
| `A{5,10}` | 5 to 10 uppercase letters      |
| `0{1,}`   | 1 or more digits               |
| `#{,5}`   | Up to 5 characters of any kind |

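
The quantifier forms above can be parsed into a minimum and an optional maximum. The following is a minimal sketch that assumes these five forms are the only valid ones. `parseQuantifier()` is a hypothetical helper and is not part of the library. It returns `null` as the maximum when there is no upper limit.

```php
<?php

/**
 * Hypothetical helper: parse "{n}", "{n,m}", "{n,}", "{,m}" or "{,}"
 * into [min, max], where a null max means "no upper limit".
 *
 * @return array{0: int, 1: int|null}
 */
function parseQuantifier(string $quantifier): array
{
    if (preg_match('/^\{(\d*)(,?)(\d*)\}$/', $quantifier, $m) !== 1) {
        throw new InvalidArgumentException("Invalid quantifier: {$quantifier}");
    }

    [, $min, $comma, $max] = $m;

    if ($comma === '') {
        // "{n}" means exactly n; "{}" has no count and is rejected
        if ($min === '') {
            throw new InvalidArgumentException('Empty quantifier: {}');
        }

        return [(int) $min, (int) $min];
    }

    // An omitted minimum defaults to 0, an omitted maximum to "unbounded"
    $bounds = [$min === '' ? 0 : (int) $min, $max === '' ? null : (int) $max];

    if ($bounds[1] !== null && $bounds[1] < $bounds[0]) {
        throw new InvalidArgumentException("Maximum below minimum: {$quantifier}");
    }

    return $bounds;
}
```

For example, `parseQuantifier('{,5}')` returns `[0, 5]` and `parseQuantifier('{1,}')` returns `[1, null]`.
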
### Transformation Patterns

| Pattern | Description                                               |
| ------- | --------------------------------------------------------- |
| `\d`    | Delete the next character                                 |
| `\l`    | Lowercase the next character                              |
| `\L`    | Lowercase all following characters until `\E`            |
| `\u`    | Uppercase the next character                              |
| `\U`    | Uppercase all following characters until `\E`            |
| `\i`    | Invert the case of the next character                     |
| `\I`    | Invert the case of all following characters until `\E`   |
| `\E`    | End the active transformation state                       |

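
Each case directive comes down to a per-character case mapping. The following is a minimal sketch that assumes the `mbstring` extension performs the Unicode case mapping. `transformChar()` and `invertCase()` are hypothetical helpers, and they take the directive letter without its backslash. `\d` and `\E` are omitted because they delete a character or end a state rather than map case.

```php
<?php

/**
 * Hypothetical helper: apply a case directive ('l', 'L', 'u', 'U',
 * 'i' or 'I', without the backslash) to one character.
 * Assumption: mbstring handles the Unicode case mapping.
 */
function transformChar(string $directive, string $char): string
{
    return match ($directive) {
        'l', 'L' => mb_strtolower($char, 'UTF-8'),
        'u', 'U' => mb_strtoupper($char, 'UTF-8'),
        'i', 'I' => invertCase($char),
        default => throw new InvalidArgumentException('Not a case directive: \\' . $directive),
    };
}

/** Swap upper and lower case; caseless characters come back unchanged. */
function invertCase(string $char): string
{
    $upper = mb_strtoupper($char, 'UTF-8');

    // If uppercasing changed the character, it was lowercase
    return $upper !== $char ? $upper : mb_strtolower($char, 'UTF-8');
}

echo transformChar('u', 'ñ'), transformChar('i', 'A'), transformChar('i', 'b'), PHP_EOL; // ÑaB
```
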
### Escape Sequences

| Pattern | Description           | Example               |
| ------- | --------------------- | --------------------- |
| `\#`    | Literal `#` character | Matches `#` literally |
| `\0`    | Literal `0` character | Matches `0` literally |
| `\A`    | Literal `A` character | Matches `A` literally |
| `\@`    | Literal `@` character | Matches `@` literally |

### Literal Characters

Any character not defined as a pattern (`A`, `a`, `0`, `#`, `C`, `W`, `\`) is treated as a literal and appears in the output as-is.

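
Together, the syntax rules above describe a small tokenizer. The sketch below shows one way to split a pattern into filters (each with an optional quantifier), directives, and literals. `tokenize()` is hypothetical. It assumes that a backslash before any character other than `d`, `l`, `L`, `u`, `U`, `i`, `I`, or `E` makes that character a literal, as the escape table suggests for `\#`, `\0`, `\A`, and `\@`.

```php
<?php

/**
 * Hypothetical tokenizer for the pattern syntax. Returns a list of
 * ['filter', char, quantifier|null], ['directive', letter] or
 * ['literal', char] tokens.
 */
function tokenize(string $pattern): array
{
    // Alternatives, tried in order at each position:
    //   \\(.)                      backslash plus the next character
    //   ([#0AaCW])(\{\d*,?\d*\})?  filter with an optional quantifier
    //   (.)                        any other character
    preg_match_all(
        '/\\\\(.)|([#0AaCW])(\{\d*,?\d*\})?|(.)/su',
        $pattern,
        $matches,
        PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL,
    );

    $tokens = [];
    foreach ($matches as $m) {
        if (isset($m[1])) {
            $tokens[] = in_array($m[1], ['d', 'l', 'L', 'u', 'U', 'i', 'I', 'E'], true)
                ? ['directive', $m[1]]
                : ['literal', $m[1]]; // escaped character such as \#
        } elseif (isset($m[2])) {
            $tokens[] = ['filter', $m[2], $m[3] ?? null];
        } else {
            $tokens[] = ['literal', $m[4]];
        }
    }

    return $tokens;
}
```

For example, the pattern `\U0{2,}-\#` yields four tokens: a `U` directive, a `0` filter with the quantifier `{2,}`, and the literals `-` and `#`.
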
## Behavior

### Filtering Patterns

- **Remove non-matching characters**: Characters that don't match the filter are skipped
- **Keep matching characters as-is**: When characters match the filter, they pass through unchanged
- **Consume from input**: Filters advance the input pointer when they find a match

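
Taken together, the three rules above amount to a scan-until-match loop. The following is a minimal sketch and not the library's code. `consumeOne()` is hypothetical. It assumes the input has already been split into code points and that a regular expression stands in for the filter.

```php
<?php

/**
 * Hypothetical helper: advance $pos through $chars until a character
 * matches $filterRegex. Returns that character unchanged, or null if the
 * input runs out first. Skipped characters never reach the output.
 *
 * @param list<string> $chars
 */
function consumeOne(array $chars, int &$pos, string $filterRegex): ?string
{
    while ($pos < count($chars)) {
        $char = $chars[$pos++]; // consumed whether or not it matches

        if (preg_match($filterRegex, $char) === 1) {
            return $char;
        }
    }

    return null;
}

// Reproduces the "C0" edge case: "ABC123" becomes "A1"
$chars = preg_split('//u', 'ABC123', -1, PREG_SPLIT_NO_EMPTY);
$pos = 0;
$output = consumeOne($chars, $pos, '/^\p{L}$/u') . consumeOne($chars, $pos, '/^[0-9]$/u');

echo $output, PHP_EOL; // A1
```
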
### Transformation Patterns

- **Stateful transformations**: `\L`, `\U`, and `\I` persist until reset with `\E`
- **Single-character transformations**: `\d`, `\l`, `\u`, and `\i` affect only the next character
- **End of transformations**: `\E` clears any active transformation state
- **Unicode aware**: Transformations work with non-ASCII characters such as `ñ` and `é`

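
One way to model the difference between stateful and single-character transformations is a state object with two slots. The following sketch rests on an assumption that the rules above leave open: a pending single-character directive takes precedence over an active stateful one, for exactly one character. `TransformationState` is hypothetical, and it reports which transformation applies rather than performing it.

```php
<?php

/**
 * Hypothetical model of transformation state. Letters are passed without
 * the backslash: 'L', 'U', 'I' set a persistent mode; 'l', 'u', 'i', 'd'
 * set a one-shot mode; 'E' clears both.
 */
final class TransformationState
{
    private ?string $persistent = null; // 'L', 'U' or 'I'
    private ?string $oneShot = null;    // 'l', 'u', 'i' or 'd'

    public function directive(string $letter): void
    {
        match ($letter) {
            'L', 'U', 'I' => $this->persistent = $letter,
            'l', 'u', 'i', 'd' => $this->oneShot = $letter,
            'E' => $this->persistent = $this->oneShot = null,
            default => throw new InvalidArgumentException('Unknown directive: \\' . $letter),
        };
    }

    /**
     * Returns the mode for the next consumed character ('d', 'l', 'u', 'i',
     * or null for "unchanged") and discards any one-shot directive.
     * Assumption: a one-shot directive wins over the persistent mode.
     */
    public function nextCharacter(): ?string
    {
        $mode = $this->oneShot
            ?? ($this->persistent !== null ? strtolower($this->persistent) : null);
        $this->oneShot = null;

        return $mode;
    }
}

$state = new TransformationState();
$state->directive('U'); // like \U
$state->directive('l'); // like \l
$modes = [$state->nextCharacter(), $state->nextCharacter()];

echo implode(',', $modes), PHP_EOL; // l,u
```
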
### Repetition Quantifiers

- **Greedy matching**: A repetition consumes as many matching characters as possible, up to its maximum
- **Non-matching characters skipped**: Characters that don't match the filter are skipped over
- **Works with transformations**: Repetitions can be combined with case transformations
- **Partial matches allowed**: If fewer characters match than the minimum, all available matches are returned

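
The four rules above can be condensed into a single loop. The following is a minimal sketch that assumes the input is split into code points and that a regular expression stands in for the filter. `consumeRepeated()` is hypothetical. Because partial matches are allowed, the minimum never causes a failure, so the sketch takes only the maximum, with `null` meaning no upper limit.

```php
<?php

/**
 * Hypothetical helper: greedily collect up to $max characters matching
 * $filterRegex, skipping non-matching ones. A null $max means unbounded.
 * The minimum is not checked: fewer matches are simply returned as-is.
 *
 * @param list<string> $chars
 */
function consumeRepeated(array $chars, int &$pos, string $filterRegex, ?int $max): string
{
    $output = '';
    $count = 0;

    while ($pos < count($chars) && ($max === null || $count < $max)) {
        $char = $chars[$pos++];

        if (preg_match($filterRegex, $char) === 1) {
            $output .= $char;
            $count++;
        }
    }

    return $output;
}

$chars = preg_split('//u', 'ABCDEFG', -1, PREG_SPLIT_NO_EMPTY);
$pos = 0;

echo consumeRepeated($chars, $pos, '/^\p{Lu}$/u', 4), PHP_EOL; // ABCD, as with A{2,4}
```
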
## Examples

| Pattern          | Input        | Output           | Description                   |
| ---------------- | ------------ | ---------------- | ----------------------------- |
| `000-0000`       | `1234567`    | `123-4567`       | Phone number formatting       |
| `AAA-000`        | `ABC123`     | `ABC-123`        | License plate format          |
| `\U###`          | `abc`        | `ABC`            | Uppercase until reset         |
| `\L####`         | `ABC1`       | `abc1`           | Lowercase until reset         |
| `\l#\u#`         | `Ab`         | `aB`             | Single-character case changes |
| `\I####`         | `AbCd`       | `aBcD`           | Case inversion until reset    |
| `CC00WW`         | `AB123D`     | `AB123D`         | International postal code     |
| `(000) 000-0000` | `1234567890` | `(123) 456-7890` | US phone format               |
| `000-00-0000`    | `123456789`  | `123-45-6789`    | SSN format                    |
| `\L##\E##`       | `ABCD`       | `abCD`           | Transformation reset          |
| `##\d##`         | `ABCDE`      | `ABDE`           | Character deletion            |
| `0{1,}`          | `a1b2c3`     | `123`            | All digits (unbounded)        |
| `#{,5}`          | `abcdefgh`   | `abcde`          | Up to 5 characters            |
| `A{2,4}`         | `ABCDEFG`    | `ABCD`           | 2 to 4 uppercase letters      |
| `C{3,}-0{3,}`    | `ABC-123`    | `ABC-123`        | Letters then digits           |

## International Support

The formatter works with Unicode characters and international text:

```php
use Respect\StringFormatter\PatternFormatter;

// \U stays active for both characters, so both are uppercased
$formatter = new PatternFormatter('\\U##');

echo $formatter->format('ñáçé');
// Outputs: "ÑÁ"

$formatter = new PatternFormatter('CC');

echo $formatter->format('ñáç123');
// Outputs: "ñá"
```

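
Unicode awareness matters in PHP because the core string functions count bytes, not characters. The snippet below uses only standard PHP functions to show the difference: `strlen()`, `preg_split()` with the `u` modifier, and `mb_strtoupper()` from the mbstring extension. It makes no claim about which of these the library uses internally.

```php
<?php

$input = 'ñáçé';

// strlen() counts UTF-8 bytes: each of these letters takes two bytes
echo strlen($input), PHP_EOL; // 8

// Splitting with the /u modifier yields one element per code point
$chars = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY);
echo count($chars), PHP_EOL; // 4

// mbstring performs Unicode-aware case mapping on the first two letters
echo mb_strtoupper($chars[0] . $chars[1], 'UTF-8'), PHP_EOL; // ÑÁ
```
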
## Edge Cases

| Pattern | Input      | Output  | Reason                                                        |
| ------- | ---------- | ------- | ------------------------------------------------------------- |
| `###`   | `ab`       | `ab`    | Pattern longer than input: all available characters are used |
| `####`  | `abcdefgh` | `abcd`  | Input longer than pattern: the remainder is truncated        |
| `C0`    | `ABC123`   | `A1`    | Non-matching characters are skipped                           |
| `AAA`   | `123`      | (empty) | No matching characters found                                  |
| `\E###` | `abc🙂`    | `abc`   | `\E` with no active transformation has no effect              |