Commit 448b55e

Implement PatternFormatter for pattern-based string transformation
The PatternFormatter enables advanced pattern-based string transformation using filtering patterns and transformation directives. It supports digit, letter, and character filters along with case transformation operations.

Assisted-by: Opencode (GLM-4.6)
1 parent 6a6cd43 commit 448b55e

4 files changed

Lines changed: 1069 additions & 3 deletions

README.md

Lines changed: 4 additions & 3 deletions
@@ -16,9 +16,10 @@ composer require respect/string-formatter
1616

1717
## Formatters
1818

19-
| Formatter | Description |
20-
| -------------------------------------- | ----------------------------------------------- |
21-
| [MaskFormatter](docs/MaskFormatter.md) | Range-based string masking with Unicode support |
19+
| Formatter | Description |
20+
| -------------------------------------------- | ------------------------------------------------ |
21+
| [MaskFormatter](docs/MaskFormatter.md) | Range-based string masking with Unicode support |
22+
| [PatternFormatter](docs/PatternFormatter.md) | Pattern-based string filtering with placeholders |
2223

2324
## Contributing
2425

docs/PatternFormatter.md

Lines changed: 204 additions & 0 deletions
@@ -0,0 +1,204 @@
1+
# PatternFormatter
2+
3+
The `PatternFormatter` enables advanced pattern-based string transformation using filtering patterns and transformation directives.
4+
5+
## Usage
6+
7+
### Basic Filtering
8+
9+
```php
10+
use Respect\StringFormatter\PatternFormatter;
11+
12+
$formatter = new PatternFormatter('000-0000');
13+
14+
echo $formatter->format('1234567890');
15+
// Outputs: "123-4567"
16+
```
17+
18+
### Case Transformations
19+
20+
```php
21+
use Respect\StringFormatter\PatternFormatter;
22+
23+
$formatter = new PatternFormatter('\\L###\\E\\U###');
24+
25+
echo $formatter->format('abCDEF');
26+
// Outputs: "abcDEF"
27+
```
28+
29+
### Repetition Quantifiers
30+
31+
```php
32+
use Respect\StringFormatter\PatternFormatter;
33+
34+
// Match all digits (one or more)
35+
$formatter = new PatternFormatter('0+');
36+
37+
echo $formatter->format('abc123def456');
38+
// Outputs: "123456"
39+
40+
// Match all characters (zero or more)
41+
$formatter = new PatternFormatter('#*');
42+
43+
echo $formatter->format('hello');
44+
// Outputs: "hello"
45+
```
46+
47+
## API
48+
49+
### `PatternFormatter::__construct`
50+
51+
- `__construct(string $pattern)`
52+
53+
Creates a new formatter instance with the specified pattern.
54+
55+
**Parameters:**
56+
57+
- `$pattern`: The pattern string defining transformation rules
58+
59+
**Throws:** `InvalidFormatterException` when the pattern is empty
60+
61+
### `format`
62+
63+
- `format(string $input): string`
64+
65+
Formats the input string according to the pattern rules, applying filters and transformations.
66+
67+
**Parameters:**
68+
69+
- `$input`: The string to format
70+
71+
**Returns:** The formatted string with transformations applied
72+
73+
## Pattern Syntax
74+
75+
### Filtering Patterns
76+
77+
| Pattern | Description |
78+
| ------- | ----------------------------------- |
79+
| `#` | Any character |
80+
| `0` | Digits only (0-9) |
81+
| `A` | Uppercase letters only |
82+
| `a` | Lowercase letters only |
83+
| `C` | Letters (upper/lower) only |
84+
| `W` | Word characters (alphanumeric) only |
85+
86+
### Repetition Quantifiers
87+
88+
Filters can be followed by a repetition quantifier to match multiple characters:
89+
90+
| Pattern | Description |
91+
| -------- | ---------------------------------------------------- |
92+
| `X+` | Match filter `X` one or more times |
93+
| `X*` | Match filter `X` zero or more times |
94+
| `X{n}` | Match filter `X` exactly `n` times |
95+
| `X{n,m}` | Match filter `X` at least `n` and at most `m` times |
96+
| `X{n,}` | Match filter `X` at least `n` times (no upper limit) |
97+
| `X{,m}` | Match filter `X` at most `m` times (zero minimum) |
98+
99+
Where `X` is any filtering pattern (`#`, `0`, `A`, `a`, `C`, `W`).
100+
101+
**Examples:**
102+
103+
| Pattern | Description |
104+
| --------- | ------------------------------ |
105+
| `0+` | Digit one or more times |
106+
| `C*` | Letter zero or more times |
107+
| `C{3}` | Exactly 3 letters |
108+
| `A{5,10}` | Uppercase letter 5 to 10 times |
109+
| `#{,5}` | Any character up to 5 times |
110+
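The quantifier forms above all reduce to a pair of bounds. The following is a minimal sketch of that mapping, not the library's parser: `parseQuantifier` is a hypothetical helper, and treating a missing quantifier as "exactly once" is an assumption.

```php
<?php

/**
 * Hypothetical helper: turns a quantifier suffix into [min, max] bounds.
 * A null max means "no upper limit".
 *
 * @return array{0: int, 1: int|null}
 */
function parseQuantifier(string $suffix): array
{
    if ($suffix === '') {
        return [1, 1]; // assumption: a bare filter applies exactly once
    }
    if ($suffix === '+') {
        return [1, null];
    }
    if ($suffix === '*') {
        return [0, null];
    }
    // {n}: exactly n times
    if (preg_match('/^\{(\d+)\}$/', $suffix, $m) === 1) {
        return [(int) $m[1], (int) $m[1]];
    }
    // {n,m}, {n,} and {,m}: at least one side must be present
    if (preg_match('/^\{(\d*),(\d*)\}$/', $suffix, $m) === 1 && ($m[1] !== '' || $m[2] !== '')) {
        $min = $m[1] === '' ? 0 : (int) $m[1];
        $max = $m[2] === '' ? null : (int) $m[2];

        return [$min, $max];
    }

    throw new InvalidArgumentException("Unsupported quantifier: {$suffix}");
}
```

With this sketch, `parseQuantifier('{5,10}')` yields `[5, 10]` and `parseQuantifier('{,5}')` yields `[0, 5]`.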
111+
### Transformation Patterns
112+
113+
| Pattern | Description |
114+
| ------- | ---------------------------- |
115+
| `\d`    | Delete the next character    |
116+
| `\l` | Lowercase next character |
117+
| `\L` | Lowercase until `\E` |
118+
| `\u` | Uppercase next character |
119+
| `\U` | Uppercase until `\E` |
120+
| `\i` | Invert case next character |
121+
| `\I` | Invert case until `\E` |
122+
| `\E` | End the transformation state |
123+
124+
### Escape Sequences
125+
126+
| Pattern | Description | Example |
127+
| ------- | --------------------- | --------------------- |
128+
| `\#` | Literal `#` character | Matches `#` literally |
129+
| `\0` | Literal `0` character | Matches `0` literally |
130+
| `\A` | Literal `A` character | Matches `A` literally |
131+
| `\@` | Literal `@` character | Matches `@` literally |
132+
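To illustrate how a backslash separates directives from escaped literals, here is a rough tokenizer sketch. `tokenizePattern` and its token names are hypothetical, quantifiers are left out for brevity, and treating a trailing lone backslash as a literal is an assumption. It relies on `mb_str_split` (PHP 7.4+ with the mbstring extension).

```php
<?php

/**
 * Sketch: split a pattern into [type, value] tokens.
 * Types used here: 'filter', 'directive', 'literal'.
 *
 * @return list<array{0: string, 1: string}>
 */
function tokenizePattern(string $pattern): array
{
    $filters = ['#', '0', 'A', 'a', 'C', 'W'];
    $directives = ['d', 'l', 'L', 'u', 'U', 'i', 'I', 'E'];
    $chars = mb_str_split($pattern, 1, 'UTF-8');
    $tokens = [];

    for ($i = 0, $n = count($chars); $i < $n; $i++) {
        $char = $chars[$i];
        if ($char === '\\' && $i + 1 < $n) {
            $next = $chars[++$i];
            // A known letter after the backslash is a directive;
            // anything else (such as \# or \0) is an escaped literal.
            $tokens[] = in_array($next, $directives, true)
                ? ['directive', $next]
                : ['literal', $next];
            continue;
        }
        $tokens[] = in_array($char, $filters, true)
            ? ['filter', $char]
            : ['literal', $char];
    }

    return $tokens;
}
```

For the pattern `\U#\#-` this produces a `U` directive, a `#` filter, a literal `#`, and a literal `-`.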
133+
### Literal Characters
134+
135+
Any character not defined as a pattern (`A`, `a`, `0`, `#`, `C`, `W`, `\`) is treated as a literal and appears in the output as-is.
136+
137+
## Behavior
138+
139+
### Filtering Patterns
140+
141+
- **Remove non-matching characters**: Characters that don't match the filter are skipped
142+
- **Keep matching characters as-is**: When characters match the filter, they pass through unchanged
143+
- **Consume from input**: Filters advance the input pointer when they find a match
144+
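The three rules above can be sketched as a small loop. This is an illustration under stated assumptions, not the library's implementation: `sketchFormat` is a hypothetical function, the character classes are guesses at what each filter accepts, quantifiers and transformations are ignored, and literals are always emitted even when the input has run out.

```php
<?php

/**
 * Sketch only: handles filters and literals, nothing else.
 */
function sketchFormat(string $pattern, string $input): string
{
    // Assumed character classes for the six filters.
    $filters = [
        '#' => '/^.$/su',            // any character
        '0' => '/^[0-9]$/',          // digits
        'A' => '/^\p{Lu}$/u',        // uppercase letters
        'a' => '/^\p{Ll}$/u',        // lowercase letters
        'C' => '/^\p{L}$/u',         // letters
        'W' => '/^[\p{L}\p{N}]$/u',  // letters and digits
    ];
    $chars = mb_str_split($input, 1, 'UTF-8');
    $count = count($chars);
    $pos = 0;
    $out = '';

    foreach (mb_str_split($pattern, 1, 'UTF-8') as $symbol) {
        if (!isset($filters[$symbol])) {
            $out .= $symbol; // literal: copied to the output, consumes no input
            continue;
        }
        // Scan ahead for the next matching character; the pointer
        // only moves when a match is actually found.
        for ($i = $pos; $i < $count; $i++) {
            if (preg_match($filters[$symbol], $chars[$i]) === 1) {
                $out .= $chars[$i];
                $pos = $i + 1;
                break;
            }
        }
    }

    return $out;
}

echo sketchFormat('000-0000', '1234567890'); // 123-4567
```

Under these assumptions, `sketchFormat('C0', 'ABC123')` returns `A1` because `B` and `C` are skipped while looking for a digit.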
145+
### Transformation Patterns
146+
147+
- **Stateful transformations**: `\L`, `\U`, `\I` persist until reset
148+
- **Single-character transformations**: `\d`, `\l`, `\u`, `\i` affect only the next character
149+
- **End of transformations**: `\E` clears any active transformation state
150+
- **Unicode aware**: Transformations work with international characters
151+
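A hedged sketch of the state handling described above, limited to the `#` filter. `applyCaseMode` and `sketchTransform` are hypothetical names. Two behaviors are assumptions: a single-character directive takes priority over an active stateful one, and `\d` drops the next input character by itself, which is inferred from the `##\d##` example rather than confirmed by the source.

```php
<?php

/** Hypothetical helper: applies one case mode ('l', 'u', 'i' or null) to a character. */
function applyCaseMode(?string $mode, string $char): string
{
    $upper = mb_strtoupper($char, 'UTF-8');
    $lower = mb_strtolower($char, 'UTF-8');
    switch ($mode) {
        case 'l':
            return $lower;
        case 'u':
            return $upper;
        case 'i':
            return $char === $upper ? $lower : $upper;
        default:
            return $char;
    }
}

function sketchTransform(string $pattern, string $input): string
{
    $chars = mb_str_split($input, 1, 'UTF-8');
    $symbols = mb_str_split($pattern, 1, 'UTF-8');
    $pos = 0;
    $state = null; // persistent mode set by \L, \U, \I
    $once = null;  // one-shot mode set by \l, \u, \i
    $out = '';

    for ($i = 0, $n = count($symbols); $i < $n; $i++) {
        if ($symbols[$i] === '\\' && $i + 1 < $n) {
            $directive = $symbols[++$i];
            if ($directive === 'E') {
                $state = null;                   // end the active state
            } elseif ($directive === 'd') {
                $pos++;                          // drop the next input character
            } elseif (in_array($directive, ['L', 'U', 'I'], true)) {
                $state = strtolower($directive); // persists until \E
            } elseif (in_array($directive, ['l', 'u', 'i'], true)) {
                $once = $directive;              // applies to one character only
            }
            continue;
        }
        if ($symbols[$i] === '#' && $pos < count($chars)) {
            $out .= applyCaseMode($once ?? $state, $chars[$pos++]);
            $once = null;
        }
    }

    return $out;
}

echo sketchTransform('\\L##\\E##', 'ABCD'); // abCD
```

The same sketch turns `ABCDE` into `ABDE` for `##\d##` and `Ab` into `aB` for `\l#\u#`.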
152+
### Repetition Quantifiers
153+
154+
- **Greedy matching**: Repetitions consume as many matching characters as possible up to the maximum
155+
- **Non-matching characters skipped**: Characters that don't match the filter are skipped over
156+
- **Works with transformations**: Repetitions can be combined with case transformations
157+
- **Partial matches allowed**: If fewer characters match than the minimum, all available matches are returned
158+
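One way to picture greedy repetition with skipping is the sketch below. `takeRepeated` is a hypothetical helper, and the choice to move the input pointer only past matched characters is an assumption inferred from the `C+-0+` example, where the digits must remain available after the letters are taken. The minimum bound is deliberately not enforced, mirroring the partial-match rule.

```php
<?php

/**
 * Sketch: greedily collect characters accepted by $matches, skipping the rest.
 * $max = null means no upper limit.
 *
 * @param list<string> $chars
 */
function takeRepeated(array $chars, int &$pos, callable $matches, ?int $max): string
{
    $out = '';
    $taken = 0;
    for ($i = $pos, $n = count($chars); $i < $n && ($max === null || $taken < $max); $i++) {
        if ($matches($chars[$i])) {
            $out .= $chars[$i];
            $taken++;
            $pos = $i + 1; // commit the pointer only when a match is found
        }
    }

    return $out;
}

$isLetter = fn (string $c): bool => preg_match('/^\p{L}$/u', $c) === 1;
$isDigit = fn (string $c): bool => preg_match('/^[0-9]$/', $c) === 1;

// Emulates the pattern C+-0+ against "ABC-123".
$chars = mb_str_split('ABC-123', 1, 'UTF-8');
$pos = 0;
$letters = takeRepeated($chars, $pos, $isLetter, null);
$digits = takeRepeated($chars, $pos, $isDigit, null);
$result = $letters . '-' . $digits;

echo $result; // ABC-123
```

Because the pointer stops right after the last letter, the second call can still skip the `-` and pick up `123`.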
159+
## Examples
160+
161+
| Pattern | Input | Output | Description |
162+
| ---------------- | ------------ | ---------------- | -------------------------- |
163+
| `000-0000` | `1234567` | `123-4567` | Phone number formatting |
164+
| `AAA-000` | `ABC123` | `ABC-123` | License plate format |
165+
| `\U###` | `abc` | `ABC` | Uppercase until reset |
166+
| `\L####` | `ABC1` | `abc1` | Lowercase until reset |
167+
| `\l#\u#` | `Ab` | `aB` | Case transformation |
168+
| `\I####` | `AbCd` | `aBcD` | Case inversion until reset |
169+
| `CC00WW` | `AB123D` | `AB123D` | International postal code |
170+
| `(000) 000-0000` | `1234567890` | `(123) 456-7890` | US phone format |
171+
| `000-00-0000` | `123456789` | `123-45-6789` | SSN format |
172+
| `\L##\E##` | `ABCD` | `abCD` | Transformation reset |
173+
| `##\d##` | `ABCDE` | `ABDE` | Deleting character |
174+
| `0+` | `a1b2c3` | `123` | All digits (one or more) |
175+
| `C*` | `abc123` | `abc` | All letters (zero or more) |
176+
| `#{,5}` | `abcdefgh` | `abcde` | Up to 5 characters |
177+
| `A{2,4}` | `ABCDEFG` | `ABCD` | 2 to 4 uppercase letters |
178+
| `C+-0+` | `ABC-123` | `ABC-123` | Letters then digits |
179+
180+
## International Support
181+
182+
The formatter works with Unicode characters and international text:
183+
184+
```php
185+
$formatter = new PatternFormatter('\\U##');
186+
187+
echo $formatter->format('ñáçé');
188+
// Outputs: "ÑÁ"
189+
190+
$formatter = new PatternFormatter('CC');
191+
192+
echo $formatter->format('ñáç123');
193+
// Outputs: "ñá"
194+
```
195+
196+
## Edge Cases
197+
198+
| Pattern | Input | Output | Reason |
199+
| -------- | ---------- | ------- | -------------------------------------------- |
200+
| `###` | `ab` | `ab` | Pattern longer than input uses all available |
201+
| `####` | `abcdefgh` | `abcd` | Input longer than pattern truncates |
202+
| `C0`     | `ABC123`   | `A1`    | Non-matching characters are skipped          |
203+
| `AAA` | `123` | (empty) | No matching characters found |
204+
| `\E####` | `abc🙂`    | `abc🙂` | Transformation with no active state          |
