Commit b8ac99a

Implement PatternFormatter for pattern-based string transformation

The PatternFormatter enables advanced pattern-based string transformation using filtering patterns and transformation directives. It supports digit, letter, word, and any-character filters, repetition quantifiers, and case transformation directives.

Assisted-by: Opencode (GLM-4.6)

Parent: 6a6cd43

4 files changed, 924 additions, 3 deletions

README.md

Lines changed: 4 additions & 3 deletions
@@ -16,9 +16,10 @@ composer require respect/string-formatter
 
 ## Formatters
 
-| Formatter                              | Description                                     |
-| -------------------------------------- | ----------------------------------------------- |
-| [MaskFormatter](docs/MaskFormatter.md) | Range-based string masking with Unicode support |
+| Formatter                                    | Description                                      |
+| -------------------------------------------- | ------------------------------------------------ |
+| [MaskFormatter](docs/MaskFormatter.md)       | Range-based string masking with Unicode support  |
+| [PatternFormatter](docs/PatternFormatter.md) | Pattern-based string filtering with placeholders |
 
 ## Contributing
 
docs/PatternFormatter.md

Lines changed: 201 additions & 0 deletions
@@ -0,0 +1,201 @@
# PatternFormatter

The `PatternFormatter` enables advanced pattern-based string transformation using filtering patterns and transformation directives.

## Usage

### Basic Filtering

```php
use Respect\StringFormatter\PatternFormatter;

$formatter = new PatternFormatter('000-0000');

echo $formatter->format('1234567890');
// Outputs: "123-4567"
```
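
The filtering walk in this example can be sketched in plain PHP. This is a standalone illustration, not the library's code: `format_basic()` is a hypothetical helper that reuses the filter regexes from `src/PatternFormatter.php` but ignores escapes, case directives, and quantifiers, and it splits strings with `preg_split('//u', ...)` instead of `mb_str_split()` so it runs without mbstring.

```php
<?php

declare(strict_types=1);

// Filter regexes copied from the FILTERS constant in src/PatternFormatter.php.
const BASIC_FILTERS = [
    '#' => '/^.$/u',
    '0' => '/^[0-9]$/',
    'A' => '/^[A-Z]$/',
    'a' => '/^[a-z]$/',
    'C' => '/^\p{L}$/u',
    'W' => '/^[\p{L}\p{N}]$/u',
];

// Hypothetical helper: filters and literals only (no escapes, case
// directives, or quantifiers). preg_split('//u') splits into code points.
function format_basic(string $pattern, string $input): string
{
    $chars = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $tokens = preg_split('//u', $pattern, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $index = 0;
    $output = '';

    foreach ($tokens as $token) {
        if (!array_key_exists($token, BASIC_FILTERS)) {
            $output .= $token; // literal: emitted as-is, consumes no input
            continue;
        }

        // Discard input characters until one satisfies the filter.
        while ($index < count($chars) && preg_match(BASIC_FILTERS[$token], $chars[$index]) !== 1) {
            $index++;
        }

        if ($index < count($chars)) {
            $output .= $chars[$index++];
        }
    }

    return $output;
}

echo format_basic('000-0000', '1234567890'), PHP_EOL; // 123-4567
```

Each filter discards input until it finds a match, while literals such as `-` are emitted without consuming input, which is why `000-0000` yields `123-4567`.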

### Case Transformations

```php
use Respect\StringFormatter\PatternFormatter;

// \L lowercases the first three characters, then \U uppercases the rest
$formatter = new PatternFormatter('\\L###\\U###');

echo $formatter->format('ABCdef');
// Outputs: "abcDEF"
```

### Repetition Quantifiers

```php
use Respect\StringFormatter\PatternFormatter;

// Match all digits
$formatter = new PatternFormatter('0{1,}');

echo $formatter->format('abc123def456');
// Outputs: "123456"

// Match up to 5 characters
$formatter = new PatternFormatter('#{,5}');

echo $formatter->format('abcdefghij');
// Outputs: "abcde"
```

## API

### `PatternFormatter::__construct`

- `__construct(string $pattern)`

Creates a new formatter instance with the specified pattern.

**Parameters:**

- `$pattern`: The pattern string defining the filtering and transformation rules

**Throws:** `InvalidFormatterException` when the pattern is empty

### `format`

- `format(string $input): string`

Formats the input string according to the pattern rules, applying filters and transformations.

**Parameters:**

- `$input`: The string to format

**Returns:** The formatted string with transformations applied

## Pattern Syntax

### Filtering Patterns

| Pattern | Description |
| ------- | ------------------------------------------------------ |
| `#` | Any single character except a line break |
| `0` | ASCII digits only (0-9) |
| `A` | ASCII uppercase letters only (A-Z) |
| `a` | ASCII lowercase letters only (a-z) |
| `C` | Unicode letters (`\p{L}`) |
| `W` | Unicode letters and numbers (`\p{L}`, `\p{N}`); no `_` |
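
A small sketch of how a single character is tested against these filters, assuming the regexes from the `FILTERS` constant in `src/PatternFormatter.php`; `matches_filter()` is a hypothetical stand-in for the private `matches()` method. It makes the ASCII-only behavior of `0`, `A`, and `a` visible.

```php
<?php

declare(strict_types=1);

// Same filter => regex map as the FILTERS constant in src/PatternFormatter.php.
const FILTER_REGEXES = [
    '#' => '/^.$/u',
    '0' => '/^[0-9]$/',
    'A' => '/^[A-Z]$/',
    'a' => '/^[a-z]$/',
    'C' => '/^\p{L}$/u',
    'W' => '/^[\p{L}\p{N}]$/u',
];

// Returns true when one character satisfies the given filter.
function matches_filter(string $filter, string $char): bool
{
    return preg_match(FILTER_REGEXES[$filter], $char) === 1;
}

var_dump(matches_filter('A', 'É'));  // bool(false): A-Z is ASCII-only
var_dump(matches_filter('C', 'É'));  // bool(true): \p{L} covers all scripts
var_dump(matches_filter('W', '٣'));  // bool(true): \p{N} includes non-ASCII digits
var_dump(matches_filter('W', '_'));  // bool(false): unlike regex \w, no underscore
var_dump(matches_filter('#', "\n")); // bool(false): "." skips line breaks without /s
```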

### Repetition Quantifiers

Filters can be followed by a repetition quantifier to match multiple characters:

| Pattern | Description |
| ---------- | ---------------------------------------------------- |
| `X{n}` | Match filter `X` exactly `n` times |
| `X{n,m}` | Match filter `X` at least `n` and at most `m` times |
| `X{n,}` | Match filter `X` at least `n` times (no upper limit) |
| `X{,m}` | Match filter `X` at most `m` times (zero minimum) |
| `X{,}` | Match filter `X` any number of times (no limits) |

Where `X` is any filtering pattern (`#`, `0`, `A`, `a`, `C`, `W`). The minimum is not enforced: when fewer characters match, the available matches are returned.

**Examples:**

| Pattern | Description |
| ---------- | ---------------------------------------- |
| `C{3}` | Exactly 3 letters |
| `A{5,10}` | 5 to 10 uppercase letters |
| `0{1,}` | 1 or more digits |
| `#{,5}` | Up to 5 characters |
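
Quantifier parsing can be sketched as follows. `parse_quantifier()` is a hypothetical function modeled on the private `parseRepetition()` method; it returns `[min, max, consumed]`, with `null` standing for an unbounded maximum, and it uses byte offsets, which assumes an ASCII pattern.

```php
<?php

declare(strict_types=1);

// Sketch modeled on PatternFormatter::parseRepetition(). Returns
// [min, max|null, consumed] or null when no quantifier starts at $offset.
// Assumption: $offset is a byte offset (the class uses mb_substr() with
// character offsets), which is equivalent for ASCII patterns.
function parse_quantifier(string $pattern, int $offset): ?array
{
    $rest = substr($pattern, $offset);

    // {n}: exactly n
    if (preg_match('/^\{(\d+)\}/', $rest, $m) === 1) {
        return [(int) $m[1], (int) $m[1], strlen($m[0])];
    }

    // {n,m}, {n,}, {,m}, {,}
    if (preg_match('/^\{(\d*),(\d*)\}/', $rest, $m) !== 1) {
        return null;
    }

    return [
        $m[1] === '' ? 0 : (int) $m[1],    // missing minimum defaults to 0
        $m[2] === '' ? null : (int) $m[2], // missing maximum means unbounded
        strlen($m[0]),
    ];
}

echo json_encode(parse_quantifier('A{5,10}', 1)), PHP_EOL; // [5,10,6]
echo json_encode(parse_quantifier('0{1,}', 1)), PHP_EOL;   // [1,null,4]
```

A brace group that fits none of these shapes, such as `{}` or `{x}`, yields `null`, so the braces are then treated as literals.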

### Transformation Patterns

| Pattern | Description |
| ------- | -------------------------------------------------- |
| `\d` | Delete (skip) the next input character |
| `\l` | Lowercase the next matched character |
| `\L` | Lowercase until `\E` or another case directive |
| `\u` | Uppercase the next matched character |
| `\U` | Uppercase until `\E` or another case directive |
| `\i` | Invert the case of the next matched character |
| `\I` | Invert case until `\E` or another case directive |
| `\E` | End the active transformation |
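
The interplay between single-use and persistent directives can be sketched as a small state machine. `apply_case_directives()` is hypothetical and deliberately simplified: it only understands the `#` filter and uses ASCII `strtolower()`/`strtoupper()` where the class uses `mb_strtolower()`/`mb_strtoupper()`.

```php
<?php

declare(strict_types=1);

// Sketch of the case-directive state machine. Simplifications (assumptions):
// only the `#` filter is supported and case mapping is ASCII-only via
// strtolower()/strtoupper(), whereas the class uses the mb_* functions.
function apply_case_directives(string $pattern, string $input): string
{
    $modes = ['l' => 'lower', 'L' => 'LOWER', 'u' => 'upper', 'U' => 'UPPER', 'i' => 'invert', 'I' => 'INVERT'];
    $mode = null; // lowercase name = single use, uppercase name = persistent
    $index = 0;
    $output = '';

    for ($p = 0; $p < strlen($pattern); $p++) {
        $c = $pattern[$p];

        if ($c === '\\' && $p + 1 < strlen($pattern)) {
            $next = $pattern[++$p];
            if ($next === 'd') {
                $index++;              // \d skips one input character
            } elseif ($next === 'E') {
                $mode = null;          // \E ends any active transformation
            } elseif (isset($modes[$next])) {
                $mode = $modes[$next]; // replaces whatever mode was active
            } else {
                $output .= $next;      // other escapes are literals
            }
            continue;
        }

        if ($c !== '#') {
            $output .= $c;             // literals are never transformed
            continue;
        }

        if ($index >= strlen($input)) {
            continue;                  // input exhausted
        }

        $ch = $input[$index++];
        $output .= match (strtolower($mode ?? '')) {
            'lower' => strtolower($ch),
            'upper' => strtoupper($ch),
            'invert' => strtolower($ch) === $ch ? strtoupper($ch) : strtolower($ch),
            default => $ch,
        };

        if ($mode !== null && $mode === strtolower($mode)) {
            $mode = null;              // single-use directives expire here
        }
    }

    return $output;
}

echo apply_case_directives('\L###\U###', 'ABCdef'), PHP_EOL; // abcDEF
```

Because `\l` expires after one character, `\l###\U###` leaves `abCDEF` unchanged; `\L` is needed to lowercase all three leading characters.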
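
### Escape Sequences

A backslash followed by any character that is not a transformation directive (`d`, `l`, `L`, `u`, `U`, `i`, `I`, `E`) outputs that character literally, without consuming input.

| Pattern | Description | Effect |
| ------- | --------------------- | ---------------------------------- |
| `\#` | Literal `#` character | Outputs `#` instead of filtering |
| `\0` | Literal `0` character | Outputs `0` instead of filtering |
| `\A` | Literal `A` character | Outputs `A` instead of filtering |
| `\@` | Literal `@` character | Outputs `@` (escaping is optional) |

### Literal Characters

Any character not defined as a pattern (`A`, `a`, `0`, `#`, `C`, `W`, `\`) is treated as a literal and appears in the output as-is, without consuming input. Literals are not affected by case transformations, and a valid quantifier such as `{3}` directly after a filter is read as a repetition rather than as literals.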
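
To illustrate how escapes, directives, filters, and literals are told apart, here is a hypothetical tokenizer sketch. It is not part of the library (the class interprets the pattern in a single pass inside `format()`), and it works on bytes, so it assumes an ASCII pattern.

```php
<?php

declare(strict_types=1);

// Hypothetical tokenizer showing how a pattern is read left to right.
// Each token is "kind:text"; byte-based, so it assumes an ASCII pattern.
function tokenize_pattern(string $pattern): array
{
    $tokens = [];
    $length = strlen($pattern);

    for ($i = 0; $i < $length; $i++) {
        $c = $pattern[$i];

        if ($c === '\\' && $i + 1 < $length) {
            $next = $pattern[++$i];
            // d, l, L, u, U, i, I and E are directives; any other escaped character is a literal.
            $tokens[] = (str_contains('dlLuUiIE', $next) ? 'directive:' : 'literal:') . $next;
            continue;
        }

        if (str_contains('#0AaCW', $c)) {
            // A filter may carry a quantifier: {n}, {n,m}, {n,}, {,m} or {,}.
            $quantifier = preg_match('/^\{(\d+|\d*,\d*)\}/', substr($pattern, $i + 1), $m) === 1 ? $m[0] : '';
            $tokens[] = 'filter:' . $c . $quantifier;
            $i += strlen($quantifier);
            continue;
        }

        $tokens[] = 'literal:' . $c; // includes a trailing lone backslash
    }

    return $tokens;
}

echo implode(' ', tokenize_pattern('\U0{3}-\#')), PHP_EOL;
// directive:U filter:0{3} literal:- literal:#
```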

## Behavior

### Filtering Patterns

- **Skip non-matching characters**: Input characters that don't match the current filter are discarded
- **Keep matching characters as-is**: A matching character passes through unchanged unless a transformation is active
- **Consume from input**: A filter advances the input pointer past every character it examines, whether skipped or emitted

### Transformation Patterns

- **Stateful transformations**: `\L`, `\U`, `\I` persist until `\E` or until another case directive replaces them
- **Single-character transformations**: `\l`, `\u`, `\i` affect only the next character emitted by a filter; `\d` skips the next input character
- **End of transformations**: `\E` clears any active transformation state
- **Filters only**: Transformations apply to characters emitted by filters; literals are output unchanged
- **Unicode aware**: Case changes use `mb_strtolower()` and `mb_strtoupper()`, so they work with international characters

### Repetition Quantifiers

- **Greedy matching**: Repetitions consume as many matching characters as possible up to the maximum; without a maximum (`{n,}` or `{,}`), they read to the end of the input, so later filters receive nothing
- **Non-matching characters skipped**: Characters that don't match the filter are skipped over
- **Works with transformations**: Repetitions can be combined with case transformations
- **Partial matches allowed**: The minimum is not enforced; if fewer characters match than the minimum, all available matches are returned
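
The repetition rules above can be sketched with a single loop that mirrors the one in `PatternFormatter::format()`. `take_repeated()` is a hypothetical helper; it shows both the skipping of non-matching input and why an unbounded quantifier leaves nothing for the filters that follow it.

```php
<?php

declare(strict_types=1);

// Sketch of how one filter with a quantifier consumes input, mirroring the
// loop in PatternFormatter::format(): non-matching characters are skipped,
// matches are taken until $max is reached ($max === null: until input ends).
// The quantifier's minimum is never checked.
function take_repeated(string $regex, array $chars, int &$index, ?int $max): string
{
    $taken = '';
    $count = 0;

    while (($max === null || $count < $max) && $index < count($chars)) {
        $char = $chars[$index++];
        if (preg_match($regex, $char) === 1) {
            $taken .= $char;
            $count++;
        }
    }

    return $taken;
}

$chars = str_split('ABC-123'); // ASCII input, so byte splitting is safe
$index = 0;

// C{3,}: unbounded, so it reads to the end of the input and skips "-123"
echo take_repeated('/^\p{L}$/u', $chars, $index, null), PHP_EOL; // ABC
// 0{3,}: nothing is left for the next filter
var_dump(take_repeated('/^[0-9]$/', $chars, $index, null));       // string(0) ""
```

With bounded quantifiers (`C{3}` then `0{3}`), the first call stops after three letters and the second still finds `123`.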

## Examples

| Pattern | Input | Output | Description |
| ---------------- | ------------ | ---------------- | ----------------------------- |
| `000-0000` | `1234567` | `123-4567` | Phone number formatting |
| `AAA-000` | `ABC123` | `ABC-123` | License plate format |
| `\U###` | `abc` | `ABC` | Uppercase until reset |
| `\L####` | `ABC1` | `abc1` | Lowercase until reset |
| `\l#\u#` | `Ab` | `aB` | Single-character case changes |
| `\I####` | `AbCd` | `aBcD` | Case inversion until reset |
| `CC00WW` | `AB123D` | `AB123D` | International postal code |
| `(000) 000-0000` | `1234567890` | `(123) 456-7890` | US phone format |
| `000-00-0000` | `123456789` | `123-45-6789` | SSN format |
| `\L##\E##` | `ABCD` | `abCD` | Transformation reset |
| `##\d##` | `ABCDE` | `ABDE` | Deleting a character |
| `0{1,}` | `a1b2c3` | `123` | All digits (unbounded) |
| `#{,5}` | `abcdefgh` | `abcde` | Up to 5 characters |
| `A{2,4}` | `ABCDEFG` | `ABCD` | 2 to 4 uppercase letters |
| `C{3}-0{3}` | `ABC-123` | `ABC-123` | 3 letters, then 3 digits |

## International Support

Input is split into characters with `mb_str_split()`, and case transformations use `mb_strtoupper()`/`mb_strtolower()`, so the formatter handles multi-byte Unicode text:

```php
use Respect\StringFormatter\PatternFormatter;

$formatter = new PatternFormatter('\\U##');

echo $formatter->format('ñáçé');
// Outputs: "ÑÁ"

$formatter = new PatternFormatter('CC');

echo $formatter->format('ñáç123');
// Outputs: "ñá"
```
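
The reason the formatter splits input per character rather than per byte can be shown without the library. This sketch uses `preg_split('//u', ...)` as an mbstring-free stand-in for `mb_str_split()`.

```php
<?php

declare(strict_types=1);

// Why input is split per character (the class uses mb_str_split()):
// str_split() works on bytes and cuts multi-byte UTF-8 characters apart.
// preg_split('//u') is used here as an mbstring-free way to split into code points.
$input = 'ñáç123';

$bytes = str_split($input);
$codePoints = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY);

echo count($bytes), PHP_EOL;      // 9: ñ, á and ç are two bytes each in UTF-8
echo count($codePoints), PHP_EOL; // 6
echo $codePoints[0], PHP_EOL;     // ñ
```

A lone byte such as the first element of `$bytes` is not valid UTF-8, so a `/u` filter like `C` could never match it.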

## Edge Cases

| Pattern | Input | Output | Reason |
| ------- | ---------- | ------- | ------------------------------------------------------------ |
| `###` | `ab` | `ab` | Pattern longer than input: all available characters are used |
| `####` | `abcdefgh` | `abcd` | Input longer than pattern is truncated |
| `C0` | `ABC123` | `A1` | Non-matching characters are skipped |
| `AAA` | `123` | (empty) | No matching characters found |
| `\E###` | `abc🙂` | `abc` | `\E` without an active transformation has no effect |

src/PatternFormatter.php

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
<?php

declare(strict_types=1);

namespace Respect\StringFormatter;

use function array_key_exists;
use function count;
use function implode;
use function lcfirst;
use function mb_str_split;
use function mb_strlen;
use function mb_strtolower;
use function mb_strtoupper;
use function mb_substr;
use function preg_match;

final readonly class PatternFormatter implements Formatter
{
    // Filter character => regex that a single input character must match
    private const array FILTERS = [
        '#' => '/^.$/u',
        '0' => '/^[0-9]$/',
        'A' => '/^[A-Z]$/',
        'a' => '/^[a-z]$/',
        'C' => '/^\p{L}$/u',
        'W' => '/^[\p{L}\p{N}]$/u',
    ];

    // Directive letter after a backslash => transformation name.
    // 'delete' and 'reset' act immediately; other lowercase names apply to the
    // next emitted character, uppercase names persist until \E or another directive.
    private const array TRANSFORMATIONS = [
        'd' => 'delete',
        'l' => 'lower',
        'L' => 'LOWER',
        'u' => 'upper',
        'U' => 'UPPER',
        'i' => 'invert',
        'I' => 'INVERT',
        'E' => 'reset',
    ];

    public function __construct(
        private string $pattern,
    ) {
        if ($this->pattern === '') {
            throw new InvalidFormatterException('Pattern cannot be empty');
        }
    }

    public function format(string $input): string
    {
        $chars = mb_str_split($input);
        $charIndex = 0;
        $output = [];
        $transform = null;
        $patternLength = mb_strlen($this->pattern);

        for ($i = 0; $i < $patternLength; $i++) {
            $char = mb_substr($this->pattern, $i, 1);

            // Handle escape sequences
            if ($char === '\\' && $i + 1 < $patternLength) {
                $next = mb_substr($this->pattern, $i + 1, 1);

                if (array_key_exists($next, self::TRANSFORMATIONS)) {
                    $type = self::TRANSFORMATIONS[$next];
                    if ($type === 'delete') {
                        $charIndex++; // \d skips the next input character
                    } elseif ($type === 'reset') {
                        $transform = null;
                    } else {
                        $transform = $type;
                    }

                    $i++;
                    continue;
                }

                // Escaped literal character
                $output[] = $next;
                $i++;
                continue;
            }

            // Handle filter patterns
            if (array_key_exists($char, self::FILTERS)) {
                $repetition = $this->parseRepetition($i + 1);
                if ($repetition !== null) {
                    [, $max, $consumed] = $repetition;
                    $i += $consumed;
                } else {
                    $max = 1;
                }

                // Skip non-matching input; emit matches until $max is reached (null = unbounded)
                $count = 0;
                while (($max === null || $count < $max) && $charIndex < count($chars)) {
                    if (!$this->matches($char, $chars[$charIndex])) {
                        $charIndex++;
                        continue;
                    }

                    $output[] = $this->applyTransform($chars[$charIndex++], $transform);
                    $count++;

                    // Persistent transformations (LOWER, UPPER, INVERT) start with an uppercase letter
                    if ($transform === null || $transform !== lcfirst($transform)) {
                        continue;
                    }

                    $transform = null; // Clear single-use (lowercase) transformations
                }

                continue;
            }

            // Literal character
            $output[] = $char;
        }

        return implode('', $output);
    }

    /**
     * Parses a repetition quantifier {n}, {min,}, {,max}, {min,max}, or {,} starting at the given position.
     *
     * @return array{int, int|null, int}|null Returns [min, max, consumed chars] or null if no valid quantifier
     */
    private function parseRepetition(int $position): array|null
    {
        $remaining = mb_substr($this->pattern, $position);

        // Match {n} for exact count
        if (preg_match('/^\{(\d+)\}/', $remaining, $matches) === 1) {
            $count = (int) $matches[1];

            return [$count, $count, mb_strlen($matches[0])];
        }

        // Match range quantifiers with comma separator
        if (preg_match('/^\{(\d*),(\d*)\}/', $remaining, $matches) !== 1) {
            return null;
        }

        $min = $matches[1] === '' ? 0 : (int) $matches[1];
        $max = $matches[2] === '' ? null : (int) $matches[2];

        return [$min, $max, mb_strlen($matches[0])];
    }

    private function matches(string $filter, string $char): bool
    {
        return preg_match(self::FILTERS[$filter], $char) === 1;
    }

    private function applyTransform(string $char, string|null $transform): string
    {
        return match ($transform) {
            'lower', 'LOWER' => mb_strtolower($char),
            'upper', 'UPPER' => mb_strtoupper($char),
            'invert', 'INVERT' => mb_strtolower($char) === $char ? mb_strtoupper($char) : mb_strtolower($char),
            default => $char,
        };
    }
}
