
Commit 74b4065

Implement PatternFormatter for pattern-based string transformation
The PatternFormatter enables advanced pattern-based string transformation using filtering patterns and transformation directives. It supports digit, letter, and character filters along with case transformation operations.

Assisted-by: Opencode (GLM-4.6)
1 parent 6a6cd43 commit 74b4065

4 files changed

Lines changed: 957 additions & 3 deletions


README.md

Lines changed: 4 additions & 3 deletions
```diff
@@ -16,9 +16,10 @@ composer require respect/string-formatter
 
 ## Formatters
 
-| Formatter                              | Description                                     |
-| -------------------------------------- | ----------------------------------------------- |
-| [MaskFormatter](docs/MaskFormatter.md) | Range-based string masking with Unicode support |
+| Formatter                                    | Description                                      |
+| -------------------------------------------- | ------------------------------------------------ |
+| [MaskFormatter](docs/MaskFormatter.md)       | Range-based string masking with Unicode support  |
+| [PatternFormatter](docs/PatternFormatter.md) | Pattern-based string filtering with placeholders |
 
 ## Contributing
```

docs/PatternFormatter.md

Lines changed: 204 additions & 0 deletions
# PatternFormatter

The `PatternFormatter` formats a string by walking a pattern of filters, literal characters, and case-transformation directives, pulling matching characters from the input as it goes.

## Usage

### Basic Filtering

```php
use Respect\StringFormatter\PatternFormatter;

$formatter = new PatternFormatter('000-0000');

echo $formatter->format('1234567890');
// Outputs: "123-4567"
```

### Case Transformations

```php
use Respect\StringFormatter\PatternFormatter;

// \L lowercases the first three matches; \U then uppercases the rest.
$formatter = new PatternFormatter('\\L###\\U###');

echo $formatter->format('abCDEF');
// Outputs: "abcDEF"
```

### Repetition Quantifiers

```php
use Respect\StringFormatter\PatternFormatter;

// Match all digits (one or more)
$formatter = new PatternFormatter('0+');

echo $formatter->format('abc123def456');
// Outputs: "123456"

// Match all characters (zero or more)
$formatter = new PatternFormatter('#*');

echo $formatter->format('hello');
// Outputs: "hello"
```

## API

### `PatternFormatter::__construct`

- `__construct(string $pattern)`

Creates a new formatter instance with the specified pattern.

**Parameters:**

- `$pattern`: The pattern string defining transformation rules

**Throws:** `InvalidFormatterException` when `$pattern` is an empty string. No other validation is performed; for example, a malformed quantifier such as `{x}` is treated as literal text.

### `PatternFormatter::format`

- `format(string $input): string`

Formats the input string according to the pattern, applying filters and transformations.

**Parameters:**

- `$input`: The string to format

**Returns:** The formatted string with transformations applied

## Pattern Syntax

### Filtering Patterns

| Pattern | Matches                                                 |
| ------- | ------------------------------------------------------- |
| `#`     | Any single character                                    |
| `0`     | ASCII digits (`0`-`9`) only                             |
| `A`     | ASCII uppercase letters (`A`-`Z`) only                  |
| `a`     | ASCII lowercase letters (`a`-`z`) only                  |
| `C`     | Any Unicode letter (`\p{L}`)                            |
| `W`     | Any Unicode letter or number (`\p{L}`, `\p{N}`); no `_` |

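Each filter tests a single input code point against a regular expression. As a standalone sketch (the `keepMatching()` helper is hypothetical, and the regexes are copied from the `FILTERS` constant in `src/PatternFormatter.php`, leaving out `#`), the following shows which characters of a mixed string each class keeps. Note that `0`, `A`, and `a` are ASCII-only, while `C` and `W` accept any Unicode letter:

```php
<?php

declare(strict_types=1);

// Regexes copied from PatternFormatter::FILTERS ('#' omitted for brevity).
const FILTERS = [
    '0' => '/^[0-9]$/',
    'A' => '/^[A-Z]$/',
    'a' => '/^[a-z]$/',
    'C' => '/^\p{L}$/u',
    'W' => '/^[\p{L}\p{N}]$/u',
];

// Hypothetical helper: keep only the code points of $input that pass filter $name.
function keepMatching(string $name, string $input): string
{
    $kept = array_filter(
        mb_str_split($input),
        static fn (string $char): bool => preg_match(FILTERS[$name], $char) === 1,
    );

    return implode('', $kept);
}

// PHP turns the key '0' into the integer 0, hence the (string) cast.
foreach (array_keys(FILTERS) as $name) {
    printf("%s => \"%s\"\n", $name, keepMatching((string) $name, 'ÄB c3 ß'));
}
// 0 => "3"
// A => "B"
// a => "c"
// C => "ÄBcß"
// W => "ÄBc3ß"
```

The ASCII-only classes silently skip accented letters such as `Ä` and `ß`, so `C` is the safer choice for names or other non-English text.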
### Repetition Quantifiers

Filters can be followed by a repetition quantifier to match multiple characters:

| Pattern  | Description                                          |
| -------- | ---------------------------------------------------- |
| `X+`     | Match filter `X` one or more times                   |
| `X*`     | Match filter `X` zero or more times                  |
| `X{n}`   | Match filter `X` exactly `n` times                   |
| `X{n,m}` | Match filter `X` at least `n` and at most `m` times  |
| `X{n,}`  | Match filter `X` at least `n` times (no upper limit) |
| `X{,m}`  | Match filter `X` at most `m` times (zero minimum)    |

Where `X` is any filtering pattern (`#`, `0`, `A`, `a`, `C`, `W`). Quantifiers are only recognized immediately after a filter; anywhere else, `+`, `*`, `{`, and `}` are literal characters.

**Examples:**

| Pattern   | Description                    |
| --------- | ------------------------------ |
| `0+`      | One or more digits             |
| `C*`      | Zero or more letters           |
| `C{3}`    | Exactly 3 letters              |
| `A{5,10}` | 5 to 10 uppercase letters      |
| `#{,5}`   | Up to 5 characters of any kind |

### Transformation Patterns

| Pattern | Description                                      |
| ------- | ------------------------------------------------ |
| `\d`    | Skip (delete) the next input character           |
| `\l`    | Lowercase the next matched character             |
| `\L`    | Lowercase matched characters until `\E`          |
| `\u`    | Uppercase the next matched character             |
| `\U`    | Uppercase matched characters until `\E`          |
| `\i`    | Invert the case of the next matched character    |
| `\I`    | Invert the case of matched characters until `\E` |
| `\E`    | End the active transformation                    |

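### Escape Sequences

A backslash followed by any character that is not a transformation directive outputs that character literally, without reading from the input.

| Pattern | Output                                   |
| ------- | ---------------------------------------- |
| `\#`    | A literal `#`                            |
| `\0`    | A literal `0`                            |
| `\A`    | A literal `A`                            |
| `\\`    | A literal backslash                      |
| `\@`    | A literal `@` (same as an unescaped `@`) |

Escaping is only required for filter characters and the backslash; a backslash at the very end of the pattern is output as-is. In PHP source, prefer single-quoted strings for patterns: inside double quotes, PHP itself interprets sequences such as `"\0"` (a NUL byte) before the formatter sees them.
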
### Literal Characters

Any character that is not a filter (`#`, `0`, `A`, `a`, `C`, `W`) or a backslash is a literal: it is copied to the output as-is, consumes no input, and is not affected by case transformations.

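Putting the syntax rules together, a pattern is read left to right as filters (each with an optional quantifier), directives, and literals. The standalone sketch below illustrates that reading; `parseQuantifier()`, `tokenize()`, and the `kind:value` token strings are hypothetical names for this example, with the quantifier rules following `parseRepetition()` and the escape rules following `format()` in `src/PatternFormatter.php`:

```php
<?php

declare(strict_types=1);

// Hypothetical helper mirroring the documented quantifier syntax.
// Returns [min, max (null = unbounded), characters consumed] or null.
function parseQuantifier(string $rest): ?array
{
    if (str_starts_with($rest, '+')) {
        return [1, null, 1];
    }
    if (str_starts_with($rest, '*')) {
        return [0, null, 1];
    }
    if (preg_match('/^\{(\d+)\}/', $rest, $m) === 1) {
        return [(int) $m[1], (int) $m[1], strlen($m[0])];
    }
    if (preg_match('/^\{(\d+),(\d*)\}/', $rest, $m) === 1) {
        return [(int) $m[1], $m[2] === '' ? null : (int) $m[2], strlen($m[0])];
    }
    if (preg_match('/^\{,(\d+)\}/', $rest, $m) === 1) {
        return [0, (int) $m[1], strlen($m[0])];
    }

    return null;
}

// Hypothetical tokenizer: labels each pattern position as a filter (with its
// quantifier bounds), a directive, or a literal. Token strings are illustrative.
function tokenize(string $pattern): array
{
    $filters = ['#', '0', 'A', 'a', 'C', 'W'];
    $directives = ['d', 'l', 'L', 'u', 'U', 'i', 'I', 'E'];
    $chars = mb_str_split($pattern);
    $tokens = [];

    for ($i = 0, $n = count($chars); $i < $n; $i++) {
        $char = $chars[$i];

        // Backslash + directive letter, or backslash + escaped literal.
        if ($char === '\\' && $i + 1 < $n) {
            $next = $chars[++$i];
            $tokens[] = in_array($next, $directives, true) ? "directive:$next" : "literal:$next";
            continue;
        }

        // Filters may be followed by a quantifier; the default is exactly one.
        if (in_array($char, $filters, true)) {
            [$min, $max, $length] = parseQuantifier(implode('', array_slice($chars, $i + 1))) ?? [1, 1, 0];
            $i += $length;
            $tokens[] = sprintf('filter:%s{%d,%s}', $char, $min, $max ?? '');
            continue;
        }

        // Everything else (including a trailing lone backslash) is literal.
        $tokens[] = "literal:$char";
    }

    return $tokens;
}

print_r(tokenize('\U(000)\E C+-a{2,}\#'));
```

With this input, `print_r` lists `directive:U`, the literal `(`, three `filter:0{1,1}` tokens, and so on. A malformed quantifier such as the one in `0{x}` simply yields literal `{`, `x`, and `}` tokens.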
## Behavior

### Filtering Patterns

- **Remove non-matching characters**: Input characters that don't match the filter are skipped
- **Keep matching characters as-is**: Matching characters pass through unchanged unless a transformation is active
- **Consume from input**: A filter advances past every character it inspects, matched or skipped, so later pattern positions never see skipped characters

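### Transformation Patterns

- **Stateful transformations**: `\L`, `\U`, `\I` persist until `\E` or until another case directive replaces them
- **Single-character transformations**: `\l`, `\u`, `\i` affect only the next character produced by a filter
- **Deletion**: `\d` consumes exactly one input character, whether or not it would match the next filter, and outputs nothing
- **End of transformations**: `\E` clears any active transformation state
- **Filter output only**: Literals and escaped literals are never transformed
- **Unicode aware**: Case changes use `mb_strtolower()` and `mb_strtoupper()`, so non-ASCII letters are handled
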
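The difference between single-use and persistent directives can be reproduced without the library. The sketch below copies the `match` expression from `applyTransform()` in `src/PatternFormatter.php`; `transformRun()` is a hypothetical driver that applies one state to a run of characters and, like `format()`, drops a single-use state after its first character (it tests this with `strtolower()` instead of the source's `lcfirst()` comparison, which gives the same result for these state names). The last line shows a consequence of the full case mapping `mb_strtoupper()` has used since PHP 7.3: uppercasing `ß` produces two characters, `SS`.

```php
<?php

declare(strict_types=1);

// Copied from PatternFormatter::applyTransform(): null means "no change".
function applyTransform(string $char, ?string $transform): string
{
    return match ($transform) {
        'lower', 'LOWER' => mb_strtolower($char),
        'upper', 'UPPER' => mb_strtoupper($char),
        'invert', 'INVERT' => mb_strtolower($char) === $char ? mb_strtoupper($char) : mb_strtolower($char),
        default => $char,
    };
}

// Hypothetical driver: transform each character of $input in turn, clearing
// single-use states ('lower', 'upper', 'invert') after the first character.
function transformRun(string $input, ?string $transform): string
{
    $output = '';

    foreach (mb_str_split($input) as $char) {
        $output .= applyTransform($char, $transform);

        if ($transform !== null && $transform === strtolower($transform)) {
            $transform = null; // single-use state consumed
        }
    }

    return $output;
}

echo transformRun('äbc', 'upper'), "\n";   // Äbc  (\u: first character only)
echo transformRun('äbc', 'UPPER'), "\n";   // ÄBC  (\U: until \E)
echo transformRun('ÄbC1', 'INVERT'), "\n"; // äBc1 (digits have no case)
echo transformRun('ß', 'UPPER'), "\n";     // SS   (full case mapping)
```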
### Repetition Quantifiers

- **Greedy matching**: Repetitions consume as many matching characters as possible up to the maximum. `+`, `*`, and `{n,}` have no maximum, so they read to the end of the input and any filter placed after them receives no input
- **Non-matching characters skipped**: Characters that don't match the filter are skipped over
- **Works with transformations**: Persistent directives apply to every repeated match; single-character directives apply only to the first
- **Minimum not enforced**: If fewer characters match than the minimum, all available matches are returned without error, so `X+` and `X*` behave the same

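The consumption rules above can be checked with a reduced copy of the filter loop in `format()`. In this standalone sketch, the hypothetical `consume()` helper takes a filter regex and an optional maximum (the minimum is ignored, as in the implementation) and advances a shared input index. It shows why an unbounded `C+` leaves nothing for a following `0+` on `ABC-123`, while a bounded `C{3}` leaves `-123` for it:

```php
<?php

declare(strict_types=1);

// Hypothetical reduction of the filter loop in PatternFormatter::format():
// read from $index, skip characters that fail $regex, and stop after $max
// matches (null = no maximum) or at the end of the input.
function consume(string $regex, ?int $max, array $chars, int &$index): string
{
    $output = '';
    $count = 0;

    while (($max === null || $count < $max) && $index < count($chars)) {
        $char = $chars[$index++];

        if (preg_match($regex, $char) === 1) {
            $output .= $char;
            $count++;
        }
    }

    return $output;
}

$letter = '/^\p{L}$/u';
$digit = '/^[0-9]$/';
$chars = mb_str_split('ABC-123');

// "C+" then "0+": the unbounded letter filter skips "-123" on its way to the
// end of the input, so the digit filter after it gets nothing.
$index = 0;
$letters = consume($letter, null, $chars, $index);
$digits = consume($digit, null, $chars, $index);
printf("C+ then 0+:   [%s] [%s]\n", $letters, $digits); // [ABC] []

// "C{3}" then "0+": the maximum stops the letter filter after three matches.
$index = 0;
$letters = consume($letter, 3, $chars, $index);
$digits = consume($digit, null, $chars, $index);
printf("C{3} then 0+: [%s] [%s]\n", $letters, $digits); // [ABC] [123]

// "0{3}" on "12": fewer matches than requested is not an error.
$index = 0;
printf("0{3} on 12:   [%s]\n", consume($digit, 3, mb_str_split('12'), $index)); // [12]
```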
## Examples

| Pattern          | Input        | Output           | Description                              |
| ---------------- | ------------ | ---------------- | ---------------------------------------- |
| `000-0000`       | `1234567`    | `123-4567`       | Phone number formatting                  |
| `AAA-000`        | `ABC123`     | `ABC-123`        | License plate format                     |
| `\U###`          | `abc`        | `ABC`            | Uppercase until reset                    |
| `\L####`         | `ABC1`       | `abc1`           | Lowercase until reset                    |
| `\l#\u#`         | `Ab`         | `aB`             | Single-character case changes            |
| `\I####`         | `AbCd`       | `aBcD`           | Case inversion until reset               |
| `CC00WW`         | `AB123D`     | `AB123D`         | Postal-code style mix of letters/digits  |
| `(000) 000-0000` | `1234567890` | `(123) 456-7890` | US phone format                          |
| `000-00-0000`    | `123456789`  | `123-45-6789`    | SSN format                               |
| `\L##\E##`       | `ABCD`       | `abCD`           | Transformation reset                     |
| `##\d##`         | `ABCDE`      | `ABDE`           | Skipping one input character             |
| `0+`             | `a1b2c3`     | `123`            | All digits (one or more)                 |
| `C*`             | `abc123`     | `abc`            | All letters (zero or more)               |
| `#{,5}`          | `abcdefgh`   | `abcde`          | Up to 5 characters                       |
| `A{2,4}`         | `ABCDEFG`    | `ABCD`           | 2 to 4 uppercase letters                 |
| `C{3}-0+`        | `ABC-123`    | `ABC-123`        | Three letters, then all remaining digits |

## International Support

The formatter works with Unicode characters and international text:

```php
use Respect\StringFormatter\PatternFormatter;

// \U persists, so both matched characters are uppercased.
$formatter = new PatternFormatter('\\U##');

echo $formatter->format('ñáçé');
// Outputs: "ÑÁ"

$formatter = new PatternFormatter('CC');

echo $formatter->format('ñáç123');
// Outputs: "ñá"
```

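Unicode support works per code point: the input is split with `mb_str_split()`, so each filter sees one code point, not one user-perceived character. The standalone check below (no library code involved) shows where that matters: a decomposed `é` is two code points, and its combining accent is a mark rather than a letter, so the `C` regex rejects it while the `#` regex accepts it. Normalizing input to NFC beforehand, for example with `Normalizer::normalize()` from the intl extension, is one way to avoid the accent case; this is a suggestion, not something the formatter does.

```php
<?php

declare(strict_types=1);

$precomposed = "\u{00E9}";              // "é" as one code point
$decomposed = "e\u{0301}";              // "e" + COMBINING ACUTE ACCENT
$couple = "\u{1F469}\u{200D}\u{1F468}"; // woman + ZERO WIDTH JOINER + man

$samples = ['precomposed' => $precomposed, 'decomposed' => $decomposed, 'ZWJ sequence' => $couple];

foreach ($samples as $label => $text) {
    // mb_str_split() is what PatternFormatter::format() uses to split input.
    printf("%-12s %d code point(s)\n", $label, count(mb_str_split($text)));
}

// The accent is a nonspacing mark, so the C filter's regex rejects it
// while the # filter's regex accepts it.
var_dump(preg_match('/^\p{L}$/u', "\u{0301}")); // int(0)
var_dump(preg_match('/^.$/u', "\u{0301}"));     // int(1)
```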
## Edge Cases

| Pattern  | Input      | Output  | Reason                                                                  |
| -------- | ---------- | ------- | ----------------------------------------------------------------------- |
| `###`    | `ab`       | `ab`    | Pattern longer than input uses all available characters                 |
| `####`   | `abcdefgh` | `abcd`  | Input longer than pattern is truncated                                  |
| `C0`     | `ABC123`   | `A1`    | Non-matching characters are skipped                                     |
| `AAA`    | `123`      | (empty) | No matching characters found                                            |
| `\E####` | `abc🙂`    | `abc🙂` | `\E` with no active state is a no-op; `#` matches the emoji (one code point) |

src/PatternFormatter.php

Lines changed: 176 additions & 0 deletions
```php
<?php

declare(strict_types=1);

namespace Respect\StringFormatter;

use function array_key_exists;
use function count;
use function implode;
use function lcfirst;
use function mb_str_split;
use function mb_strlen;
use function mb_strtolower;
use function mb_strtoupper;
use function mb_substr;
use function preg_match;

final readonly class PatternFormatter implements Formatter
{
    // Filter character => regex that a single input code point must match
    private const array FILTERS = [
        '#' => '/^.$/su', // "s" lets "#" accept line breaks too, matching the docs
        '0' => '/^[0-9]$/',
        'A' => '/^[A-Z]$/',
        'a' => '/^[a-z]$/',
        'C' => '/^\p{L}$/u',
        'W' => '/^[\p{L}\p{N}]$/u',
    ];

    // Directive letter => state; lowercase states are single-use, uppercase ones persist until \E
    private const array TRANSFORMATIONS = [
        'd' => 'delete',
        'l' => 'lower',
        'L' => 'LOWER',
        'u' => 'upper',
        'U' => 'UPPER',
        'i' => 'invert',
        'I' => 'INVERT',
        'E' => 'reset',
    ];

    public function __construct(
        private string $pattern,
    ) {
        if ($this->pattern === '') {
            throw new InvalidFormatterException('Pattern cannot be empty');
        }
    }

    public function format(string $input): string
    {
        $chars = mb_str_split($input);
        $charIndex = 0;
        $output = [];
        $transform = null;
        $patternLength = mb_strlen($this->pattern);

        for ($i = 0; $i < $patternLength; $i++) {
            $char = mb_substr($this->pattern, $i, 1);

            // A backslash starts either a directive or an escaped literal
            if ($char === '\\' && $i + 1 < $patternLength) {
                $next = mb_substr($this->pattern, $i + 1, 1);

                if (array_key_exists($next, self::TRANSFORMATIONS)) {
                    $type = self::TRANSFORMATIONS[$next];
                    if ($type === 'delete') {
                        $charIndex++; // Skip one input character unconditionally
                    } elseif ($type === 'reset') {
                        $transform = null;
                    } else {
                        $transform = $type;
                    }

                    $i++;
                    continue;
                }

                // Escaped literal character: emitted without consuming input
                $output[] = $next;
                $i++;
                continue;
            }

            // Handle filter patterns
            if (array_key_exists($char, self::FILTERS)) {
                $repetition = $this->parseRepetition($i + 1);
                if ($repetition !== null) {
                    // Only the maximum bounds the loop; the minimum is not enforced
                    [, $max, $consumed] = $repetition;
                    $i += $consumed;
                } else {
                    $max = 1;
                }

                $count = 0;
                while (($max === null || $count < $max) && $charIndex < count($chars)) {
                    if (!$this->matches($char, $chars[$charIndex])) {
                        $charIndex++; // Skip non-matching input
                        continue;
                    }

                    $output[] = $this->applyTransform($chars[$charIndex++], $transform);
                    $count++;

                    // Persistent states ('LOWER', 'UPPER', 'INVERT') are changed by lcfirst()
                    if ($transform === null || $transform !== lcfirst($transform)) {
                        continue;
                    }

                    $transform = null; // Clear single-use (lowercase) transformations
                }

                continue;
            }

            // Literal character: emitted as-is and never transformed
            $output[] = $char;
        }

        return implode('', $output);
    }

    /**
     * Parses a repetition quantifier (+, *, {n}, {n,}, {,m}, or {n,m}) starting at the given position.
     *
     * @return array{int, int|null, int}|null Returns [min, max, consumed chars] or null if no valid quantifier
     */
    private function parseRepetition(int $position): array|null
    {
        $remaining = mb_substr($this->pattern, $position);

        // Match + for one or more
        if (mb_substr($remaining, 0, 1) === '+') {
            return [1, null, 1];
        }

        // Match * for zero or more
        if (mb_substr($remaining, 0, 1) === '*') {
            return [0, null, 1];
        }

        // Match {n} for exact count
        if (preg_match('/^\{(\d+)\}/', $remaining, $matches) === 1) {
            $count = (int) $matches[1];

            return [$count, $count, mb_strlen($matches[0])];
        }

        // Match range quantifiers with minimum specified
        if (preg_match('/^\{(\d+),(\d*)\}/', $remaining, $matches) === 1) {
            $min = (int) $matches[1];
            $max = $matches[2] === '' ? null : (int) $matches[2];

            return [$min, $max, mb_strlen($matches[0])];
        }

        // Match range quantifiers with only maximum specified
        if (preg_match('/^\{,(\d+)\}/', $remaining, $matches) === 1) {
            return [0, (int) $matches[1], mb_strlen($matches[0])];
        }

        return null;
    }

    private function matches(string $filter, string $char): bool
    {
        return preg_match(self::FILTERS[$filter], $char) === 1;
    }

    private function applyTransform(string $char, string|null $transform): string
    {
        return match ($transform) {
            'lower', 'LOWER' => mb_strtolower($char),
            'upper', 'UPPER' => mb_strtoupper($char),
            'invert', 'INVERT' => mb_strtolower($char) === $char ? mb_strtoupper($char) : mb_strtolower($char),
            default => $char,
        };
    }
}
```
