Commit c2b7ca6

Implement PatternFormatter for pattern-based string transformation
The PatternFormatter enables advanced pattern-based string transformation using filtering patterns and transformation directives. It supports digit, letter, and character filters along with case transformation operations.

Assisted-by: Opencode (GLM-4.6)
1 parent 6a6cd43 commit c2b7ca6

4 files changed

Lines changed: 805 additions & 3 deletions

README.md

Lines changed: 4 additions & 3 deletions
@@ -16,9 +16,10 @@ composer require respect/string-formatter

 ## Formatters

-| Formatter                              | Description                                     |
-| -------------------------------------- | ----------------------------------------------- |
-| [MaskFormatter](docs/MaskFormatter.md) | Range-based string masking with Unicode support |
+| Formatter                                    | Description                                      |
+| -------------------------------------------- | ------------------------------------------------ |
+| [MaskFormatter](docs/MaskFormatter.md)       | Range-based string masking with Unicode support  |
+| [PatternFormatter](docs/PatternFormatter.md) | Pattern-based string filtering with placeholders |

 ## Contributing

docs/PatternFormatter.md

Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
# PatternFormatter

The `PatternFormatter` enables advanced pattern-based string transformation using filtering patterns and transformation directives.

## Usage

### Basic Filtering

```php
use Respect\StringFormatter\PatternFormatter;

$formatter = new PatternFormatter('000-0000');

echo $formatter->format('1234567890');
// Outputs: "123-4567"
```

### Case Transformations

```php
use Respect\StringFormatter\PatternFormatter;

$formatter = new PatternFormatter('\\l###\\U###');

echo $formatter->format('abCDEF');
// Outputs: "abCDEF" (\l lowercases only the first matched character, so "C" passes through unchanged)
```

## API

### `PatternFormatter::__construct`

- `__construct(string $pattern)`

Creates a new formatter instance with the specified pattern.

**Parameters:**

- `$pattern`: The pattern string defining transformation rules

**Throws:** `InvalidFormatterException` when pattern is empty

### `format`

- `format(string $input): string`

Formats the input string according to the pattern rules, applying filters and transformations.

**Parameters:**

- `$input`: The string to format

**Returns:** The formatted string with transformations applied

## Pattern Syntax

### Filtering Patterns

| Pattern | Description                                        |
| ------- | -------------------------------------------------- |
| `#`     | Any character                                      |
| `0`     | Digits only (0-9)                                  |
| `A`     | Uppercase ASCII letters only (A-Z)                 |
| `a`     | Lowercase ASCII letters only (a-z)                 |
| `C`     | Letters only (upper or lower case, any script)     |
| `W`     | Word characters only (Unicode letters and numbers) |

### Transformation Patterns

| Pattern | Description                            |
| ------- | -------------------------------------- |
| `\d`    | Delete (skip) the next input character |
| `\l`    | Lowercase next character               |
| `\L`    | Lowercase until `\E`                   |
| `\u`    | Uppercase next character               |
| `\U`    | Uppercase until `\E`                   |
| `\i`    | Invert case next character             |
| `\I`    | Invert case until `\E`                 |
| `\E`    | End the transformation state           |

### Escape Sequences

| Pattern | Description           | Effect                                  |
| ------- | --------------------- | --------------------------------------- |
| `\#`    | Literal `#` character | Emits `#` without consuming any input   |
| `\0`    | Literal `0` character | Emits `0` without consuming any input   |
| `\A`    | Literal `A` character | Emits `A` without consuming any input   |
| `\@`    | Literal `@` character | Emits `@` without consuming any input   |

### Literal Characters

Any character not defined as a pattern (`A`, `a`, `0`, `#`, `C`, `W`, `\`) is treated as a literal and appears in the output as-is. Note that `X`, `!`, and `@` are also registered as filters in `src/PatternFormatter.php` in this commit, but no matching rule is implemented for them, so they never match any input; escape them (`\X`, `\!`, `\@`) to emit them literally.
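
The symbol classes above (filters, transformations, escapes, and literals) can be sketched as a small tokenizer. This is only an illustration under stated assumptions, not the library's code: `tokenizePattern` is a hypothetical helper, it recognizes only the six documented filter symbols, and it splits characters with `preg_split` so that it runs without the mbstring extension.

```php
<?php

/**
 * Hypothetical helper (not part of the library): classifies each pattern
 * symbol as a filter, a transformation, or a literal.
 *
 * @return list<array{0: string, 1: string}> pairs of [type, character]
 */
function tokenizePattern(string $pattern): array
{
    $filters = ['#', '0', 'A', 'a', 'C', 'W'];
    $transformations = ['d', 'l', 'L', 'u', 'U', 'i', 'I', 'E'];

    // Split into Unicode characters; fall back to an empty list on invalid UTF-8.
    $chars = preg_split('//u', $pattern, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $tokens = [];

    for ($i = 0, $n = count($chars); $i < $n; $i++) {
        $char = $chars[$i];

        // A backslash followed by another character is either a
        // transformation directive or an escaped literal.
        if ($char === '\\' && $i + 1 < $n) {
            $next = $chars[++$i];
            $type = in_array($next, $transformations, true) ? 'transformation' : 'literal';
            $tokens[] = [$type, $next];
            continue;
        }

        // Everything else is a filter symbol or a plain literal
        // (a trailing backslash ends up here as a literal).
        $tokens[] = [in_array($char, $filters, true) ? 'filter' : 'literal', $char];
    }

    return $tokens;
}

foreach (tokenizePattern('\\U0-\\#a') as [$type, $char]) {
    echo $type, ': ', $char, PHP_EOL;
}
```

For the pattern `\U0-\#a` this prints one line per token: a transformation `U`, a filter `0`, a literal `-`, a literal `#`, and a filter `a`.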

## Behavior

### Filtering Patterns

- **Remove non-matching characters**: Characters that don't match the filter are skipped
- **Keep matching characters as-is**: When characters match the filter, they pass through unchanged
- **Consume from input**: Filters advance the input pointer when they find a match
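
The three rules above amount to a skip-until-match scan over the input. The following is a minimal sketch of that idea, assuming simplified filter checks; `sketchMatches` and `consumeNext` are hypothetical names, and only the `#`, `0`, and `C` filters are modeled.

```php
<?php

// Hypothetical stand-in for the filter check; models three documented filters only.
function sketchMatches(string $filter, string $char): bool
{
    return match ($filter) {
        '#' => true,
        '0' => preg_match('/^[0-9]$/', $char) === 1,
        'C' => preg_match('/^\p{L}$/u', $char) === 1,
        default => false,
    };
}

/**
 * Scans forward from $index, dropping characters until one matches $filter.
 *
 * @param list<string> $chars
 *
 * @return array{0: string|null, 1: int} the kept character (or null) and the new input index
 */
function consumeNext(array $chars, int $index, string $filter): array
{
    for ($n = count($chars); $index < $n; $index++) {
        if (sketchMatches($filter, $chars[$index])) {
            // Keep the match unchanged and move past it.
            return [$chars[$index], $index + 1];
        }
        // Non-matching characters are skipped by the loop increment.
    }

    return [null, $index];
}

$chars = preg_split('//u', 'ABC123', -1, PREG_SPLIT_NO_EMPTY) ?: [];
$output = '';
$index = 0;

foreach (['C', '0'] as $filter) {
    [$char, $index] = consumeNext($chars, $index, $filter);
    $output .= $char ?? '';
}

echo $output, PHP_EOL; // prints "A1"
```

Note that a filter that never finds a match exhausts the remaining input, which is why a pattern such as `AAA` applied to `123` yields an empty string.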

### Transformation Patterns

- **Stateful transformations**: `\L`, `\U`, `\I` persist until reset
- **Single-character transformations**: `\l`, `\u`, `\i` affect only the next character that a filter matches
- **Deletion**: `\d` skips the next input character immediately, without applying a filter
- **End of transformations**: `\E` clears any active transformation state
- **Unicode aware**: Transformations work with international characters
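
The difference between stateful and single-character transformations can be sketched with a tiny state machine. This is an illustration only: `applyCase` and `transform` are hypothetical helpers, directives are passed as a position-indexed array instead of being parsed from a pattern, and the case functions are ASCII-only so the sketch runs without the mbstring extension.

```php
<?php

// Hypothetical helper: applies one transformation mode to a single character.
function applyCase(?string $mode, string $char): string
{
    return match ($mode) {
        'lower', 'lower_single' => strtolower($char),
        'upper', 'upper_single' => strtoupper($char),
        'invert', 'invert_single' => strtolower($char) === $char ? strtoupper($char) : strtolower($char),
        default => $char,
    };
}

/**
 * @param array<int, string> $directives mode to activate before the character
 *                                       at that position; 'reset' acts like \E
 */
function transform(string $input, array $directives): string
{
    $mode = null;
    $output = '';

    foreach (str_split($input) as $position => $char) {
        if (array_key_exists($position, $directives)) {
            // 'reset' clears the state; any other value replaces the active mode.
            $mode = $directives[$position] === 'reset' ? null : $directives[$position];
        }

        $output .= applyCase($mode, $char);

        // Single-character modes expire after one use; stateful ones persist.
        if ($mode !== null && str_ends_with($mode, '_single')) {
            $mode = null;
        }
    }

    return $output;
}

echo transform('ABCD', [0 => 'lower', 2 => 'reset']), PHP_EOL;             // abCD
echo transform('Ab', [0 => 'lower_single', 1 => 'upper_single']), PHP_EOL; // aB
echo transform('AbCd', [0 => 'invert']), PHP_EOL;                          // aBcD
```

Because only one mode is tracked at a time in this sketch, a single-character directive issued while a stateful mode is active replaces that mode rather than suspending it.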

## Examples

| Pattern          | Input        | Output           | Description                |
| ---------------- | ------------ | ---------------- | -------------------------- |
| `000-0000`       | `1234567`    | `123-4567`       | Phone number formatting    |
| `AAA-000`        | `ABC123`     | `ABC-123`        | License plate format       |
| `\U###`          | `abc`        | `ABC`            | Uppercase until reset      |
| `\L####`         | `ABC1`       | `abc1`           | Lowercase until reset      |
| `\l#\u#`         | `Ab`         | `aB`             | Case transformation        |
| `\I####`         | `AbCd`       | `aBcD`           | Case inversion until reset |
| `CC00WW`         | `AB123D`     | `AB123D`         | International postal code  |
| `(000) 000-0000` | `1234567890` | `(123) 456-7890` | US phone format            |
| `000-00-0000`    | `123456789`  | `123-45-6789`    | SSN format                 |
| `\L##\E##`       | `ABCD`       | `abCD`           | Transformation reset       |
| `##\d##`         | `ABCDE`      | `ABDE`           | Deleting character         |

## International Support

The formatter works with Unicode characters and international text:

```php
$formatter = new PatternFormatter('\\U##');

echo $formatter->format('ñáçé');
// Outputs: "ÑÁ" (\U stays active for both matched characters)

$formatter = new PatternFormatter('CC');

echo $formatter->format('ñáç123');
// Outputs: "ñá"
```

## Edge Cases

| Pattern | Input      | Output  | Reason                                                |
| ------- | ---------- | ------- | ----------------------------------------------------- |
| `###`   | `ab`       | `ab`    | Pattern longer than input: all available input is used |
| `####`  | `abcdefgh` | `abcd`  | Input longer than pattern: the rest is truncated      |
| `C0`    | `ABC123`   | `A1`    | Non-matching characters are skipped                   |
| `AAA`   | `123`      | (empty) | No matching characters found                          |
| `\E###` | `abc🙂`    | `abc`   | `\E` with no active transformation has no effect      |

src/PatternFormatter.php

Lines changed: 206 additions & 0 deletions
@@ -0,0 +1,206 @@
<?php

declare(strict_types=1);

namespace Respect\StringFormatter;

use function array_key_exists;
use function assert;
use function count;
use function implode;
use function mb_str_split;
use function mb_strlen;
use function mb_strtolower;
use function mb_strtoupper;
use function mb_substr;
use function preg_match;
use function str_ends_with;

final readonly class PatternFormatter implements Formatter
{
    private const array FILTERING_PATTERNS = [
        '#' => 'any',
        '0' => 'digit',
        'A' => 'upper_alpha',
        'a' => 'lower_alpha',
        'C' => 'alpha',
        'W' => 'alphanumeric',
        'X' => 'hex',
        '!' => 'punctuation',
        '@' => 'symbol',
    ];

    private const array TRANSFORMATION_PATTERNS = [
        'd' => 'delete',
        'l' => 'lower_single',
        'L' => 'lower_until_reset',
        'u' => 'upper_single',
        'U' => 'upper_until_reset',
        'i' => 'invert_single',
        'I' => 'invert_until_reset',
        'E' => 'reset_transformations',
    ];

    public function __construct(
        private string $pattern,
    ) {
        if ($this->pattern === '') {
            throw new InvalidFormatterException('Pattern cannot be empty');
        }
    }

    public function format(string $input): string
    {
        $inputChars = mb_str_split($input);
        $inputIndex = 0;
        $result = [];
        $transformation = null;
        $patternIndex = 0;

        while ($patternIndex < mb_strlen($this->pattern)) {
            $patternChar = mb_substr($this->pattern, $patternIndex, 1);

            // Handle escape sequences
            if ($patternChar === '\\') {
                $nextPatternChar = mb_substr($this->pattern, $patternIndex + 1, 1);

                if ($nextPatternChar !== '') {
                    // Handle escaped transformation patterns
                    if (array_key_exists($nextPatternChar, self::TRANSFORMATION_PATTERNS)) {
                        match (self::TRANSFORMATION_PATTERNS[$nextPatternChar]) {
                            'delete' => $inputIndex++,
                            'lower_single' => $transformation = 'lower_single',
                            'upper_single' => $transformation = 'upper_single',
                            'invert_single' => $transformation = 'invert_single',
                            'lower_until_reset' => $transformation = 'lower',
                            'upper_until_reset' => $transformation = 'upper',
                            'invert_until_reset' => $transformation = 'invert',
                            'reset_transformations' => $transformation = null,
                        };
                        $patternIndex += 2;
                        continue;
                    }

                    // For backslash followed by any other character, output that character literally
                    $result[] = $nextPatternChar;
                    $patternIndex += 2;
                    continue;
                }
            }

            // Handle filtering patterns
            if (array_key_exists($patternChar, self::FILTERING_PATTERNS)) {
                $consumeResult = $this->consumeNextMatchingChar(
                    $inputChars,
                    $inputIndex,
                    $patternChar,
                    $result,
                    $transformation,
                );
                $inputIndex = $consumeResult['newInputIndex'];
                $result = $consumeResult['newResult'];
                $transformation = $consumeResult['newTransformation'];
                $patternIndex++;
                continue;
            }

            // Handle literal characters - they appear in output as-is and don't consume input
            $result[] = $patternChar;
            $patternIndex++;
        }

        return implode('', $result);
    }

    /**
     * @param array<string> $inputChars
     * @param array<string> $result
     *
     * @return array{newInputIndex: int, newResult: array<string>, newTransformation: string|null}
     */
    private function consumeNextMatchingChar(
        array $inputChars,
        int $inputIndex,
        string $filter,
        array $result,
        string|null $transformation,
    ): array {
        while ($inputIndex < count($inputChars)) {
            if ($this->matchesFilter($filter, $inputChars[$inputIndex])) {
                if ($transformation !== null) {
                    $tempTransformation = $transformation;
                    // Clear single-use transformations
                    $finalTransformation = $transformation;
                    if (str_ends_with($transformation, '_single')) {
                        $finalTransformation = null;
                    }

                    $transformationResult = $this->appendWithTransformation(
                        $result,
                        $inputChars,
                        $inputIndex,
                        $tempTransformation,
                    );

                    return [
                        'newInputIndex' => $transformationResult['newInputIndex'],
                        'newResult' => $transformationResult['newResult'],
                        'newTransformation' => $finalTransformation,
                    ];
                } else {
                    $result[] = $inputChars[$inputIndex];
                    $inputIndex++;
                }

                break;
            }

            $inputIndex++; // Skip non-matching character
        }

        return ['newInputIndex' => $inputIndex, 'newResult' => $result, 'newTransformation' => $transformation];
    }

    private function matchesFilter(string $filter, string $char): bool
    {
        assert(isset(self::FILTERING_PATTERNS[$filter]));

        $filterType = self::FILTERING_PATTERNS[$filter];

        return match ($filterType) {
            'any' => true,
            'digit' => preg_match('/^[0-9]$/', $char) === 1,
            'upper_alpha' => preg_match('/^[A-Z]$/', $char) === 1,
            'lower_alpha' => preg_match('/^[a-z]$/', $char) === 1,
            'alpha' => preg_match('/^\p{L}$/u', $char) === 1,
            'alphanumeric' => preg_match('/^[\p{L}\p{N}]$/u', $char) === 1,

            default => false,
        };
    }

    /**
     * @param array<string> $result
     * @param array<string> $inputChars
     *
     * @return array{newInputIndex: int, newResult: array<string>, newTransformation: string|null}
     */
    private function appendWithTransformation(
        array $result,
        array $inputChars,
        int $inputIndex,
        string $transformation,
    ): array {
        $char = $inputChars[$inputIndex];
        $inputIndex++;

        $result[] = match ($transformation) {
            'lower', 'lower_single' => mb_strtolower($char),
            'upper', 'upper_single' => mb_strtoupper($char),
            'invert', 'invert_single' => mb_strtolower($char) === $char ? mb_strtoupper($char) : mb_strtolower($char),
            default => $char,
        };

        return ['newInputIndex' => $inputIndex, 'newResult' => $result, 'newTransformation' => null];
    }
}
