Commit 5ba4269

Add a new DataFrame::renameEach()
1 parent d2dda43 commit 5ba4269

12 files changed

Lines changed: 241 additions & 103 deletions

documentation/upgrading.md

Lines changed: 36 additions & 15 deletions
@@ -5,13 +5,34 @@ Please follow the instructions for your specific version to ensure a smooth upgr
55

66
---
77

8+
## Upgrading from 0.15.x to 0.16.x
9+
10+
### 1) Deprecated `Flow\ETL\DataFrame::renameAll*` methods
11+
12+
Methods:
13+
- `Flow\ETL\DataFrame::renameAll()`,
14+
- `Flow\ETL\DataFrame::renameAllLowerCase()`,
15+
- `Flow\ETL\DataFrame::renameAllUpperCase()`,
16+
- `Flow\ETL\DataFrame::renameAllUpperCaseFirst()`,
17+
- `Flow\ETL\DataFrame::renameAllUpperCaseWord()`,
18+
19+
were deprecated in favor of the new method `DataFrame::renameEach()`, used with an appropriate `RenameEntryStrategy` object.
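
The strategy idea behind `renameEach()` can be illustrated with a minimal, self-contained sketch. It does not use Flow's classes: `NameStrategy`, `CaseStrategy`, `ReplaceStrategy` and `rename_each()` are hypothetical stand-ins, and a plain array stands in for a row. In Flow itself the equivalent calls would presumably look like `$df->renameEach(rename_style(Style::LOWER))` or `$df->renameEach(rename_replace(' ', '_'))`, using the DSL functions added in this commit.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a rename strategy: maps one entry name to a new name.
interface NameStrategy
{
    public function rename(string $name) : string;
}

// Hypothetical case strategy; the style names are illustrative strings, not Flow's Style enum.
final class CaseStrategy implements NameStrategy
{
    public function __construct(private string $style)
    {
    }

    public function rename(string $name) : string
    {
        return match ($this->style) {
            'lower' => \strtolower($name),
            'upper' => \strtoupper($name),
            'ucfirst' => \ucfirst($name),
            'ucwords' => \ucwords($name),
            default => throw new \InvalidArgumentException("Unknown style: {$this->style}"),
        };
    }
}

// Hypothetical search-and-replace strategy.
final class ReplaceStrategy implements NameStrategy
{
    public function __construct(private string $search, private string $replace)
    {
    }

    public function rename(string $name) : string
    {
        return \str_replace($this->search, $this->replace, $name);
    }
}

// Applies the strategy to every key of the row. If two names collide after
// renaming, the later entry overwrites the earlier one.
function rename_each(array $row, NameStrategy $strategy) : array
{
    $renamed = [];

    foreach ($row as $name => $value) {
        $renamed[$strategy->rename((string) $name)] = $value;
    }

    return $renamed;
}

$row = ['User ID' => 1, 'User Name' => 'Ada'];
$lower = rename_each($row, new CaseStrategy('lower'));
$snake = rename_each($lower, new ReplaceStrategy(' ', '_'));
```

Passing the strategy as an object is what lets a single method cover the five deprecated `renameAll*` variants: a new renaming rule becomes a new strategy class rather than another method on the data frame.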
20+
21+
### 2) Deprecated `RenameAllCaseTransformer` & `RenameStrReplaceAllEntriesTransformer`
22+
23+
The following transformers were deprecated in favor of `DataFrame::renameEach()` with the corresponding `RenameEntryStrategy` implementation:
24+
- `RenameAllCaseTransformer` -> `RenameCaseEntryStrategy`,
25+
- `RenameStrReplaceAllEntriesTransformer` -> `RenameReplaceEntryStrategy`,
26+
27+
---
28+
829
## Upgrading from 0.14.x to 0.15.x
930

10-
### 1) Removed `Flow\ETL\Row\Schema\Matcher` and implementations
11-
Schema Matcher was the initial attempt to implement a schema evolution next to schema validation that over
12-
time got replaced with different implementation of Schema Validator.
31+
### 1) Removed `Flow\ETL\Row\Schema\Matcher` and implementations
32+
Schema Matcher was the initial attempt to implement a schema evolution next to schema validation that over
33+
time got replaced with a different implementation of Schema Validator.
1334

14-
### 2) Renamed `Flow\ETL\Row\Schema` namespace into `Flow\ETL\Schema`.
35+
### 2) Renamed `Flow\ETL\Row\Schema` namespace into `Flow\ETL\Schema`.
1536
This means all classes related to Schema now live under `Flow\ETL\Schema` namespace.
1637

1738
---
@@ -24,7 +45,7 @@ The old method is now deprecated and will be removed in the next release.
2445

2546
### 2) Replaced `Flow\ETL\Function\ScalarFunction\TypedScalarFunction` with `Flow\ETL\Function\ScalarFunction\ScalarResult`.
2647

27-
The old interface was used to allow define the return type of the ScalarFunctions.
48+
The old interface was used to allow defining the return type of the ScalarFunctions.
2849
It was replaced with a ScalarResult value object that is much more flexible than the interface,
2950
because it's allowing to return any type dynamically without making the scalar function stateful.
3051

@@ -52,10 +73,10 @@ type_structure([
5273

5374
From now options for:
5475

55-
- to_dbal_table_insert()
56-
- to_db_table_update()
76+
- `to_dbal_table_insert()`
77+
- `to_db_table_update()`
5778

58-
are passed as an objects (instance of UpdateOptions|InsertOptions interfaces) and they are platform specific,
79+
are passed as objects (instance of UpdateOptions|InsertOptions interfaces) and they are platform specific,
5980
so please use the proper class for the platform you are using.
6081

6182
- PostgreSQL
@@ -71,9 +92,9 @@ so please use the proper class for the platform you are using.
7192
## Upgrading from 0.8.x to 0.10.x
7293

7394

74-
### 1) Providing multiple paths to single extractor
95+
### 1) Providing multiple paths to a single extractor
7596

76-
From now in order to read from multiple locations use `from_all(Extractor ...$extractors) : Extractor` extractor.
97+
From now to read from multiple locations use `from_all(Extractor ...$extractors) : Extractor` extractor.
7798

7899
Before:
79100
```php
@@ -118,7 +139,7 @@ from_parquet(path(__DIR__ . '/data/1.parquet'))->withSchema($schema);
118139

119140
### 1) Joins
120141

121-
In order to support joining bigger datasets, we had to move from initial NestedLoop join algorithm into Hash Join algorithm.
142+
To support joining bigger datasets, we had to move from initial NestedLoop join algorithm into Hash Join algorithm.
122143

123144
- the only supported join expression is `=` (equals) that can be grouped with `AND` and `OR` operators.
124145
- `joinPrefix` is now always required, and by default is set to 'joined_'
@@ -134,8 +155,8 @@ Above changes were introduced in all 3 types of joins:
134155

135156
### 2) GroupBy
136157

137-
From now on, DataFrame::groupBy() method will return GroupedDataFrame object, which is nothing more than a GroupBy
138-
statement Builder. In order to get the results you first need to define the aggregation functions or optionally pivot the data.
158+
From now on, `DataFrame::groupBy()` method will return `GroupedDataFrame` object, which is nothing more than a GroupBy
159+
statement Builder. To get the results, you first need to define the aggregation functions or optionally pivot the data.
139160

140161
## Upgrading from 0.6.x to 0.7.x
141162

@@ -188,7 +209,7 @@ DataFrame::parallelize() method is deprecated, and it will be removed, instead u
188209

189210
### 6) Rows in batch - Extractors
190211

191-
From now, file based Extractors will always throw one Row at a time, in order to merge them into bigger groups
212+
From now, file-based Extractors will always throw one Row at a time, in order to merge them into bigger groups
192213
use `DataFrame::batchSize(int $size)` just after extractor method.
193214

194215
Before:
@@ -218,7 +239,7 @@ Affected extractors:
218239
- Text
219240
- XML
220241
- Avro
221-
- DoctrineDBAL - rows_in_batch wasn't removed but now results are thrown row by row, instead of whole page.
242+
- DoctrineDBAL - `rows_in_batch` wasn't removed, but now results are thrown row by row, instead of whole page.
222243
- GoogleSheet
223244

224245
### 7) `GoogleSheetExtractor`

src/core/etl/src/Flow/ETL/DSL/functions.php

Lines changed: 21 additions & 4 deletions
@@ -5,8 +5,7 @@
55
namespace Flow\ETL\DSL;
66

77
use Flow\Calculator\Rounding;
8-
use Flow\ETL\{
9-
Analyze,
8+
use Flow\ETL\{Analyze,
109
Attribute\DocumentationDSL,
1110
Attribute\DocumentationExample,
1211
Attribute\Module,
@@ -37,9 +36,9 @@
3736
Schema\SchemaFormatter,
3837
Transformation,
3938
Transformer,
39+
Transformer\StyleConverter\Style,
4040
Window,
41-
WithEntry
42-
};
41+
WithEntry};
4342
use Flow\ETL\ErrorHandler\{IgnoreError, SkipRows, ThrowError};
4443
use Flow\ETL\Exception\{InvalidArgumentException, RuntimeException, SchemaDefinitionNotFoundException};
4544
use Flow\ETL\Extractor\FilesExtractor;
@@ -349,6 +348,24 @@ function to_branch(ScalarFunction $condition, Loader $loader) : Loader\Branching
349348
return new Loader\BranchingLoader($condition, $loader);
350349
}
351350

351+
#[DocumentationDSL(module: Module::CORE, type: DSLType::TRANSFORMER)]
352+
function rename_style(Style $style) : Transformer\StyleConverter\RenameCaseEntryStrategy
353+
{
354+
return new Transformer\StyleConverter\RenameCaseEntryStrategy($style);
355+
}
356+
357+
#[DocumentationDSL(module: Module::CORE, type: DSLType::TRANSFORMER)]
358+
function rename_replace(string $search, string $replace) : Transformer\StyleConverter\RenameReplaceEntryStrategy
359+
{
360+
return new Transformer\StyleConverter\RenameReplaceEntryStrategy($search, $replace);
361+
}
362+
363+
#[DocumentationDSL(module: Module::CORE, type: DSLType::TRANSFORMER)]
364+
function rename_transliterate(string $transliterator = 'Any-Latin; Latin-ASCII; Lower()') : Transformer\StyleConverter\RenameTransliterateEntryStrategy
365+
{
366+
return new Transformer\StyleConverter\RenameTransliterateEntryStrategy($transliterator);
367+
}
368+
352369
#[DocumentationDSL(module: Module::CORE, type: DSLType::ENTRY)]
353370
function bool_entry(string $name, ?bool $value, ?Schema\Metadata $metadata = null) : Entry\BooleanEntry
354371
{

src/core/etl/src/Flow/ETL/DataFrame.php

Lines changed: 22 additions & 16 deletions
@@ -28,7 +28,8 @@
2828
VoidPipeline};
2929
use Flow\ETL\Row\{Formatter\ASCIISchemaFormatter, Reference, References};
3030
use Flow\ETL\Schema\Definition;
31-
use Flow\ETL\Transformer\{AutoCastTransformer,
31+
use Flow\ETL\Transformer\{
32+
AutoCastTransformer,
3233
CallbackRowTransformer,
3334
CrossJoinRowsTransformer,
3435
DropDuplicatesTransformer,
@@ -43,13 +44,16 @@
4344
OrderEntries\TypeComparator,
4445
RenameEachTransformer,
4546
RenameEntryTransformer,
46-
RenameStrReplaceAllEntriesTransformer,
4747
ScalarFunctionFilterTransformer,
4848
ScalarFunctionTransformer,
4949
SelectEntriesTransformer,
50-
StyleConverter\RenameStrategy,
50+
StyleConverter\RenameCaseEntryStrategy,
51+
StyleConverter\RenameEntryStrategy,
52+
StyleConverter\RenameReplaceEntryStrategy,
53+
StyleConverter\Style,
5154
UntilTransformer,
52-
WindowFunctionTransformer};
55+
WindowFunctionTransformer
56+
};
5357
use Flow\Filesystem\Path\Filter;
5458

5559
final class DataFrame
@@ -629,30 +633,32 @@ public function rename(string $from, string $to) : self
629633

630634
/**
631635
* @lazy
632-
* Iterate over all entry names and replace given search string with replace string.
636+
* Iterate over all entry names and replace the given search string with replace string.
637+
*
638+
* @deprecated use DataFrame::renameEach() with a RenameReplaceEntryStrategy
633639
*/
634640
public function renameAll(string $search, string $replace) : self
635641
{
636-
$this->pipeline->add(new RenameStrReplaceAllEntriesTransformer($search, $replace));
642+
$this->renameEach(new RenameReplaceEntryStrategy($search, $replace));
637643

638644
return $this;
639645
}
640646

641647
/**
642648
* @lazy
643649
*
644-
* @deprecated use DataFrame::renameEach() with a selected RenameStrategy
650+
* @deprecated use DataFrame::renameEach() with a selected Style
645651
*/
646652
public function renameAllLowerCase() : self
647653
{
648-
$this->renameEach(RenameStrategy::LOWER);
654+
$this->renameEach(new RenameCaseEntryStrategy(Style::LOWER));
649655

650656
return $this;
651657
}
652658

653659
/**
654660
* @lazy
655-
* Rename all entries to given style.
661+
* Rename all entries to a given style.
656662
* Please look into \Flow\ETL\Function\StyleConverter\StringStyles class for all available styles.
657663
*/
658664
public function renameAllStyle(StringStyles|string $style) : self
@@ -665,40 +671,40 @@ public function renameAllStyle(StringStyles|string $style) : self
665671
/**
666672
* @lazy
667673
*
668-
* @deprecated use DataFrame::renameEach() with a selected RenameStrategy
674+
* @deprecated use DataFrame::renameEach() with a selected Style
669675
*/
670676
public function renameAllUpperCase() : self
671677
{
672-
$this->renameEach(RenameStrategy::UPPER);
678+
$this->renameEach(new RenameCaseEntryStrategy(Style::UPPER));
673679

674680
return $this;
675681
}
676682

677683
/**
678684
* @lazy
679685
*
680-
* @deprecated use DataFrame::renameEach() with a selected RenameStrategy
686+
* @deprecated use DataFrame::renameEach() with a selected Style
681687
*/
682688
public function renameAllUpperCaseFirst() : self
683689
{
684-
$this->renameEach(RenameStrategy::UCFIRST);
690+
$this->renameEach(new RenameCaseEntryStrategy(Style::UCFIRST));
685691

686692
return $this;
687693
}
688694

689695
/**
690696
* @lazy
691697
*
692-
* @deprecated use DataFrame::renameEach() with a selected RenameStrategy
698+
* @deprecated use DataFrame::renameEach() with a selected Style
693699
*/
694700
public function renameAllUpperCaseWord() : self
695701
{
696-
$this->renameEach(RenameStrategy::UCWORDS);
702+
$this->renameEach(new RenameCaseEntryStrategy(Style::UCWORDS));
697703

698704
return $this;
699705
}
700706

701-
public function renameEach(RenameStrategy $strategy) : self
707+
public function renameEach(RenameEntryStrategy $strategy) : self
702708
{
703709
$this->pipeline->add(new RenameEachTransformer($strategy));
704710

src/core/etl/src/Flow/ETL/Transformer/RenameAllCaseTransformer.php

Lines changed: 19 additions & 8 deletions
@@ -4,14 +4,19 @@
44

55
namespace Flow\ETL\Transformer;
66

7-
use Flow\ETL\{FlowContext, Rows, Transformer, Transformer\StyleConverter\RenameStrategy};
7+
use Flow\ETL\{FlowContext,
8+
Row,
9+
Rows,
10+
Transformer,
11+
Transformer\StyleConverter\RenameCaseEntryStrategy,
12+
Transformer\StyleConverter\Style};
813

914
/**
10-
* @deprecated use RenameEachTransformer with a selected RenameStrategy
15+
* @deprecated Use `DataFrame::renameEach()` and `RenameCaseEntryStrategy`
1116
*/
1217
final class RenameAllCaseTransformer implements Transformer
1318
{
14-
private RenameEachTransformer $transformer;
19+
private RenameCaseEntryStrategy $transformer;
1520

1621
public function __construct(
1722
bool $upper = false,
@@ -20,24 +25,30 @@ public function __construct(
2025
bool $ucwords = false,
2126
) {
2227
if ($upper) {
23-
$this->transformer = new RenameEachTransformer(RenameStrategy::UPPER);
28+
$this->transformer = new RenameCaseEntryStrategy(Style::UPPER);
2429
}
2530

2631
if ($lower) {
27-
$this->transformer = new RenameEachTransformer(RenameStrategy::LOWER);
32+
$this->transformer = new RenameCaseEntryStrategy(Style::LOWER);
2833
}
2934

3035
if ($ucfirst) {
31-
$this->transformer = new RenameEachTransformer(RenameStrategy::UCFIRST);
36+
$this->transformer = new RenameCaseEntryStrategy(Style::UCFIRST);
3237
}
3338

3439
if ($ucwords) {
35-
$this->transformer = new RenameEachTransformer(RenameStrategy::UCWORDS);
40+
$this->transformer = new RenameCaseEntryStrategy(Style::UCWORDS);
3641
}
3742
}
3843

3944
public function transform(Rows $rows, FlowContext $context) : Rows
4045
{
41-
return $this->transformer->transform($rows, $context);
46+
return $rows->map(function (Row $row) use ($context) : Row {
47+
foreach ($row->entries()->all() as $entry) {
48+
$row = $this->transformer->rename($row, $entry, $context);
49+
}
50+
51+
return $row;
52+
});
4253
}
4354
}
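
One behaviour of the constructor above is easy to miss: the flags are checked with independent `if` statements, so when several are `true` the last one checked wins, and when none is `true` the typed `$transformer` property is never assigned (PHP raises an `Error` when an uninitialized typed property is read). The sketch below models only that control flow in plain PHP. `resolve_style()` and its string return values are hypothetical and not part of Flow. The order of the two parameters hidden by the diff, and their `false` defaults, are assumptions.

```php
<?php

declare(strict_types=1);

// Hypothetical model of the flag checks in the constructor; not Flow code.
// Returns null where the real class would leave its typed property unassigned.
function resolve_style(
    bool $upper = false,
    bool $lower = false,
    bool $ucfirst = false,
    bool $ucwords = false,
) : ?string {
    $style = null;

    // Independent ifs: each true flag overwrites the previous choice.
    if ($upper) {
        $style = 'upper';
    }

    if ($lower) {
        $style = 'lower';
    }

    if ($ucfirst) {
        $style = 'ucfirst';
    }

    if ($ucwords) {
        $style = 'ucwords';
    }

    return $style;
}

$both = resolve_style(upper: true, ucwords: true); // 'ucwords': the last true flag wins
$none = resolve_style();                           // null: no flag selected a style
```

Taking a single `Style` value, as `RenameCaseEntryStrategy` does in this commit, avoids both cases because exactly one style is always supplied.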
