
Commit 66e5766

Add a new DataFrame::renameEach() (#1615)
- Add a new `DataFrame::renameEach()`
- Add new functions `rename_style()`, `rename_replace()` & `rename_transliterate()`
- Deprecate `Flow\ETL\DataFrame::renameAll()`
- Deprecate `Flow\ETL\DataFrame::renameAllLowerCase()`
- Deprecate `Flow\ETL\DataFrame::renameAllUpperCase()`
- Deprecate `Flow\ETL\DataFrame::renameAllUpperCaseFirst()`
- Deprecate `Flow\ETL\DataFrame::renameAllUpperCaseWord()`
- Deprecate `Flow\ETL\RenameAllCaseTransformer`
- Deprecate `Flow\ETL\RenameStrReplaceAllEntriesTransformer`
1 parent 4d82a90 commit 66e5766

17 files changed

Lines changed: 507 additions & 107 deletions

documentation/upgrading.md

Lines changed: 36 additions & 15 deletions
@@ -5,13 +5,34 @@ Please follow the instructions for your specific version to ensure a smooth upgr
55

66
---
77

8+
## Upgrading from 0.15.x to 0.16.x
9+
10+
### 1) Deprecated `Flow\ETL\DataFrame::renameAll*` methods
11+
12+
Methods:
13+
- `Flow\ETL\DataFrame::renameAll()`,
14+
- `Flow\ETL\DataFrame::renameAllLowerCase()`,
15+
- `Flow\ETL\DataFrame::renameAllUpperCase()`,
16+
- `Flow\ETL\DataFrame::renameAllUpperCaseFirst()`,
17+
- `Flow\ETL\DataFrame::renameAllUpperCaseWord()`,
18+
19+
were deprecated in favor of the new `DataFrame::renameEach()` method, used with an appropriate `RenameEntryStrategy` object.
20+
21+
### 2) Deprecated `RenameAllCaseTransformer` & `RenameStrReplaceAllEntriesTransformer`
22+
23+
Selected transformers were deprecated in favor of using `DataFrame::renameEach()` with related `RenameEntryStrategy`:
24+
- `RenameAllCaseTransformer` -> `RenameCaseEntryStrategy`,
25+
- `RenameStrReplaceAllEntriesTransformer` -> `RenameReplaceEntryStrategy`,
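
A minimal migration sketch, assuming the `rename_style()` and `rename_replace()` DSL functions and the `Style` enum cases added in this commit. The helper name `normalizeEntryNames` is hypothetical and only wraps the calls for illustration:

```php
<?php

declare(strict_types=1);

use Flow\ETL\DataFrame;
use Flow\ETL\Transformer\Rename\Style;

use function Flow\ETL\DSL\{rename_replace, rename_style};

/**
 * Hypothetical helper: replaces two deprecated renameAll* calls
 * with their renameEach() equivalents.
 */
function normalizeEntryNames(DataFrame $df) : DataFrame
{
    return $df
        // Before: $df->renameAllLowerCase()
        ->renameEach(rename_style(Style::LOWER))
        // Before: $df->renameAll(' ', '_')
        ->renameEach(rename_replace(' ', '_'));
}
```

Each `renameEach()` call adds one transformer to the pipeline, so strategies can be chained in the order the renames should apply.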
26+
27+
---
28+
829
## Upgrading from 0.14.x to 0.15.x
930

10-
### 1) Removed `Flow\ETL\Row\Schema\Matcher` and implementations
11-
Schema Matcher was the initial attempt to implement a schema evolution next to schema validation that over
12-
time got replaced with different implementation of Schema Validator.
31+
### 1) Removed `Flow\ETL\Row\Schema\Matcher` and implementations
32+
Schema Matcher was the initial attempt to implement a schema evolution next to schema validation that over
33+
time got replaced with a different implementation of Schema Validator.
1334

14-
### 2) Renamed `Flow\ETL\Row\Schema` namespace into `Flow\ETL\Schema`.
35+
### 2) Renamed `Flow\ETL\Row\Schema` namespace into `Flow\ETL\Schema`.
1536
This means all classes related to Schema now live under `Flow\ETL\Schema` namespace.
1637

1738
---
@@ -24,7 +45,7 @@ The old method is now deprecated and will be removed in the next release.
2445

2546
### 2) Replaced `Flow\ETL\Function\ScalarFunction\TypedScalarFunction` with `Flow\ETL\Function\ScalarFunction\ScalarResult`.
2647

27-
The old interface was used to allow define the return type of the ScalarFunctions.
48+
The old interface was used to allow defining the return type of the ScalarFunctions.
2849
It was replaced with a ScalarResult value object that is much more flexible than the interface,
2950
because it allows returning any type dynamically without making the scalar function stateful.
3051

@@ -52,10 +73,10 @@ type_structure([
5273

5374
From now options for:
5475

55-
- to_dbal_table_insert()
56-
- to_db_table_update()
76+
- `to_dbal_table_insert()`
77+
- `to_db_table_update()`
5778

58-
are passed as an objects (instance of UpdateOptions|InsertOptions interfaces) and they are platform specific,
79+
are passed as objects (instance of UpdateOptions|InsertOptions interfaces) and they are platform specific,
5980
so please use the proper class for the platform you are using.
6081

6182
- PostgreSQL
@@ -71,9 +92,9 @@ so please use the proper class for the platform you are using.
7192
## Upgrading from 0.8.x to 0.10.x
7293

7394

74-
### 1) Providing multiple paths to single extractor
95+
### 1) Providing multiple paths to a single extractor
7596

76-
From now in order to read from multiple locations use `from_all(Extractor ...$extractors) : Exctractor` extractor.
97+
From now on, to read from multiple locations, use the `from_all(Extractor ...$extractors) : Extractor` extractor.
7798

7899
Before:
79100
```php
@@ -118,7 +139,7 @@ from_parquet(path(__DIR__ . '/data/1.parquet'))->withSchema($schema);
118139

119140
### 1) Joins
120141

121-
In order to support joining bigger datasets, we had to move from initial NestedLoop join algorithm into Hash Join algorithm.
142+
To support joining bigger datasets, we had to move from the initial NestedLoop join algorithm to a Hash Join algorithm.
122143

123144
- the only supported join expression is `=` (equals) that can be grouped with `AND` and `OR` operators.
124145
- `joinPrefix` is now always required, and by default is set to 'joined_'
@@ -134,8 +155,8 @@ Above changes were introduced in all 3 types of joins:
134155

135156
### 2) GroupBy
136157

137-
From now on, DataFrame::groupBy() method will return GroupedDataFrame object, which is nothing more than a GroupBy
138-
statement Builder. In order to get the results you first need to define the aggregation functions or optionally pivot the data.
158+
From now on, `DataFrame::groupBy()` method will return `GroupedDataFrame` object, which is nothing more than a GroupBy
159+
statement Builder. To get the results, you first need to define the aggregation functions or optionally pivot the data.
139160

140161
## Upgrading from 0.6.x to 0.7.x
141162

@@ -188,7 +209,7 @@ DataFrame::parallelize() method is deprecated, and it will be removed, instead u
188209

189210
### 6) Rows in batch - Extractors
190211

191-
From now, file based Extractors will always throw one Row at time, in order to merge them into bigger groups
212+
From now on, file-based Extractors will always throw one Row at a time; to merge them into bigger groups,
192213
use `DataFrame::batchSize(int $size)` just after extractor method.
193214

194215
Before:
@@ -218,7 +239,7 @@ Affected extractors:
218239
- Text
219240
- XML
220241
- Avro
221-
- DoctrineDBAL - rows_in_batch wasn't removed but now results are thrown row by row, instead of whole page.
242+
- DoctrineDBAL - `rows_in_batch` wasn't removed, but now results are thrown row by row, instead of whole page.
222243
- GoogleSheet
223244

224245
### 7) `GoogleSheetExtractor`

src/core/etl/src/Flow/ETL/DSL/functions.php

Lines changed: 21 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -5,8 +5,7 @@
55
namespace Flow\ETL\DSL;
66

77
use Flow\Calculator\Rounding;
8-
use Flow\ETL\{
9-
Analyze,
8+
use Flow\ETL\{Analyze,
109
Attribute\DocumentationDSL,
1110
Attribute\DocumentationExample,
1211
Attribute\Module,
@@ -37,9 +36,9 @@
3736
Schema\SchemaFormatter,
3837
Transformation,
3938
Transformer,
39+
Transformer\Rename\Style,
4040
Window,
41-
WithEntry
42-
};
41+
WithEntry};
4342
use Flow\ETL\ErrorHandler\{IgnoreError, SkipRows, ThrowError};
4443
use Flow\ETL\Exception\{InvalidArgumentException, RuntimeException, SchemaDefinitionNotFoundException};
4544
use Flow\ETL\Extractor\FilesExtractor;
@@ -349,6 +348,24 @@ function to_branch(ScalarFunction $condition, Loader $loader) : Loader\Branching
349348
return new Loader\BranchingLoader($condition, $loader);
350349
}
351350

351+
#[DocumentationDSL(module: Module::CORE, type: DSLType::TRANSFORMER)]
352+
function rename_style(Style $style) : Transformer\Rename\RenameCaseEntryStrategy
353+
{
354+
return new Transformer\Rename\RenameCaseEntryStrategy($style);
355+
}
356+
357+
#[DocumentationDSL(module: Module::CORE, type: DSLType::TRANSFORMER)]
358+
function rename_replace(string $search, string $replace) : Transformer\Rename\RenameReplaceEntryStrategy
359+
{
360+
return new Transformer\Rename\RenameReplaceEntryStrategy($search, $replace);
361+
}
362+
363+
#[DocumentationDSL(module: Module::CORE, type: DSLType::TRANSFORMER)]
364+
function rename_transliterate(string $transliterator = 'Any-Latin; Latin-ASCII; Lower()') : Transformer\Rename\RenameTransliterateEntryStrategy
365+
{
366+
return new Transformer\Rename\RenameTransliterateEntryStrategy($transliterator);
367+
}
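
The default argument of `rename_transliterate()` looks like an ICU transform ID. As a rough illustration of what such an ID does, the sketch below applies it with PHP's `intl` extension directly. That `RenameTransliterateEntryStrategy` delegates to `intl` in this way is an assumption, and `transliterateName` is a hypothetical helper, not part of Flow:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: applies an ICU transform ID to a single name.
 * Requires the intl extension.
 */
function transliterateName(string $name, string $id = 'Any-Latin; Latin-ASCII; Lower()') : string
{
    // transliterator_transliterate() returns false when the ID is invalid
    // or the transformation fails.
    $result = \transliterator_transliterate($id, $name);

    if ($result === false) {
        throw new \RuntimeException("Could not transliterate: {$name}");
    }

    return $result;
}
```

With the default ID, a name such as `Café` is converted to Latin script, stripped of diacritics, and lowercased, giving `cafe`.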
368+
352369
#[DocumentationDSL(module: Module::CORE, type: DSLType::ENTRY)]
353370
function bool_entry(string $name, ?bool $value, ?Schema\Metadata $metadata = null) : Entry\BooleanEntry
354371
{

src/core/etl/src/Flow/ETL/DataFrame.php

Lines changed: 41 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -28,7 +28,8 @@
2828
VoidPipeline};
2929
use Flow\ETL\Row\{Formatter\ASCIISchemaFormatter, Reference, References};
3030
use Flow\ETL\Schema\Definition;
31-
use Flow\ETL\Transformer\{AutoCastTransformer,
31+
use Flow\ETL\Transformer\{
32+
AutoCastTransformer,
3233
CallbackRowTransformer,
3334
CrossJoinRowsTransformer,
3435
DropDuplicatesTransformer,
@@ -41,14 +42,18 @@
4142
OrderEntriesTransformer,
4243
OrderEntries\Comparator,
4344
OrderEntries\TypeComparator,
44-
RenameAllCaseTransformer,
45+
RenameEachEntryTransformer,
4546
RenameEntryTransformer,
46-
RenameStrReplaceAllEntriesTransformer,
47+
Rename\RenameCaseEntryStrategy,
48+
Rename\RenameEntryStrategy,
49+
Rename\RenameReplaceEntryStrategy,
50+
Rename\Style,
4751
ScalarFunctionFilterTransformer,
4852
ScalarFunctionTransformer,
4953
SelectEntriesTransformer,
5054
UntilTransformer,
51-
WindowFunctionTransformer};
55+
WindowFunctionTransformer
56+
};
5257
use Flow\Filesystem\Path\Filter;
5358

5459
final class DataFrame
@@ -84,7 +89,8 @@ public function autoCast() : self
8489
* Merge/Split Rows yielded by Extractor into batches of given size.
8590
* For example, when Extractor is yielding one row at time, this method will merge them into batches of given size
8691
* before passing them to the next pipeline element.
87-
* Similarly when Extractor is yielding batches of rows, this method will split them into smaller batches of given size.
92+
* Similarly when Extractor is yielding batches of rows, this method will split them into smaller batches of given
93+
* size.
8894
*
8995
* In order to merge all Rows into a single batch use DataFrame::collect() method or set size to -1 or 0.
9096
*
@@ -210,7 +216,8 @@ public function crossJoin(self $dataFrame, string $prefix = '') : self
210216

211217
/**
212218
* @param int $limit maximum numbers of rows to display
213-
* @param bool|int $truncate false or if set to 0 columns are not truncated, otherwise default truncate to 20 characters
219+
* @param bool|int $truncate false or if set to 0 columns are not truncated, otherwise default truncate to 20
220+
* characters
214221
* @param Formatter $formatter
215222
*
216223
* @trigger
@@ -258,7 +265,8 @@ public function dropDuplicates(string|Reference ...$entries) : self
258265
}
259266

260267
/**
261-
* Drop all partitions from Rows, additionally when $dropPartitionColumns is set to true, partition columns are also removed.
268+
* Drop all partitions from Rows, additionally when $dropPartitionColumns is set to true, partition columns are
269+
* also removed.
262270
*
263271
* @lazy
264272
*/
@@ -625,28 +633,32 @@ public function rename(string $from, string $to) : self
625633

626634
/**
627635
* @lazy
628-
* Iterate over all entry names and replace given search string with replace string.
636+
* Iterate over all entry names and replace the given search string with replace string.
637+
*
638+
* @deprecated use DataFrame::renameEach() with a RenameReplaceEntryStrategy
629639
*/
630640
public function renameAll(string $search, string $replace) : self
631641
{
632-
$this->pipeline->add(new RenameStrReplaceAllEntriesTransformer($search, $replace));
642+
$this->renameEach(new RenameReplaceEntryStrategy($search, $replace));
633643

634644
return $this;
635645
}
636646

637647
/**
638648
* @lazy
649+
*
650+
* @deprecated use DataFrame::renameEach() with a selected Style
639651
*/
640652
public function renameAllLowerCase() : self
641653
{
642-
$this->pipeline->add(new RenameAllCaseTransformer(lower: true));
654+
$this->renameEach(new RenameCaseEntryStrategy(Style::LOWER));
643655

644656
return $this;
645657
}
646658

647659
/**
648660
* @lazy
649-
* Rename all entries to given style.
661+
* Rename all entries to a given style.
650662
* Please look into \Flow\ETL\Function\StyleConverter\StringStyles class for all available styles.
651663
*/
652664
public function renameAllStyle(StringStyles|string $style) : self
@@ -658,30 +670,43 @@ public function renameAllStyle(StringStyles|string $style) : self
658670

659671
/**
660672
* @lazy
673+
*
674+
* @deprecated use DataFrame::renameEach() with a selected Style
661675
*/
662676
public function renameAllUpperCase() : self
663677
{
664-
$this->pipeline->add(new RenameAllCaseTransformer(upper: true));
678+
$this->renameEach(new RenameCaseEntryStrategy(Style::UPPER));
665679

666680
return $this;
667681
}
668682

669683
/**
670684
* @lazy
685+
*
686+
* @deprecated use DataFrame::renameEach() with a selected Style
671687
*/
672688
public function renameAllUpperCaseFirst() : self
673689
{
674-
$this->pipeline->add(new RenameAllCaseTransformer(ucfirst: true));
690+
$this->renameEach(new RenameCaseEntryStrategy(Style::UCFIRST));
675691

676692
return $this;
677693
}
678694

679695
/**
680696
* @lazy
697+
*
698+
* @deprecated use DataFrame::renameEach() with a selected Style
681699
*/
682700
public function renameAllUpperCaseWord() : self
683701
{
684-
$this->pipeline->add(new RenameAllCaseTransformer(ucwords: true));
702+
$this->renameEach(new RenameCaseEntryStrategy(Style::UCWORDS));
703+
704+
return $this;
705+
}
706+
707+
public function renameEach(RenameEntryStrategy $strategy) : self
708+
{
709+
$this->pipeline->add(new RenameEachEntryTransformer($strategy));
685710

686711
return $this;
687712
}
@@ -825,8 +850,8 @@ public function transform(Transformer|Transformation|Transformations|WithEntry $
825850
}
826851

827852
/**
828-
* The difference between filter and until is that filter will keep filtering rows until extractors finish yielding rows.
829-
* Until will send a STOP signal to the Extractor when the condition is not met.
853+
* The difference between filter and until is that filter will keep filtering rows until extractors finish yielding
854+
* rows. Until will send a STOP signal to the Extractor when the condition is not met.
830855
*
831856
* @lazy
832857
*/

src/core/etl/src/Flow/ETL/Pipeline/Optimizer/LimitOptimization.php

Lines changed: 11 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,16 @@
88
use Flow\ETL\Function\ScalarFunction\ExpandResults;
99
use Flow\ETL\{Loader, Pipeline, Transformer};
1010
use Flow\ETL\Pipeline\{BatchingPipeline, CollectingPipeline, LinkedPipeline, SynchronousPipeline, VoidPipeline};
11-
use Flow\ETL\Transformer\{CallbackRowTransformer, DropEntriesTransformer, EntryNameStyleConverterTransformer, LimitTransformer, RenameAllCaseTransformer, RenameEntryTransformer, RenameStrReplaceAllEntriesTransformer, ScalarFunctionTransformer, SelectEntriesTransformer};
11+
use Flow\ETL\Transformer\{CallbackRowTransformer,
12+
DropEntriesTransformer,
13+
EntryNameStyleConverterTransformer,
14+
LimitTransformer,
15+
RenameAllCaseTransformer,
16+
RenameEachEntryTransformer,
17+
RenameEntryTransformer,
18+
RenameStrReplaceAllEntriesTransformer,
19+
ScalarFunctionTransformer,
20+
SelectEntriesTransformer};
1221

1322
final class LimitOptimization implements Optimization
1423
{
@@ -27,6 +36,7 @@ final class LimitOptimization implements Optimization
2736
SelectEntriesTransformer::class,
2837
DropEntriesTransformer::class,
2938
RenameAllCaseTransformer::class,
39+
RenameEachEntryTransformer::class,
3040
RenameEntryTransformer::class,
3141
RenameStrReplaceAllEntriesTransformer::class,
3242
LimitTransformer::class,
