Allow custom way of renaming values #1608
Flow PHP - Benchmarks
Results of the benchmarks from this PR are compared with the results from 1.x branch.
Extractors
+-----------------------+-------------------+------+-----+-----------------+------------------+------------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+-----------------------+-------------------+------+-----+-----------------+------------------+------------------+
| CSVExtractorBench | bench_extract_10k | 1 | 3 | 4.883mb +0.00% | 620.347ms +0.63% | ±0.64% +129.03% |
| JsonExtractorBench | bench_extract_10k | 1 | 3 | 5.467mb +0.04% | 1.339s -0.75% | ±1.36% -18.31% |
| ParquetExtractorBench | bench_extract_10k | 1 | 3 | 86.480mb +0.00% | 952.203ms +1.53% | ±0.35% +300.69% |
| TextExtractorBench | bench_extract_10k | 1 | 3 | 4.607mb +0.00% | 38.587ms -0.80% | ±0.21% -66.80% |
| XmlExtractorBench | bench_extract_10k | 1 | 3 | 4.581mb +0.00% | 613.166ms +1.00% | ±1.53% +1974.90% |
+-----------------------+-------------------+------+-----+-----------------+------------------+------------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1 | 3 | 127.404mb +0.00% | 71.864ms +3.81% | ±0.60% +22.68% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| CSVLoaderBench | bench_load_10k | 1 | 3 | 64.050mb +0.00% | 106.250ms -0.25% | ±0.36% -58.26% |
| JsonLoaderBench | bench_load_10k | 1 | 3 | 84.082mb +0.00% | 99.996ms +2.33% | ±2.27% -11.41% |
| ParquetLoaderBench | bench_load_10k | 1 | 3 | 166.564mb +0.00% | 20.990s -0.97% | ±0.18% -87.07% |
| TextLoaderBench | bench_load_10k | 1 | 3 | 18.139mb +0.00% | 31.321ms +0.54% | ±0.56% +8.72% |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
Building Blocks
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| RowsBench | bench_chunk_10_on_10k | 2 | 3 | 97.067mb +0.00% | 3.952ms -0.30% | ±0.54% -62.40% |
| RowsBench | bench_diff_left_1k_on_10k | 2 | 3 | 114.423mb +0.00% | 184.831ms +1.24% | ±0.22% -76.05% |
| RowsBench | bench_diff_right_1k_on_10k | 2 | 3 | 97.143mb +0.00% | 18.793ms +2.64% | ±0.43% -76.27% |
| RowsBench | bench_drop_1k_on_10k | 2 | 3 | 97.941mb +0.00% | 2.060ms +10.43% | ±2.59% +186.69% |
| RowsBench | bench_drop_right_1k_on_10k | 2 | 3 | 97.941mb +0.00% | 2.070ms +2.61% | ±3.41% -0.62% |
| RowsBench | bench_entries_on_10k | 2 | 3 | 96.102mb +0.00% | 5.415ms +1.70% | ±3.43% +13.34% |
| RowsBench | bench_filter_on_10k | 2 | 3 | 96.631mb +0.00% | 19.683ms +18.55% | ±1.10% +5.32% |
| RowsBench | bench_find_on_10k | 2 | 3 | 96.631mb +0.00% | 19.665ms +17.31% | ±0.37% -68.51% |
| RowsBench | bench_find_one_on_10k | 10 | 3 | 95.322mb +0.00% | 2.006μs +0.30% | ±2.32% +0.00% |
| RowsBench | bench_first_on_10k | 10 | 3 | 95.322mb +0.00% | 0.500μs +25.00% | ±0.00% -100.00% |
| RowsBench | bench_flat_map_on_1k | 2 | 3 | 104.540mb +0.00% | 15.948ms +2.11% | ±2.34% -34.26% |
| RowsBench | bench_map_on_10k | 2 | 3 | 134.608mb +0.00% | 74.263ms +0.00% | ±0.34% -14.66% |
| RowsBench | bench_merge_1k_on_10k | 2 | 3 | 97.151mb +0.00% | 2.042ms +9.91% | ±1.35% -61.40% |
| RowsBench | bench_partition_by_on_10k | 2 | 3 | 100.522mb +0.00% | 65.286ms +3.69% | ±0.55% -39.53% |
| RowsBench | bench_remove_on_10k | 2 | 3 | 98.204mb +0.00% | 4.567ms -2.22% | ±2.15% -19.81% |
| RowsBench | bench_sort_asc_on_1k | 2 | 3 | 95.685mb +0.00% | 41.399ms +2.86% | ±1.74% +552.87% |
| RowsBench | bench_sort_by_on_1k | 2 | 3 | 95.685mb +0.00% | 41.250ms +0.80% | ±0.09% -92.65% |
| RowsBench | bench_sort_desc_on_1k | 2 | 3 | 95.685mb +0.00% | 41.767ms +1.14% | ±2.39% +54.37% |
| RowsBench | bench_sort_entries_on_1k | 2 | 3 | 97.763mb +0.00% | 8.737ms +3.44% | ±1.05% +5.83% |
| RowsBench | bench_sort_on_1k | 2 | 3 | 95.513mb +0.00% | 32.428ms +9.01% | ±2.06% +19.64% |
| RowsBench | bench_take_1k_on_10k | 10 | 3 | 95.322mb +0.00% | 14.921μs +5.68% | ±0.84% -15.76% |
| RowsBench | bench_take_right_1k_on_10k | 10 | 3 | 95.322mb +0.00% | 16.182μs -3.58% | ±0.88% -70.83% |
| RowsBench | bench_unique_on_1k | 2 | 3 | 114.424mb +0.00% | 189.912ms +3.12% | ±0.67% +50.93% |
| TypeDetectorBench | bench_type_detector | 1 | 3 | 44.048mb +0.00% | 462.880ms +0.25% | ±2.54% +128.49% |
| TypeDetectorBench | bench_type_detector | 1 | 3 | 11.858mb +0.00% | 96.232ms +2.52% | ±1.04% +235.18% |
| EntryFactoryBench | bench_entry_factory | 1 | 3 | 105.513mb +0.00% | 728.960ms +0.79% | ±1.32% +256.90% |
| EntryFactoryBench | bench_entry_factory | 1 | 3 | 55.023mb +0.00% | 366.820ms +0.52% | ±0.91% +324.84% |
| EntryFactoryBench | bench_entry_factory | 1 | 3 | 14.801mb +0.00% | 79.510ms +1.99% | ±0.28% +106.10% |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## 1.x #1608 +/- ##
=======================================
Coverage 83.21% 83.21%
=======================================
Files 703 703
Lines 19060 19063 +3
=======================================
+ Hits 15860 15863 +3
Misses 3200 3200
    /**
     * @lazy
     */
    public function renameAllCallback(\Closure $callback) : self
I'm not a fan of adding more callbacks to the data frame. I started working on parallel processing on the side, and each API method that accepts a callback is pure hell: in parallel processing I need to serialize "transformations" and spread them among many processes, which is extremely unstable with callbacks.
Even for map I had to add an exception stating that DataFrame::map() can't be used with parallel processing and that implementing a scalar function or transformation is recommended instead.
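To make the serialization concern concrete, here is a minimal sketch in plain PHP, independent of Flow's internals. It relies only on the fact that PHP refuses to serialize `Closure` instances, while an ordinary invokable object survives a serialize/unserialize round trip. The `UppercaseName` class is a hypothetical example and not part of Flow's API.

```php
<?php

declare(strict_types=1);

// Hypothetical serializable renaming strategy (illustration only, not Flow's API).
final class UppercaseName
{
    public function __invoke(string $name) : string
    {
        return \strtoupper($name);
    }
}

// The same logic expressed as a closure.
$closure = static fn (string $name) : string => \strtoupper($name);

// PHP throws when asked to serialize a Closure, so it cannot be shipped to another process this way.
try {
    \serialize($closure);
    $closureSerializable = true;
} catch (\Throwable $e) {
    $closureSerializable = false;
}

// An invokable object round-trips through serialize()/unserialize() and still works afterwards.
$restored = \unserialize(\serialize(new UppercaseName()));
$renamed = $restored('id');
```

This is why a dedicated transformation or scalar function class tends to be easier to distribute across worker processes than an arbitrary callback.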
Change Log
Added
Fixed
Changed
Removed
Deprecated
Security
Description
As shown in the test examples, a callback function makes it easy to apply custom renaming, e.g. transliterating non-Latin characters to Latin ones.
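A minimal sketch of such a renaming callback, written in plain PHP so it runs without Flow. The tiny Cyrillic-to-Latin map and the `rename_keys()` helper are illustrative assumptions, not code from this PR; the helper only mimics what a rename-all step might do to one row, assuming the callback receives an entry name and returns the new one.

```php
<?php

declare(strict_types=1);

// Illustrative, deliberately tiny Cyrillic-to-Latin map; a real one would cover the whole alphabet.
$map = ['г' => 'g', 'д' => 'd', 'и' => 'i', 'м' => 'm', 'о' => 'o', 'р' => 'r', 'я' => 'ya'];

// The renaming callback: takes an entry name and returns the new name.
$transliterate = static fn (string $name) : string => \strtr($name, $map);

/**
 * Hypothetical stand-in for a rename-all step, applied to a plain array row.
 */
function rename_keys(array $row, \Closure $callback) : array
{
    $renamed = [];

    foreach ($row as $name => $value) {
        // Each key is passed through the callback; values are left untouched.
        $renamed[$callback((string) $name)] = $value;
    }

    return $renamed;
}

$row = rename_keys(['имя' => 'Anna', 'город' => 'Kyiv'], $transliterate);
// $row now uses the keys 'imya' and 'gorod'
```

With the method proposed in this PR, the same closure would presumably be handed to `renameAllCallback($transliterate)`; the exact arguments the callback receives are not shown here, so treat that as an assumption.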