Allow custom way of renaming values #1608

Closed
stloyd wants to merge 3 commits into flow-php:1.x from stloyd:rename-callback

Member

@stloyd stloyd commented Apr 28, 2025

Change Log

Added

  • Allow custom way of renaming values

Fixed

Changed

Removed

Deprecated

Security


Description

As shown in the test examples, we can easily apply custom renaming with a callback function, e.g., transliterating non-Latin characters to Latin ones.
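
A minimal sketch of the kind of callback described above, assuming the callback receives the current entry name and returns the new one (the PR only shows that the method accepts a `\Closure`). The character map and the `rename_keys()` helper are hypothetical and work on a plain array rather than on a Flow data frame:

```php
<?php

// Hypothetical, deliberately tiny transliteration map; a real one would
// cover the whole alphabet or delegate to a transliteration library.
$map = ['ż' => 'z', 'ó' => 'o', 'ł' => 'l', 'ć' => 'c', 'ę' => 'e'];

// The renaming callback: takes a name, returns the transliterated name.
$transliterate = static fn (string $name) : string => \strtr($name, $map);

// Hypothetical helper that applies the callback to every key of a row.
function rename_keys(array $row, \Closure $callback) : array
{
    $renamed = [];

    foreach ($row as $name => $value) {
        $renamed[$callback((string) $name)] = $value;
    }

    return $renamed;
}

$row = rename_keys(['żółć' => 1, 'imię' => 'Jan'], $transliterate);
// $row is now ['zolc' => 1, 'imie' => 'Jan']
```

With the method proposed here, the same closure would presumably be passed to `renameAllCallback()` instead of the helper.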

@stloyd stloyd requested a review from norberttech as a code owner April 28, 2025 13:03
Contributor

github-actions Bot commented Apr 28, 2025

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+-----------------+------------------+------------------+
| benchmark             | subject           | revs | its | mem_peak        | mode             | rstdev           |
+-----------------------+-------------------+------+-----+-----------------+------------------+------------------+
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 4.883mb +0.00%  | 620.347ms +0.63% | ±0.64% +129.03%  |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 5.467mb +0.04%  | 1.339s -0.75%    | ±1.36% -18.31%   |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 86.480mb +0.00% | 952.203ms +1.53% | ±0.35% +300.69%  |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.607mb +0.00%  | 38.587ms -0.80%  | ±0.21% -66.80%   |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.581mb +0.00%  | 613.166ms +1.00% | ±1.53% +1974.90% |
+-----------------------+-------------------+------+-----+-----------------+------------------+------------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 127.404mb +0.00% | 71.864ms +3.81% | ±0.60% +22.68% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev         |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 64.050mb +0.00%  | 106.250ms -0.25% | ±0.36% -58.26% |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 84.082mb +0.00%  | 99.996ms +2.33%  | ±2.27% -11.41% |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 166.564mb +0.00% | 20.990s -0.97%   | ±0.18% -87.07% |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 18.139mb +0.00%  | 31.321ms +0.54%  | ±0.56% +8.72%  |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
Building Blocks
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark         | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| RowsBench         | bench_chunk_10_on_10k      | 2    | 3   | 97.067mb +0.00%  | 3.952ms -0.30%   | ±0.54% -62.40%  |
| RowsBench         | bench_diff_left_1k_on_10k  | 2    | 3   | 114.423mb +0.00% | 184.831ms +1.24% | ±0.22% -76.05%  |
| RowsBench         | bench_diff_right_1k_on_10k | 2    | 3   | 97.143mb +0.00%  | 18.793ms +2.64%  | ±0.43% -76.27%  |
| RowsBench         | bench_drop_1k_on_10k       | 2    | 3   | 97.941mb +0.00%  | 2.060ms +10.43%  | ±2.59% +186.69% |
| RowsBench         | bench_drop_right_1k_on_10k | 2    | 3   | 97.941mb +0.00%  | 2.070ms +2.61%   | ±3.41% -0.62%   |
| RowsBench         | bench_entries_on_10k       | 2    | 3   | 96.102mb +0.00%  | 5.415ms +1.70%   | ±3.43% +13.34%  |
| RowsBench         | bench_filter_on_10k        | 2    | 3   | 96.631mb +0.00%  | 19.683ms +18.55% | ±1.10% +5.32%   |
| RowsBench         | bench_find_on_10k          | 2    | 3   | 96.631mb +0.00%  | 19.665ms +17.31% | ±0.37% -68.51%  |
| RowsBench         | bench_find_one_on_10k      | 10   | 3   | 95.322mb +0.00%  | 2.006μs +0.30%   | ±2.32% +0.00%   |
| RowsBench         | bench_first_on_10k         | 10   | 3   | 95.322mb +0.00%  | 0.500μs +25.00%  | ±0.00% -100.00% |
| RowsBench         | bench_flat_map_on_1k       | 2    | 3   | 104.540mb +0.00% | 15.948ms +2.11%  | ±2.34% -34.26%  |
| RowsBench         | bench_map_on_10k           | 2    | 3   | 134.608mb +0.00% | 74.263ms +0.00%  | ±0.34% -14.66%  |
| RowsBench         | bench_merge_1k_on_10k      | 2    | 3   | 97.151mb +0.00%  | 2.042ms +9.91%   | ±1.35% -61.40%  |
| RowsBench         | bench_partition_by_on_10k  | 2    | 3   | 100.522mb +0.00% | 65.286ms +3.69%  | ±0.55% -39.53%  |
| RowsBench         | bench_remove_on_10k        | 2    | 3   | 98.204mb +0.00%  | 4.567ms -2.22%   | ±2.15% -19.81%  |
| RowsBench         | bench_sort_asc_on_1k       | 2    | 3   | 95.685mb +0.00%  | 41.399ms +2.86%  | ±1.74% +552.87% |
| RowsBench         | bench_sort_by_on_1k        | 2    | 3   | 95.685mb +0.00%  | 41.250ms +0.80%  | ±0.09% -92.65%  |
| RowsBench         | bench_sort_desc_on_1k      | 2    | 3   | 95.685mb +0.00%  | 41.767ms +1.14%  | ±2.39% +54.37%  |
| RowsBench         | bench_sort_entries_on_1k   | 2    | 3   | 97.763mb +0.00%  | 8.737ms +3.44%   | ±1.05% +5.83%   |
| RowsBench         | bench_sort_on_1k           | 2    | 3   | 95.513mb +0.00%  | 32.428ms +9.01%  | ±2.06% +19.64%  |
| RowsBench         | bench_take_1k_on_10k       | 10   | 3   | 95.322mb +0.00%  | 14.921μs +5.68%  | ±0.84% -15.76%  |
| RowsBench         | bench_take_right_1k_on_10k | 10   | 3   | 95.322mb +0.00%  | 16.182μs -3.58%  | ±0.88% -70.83%  |
| RowsBench         | bench_unique_on_1k         | 2    | 3   | 114.424mb +0.00% | 189.912ms +3.12% | ±0.67% +50.93%  |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 44.048mb +0.00%  | 462.880ms +0.25% | ±2.54% +128.49% |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 11.858mb +0.00%  | 96.232ms +2.52%  | ±1.04% +235.18% |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 105.513mb +0.00% | 728.960ms +0.79% | ±1.32% +256.90% |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 55.023mb +0.00%  | 366.820ms +0.52% | ±0.91% +324.84% |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.801mb +0.00%  | 79.510ms +1.99%  | ±0.28% +106.10% |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+


codecov Bot commented Apr 28, 2025

Codecov Report

Attention: Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.

Project coverage is 83.21%. Comparing base (7814d70) to head (a3c54a6).
Report is 14 commits behind head on 1.x.

Additional details and impacted files
@@           Coverage Diff           @@
##              1.x    #1608   +/-   ##
=======================================
  Coverage   83.21%   83.21%           
=======================================
  Files         703      703           
  Lines       19060    19063    +3     
=======================================
+ Hits        15860    15863    +3     
  Misses       3200     3200           
+-----------------------------+-----------------+-------------+
| Components                  | Coverage        | Δ           |
+-----------------------------+-----------------+-------------+
| etl                         | 86.24% <90.90%> | (-0.01%) ⬇️ |
| cli                         | 84.59% <ø>      | (ø)         |
| lib-array-dot               | 94.53% <ø>      | (ø)         |
| lib-azure-sdk               | 62.56% <ø>      | (ø)         |
| lib-doctrine-dbal-bulk      | 90.11% <ø>      | (ø)         |
| lib-filesystem              | 78.02% <ø>      | (ø)         |
| lib-parquet                 | 84.36% <ø>      | (ø)         |
| lib-parquet-viewer          | 82.02% <ø>      | (ø)         |
| lib-snappy                  | 91.16% <ø>      | (+0.46%) ⬆️ |
| bridge-filesystem-async-aws | 90.38% <ø>      | (ø)         |
| bridge-filesystem-azure     | 89.92% <ø>      | (ø)         |
| bridge-monolog-http         | 96.38% <ø>      | (ø)         |
| symfony-http-foundation     | 74.41% <ø>      | (ø)         |
| adapter-chartjs             | 86.45% <ø>      | (ø)         |
| adapter-csv                 | 89.57% <ø>      | (ø)         |
| adapter-doctrine            | 89.51% <ø>      | (ø)         |
| adapter-elasticsearch       | 97.19% <ø>      | (ø)         |
| adapter-google-sheet        | 80.00% <ø>      | (ø)         |
| adapter-http                | 59.15% <ø>      | (ø)         |
| adapter-json                | 90.62% <ø>      | (ø)         |
| adapter-logger              | 53.84% <ø>      | (ø)         |
| adapter-meilisearch         | 97.75% <ø>      | (ø)         |
| adapter-parquet             | 80.85% <ø>      | (ø)         |
| adapter-text                | 84.44% <ø>      | (ø)         |
| adapter-xml                 | 83.15% <ø>      | (ø)         |
+-----------------------------+-----------------+-------------+

/**
* @lazy
*/
public function renameAllCallback(\Closure $callback) : self
Member

I'm not a fan of adding more callbacks to the data frame. I started working on parallel processing on the side, and every API method that accepts a callback is pure hell there: in parallel processing I need to serialize "transformations" and spread them among many processes, which is extremely unstable with callbacks.

Even for map I had to add an exception stating that DataFrame::map() can't be used with parallel processing and that it's recommended to implement a scalar function or transformation instead.
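
To illustrate the serialization concern in plain PHP (this is only a sketch, not how Flow's parallel processing is implemented): native `serialize()` refuses closures, while a small class with the same logic round-trips without trouble. The `UppercaseName` class is a hypothetical stand-in for a dedicated transformation or scalar function:

```php
<?php

// Hypothetical stand-in for a dedicated transformation: plain, stateless,
// and therefore safe to serialize and send to another process.
final class UppercaseName
{
    public function __invoke(string $name) : string
    {
        return \strtoupper($name);
    }
}

// The same logic written as a closure.
$callback = static fn (string $name) : string => \strtoupper($name);

try {
    \serialize($callback);
    $closureSerializable = true;
} catch (\Exception $e) {
    // PHP throws here: closures cannot be serialized natively.
    $closureSerializable = false;
}

// The class instance survives a serialize/unserialize round trip.
$restored = \unserialize(\serialize(new UppercaseName()));
$result = $restored('id');
```

Third-party closure serializers exist, but they depend on reflecting over the closure's source code, which is consistent with the instability described above.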

@stloyd stloyd closed this May 3, 2025