Commit 2a5be51
## Which issue does this PR close?
- Backport of #22632 to `branch-54`.
## Rationale for this change
Content-defined chunking (CDC) write options were added in #21110 and
are slated for the 54.0.0 release. This backports the refactor in #22632
so the config/proto surface ships in its final form, before the release
goes out.
Previously the CDC options were exposed as `use_content_defined_chunking:
Option<CdcOptions>`, with a `ConfigField` impl that accepted a bare
`use_content_defined_chunking = true|false` and otherwise enabled CDC
implicitly whenever any sub-field was set. This has a few problems:
- **Naming diverges from parquet-rs.** `WriterProperties` exposes
`content_defined_chunking()` /
`set_content_defined_chunking(Option<CdcOptions>)` with no `use_`
prefix.
- **Implicit / order-dependent on the SQL side.** Format options in
`COPY ... OPTIONS` / `CREATE EXTERNAL TABLE ... OPTIONS` are applied
from a `HashMap` (non-deterministic order). With the old bare-boolean
form, mixing `... = false` with a sub-field could resolve to enabled or
disabled depending on iteration order.
- **Extra machinery.** Supporting the bare boolean required hand-written
`ConfigField` impls and a `#[expect(clippy::should_implement_trait)]`
workaround, plus a zero-sentinel fallback in the proto mapping.
Since CDC is unreleased, the config/proto surface can still be changed
freely.
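
The order dependence described above can be illustrated with a small standalone model. This is only a sketch: `ChunkParams`, `NewCdc`, `apply_old`, `apply_new`, `old_enabled`, and `new_enabled` are hypothetical names, the sub-key spelling under the old field and the default value are assumptions, and none of this is the actual DataFusion `ConfigField` code. It only mimics the two resolution rules.

```rust
/// Simplified stand-in for the chunking parameters (one field is enough here).
#[derive(Debug, Clone, PartialEq)]
struct ChunkParams {
    min_chunk_size: usize,
}

impl Default for ChunkParams {
    fn default() -> Self {
        // Placeholder default chosen for the sketch, not taken from the PR.
        Self { min_chunk_size: 1024 }
    }
}

/// Old shape: `None` means CDC is off, `Some(_)` means it is on.
/// Setting any sub-field implicitly switches CDC on.
fn apply_old(state: &mut Option<ChunkParams>, key: &str, value: &str) {
    match key {
        "use_content_defined_chunking" => {
            *state = if value == "true" {
                Some(ChunkParams::default())
            } else {
                None
            };
        }
        "use_content_defined_chunking.min_chunk_size" => {
            let params = state.get_or_insert_with(ChunkParams::default);
            params.min_chunk_size = value.parse().expect("integer value");
        }
        _ => {}
    }
}

/// New shape: an explicit flag next to the parameters.
#[derive(Debug, Clone, PartialEq, Default)]
struct NewCdc {
    enabled: bool,
    params: ChunkParams,
}

/// Each key touches exactly one field, so application order cannot matter.
fn apply_new(state: &mut NewCdc, key: &str, value: &str) {
    match key {
        "content_defined_chunking.enabled" => state.enabled = value == "true",
        "content_defined_chunking.min_chunk_size" => {
            state.params.min_chunk_size = value.parse().expect("integer value");
        }
        _ => {}
    }
}

/// Apply the pairs in the given order and report whether CDC ends up on.
fn old_enabled(pairs: &[(&str, &str)]) -> bool {
    let mut state: Option<ChunkParams> = None;
    for (key, value) in pairs {
        apply_old(&mut state, key, value);
    }
    state.is_some()
}

fn new_enabled(pairs: &[(&str, &str)]) -> bool {
    let mut state = NewCdc::default();
    for (key, value) in pairs {
        apply_new(&mut state, key, value);
    }
    state.enabled
}
```

In this model, applying `= false` and then a sub-field leaves the old form enabled, while the reverse order leaves it disabled. The new form gives the same answer for both orders because the flag and the parameters are independent fields.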
## What changes are included in this PR?
- Rename the `ParquetOptions` field `use_content_defined_chunking` ->
`content_defined_chunking` (matches parquet-rs).
- Make `CdcOptions` a plain `config_namespace!` with an explicit
`enabled: bool` field alongside the chunking parameters; the field is a
bare `CdcOptions` (no longer `Option<CdcOptions>`). CDC is on iff
`content_defined_chunking.enabled` is true. Setting a parameter no
longer implicitly enables CDC, and the result is independent of key
order.
- Add `CdcOptions::enabled()` / `CdcOptions::disabled()` shorthand
constructors.
- Drop the hand-written `ConfigField` impls and the
`should_implement_trait` workaround; the macro now generates all of this.
- Add an `enabled` field to the proto `CdcOptions` message so the proto
<-> config mapping is a plain field copy in both directions.
- Update unit tests, regenerate config docs + the `information_schema`
snapshot, and add `parquet_cdc_config.slt` documenting the resolution
behavior.
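
As a rough picture of the resulting shape, the sketch below models the options struct, the two shorthand constructors, and a field-by-field proto mapping. It is a standalone approximation, not the `config_namespace!` output: the field types, the default values, `CdcOptionsProto`, `to_proto`, and `from_proto` are all assumptions made for illustration. Only the field names and the `enabled()` / `disabled()` constructors come from the description above.

```rust
/// Standalone model of the new options struct (field types are assumed).
#[derive(Debug, Clone, PartialEq)]
struct CdcOptions {
    enabled: bool,
    min_chunk_size: usize,
    max_chunk_size: usize,
    norm_level: i32,
}

impl Default for CdcOptions {
    fn default() -> Self {
        Self {
            // CDC stays off unless `enabled` is set explicitly.
            enabled: false,
            // Placeholder defaults for the sketch; the real values may differ.
            min_chunk_size: 256 * 1024,
            max_chunk_size: 1024 * 1024,
            norm_level: 0,
        }
    }
}

impl CdcOptions {
    /// Shorthand: default parameters with CDC switched on.
    fn enabled() -> Self {
        Self { enabled: true, ..Self::default() }
    }

    /// Shorthand: default parameters with CDC switched off.
    fn disabled() -> Self {
        Self::default()
    }
}

/// Hypothetical stand-in for the proto message with its own `enabled` field.
#[derive(Debug, Clone, PartialEq)]
struct CdcOptionsProto {
    enabled: bool,
    min_chunk_size: u64,
    max_chunk_size: u64,
    norm_level: i32,
}

/// Config -> proto: a plain field copy, no sentinel values needed.
fn to_proto(opts: &CdcOptions) -> CdcOptionsProto {
    CdcOptionsProto {
        enabled: opts.enabled,
        min_chunk_size: opts.min_chunk_size as u64,
        max_chunk_size: opts.max_chunk_size as u64,
        norm_level: opts.norm_level,
    }
}

/// Proto -> config: the same copy in the other direction.
fn from_proto(proto: &CdcOptionsProto) -> CdcOptions {
    CdcOptions {
        enabled: proto.enabled,
        min_chunk_size: proto.min_chunk_size as usize,
        max_chunk_size: proto.max_chunk_size as usize,
        norm_level: proto.norm_level,
    }
}
```

Because the flag travels as its own field, a disabled configuration with customized parameters survives a round trip unchanged in this model, which is the property the zero-sentinel fallback previously had to approximate.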
## Are these changes tested?
Yes — `datafusion-common` config + writer unit tests,
`datafusion-proto-common` proto round-trip tests, `datafusion/core`
parquet integration tests, and sqllogictest (`parquet_cdc.slt` + new
`parquet_cdc_config.slt`). Cherry-pick applied cleanly onto `branch-54`;
affected crates build and the CDC unit tests pass.
## Are there any user-facing changes?
Yes, but only to the unreleased CDC options:
- Config key `datafusion.execution.parquet.use_content_defined_chunking`
-> `datafusion.execution.parquet.content_defined_chunking.enabled` (plus
`.min_chunk_size` / `.max_chunk_size` / `.norm_level`).
- The bare-boolean form is removed; enable/disable via
`content_defined_chunking.enabled = true|false`.
No released API is affected.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
1 parent b508f1f commit 2a5be51
15 files changed
Lines changed: 461 additions & 449 deletions
File tree
- datafusion
  - common/src
    - file_options
  - core/tests/parquet
  - datasource-parquet/src
  - proto-common
    - proto
    - src
      - from_proto
      - generated
      - to_proto
  - proto/src
    - generated
    - logical_plan
  - sqllogictest/test_files
- docs/source/user-guide