Commit d88ab6a
refactor: give parquet CDC options an explicit enabled flag (#22632)
## Which issue does this PR close?
- None
## Rationale for this change
The CDC options are currently modeled as `use_content_defined_chunking:
Option<CdcOptions>`, with a `ConfigField` impl that accepts a bare
`use_content_defined_chunking = true|false` and otherwise enables CDC
implicitly whenever any sub-field is set. This has a few problems:
- **Naming diverges from parquet-rs.** `WriterProperties` exposes
`content_defined_chunking()` /
`set_content_defined_chunking(Option<CdcOptions>)` with no `use_`
prefix.
- **Implicit / order-dependent on the SQL side.** Format options in
`COPY ... OPTIONS` / `CREATE EXTERNAL TABLE ... OPTIONS` are applied
from a `HashMap` (non-deterministic order). With the old bare-boolean
form, mixing `... = false` with a sub-field, or setting a sub-field
after `= false`, could resolve to enabled or disabled depending on
iteration order.
- **Extra machinery.** Supporting the bare boolean required a
hand-written `impl ConfigField for CdcOptions` + `impl ConfigField for
Option<CdcOptions>` and a `#[expect(clippy::should_implement_trait)]`
workaround, plus a zero-sentinel fallback in the proto mapping.
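The order dependence described above can be reproduced with a small Rust model of the old rules. `OldParams`, `apply_old`, and `resolve_old` are hypothetical names that only mimic the behavior described here (the bare boolean replaces the whole `Option`, and any sub-field implicitly turns it into `Some`); the real code used hand-written `ConfigField` impls. The two arrays stand in for two iteration orders a `HashMap` might produce, fixed so the outcome is reproducible.

```rust
// Hypothetical model of the OLD resolution rules, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct OldParams {
    min_chunk_size: Option<usize>,
}

// Apply one `key = value` pair under the old rules:
// - the bare boolean replaces the whole option (`false` => `None`),
// - any sub-field implicitly turns CDC on (`None` => `Some(..)`).
fn apply_old(state: &mut Option<OldParams>, key: &str, value: &str) {
    match key {
        "use_content_defined_chunking" => {
            *state = if value == "true" {
                Some((*state).unwrap_or_default())
            } else {
                None
            };
        }
        "use_content_defined_chunking.min_chunk_size" => {
            let params = state.get_or_insert_with(OldParams::default);
            params.min_chunk_size = Some(value.parse().expect("invalid size"));
        }
        other => panic!("unknown key: {other}"),
    }
}

// Fold the pairs in the order given, like iterating a map.
fn resolve_old(pairs: &[(&str, &str)]) -> Option<OldParams> {
    let mut state = None;
    for (key, value) in pairs {
        apply_old(&mut state, key, value);
    }
    state
}

fn main() {
    // The same two options, visited in two different orders.
    let a = [
        ("use_content_defined_chunking", "false"),
        ("use_content_defined_chunking.min_chunk_size", "65536"),
    ];
    let b = [a[1], a[0]];

    let enabled_a = resolve_old(&a).is_some();
    let enabled_b = resolve_old(&b).is_some();
    println!("order A enabled: {enabled_a}, order B enabled: {enabled_b}");
    // Same input, different outcome: the result depends on iteration order.
    assert_ne!(enabled_a, enabled_b);
}
```

In order A the sub-field arrives last and re-enables CDC; in order B the `false` arrives last and disables it.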
Since CDC is unreleased, the config/proto surface can still be changed
freely.
## What changes are included in this PR?
- Rename the `ParquetOptions` field `use_content_defined_chunking` ->
`content_defined_chunking` (matches parquet-rs).
- Make `CdcOptions` a plain `config_namespace!` with an explicit
`enabled: bool` field alongside the chunking parameters; the field is now a
bare `CdcOptions` (no longer `Option<CdcOptions>`). CDC is enabled exactly
when `content_defined_chunking.enabled` is true. Setting a parameter no
longer implicitly enables CDC, and the result is independent of key
order.
- Add `CdcOptions::enabled()` / `CdcOptions::disabled()` shorthand
constructors.
- Drop the hand-written `ConfigField` impls (the macro now generates them)
and the `should_implement_trait` workaround.
- Add an `enabled` field to the proto `CdcOptions` message so the proto
<-> config mapping is a plain field copy in both directions (removes the
presence-encoding and the zero-sentinel fallback).
- Update unit tests, regenerate config docs + the `information_schema`
snapshot, and add `parquet_cdc_config.slt` documenting the resolution
behavior.
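A minimal Rust sketch of the new shape, assuming a hand-written stand-in rather than the real `config_namespace!`-generated type. `CdcOptions` here only mirrors the field names from the config keys; `set`, `resolve`, and `CdcOptionsProto` are hypothetical, and the default values and proto field types are placeholders. It shows that an explicit `enabled` field makes the resolved options independent of key order, that setting a parameter leaves CDC off, and that the proto mapping becomes a plain field copy.

```rust
use std::str::FromStr;

// Stand-in for the macro-generated `CdcOptions`. Keys are relative to
// `datafusion.execution.parquet.content_defined_chunking.`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CdcOptions {
    enabled: bool,
    min_chunk_size: usize,
    max_chunk_size: usize,
    norm_level: i32,
}

impl Default for CdcOptions {
    // Placeholder defaults, not DataFusion's real values.
    fn default() -> Self {
        Self { enabled: false, min_chunk_size: 256 * 1024, max_chunk_size: 1024 * 1024, norm_level: 0 }
    }
}

// Parse a value, naming the offending key on failure.
fn parse<T: FromStr>(key: &str, value: &str) -> Result<T, String>
where
    T::Err: std::fmt::Display,
{
    value.parse::<T>().map_err(|e| format!("invalid value for {key}: {e}"))
}

impl CdcOptions {
    // Shorthand constructors analogous to `enabled()` / `disabled()`.
    fn enabled() -> Self {
        Self { enabled: true, ..Self::default() }
    }
    fn disabled() -> Self {
        Self::default()
    }

    // Each key writes exactly one field and nothing else, so applying the
    // same keys in any order yields the same struct.
    fn set(&mut self, key: &str, value: &str) -> Result<(), String> {
        match key {
            "enabled" => self.enabled = parse(key, value)?,
            "min_chunk_size" => self.min_chunk_size = parse(key, value)?,
            "max_chunk_size" => self.max_chunk_size = parse(key, value)?,
            "norm_level" => self.norm_level = parse(key, value)?,
            other => return Err(format!("unknown key: {other}")),
        }
        Ok(())
    }
}

// Hypothetical stand-in for the generated proto message (field types assumed).
#[derive(Debug, Clone, Copy, PartialEq)]
struct CdcOptionsProto {
    enabled: bool,
    min_chunk_size: u64,
    max_chunk_size: u64,
    norm_level: i32,
}

// With `enabled` on the wire, both directions are plain field copies.
impl From<CdcOptions> for CdcOptionsProto {
    fn from(o: CdcOptions) -> Self {
        Self { enabled: o.enabled, min_chunk_size: o.min_chunk_size as u64,
               max_chunk_size: o.max_chunk_size as u64, norm_level: o.norm_level }
    }
}

impl From<CdcOptionsProto> for CdcOptions {
    fn from(p: CdcOptionsProto) -> Self {
        Self { enabled: p.enabled, min_chunk_size: p.min_chunk_size as usize,
               max_chunk_size: p.max_chunk_size as usize, norm_level: p.norm_level }
    }
}

fn resolve(pairs: &[(&str, &str)]) -> Result<CdcOptions, String> {
    let mut opts = CdcOptions::default();
    for (key, value) in pairs {
        opts.set(key, value)?;
    }
    Ok(opts)
}

fn main() -> Result<(), String> {
    let a = [("enabled", "false"), ("min_chunk_size", "65536")];
    let b = [a[1], a[0]];
    let (ra, rb) = (resolve(&a)?, resolve(&b)?);
    assert_eq!(ra, rb); // same result in either order
    assert!(!ra.enabled); // a parameter alone does not enable CDC
    assert!(!CdcOptions::disabled().enabled);

    // Round-trip through the proto stand-in, including a negative norm level.
    let mut on = CdcOptions::enabled();
    on.set("norm_level", "-1")?;
    assert_eq!(CdcOptions::from(CdcOptionsProto::from(on)), on);
    println!("{ra:?}");
    Ok(())
}
```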
## Are these changes tested?
Yes:
- `datafusion-common` config + writer unit tests (enable toggle,
parameter-does-not-enable, validation, writer round-trip).
- `datafusion-proto-common` proto round-trip tests (enabled / disabled /
negative norm level).
- `datafusion/core` parquet integration tests (data round-trip, page
boundaries).
- sqllogictest: `parquet_cdc.slt` (end-to-end) and a new
`parquet_cdc_config.slt` (config resolution / order independence).
## Are there any user-facing changes?
Yes, but only to the unreleased CDC options:
- Config key `datafusion.execution.parquet.use_content_defined_chunking`
-> `datafusion.execution.parquet.content_defined_chunking.enabled` (plus
`.min_chunk_size` / `.max_chunk_size` / `.norm_level`).
- The bare-boolean form is removed; enable/disable via
`content_defined_chunking.enabled = true|false`.
No released API is affected.
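For option maps written against the old keys, the rename can be applied mechanically. A sketch, assuming the old parameter keys used the `use_content_defined_chunking.` prefix; `migrate` is a hypothetical helper, not part of DataFusion. Because a parameter no longer enables CDC on its own, the helper adds an explicit `.enabled = true` when the input relied on that implicit behavior.

```rust
// Hypothetical migration helper; the old sub-key prefix is an assumption.
const OLD: &str = "datafusion.execution.parquet.use_content_defined_chunking";
const NEW: &str = "datafusion.execution.parquet.content_defined_chunking";

fn migrate(pairs: &[(&str, &str)]) -> Vec<(String, String)> {
    let mut out = Vec::new();
    let mut saw_bool = false;
    let mut saw_param = false;
    for &(key, value) in pairs {
        if key == OLD {
            // Bare boolean becomes the explicit `enabled` field.
            saw_bool = true;
            out.push((format!("{}.enabled", NEW), value.to_string()));
        } else if let Some(field) = key.strip_prefix(OLD).and_then(|r| r.strip_prefix('.')) {
            // Parameter keys keep their suffix under the new prefix.
            saw_param = true;
            out.push((format!("{}.{}", NEW, field), value.to_string()));
        } else {
            // Unrelated keys pass through unchanged.
            out.push((key.to_string(), value.to_string()));
        }
    }
    // Old rule: a parameter alone implicitly enabled CDC. New rule: it does
    // not, so make that explicit when no boolean was given.
    if saw_param && !saw_bool {
        out.push((format!("{}.enabled", NEW), "true".to_string()));
    }
    out
}

fn main() {
    let migrated = migrate(&[(
        "datafusion.execution.parquet.use_content_defined_chunking.min_chunk_size",
        "65536",
    )]);
    for (key, value) in &migrated {
        println!("{key} = {value}");
    }
}
```

When the input mixes the bare boolean with parameters, the helper keeps the explicit boolean, so the previously order-dependent combination now resolves deterministically to that value.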
🤖 Generated with [Claude Code](https://claude.com/claude-code)

1 parent 533ef35
15 files changed, with 470 additions and 459 deletions, in:
- datafusion
  - common/src
    - file_options
  - core/tests/parquet
  - datasource-parquet/src
  - proto-common
    - proto
    - src
      - from_proto
      - generated
      - to_proto
  - proto-models/src/generated
  - proto/src/logical_plan
  - sqllogictest/test_files
- docs/source/user-guide