Commit b1049ac

feat: add FieldMismatchStrategy to replace allowExtraFields/allowMissingFields
Introduce a FieldMismatchStrategy enum (STRICT, IGNORE, SKIP) that provides fine-grained control over how records with mismatched field counts are handled. SKIP silently drops non-conforming records, which was previously impossible without manual filtering.

New builder methods: extraFieldStrategy() and missingFieldStrategy(). The old allowExtraFields(boolean) and allowMissingFields(boolean) methods are deprecated (forRemoval) and delegate to the new API.

Closes #176
1 parent 97241f7 commit b1049ac

18 files changed

Lines changed: 223 additions & 68 deletions
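
The behavior described in the commit message can be modelled in a few lines. The sketch below is a standalone illustration and not FastCSV's implementation: the class `FieldMismatchSketch`, its nested `Strategy` enum, and the `apply` method are hypothetical names. It assumes, as the tests in this commit suggest, that the first record fixes the expected field count.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone model of the three strategies; NOT FastCSV's implementation. */
final class FieldMismatchSketch {

    /** Hypothetical stand-in for the FieldMismatchStrategy enum named in the commit. */
    enum Strategy { STRICT, IGNORE, SKIP }

    /**
     * Applies one strategy to records with too many fields and another to
     * records with too few. Assumption: the first record defines the
     * expected field count.
     */
    static List<List<String>> apply(final List<List<String>> records,
                                    final Strategy extra,
                                    final Strategy missing) {
        final List<List<String>> result = new ArrayList<>();
        int expected = -1; // unknown until the first record is seen

        for (final List<String> rec : records) {
            if (expected < 0) {
                expected = rec.size();
            }
            if (rec.size() == expected) {
                result.add(rec);
                continue;
            }

            final Strategy strategy = rec.size() > expected ? extra : missing;
            if (strategy == Strategy.STRICT) {
                throw new IllegalStateException(
                    "Expected " + expected + " fields but found " + rec.size());
            }
            if (strategy == Strategy.IGNORE) {
                result.add(rec); // keep the record as it is
            }
            // SKIP: drop the record silently
        }
        return result;
    }

    public static void main(final String[] args) {
        final List<List<String>> data = List.of(
            List.of("a", "b"), List.of("c"), List.of("d", "e", "f"), List.of("g", "h"));

        // Same input and strategies as the skipExtraIgnoreMissing test in this commit.
        System.out.println(apply(data, Strategy.SKIP, Strategy.IGNORE));
        // prints [[a, b], [c], [g, h]]
    }
}
```

Keeping the extra and missing cases as separate parameters mirrors the two builder methods, so a caller can, for example, drop over-long records while still accepting short ones.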

docs/src/content/docs/architecture/interpretation.md

Lines changed: 2 additions & 2 deletions
@@ -98,8 +98,8 @@ record. However, this is just an assumption. Field `value_c_2` does not even hav
 To ensure no misinterpretation, FastCSV does not allow extra or missing fields in a record by default.
 This means that the above example would result in a `CsvParseException` when reading it with FastCSV.

-However, this behavior can be changed by setting `CsvReaderBuilder.allowExtraFields(boolean)`
-and `CsvReaderBuilder.allowMissingFields(boolean)` to `true`.
+However, this behavior can be changed by setting `CsvReaderBuilder.extraFieldStrategy(FieldMismatchStrategy)`
+and `CsvReaderBuilder.missingFieldStrategy(FieldMismatchStrategy)` to `IGNORE` or `SKIP`.

 ### Empty lines

docs/src/content/docs/guides/Examples/skip-non-csv-head.mdx

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ The main problem with those files is:

 - When working with named fields, the very first line (`This is an example of a CSV file that contains`)
   would be interpreted as the actual header line.
-- An exception would be thrown unless the options `allowExtraFields(true)` is set, as some lines have
+- An exception would be thrown with the default `extraFieldStrategy(FieldMismatchStrategy.STRICT)`, as some lines have
   more fields than the first line.

 FastCSV comes with two features to handle such files:

docs/src/content/docs/guides/basic.md

Lines changed: 2 additions & 2 deletions
@@ -52,8 +52,8 @@ CsvReader.builder()
     .commentStrategy(CommentStrategy.SKIP)
     .commentCharacter('#')
     .skipEmptyLines(true)
-    .allowExtraFields(false)
-    .allowMissingFields(false)
+    .extraFieldStrategy(FieldMismatchStrategy.STRICT)
+    .missingFieldStrategy(FieldMismatchStrategy.STRICT)
     .allowExtraCharsAfterClosingQuote(false)
     .detectBomHeader(false)
     .maxBufferSize(16777216);

docs/src/content/docs/guides/upgrading.md

Lines changed: 3 additions & 3 deletions
@@ -34,13 +34,13 @@ As the default has changed, you may need to check your code and your desired beh

 FastCSV 4.x no longer ignores different field counts by default, ensuring that data is not misinterpreted.

-You can change this behavior by calling `allowExtraFields(true)` and `allowMissingFields(true)` in the `CsvReaderBuilder`.
+You can change this behavior by calling `extraFieldStrategy(FieldMismatchStrategy.IGNORE)` and `missingFieldStrategy(FieldMismatchStrategy.IGNORE)` in the `CsvReaderBuilder`.
 These methods provide more control over how to handle different field counts in CSV data than the previous (now removed) `ignoreDifferentFieldCount()` method.

 ```java title="Example"
 CsvReaderBuilder builder = CsvReader.builder()
-    .allowExtraFields(true)
-    .allowMissingFields(true);
+    .extraFieldStrategy(FieldMismatchStrategy.IGNORE)
+    .missingFieldStrategy(FieldMismatchStrategy.IGNORE);

 try (CsvReader<CsvRecord> csv = builder.ofCsvRecord(csvFile)) {
     // ...

example/src/main/java/ExampleCsvReaderWithFaultyData.java

Lines changed: 11 additions & 3 deletions
@@ -1,5 +1,6 @@
 import de.siegmar.fastcsv.reader.CsvParseException;
 import de.siegmar.fastcsv.reader.CsvReader;
+import de.siegmar.fastcsv.reader.FieldMismatchStrategy;

 /// Example for reading CSV data with faulty (or ambiguous) data.
 ///
@@ -22,10 +23,17 @@ void main() {
         e.printStackTrace(System.out);
     }

-    IO.println("Reading data while not ignoring different field counts:");
+    IO.println("Reading data while ignoring different field counts:");
     CsvReader.builder()
-        .allowExtraFields(true)
-        .allowMissingFields(true)
+        .extraFieldStrategy(FieldMismatchStrategy.IGNORE)
+        .missingFieldStrategy(FieldMismatchStrategy.IGNORE)
+        .ofCsvRecord(data)
+        .forEach(IO::println);
+
+    IO.println("Reading data while skipping records with different field counts:");
+    CsvReader.builder()
+        .extraFieldStrategy(FieldMismatchStrategy.SKIP)
+        .missingFieldStrategy(FieldMismatchStrategy.SKIP)
         .ofCsvRecord(data)
         .forEach(IO::println);
 }

lib/src/intTest/java/blackbox/reader/AbstractCsvReaderTest.java

Lines changed: 66 additions & 11 deletions
@@ -2,7 +2,6 @@

 import static org.assertj.core.api.Assertions.assertThat;
 import static org.assertj.core.api.Assertions.assertThatCode;
-import static org.assertj.core.api.Assertions.assertThatNoException;
 import static org.assertj.core.api.Assertions.assertThatThrownBy;

 import java.io.CharArrayReader;
@@ -29,6 +28,7 @@
 import de.siegmar.fastcsv.reader.CsvReader;
 import de.siegmar.fastcsv.reader.CsvRecord;
 import de.siegmar.fastcsv.reader.CsvRecordHandler;
+import de.siegmar.fastcsv.reader.FieldMismatchStrategy;
 import de.siegmar.fastcsv.reader.FieldModifiers;
 import testutil.CsvRecordAssert;

@@ -120,10 +120,11 @@ void immutableResponse() {
             .isInstanceOf(UnsupportedOperationException.class);
     }

-    // allow extra fields
+    // extra field strategy

     @Test
-    void allowNoExtraFields() {
+    void extraFieldStrategyStrict() {
+        crb.extraFieldStrategy(FieldMismatchStrategy.STRICT);
         assertThatThrownBy(() -> readAll("foo\nfoo,bar"))
             .isInstanceOf(CsvParseException.class)
             .hasMessage("Exception when reading record that started in line 2")
@@ -132,15 +133,34 @@ void allowNoExtraFields() {
     }

     @Test
-    void allowExtraFields() {
-        crb.allowExtraFields(true);
-        assertThatNoException().isThrownBy(() -> readAll("foo\nfoo,bar"));
+    void extraFieldStrategyIgnore() {
+        crb.extraFieldStrategy(FieldMismatchStrategy.IGNORE);
+        assertThat(readAll("foo\nfoo,bar"))
+            .satisfiesExactly(
+                item1 -> CsvRecordAssert.assertThat(item1)
+                    .fields().containsExactly("foo"),
+                item2 -> CsvRecordAssert.assertThat(item2)
+                    .fields().containsExactly("foo", "bar")
+            );
+    }
+
+    @Test
+    void extraFieldStrategySkip() {
+        crb.extraFieldStrategy(FieldMismatchStrategy.SKIP);
+        assertThat(readAll("foo\nfoo,bar\nbaz"))
+            .satisfiesExactly(
+                item1 -> CsvRecordAssert.assertThat(item1)
+                    .fields().containsExactly("foo"),
+                item2 -> CsvRecordAssert.assertThat(item2)
+                    .fields().containsExactly("baz")
+            );
     }

-    // allow missing fields
+    // missing field strategy

     @Test
-    void allowNoMissingFields() {
+    void missingFieldStrategyStrict() {
+        crb.missingFieldStrategy(FieldMismatchStrategy.STRICT);
         assertThatThrownBy(() -> readAll("foo,bar\nfoo"))
             .isInstanceOf(CsvParseException.class)
             .hasMessage("Exception when reading record that started in line 2")
@@ -149,9 +169,44 @@ void allowNoMissingFields() {
     }

     @Test
-    void allowMissingFields() {
-        crb.allowMissingFields(true);
-        assertThatNoException().isThrownBy(() -> readAll("foo,bar\nfoo"));
+    void missingFieldStrategyIgnore() {
+        crb.missingFieldStrategy(FieldMismatchStrategy.IGNORE);
+        assertThat(readAll("foo,bar\nfoo"))
+            .satisfiesExactly(
+                item1 -> CsvRecordAssert.assertThat(item1)
+                    .fields().containsExactly("foo", "bar"),
+                item2 -> CsvRecordAssert.assertThat(item2)
+                    .fields().containsExactly("foo")
+            );
+    }
+
+    @Test
+    void missingFieldStrategySkip() {
+        crb.missingFieldStrategy(FieldMismatchStrategy.SKIP);
+        assertThat(readAll("foo,bar\nbaz\nfoo,bar"))
+            .satisfiesExactly(
+                item1 -> CsvRecordAssert.assertThat(item1)
+                    .fields().containsExactly("foo", "bar"),
+                item2 -> CsvRecordAssert.assertThat(item2)
+                    .fields().containsExactly("foo", "bar")
+            );
+    }
+
+    // combined strategies
+
+    @Test
+    void skipExtraIgnoreMissing() {
+        crb.extraFieldStrategy(FieldMismatchStrategy.SKIP)
+            .missingFieldStrategy(FieldMismatchStrategy.IGNORE);
+        assertThat(readAll("a,b\nc\nd,e,f\ng,h"))
+            .satisfiesExactly(
+                item1 -> CsvRecordAssert.assertThat(item1)
+                    .fields().containsExactly("a", "b"),
+                item2 -> CsvRecordAssert.assertThat(item2)
+                    .fields().containsExactly("c"),
+                item3 -> CsvRecordAssert.assertThat(item3)
+                    .fields().containsExactly("g", "h")
+            );
     }

     // allow extra characters after closing quotes

lib/src/intTest/java/blackbox/reader/AbstractSkipLinesTest.java

Lines changed: 3 additions & 2 deletions
@@ -19,6 +19,7 @@
 import de.siegmar.fastcsv.reader.CsvReader;
 import de.siegmar.fastcsv.reader.CsvRecord;
 import de.siegmar.fastcsv.reader.CsvRecordHandler;
+import de.siegmar.fastcsv.reader.FieldMismatchStrategy;
 import de.siegmar.fastcsv.reader.FieldModifiers;
 import de.siegmar.fastcsv.reader.NamedCsvRecord;
 import testutil.CsvRecordAssert;
@@ -55,15 +56,15 @@ void multipleRecordsNoSkipEmpty() {
     @ParameterizedTest
     @ValueSource(strings = {",\nfoo\n", ",,\nfoo\n", "''\nfoo\n", "' '\nfoo\n"})
     void notEmpty(final String input) {
-        crb.allowMissingFields(true).quoteCharacter('\'');
+        crb.missingFieldStrategy(FieldMismatchStrategy.IGNORE).quoteCharacter('\'');
         final CsvRecordHandler cbh = CsvRecordHandler.of(c -> c.fieldModifier(FieldModifiers.TRIM));
         assertThat(crb.build(cbh, input).stream()).hasSize(2);
     }

     @ParameterizedTest
     @ValueSource(strings = {",\nfoo\n", ",,\nfoo\n", "''\nfoo\n", "' '\nfoo\n"})
     void notEmptyCustomCallback(final String input) {
-        crb.allowMissingFields(true).quoteCharacter('\'');
+        crb.missingFieldStrategy(FieldMismatchStrategy.IGNORE).quoteCharacter('\'');
         final AbstractBaseCsvCallbackHandler<String[]> cbh = new AbstractBaseCsvCallbackHandler<>() {
             private final List<String> fields = new ArrayList<>();

lib/src/intTest/java/blackbox/reader/CsvReaderBuilderTest.java

Lines changed: 20 additions & 3 deletions
@@ -19,6 +19,7 @@
 import de.siegmar.fastcsv.reader.CommentStrategy;
 import de.siegmar.fastcsv.reader.CsvReader;
 import de.siegmar.fastcsv.reader.CsvRecord;
+import de.siegmar.fastcsv.reader.FieldMismatchStrategy;
 import testutil.CsvRecordAssert;

 @SuppressWarnings("PMD.CloseResource")
@@ -79,7 +80,7 @@ void builderToString() {
             .isEqualTo("""
                 CsvReaderBuilder[fieldSeparator=,, quoteCharacter=", \
                 commentStrategy=NONE, commentCharacter=#, skipEmptyLines=true, \
-                allowExtraFields=false, allowMissingFields=false, allowExtraCharsAfterClosingQuote=false, \
+                extraFieldStrategy=STRICT, missingFieldStrategy=STRICT, allowExtraCharsAfterClosingQuote=false, \
                 trimWhitespacesAroundQuotes=false, detectBomHeader=false, maxBufferSize=16777216]""");
     }

@@ -128,12 +129,28 @@ void chained() {
             .commentStrategy(CommentStrategy.NONE)
             .commentCharacter('#')
             .skipEmptyLines(true)
-            .allowExtraFields(false)
-            .allowMissingFields(false)
+            .extraFieldStrategy(FieldMismatchStrategy.STRICT)
+            .missingFieldStrategy(FieldMismatchStrategy.STRICT)
             .allowExtraCharsAfterClosingQuote(false)
             .ofCsvRecord("foo");

         assertThat(reader).isNotNull();
     }

+    @SuppressWarnings("removal")
+    @Test
+    void deprecatedAllowExtraFields() {
+        assertThat(CsvReader.builder().allowExtraFields(true)
+            .ofCsvRecord("foo\nfoo,bar").stream())
+            .hasSize(2);
+    }
+
+    @SuppressWarnings("removal")
+    @Test
+    void deprecatedAllowMissingFields() {
+        assertThat(CsvReader.builder().allowMissingFields(true)
+            .ofCsvRecord("foo,bar\nfoo").stream())
+            .hasSize(2);
+    }
+
 }
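
The builder source itself is not part of the diff shown here, so the following is only a guess at what "delegate to the new API" could look like. It is a standalone sketch with hypothetical names (`DeprecatedDelegationSketch`, `Builder`, `Strategy`, `currentExtra()`, `currentMissing()`). The mapping of `true` to `IGNORE` and `false` to `STRICT` is an assumption, chosen because it is consistent with the two deprecated-method tests above and with the one-to-one replacements made elsewhere in this commit.

```java
import java.util.Objects;

/** Hypothetical model of a boolean setter delegating to an enum setter. */
final class DeprecatedDelegationSketch {

    /** Stand-in for the FieldMismatchStrategy enum named in the commit. */
    enum Strategy { STRICT, IGNORE, SKIP }

    static final class Builder {
        // STRICT as the default matches the builderToString test in this commit.
        private Strategy extra = Strategy.STRICT;
        private Strategy missing = Strategy.STRICT;

        Builder extraFieldStrategy(final Strategy strategy) {
            this.extra = Objects.requireNonNull(strategy, "strategy must not be null");
            return this;
        }

        Builder missingFieldStrategy(final Strategy strategy) {
            this.missing = Objects.requireNonNull(strategy, "strategy must not be null");
            return this;
        }

        /** Assumed mapping: true -> IGNORE, false -> STRICT. */
        @Deprecated(forRemoval = true)
        Builder allowExtraFields(final boolean allow) {
            return extraFieldStrategy(allow ? Strategy.IGNORE : Strategy.STRICT);
        }

        /** Assumed mapping: true -> IGNORE, false -> STRICT. */
        @Deprecated(forRemoval = true)
        Builder allowMissingFields(final boolean allow) {
            return missingFieldStrategy(allow ? Strategy.IGNORE : Strategy.STRICT);
        }

        Strategy currentExtra() {
            return extra;
        }

        Strategy currentMissing() {
            return missing;
        }
    }

    public static void main(final String[] args) {
        final Builder b = new Builder().allowExtraFields(true);
        System.out.println(b.currentExtra() + " / " + b.currentMissing());
    }
}
```

Note that a boolean setter cannot express `SKIP`, which is one reason callers of the deprecated methods would need to migrate to the enum-based methods to drop non-conforming records.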

lib/src/intTest/java/blackbox/reader/RelaxedCsvReaderTest.java

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ void readerToString() {
         assertThat(crb.ofCsvRecord(""))
             .asString()
             .isEqualTo("CsvReader[commentStrategy=NONE, skipEmptyLines=true, "
-                + "allowExtraFields=false, allowMissingFields=false, parser=RelaxedCsvParser]");
+                + "extraFieldStrategy=STRICT, missingFieldStrategy=STRICT, parser=RelaxedCsvParser]");
     }

 }

lib/src/intTest/java/blackbox/reader/RelaxedGenericDataTest.java

Lines changed: 3 additions & 2 deletions
@@ -15,6 +15,7 @@
 import de.siegmar.fastcsv.reader.CommentStrategy;
 import de.siegmar.fastcsv.reader.CsvReader;
 import de.siegmar.fastcsv.reader.CsvRecord;
+import de.siegmar.fastcsv.reader.FieldMismatchStrategy;
 import specreader.CheckVariant;
 import specreader.CheckVariantWrapper;
 import specreader.TestSpecRepository;
@@ -98,8 +99,8 @@ private static List<List<String>> parseCsvRecords(final TestSpecSettings setting

         return CsvReader.builder()
             .commentStrategy(commentStrategy)
-            .allowExtraFields(true)
-            .allowMissingFields(true)
+            .extraFieldStrategy(FieldMismatchStrategy.IGNORE)
+            .missingFieldStrategy(FieldMismatchStrategy.IGNORE)
             .skipEmptyLines(settings.skipEmptyLines())
             .ofCsvRecord(input)
             .stream()
