While attempting to write a validator for the country prefix (mentioned in #154), I encountered a problem in the data.
Problem
The CSV format uses a comma (,) as the separator between localized name entries:
Name: "DE Text,EN Text,FR Text"
However, some holiday names contain commas within the text itself, causing parsing ambiguity:
Examples from Bulgaria (BG)
St. George's Day / Bulgarian Army's Day:
BG Гергьовден, Ден на храбростта и Българската армия,DE St. Georgstag, Tag der bulgarischen Armee,EN St. George's Day, and the Bulgarian Army's Day
Bulgarian Education and Culture / Slavic Script Day:
BG Ден на Българската просвета и култура и на славянската писменост,DE Tag der bulgarischen Aufklärung und Kultur, Tag der slawischen Literatur,EN Bulgarian Education and Culture, and Slavic Script Day
The parser (LocalizedTextListConverter.cs) splits on commas and expects each entry in the format "XX Text", where XX is a 2-letter language code. When names contain commas, this breaks:
"Tag der bulgarischen Armee" is correctly parsed
", Tag der slawischen Literatur" is incorrectly parsed (leading comma and space)
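To make the ambiguity concrete, here is a rough Python sketch of a comma split followed by an "XX Text" check. It is only a stand-in for illustration: the real converter is C#, and `naive_split` and `ENTRY` are hypothetical names, not code from the repository. How the actual converter treats the pieces that fail the check may differ.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical approximation of the described behaviour: each piece is
# expected to look like "XX Text" (two uppercase letters, a space, then text).
ENTRY = re.compile(r"^([A-Z]{2}) (.+)$")


def naive_split(raw: str) -> List[Tuple[Optional[str], str]]:
    """Split on every comma and pair each piece with its language code, if any."""
    result = []
    for piece in raw.split(","):
        match = ENTRY.match(piece)
        if match:
            result.append((match.group(1), match.group(2)))
        else:
            # No language prefix: this piece is really the tail of the previous name.
            result.append((None, piece))
    return result


sample = (
    "DE St. Georgstag, Tag der bulgarischen Armee,"
    "EN St. George's Day, and the Bulgarian Army's Day"
)
parsed = naive_split(sample)
for code, text in parsed:
    print(code, repr(text))
```

Two of the four pieces come back without a language code, so a comma-based split cannot tell a separator from punctuation inside a name.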
Impact
- 12+ Bulgarian holiday entries affected
- API returns malformed data for these entries
- Similar issues may exist in other languages/countries
Proposed Solution
Use a pipe (|) as the separator for localized names instead of a comma:
Name: "DE Weihnachtsferien|EN Christmas Holidays"
Example with commas in text:
Name: "BG Гергьовден, Ден на храбростта и Българската армия|DE St. Georgstag, Tag der bulgarischen Armee|EN St. George's Day, and the Bulgarian Army's Day"
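A minimal sketch of what pipe-based parsing could look like, written in Python for brevity. The function name `parse_localized_names` and its error handling are assumptions for illustration; the actual change would go into LocalizedTextListConverter.cs.

```python
from typing import Dict


def parse_localized_names(raw: str, separator: str = "|") -> Dict[str, str]:
    """Split a name field on the separator and map language code -> text."""
    names = {}
    for entry in raw.split(separator):
        # Everything before the first space is the language code; the rest is the text.
        code, _, text = entry.strip().partition(" ")
        if len(code) != 2 or not (code.isascii() and code.isalpha()) or not text:
            raise ValueError(f"Malformed entry: {entry!r}")
        names[code.upper()] = text
    return names


field = (
    "BG Гергьовден, Ден на храбростта и Българската армия"
    "|DE St. Georgstag, Tag der bulgarischen Armee"
    "|EN St. George's Day, and the Bulgarian Army's Day"
)
names = parse_localized_names(field)
print(names["DE"])
```

Commas inside the text survive because only the pipe is treated as a separator. This assumes no holiday name contains a literal pipe, which would need to be checked against the existing data.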
This requires:
- Update all CSV files
- Update LocalizedTextListConverter.cs to split on | instead of ,
- Update documentation
- Optional: add a validation check
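The optional validation check could look roughly like the following Python sketch. `validate_name_field` and `ENTRY_PATTERN` are hypothetical names, and the rules (every entry matches "XX Text", no language code appears twice) are my assumptions about what is worth checking, not existing project behavior.

```python
import re
from typing import List

# Assumed entry shape: two uppercase letters, one space, then non-empty text.
ENTRY_PATTERN = re.compile(r"^(?P<code>[A-Z]{2}) (?P<text>\S.*)$")


def validate_name_field(raw: str, separator: str = "|") -> List[str]:
    """Return a list of problems found in one localized name field."""
    problems = []
    seen = set()
    for entry in raw.split(separator):
        match = ENTRY_PATTERN.match(entry)
        if match is None:
            problems.append(f"entry does not match 'XX Text': {entry!r}")
            continue
        code = match.group("code")
        if code in seen:
            problems.append(f"duplicate language code: {code}")
        seen.add(code)
    return problems


# Clean field in the proposed format: no problems reported.
print(validate_name_field("DE Weihnachtsferien|EN Christmas Holidays"))

# Running the same check with separator="," can flag affected rows in the current files.
legacy = "DE St. Georgstag, Tag der bulgarischen Armee,EN St. George's Day"
print(validate_name_field(legacy, separator=","))
```

Run with `separator=","` before the migration, a check like this could also help confirm how many entries are affected beyond Bulgaria.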
I'm open to preparing PRs for this. What do you think?