fix: accept 4-8 char primary language subtags per RFC 5646 §2.1 #342
Closed
creed-bratton wants to merge 1 commit into italia:main from
The switch in isValidBCP47StrictLanguageTag had two dead-end cases:
    case n == 4: return false
    default: return false // 5-8 chars land here
The regex (group 2) already accepts [A-Z]{4} and [A-Z]{5,8}, so these
tags pass syntactic validation but are then rejected by the switch.
RFC 5646 §2.1 explicitly allows both:
- 4-alpha: reserved for future use
- 5-8 alpha: registered language subtags
golang.org/x/text/language does not cover 4-8 char primary subtags, so
syntactic validation by the regex is sufficient for both cases.
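The fixed length dispatch can be sketched as follows. This is a minimal, hypothetical illustration, not the code in this PR: `primaryLanguageOK` and `knownISO639` are invented names, and the 2-3 char branch is a stub because the actual 2-3 char check (presumably via golang.org/x/text/language) is not shown here. The sketch assumes the subtag has already matched the regex's alphabetic alternatives.

```go
package main

import (
	"fmt"
	"strings"
)

// knownISO639 is a hypothetical stand-in for the real 2-3 letter code
// lookup. It contains only a few codes, for illustration.
var knownISO639 = map[string]bool{"en": true, "it": true, "de": true, "ast": true}

// primaryLanguageOK is a hypothetical helper mirroring the fixed switch.
// It assumes lang is purely alphabetic, i.e. it already matched the
// regex's [A-Z]{2,3}, [A-Z]{4} or [A-Z]{5,8} alternatives.
func primaryLanguageOK(lang string) bool {
	switch n := len(lang); {
	case n == 2 || n == 3:
		// Shortest ISO 639 code: still checked against a registry (stubbed here).
		return knownISO639[strings.ToLower(lang)]
	case n == 4:
		// Reserved for future use (RFC 5646 §2.1): the syntax check suffices.
		return true
	case n >= 5 && n <= 8:
		// Registered language subtag (RFC 5646 §2.1): the syntax check suffices.
		return true
	default:
		// Lengths 0-1 or 9+ cannot be a primary language subtag.
		return false
	}
}

func main() {
	for _, lang := range []string{"en", "xx", "abcd", "abcdefgh", "abcdefghi"} {
		fmt.Printf("%-9s %v\n", lang, primaryLanguageOK(lang))
	}
}
```

The point of the sketch is that the 4 and 5-8 branches return true instead of falling into a dead end, while only genuinely out-of-range lengths reach default.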
Adds a unit test suite for isValidBCP47StrictLanguageTag covering 2-char,
3-char, 4-char, 5-8 char subtags, with and without region/script/extlang,
grandfathered tags, private-use tags, and invalid inputs.
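A table-driven layout fits the test categories listed above. The sketch below is self-contained and hypothetical: `wellFormed`, `langtagRE` and `irregular` are invented names, and the check is a syntax-only pattern written from the RFC 5646 §2.1 ABNF (plus its explicit list of irregular grandfathered tags). It is not the project's isValidBCP47StrictLanguageTag, which may apply further checks to 2-3 char codes.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical syntax-only pattern built from the RFC 5646 §2.1 ABNF.
// It checks well-formedness, not registry validity.
var langtagRE = regexp.MustCompile(`^(?i:` +
	`(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})` + // language (+ extlang)
	`(?:-[a-z]{4})?` + // script
	`(?:-(?:[a-z]{2}|[0-9]{3}))?` + // region
	`(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*` + // variants
	`(?:-[0-9a-wyz](?:-[a-z0-9]{2,8})+)*` + // extensions
	`(?:-x(?:-[a-z0-9]{1,8})+)?` + // trailing private use
	`|x(?:-[a-z0-9]{1,8})+` + // private-use-only tag
	`)$`)

// Irregular grandfathered tags do not fit the langtag production, so
// RFC 5646 lists them explicitly. The regular ones (e.g. "zh-min-nan")
// already match the pattern above.
var irregular = map[string]bool{
	"en-gb-oed": true, "i-ami": true, "i-bnn": true, "i-default": true,
	"i-enochian": true, "i-hak": true, "i-klingon": true, "i-lux": true,
	"i-mingo": true, "i-navajo": true, "i-pwn": true, "i-tao": true,
	"i-tay": true, "i-tsu": true, "sgn-be-fr": true, "sgn-be-nl": true,
	"sgn-ch-de": true,
}

// wellFormed is a hypothetical stand-in for the function under test.
func wellFormed(tag string) bool {
	return irregular[strings.ToLower(tag)] || langtagRE.MatchString(tag)
}

func main() {
	cases := []struct {
		tag  string
		want bool
	}{
		{"en", true},          // 2-char
		{"ast", true},         // 3-char
		{"abcd", true},        // 4-char, reserved for future use
		{"abcdefgh", true},    // 8-char, registered-subtag form
		{"zh-Hant-TW", true},  // script + region
		{"zh-yue-HK", true},   // extlang + region
		{"i-klingon", true},   // irregular grandfathered
		{"x-private", true},   // private use only
		{"", false},           // empty
		{"abcdefghi", false},  // 9 alpha is too long
		{"en--US", false},     // empty subtag
		{"en-US-", false},     // trailing hyphen
	}
	for _, c := range cases {
		if got := wellFormed(c.tag); got != c.want {
			fmt.Printf("FAIL %q: got %v, want %v\n", c.tag, got, c.want)
		}
	}
	fmt.Println("done")
}
```

Keeping each case on one row with its category in a comment makes it easy to see which RFC 5646 form a failing input belongs to.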
Force-pushed from ad82739 to 49dbde3.
creed-bratton added a commit to creed-bratton/publiccode-parser-go that referenced this pull request on Apr 12, 2026
go-playground/validator@b599053 added bcp47_strict_language_tag as a built-in validator (go-playground/validator#1489), so our copy is now redundant. Drop validators/bcp47.go and its tests; update bcp47_keys to delegate to the built-in tag via a package-level Validate instance. Closes italia#341, closes italia#342.
creed-bratton (Contributor, Author): Superseded by go-playground/validator#1489 (merged upstream). Will switch to the built-in validator in a new PR.
isValidBCP47StrictLanguageTag rejected 4-8 char primary language subtags because of two dead-end switch cases: case n == 4 returned false, and 5-8 char subtags fell through to a default branch that also returned false. RFC 5646 §2.1 defines three forms for the primary language subtag: 2-3 alpha (the usual shortest ISO 639 code), 4 alpha (reserved for future use) and 5-8 alpha (registered language subtag).
The regex (group 2) already accepts [A-Z]{4} and [A-Z]{5,8}, so these tags passed syntactic validation only to be rejected by the switch. golang.org/x/text/language does not cover 4-8 char primary subtags, so the regex check is sufficient. Also adds a unit test suite for isValidBCP47StrictLanguageTag, which was previously not tested directly.