fix: accept 4-8 char primary language subtags per RFC 5646 §2.1 #342

Closed
creed-bratton wants to merge 1 commit into italia:main from creed-bratton:bcp47-primary-subtag-5to8

@creed-bratton
Contributor

isValidBCP47StrictLanguageTag rejected 4-8 char primary language subtags because of two dead-end switch cases:

case n == 4:
    return false
default:
    return false  // 5-8 chars land here

RFC 5646 §2.1 allows two forms for the primary language subtag beyond the usual 2-3 alpha ISO 639 codes:

  • 4 alpha — reserved for future use
  • 5-8 alpha — registered language subtag

The regex (group 2) already accepts [A-Z]{4} and [A-Z]{5,8}, so these tags passed syntactic validation only to be rejected by the switch. golang.org/x/text/language does not cover 4-8 char primary subtags, so the regex check is sufficient.
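
As a rough illustration of the intended behavior, here is a minimal sketch of a length-based switch that no longer dead-ends on 4-8 letter primary subtags. It is not the code from this PR: `acceptPrimary`, `primaryRe`, and the `known` lookup callback are hypothetical names, and the regex is a simplified stand-in that captures only the primary subtag and ignores the rest of the tag.

```go
package main

import (
	"fmt"
	"regexp"
)

// primaryRe is a simplified, hypothetical stand-in for the validator's regex.
// It captures only the primary language subtag (2-8 ASCII letters) and
// requires it to be followed by "-" or the end of the string.
var primaryRe = regexp.MustCompile(`^([A-Za-z]{2,8})(?:-|$)`)

// acceptPrimary (hypothetical) sketches the switch after the fix.
// known is an assumed lookup for 2-3 letter codes; how the real function
// checks those codes is not shown in the PR description.
func acceptPrimary(tag string, known func(string) bool) bool {
	m := primaryRe.FindStringSubmatch(tag)
	if m == nil {
		return false // no syntactically valid primary subtag
	}
	switch n := len(m[1]); {
	case n == 2 || n == 3:
		// Short codes are delegated to the lookup.
		return known(m[1])
	case n >= 4 && n <= 8:
		// 4 alpha (reserved) and 5-8 alpha (registered) subtags:
		// the syntactic match is treated as sufficient.
		return true
	default:
		return false
	}
}

func main() {
	// Toy lookup used only for this example.
	known := func(s string) bool { return s == "en" || s == "it" }

	for _, tag := range []string{"en-US", "abcd", "abcdefgh", "abcdefghi", "zz"} {
		fmt.Printf("%-10s %v\n", tag, acceptPrimary(tag, known))
	}
}
```

The 4-8 letter branch returns true without consulting the lookup, which matches the reasoning above that the regex check alone is sufficient for those lengths.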

Also adds a unit test suite for isValidBCP47StrictLanguageTag — previously untested directly.

The switch in isValidBCP47StrictLanguageTag had two dead-end cases:

  case n == 4:  return false
  default:      return false  // 5-8 chars land here

The regex (group 2) already accepts [A-Z]{4} and [A-Z]{5,8}, so these
tags pass syntactic validation but are then rejected by the switch.
RFC 5646 §2.1 explicitly allows both:
  - 4-alpha: reserved for future use
  - 5-8 alpha: registered language subtags

golang.org/x/text/language does not cover 4-8 char primary subtags, so
syntactic validation by the regex is sufficient for both cases.

Adds a unit test suite for isValidBCP47StrictLanguageTag covering 2-char,
3-char, 4-char, 5-8 char subtags, with and without region/script/extlang,
grandfathered tags, private-use tags, and invalid inputs.
creed-bratton force-pushed the bcp47-primary-subtag-5to8 branch from ad82739 to 49dbde3 on March 24, 2026 18:46
creed-bratton added a commit to creed-bratton/publiccode-parser-go that referenced this pull request on Apr 12, 2026:
go-playground/validator@b599053 added bcp47_strict_language_tag as a
built-in validator (go-playground/validator#1489), so our copy is now
redundant. Drop validators/bcp47.go and its tests; update bcp47_keys to
delegate to the built-in tag via a package-level Validate instance.

Closes italia#341, closes italia#342.
@creed-bratton
Contributor Author

Superseded by go-playground/validator#1489 (merged upstream). Will switch to the built-in validator in a new PR.
