Background
Block codes (aka slugs, block_ids, url_names, component_codes, or container_codes) have historically been restricted to ascii slug characters (A-Za-z0-9_-). However, we recently discovered that we have some partial support for non-ascii characters in block codes in V2 content libraries: #38402 (comment). Specifically, non-ascii alphanumeric unicode characters, i.e. the character class matched by \w in Python's re module when unicode matching is in effect (the default for str patterns in Python 3): https://docs.python.org/3/library/re.html
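To make the difference between the two character classes concrete, here is a minimal sketch in Python. The pattern names and helper functions are hypothetical and are not taken from the codebase; the "unicode" pattern is only an assumption about what a loosened rule could look like (\w plus the hyphen).

```python
import re

# Historical restriction: ascii slug characters only.
ASCII_CODE = re.compile(r"[A-Za-z0-9_-]+")

# Hypothetical looser rule: any unicode word character (\w) or a hyphen.
# For str patterns in Python 3, \w matches unicode alphanumerics and "_".
UNICODE_CODE = re.compile(r"[\w-]+")


def is_ascii_code(code: str) -> bool:
    """Return True if the whole code uses only ascii slug characters."""
    return ASCII_CODE.fullmatch(code) is not None


def is_unicode_code(code: str) -> bool:
    """Return True if the whole code uses only unicode word chars or '-'."""
    return UNICODE_CODE.fullmatch(code) is not None


if __name__ == "__main__":
    for code in ["unit_1", "θ-intro", "unité", "bad code"]:
        print(repr(code), is_ascii_code(code), is_unicode_code(code))
```

Under these assumptions, a code such as "θ-intro" fails the ascii check but passes the unicode one, while a code containing a space fails both.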
Currently, as far as we know, you can only get a block with a non-ascii code by doing one of the following:
- Creating a container (unit, subsection, section) with an initial title that includes non-ascii characters (in a content library)
- Editing a component or container's key with the library backup ZIP, and then restoring that backup.
- Creating a component or container with a non-ascii alphanumeric character (e.g., "θ") in the title and then copy-pasting that component into a content library. The original item won't have a non-ascii code, but the pasted item will.
In the long term we'd like to support such non-ascii codes, but in the short term we're concerned about the downstream effects of those codes making it into systems that were designed without unicode in mind, like:
- modulestore
- courseware student module
- analytics
Acceptance criteria
- Research the effect of non-ascii alphanumeric chars on those downstream systems
- Decide which systems we feel are safe for such characters
- Document this decision
- Loosen UX-level validation so as to allow them
- Do the same for: library codes, course codes, org codes. Or create ticket(s) for them.