Skip to content

feat: Reinstate support for Unicode component_codes and container_codes#558

Merged
kdmccormick merged 1 commit intomainfrom
kdmccormick/unicode-codes
Apr 23, 2026
Merged

feat: Reinstate support for Unicode component_codes and container_codes#558
kdmccormick merged 1 commit intomainfrom
kdmccormick/unicode-codes

Conversation

@kdmccormick
Copy link
Copy Markdown
Member

@kdmccormick kdmccormick commented Apr 23, 2026

While upgrading platform from core 0.39.2 to core 0.43.0, we realized that we've secretly supported non-ascii component and container codes in V2 content libraries since their release. We do not advertise our support for non-ascii codes (and we don't want to, yet), but in order to avoid breaking sites that stumbled upon non-ascii codes, we need to loosen the validation & db constraints which were added to openedx-core.

On a code level, we simply add a unicode=True/False parameter to the code_field(). I've set it True for components and containers, and left it as False for collections for now, since we no of know way in which a user could make non-ascii collection code (other than by editing a ZIP archive).

For migrations, I've added a new migration (0013) which alters the model field validations, removes the too-strict db constraints, and reinstates unicode-friendly db constraints. I've also edited existing migrations 0009 and 0010 for components and containers, respectively, to loosen their db constraints. Here's my thinking:

  • If a dev has already applied 0009 and 0010: Migration 0013 will remove the over-strict db constraints and add the unicode-friendly ones.
  • Otherwise, for devs/ops who are skipping 0.43.0 and are applying 0009 and 0010 in the future (the vast majority): The altered 0009 and 0010 will apply unicode-friendly constraints, which won't cause an IntegrityError if there are non-ascii codes in the db. Then, 0013 will remove those constraints, and then just re-apply the same constraints again.

I'm not modifying the overly-strict Django validators in 0009 and 0010, because (I believe) Django will happily "add" and "remove" those validators regardless of what's in the database.

Bumps version to 0.44.0

@kdmccormick kdmccormick requested review from bradenmacdonald and ormsbee and removed request for ormsbee April 23, 2026 15:08
@kdmccormick kdmccormick force-pushed the kdmccormick/unicode-codes branch from d597b77 to ed61680 Compare April 23, 2026 15:11
@kdmccormick kdmccormick marked this pull request as ready for review April 23, 2026 15:12
Comment on lines +48 to +51
migrations.AddConstraint(
model_name='container',
constraint=models.CheckConstraint(condition=django.db.models.lookups.Regex(models.F('container_code'), '^[\\w.-]+\\Z'), name='oel_container_code_regex', violation_error_message='Enter a valid "code name" consisting of any letters, numbers, underscores, hyphens, or periods.'),
),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This updated constraint is already added in migration 0009, so why does it need to get re-created in this migration?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to spell this out in the PR description. It's for those of us developers who already installed 0.43.0 into platform and applied 0009 and 0010, getting the ascii-only constraints.

I guess I could omit this, and instead just send out a slack blast to the handful of devs who might be in this situation, telling them to back-migrate to 0008. I think I'd also yank 0.43 from PyPI in that case.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bradenmacdonald let me know if you'd prefer this :

I guess I could omit this, and instead just send out a slack blast to the handful of devs who might be in this situation, telling them to back-migrate to 0008. I think I'd also yank 0.43 from PyPI in that case.

Copy link
Copy Markdown
Contributor

@bradenmacdonald bradenmacdonald Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, sorry I didn't notice the PR description. Since your platform PR was not merged I expect you can count the number of affected developers on one hand. But it is a nice idea to resolve it automatically; if you're keeping it, maybe mention that in a comment in the migration itself too? Either way is good with me!

Copy link
Copy Markdown
Member Author

@kdmccormick kdmccormick Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, I added comments!



def code_field(**kwargs) -> MultiCollationCharField:
def code_field(unicode: bool, **kwargs) -> MultiCollationCharField:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this is mandatory.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same. I'd rather force developers to think about which kind of code they're defining.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can make it default to true after doing openedx/openedx-platform#38413

Copy link
Copy Markdown
Contributor

@bradenmacdonald bradenmacdonald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and works well!

While upgrading platform from core 0.39.2 to core 0.43.0, we realized that
we've quietly supported non-ascii component and container codes in Content
Libraries V2 since their release. We do not advertise our support for non-ascii
codes within Libraries (and we don't want to, yet [2]), but in order to avoid
breaking sites that stumbled upon non-ascii codes, we need to loosen the
validation & db constraints which were added to openedx-core.

On a code level, we simply add a `unicode=True/False` parameter to the
`code_field()`. It's set to True for components and containers, and left as
False for collections for now, since we no of no way in which a user could
make non-ascii collection code (other than by editing a ZIP archive).

There are some nuances around the migrations. See the comments for details.

[1] openedx/openedx-platform#38402 (comment)
[2] openedx/openedx-platform#38413

Bumps version to 0.44.0
@kdmccormick kdmccormick force-pushed the kdmccormick/unicode-codes branch from ed61680 to e7bdac8 Compare April 23, 2026 16:26
@kdmccormick
Copy link
Copy Markdown
Member Author

@bradenmacdonald I added comments and a couple tests. I'll merge now, but if you have any suggestions on those I'm happy to incorporate them when I make the "Mark things stable / bump to v1.0" PR.

@kdmccormick kdmccormick changed the title feat: Unicode component_codes and container_codes feat: Reinstate support for Unicode component_codes and container_codes Apr 23, 2026
@kdmccormick kdmccormick enabled auto-merge (squash) April 23, 2026 16:29
@kdmccormick kdmccormick merged commit beffc1f into main Apr 23, 2026
6 checks passed
@kdmccormick kdmccormick deleted the kdmccormick/unicode-codes branch April 23, 2026 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants