feat: update content libraries API to use events from openedx-core [FC-0117]#38437
bradenmacdonald wants to merge 17 commits into openedx:master
| { # Not 100% sure we want this, but a PUBLISHED event is emitted for container 2 | ||
| # because one of its children's published versions has changed, so whether or | ||
| # not it contains unpublished changes may have changed and the search index | ||
| # may need to be updated. It is not actually published though. | ||
| # TODO: should this be a CONTAINER_CHILD_PUBLISHED event? | ||
| # No PUBLISHED event is emitted for container 2, because it doesn't have a published version yet. | ||
| # Publishing 'html_block' would have potentially affected it if container 2's published version had a | ||
| # reference to 'html_block', but it doesn't yet until we publish it. | ||
| ) | ||
|
|
||
| # note that container 2 is still unpublished | ||
| c2_after = self._get_container(container2["id"]) | ||
| assert c2_after["has_unpublished_changes"] | ||
|
|
||
| # publish container2 now: | ||
| self._publish_container(container2["id"]) | ||
| self.expect_new_events( | ||
| { # An event for container 2 being published: | ||
| "signal": LIBRARY_CONTAINER_PUBLISHED, | ||
| "library_container": LibraryContainerData( | ||
| container_key=LibraryContainerLocator.from_string(container2["id"]), | ||
| ), | ||
| }, | ||
| { # An event for the html block in container 2 only: | ||
| "signal": LIBRARY_BLOCK_PUBLISHED, | ||
| "library_block": LibraryBlockData( | ||
| self.lib1_key, LibraryUsageLocatorV2.from_string(html_block2["id"]), | ||
| ), | ||
| }, |
It's a little hard to tell from the diff here (because of how it's split up), but before this PR, a spurious PUBLISHED event was emitted for container 2 before it was ever published at all. I think the new behavior is much more correct, because it's built on Learning Core's new publish log side effects. I have explained why in the test case and added additional tests to ensure side effects still result in PUBLISHED events when they should (that is, once container 2 is actually published).
| { | ||
| "signal": CONTENT_OBJECT_ASSOCIATIONS_CHANGED, | ||
| "content_object": ContentObjectChangedData( | ||
| object_id=str(container_key), | ||
| changes=["collections", "tags"], | ||
| ), | ||
| }, | ||
| # We used to emit CONTENT_OBJECT_ASSOCIATIONS_CHANGED here for the restored container, specifically noting | ||
| # that changes=["collections", "tags"], because deleted things may have collections+tags that are once | ||
| # again relevant when it is restored. However, the CREATED event should be sufficient for notifying of that. | ||
| # (Or should we emit CREATED+UPDATED to be extra sure?) |
Flagging this, as it's a change - no longer emitting CONTENT_OBJECT_ASSOCIATIONS_CHANGED in the case of restoring a deleted object.
TODO: test publishing a thing with collections and tags, delete it, then "revert all changes" in the library UI and make sure it re-appears with collections and tags intact. I haven't tested this yet.
| # openedx_content also lists ancestor containers of the affected units as changed. | ||
| # We don't strictly need this at the moment, at least as far as keeping our search index updated. | ||
| { | ||
| "signal": LIBRARY_CONTAINER_UPDATED, | ||
| "library_container": LibraryContainerData(container_key=self.subsection1.container_key), | ||
| }, | ||
| { | ||
| "signal": LIBRARY_CONTAINER_UPDATED, | ||
| "library_container": LibraryContainerData(container_key=self.subsection2.container_key), | ||
| }, | ||
| { | ||
| "signal": LIBRARY_CONTAINER_UPDATED, | ||
| "library_container": LibraryContainerData(container_key=self.section1.container_key), | ||
| }, | ||
| { | ||
| "signal": LIBRARY_CONTAINER_UPDATED, | ||
| "library_container": LibraryContainerData(container_key=self.section2.container_key), |
Last change: we now emit events for ancestors of parent containers of modified entities, which we weren't doing before (before it was only one level - parent containers but not their ancestors in turn). I don't think we have a use case for this, but I am not sure if I could or should filter them out somehow, as the publish log treats direct ancestors (which we definitely care about and need events for) and their ancestors in turn exactly the same.
To avoid performance issues, in such cases where more than one ancestor is included in the event stream, the event for the directly modified entity is emitted synchronously but the indirect container events are emitted asynchronously. This seems to work well in the UI, making it update correctly/immediately when e.g. renaming something, but should still preserve performance even if you rename a component used in thousands of different containers.
(Still need to look through test changes.)
At a high level, I do have a bit of a concern that having some things be sync and some async at a granular level (depending on how many things there are) is going to lead to inconsistencies and bugs. I think it's a reasonable tradeoff at the moment--just something we should keep an eye on.
| # Which entities were _directly_ changed here? | ||
| direct_changes = [asdict(change) for change in change_log.changes if change.new_version != change.old_version] | ||
| # And which entities were indirectly affected (e.g. parent containers)? | ||
| indirect_changes = [asdict(change) for change in change_log.changes if change.new_version == change.old_version] |
[Comment] This reminds me that we should probably put a couple of helper methods in DraftChangeLog and DraftChangeLogRecord for this sort of thing, so we can keep the terminology consistent over time. Made a ticket for that: openedx/openedx-core#560
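As a rough illustration of what such helpers might look like, here is a minimal sketch. `ChangeRecord`, `is_direct`, `is_side_effect`, and `split_changes` are hypothetical names used only for this example; they are not the openedx-core models or the API proposed in the ticket.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical stand-in for one change log record (not the real model)."""

    entity_id: int
    old_version: Optional[int]
    new_version: Optional[int]

    @property
    def is_direct(self) -> bool:
        # The entity itself got a different version (created, edited, deleted).
        return self.new_version != self.old_version

    @property
    def is_side_effect(self) -> bool:
        # Same version before and after: the entity is listed only because
        # something it depends on (e.g. a child component) changed.
        return not self.is_direct


def split_changes(records):
    """Return (direct, indirect) lists, preserving the original order."""
    direct = [r for r in records if r.is_direct]
    indirect = [r for r in records if r.is_side_effect]
    return direct, indirect


records = [
    ChangeRecord(entity_id=1, old_version=3, new_version=4),     # edited component
    ChangeRecord(entity_id=2, old_version=7, new_version=7),     # its parent unit
    ChangeRecord(entity_id=3, old_version=None, new_version=1),  # newly created
]
direct, indirect = split_changes(records)
```

Keeping the comparison in one place means callers never have to repeat the `new_version != old_version` test themselves.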
| update_async(change_list=indirect_changes) # update the many other affected entities async. | ||
| else: | ||
| # More than one entity was changed at once. Handle asynchronously: | ||
| update_async(change_list=[*direct_changes, *indirect_changes]) |
Nit [optional]: I think it'd be a little easier to follow with early returns so there's less nesting, but this is totally readable as-is.
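For reference, an early-return version could look roughly like the sketch below. It assumes the sync path is taken only when exactly one entity changed directly, which is inferred from the snippet rather than confirmed; `dispatch_updates`, `fake_sync`, and `fake_async` are hypothetical names for this example.

```python
def dispatch_updates(direct_changes, indirect_changes, update_sync, update_async):
    """Hypothetical dispatcher; update_sync/update_async are stand-in callables."""
    if not direct_changes and not indirect_changes:
        return  # Nothing changed, nothing to dispatch.
    if len(direct_changes) != 1:
        # Assumed rule: anything other than a single direct change goes async.
        update_async(change_list=[*direct_changes, *indirect_changes])
        return
    # Exactly one direct change: handle it now so the response shows fresh data.
    update_sync(change_list=direct_changes)
    if indirect_changes:
        # The (possibly many) indirectly affected entities are handled async.
        update_async(change_list=indirect_changes)


calls = []


def fake_sync(change_list):
    calls.append(("sync", list(change_list)))


def fake_async(change_list):
    calls.append(("async", list(change_list)))


dispatch_updates(["component"], ["unit", "subsection"], fake_sync, fake_async)
dispatch_updates(["a", "b"], ["unit"], fake_sync, fake_async)
```

Each branch ends in a `return`, so the main path is not nested inside an `if`/`else`.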
| ⏳ This event is emitted synchronously and this handler is called | ||
| synchronously. If a lot of entities were published, we need to dispatch | ||
| an asynchronous handler to deal with them to avoid slowdowns. If only one | ||
| entity was published, we want to deal with that synchronously so that we | ||
| can show the user correct data when the current request completes. |
We should clarify to note that it's async for any number > 1 ("a lot" might be misleading).
Sure, updated to say "multiple" instead of "a lot of". d3e7fcd
| if len(change_log.changes) == 1: | ||
| fn = tasks.send_events_after_publish |
This includes side-effects, right? If so, I think it should be mentioned prominently in the docstring, since that's going to be unintuitive for a lot of folks who are going to think "publish the Component" would only result in one entry.
| # .. event_implemented_name: LIBRARY_COLLECTION_CREATED | ||
| # .. event_type: org.openedx.content_authoring.content_library.collection.created.v1 |
I think these just go where the signal is defined, not where it's sent.
I realized later in the review that this was like this in the code that you refactored, but I still think it's wrong and should be removed in all the places other than where the signal is first defined.
OK, great! I didn't want them there anyways; I was just copying the existing pattern without understanding it.
@ormsbee Actually, it seems the event annotations are quite deliberate - see #36473 and it is mentioned in these docs:
> In-line code annotations are also used when integrating the event into the service.
It's not super clear to me why this is the case but I think it's related to what the doc says at the end: "ensures that [the event] is used correctly across services" ?
Maybe @mariajgrimaldi or @BryanttV can clarify how these are used?
Ack, you're totally right. That's... really weird to me. But okay, thank you.
| new_child_ids: Iterable[PublishableEntity.ID] | ||
| # If the title has changed, we notify ALL children that their parent container(s) have changed, e.g. to update the | ||
| # list of "units this component is used in", "sections this subsection is used in", etc. in the search index | ||
| title_changed: bool = bool(old_version and new_version) and (old_version.title != new_version.title) |
Please explain in a comment why this does not include events where the old_version was None.
> Please explain in a comment why this does not include events where the old_version was None.
It didn't because the else: path happened to work just as well in that particular case, and I could make the code slightly more compact by not having to deal with getting old_version.title when old_version was None...
But that made me realize this was more convoluted than it needs to be, so I refactored this logic entirely to be much simpler and easier to follow. f3961bf
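To make the None handling under discussion concrete, here is a small sketch of what the original expression computes, written with explicit checks. It is not the refactored code from f3961bf; `Version` is a hypothetical stand-in for a container version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    """Hypothetical stand-in for a container version."""

    title: str


def title_changed(old_version: Optional[Version], new_version: Optional[Version]) -> bool:
    # A container that was just created (no old version) or just deleted
    # (no new version) is not treated as "renamed" here.
    if old_version is None or new_version is None:
        return False
    return old_version.title != new_version.title


created = title_changed(None, Version("Unit 1"))
renamed = title_changed(Version("Unit 1"), Version("Unit 1 (revised)"))
unchanged = title_changed(Version("Unit 1"), Version("Unit 1"))
```

Spelling out the None case makes it obvious that a newly created container falls through to whatever the non-rename path does.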
| # list of "units this component is used in", "sections this subsection is used in", etc. in the search index | ||
| title_changed: bool = bool(old_version and new_version) and (old_version.title != new_version.title) | ||
| if title_changed: | ||
| # TODO: there is no "get entity list for container version" API in openedx_content |
You could also get effectively the same thing via dependencies (new_version.dependencies.all())
| # Different container versions but same list of child entities. For now we don't need to do anything, but in the | ||
| # future if we have some other kind of per-container settings relevant to child entities we might need to handle | ||
| # this the same way as title_changed. |
Do we have to worry about the case where a container changes at the same time as component content within it? I'm not clear on how the meilisearch registers "return this Unit because the text that I'm typing is in a Component that this Unit has".
> Do we have to worry about the case where a container changes at the same time as component content within it?
No, I don't see that causing any problems.
In any case, I removed this code as checking for old_entity_list_id == new_entity_list_id is a pretty rare optimization, and you can also have a situation where the entity lists have different IDs but the same children, so it's simpler just to compare the children anyways.
> I'm not clear on how the meilisearch registers "return this Unit because the text that I'm typing is in a Component that this Unit has".
It doesn't really, it would just match the Component itself, and then we display that component's parent units in the UI if the user is interested in seeing its context.
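The "compare the children" approach mentioned above can be sketched as a plain set difference. This is an illustration under the assumption that child IDs are hashable values; `diff_children` is a hypothetical helper name, not a function from the PR.

```python
def diff_children(old_child_ids, new_child_ids):
    """Return (added, removed) child IDs, ignoring which entity list they came from."""
    old, new = set(old_child_ids), set(new_child_ids)
    # Note: sets ignore ordering, so a pure reorder shows up as "no change".
    return new - old, old - new


# Two different entity lists that happen to hold the same children:
same_added, same_removed = diff_children([10, 11, 12], [10, 11, 12])

# One child swapped for another:
added, removed = diff_children([10, 11, 12], [11, 12, 13])
```

Because only the child IDs are compared, two entity lists with different IDs but identical children produce no difference.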
| if hasattr(entity, "component"): | ||
| opaque_key = api.library_component_usage_key(library_key, entity.component) | ||
| elif hasattr(entity, "container"): | ||
| opaque_key = api.library_container_locator(library_key, entity.container) |
Nit: This really happens often enough where it seems like it should be a helper fn somewhere.
ormsbee left a comment
Just a minor additional nit request.
| # Unlike revert_changes below, we do not have to re-index collections, | ||
| # because publishing changes does not affect the component counts, and | ||
| # collections themselves don't have draft/published/unpublished status. | ||
| content_api.publish_all_drafts(learning_package.id, published_by=user_id) |
Nit: Please add a comment here indicating that we do expect a bunch of events to be emitted by publishing, since it might otherwise not be obvious to folks just how much stuff is happening here.
Ack, you're totally right. That's... really weird to me. But okay, thank you.
Yeah, I would prefer a more consistent approach too. But it comes from our direct experience with the libraries work... making everything async makes updating the UI after any change pretty awkward, and making everything sync is way too slow in many cases like renaming something that is used in many different places. So even though it's more complex, this sort of compromise seems to work best for now.
In test_home.py:258, setUp calls OrganizationFactory(). That factory uses a factory_boy Sequence for short_name. The sequence counter is process-global and monotonically increasing; it is never reset between tests. So:
- Run this test alone → org short_name is name0 → v2 key is lib:name0:test-key.
- Run it after N other tests that built Organizations → nameN → lib:nameN:test-key.
The expected-response dicts at test_home.py:332 and test_home.py:367 hardcode 'lib:name0:test-key', which is why the test only passes in isolation or if it happens to run before other Organization-using tests.
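The order dependence can be reproduced without Django or factory_boy. The sketch below uses `itertools.count` as a stand-in for the shared sequence counter; `OrgFactory` and `make_library_key` are hypothetical names for this illustration, and deriving the expected key from the created org is one possible fix, not necessarily the one applied in the PR.

```python
import itertools


class OrgFactory:
    """Stand-in for a factory with a sequence-based short_name (not factory_boy)."""

    _counter = itertools.count()  # shared by every call in the process

    @classmethod
    def create(cls) -> str:
        return f"name{next(cls._counter)}"


def make_library_key(org_short_name: str, slug: str = "test-key") -> str:
    return f"lib:{org_short_name}:{slug}"


first_org = OrgFactory.create()  # what a test sees when run in isolation
for _ in range(3):               # other tests creating organizations first
    OrgFactory.create()
later_org = OrgFactory.create()  # what the same test sees later in the run

# Deriving the expectation from the created org avoids the hardcoded "name0".
expected_key = make_library_key(later_org)
```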
| # First, remove all children from the subsection: | ||
| with self.captureOnCommitCallbacks(execute=False): # suppress events | ||
| library_api.update_container_children(self.subsection.container_key, [], None) |
Before: this test was changing a subsection to have the exact same unit child it already had, and that was emitting an event and updating the search index, because library_api.update_container_children was just hard-coded to send out LIBRARY_CONTAINER_UPDATED and CONTENT_OBJECT_ASSOCIATIONS_CHANGED events every time.
Now: our event logic is "smarter" and only sends out events if the container's children actually changed. So to keep the test working, first I have to clear the container's children.
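A toy model of the before/after behavior, as a sketch only: `update_children` and the dict-based container are hypothetical and do not reflect how library_api or openedx-core implement this. It shows why the test has to clear the children first.

```python
def update_children(container, new_children, emit):
    """Set the container's children; emit an event only if they actually changed."""
    new_children = list(new_children)
    if container["children"] == new_children:
        return False  # No-op update: no event, no search index work.
    container["children"] = new_children
    emit("LIBRARY_CONTAINER_UPDATED")
    return True


events = []
subsection = {"children": ["unit1"]}

same = update_children(subsection, ["unit1"], events.append)      # nothing emitted
cleared = update_children(subsection, [], events.append)          # emits
restored = update_children(subsection, ["unit1"], events.append)  # emits
```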
Description
With openedx/openedx-core#543, openedx-core now emits events when changes happen within a Learning Package.
This PR updates the content libraries code and search code accordingly. The main benefit is that the search index now stays up to date regardless of which APIs are used. We don't need to "wrap" some low-level APIs in high-level APIs just to add events.
Note: The "Library Collections" code was already working fine because it used Django signals to watch for changes to the Collection-PublishableEntity many-to-many relationship, but it shouldn't have been so aware of the internals of openedx_content.
Supporting information
See openedx/openedx-core#462
Testing instructions
Coming soon
Deadline
Verawood
Other information
Depends on openedx/openedx-core#543 .
I wrote most of the code but used Claude Code for small bits and pieces.