Skip to content

Collection download: Fetch manifests from Github#611

Open
manuq wants to merge 4 commits into
masterfrom
remote-collections
Open

Collection download: Fetch manifests from Github#611
manuq wants to merge 4 commits into
masterfrom
remote-collections

Conversation

@manuq
Copy link
Copy Markdown
Collaborator

@manuq manuq commented Jun 14, 2023

This way the software update is decoupled from content update.

Issue #610

@manuq manuq requested review from dbnicholson and dylanmccall June 14, 2023 13:32
@manuq manuq mentioned this pull request Jun 14, 2023
3 tasks
global _content_manifests_by_grade_name
global _collection_download_manager

_collection_download_manager = CollectionDownloadManager()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why the download manager initialization needs to be delayed. The only time the manifests come into play is when other code calls _collection_download_manager.from_manifest() or _collection_download_manager.from_state().

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, today the manifests are read and parsed at import time. For example when kolibri manage migrate is called and this plugin is enabled. Not ideal. Worse would it be to add an online fetch when running migrations.

"https://raw.githubusercontent.com/endlessm"
"/endless-key-collections/main/json"
"/{grade}-{name}.json"
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems pretty dangerous to make this use the main HEAD commit at runtime. If someone commits a mistake to main or is in the process of refactoring the collections then all apps are broken immediately. I think if we want to move in this direction then we should make releases of endless-key-collections and use released artifacts.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. @pwithnall started creating releases already. There is a ZIP file with all the source. I wonder if the JSON files could be separate release artifacts so they can be requested directly. Otherwise this code should request the ZIP and unpack it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the JSON files could be separate release artifacts so they can be requested directly.

There’s an unbounded number of JSON files, so I’m not sure that would actually gain you anything. Seems more reliable to be able to make a single request (for the .tar.gz) and then extract all the JSON files from the well-known json/ directory inside.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okey. The other improvement I'd do is allow changing the URL in an options.ini plugin setting. That way the same app build will be able to test different collections. It can be useful for content curation and testing (passing a mock collection with fewer content to avoid long download times).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option which doesn't require releases and artifacts but is a little safer is to have a stable branch in endless-key-collections that we occasionally fast forward to a known good point on master. Then you can continue using the direct github links with more confidence that it won't randomly break.

Another thing to consider is using github pages since I expect that would scale a lot better than the raw links. We could arrange that the pages branch is only updated on release.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The collection JSON is now published to https://endlessm.github.io/endless-key-collections per endlessm/endless-key-collections#29, so I think you should change the base URL in this PR. When new tags are made in endless-key-collections, the published JSON will automatically be updated.

This will be used in next commit. Having the URL configurable will be
useful for testing.
@manuq manuq force-pushed the remote-collections branch from c16fc0f to 755ec2b Compare June 21, 2023 11:47
@manuq
Copy link
Copy Markdown
Collaborator Author

manuq commented Jun 21, 2023

Ready for review again. I have updated the URL according to #611 (comment) and made it configurable. I think this will be useful to test unreleased collections and alternate collections (quickly iterate through the process with minimum content).

manuq added 3 commits June 21, 2023 09:32
This way the software update is decoupled from content update.

Issue #610
Previously the manifest files were read from disk so it didn't matter to
read them all on module import. But now they are fetched from the
Internet. So add a decorator to all API views for initiating the global
variables and fetch collections only once on first API call.
This is not needed anymore. Remove the setting and the method to parse
local manifests.
@manuq manuq force-pushed the remote-collections branch from 755ec2b to 2bf1c51 Compare June 21, 2023 12:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants