perf/blob io#240
Open
2e0byo wants to merge 20 commits into
Open
Conversation
Seeking is kept in the calling interface to avoid needing to provide an abstraction over multiple file pointer. Note that the type is wrong, and Blob doesn't support .readinto, annoyingly.
Sqlite already has a concept of treating a blob field like a file and buffering writes to it, so let's use that rather than thousands of tiny rows. That should keep the db much more performant. I've kept the chunking for now so it will work with the older approach, or if we decide to split into smaller chunk sizes like 128 MiB. But much of that code will now be a no-op.
When I first wrote this I split the logic from the impl and developed against a dict-based cache for simplicity. When that cache worked in the proxy I then parametrised tests over it and the sqlite impl until the latter reached parity. That development cache is pointless (in fact python apparently first shipped with sqlite because Guido didn't want to write a hashmap: it's almost performant enough for the job), and has long been deleted. This commit gets rid of the ABC, which no longer offers any useful separation. At the same time we can commit to SQLite directly in the API, which simplifies things somewhat.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR is on top of #239, though if that's rejected it will rebase on top of #238.
It replaces multiple chunk-sized records with a single record for the body, allocated up front with the correct size, and then buffered with sqlite's blobopen function (which returns a file pointer to a blob record). I've kept it compatible with the old format and it continues to play cached data in that format.
This should lead to improved DB performance, as previously the DB fragmented very quickly, and hopefully will lead to reduced RAM usage.
I've also taken the opportunity to clean up dead / redundant code from the development process of the proxy.
From using the cache for several months I've noticed:
Gstreamer crashes (based on seeking around aggressively) seem to go away when I used the default HTTP sink and not curl.