Skip to content

perf/blob io#240

Open
2e0byo wants to merge 20 commits into
EbbLabs:mainfrom
2e0byo:perf/blob-io
Open

perf/blob io#240
2e0byo wants to merge 20 commits into
EbbLabs:mainfrom
2e0byo:perf/blob-io

Conversation

@2e0byo
Copy link
Copy Markdown
Contributor

@2e0byo 2e0byo commented Apr 30, 2026

This PR is on top of #239, though if that's rejected it will rebase on top of #238.

It replaces multiple chunk-sized records with a single record for the body, allocated up front with the correct size, and then buffered with sqlite's blobopen function (which returns a file pointer to a blob record). I've kept it compatible with the old format and it continues to play cached data in that format.

This should lead to improved DB performance, as previously the DB fragmented very quickly, and hopefully will lead to reduced RAM usage.

I've also taken the opportunity to clean up dead / redundant code from the development process of the proxy.

From using the cache for several months I've noticed:

  • RAM usage sometimes kills mopidy on my raspberry pi (the ram headroom is marginal anyway and I'm running a pretty hevayweight NixOS I should tune, but it's definitely cache related)
  • if gstreamer crashes, pykka can re-spawn a thread with a now stale DB connection

Gstreamer crashes (based on seeking around aggressively) seem to go away when I used the default HTTP sink and not curl.

2e0byo added 20 commits April 28, 2026 15:16
Seeking is kept in the calling interface to avoid needing to provide an
abstraction over multiple file pointer.

Note that the type is wrong, and Blob doesn't support .readinto, annoyingly.
Sqlite already has a concept of treating a blob field like a file and buffering
writes to it, so let's use that rather than thousands of tiny rows.  That should
keep the db much more performant.

I've kept the chunking for now so it will work with the older approach, or if we
decide to split into smaller chunk sizes like 128 MiB.  But much of that code
will now be a no-op.
When I first wrote this I split the logic from the impl and developed against a
dict-based cache for simplicity.  When that cache worked in the proxy I then
parametrised tests over it and the sqlite impl until the latter reached parity.

That development cache is pointless (in fact python apparently first shipped
with sqlite because Guido didn't want to write a hashmap: it's almost performant
enough for the job), and has long been deleted.  This commit gets rid of the
ABC, which no longer offers any useful separation.  At the same time we can
commit to SQLite directly in the API, which simplifies things somewhat.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant