Add option to disable game scan at startup #2083
4a05061 to 9f0783a
|
It's really a good project :D |
* Do not initialize or write to the db if the setting is disabled
* Don't call removeMultiDiskContent if skipScan is set
The SQLite database stores the file scan data (path, type, system) and the gamelist data. The scan behaves the same as before; the results just get stored in SQLite when the new setting is on. After a reboot, it does not need to scan the file system and parse every gamelist XML file again. I have seen some setups with slow storage take over 30 minutes to boot, so hopefully this will help.
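To make the idea concrete, here is a minimal sketch of such a scan cache, written in Python with the standard `sqlite3` module rather than the C++ used in the PR. The table name `files` and the helpers `open_cache`, `save_scan`, and `load_system` are hypothetical; only the three stored fields (path, type, system) come from the description above.

```python
import sqlite3

def open_cache(path=":memory:"):
    """Open (or create) the hypothetical scan cache."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY,"
        " type TEXT NOT NULL,"    # assumed values, e.g. 'game' or 'folder'
        " system TEXT NOT NULL)"
    )
    return db

def save_scan(db, system, entries):
    """Replace the cached rows for one system with a fresh scan result."""
    with db:  # single transaction: commit on success, roll back on error
        db.execute("DELETE FROM files WHERE system = ?", (system,))
        db.executemany(
            "INSERT INTO files (path, type, system) VALUES (?, ?, ?)",
            [(p, t, system) for p, t in entries],
        )

def load_system(db, system):
    """Return cached (path, type) pairs, or None if nothing is cached."""
    rows = db.execute(
        "SELECT path, type FROM files WHERE system = ? ORDER BY path",
        (system,),
    ).fetchall()
    return rows or None
```

On startup, a caller would try `load_system` first and only walk the file system when it returns `None`, then store the result with `save_scan`.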
"Sorry, I didn't express myself clearly. I completely agree with you and I totally get what you're doing and what you're going to store. I actually worked quite a bit with SQL and PHP a few years ago—even if it's different, I’ve done some architecture work myself :) My question is more about what happens after importing all our GAMELISTs into the SQLite database. If we add a new romset for a new console, okay, we run the scan and you add it. But, for example, if I make a bunch of updates in an existing XML that has already been imported—say I found a lot of new info, player counts, corrected descriptions, etc.—and I do this on another PC, or a friend does it for me, will we be able to re-import the info and overwrite the old data? Because in Batocera, managing game info is tedious; nobody really does it except for a quick scrape... whereas on PC, we have GAMELIST management tools to refine the content, scrape, translate, and so on. |
|
I just tested this on Windows.
1/ I don't really see any performance improvement with SKIP GAME SCAN AT STARTUP compared to the PARSE GAMELISTS ONLY option. Did you benchmark it?
2/ If PARSE GAMELISTS ONLY is unchecked, there are no images at all for games. In the Metadata class, mRelativeTo is null, so the full path is never resolved.
Also, the table structures are not dynamic (every field is declared), so if the metadata needs to evolve, we have to maintain the database too and write code to upgrade it. That complicates maintenance. I really wonder what the point of introducing such a thing is.
For me, I have my games on a ZFS server. Before I did lots of optimization (switching from SMB to NFS, adding lots of RAM for the L1 ARC, Optane drives for the L2 ARC, tweaking the ZFS metadata cache size, 10GbE wired, etc.), a cold boot used to take up to 30 minutes. Testing my setup with 125K games, I see a 10-30x performance improvement with a warm ZFS ARC cache (1-2 seconds, compared with 10-30 seconds). ZFS had 36 GB of metadata cached in memory, so I rebooted my storage array to simulate a cold boot and ran it just now. With the SQLite db it starts up instantly (1.5 seconds); without it, the scan takes 94 seconds. If my setup wasn't so nice, it would be much, much slower.
I also have an Ally X that I run ES-DE on. It is not the same as this, but it would also benefit hugely from disabling the scan at startup. On that device I have my ROMs on a 2 TB SD card, and the SD card is very, very slow. It takes a good 15-30 minutes every time I start.
This will be helpful for people with slow storage. I will make a pull request in their ES fork too eventually :)
Let me know if it's okay to move forward with this. I can address the media path issue and can change it to store the metadata as a JSON blob in SQLite.
|
I did not test with games and gamelists stored on a network share. It's possible that it makes a real difference there. I have no objections to your PR, but I need arguments, and it has to work perfectly and be easy to maintain ;-)
Also, one of my main concerns is that there is no database upgrade code. Given that, I'm wondering about the relevance of the database structure you implemented. Maybe the metadata should be managed not with columns but with another "metadata" table. Load and save are a bit more complicated to write, but there would be no questions about future evolutions. Let me know if you have other suggestions.
* drop the tables on schema change
* fix media loading issue
Thank you so much for reviewing. I have a Batocera machine connected to every TV in my house and really love it.
I updated it so the metadata goes in a JSON blob. I reduced the number of columns to 6 and use MetaDataDecl to pick the fields to store in the blob. There will probably never be schema changes, since new metadata fields added to gameDecls are picked up automatically.
id INTEGER PRIMARY KEY AUTOINCREMENT,
If there is one, though, there is a db_meta table that tracks the schema version. If the version changes, we just drop the table and recreate it, which forces a rescan. This is a transient cache for performance, not the source of truth, so I think this should be okay.
Fixed the image/media path issue: added setRelativeTo(system) so relative paths resolve correctly when loading from the cache.
I will work on the dropdown, using a separate table for metadata, and the other feedback later tonight. Please let me know if there are any other issues.
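The drop-and-recreate approach described above could look roughly like the following Python sketch. Only the `db_meta` table, the `id` column, and the JSON-blob idea come from the comment; the `games` table name, its other columns, the `schema_version` key, and all function names are assumptions made for illustration.

```python
import json
import sqlite3

SCHEMA_VERSION = 2  # hypothetical; bump whenever the table layout changes

def ensure_schema(db, version=SCHEMA_VERSION):
    """Create the tables, dropping the cache if the stored version differs.

    Returns True when the cache was (re)created and a rescan is needed.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS db_meta (key TEXT PRIMARY KEY, value TEXT)"
    )
    row = db.execute(
        "SELECT value FROM db_meta WHERE key = 'schema_version'"
    ).fetchone()
    if row is not None and int(row[0]) == version:
        return False  # cache layout is current, keep the data
    with db:
        db.execute("DROP TABLE IF EXISTS games")
        db.execute(
            "CREATE TABLE games ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " path TEXT UNIQUE NOT NULL,"   # assumed column
            " metadata TEXT NOT NULL)"      # JSON blob, assumed column name
        )
        db.execute(
            "INSERT OR REPLACE INTO db_meta (key, value)"
            " VALUES ('schema_version', ?)",
            (str(version),),
        )
    return True

def put_game(db, path, metadata):
    """Store all metadata fields for one game as a single JSON string."""
    with db:
        db.execute(
            "INSERT OR REPLACE INTO games (path, metadata) VALUES (?, ?)",
            (path, json.dumps(metadata)),
        )

def get_game(db, path):
    """Return the metadata dict for a path, or None if it is not cached."""
    row = db.execute(
        "SELECT metadata FROM games WHERE path = ?", (path,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Because the blob is opaque to SQLite, adding a new metadata field needs no schema change; the version bump is only required when the columns themselves change.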
…tonly, database
* Migrate gamelistonly to new setting
* Use option list to select between modes instead of slider
* Add CLI args for game loading mode
* Use key value table for storing game metadata instead of json blob
|
Updated to be one setting. The UI uses an OptionListComponent and migrates the old setting if it exists. The existing CLI args for the parse-gamelist-only setting still work, and I added a new CLI flag for the game loading mode.
I made the change to use a key-value table for metadata instead of the JSON blob. It is a little slower than the JSON blobs, and the db is a little bigger for the collection I am testing with. For this use case, JSON blobs are okay. If there is some future use case, SQLite does support JSON queries and indexes on JSON fields.
Let me know if there are any other issues.
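For comparison, a key-value layout might look like the sketch below. The table and column names (`games`, `game_metadata`, `game_id`, `key`, `value`) and the helper functions are hypothetical and not taken from the PR. It illustrates one plausible reason this layout is larger and slower to load: every metadata field becomes its own row that has to be joined back to its game.

```python
import sqlite3

def create_tables(db):
    """Create a games table plus one row per metadata field."""
    db.execute(
        "CREATE TABLE games (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL)"
    )
    db.execute(
        "CREATE TABLE game_metadata ("
        " game_id INTEGER NOT NULL REFERENCES games(id),"
        " key TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " PRIMARY KEY (game_id, key))"
    )

def save_game(db, path, metadata):
    """Insert one game and one row for each of its metadata fields."""
    with db:
        cur = db.execute("INSERT INTO games (path) VALUES (?)", (path,))
        db.executemany(
            "INSERT INTO game_metadata (game_id, key, value) VALUES (?, ?, ?)",
            [(cur.lastrowid, k, str(v)) for k, v in metadata.items()],
        )

def load_all(db):
    """Rebuild {path: {key: value}} with a single join."""
    games = {}
    query = (
        "SELECT g.path, m.key, m.value FROM games g"
        " LEFT JOIN game_metadata m ON m.game_id = g.id"
    )
    for path, key, value in db.execute(query):
        fields = games.setdefault(path, {})
        if key is not None:  # games without metadata yield one NULL row
            fields[key] = value
    return games
```

All values come back as strings here, so a caller would need to convert types itself, whereas a JSON blob preserves numbers and booleans.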
|
Given the benchmarks and the code, I think it would be better to stay with the JSON blob. But, given the size of the DB, I'm really wondering what performance improvement we can actually get compared to the PARSE GAMELIST ONLY option.
|
Did a quick test: 8464 ms, so it's slower. The time is dominated by the RemoveMultiDiskContent calls, though, and for the db cache we don't need to make these. In terms of size, the XML for me is about the same size as the database on disk. If we were to compact the SQLite db it would probably be smaller; it has its own growth algorithm, so there is blank space. I will change back to the JSON blobs and do some more perf testing later tonight. With my setup, the XML files are being read from RAM, so that is very fast. I will try with an SD card instead.
|
You mean, with your benchmarks, the only difference between the two options ( gamelist only / database ) is the time spent in the removeMultiDiskContent method ? |
|
I think one big issue with gamelists is that they do not seem to get persisted unless there is metadata. Simply starting and shutting down would not produce a gamelist; there is some code that ignores entries that only have a file name. Also, I think my numbers were flawed earlier. Before each test I am running the command below to drop any fs caching.
Without running it, some of the tests are pretty much instant. That does not reflect the real-world scenario, though, which is a fresh boot of Batocera.
SD Card - 20K Games
Network - 125K+ Games
For these network tests, I did not flush the ZFS cache. The games are on a server with 1 TB of RAM, 3 TB of Optane L2, and 12 drives with 300 TB of space. Even though it's over the network, it's fast because everything is being served straight from memory.
The DB is faster. There are worst-case scenarios that I was not able to test where the db would be even faster. I am sure there is a lower threshold below which just a file scan would be better. SQLite is a very fast, lightweight, embedded database. It's the most used database in the world; it's on basically every phone. Something I noticed: the database mode falls back to the file system when no results are found in the db. I can fix that. When testing this, it's very nice to have instant startup. I never experienced that before :)
|
I did one more set of tests on a real machine I have connected to one of my TVs. One of the NICs in this computer has buggy Linux drivers, so it's kind of a cheat test. This is with official build 41, which does not have the SQLite change. I tested with the normal scan and with gamelist only. I rebooted the storage array before the cold-cache tests to start with a fresh ARC.
Normal
Gamelist Only
I remember this particular machine used to be much slower. It would take over 15 minutes before I made lots of network and storage changes. The SQLite version would probably load everything within a second. I have not tried putting it on this machine.
Gamelists are public-facing: they sit in the user's ROM directories, and there's an ecosystem of third-party tools that create and manage them. Paths can be wrong, files can get corrupted, and they're exposed to user edits. The database is an internal implementation detail that users never see or touch. If it ever gets corrupted, we can just delete it and rescan. It's a cache, not a source of truth.
|
How many multidisk files do you have ( cue/m3u... ) ????? |
|
I just wrote this #2101 to boost RemoveMultiDiskContent when the PARSE GAMELISTS ONLY is ON by storing the multidisk info in the gamelist.xml. This requires at least ES to be launched/exited one time. It would be interesting to compare the two options, with this new optimization ( PARSE GAMELISTS ONLY vs USE GAME DATABASE ). |
|
This will probably work great. From my testing, the time was dominated by the multi-disk content handling. I will test later tonight. Is there a way to get it to write out the gamelist files when there is just a file name? https://github.com/batocera-linux/batocera-emulationstation/blob/master/es-app/src/Gamelist.cpp#L241
|
Thank you so much for fixing this. Sorry it took me a while to test; I accidentally lost the benchmark changes and had to add them back today. It seems really fast on my SD card, but that card didn't have a lot of multi-disk games. For the network collection I had to make some changes so it was writable. In my house I have 3 Batocera setups that mount the ROMs read-only but keep the rest on local disk.
2026-03-28 23:42:45 INFO RemoveMultiDisk[system] PROFILE: items=10714 paths=48758 jsonParseMs=23 pathResolveMs=22285
I tested just on this computer. Performance seems somewhat similar to before. The majority of the time seems to go to checking whether files exist; for this system that is 48K stat calls over the network. On Windows it just does a string concat if the path starts with '.'. On non-Windows, though, it makes the stat calls. Would it be possible to just append on Linux too if the prefix is '.'? Also, on shutdown the gamelist writes take minutes, but that's not important. I think I never really had the gamelists working properly because of the read-only network share.
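The "just append" suggestion can be illustrated with a small Python sketch. This is not the ES implementation; `resolve_relative` is a hypothetical helper that shows how a path starting with '.' can be resolved by pure string manipulation, with no stat call per path.

```python
import posixpath

def resolve_relative(path, relative_to):
    """Resolve a gamelist-style relative path without touching the file system.

    Paths starting with '.' are joined onto relative_to and normalized;
    anything else is returned unchanged. No existence check is performed.
    """
    if path.startswith("."):
        return posixpath.normpath(posixpath.join(relative_to, path))
    return path

print(resolve_relative("./disc1.cue", "/roms/psx"))
```

In the profile line above, path resolution took 22285 ms for 48758 paths, or about 0.46 ms per path. The trade-off of skipping the check is that a missing or renamed file is no longer detected at load time.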
Tested on a Mint machine with -DBATOCERA=OFF and a few homebrew ROMs. Confirmed the DB is created and that log messages state whether the DB is being used or a scan is running. Tested adding and removing ROMs, and confirmed the scan does not happen when the setting is turned on and that manual scanning still works and updates the db.