add option to disable game scan at startup #2083

Open
DevNamedZed wants to merge 7 commits into batocera-linux:master from DevNamedZed:manual_scan_option
@DevNamedZed DevNamedZed commented Feb 27, 2026

  • Adds SQLite game cache (gamedatabase.db) that persists discovered games across restarts
  • New "Skip game scan at startup" option in Developer Settings — loads from cache instead of filesystem walk
  • "Scan for new games" button in Game Settings for manual rescan when skip is enabled
  • Defaults to off — zero behavior change unless opted in, graceful fallback if DB is missing

Tested on a Mint machine with -DBATOCERA=OFF and a few homebrew ROMs. Confirmed that the DB is created and that the log messages state whether the DB is being used or a scan is running. Tested adding and removing ROMs; confirmed that the scan does not happen when the setting is turned on, and that manual scanning still works and updates the DB.

@DevNamedZed DevNamedZed marked this pull request as ready for review March 16, 2026 04:21
@Darknior

It's a really good project :D
When we run a manual scan, do you scan the directories and also gamelist.xml, to automatically add all the information and media we have already configured? For example, if we add a new ROM set that is ready to use...
For sure, using a DB will open the way to some new features too, like game stats, etc.
If you add a login option, as I have read, it would be great for users who have asked for it, for example in a family, to have different favorite games per user, or different RA accounts, etc.
Thanks

@DevNamedZed DevNamedZed marked this pull request as draft March 16, 2026 11:31
* Do not initialize or write to db if setting is disable
* dont do removeMultiDiskContent if skipScan is set
@DevNamedZed
Author

The SQLite database stores the file scan data (path, type, system) and the gamelist data. The scan behavior is the same as before; the results are just stored in SQLite when the new setting is on. After a reboot, it does not need to scan the file system and parse every gamelist XML file. I have seen some setups with slow storage take over 30 minutes to boot, so hopefully it will help.

@DevNamedZed DevNamedZed marked this pull request as ready for review March 16, 2026 13:39
@Darknior

Sorry, I didn't express myself clearly.

I completely agree with you and I totally get what you're doing and what you're going to store. I actually worked quite a bit with SQL and PHP a few years ago—even if it's different, I’ve done some architecture work myself :)

My question is more about what happens after importing all our GAMELISTs into the SQLite database. If we add a new romset for a new console, okay, we run the scan and you add it.

But, for example, if I make a bunch of updates in an existing XML that has already been imported—say I found a lot of new info, player counts, corrected descriptions, etc.—and I do this on another PC, or a friend does it for me, will we be able to re-import the info and overwrite the old data?

Because in Batocera, managing game info is tedious; nobody really does it except for a quick scrape... whereas on PC, we have GAMELIST management tools to refine the content, scrape, translate, and so on.

@fabricecaruso
Collaborator

I just tested this on Windows.

1/ I don't really see any performance improvement compared to the PARSE GAMELISTS ONLY option.
Here are my benchmarks for 9251 games:
PARSE GAMELISTS ONLY
2026-03-16 22:13:56 TIME 1,115
2026-03-16 22:14:20 TIME 1,249
2026-03-16 22:14:39 TIME 1,713

SKIP GAME SCAN AT STARTUP
2026-03-16 22:13:19 TIME 1,609
2026-03-16 22:15:16 TIME 1,173
2026-03-16 22:15:34 TIME 1,404
So it's almost the same performance.

Did you benchmark it ?

2/ If PARSE GAMELISTS ONLY is unchecked, then there are no images at all for games. In the Metadata class, mRelativeTo is null, so the full path is never resolved.
It's not mergeable in the current state because of this problem with the game media.

Also, the table structures are not dynamic (every field is declared), so if we need to evolve the metadata, we need to maintain the database too, and write code to upgrade it. That complicates maintenance.

I really wonder what the benefit of introducing such a thing is.

@DevNamedZed
Author

DevNamedZed commented Mar 16, 2026

For me, I have my games on a ZFS server. Before I did lots of optimization (switching from SMB to NFS, adding lots of RAM for the L1 ARC, Optane drives for the L2 ARC, tweaking the ZFS metadata cache size, wired 10GbE, etc.), a cold boot used to take up to 30 minutes.

Testing my setup, with 125K games, I see a 10-30x performance improvement with a warm ZFS ARC cache (1-2 seconds, compared with 10-30 seconds).

ZFS had 36 GB of metadata cached in memory. I rebooted my storage array to simulate a cold boot and ran the test just now. With the SQLite DB it starts up almost instantly (1.5 seconds); without it, the scan takes 94 seconds. If my setup wasn't so nice it would be much, much slower.

Cold NFS scan: 93,851ms (94s)
DB cache load: 1,451ms (1.5s)

I also have an Ally X on which I run ES-DE. It is not the same as this, but it would also benefit hugely from disabling the scan at startup. On that device I have my ROMs on a 2 TB SD card, and the SD card is very, very slow. It takes a good 15-30 minutes every time I start. This will be helpful for people with slow storage. I will make a pull request in their ES fork too eventually :)

Let me know if it's okay to move forward with this. I can address the media path issue and can change the schema to store the metadata as a JSON blob in SQLite.

@fabricecaruso
Collaborator

I did not test with games & gamelists stored on a network share. It's possible it makes a true difference.

I have no objections on your PR, but I need arguments and it has to work perfectly, and be easy to maintain ;-)

  • First: you must solve the image problem. It's very strange that you did not notice it! It happens because you don't call loadFromXML, so mRelativeTo is null and relative paths can't be resolved.
  • I don't like the condition "skipScan = Settings::SkipGameScanAtStartup() && !Settings::ParseGamelistOnly();": if you check both options, SkipGameScanAtStartup does not apply, only ParseGamelistOnly does. It's not user friendly at all. How will users understand this? Maybe these should be regrouped in a single combobox, something like "LOADING MODE : NORMAL / PARSE GAMELISTS ONLY / USE GAME DATABASE".

Also, one of my main concerns is that there is no database upgrade code.
I mean: if we need to change the metadata structure (imagine we want to add one new metadata field; I added the "tags" metadata recently), how do we do this? There's no "automatic" way.

Faced with this question, I'm wondering about the relevance of the database structure you implemented.
As it stands, each metadata field is a named column, which forces us to create new columns and write code to update the structures each time we want to add something to the metadata.

Maybe it should be managed - not with columns, but - with another "metadata" table.
The games table would only have the bare minimum (id, system, path, type, and possibly name -> everything required that is not declared in the MetaDataId enum), and another table, game_metadata (game_id, key, value).

It's a bit more complicated to write load & save, but there would be no questions about future evolutions.
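
The two-table layout suggested above can be sketched in C++ as a flattening step: one row per metadata key, so a new metadata field needs no new column. This is only an illustration. `MetadataRow`, `toRows`, and `fromRows` are hypothetical names, not code from this PR, and the actual SQLite reads and writes are left out.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical row of a game_metadata (game_id, key, value) table.
struct MetadataRow
{
    long long gameId;
    std::string key;
    std::string value;
};

// Flatten one game's metadata into key/value rows, skipping empty values.
std::vector<MetadataRow> toRows(long long gameId,
                                const std::map<std::string, std::string>& metadata)
{
    std::vector<MetadataRow> rows;
    for (const auto& entry : metadata)
        if (!entry.second.empty())
            rows.push_back({ gameId, entry.first, entry.second });
    return rows;
}

// Rebuild the metadata map for one game from rows loaded from the table.
std::map<std::string, std::string> fromRows(long long gameId,
                                            const std::vector<MetadataRow>& rows)
{
    std::map<std::string, std::string> metadata;
    for (const auto& row : rows)
        if (row.gameId == gameId)
            metadata[row.key] = row.value;
    return metadata;
}
```

Because the keys are plain strings, adding a field such as "tags" only adds rows, at the cost of one row per field per game when loading.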

If you have other suggestions, go ahead.
But this question has to be answered!

* drop the tables on schema change
* fix media loading issue
@DevNamedZed
Author

Thank you so much for reviewing. I have a batocera machine connected to every TV in my house and really love it.

I updated it so the metadata goes in a JSON blob. I reduced the number of columns to 6 and use MetaDataDecl to decide which fields to store in the blob. There will probably never be schema changes, since new metadata fields added to gameDecls are picked up automatically.

id INTEGER PRIMARY KEY AUTOINCREMENT,
system TEXT NOT NULL,
path TEXT NOT NULL,
type INTEGER NOT NULL DEFAULT 0,
metadata TEXT,
last_synced TEXT DEFAULT CURRENT_TIMESTAMP,

If there is a schema change, though, a db_meta table tracks the schema version. If the version changes, we just drop the table and recreate it, which forces a rescan. This is a transient cache for performance, not the source of truth, so this should be okay, I think.
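
A rough sketch of that drop-and-recreate step, written as a C++ helper that only builds the SQL to run. The table name `games`, the `db_meta` column layout, the `schema_version` key, and the version number are assumptions for illustration; only the six columns and the existence of `db_meta` come from the description above.

```cpp
#include <string>
#include <vector>

// Assumed current schema version; bump it whenever the table layout changes.
const int kSchemaVersion = 2;

// Returns the SQL to run when the stored version differs from the current one.
// An empty result means the cache can be used as-is.
std::vector<std::string> schemaResetStatements(int storedVersion)
{
    std::vector<std::string> sql;
    if (storedVersion == kSchemaVersion)
        return sql;

    // The cache is not the source of truth, so dropping it only costs a rescan.
    sql.push_back("DROP TABLE IF EXISTS games;");
    sql.push_back(
        "CREATE TABLE games ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "system TEXT NOT NULL, "
        "path TEXT NOT NULL, "
        "type INTEGER NOT NULL DEFAULT 0, "
        "metadata TEXT, "
        "last_synced TEXT DEFAULT CURRENT_TIMESTAMP);");
    // Assumes db_meta has columns (key TEXT PRIMARY KEY, value TEXT).
    sql.push_back(
        "INSERT OR REPLACE INTO db_meta (key, value) VALUES ('schema_version', '"
        + std::to_string(kSchemaVersion) + "');");
    return sql;
}
```

The caller would run the returned statements in one transaction and then trigger a full scan to repopulate the table.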

Fixed the image/media path issue — added setRelativeTo(system) so relative paths resolve correctly when loading from cache.

I will work on the dropdown, using a separate table for metadata, and the other feedback later tonight. Please let me know if there are any other issues.

@DevNamedZed DevNamedZed marked this pull request as draft March 17, 2026 00:40
…tonly, database

* Migrate gamelistonly to new setting
* Use option list to select between modes instead of slider
* Add CLI args for game loading mode
* Use key value table for storing game  metadata instead of json blob
@DevNamedZed
Author

Updated to use a single setting. The UI uses an OptionListComponent, and the old setting is migrated if it exists. The existing CLI args for the parse-gamelist-only setting still work, and I added a new CLI flag for the game loading mode.
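
As a minimal sketch of how two legacy booleans can collapse into one mode, assuming the old precedence where PARSE GAMELISTS ONLY wins when both are checked: `GameLoadingMode`, `migrateLegacySettings`, and `loadingModeLabel` are hypothetical names, and the PR's real setting keys are not shown in this thread. The labels reuse the combobox wording suggested in the review.

```cpp
#include <string>

// Hypothetical single setting that replaces the two checkboxes.
enum class GameLoadingMode { Normal, GamelistOnly, Database };

// Map the legacy booleans onto one mode. ParseGamelistOnly takes priority,
// mirroring the old condition skipScan = skipGameScan && !parseGamelistOnly.
GameLoadingMode migrateLegacySettings(bool parseGamelistOnly, bool skipGameScanAtStartup)
{
    if (parseGamelistOnly)
        return GameLoadingMode::GamelistOnly;
    if (skipGameScanAtStartup)
        return GameLoadingMode::Database;
    return GameLoadingMode::Normal;
}

// Label shown in the option list for each mode.
std::string loadingModeLabel(GameLoadingMode mode)
{
    switch (mode)
    {
    case GameLoadingMode::GamelistOnly: return "PARSE GAMELISTS ONLY";
    case GameLoadingMode::Database:     return "USE GAME DATABASE";
    case GameLoadingMode::Normal:       break;
    }
    return "NORMAL";
}
```

With a single enum, the "both options checked" state that the review objected to can no longer be expressed in the UI.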

Changed to a key-value table for metadata instead of the JSON blob. It is a little slower than using the JSON blobs, and the DB is a little bigger for the collection I am testing with.

- JSON blob cache load: 1.5s
- Key-value cache load: 2.8s
- DB size JSON: 184MB
- DB size key-value: 222MB

For this use case, JSON blobs are okay. If there is some future use case, SQLite does support JSON queries and indexes on JSON fields.

Let me know if there are any other issues.

@DevNamedZed DevNamedZed marked this pull request as ready for review March 17, 2026 06:18
@fabricecaruso
Collaborator

Given the benchmarks, and the code... I think it would be better to stay with the JSON blob.
We don't need any search or indexing on the JSON fields.
It's easier to maintain and understand.

But, given the size of the DB, I'm really wondering what performance improvement we can actually get compared to the PARSE GAMELIST ONLY option...

@DevNamedZed
Author

Did a quick test: 8,464ms, so it's slower. The time is dominated by RemoveMultiDiskContent calls, though, and for the DB cache we don't need to make these. In terms of size, the XML for me is about the same size on disk. If we were to compact the SQLite DB it would probably be smaller; it has its own growth algorithm, so there is blank space.

Will change back to the JSON blobs and do some more perf testing later tonight. With my setup, the XML files are being read from RAM, so it is very fast. I will try with an SD card instead.

@fabricecaruso
Collaborator

You mean that, in your benchmarks, the only difference between the two options (gamelist only / database) is the time spent in the removeMultiDiskContent method?
I have very few multidisk games, and my profiler says the total time spent in the removeMultiDiskContent method is 11ms...
If that's the case, maybe the optimisation to make is elsewhere...

@DevNamedZed
Author

I think one big issue with gamelists is that they do not seem to get persisted unless there is metadata. Simply starting and shutting down would not produce a gamelist; there is some code that skips entries that have just a file name. Also, RemoveMultiDiskContent will read .cue, .gdi, etc. even when gamelist only is on.

I think my numbers were flawed earlier. Before each test I am running the command below to drop any fs caching.

sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'

Without running it, some of the tests are pretty much instant. That does not reflect the real-world scenario, though, which is a fresh boot of Batocera.

SD Card - 20K Games

| Mode | Filesystem scan | Load time | RemoveMultiDiskContent |
| --- | --- | --- | --- |
| Normal | 6,059ms | 5,512ms | 5,509ms |
| Gamelist Only | 0ms | 2,066ms | 3,013ms |
| Database | 13ms | 339ms | 0ms |

Network - 125K+ Games

For these network tests, I did not flush the ZFS cache. The games are on a server with 1 TB of RAM, 3 TB of Optane L2 ARC, and 12 drives with 300 TB of space. Even though it's over the network, it's fast because everything is being served straight from memory.

| Mode | Filesystem scan | Load time | RemoveMultiDiskContent |
| --- | --- | --- | --- |
| Normal | 26,058ms | 26,147ms | 50,578ms |
| Gamelist Only | 0ms | 1,129ms | 25,011ms |
| Database | 145ms | 634ms | 0ms |

The DB is faster. There are worst-case scenarios that I was not able to test where the DB would be even faster. I am sure there is a lower threshold below which a plain file scan would be better. SQLite is a very fast, lightweight, embedded database. It's the most used database in the world; it's basically on every phone.

Something I noticed: the database mode falls back to the filesystem when no results are found in the DB. I can fix that.

When testing this, it's very nice to have instant startup. I never experienced that before :)

@DevNamedZed
Author

I did one more set of tests on a real machine I have connected to one of my TVs. One of the NICs in this computer has buggy Linux drivers, so it's kind of a cheat test. This is with official build 41, which does not have the SQLite change. I tested with the normal scan and with gamelist only. I rebooted the storage array before the cold cache tests, to start with a fresh ARC.

Normal

| | Cold cache | Warm cache |
| --- | --- | --- |
| Total load time | 434,867ms (7.25 min) | 35,908ms (36s) |
| RemoveMultiDiskContent | 248,451ms (4.1 min) | 10,618ms (10.6s) |
| Everything else | 186,416ms (3.1 min) | 25,290ms (25.3s) |

Gamelist Only

| | Cold cache | Warm cache |
| --- | --- | --- |
| Total load time | 457,589ms (7.6 min) | 29,221ms (29s) |
| RemoveMultiDiskContent | 445,847ms (7.4 min) | 17,838ms (17.8s) |
| Everything else | 11,742ms (11.7s) | 11,383ms (11.4s) |

I remember that this particular machine used to be much slower. It would take over 15 minutes before I made lots of network and storage changes. The SQLite version would probably load everything within a second. I have not tried putting it on this machine.

Gamelists are public-facing — they sit in the user's ROM directories, and there's an ecosystem of third-party tools that create and manage them. Paths can be wrong, files can get corrupt, and they're exposed to user edits. The database is an internal implementation detail — users never see or touch it. If it ever gets corrupted, can just delete and rescan. It's a cache, not a source of truth.

@fabricecaruso
Collaborator

fabricecaruso commented Mar 18, 2026

How many multidisk files do you have (cue/m3u...)?
What would be interesting is to compare everything with IGNORE MULTIDISK CONTENT disabled.
In your environment it seems that the real problem is this multidisk processing.

@fabricecaruso
Collaborator

I just wrote #2101 to speed up RemoveMultiDiskContent when PARSE GAMELISTS ONLY is ON, by storing the multidisk info in the gamelist.xml. This requires ES to be launched and exited at least once.

It would be interesting to compare the two options (PARSE GAMELISTS ONLY vs USE GAME DATABASE) with this new optimization.

@DevNamedZed
Author

This will probably work great. From my testing, the time was dominated by the multi-disk content. I will test later tonight.

Is there a way to get it to write out the gamelist files when there is just a file name?

https://github.com/batocera-linux/batocera-emulationstation/blob/master/es-app/src/Gamelist.cpp#L241

@DevNamedZed
Author

Thank you so much for fixing this. Sorry it took me a while to test; I accidentally lost the benchmark changes and had to add them back today.

It seems really fast on my SD card, but that didn't have a lot of multi-ROM games. For the network collection I had to make some changes so it was writable. In my house I have 3 Batocera setups, which mount the ROMs read-only but keep the rest on local disk.

2026-03-28 23:42:45 INFO RemoveMultiDisk[system] PROFILE: items=10714 paths=48758 jsonParseMs=23 pathResolveMs=22285

I tested just on this computer. Performance seems somewhat similar to before. It seems like the majority of the time is spent checking whether files exist; for this system that is 48K stat calls over the network.

On Windows it just does a string concat if the path starts with '.'. On non-Windows platforms, though, it makes the stat calls.

  std::string getCanonicalPath(const std::string& _path)
  {
      if (_path.size() >= 2 && _path[0] == ':' && _path[1] == '/')
          return _path;

  #if WIN32
      std::string path = _path[0] == '.' ? getAbsolutePath(_path) : _path;
      if (path.find("./") == std::string::npos && path.find(".\\") == std::string::npos)
          return path;
  #else
      std::string path = exists(_path) ? getAbsolutePath(_path) : _path;
  #endif

Would it be possible to just do the string append on Linux as well when the path starts with '.'?
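
A minimal sketch of what such a stat-free resolution could look like, assuming an absolute POSIX base directory. `resolveRelativeLexically` is a hypothetical helper, not the project's `getAbsolutePath`, whose behavior is not shown in this thread. It joins and normalizes the path as pure string work, so it never touches the network share.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Resolve a path that starts with '.' against baseDir without any filesystem
// access. Other paths are returned unchanged. baseDir is assumed to be an
// absolute POSIX path such as "/userdata/roms/psx".
std::string resolveRelativeLexically(const std::string& path, const std::string& baseDir)
{
    if (path.empty() || path[0] != '.')
        return path;

    const std::string joined = baseDir + "/" + path;
    std::vector<std::string> parts;
    std::string segment;

    for (std::size_t i = 0; i <= joined.size(); ++i)
    {
        if (i == joined.size() || joined[i] == '/')
        {
            if (segment == "..")
            {
                if (!parts.empty())
                    parts.pop_back();      // step up one directory
            }
            else if (!segment.empty() && segment != ".")
                parts.push_back(segment);  // keep a normal path component
            segment.clear();
        }
        else
            segment += joined[i];
    }

    std::string result;
    for (const std::string& part : parts)
        result += "/" + part;
    return result.empty() ? "/" : result;
}
```

One trade-off to keep in mind: a lexical resolution does not follow symlinks and does not check that the file exists, so it can differ from the current `exists()`-based branch for missing files or symlinked directories.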

Also, on shutdown the gamelist writes take minutes, but that's not important. I think I never really had the gamelists working properly because of the read-only network share.
