
Reduce assets size #6963

@ShuffleWire


Hello team,

I've just discovered that the repository is huge, largely because of its media files, whose size mostly comes down to the choice of file format. The examples below are from the documentation, but others probably exist elsewhere as well.

webots/docs/blog/images/coupled_motors.gif is 33 MB. Simply re-encoding it as a GIF, without changing anything else, brings it down to 21 MB (free cookies!), but converting it to MP4 brings it down to 0.8 MB, which is more than 30x smaller.
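For reference, a minimal sketch of such a GIF-to-MP4 conversion, assuming ffmpeg built with libx264 is installed. The function name gif_to_mp4 is hypothetical, and the exact output size will depend on the encoder settings. The scale filter rounds width and height down to even numbers (required for yuv420p H.264), and -movflags +faststart lets a browser start playback before the whole file has downloaded.

```shell
# Hypothetical helper: convert an animated GIF to an H.264 MP4 next to it.
# Assumes ffmpeg (with libx264) is on PATH.
gif_to_mp4() {
  local src="$1"
  local dst="${src%.*}.mp4"
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
  # -pix_fmt yuv420p: pixel format supported by most browsers and players.
  # scale=...: force even width/height, which yuv420p H.264 requires.
  # -movflags +faststart: move the index to the start for web playback.
  ffmpeg -y -i "$src" \
    -c:v libx264 -pix_fmt yuv420p \
    -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" \
    -movflags +faststart \
    "$dst"
}

# Example, from the repository root:
# gif_to_mp4 docs/blog/images/coupled_motors.gif
```

Pages that currently embed the GIF as an image would likely need a video element instead, since most browsers do not play MP4 files through an image tag.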

To get an idea of the total size of the GIF files, I ran:
find . -type f -iname "*.gif" -printf "%s\n" | awk '{total += $0} END {print "Total GIF size: " total/1024/1024 " MB"}'
and the output is 83.1688 MB.
If we were to convert all of them to MP4, we could save about 83 * (29/30) ≈ 80 MB, which is about 8% of the repository size.
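The same measurement can be repeated for any format with one small function. This is a sketch that, like the commands in this issue, relies on GNU find's -printf (so it may not work with the BSD find shipped on macOS); total_mb is a hypothetical name.

```shell
# Hypothetical helper: total size in MB of files under DIR whose name ends
# in .EXT (case-insensitive, so .GIF and .gif both count).
total_mb() {
  local dir="$1" ext="$2"
  find "$dir" -type f -iname "*.${ext}" -printf '%s\n' \
    | awk '{ total += $1 } END { printf "%.1f\n", total / 1024 / 1024 }'
}

# Example, from the repository root:
# for ext in gif png jpg; do echo "${ext}: $(total_mb . "$ext") MB"; done
```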

webots/docs/blog/images/appearances/Pcb.png is 3.9 MB; compressing it as JPEG gives 1 MB, and I don't think we need 1690x1690 pixels anyway for what is, AFAIK, an icon.
Most of the images are PNGs, and switching to JPEG could give a 4x reduction in size.
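A possible sketch of the PNG-to-JPEG conversion with ImageMagick, also shrinking oversized images. The 512-pixel default and the quality of 85 are arbitrary assumptions, not values from this issue, and png_to_jpg is a hypothetical name. JPEG has no alpha channel, so the sketch flattens any transparency onto white; PNGs whose transparency actually matters should probably stay as PNG.

```shell
# Hypothetical helper: write a JPEG copy of a PNG, shrinking it so neither
# side exceeds MAX pixels (the trailing '>' only shrinks, never enlarges).
# Assumes ImageMagick: 'magick' (v7) or 'convert' (v6).
png_to_jpg() {
  local src="$1" max="${2:-512}"
  local dst="${src%.*}.jpg"
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
  local im
  if command -v magick >/dev/null 2>&1; then im=magick; else im=convert; fi
  # JPEG cannot store transparency: flatten it onto a white background.
  "$im" "$src" -resize "${max}x${max}>" \
    -background white -alpha remove -alpha off \
    -quality 85 "$dst"
}

# Example, from the repository root:
# png_to_jpg docs/blog/images/appearances/Pcb.png 512
```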

Again, to get an idea of the total size of the PNG files, I ran:
find . -type f -iname "*.png" -printf "%s\n" | awk '{total += $0} END {print "Total PNG size: " total/1024/1024 " MB"}'
and the output is 339.492 MB.
If we were to convert all of them to JPEG, we could save about 339.492 * 3/4 ≈ 255 MB, which is about 25% of the repository size.
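Both estimates follow saving = total * (1 - 1/ratio), where ratio is the expected size-reduction factor (about 30 for GIF to MP4, about 4 for PNG to JPEG). A small awk-based sketch of that arithmetic; saving_mb is a hypothetical name, and since the ratios were measured on single files, the results are only ballpark figures.

```shell
# Hypothetical helper: estimated saving in MB when TOTAL MB of files
# shrink by a factor RATIO, i.e. TOTAL * (1 - 1/RATIO).
saving_mb() {
  awk -v total="$1" -v ratio="$2" 'BEGIN { printf "%.1f\n", total * (1 - 1 / ratio) }'
}

saving_mb 83.1688 30    # GIF -> MP4: prints 80.4
saving_mb 339.492 4     # PNG -> JPEG: prints 254.6
```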

FYI: the JPG files, which seem to be mainly textures (so not easy to compress further, I guess), add up to 551.438 MB (about 50% of the repository size):
find . -type f -iname "*.jpg" -printf "%s\n" | awk '{total += $0} END {print "Total JPG size: " total/1024/1024 " MB"}'
(544 MB of this lives in projects: 138 MB in projects/appearances and 281 MB in projects/objects, with the remaining ~125 MB spread almost uniformly across the other subfolders of projects.)
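The per-folder breakdown can be reproduced with a variant that groups sizes by first-level subfolder. Again a sketch relying on GNU find's -printf; mb_by_subdir is a hypothetical name, and files sitting directly in the root folder are reported under their own file name.

```shell
# Hypothetical helper: total size (MB) of files with extension EXT under ROOT,
# grouped by ROOT's immediate subfolders, largest first.
mb_by_subdir() {
  local root="$1" ext="$2"
  # %P = path relative to ROOT, %s = size in bytes, tab-separated.
  find "$root" -type f -iname "*.${ext}" -printf '%P\t%s\n' \
    | awk -F'\t' '{ split($1, parts, "/"); size[parts[1]] += $2 }
        END { for (d in size) printf "%.1f %s\n", size[d] / 1024 / 1024, d }' \
    | sort -nr
}

# Example, from the repository root:
# mb_by_subdir projects jpg
```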

What do you think? These are low-hanging fruit; we could also discuss removing (or resizing) some assets, depending on their use case.

Is there anything preventing this?
I guess the hardest part would be tracking down every reference to those paths and updating it accordingly...
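As a first pass for tracking down references, a recursive grep on the file's base name is probably enough, since pages may refer to the same file through different relative paths. A minimal sketch; find_refs is a hypothetical name (inside a checkout, git grep -l -F would do the same while skipping untracked files).

```shell
# Hypothetical helper: list files under DIR (default: current folder) that
# mention NAME literally (-F: no regex), skipping the .git folder.
find_refs() {
  local name="$1" dir="${2:-.}"
  grep -rlF --exclude-dir=.git -- "$name" "$dir"
}

# Example, from the repository root:
# find_refs coupled_motors.gif docs
```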

Don't forget that the disk-space saving should be almost twice that, because every file is also stored in the .git subfolder.
I do know that any modification would still leave the old files in the git history (could we prune that?), but it would help more casual developers, who could run

git clone --single-branch --branch master https://github.com/cyberbotics/webots.git --depth=1
(cloning only the last commit, not the full history), or even a blobless clone (old file contents are not downloaded unless the user requests them). On my machine, doing so I end up with a 1.1 GB repository instead of the 4.5 GB one from a regular clone, which could also help a lot for users with slow hardware or a slow internet connection.
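Both kinds of clone, as a sketch. Wrapping them in functions (clone_shallow and clone_blobless are hypothetical names) is only for convenience; --filter=blob:none needs a reasonably recent Git and a server that supports partial clone, which GitHub does.

```shell
WEBOTS_URL=https://github.com/cyberbotics/webots.git

# Shallow clone: only the latest commit of master, no history.
clone_shallow() {
  git clone --depth=1 --single-branch --branch master "$WEBOTS_URL" "${1:-webots}"
}

# Blobless (partial) clone: all commits and trees are fetched, but file
# contents (blobs) are downloaded on demand, e.g. when a file is checked out.
clone_blobless() {
  git clone --filter=blob:none "$WEBOTS_URL" "${1:-webots}"
}

# Example:
# clone_blobless webots
```

A blobless clone keeps the full commit history browsable and fetches old file versions only when a command needs them, whereas a shallow clone cannot see history beyond its depth.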

Anyway, reducing file size always brings benefits, including for bandwidth and caching, so IMO it's worth looking into.

Metadata

Assignees: no one assigned
Labels: documentation (Improve or fix the documentation: MD files only, no software development)
Type: no type
Projects: no projects
Milestone: no milestone
Relationships: none yet
Development: no branches or pull requests