chrisadamsonmcri edited this page Oct 22, 2025 · 2 revisions

Archiving means placing multiple files into an archive, a file that contains other files.

We use two archive formats:

  1. 7zip archives: extension 7z
  2. tar archives: extension tar

Advantages/Disadvantages:

  • 7zip doesn't store user/group information; extracted files are owned by whoever extracts them. tar stores user/group information, which can be restored on extraction.
  • 7zip doesn't handle hardlinks as hardlinks; the files are duplicated. tar stores only one copy of hardlinked data and stores the links as links.
  • 7zip allows a different compression scheme per file. tar files are uncompressed and must be compressed as a whole by a single bitstream compressor (see below). Compressed tar files have the extension tar.XXX, where XXX represents the compressor used. A compressed tar file is called a tarball.
  • 7zip makes it easier to modify an archive (deleting/renaming/adding files). tar files must be decompressed and recreated manually (technically 7zip does this as well, but the program does it for you).
  • 7zip supports symbolic links but doesn't like broken symbolic links; tar handles them.
  • Listing the contents of a 7zip file is instant; a compressed tar file must be decompressed to list its contents.
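
The hardlink behaviour of tar described above can be checked with a small throwaway example. This is only a sketch: the directory and file names are made up for illustration, and it uses nothing beyond standard `tar` and `ln`.

```shell
# Hypothetical demo directory; all names here are illustrative.
demo=$(mktemp -d)
mkdir "$demo/data"
printf 'hello\n' > "$demo/data/original.txt"
ln "$demo/data/original.txt" "$demo/data/hardlink.txt"   # second name, same data

# Archive the directory. tar stores the data once and records the
# other name as a link to it.
tar -cf "$demo/data.tar" -C "$demo" data

# In the verbose listing, one of the two names appears as "link to ...".
listing=$(tar -tvf "$demo/data.tar")
printf '%s\n' "$listing"
```

Which of the two names is stored as the file and which as the link depends on the order in which tar reads the directory.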

Bitstream compressors:

  • lzma2: slow compression, best compression for random data (images, track files), medium decompression that can be multithreaded
  • bzip2: slow compression, best compression most of the time for text files, slow decompression, single threaded only
  • deflate (gzip): used for nifti and mgh files. Superseded, don't use.
  • zstd: fast compression, can be multithreaded, excellent compression given high speed, very fast decompression
  • lz4: ultra fast compression, can be multithreaded, compression not as good, ultra fast decompression

When creating archives, we generally want to do a few things:

  • Make an archive of a root directory that contains files, for example freesurfer.7z for a directory called freesurfer.
  • Keep each archive no bigger than about 10TB. If a directory contains many medium/large child directories, archive each child directory separately. It is then easier to transfer or recover individual directories than one huge >10TB file, and less data is lost if an archive is corrupted.
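
To decide whether to archive a directory whole or one child directory at a time, it helps to see how large each child is. The helper below is a minimal sketch; `list_child_dirs` is a hypothetical name and is not one of the scripts in archive_scripts. It relies on `find -mindepth/-maxdepth`, which GNU and BSD find both provide.

```shell
# Print "size_in_KiB<TAB>path" for each immediate child directory of a parent.
# Hypothetical helper for planning; it does not create any archives.
list_child_dirs() {
    parent=$1
    find "$parent" -mindepth 1 -maxdepth 1 -type d | sort |
    while IFS= read -r child; do
        # du -sk prints "<size>\t<path>"; keep only the size column.
        size_kib=$(du -sk "$child" | cut -f1)
        printf '%s\t%s\n' "$size_kib" "$child"
    done
}
```

Running `list_child_dirs .` in a project directory prints one line per child directory; large children are candidates for their own archive.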

When to use tar or 7z: 7z is better for most operations once the archive is created, and it is good for directories that mix compressible and already-compressed data. For example, a Freesurfer output will have the following directories:

  • label: text files, compressible
  • mri: mgz, m3z, .nii.gz files are already compressed, so not compressible further
  • stats: text files, compressible
  • surf: compressible

So we add label, stats and surf with a slow compressor to save space, then add mri without compression for speed.
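
As a rough illustration of that idea, the same two-step approach can be written directly with the `7zz` command line. This is a sketch under assumptions, not the contents of the archive scripts: `archive_freesurfer` is a hypothetical function name, and it assumes `7zz` is on the PATH and that the directory has the four subdirectories listed above.

```shell
# Sketch only: the scripts in archive_scripts may work differently.
archive_freesurfer() {
    subject_dir=$1          # e.g. "freesurfer"
    level=${2:-5}           # lzma2 level used for the compressible directories
    archive="${subject_dir}.7z"

    # Compressible directories: LZMA2 at the requested level.
    7zz a -m0=lzma2 -mx="$level" "$archive" \
        "$subject_dir/label" "$subject_dir/stats" "$subject_dir/surf"

    # Already-compressed images: store without compression (-mx=0).
    7zz a -mx=0 "$archive" "$subject_dir/mri"
}
```

Calling `archive_freesurfer freesurfer 5` from the parent directory would produce freesurfer.7z containing the freesurfer directory.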

To make a 7zip archive:

`~/kg98/Shared/archive_scripts/directory_7zz.job <foo> [level]`

Level is the compression level for the slow lzma2 compressor. It can be 1 (fastest), 3, 5, 7 or 9 (slowest). Anywhere from 1 to 7 is recommended; 9 is too slow.

If you have text files with extension .tsv, .csv or .txt, you can create an archive with those compressed using bzip2:

`~/kg98/Shared/archive_scripts/directory_7zz_csvtxt_bzip2.job <foo> [level]`

To make a .tar.zst tarball:

`~/kg98/Shared/archive_scripts/directory_tarzst.job <foo> [level]`

Level here can be any number from 1 (fast) to 19 (slow). Recommended range is from 3 to 15.
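
For reference, a .tar.zst tarball can also be made by hand by piping tar into zstd. The function below is a minimal sketch, assuming `zstd` is installed; `make_tarzst` is a hypothetical name and is not necessarily how directory_tarzst.job is implemented.

```shell
# Sketch only: create <dir>.tar.zst from <dir> at the given zstd level.
make_tarzst() {
    dir=$1
    level=${2:-3}           # 1 (fast) to 19 (slow)
    # tar writes the uncompressed archive to stdout; zstd compresses the
    # stream using all available cores (-T0) and writes the tarball.
    tar -cf - "$dir" | zstd -T0 -"$level" -q -o "${dir}.tar.zst"
}

# Listing a tarball means decompressing the whole stream first:
#   zstd -dc foo.tar.zst | tar -tf -
```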

When these scripts complete, they make a .done file for each archive, for example a file ending in .7z.done for a 7z archive. To delete the directories that were archived in the current working directory, run:

`~/kg98/Shared/archive_scripts/remove_done_7z.sh` for 7z archives
`~/kg98/Shared/archive_scripts/remove_done_tarzst.sh` for tar.zst archives
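
Before deleting anything, it can be worth checking which directories have a matching .done file. The dry-run helper below is a sketch only: `list_done_dirs` is a hypothetical name, and it assumes the marker for a directory foo is named foo.7z.done (or foo.tar.zst.done) and sits next to the directory. It prints names and deletes nothing.

```shell
# Print directories in the current working directory that have a matching
# "<dir><suffix>" marker file. Assumed naming; nothing is removed.
list_done_dirs() {
    suffix=${1:-.7z.done}
    for marker in ./*"$suffix"; do
        [ -e "$marker" ] || continue      # no markers: the glob stays literal
        dir=${marker%"$suffix"}           # strip the suffix to get the directory
        if [ -d "$dir" ]; then
            printf '%s\n' "${dir#./}"
        fi
    done
    return 0
}
```

Use `list_done_dirs .7z.done` for 7z archives and `list_done_dirs .tar.zst.done` for tarballs.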

To make the file listings for all archives in the current directory and below, run:

`~/kg98/Shared/archive_scripts/all_archives_listings.sh`
