Hello everyone,
I recently discovered Zstandard (abbreviated zstd) compression, an extremely fast (almost real-time) compression algorithm with a decent compression ratio.
I have a proposal concerning Yunohost backup archives.
Proposal
In Yunohost backups, replace (tar.)gzip archives with zstd archives.
Advantages
- (Several times) Faster backups
- (Several times) Faster restore
- Slightly smaller backups
- Potentially less CPU load (it ends faster)
Drawbacks
- This new format might confuse some users (who may need to learn new commands, even though they are very similar to the gzip ones)
- Would it break some app scripts? Custom user scripts? External tools?
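For context on how close the commands stay to the gzip ones, here is a minimal sketch (paths and file names are just examples, not Yunohost's actual backup paths):

```shell
# sample data to archive (hypothetical paths, for illustration only)
mkdir -p /tmp/demo_data /tmp/restore
echo "example content" > /tmp/demo_data/file.txt

# current approach: gzip-compressed tar archive
tar -czf /tmp/backup.tar.gz -C /tmp demo_data

# proposed: zstd-compressed tar archive (needs GNU tar >= 1.31 and the zstd package)
tar --zstd -cf /tmp/backup.tar.zst -C /tmp demo_data

# extraction is symmetric; recent GNU tar auto-detects the format with plain -xf
tar -xf /tmp/backup.tar.zst -C /tmp/restore
```

So scripts that build the tar command themselves would need a one-flag change, while anything that just hands tar an archive to extract should keep working thanks to format auto-detection.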
Motivation
Archiving and decompressing backups in Yunohost takes time, especially on low-powered hardware (such as a Raspberry Pi).
Yunohost still uses the gzip algorithm, which is now much slower than other standard algorithms and not that good in terms of compression ratio. Using a more modern and efficient algorithm would give our users faster backup & restore while using less disk space.
Technical details - and why choose Zstandard
Compared to gzip, according to this benchmark https://engineering.fb.com/2016/08/31/core-data/smaller-and-faster-data-compression-with-zstandard/ (privacy warning: Facebook link), zstd has a similar compression ratio but compresses about 5 times faster and decompresses about 3.5~4 times faster.
This benchmark also compares it with other common algorithms, such as lz4, zlib, and xz.
Short summary: lz4 is lightning fast but has a worse compression ratio, while zstd is fast enough (around 7s/GB); zlib is slower; xz compresses more but is almost 100 times slower.
Example: a 291MB backup from my Yunohost gives a 68.6MB tar.gz (after ~10 seconds), a 52.7MB tar.xz (in ~1min), and a 62.8MB tar.zst (after ~1.5s, about 8% smaller than the gzip one).
NB: this test was done on a fast desktop computer, with a fast CPU and storage. The times would be much longer on most of our users' hardware (Raspberry Pi, …).
Also, Zstandard seems to be supported in all modern distributions, though I'm unsure of its support status on Debian.
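Anyone who wants to check their own Debian/Yunohost box could do something like this (just a quick sanity-check sketch):

```shell
# is the zstd CLI installed, and which version?
zstd --version

# does the installed tar support the --zstd flag? (GNU tar >= 1.31)
tar --version | head -n1
tar --help | grep -q -- '--zstd' && echo "tar supports --zstd"
```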
Which compression level is best?
Here is a summary based on personal benchmarks and some internet research (I can provide details if needed).
- Compression ratio is very similar across most compression levels (from 1 to 22). A noticeable gain only appears above levels 10~15, but the speed cost is very high for little benefit. I'd suggest keeping it at level 1, to be as fast as possible.
- Decompression speed is the same no matter the compression level.
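To compare levels empirically on your own data, zstd has a built-in benchmark mode (`-b<start> -e<end>`) that reports ratio and speed for a range of levels; the sample file below is just a stand-in, a real backup tarball gives more representative numbers:

```shell
# generate a sample file to benchmark (any real backup archive works better)
seq 1 200000 > /tmp/sample.dat

# benchmark compression levels 1 through 5 on that file;
# prints compressed size, ratio, and compression/decompression speed per level
zstd -b1 -e5 /tmp/sample.dat
```

On most inputs this shows exactly the pattern described above: the ratio barely moves between low levels while compression speed drops steadily.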