I recently discovered Zstandard (abbreviated zstd) compression, an extremely fast (almost real-time) compression algorithm with a decent compression ratio.
I have a proposal concerning Yunohost backup archives.
In Yunohost backups, replace the gzip-compressed tar archives with zstd-compressed ones.
- (Several times) Faster backups
- (Several times) Faster restore
- Slightly smaller backups
- Potentially less CPU load (it ends faster)
- This new format might confuse some users (who may need to learn new commands, even if there are very similar to gzip commands)
- Would it broke some app scripts ? Custom user scripts ? External tools ?
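To make the change concrete, here is a rough sketch of what it looks like at the command level. Paths are hypothetical, and `--zstd` assumes GNU tar ≥ 1.31 with the zstd package installed:

```shell
# Create a small sample tree standing in for a backup (hypothetical path).
mkdir -p /tmp/demo_backup
echo "sample data" > /tmp/demo_backup/file.txt

# Current approach: gzip-compressed tar archive.
tar -czf /tmp/demo.tar.gz -C /tmp demo_backup

# Proposed approach: zstd-compressed tar archive.
tar --zstd -cf /tmp/demo.tar.zst -C /tmp demo_backup

# Restoring is unchanged: GNU tar auto-detects the compression on extraction.
mkdir -p /tmp/restore
tar -xf /tmp/demo.tar.zst -C /tmp/restore
```

Since extraction auto-detects the format, restore commands would not even need to know whether an archive is gzip or zstd.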
Archiving and decompressing backups in Yunohost takes time, especially on low-powered hardware (such as a Raspberry Pi).
Yunohost still uses the gzip algorithm, which is now a lot slower than other standard algorithms, and not that good in terms of compression ratio. Using a more modern and efficient algorithm would give our users faster backup & restore while using less disk space.
Technical details - and why choose Zstandard
Compared to gzip, according to this benchmark https://engineering.fb.com/2016/08/31/core-data/smaller-and-faster-data-compression-with-zstandard/ (privacy warning: Facebook-related link), zstd achieves a similar compression ratio while compressing about 5 times faster and decompressing about 3.5~4 times faster.
This benchmark also compares it with other common algorithms, such as lz4, zlib, and xz.
Short summary: lz4 is lightning fast but has a worse compression ratio, while zstd is fast enough (around 7 s/GB); zlib is slower; xz compresses more but is almost 100 times slower.
Example: a 291 MB backup from my Yunohost gives a 68.6 MB tar.gz (after ~10 seconds), a 52.7 MB tar.xz (in ~1 min), and a 62.8 MB tar.zst (after ~1.5 s, about 10% smaller than the gzip one).
NB: this test was done on a fast desktop computer, with a fast CPU and storage. The times would be a lot longer on most of our users' hardware (Raspberry Pi, …).
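Anyone can reproduce a rough comparison on their own machine. The sketch below uses generated compressible data as a stand-in for a real backup (file names are made up; exact times and sizes will vary with hardware and data):

```shell
# Generate roughly 15 MB of compressible sample data.
mkdir -p /tmp/bench_data
seq 1 2000000 > /tmp/bench_data/numbers.txt

# Time gzip vs zstd compression of the same tree.
time tar -czf /tmp/bench.tar.gz -C /tmp bench_data
time tar --zstd -cf /tmp/bench.tar.zst -C /tmp bench_data

# Compare the resulting archive sizes.
ls -lh /tmp/bench.tar.gz /tmp/bench.tar.zst
```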
Zstandard also seems to be supported by all modern distributions, though I'm unsure of its exact support status on Debian.
Which compression level is best?
Here is a summary based on personal benchmarks and some internet research (I can provide details if needed).
- Compression ratio is very similar across most compression levels (from 1 to 22). A noticeable gain only appears above level 10~15, and the speed cost is very high for that gain. I'd suggest keeping it at level 1, to be as fast as possible.
- Decompression speed is the same no matter the compression level.
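If we go this way, pinning the level is straightforward. A sketch with made-up file names, assuming GNU tar (`-I` passes an arbitrary compressor command) and zstd ≥ 1.3.6 (which reads the `ZSTD_CLEVEL` environment variable):

```shell
# Sample compressible data (hypothetical path).
mkdir -p /tmp/lvl_data
seq 1 500000 > /tmp/lvl_data/numbers.txt

# Level 1 (fastest), passed explicitly through tar's -I option.
tar -I 'zstd -1' -cf /tmp/lvl-1.tar.zst -C /tmp lvl_data

# A higher level via the ZSTD_CLEVEL environment variable (zstd >= 1.3.6).
ZSTD_CLEVEL=19 tar --zstd -cf /tmp/lvl-19.tar.zst -C /tmp lvl_data

# The higher level yields a smaller file at a much higher CPU cost;
# decompression (plain `tar -xf`) is equally fast for both.
ls -l /tmp/lvl-1.tar.zst /tmp/lvl-19.tar.zst
```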