Backups: Implementing Zstandard compression instead of Gzip?

That’s good news! :tada:

OK, I thought it was done with bash scripts (I forgot YunoHost uses Python); in that case it would have been trivial, just a matter of adding one argument to the tar command.
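(For illustration, a rough sketch of what I had in mind - the paths and compression level are only examples, and this assumes GNU tar ≥ 1.31 with the zstd package installed:)

# let tar call zstd itself
tar --zstd -cf my_backup.tar.zst /path/to/backup/directory
# or pick a compression level explicitly via an external compressor
tar -I 'zstd -10' -cf my_backup.tar.zst /path/to/backup/directory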

If anyone is willing to implement such a feature, I’d be happy to help beta testing, but I understand it’s unlikely to happen.
(And meanwhile I could do it manually with zstd if needed)

1 Like

For the record: backups are no longer compressed since YunoHost 4.1, which is out now: YunoHost 4.1 release / Sortie de YunoHost 4.1

1 Like

By the way, what about using GitHub - borgbackup/borg: Deduplicating archiver with compression and authenticated encryption?
Some of the goals are:

  • Space efficient storage (Deduplication based on content-defined chunking)
  • All data can be protected using 256-bit AES encryption, data integrity and authenticity is verified using HMAC-SHA256
  • Compression support, e.g. zstd
  • Borg can store data on any remote host accessible over SSH
  • Backups mountable as filesystems via FUSE
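For the curious, a minimal sketch of a typical Borg workflow (the repository location, paths and retention values below are only examples):

# create the repository once (here on a remote host over SSH)
borg init --encryption=repokey ssh://user@backuphost/./yunohost-borg
# create a deduplicated, zstd-compressed archive of the backup folder
borg create --compression zstd,3 --stats ssh://user@backuphost/./yunohost-borg::'{hostname}-{now}' /home/yunohost.backup/archives
# prune old archives according to a retention policy
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 ssh://user@backuphost/./yunohost-borg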

There’s an app (well, two): borg_ynh and borgserver_ynh

The integration in the core is ~ongoing and should be done in the next 6 months …

5 Likes

I did some tests with uncompressed archives compared to compressed ones (thanks a lot to @Maniack_Crudelis for implementing this in archivist). Here are the detailed results: Allow to choose the compression algorithm by maniackcrudelis · Pull Request #12 · maniackcrudelis/archivist · GitHub
TL;DR: a single weekly backup is ~4GB bigger (compressed backup size: roughly 5GB), almost twice as big. Extrapolating, 3 weekly backups + 1 monthly one would result in a ~16GB increase in storage use.
In my own case, I can’t afford such a big storage loss - in particular it prevents me from doing any further backups, because I’m out of space with another 9GB backup (I’ll have to remove an older one before creating the new one; I’m not a big fan of that).
The full backup only seems to take a few minutes longer with compression enabled (though I did not measure that precisely), for ~10 backups including 4 big ones (Nextcloud, Synapse, Pixelfed, WordPress). Thanks Zstandard :slight_smile:

I won’t argue against disabling compression by default; I understand the reason behind this choice and I’m not in a good position to say whether it was worth it or not.
But couldn’t we keep an option for that? :pray:

Yes, cf. YunoHost 4.1 release / Sortie de YunoHost 4.1 - #15 by Maniack_Crudelis

It’s not really documented anywhere in the docs for now, though; that could be a small, easy contribution if anybody’s interested ¯\_(ツ)_/¯

But (as indicated in this comment) this feature is buggy.

In addition, it doesn’t give us the choice on a per-backup basis. That’s not a problem for me, but I still don’t get why there isn’t an option in the backup command to keep the old behavior (compression).

Indeed … but then that’s a bug and that should be fixed …

If you’re referring to the “old” --no-compress option, that option was ~misleading w.r.t. the gzipped/not-gzipped distinction. What it actually did was create an entire “raw” backup directory (so basically a backup that isn’t even .tar-ed). This behavior should still be reproducible with the --method copy option … but honestly I just don’t know how useful this is (and it’s not really well tested)
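(For reference, a quick sketch of what that would look like; check yunohost backup create --help for the exact flag spelling on your version:)

yunohost backup create my_raw_backup --method copy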

Alternatively, regarding compressing backups, you can still run the gzip command manually after creating the backup:

yunohost backup create my_backup_name [other option]
gzip /home/yunohost.backup/archives/my_backup_name.tar
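(Or, assuming the zstd package is installed, the same idea with zstd; the --rm flag deletes the .tar only once compression succeeded:)

zstd --rm /home/yunohost.backup/archives/my_backup_name.tar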

and that’s it … of course, we could have better integration to offer this choice in the webadmin … But honestly, if you really have issues with backup sizes, you should really consider using borg (and yeah, at some point soon™ we’ll have borg in the core, as we’ve been saying for like 3 years lol … :/)

That’s so great. Borg is great, but a bit daunting to set up properly with pruning and such.

1 Like

To play devil’s advocate here, storage is so cheap now that for most users, faster is better than smaller.

I know I can buy a 3 TB drive for about $100 US.

When things move to Borg, we can have deduplication and pruning, which helps a lot.

If a user is seriously worried about disk space, I would suggest Borg backup with Borgmatic (GitHub - witten/borgmatic: Simple, configuration-driven backup software for servers and workstations) to help with automating pruning and such. Borgmatic is to Borg roughly what docker-compose is to Docker.
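A rough sketch of the borgmatic side, assuming it is installed and that /etc/borgmatic/config.yaml (its default config location) declares your repository and retention policy:

# generate a starting configuration file, then edit the repositories/retention sections in it
generate-borgmatic-config
# initialise the repository declared in the config (one-time step)
borgmatic init --encryption repokey
# run the configured actions (create, prune, check), typically from a cron job or systemd timer
borgmatic --verbosity 1 --stats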

I am not referring to this one (and I never used it).

That’s what I’m going to do for manual backups. For archivist ones it’s now included, but I’ll still need to manually delete the tar backups. And every one of my (automated) backups will use around 20 extra GB (for a >5GB compressed backup), which means I’d need to always keep ≥25GB of free space just to be able to back up… I’ve been running fine with ~15GB of free space on my VPS for almost a year; now I’ll need to migrate to another VPS (and thus reinstall everything) or do the backups manually :frowning:

My point was that the old behavior could have been kept as an option (as it was already developed and tested, I suppose it would not have been difficult… but I’m not the dev here), while using no compression by default.
That global setting is at least something (worse than a per-backup choice, but still useful), but it’s really hidden :confused:

But for a lot of users, increasing their storage space is not simple, or not really possible.
And this upgrade adds the need for (much) more storage, which YunoHost users may not have planned for before buying their servers.

And also: I don’t care if my daily/weekly backup takes even 1 more hour to complete (that won’t happen) when it runs in the middle of the night. That costs me ~0. Lots of extra storage costs me, especially when rented as part of a VPS or such.

Hello,

I second what Lapineige said. I have a VPS with only a 40GB disk; I upgraded from a 20GB one two years ago, essentially to avoid having to delete YunoHost backups all the time - I could only keep one backup at a time. With 40GB, I thought I was fine for a while. Not anymore… My backups, compressed, are roughly 4GB, and I could keep two or three before deleting one. The new backups are 6.6GB, and for YunoHost to create them it seems I need even more. Right now I have 7.9GB of available space and the automatic backup failed this night.

Is it planned to add an option to compress (or not) the automatic backups?

(Anyway, this is my first message on this forum, and I’m sorry it’s a complaint - the YunoHost team does an amazing job!)

1 Like

My workaround for the moment is to use the new option in the archivist app (Allow to choose the compression algorithm by maniackcrudelis · Pull Request #12 · maniackcrudelis/archivist · GitHub, thanks a lot @Maniack_Crudelis for that <3).
It’s not ideal, especially because there is always a duplicate (one in the yunohost backup folder, one in archivist) that has to be removed manually.

Still, being able to compress it (and you can choose the algorithm, what a luxury :tada:) saves a ton of space, at a very small cost (the biggest backups take 2 min of extra time; at 3 am that’s not a big deal…).

1 Like

For the record, compressing a Pixelfed backup, a 6GB tar archive, to tar.zst (simply using zstd my_backup_file.tar) took only 45s and resulted in a… 200MB compressed archive, saving ~97% of the storage space! (almost 6GB saved for a single backup…)

That was on a Hetzner CX11, the most entry-level of their VPS offerings (1 CPU, …) - not a computing beast.
Decompressing it took 20s.

With gzip, it took 1min 43s for a 336MB file (1.7 times bigger, 3 to 4 times slower).
Indeed, if you do 10 big backups like this (which is unusual) and end up with a 15-20 min (or more) overhead during the backup, that might be an issue (still, for a lot of people, the issue is more about storage space…).
But with Zstandard, that overhead is greatly reduced. And it potentially saves tens of GB!
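If anyone wants to reproduce this kind of comparison on their own archives, a quick sketch (the file name is just an example; -k keeps the original .tar around):

time zstd -k my_backup_file.tar    # produces my_backup_file.tar.zst
time gzip -k my_backup_file.tar    # produces my_backup_file.tar.gz
ls -lh my_backup_file.tar*         # compare the sizes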

No longer being able to compress backups - even with an option disabled by default - is a pity :frowning:

2 Likes

Yes, yet the current situation is imho better in terms of “not having unusable archives because somehow the whole tar/gz checksum got randomly corrupted”, which was the initial point of that change.

This is also related to the fact that the old code was doing both the archiving (tar) and the compression (gz) at the same time, increasing the number of issues that may arise. Doing both things sequentially (so compressing the backup only once it’s done) should be more robust while also being more flexible in terms of which compression algorithm to use. But it’s not that trivial because of other things (e.g. backup_info must be able to access info.json on the fly without decompressing the entire archive).
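To illustrate that last point, here is a rough sketch of what reading info.json from a compressed archive could look like from the shell (the archive name and the member path inside the tar are assumptions, and a stream compressor still has to decompress from the start of the archive until it reaches that member):

# list the archive to find the exact path of info.json inside it
tar --zstd -tf my_backup_name.tar.zst | grep info.json
# stream just that member to stdout without unpacking the whole archive to disk
tar --zstd -xOf my_backup_name.tar.zst ./info.json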

Anyway, as pointed out previously, volunteers’ time is limited, there are only one thousand topics to deal with in parallel, and this is not really high priority, the high priority being borg… That doesn’t stop anybody from working on it, though.

Last but not least, your example seems highly biased: I’m quite surprised that you’re able to compress a 6GB archive into 200MB… What are these data? It sounds like it may be 6GB of non-multimedia (or redundant) files, such that a very high compression ratio is possible. I don’t have any quantified study for this, but I’m guessing that in most cases people either have a bunch of multimedia files that are not compressible, or not-such-a-large-amount-of-data.

1 Like

Oh, I wasn’t aware of that!
I understood the change was motivated by backup times being too long for some users. That’s why I thought it wasn’t a good choice, at least without keeping the previous behaviour as a (deprecated?) option.

Right now the “workflow” with archivist is fine, except that you have to create both a tar file and a compressed backup, and manually delete those .tar backups (and automated backups often fail before the backup list is complete, because you’re out of space).
But maybe that’s something we should investigate with @Maniack_Crudelis.

I realized a few minutes ago that there was a log issue that made the log file grow up to 5GB. Indeed, my example is really bad.
Yet most small backups (<1GB), such as the YunoHost core for instance, still shrink to a third or a fifth of their size, which can add up to a few GB in the end. And my biggest backups can be roughly halved (Wallabag going from 2.5GB down to 1.5GB, Nextcloud from 1GB to 300MB, …). When a complete backup is reduced by 5-6GB and that represents 10 or 20% of your server storage, it counts :confused:

Regarding that point: doesn’t the external .info.json file serve that purpose?

Yup, that speeds things up, but the thing is that it’s also included in the archive itself, for the case where you copy the archive to another server (e.g. a migration) and will probably forget to copy the info.json… Or maybe we should change the design, idk, I’m just highlighting why it’s not 100% straightforward

2 Likes

Running into issues again with non-compressed backups taking a huge amount of space on a very limited storage…
Some apps can in fact no longer be backed up, because the backup takes around 25GB (around 5GB compressed) and I only have 15GB left (and I can’t expand it, and in any case that would be costly).

Is there any update on this? YunoHost 4.1 release / Sortie de YunoHost 4.1 - #15 by Maniack_Crudelis

Is the bug with the compression setting fixed? :thinking:

How could we help implement a per-backup option to compress it?
What should we look for/adapt/be aware of?

Did you try the option in the general settings (in the webadmin) to compress your backups?

No, because as reported in the linked message, it was buggy at that time, and I don’t want to break my production server. That’s why I’m asking first whether there was any change, either in the code or in the design decisions around such a feature.

So far I’m doing a normal backup then a Zstandard compression, but this means I need [tar file size] + [compressed archive size] of storage to complete one single backup.