Ok, I thought it was done with bash scripts (I forgot YunoHost used Python), in which case it would have been trivial: just a matter of one argument in the tar command.
If anyone is willing to implement such a feature, I’d be happy to help beta testing, but I understand it’s unlikely to happen.
(And meanwhile I could do it manually with zstd if needed)
I did some tests comparing uncompressed archives to compressed ones (thanks a lot to @Maniack_Crudelis for implementing this into archivist). Here are the detailed results: Allow to choose the compression algorithm by maniackcrudelis · Pull Request #12 · maniackcrudelis/archivist · GitHub
TL;DR: a single weekly backup is ~4GB bigger (the compressed backup is roughly 5GB), almost twice as big. Extrapolating, 3 weekly backups + 1 monthly one would result in a ~16GB increase in storage space.
In my own case, I can’t afford such a big storage loss - in particular it prevents me from doing any further backups, because I’m out of space with another 9GB backup (I’ll have to remove an older one before creating the new one; I’m not a big fan of that).
The full backup seems to take only a few minutes longer with compression enabled (though I did not measure that precisely), for ~10 backups including 4 big ones (Nextcloud, Synapse, Pixelfed, WordPress). Thanks Zstandard!
I won’t argue against disabling compression by default; I understand the reason behind this choice and I’m not in a good position to say whether it was worth it or not.
But couldn’t we keep an option for that?
But (as indicated in this comment) this feature is buggy.
In addition it doesn’t give us the choice on a per-backup basis. That’s not a problem for me, but I still don’t get why it’s not an option in the backup command to keep the old behavior (compression).
Indeed … but then that’s a bug and that should be fixed …
If you’re referring to the “old” --no-compress option, that option was ~misleading w.r.t. the gzipped/not-gzipped question. What it actually did was create an entire “raw” backup directory (so basically a backup not even .tar-ed). This behavior should still be reproducible with the --method copy option … but honestly I just don’t know how useful this is (and it’s not really well tested)
Alternatively, regarding compressing backups, you may still run manually the gzip command after creating the backup:
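Presumably something along these lines (the archive path is an assumption based on YunoHost’s default backup location; a temporary demo archive is used here so the commands are self-contained):

```shell
# Sketch: compress an existing backup archive after creation.
# In practice the archive would live under /home/yunohost.backup/archives/
# (assumed default location); a throwaway demo archive is used here.
ARCHIVE_DIR=$(mktemp -d)
echo "demo" > "$ARCHIVE_DIR/file.txt"
tar -cf "$ARCHIVE_DIR/my_backup.tar" -C "$ARCHIVE_DIR" file.txt

gzip "$ARCHIVE_DIR/my_backup.tar"   # creates my_backup.tar.gz, removes the .tar
ls "$ARCHIVE_DIR"
```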
and that’s it … of course, it could have better integration so that this choice is available in the webadmin … But honestly, if you really have issues with backup sizes, you should really consider using borg (and yeah, at some point soon™ we’ll have borg in the core, as we’ve been saying for like 3 years lol … :/)
I am not referring to this one (and I never used it).
That’s what I’m going to do for manual backups. For archivist ones it’s now included, but I’ll still need to manually delete the tar backups. And any of my (automated) backups will use around 20 extra GB (for a >5GB compressed backup), which means I’d need to always keep ≥25GB of free space just to be able to back up… I’ve been running fine with ~15GB free space on my VPS for almost a year; now I’ll need to migrate to another VPS (thus reinstall everything) or do the backups manually.
My point was that the old behavior could have been kept as an option (as it was already developed and tested, I suppose it would not have been difficult… but I’m not the dev here), while using no compression by default.
That global setting is at least something (worse than a choice for each backup, but still useful), but it’s really hidden
But for a lot of users, increasing their storage space is not simple or not really possible.
And that upgrade adds the need for (much) more storage, which was possibly not planned by YunoHost users before buying their servers.
And also: I don’t care if my daily/weekly backup takes even one more hour (that won’t happen) to complete when it runs in the middle of the night. That costs me ~0. Lots of extra storage costs me, especially when rented with a VPS or the like.
I second what Lapineige said. I have a VPS with only a 40GB disk; I upgraded from a 20GB one two years ago, essentially to avoid having to delete YunoHost backups all the time - I could keep only one backup at a time. With 40GB, I thought I was fine for a while. Not anymore… My backups, zipped, are roughly 4GB, and I could keep two or three before deleting one. The new backups are 6.6GB, and for YunoHost to make them it seems I need even more. Right now I have 7.9GB of available space and the automatic backup failed this night.
Is it planned to add an option to zip or not the automatic backups?
(anyway, it’s my first message on this forum, I’m sorry it’s for complaining, the Yunohost team do an amazing job!)
Still, being able to compress it (and you can choose the algorithm, what a luxury!) saves a ton of space at a very small cost (the biggest backups take 2min of extra time; at 3am that’s not a big deal…).
For the record, compressing a Pixelfed backup, a 6GB tar archive, to tar.zst (simply using zstd my_backup_file.tar) took only 45s and resulted in a… 200MB compressed archive, saving ~97% of the storage space! (almost 6GB saved for a single backup…)
That was on a Hetzner CX11, the most entry-level of their VPS range (1 CPU, …) - not a computing beast.
Decompressing it took 20s.
With gzip, it took 1min 43s for a 336MB file (1.7 times bigger, 3 to 4 times slower).
Indeed, if you do 10 big backups like this (which is unusual) and have an overhead of 15-20min (or more) during the backup, that might be an issue (still, for a lot of people, the issue is more about storage space…).
But with Zstandard, that overhead is greatly reduced. And it potentially saves tens of GB!
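For reference, a comparison like the one above boils down to commands roughly like these (file names are placeholders, and a small throwaway archive stands in for a real backup):

```shell
# Rough sketch of the zstd vs gzip comparison above.
# File names and the demo data are assumptions for illustration.
WORK_DIR=$(mktemp -d)
head -c 1M /dev/urandom > "$WORK_DIR/data.bin"     # stand-in for real backup data
tar -cf "$WORK_DIR/backup.tar" -C "$WORK_DIR" data.bin

zstd -k -q "$WORK_DIR/backup.tar"   # -> backup.tar.zst (-k keeps the original)
gzip -k "$WORK_DIR/backup.tar"      # -> backup.tar.gz

du -h "$WORK_DIR"/backup.tar*       # compare the resulting sizes
```

(Note that random data barely compresses; the point here is only the commands, not the ratios.)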
No longer being able to compress backups - even with an option disabled by default - is a pity.
Yes, yet the current situation is imho better in terms of “not having unusable archives because somehow the whole tar/gz checksum got randomly corrupted”, which was the initial point of that change.
This is also related to the fact that the old code was doing both the archiving (tar) and compression (gz) at the same time, increasing the number of issues that may arise. Doing both things sequentially (so compressing the backup only once it’s done) should be more robust while also being more flexible in terms of which compression algorithm to use. But it’s not that trivial because of other things (e.g. backup_info must be able to access info.json on the fly without uncompressing the entire archive)
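For what it’s worth, reading a single member such as info.json out of a compressed archive without unpacking everything can be done by streaming the decompressed tar (a sketch with assumed file names; tar still scans the stream, but nothing is written to disk):

```shell
# Sketch: fetch info.json from a compressed backup without extracting
# the whole archive. Archive and member names are assumptions.
BACKUP_DIR=$(mktemp -d)
echo '{"size": 42}' > "$BACKUP_DIR/info.json"
tar -cf "$BACKUP_DIR/backup.tar" -C "$BACKUP_DIR" info.json
zstd -q --rm "$BACKUP_DIR/backup.tar"           # -> backup.tar.zst

# Stream-decompress and print only info.json to stdout:
zstd -dc "$BACKUP_DIR/backup.tar.zst" | tar -xOf - info.json
```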
Anyway, as pointed out previously, volunteers’ time is limited, there are only one thousand topics to deal with in parallel, and it’s not really high priority - the high priority being borg… That doesn’t stop anybody from working on this, though.
Last but not least, your example seems highly biased: I’m quite surprised that you’re able to compress 6GB of archive into 200MB… What are these data? It sounds like it may be 6GB of non-multimedia (or redundant) files, such that a very nice compression ratio is achievable. I don’t have any quantified study for this, but I’m guessing that in most cases people either have a bunch of multimedia files that are not compressible, or not-such-a-large-amount-of-data.
Oh, I wasn’t aware of that !
I understood it was motivated by backup times being too long for some users. That’s why I thought it wasn’t a good choice, at least not without keeping the previous behaviour as a (deprecated?) option.
Right now the “workflow” with archivist is fine, except that you have to create both a tar file and a compressed backup, and manually delete those .tar backups (and automated backups often fail before the backup list is complete, because you’re out of space).
But maybe that’s something we should investigate with @Maniack_Crudelis.
I realized a few minutes ago that there was a log issue that made the log file grow to 5GB. So indeed, my example is really bad.
Yet most small backups (<1GB), such as the YunoHost core for instance, are divided by 3-5 in size, which can add up to a few GB in the end. And the bigger ones can be divided by ~2 (Wallabag going from 2.5GB down to 1.5GB, Nextcloud from 1GB to 300MB, …). A complete backup being reduced by 5-6GB, when that represents 10 or 20% of your server’s storage, it counts.
Regarding that point: doesn’t the external .info.json file serve that purpose?
Yup, that speeds things up, but the thing is that it’s also included in the archive itself, for the case where you copy the archive to another server (e.g. migration) and will probably forget to copy the info.json… Or maybe we should change the design, idk, just highlighting why it’s not 100% straightforward.
Running into issues again with non-compressed backups taking a huge amount of space on very limited storage…
Some apps can no longer be backed up, in fact, because the backup takes around 25GB (around 5GB compressed) and I have only 15GB left (and can’t expand, and it’s costly anyway).
No, because as reported in the linked message, it was buggy at that time, and I don’t want to break my production server. That’s why I’m asking first whether there was any change, either in the code or in the design decisions around such features.
So far I’m doing a normal backup then a Zstandard compression, but this means I need [tar file size] + [compressed archive size] of storage to finish one single backup.
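One small mitigation (a sketch with placeholder names): zstd’s `--rm` flag deletes the original .tar automatically once compression succeeds. Peak usage during compression is still tar + compressed size, but the uncompressed copy doesn’t linger afterwards and doesn’t need manual cleanup.

```shell
# Sketch: compress then automatically delete the original .tar.
# Names are placeholders; a throwaway demo archive is used here.
DIR=$(mktemp -d)
echo demo > "$DIR/f"
tar -cf "$DIR/backup.tar" -C "$DIR" f

zstd -q --rm "$DIR/backup.tar"   # -> backup.tar.zst, backup.tar deleted on success
```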