Is borg backup via ynh backup system relevant ? (performance-wise)

For remote backup, I am currently using on my ynh servers either :

But backups are terribly slow: the borg phase can take up to 20 hours for ~200 GB of files, even on days when less than 1 GB of data has changed. This slowness is not caused by bandwidth or by load on the remote backup server.

What I understand of backup with borg + ynh backup system

From what I understand, both approaches have the same logic through ynh backup, in two steps :

  1. tell yunohost tools backup to copy relevant files to a temp dir
  2. backup this temporary dir with borg create

Is that how it really works ?

This seems safe, as apps are supposed to provide backup scripts (each app takes care of providing usable backup data). For example, the admin does not have to worry about making DB dumps; the app backup scripts take care of that. So this is handy.

What I understand from borg caching and performance

For deduplication to work, at borg create time, borg slices file data into chunks that it hashes, and keeps a files cache to avoid re-reading and re-hashing unchanged files on the next backup. This cache relies on the inode number (along with size and timestamp) to decide whether a file is unchanged.
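To make the caching idea concrete, here is a toy model of such a files cache. It is only a sketch of the principle, not borg's implementation: the `ToyFilesCache` class is hypothetical, it hashes whole files instead of chunks, and the (inode, size, ctime) signature is a simplification.

```python
import hashlib
import os
import tempfile
from typing import Dict, Tuple

# Toy model only: real borg chunks files and stores more metadata.
Signature = Tuple[int, int, int]  # (inode, size, ctime in nanoseconds)


class ToyFilesCache:
    """Remembers a digest per path and skips reading files that look unchanged."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Signature, str]] = {}
        self.hashed = 0  # number of times file content actually had to be read

    def digest(self, path: str) -> str:
        st = os.stat(path)
        signature = (st.st_ino, st.st_size, st.st_ctime_ns)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == signature:
            return cached[1]  # cache hit: the content is not read again
        with open(path, "rb") as fh:
            value = hashlib.sha256(fh.read()).hexdigest()
        self.hashed += 1
        self._entries[path] = (signature, value)
        return value


def demo() -> int:
    """Digest the same unchanged file twice; return how often it was hashed."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as fh:
            fh.write(b"x" * 1024)
        cache = ToyFilesCache()
        cache.digest(path)
        cache.digest(path)
        return cache.hashed
```

In this model, the second call for an unchanged file is answered from the cache. If the inode, size, or timestamp differs, the file is read and hashed again.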

Thus, borg seems designed to be used with files in place rather than with temporary copies: copying files gives them new inode numbers on every backup. For ynh copy-based backups, all files will therefore get re-hashed on each backup.
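The inode behaviour is easy to check locally. The sketch below (the `inode_demo` function and the file names are made up for illustration) creates a file, copies it, and, for contrast, hard-links it in the same directory, then reports the three inode numbers.

```python
import os
import shutil
import tempfile


def inode_demo() -> dict:
    """Return the inode numbers of a file, a copy of it, and a hard link to it."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.txt")
        with open(original, "w") as fh:
            fh.write("same content\n")

        copied = os.path.join(tmp, "copy.txt")
        linked = os.path.join(tmp, "link.txt")
        shutil.copy2(original, copied)  # new file, hence a new inode
        os.link(original, linked)       # second name for the same inode

        return {
            "original": os.stat(original).st_ino,
            "copy": os.stat(copied).st_ino,
            "hardlink": os.stat(linked).st_ino,
        }


if __name__ == "__main__":
    print(inode_demo())
```

The copy gets its own inode even though the content is identical, while the hard link shares the inode of the original.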

Expected performance issues :

ynh backup phase:

  • time, load and disk space taken by yunohost to copy the same data each day to the temp backup folder (e.g. 200 GB of Nextcloud data may hurt)

borg create phase:

  • the files cache will be bigger than necessary, stabilizing at BORG_FILES_CACHE_TTL (defaults to 20) entries per file covered by borg backups. This may increase RAM consumption, though the effect may be negligible (I did not do the math)
  • time taken to re-hash all files for deduplication on each backup

Discussion

My analysis might not be rock-solid, and it is possible that other performance issues are also involved. Lastly, I am not sure I understand exactly how borg_ynh works.

I would be very interested in your opinions on my analysis and in your own experiences with borg + ynh. (cc @ljf ? others ?)

Short-term perspective

I suspect that I could significantly improve backup performance by running borg create over my whole filesystem (with some excludes) and manually creating SQL dumps beforehand (that means not using the ynh backup tools at all).
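A minimal sketch of what that could look like, assuming a MySQL/MariaDB server whose credentials are already configured for the calling user. The repository path, dump location and exclude list are hypothetical placeholders, and the helper only builds the command lines without running them.

```python
import shlex
from typing import List


def build_commands(repo: str, paths: List[str], excludes: List[str],
                   dump_file: str) -> List[List[str]]:
    """Build (but do not run) a DB dump command followed by a borg create command."""
    # Step 1: dump all databases to a file that lives inside the backed-up tree.
    dump = [
        "sh", "-c",
        "mysqldump --all-databases --single-transaction > " + shlex.quote(dump_file),
    ]
    # Step 2: back up the files in place, so their inodes stay stable between runs.
    create = ["borg", "create", "--stats"]
    for pattern in excludes:
        create += ["--exclude", pattern]
    # {hostname} and {now} are placeholders expanded by borg itself.
    create.append(f"{repo}::{{hostname}}-{{now}}")
    create += paths
    return [dump, create]


if __name__ == "__main__":
    commands = build_commands(
        repo="/mnt/backup/repo",            # hypothetical repository location
        paths=["/"],
        excludes=["/proc", "/sys", "/dev", "/run", "/tmp"],
        dump_file="/var/backups/all-databases.sql",  # hypothetical dump path
    )
    for command in commands:
        print(shlex.join(command))
```

Printing the commands first makes it possible to review them before wiring them into a cron job; apps that use other databases or need specific pre-backup steps would need their own handling.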

A hybrid approach might be to rely on the ynh backup system for core only and back up apps manually (skipping the ynh backup system).

Long-term perspective

Since ynh allows external backup systems to be plugged in, it may be worth adding a way to tell the backup system (here borg) which files and folders to back up, rather than handing it a folder containing a copy of them. This could improve backup performance.


Not exactly. In fact, apps give a list of files to back up, but they don't copy them to a temp dir; otherwise you would need 50% free space on your server…
Only databases are dumped into the tmp directory.

However, for borg we need an extra operation to arrange the files the way we want them in archives (it was done like that for legacy reasons). To do that, we use mount --bind and hardlinks to avoid copying directories and files.

However, if we can't hardlink or can't mount --bind, the system does indeed try to copy the files… So you are probably right about the inode issue.

Hardlinking can fail if the source and destination are not on the same FS, but files pointed to directly by apps with ynh_backup are often small (config files).
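The fallback logic described above can be illustrated with a short sketch. This is not the YunoHost code, only a hypothetical `link_or_copy` helper showing the general pattern: try a hardlink first, and copy only when the link cannot be created.

```python
import os
import shutil


def link_or_copy(src: str, dst: str) -> str:
    """Stage src at dst, preferring a hardlink; return which method was used."""
    try:
        os.link(src, dst)
        return "hardlink"
    except FileExistsError:
        raise  # do not silently overwrite an existing destination
    except OSError:
        # Typically EXDEV (src and dst on different filesystems) or a
        # filesystem without hardlink support: fall back to a real copy,
        # which costs disk space and produces a new inode.
        shutil.copy2(src, dst)
        return "copy"
```

Logging the returned value for each staged file would show quickly whether large files end up being copied rather than linked.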

If you get an error on mount --bind, you should at least see a warning:
https://github.com/YunoHost/yunohost/blob/dev/src/yunohost/backup.py#L1788

Note: you can also have performance issues with borg if several computers are backed up to the same destination repo.

On my side, borg_ynh doesn't seem to take as long as it does for you. Adding timing debug info to the borg_ynh emails could be helpful in many ways.

Note: if you don't find your issue, and just want to back up files and don't care too much about restore behaviour, you can disable the mount operation in the borg_ynh script:
