For remote backups of my ynh servers, I currently use either:
- the borg_ynh app, which relies on the ynh backup system (apps + core);
- custom scripts of my own, which also rely on the ynh backup system (apps + core).
But backups are terribly slow: the borg phase can take up to 20 hours for ~200 GB of files, even on days when less than 1 GB of data has changed. This slowness is not a matter of bandwidth or of load on the remote backup server.
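For scale, if the borg phase reads the whole ~200 GB on each run, 20 hours corresponds to an average rate below 3 MB/s. The arithmetic, assuming decimal units (1 GB = 1000 MB):

```python
def average_throughput_mb_s(data_gb: float, hours: float) -> float:
    """Average rate in MB/s needed to process `data_gb` gigabytes in `hours` hours.

    Decimal units are assumed: 1 GB = 1000 MB.
    """
    seconds = hours * 3600
    return data_gb * 1000 / seconds


if __name__ == "__main__":
    # Figures from the post: ~200 GB processed in up to 20 hours.
    rate = average_throughput_mb_s(200, 20)
    print(f"{rate:.2f} MB/s")  # prints "2.78 MB/s"
```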
What I understand of backups with borg + the ynh backup system
From what I understand, both approaches follow the same two-step logic through the ynh backup system:
- tell `yunohost backup create` to copy the relevant files to a temporary directory;
- back up this temporary directory with `borg create`.
Is that how it really works?
This seems safe, as apps are supposed to provide backup scripts (each app takes care of providing usable backup data). For example, the admin does not have to worry about making DB dumps: this is handled by the app backup scripts. So this is handy.
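What I understand of borg caching and performance
For deduplication to work, at `borg create` time borg slices file data into chunks that it hashes, and it keeps a files cache so that unchanged files are not re-read and re-hashed on the next backup. This cache is keyed by file path, and each entry records the file's inode number (along with its size and ctime, in the default `--files-cache` mode of recent borg versions); if these no longer match on the next run, the file is processed again.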
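To illustrate the chunking idea, here is a simplified content-defined chunker. It is not borg's implementation: borg uses a buzhash rolling hash with minimum and maximum chunk sizes and a keyed ID hash, whereas this sketch swaps in a plain polynomial rolling hash, no size limits and SHA-256 chunk IDs. It shows that chunk IDs depend only on content, so identical data yields identical chunks wherever it is stored, and that every original chunk except the first is still found after bytes are inserted at the start of the data.

```python
from __future__ import annotations

import hashlib
import random

WINDOW = 16               # bytes covered by the rolling hash
BASE = 257                # multiplier of the polynomial rolling hash
MOD = 1 << 32             # hash values are kept modulo 2**32
CUT_MASK = (1 << 10) - 1  # cut when the low 10 bits are zero (~1 KiB average chunks)


def chunk(data: bytes) -> list[bytes]:
    """Split data at positions chosen by the content of the last WINDOW bytes."""
    out_factor = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks = []
    h = 0
    start = 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Remove the contribution of the byte sliding out of the window.
            h = (h - data[i - WINDOW] * out_factor) % MOD
        h = (h * BASE + byte) % MOD
        # Only full windows are considered, so a cut depends on content alone.
        if i >= WINDOW - 1 and (h & CUT_MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk without a cut point
    return chunks


def chunk_ids(data: bytes) -> list[str]:
    """Content-derived IDs: identical chunks always get identical IDs."""
    return [hashlib.sha256(c).hexdigest() for c in chunk(data)]


if __name__ == "__main__":
    rng = random.Random(0)
    original = bytes(rng.getrandbits(8) for _ in range(50_000))
    shifted = b"a few new bytes" + original

    ids = chunk_ids(original)
    shifted_ids = set(chunk_ids(shifted))
    reused = sum(1 for cid in ids[1:] if cid in shifted_ids)
    print(f"{len(ids)} chunks; {reused} of the {len(ids) - 1} non-first chunks reused")
```

A copy of a file produces exactly the same chunk IDs, which is why deduplication still works on ynh's temporary copies; what the copy defeats is the files cache that lets borg skip reading the file at all.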
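Thus, borg seems designed to be used on files in place rather than on temporary copies: copying files gives them a different inode number from one backup to the next. With ynh copy-based backups, all files will therefore be re-read and re-hashed on each backup. Deduplication still prevents unchanged chunks from being uploaded again, but every byte has to be read and hashed.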
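The effect of copying on the files cache can be reproduced with a small simulation. This is a simplified model, not borg code: it assumes a cache keyed by absolute path whose entries hold (inode, size, ctime), mirroring borg's default `ctime,size,inode` files-cache mode, and it models the temporary directory as a stable path whose content is re-created on each run. An unchanged file checked in place hits the cache from the second run on, while a fresh copy at the same path always misses, even with `shutil.copy2`, which preserves mtime but not inode or ctime.

```python
from __future__ import annotations

import os
import shutil
import tempfile
from pathlib import Path


def fingerprint(path: Path) -> tuple[int, int, int]:
    """Fields compared by this model (borg's default mode: ctime, size, inode)."""
    st = os.stat(path)
    return (st.st_ino, st.st_size, st.st_ctime_ns)


class FilesCache:
    """Toy files cache: absolute path -> fingerprint recorded at the last backup."""

    def __init__(self) -> None:
        self.entries: dict[str, tuple[int, int, int]] = {}

    def needs_rehash(self, path: Path) -> bool:
        key = str(path.resolve())
        fp = fingerprint(path)
        hit = self.entries.get(key) == fp
        self.entries[key] = fp  # remember for the next run
        return not hit


def stage_copy(src: Path, staged: Path) -> None:
    """Re-create the staged copy at the same path, as a copy-based backup step would."""
    tmp = staged.with_name(staged.name + ".new")
    shutil.copy2(src, tmp)   # new file while the old one still exists: new inode
    os.replace(tmp, staged)  # same path as in the previous run


def simulate() -> dict[str, list[bool]]:
    with tempfile.TemporaryDirectory() as d:
        src = Path(d, "data.bin")
        src.write_bytes(b"x" * 4096)
        staged = Path(d, "staging", "data.bin")
        staged.parent.mkdir()

        cache = FilesCache()
        in_place = [cache.needs_rehash(src) for _ in range(3)]
        copied = []
        for _ in range(3):
            stage_copy(src, staged)
            copied.append(cache.needs_rehash(staged))
        return {"in_place": in_place, "copied": copied}


if __name__ == "__main__":
    print(simulate())
    # {'in_place': [True, False, False], 'copied': [True, True, True]}
```

If the temporary path also changes between runs (for example when it includes the archive name), the path key misses as well, so the result is the same.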
Expected performance issues:
ynh backup phase:
- time, load and disk space used by YunoHost to copy the same data to the temporary backup folder every day (e.g. 200 GB of Nextcloud data may hurt)
borg create phase:
- the files cache may be bigger than necessary, stabilizing at up to `BORG_FILES_CACHE_TTL` (20 by default in borg 1.x) entries per file covered by borg backups. This may increase RAM consumption, though it may be negligible (I did not do the math)
- time taken to re-read and re-hash all files for deduplication on each backup
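A rough way to do the math, assuming the memory estimates given in the borg 1.x internals documentation (about 240 bytes per files-cache entry plus 80 bytes per chunk, with chunks of about 2 MiB by default) and assuming, in the worst case, that every file is present once per retained generation up to `BORG_FILES_CACHE_TTL`. The file count below is a hypothetical example, not a measurement, and the per-entry figures are order-of-magnitude only.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def files_cache_bytes(file_count: int, total_size: int, generations: int = 1,
                      avg_chunk_size: int = 2 * MIB) -> int:
    """Estimate files-cache memory in bytes.

    Per generation: file_count * 240 + chunk_count * 80, following the rough
    figures from the borg 1.x internals docs. `generations` models stale
    entries kept alive by BORG_FILES_CACHE_TTL (worst case: the TTL itself).
    """
    chunk_count = total_size // avg_chunk_size
    per_generation = file_count * 240 + chunk_count * 80
    return generations * per_generation


if __name__ == "__main__":
    files = 1_000_000  # hypothetical number of files in ~200 GiB of data
    for gens in (1, 20):
        size = files_cache_bytes(files, 200 * GIB, gens)
        print(f"{gens:>2} generation(s): {size / GIB:.2f} GiB")
    # prints " 1 generation(s): 0.23 GiB" then "20 generation(s): 4.62 GiB"
```

Plugging in the real file count (for example from `find /path -type f | wc -l`) would give a more meaningful figure for a given server.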
Discussion
My analysis might not be rock-solid, and other performance issues may also be at play. Lastly, I am not sure I understand exactly how borg_ynh works.
I would be very interested in your opinions on my analysis and in your own experiences with borg + ynh. (cc @ljf? others?)
Short-term perspective
I suspect I could significantly improve backup performance by running `borg create` over my whole filesystem (with some excludes) and taking care of SQL dumps manually beforehand (i.e. not using the ynh backup tools at all).
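A minimal sketch of that approach, assuming MariaDB/MySQL and PostgreSQL databases, a borg repository reachable over SSH and only the Python standard library. The repository URL, the dump directory, the backed-up paths and the exclude patterns are hypothetical placeholders to adapt (see `borg help patterns` for pattern syntax); by default the script only prints the commands, and it runs them when `--run` is passed.

```python
from __future__ import annotations

import os
import subprocess
import sys

# Hypothetical values: adapt repository, paths and excludes to your server.
REPO = "ssh://backup@backup.example.org/./borg-repo"
DUMP_DIR = "/var/backups/sql"
PATHS = ["/etc", "/home", "/var/www", DUMP_DIR]
EXCLUDES = ["/home/yunohost.backup", "/home/*/.cache"]

# Each dump command writes SQL to stdout; the dict key is the output file name.
DUMPS = {
    "mysql.sql": ["mysqldump", "--single-transaction", "--all-databases"],
    "postgres.sql": ["sudo", "-u", "postgres", "pg_dumpall"],
}


def borg_command(repo: str, paths: list[str], excludes: list[str]) -> list[str]:
    """Build a `borg create` call that reads the files in place (no staging copy)."""
    cmd = ["borg", "create", "--stats", "--exclude-caches"]
    for pattern in excludes:
        cmd += ["--exclude", pattern]
    # {hostname} and {now} are archive-name placeholders expanded by borg.
    cmd.append(f"{repo}::{{hostname}}-{{now}}")
    return cmd + paths


def run(dry_run: bool = True) -> None:
    if not dry_run:
        os.makedirs(DUMP_DIR, exist_ok=True)
    # 1. Dump databases first so that borg backs up consistent SQL files.
    for name, cmd in DUMPS.items():
        target = os.path.join(DUMP_DIR, name)
        print(" ".join(cmd), ">", target)
        if not dry_run:
            with open(target, "wb") as out:
                subprocess.run(cmd, stdout=out, check=True)
    # 2. Back up the live files; unchanged ones should hit borg's files cache.
    borg = borg_command(REPO, PATHS, EXCLUDES)
    print(" ".join(borg))
    if not dry_run:
        subprocess.run(borg, check=True)


if __name__ == "__main__":
    run(dry_run="--run" not in sys.argv[1:])
```

Repository passphrase handling (e.g. `BORG_PASSPHRASE`) and pruning of old archives are left out of this sketch.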
A hybrid approach might be to rely on the ynh backup system for the core only and to back up apps manually (bypassing the ynh backup system).
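The hybrid variant could look like the sketch below: YunoHost backs up only the system part with its own tool, and borg then reads the resulting archive directory together with the app data directories in place. It assumes that `yunohost backup create --system` (with no `--apps`) restricts the backup to the core and that archives land in `/home/yunohost.backup/archives`; the repository URL and app data paths are hypothetical examples. App databases would still need dumping separately, as with the whole-filesystem approach.

```python
from __future__ import annotations

import subprocess
import sys

# Hypothetical values: adapt to your server.
REPO = "ssh://backup@backup.example.org/./borg-repo"
YNH_ARCHIVES = "/home/yunohost.backup/archives"
APP_DATA = ["/var/www", "/home/yunohost.app"]


def hybrid_commands(repo: str, app_paths: list[str]) -> list[list[str]]:
    """Core through the ynh backup system, app data read in place by borg."""
    return [
        # Step 1: let YunoHost back up the system part only (no app backups).
        ["yunohost", "backup", "create", "--system"],
        # Step 2: borg stores the ynh archives plus the app data directories.
        ["borg", "create", "--stats", f"{repo}::{{hostname}}-{{now}}",
         YNH_ARCHIVES, *app_paths],
    ]


if __name__ == "__main__":
    for cmd in hybrid_commands(REPO, APP_DATA):
        print(" ".join(cmd))
        if "--run" in sys.argv[1:]:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

Each new core archive is a new file, so borg re-reads it every time, but the core part is usually small compared with app data. Old ynh archives would also need pruning so they do not pile up.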
Long-term perspective
Given that ynh allows external backup systems to be plugged in, it might improve backup performance to tell the backup system (here borg) which files/folders to back up, rather than handing it a folder containing copies of them.
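One way to picture this: instead of staging copies, the backup system would hand the external method a list of source paths, and the method would pass them straight to `borg create`. The sketch below assumes, hypothetically, that this list arrives as CSV with `source` and `dest` columns; the format and both helper functions are illustrative, not an existing YunoHost interface. Because borg would then read the original files, their paths and inodes should stay stable between runs, letting the files cache skip unchanged files.

```python
from __future__ import annotations

import csv
import io


def sources_from_csv(text: str) -> list[str]:
    """Collect unique source paths, in order, from a hypothetical source,dest CSV."""
    seen: dict[str, None] = {}
    for row in csv.DictReader(io.StringIO(text)):
        seen.setdefault(row["source"], None)
    return list(seen)


def borg_create_in_place(repo: str, archive: str, sources: list[str]) -> list[str]:
    """Back up the original files directly; `dest` (restore layout) is unused here."""
    return ["borg", "create", "--stats", f"{repo}::{archive}", *sources]


if __name__ == "__main__":
    example = (
        "source,dest\n"
        "/etc/yunohost,conf/ynh\n"
        "/home/yunohost.app/nextcloud,apps/nextcloud/data\n"
        "/etc/yunohost,conf/ynh\n"
    )
    print(borg_create_in_place("/srv/borg-repo", "{hostname}-{now}",
                               sources_from_csv(example)))
```

A design question this leaves open is restoration: the `dest` mapping would still have to be kept (for example by backing up the CSV itself) so that a restore could rebuild the layout the app restore scripts expect.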