Hello, Habr!
Today I want to share our experience automating backups of large volumes of data in Nextcloud storages with different configurations. I work as CTO at Molniya AK, where we do configuration management of IT systems. We use Nextcloud for data storage, including distributed setups with redundancy.
The specifics of these installations lead to one problem: there is a lot of data. Nextcloud's versioning, redundancy, subjective reasons and other factors create many duplicates.
Background
Administering Nextcloud raises the pressing problem of organizing efficient backups, and they have to be encrypted, since the data is valuable.
We offer to store backups either with us or with the customer, on its own machines separate from Nextcloud. This calls for a flexible, automated approach to administration.
There are many clients, all with different configurations, each on its own site and with its own specifics. The standard technique, where the whole site belongs to you and backups run from cron, does not fit well here.
First, let's look at the input data. We need:
- Scalability from a single node to several. For large installations we use minio as storage.
- To find out about problems with backup runs.
- To keep backups with clients and/or with us.
- To deal with problems quickly and easily.
- To handle clients and setups that differ greatly from one another, with no uniformity.
- Minimal recovery time in two scenarios: full recovery (disaster) and restoring one folder erased by mistake.
- Deduplication.
To manage backups, we brought in GitLab. Details below.
Of course, we are not the first to solve this problem, but we think our hard-won practical experience may be of interest, and we are happy to share it.
opensource, . , . , GitHub Nextcloud, , .
tar + gzip β . , .
β . minio . minio β , , -. .
Borg Restic , . , , β CI/CD β GitLab.
: Nextcloud gitlab-runner. , Borg Restic.
GitHub , Nextcloud, . , ( ) .gitlab-ci.yml
API CI/CD, . , 1d
GitLab , , .
Among other things, we need to handle:
- The return code.
- A timeout.
- Output to stdout, so that it is visible in CI.
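A minimal bash sketch of the return-code, timeout and log checks, assuming GNU coreutils `timeout` (which takes durations with `m`, `h` or `d` suffixes and exits with 124 when the limit is hit). The helper `run_checked` and the fallback value `error` for ERROR_STRING are hypothetical; this is an illustration, not the script from our repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one step with a time limit, then check the
# return code and scan the captured output for an error marker.

run_checked() {
  local limit="$1"; shift
  local log rc
  log="$(mktemp)"
  # GNU timeout sends SIGTERM after $limit (e.g. 30m, 6h, 1d) and exits with 124
  timeout "$limit" "$@" >"$log" 2>&1
  rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "TIMEOUT after $limit: $*"
  elif [ "$rc" -ne 0 ]; then
    echo "FAILED (rc=$rc): $*"
  elif grep -qi -- "${ERROR_STRING:-error}" "$log"; then
    # The command exited with 0 but still reported an error in its output
    echo "ERRORS IN LOG: $*"
    rc=1
  else
    echo "OK: $*"
  fi
  rm -f "$log"
  return "$rc"
}
```

For example, `run_checked 1s sleep 5` prints `TIMEOUT after 1s: sleep 5` and returns 124, so a CI job that calls it fails visibly instead of hanging.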
GitLab, , . bash.
β welcome.
. job CI/CD. , , , . S3.
β AWS ( ). minio . , .
ssh . , S3 ssh .
β S3, .
Borg none
Main functions
- prepare
- testcheck
- maincommand
- forcepostscript
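One way the main functions prepare, testcheck, maincommand and forcepostscript could be chained is sketched below. The stub bodies, the order, and the assumption that forcepostscript runs even when an earlier stage fails are inferred from the names only; `run_stages` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical stage runner with stub hooks. Ordering and the
# "forcepostscript always runs" behaviour are assumptions.
prepare()         { echo "prepare"; }
testcheck()       { echo "testcheck"; }
maincommand()     { echo "maincommand"; }
forcepostscript() { echo "forcepostscript"; }

run_stages() {
  local rc=0
  # Stop at the first failing stage and remember its exit code
  prepare && testcheck && maincommand || rc=$?
  # Run the post script regardless of the outcome above
  forcepostscript
  return "$rc"
}
```

If testcheck fails, maincommand is skipped, forcepostscript still runs, and run_stages returns testcheck's exit code.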
Service functions
- cleanup
- checklog
- ret: exit handler
- checktimeout
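In bash, an exit handler such as ret is usually installed with `trap ... EXIT`, so it runs however the script ends. A minimal sketch, assuming a hypothetical scratch directory `WORKDIR`; the messages and the cleanup body are illustrative, not our actual implementation.

```shell
#!/usr/bin/env bash
# Hypothetical exit handler in the spirit of "ret": bash runs it on every
# exit, so cleanup always happens and the original exit code is preserved.
WORKDIR="$(mktemp -d)"            # scratch space for the run (assumption)

cleanup() { rm -rf "$WORKDIR"; }

ret() {
  local rc=$?                     # the status the shell is exiting with
  cleanup
  if [ "$rc" -eq 0 ]; then
    echo "backup finished OK"
  else
    echo "backup FAILED with code $rc"
  fi
  exit "$rc"                      # keep the original code for the CI job
}

trap ret EXIT
```

Because ret re-exits with the captured code, a failed backup still turns the GitLab job red.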
Environment
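- VERBOSE=1: show command output on the screen (stdout).
- SAVELOGSONSUCCES=1: save logs even when the run succeeds.
- INIT_REPO_IF_NOT_EXIST=1: initialize the repository if it does not exist.
- TIMEOUT: maximum run time. You can set it as 'm', 'h' or 'd' at the end.
- Retention, for example:
  - KEEP_DAILY=7
  - KEEP_WEEKLY=4
  - KEEP_MONTHLY=6
- ERROR_STRING: string to look for in the log to detect an error.
- EXTRACT_ERROR_STRING: expression for extracting the lines to show if there is an error.
- KILL_TIMEOUT_SIGNAL: signal used to kill the process on timeout.
- TAIL: how many lines with errors to show on the screen.
- COLORMSG: message color (default yellow).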
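A sketch of how these settings might be read from the environment with defaults. The KEEP_* defaults mirror the values listed above; TIMEOUT=1d, KILL_TIMEOUT_SIGNAL=TERM, the helper `build_prune_args` and the `$REPO` variable are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: settings come from the CI environment, with defaults.
# TIMEOUT and KILL_TIMEOUT_SIGNAL defaults are assumptions.
: "${TIMEOUT:=1d}"
: "${KILL_TIMEOUT_SIGNAL:=TERM}"

build_prune_args() {
  # `borg prune` and `restic forget` both accept these retention options
  prune_args=(
    "--keep-daily=${KEEP_DAILY:-7}"
    "--keep-weekly=${KEEP_WEEKLY:-4}"
    "--keep-monthly=${KEEP_MONTHLY:-6}"
  )
}

# The main step could then be wrapped as
#   timeout -s "$KILL_TIMEOUT_SIGNAL" "$TIMEOUT" borg create ...
# and retention applied with
#   build_prune_args && borg prune "${prune_args[@]}" "$REPO"
```

TIMEOUT values like `6h` or `1d` can be passed straight to GNU `timeout`, which understands the same suffixes.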
, wordpress , , mysql. Nexcloud, . , , , .
Restic vs Borg
The criteria we compared them on included:
- Behavior after kill -9.
- Working with S3.
1,6.
Borg cannot work with S3 directly, so the bucket has to be mounted through FUSE; we used goofys. Restic supports S3 natively.
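Since Borg sees the goofys mount as an ordinary directory, it is worth checking that the mount is actually present before each run. Otherwise a path like /mnt/goofys/borg1 would sit on the local disk, and with INIT_REPO_IF_NOT_EXIST=1 a run could initialize a new, empty repository there. A minimal sketch, assuming `mountpoint` from util-linux; `require_mount` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical guard: refuse to run Borg if the FUSE mount is missing.
require_mount() {
  local dir="$1"
  if mountpoint -q "$dir"; then
    return 0
  fi
  echo "ERROR: $dir is not mounted; is goofys running?" >&2
  return 1
}

# Typical use before any Borg command:
#   require_mount /mnt/goofys || exit 1
```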
Goofys , , . beta, , , (). , , , .
- Kill -9 .
- Repository size. Borg's repository is smaller:
Backup tool | Repository size |
---|---|
Borg | 562Gb |
Restic | 628Gb |
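To put the table in relative terms, the difference is simple arithmetic on the two figures (in Gb). It shows that Restic's repository is 66 Gb, or about 11.7%, larger than Borg's.

```shell
#!/usr/bin/env bash
# Relative difference between the repository sizes from the table (Gb).
borg_gb=562
restic_gb=628
diff_gb=$((restic_gb - borg_gb))
pct="$(awk -v a="$restic_gb" -v b="$borg_gb" 'BEGIN { printf "%.1f", (a - b) / b * 100 }')"
echo "Restic: +${diff_gb} Gb (+${pct}%) compared with Borg"
# prints: Restic: +66 Gb (+11.7%) compared with Borg
```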
- CPU
borg , , goofys. 1,2 . - . Restic 0,5, Borg 200. . .
- .
Backup tool | |
---|---|
Borg | 500 |
Restic | 5 |
- S3 Restic . Borg goofys , , umount . S3 , , .
- , .
Restic β 3,5 .
Borg, 100 SSD β 5 . .
Borg S3 33 . .
Borg β GET/PUT S3. . β . ( ) restic , .
borg.
Borgβ β zstd. gzip, . lz4.
MySQL lz4 . , , Nextcloud .
Borg β , , .
-C auto,zstd
zstd
560Gb 562Gb . , , 628Gb. 2 , - auto,zstd
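```shell
# Mount the bucket through goofys, with a local disk cache in /mnt/cache
goofys --cache "--free:5%:/mnt/cache" -o allow_other --endpoint https://storage.yandexcloud.net --file-mode=0666 --dir-mode=0777 xxxxxxx.com /mnt/goofys
# Let Borg read the repository passphrase from a file
export BORG_PASSCOMMAND="cat /home/borg/.borg-passphrase"
# List the archives, then run a full check that also verifies the data chunks
borg list /mnt/goofys/borg1/
borg check --debug -p --verify-data /mnt/goofys/borg1/
```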
(). Nextcloud . , .
API GitLab , , .
, , . tar.gz Bacula.