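Simply copying GitLab's data, configuration, and log directories wholesale does not actually give you a working backup. The only approach GitLab officially provides today is the command line: gitlab-rake (GitLab 12.1 and earlier) or the gitlab-backup command (GitLab 12.2 and later). Both are generally meant for full backups, so this article looks at how to achieve incremental backup in GitLab. What GitLab currently offers as "incremental backup" is not incremental backup in the strict sense. By verifying it hands-on, we will see how the mechanism works and what it actually achieves.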
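The version split just described can be expressed as a small helper. This is a minimal sketch that relies only on the documented command names (gitlab-rake gitlab:backup:create / gitlab:backup:restore for GitLab 12.1 and earlier, gitlab-backup create / restore from 12.2 on); the function name backup_cmd is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the backup or restore command for a given
# GitLab version ("MAJOR.MINOR[.PATCH...]") and action (create|restore).
backup_cmd() {
  local version=$1 action=$2 major minor rest
  case $action in
    create|restore) ;;
    *) echo "unknown action: $action" >&2; return 2 ;;
  esac
  # Split "13.2.3-ee" into major=13, minor=2, rest=3-ee
  IFS=. read -r major minor rest <<<"$version"
  # GitLab 12.1 and earlier: Rake task; 12.2 and later: gitlab-backup
  if (( major < 12 || (major == 12 && minor <= 1) )); then
    echo "gitlab-rake gitlab:backup:$action"
  else
    echo "gitlab-backup $action"
  fi
}
```

For instance, `backup_cmd 12.1.0 create` prints `gitlab-rake gitlab:backup:create`, while `backup_cmd 13.2.3 create` prints `gitlab-backup create`.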
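Full backup vs. incremental backup

There are actually three common backup strategies. The figure at the beginning of this article illustrates the middle one, differential backup.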
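| Strategy | Backup speed | Disk usage | Files backed up | Files needed to restore | Restore speed |
|---|---|---|---|---|---|
| Full backup | Low | High | All files | The full backup | High |
| Differential backup | Medium | Medium to high | Files changed since the last full backup | The full backup and the latest differential backup | High |
| Incremental backup | High | Low | Files changed since the last backup | The full backup and every incremental backup taken after it | Low |

Does GitLab support incremental backup?

The question is simple, but the answer takes some explaining. GitLab does not directly provide incremental backup: there is no gitlab-rake-style command that, on top of an earlier full backup, directly produces the backup data from the last backup up to a specified point in time. At least for now, no such mechanism exists. For details, see this GitLab issue: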
- https://gitlab.com/gitlab-org/gitlab-foss/-/issues/36975
That issue has been closed in favor of the following one:
- https://gitlab.com/gitlab-org/gitlab/-/issues/19482
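That issue is in the backlog, so there is something to look forward to, but when it will be done is unknown. Still, this kind of basic capability is likely to be improved eventually.
How to use the backup and restore commands is already summarized in the backup and restore guide; for details, see: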
- https://liumiaocn.blog.csdn.net/article/details/107952592
For how the environment was created and prepared, see:
- https://liumiaocn.blog.csdn.net/article/details/107950120
The GitLab instance used as the backup source runs on host131 (in this experiment, host131 is exposed on host port 32001 and host132 on 32002).
- Step 1: Create a backup with the following command
Command: gitlab-backup create BACKUP=incremental_rsyncable GZIP_RSYNCABLE=yes
    [root@host131 gitlab]# docker exec -it gitlab_gitlab_1 sh
    # gitlab-backup create BACKUP=incremental_rsyncable GZIP_RSYNCABLE=yes
    2020-08-19 22:44:58 +0000 -- Dumping database ...
    Dumping PostgreSQL database gitlabhq_production ... [DONE]
    2020-08-19 22:45:02 +0000 -- done
    2020-08-19 22:45:02 +0000 -- Dumping repositories ...
     * root/webhookproject (@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b) ... [DONE]
    [SKIPPED] Wiki
    2020-08-19 22:45:03 +0000 -- done
    2020-08-19 22:45:03 +0000 -- Dumping uploads ...
    2020-08-19 22:45:03 +0000 -- done
    2020-08-19 22:45:03 +0000 -- Dumping builds ...
    2020-08-19 22:45:03 +0000 -- done
    2020-08-19 22:45:03 +0000 -- Dumping artifacts ...
    2020-08-19 22:45:03 +0000 -- done
    2020-08-19 22:45:03 +0000 -- Dumping pages ...
    2020-08-19 22:45:03 +0000 -- done
    2020-08-19 22:45:03 +0000 -- Dumping lfs objects ...
    2020-08-19 22:45:03 +0000 -- done
    2020-08-19 22:45:03 +0000 -- Dumping container registry images ...
    2020-08-19 22:45:03 +0000 -- [DISABLED]
    Creating backup archive: incremental_rsyncable_gitlab_backup.tar ... done
    Uploading backup archive to remote storage ... skipped
    Deleting tmp directories ... done
    done
    done
    done
    done
    done
    done
    done
    Deleting old backups ... skipping
    Warning: Your gitlab.rb and gitlab-secrets.json files contain sensitive data
    and are not included in this backup. You will need these files to restore a backup.
    Please back them up manually.
    Backup task is done.
    # cd /var/opt/gitlab/backups
    # ls -l incremental_rsyncable_gitlab_backup.tar
    -rw------- 1 git git 184320 Aug 19 22:45 incremental_rsyncable_gitlab_backup.tar
    #
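Since GitLab runs in a container here, step 1 can also be run from the host without opening a shell in the container. A minimal sketch, assuming the container name gitlab_gitlab_1 from the session above; run_backup, archive_name, and the DRY_RUN switch are hypothetical additions.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around step 1: run gitlab-backup inside the
# Omnibus container from the host. Container name taken from the
# session above; adjust for your environment.
CONTAINER=${CONTAINER:-gitlab_gitlab_1}
BACKUP_ID=${BACKUP_ID:-incremental_rsyncable}

run_backup() {
  # BACKUP=<id> fixes the archive name to <id>_gitlab_backup.tar;
  # GZIP_RSYNCABLE=yes asks for rsync-friendly gzip output.
  local cmd=(docker exec "$CONTAINER" gitlab-backup create
             "BACKUP=$BACKUP_ID" GZIP_RSYNCABLE=yes)
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${cmd[*]}"   # print instead of executing
  else
    "${cmd[@]}"
  fi
}

# Name of the archive that step 1 leaves in /var/opt/gitlab/backups
archive_name() {
  echo "${BACKUP_ID}_gitlab_backup.tar"
}
```

`DRY_RUN=1 run_backup` only prints the command. Note that GZIP_RSYNCABLE=yes only changes how gzip compresses the output (it passes --rsyncable); it does not make gitlab-backup itself dump less data.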
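- Step 2: Copy the backup file into the backups directory and make sure its permissions are correct
Copy the file to the corresponding directory on the target machine: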
    [root@host131 backups]# scp incremental_rsyncable_gitlab_backup.tar 192.168.163.132:/root/gitlab/data/backups
    root@192.168.163.132's password:
    incremental_rsyncable_gitlab_backup.tar 100% 180KB 37.8MB/s 00:00
    [root@host131 backups]#
Set the permissions:
    [root@host132 backups]# chmod 644 incremental_rsyncable_gitlab_backup.tar
    [root@host132 backups]# pwd
    /root/gitlab/data/backups
    [root@host132 backups]#
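Step 2 can be scripted from the source host in the same way. A sketch assuming root SSH access to the target and the address and directory used in this experiment; push_archive, run, TARGET, TARGET_DIR, and DRY_RUN are hypothetical names. It mirrors the manual steps above: scp the archive, then chmod 644 it on the target.

```bash
#!/usr/bin/env bash
# Hypothetical helper for step 2: copy the archive to the target host's
# backups directory and make it readable, as done manually above.
TARGET=${TARGET:-root@192.168.163.132}
TARGET_DIR=${TARGET_DIR:-/root/gitlab/data/backups}

# Execute a command, or only print it when DRY_RUN=1
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

push_archive() {
  local archive=$1 name
  name=$(basename -- "$archive")
  run scp "$archive" "$TARGET:$TARGET_DIR/" || return 1
  run ssh "$TARGET" chmod 644 "$TARGET_DIR/$name"
}
```

Usage: `push_archive incremental_rsyncable_gitlab_backup.tar`, or prefix with `DRY_RUN=1` to see the two commands first.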
- Step 3: Use gitlab-ctl to stop unicorn (or puma) and sidekiq
Stop the services:
    [root@host132 backups]# docker exec -it gitlab_gitlab_1 sh
    # gitlab-ctl stop unicorn
    ok: down: unicorn: 0s, normally up
    # gitlab-ctl stop sidekiq
    ok: down: sidekiq: 0s, normally up
    #
Check the status:
    # gitlab-ctl status
    run: alertmanager: (pid 1372) 1630s; run: log: (pid 942) 1790s
    run: gitaly: (pid 1349) 1633s; run: log: (pid 459) 1897s
    run: gitlab-exporter: (pid 1326) 1634s; run: log: (pid 865) 1807s
    run: gitlab-workhorse: (pid 1318) 1635s; run: log: (pid 804) 1826s
    run: grafana: (pid 1387) 1629s; run: log: (pid 1243) 1664s
    run: logrotate: (pid 837) 1817s; run: log: (pid 846) 1815s
    run: nginx: (pid 819) 1823s; run: log: (pid 831) 1820s
    run: postgres-exporter: (pid 1381) 1630s; run: log: (pid 959) 1784s
    run: postgresql: (pid 490) 1892s; run: log: (pid 615) 1889s
    run: prometheus: (pid 1343) 1633s; run: log: (pid 914) 1796s
    run: redis: (pid 424) 1904s; run: log: (pid 439) 1903s
    run: redis-exporter: (pid 1329) 1634s; run: log: (pid 893) 1802s
    down: sidekiq: 44s, normally up; run: log: (pid 780) 1832s
    run: sshd: (pid 31) 1919s; run: log: (pid 30) 1919s
    down: unicorn: 61s, normally up; run: log: (pid 756) 1840s
    #
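Rather than scanning the status listing by eye, the check can be made explicit by parsing the `gitlab-ctl status` output, whose lines start with `run: <service>:` or `down: <service>:` as shown above. A minimal sketch; the function name web_and_sidekiq_down is hypothetical, and it accepts either unicorn or puma as the web server, matching the wording of step 3.

```bash
#!/usr/bin/env bash
# Hypothetical check for step 3: read `gitlab-ctl status` output on stdin
# and succeed only if sidekiq and the web server (unicorn or puma) are down.
web_and_sidekiq_down() {
  local status web web_down=0
  status=$(cat)
  # sidekiq must be reported as "down: sidekiq: ..."
  grep -q '^down: sidekiq:' <<<"$status" || return 1
  for web in unicorn puma; do
    # A running web server means it is not yet safe to restore
    if grep -q "^run: $web:" <<<"$status"; then
      return 1
    fi
    if grep -q "^down: $web:" <<<"$status"; then
      web_down=1
    fi
  done
  # At least one web server must have been listed as down
  [[ $web_down -eq 1 ]]
}
```

For example: `docker exec gitlab_gitlab_1 gitlab-ctl status | web_and_sidekiq_down && echo ready`.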
- Step 4: Restore the data with gitlab-backup restore
    # pwd
    /var/opt/gitlab/backups
    # ls -l
    total 180
    -rw-r--r-- 1 root root 184320 Aug 19 23:02 incremental_rsyncable_gitlab_backup.tar
    #
    # gitlab-backup restore BACKUP=incremental_rsyncable
    Unpacking backup ... done
    Before restoring the database, we will remove all existing
    tables to avoid future upgrade problems. Be aware that if you have
    custom tables in the GitLab database these tables and all data will be
    removed.
    Do you want to continue (yes/no)? yes
    Removing all tables. Press `Ctrl-C` within 5 seconds to abort
    2020-08-19 23:20:42 +0000 -- Cleaning the database ...
    2020-08-19 23:20:43 +0000 -- done
    2020-08-19 23:20:43 +0000 -- Restoring database ...
    Restoring PostgreSQL database gitlabhq_production ... SET
    SET
    SET
    SET
    SET
     set_config
    ------------
    (1 row)
    SET
    SET
    SET
    SET
    ERROR: relation "public.u2f_registrations" does not exist
    ERROR: relation "public.timelogs" does not exist
    ... (omitted)
    ALTER TABLE
    ALTER TABLE
    [DONE]
    2020-08-19 23:20:55 +0000 -- done
    2020-08-19 23:20:55 +0000 -- Restoring repositories ...
     * root/webhookproject ... [DONE]
    2020-08-19 23:20:56 +0000 -- done
    2020-08-19 23:20:56 +0000 -- Restoring uploads ...
    2020-08-19 23:20:56 +0000 -- done
    2020-08-19 23:20:56 +0000 -- Restoring builds ...
    2020-08-19 23:20:56 +0000 -- done
    2020-08-19 23:20:56 +0000 -- Restoring artifacts ...
    2020-08-19 23:20:56 +0000 -- done
    2020-08-19 23:20:56 +0000 -- Restoring pages ...
    2020-08-19 23:20:56 +0000 -- done
    2020-08-19 23:20:56 +0000 -- Restoring lfs objects ...
    2020-08-19 23:20:56 +0000 -- done
    This task will now rebuild the authorized_keys file.
    You will lose any data stored in the authorized_keys file.
    Do you want to continue (yes/no)? yes
    Warning: Your gitlab.rb and gitlab-secrets.json files contain sensitive data
    and are not included in this backup. You will need to restore these files manually.
    Restore task is done.
    #
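As the session shows, the value passed as BACKUP= is the archive's file name without the `_gitlab_backup.tar` suffix. A small sketch that derives it and builds the restore command for the container used here; backup_id_from_file and restore_cmd are hypothetical names. The restore prompts twice for yes/no, so the printed command keeps `-it`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for step 4.
SUFFIX=_gitlab_backup.tar

# incremental_rsyncable_gitlab_backup.tar -> incremental_rsyncable
backup_id_from_file() {
  local name
  name=$(basename -- "$1")
  if [[ $name != *"$SUFFIX" ]]; then
    echo "not a GitLab backup archive: $name" >&2
    return 1
  fi
  echo "${name%"$SUFFIX"}"
}

# Print the (interactive) restore command for this experiment's container
restore_cmd() {
  local id
  id=$(backup_id_from_file "$1") || return 1
  echo "docker exec -it ${CONTAINER:-gitlab_gitlab_1} gitlab-backup restore BACKUP=$id"
}
```

Usage: `restore_cmd /root/gitlab/data/backups/incremental_rsyncable_gitlab_backup.tar` prints the command used in step 4.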
- Step 5: Manually restore gitlab-secrets.json and gitlab.rb. This step was skipped in this experiment.
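Although step 5 was skipped here, it matters in a real migration: gitlab-secrets.json holds the keys GitLab uses to decrypt encrypted data in the database (for example two-factor authentication secrets and CI/CD variables), so a restored database is of limited use without it. A sketch of copying both files, assuming each host keeps the container's /etc/gitlab in a host directory that you pass in (the article only shows the data directory /root/gitlab/data, so these paths are assumptions); restore_config is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper for step 5: copy gitlab.rb and gitlab-secrets.json
# from the source instance's config directory to the target's.
restore_config() {
  local src_dir=$1 dest_dir=$2 f
  # Refuse to do a partial copy if either source file is missing
  for f in gitlab.rb gitlab-secrets.json; do
    if [[ ! -f $src_dir/$f ]]; then
      echo "missing $src_dir/$f" >&2
      return 1
    fi
  done
  for f in gitlab.rb gitlab-secrets.json; do
    # Keep the target's current file before overwriting it
    if [[ -f $dest_dir/$f ]]; then
      cp -p -- "$dest_dir/$f" "$dest_dir/$f.bak"
    fi
    # Mode 600: both files contain secrets
    install -m 600 -- "$src_dir/$f" "$dest_dir/$f"
  done
}
```

Afterwards, the gitlab-ctl reconfigure in step 6 applies the restored settings.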
- Step 6: Reconfigure, restart, and check the services
Command: gitlab-ctl reconfigure && gitlab-ctl restart && gitlab-rake gitlab:check SANITIZE=true
    # gitlab-ctl reconfigure
    Starting Chef Client, version 14.14.29
    resolving cookbooks for run list: ["gitlab"]
    Synchronizing Cookbooks:
    ... (omitted)
    Running handlers:
    Running handlers complete
    Chef Client finished, 4/699 resources updated in 12 seconds
    gitlab Reconfigured!
    #
    # gitlab-ctl restart
    ok: run: alertmanager: (pid 5302) 0s
    ok: run: gitaly: (pid 5310) 0s
    ok: run: gitlab-exporter: (pid 5317) 1s
    ok: run: gitlab-workhorse: (pid 5319) 0s
    ok: run: grafana: (pid 5333) 0s
    ok: run: logrotate: (pid 5344) 1s
    ok: run: nginx: (pid 5354) 0s
    ok: run: postgres-exporter: (pid 5366) 1s
    ok: run: postgresql: (pid 5374) 0s
    ok: run: prometheus: (pid 5384) 1s
    ok: run: redis: (pid 5391) 0s
    ok: run: redis-exporter: (pid 5397) 0s
    ok: run: sidekiq: (pid 5402) 1s
    ok: run: sshd: (pid 5408) 0s
    ok: run: unicorn: (pid 5410) 1s
    #
    # gitlab-rake gitlab:check SANITIZE=true
    Checking GitLab subtasks ...
    Checking GitLab Shell ...
    GitLab Shell: ... GitLab Shell version >= 12.2.0 ? ... OK (12.2.0)
    Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
    Internal API available: FAILED - Internal API unreachable
    gitlab-shell self-check failed
      Try fixing it:
      Make sure GitLab is running;
      Check the gitlab-shell configuration file:
      sudo -u git -H editor /opt/gitlab/embedded/service/gitlab-shell/config.yml
      Please fix the error above and rerun the checks.
    Checking GitLab Shell ... Finished
    Checking Gitaly ...
    Gitaly: ... default ... OK
    Checking Gitaly ... Finished
    Checking Sidekiq ...
    Sidekiq: ... Running? ... no
      Try fixing it:
      sudo -u git -H RAILS_ENV=production bin/background_jobs start
      For more information see:
      doc/install/installation.md in section "Install Init Script"
      see log/sidekiq.log for possible errors
      Please fix the error above and rerun the checks.
    Checking Sidekiq ... Finished
    Checking Incoming Email ...
    Incoming Email: ... Reply by email is disabled in config/gitlab.yml
    Checking Incoming Email ... Finished
    Checking LDAP ...
    LDAP: ... LDAP is disabled in config/gitlab.yml
    Checking LDAP ... Finished
    Checking GitLab App ...
    Git configured correctly? ... yes
    Database config exists? ... yes
    All migrations up? ... yes
    Database contains orphaned GroupMembers? ... no
    GitLab config exists? ... yes
    GitLab config up to date? ... yes
    Log directory writable? ... yes
    Tmp directory writable? ... yes
    Uploads directory exists? ... yes
    Uploads directory has correct permissions? ... yes
    Uploads directory tmp has correct permissions? ... skipped (no tmp uploads folder yet)
    Init script exists? ... skipped (omnibus-gitlab has no init script)
    Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
    Projects have namespace: ...
    1/1 ... yes
    Redis version >= 4.0.0? ... yes
    Ruby version >= 2.5.3 ? ... yes (2.6.5)
    Git version >= 2.22.0 ? ... yes (2.26.2)
    Git user has default SSH configuration? ... yes
    Active users: ... 1
    Is authorized keys file accessible? ... yes
    Checking GitLab App ... Finished
    Checking GitLab subtasks ... Finished
    #
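Note: although the check reported that sidekiq was not running, gitlab-ctl status showed the services in the expected state, so sidekiq was probably still starting up (the gitlab-shell "Internal API unreachable" failure likely has the same cause). During this window the restored GitLab service also returned 502 errors, but after a short wait it worked normally.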
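If the check runs before the services have finished starting, it reports failures like the ones above. One way to avoid that is to poll `gitlab-ctl status` until sidekiq and the web server are in the `run:` state before invoking gitlab:check. A sketch under that assumption; read_status, services_running, wait_for_services, and the default of 60 tries at 5-second intervals are hypothetical choices. A `run:` state only means the process is up, so, as the note says, the web UI may still answer 502 for a little while.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: wait until sidekiq and the web server report
# "run:" before running gitlab:check.

# Reads the status from this experiment's container;
# redefine this function to read it some other way.
read_status() {
  docker exec gitlab_gitlab_1 gitlab-ctl status
}

services_running() {
  local status
  status=$(read_status 2>/dev/null)
  grep -q '^run: sidekiq:' <<<"$status" &&
    grep -Eq '^run: (unicorn|puma):' <<<"$status"
}

# wait_for_services [tries] [delay-seconds]
wait_for_services() {
  local tries=${1:-60} delay=${2:-5} i
  for ((i = 1; i <= tries; i++)); do
    services_running && return 0
    sleep "$delay"
  done
  echo "services still not running after $tries checks" >&2
  return 1
}
```

Usage: `wait_for_services && docker exec gitlab_gitlab_1 gitlab-rake gitlab:check SANITIZE=true`.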
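Log in to the GitLab instance restored from the backup (host132, exposed on host port 32002): the data has been restored.
Only now does the actual check of incremental backup begin. Run the backup again; the output is as follows: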
    # gitlab-backup create BACKUP=incremental_rsyncable GZIP_RSYNCABLE=yes
    2020-08-19 23:43:37 +0000 -- Dumping database ...
    Dumping PostgreSQL database gitlabhq_production ... [DONE]
    2020-08-19 23:43:43 +0000 -- done
    2020-08-19 23:43:43 +0000 -- Dumping repositories ...
     * root/webhookproject (@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b) ... [DONE]
    [SKIPPED] Wiki
    2020-08-19 23:43:44 +0000 -- done
    2020-08-19 23:43:44 +0000 -- Dumping uploads ...
    2020-08-19 23:43:44 +0000 -- done
    2020-08-19 23:43:44 +0000 -- Dumping builds ...
    2020-08-19 23:43:44 +0000 -- done
    2020-08-19 23:43:44 +0000 -- Dumping artifacts ...
    2020-08-19 23:43:44 +0000 -- done
    2020-08-19 23:43:44 +0000 -- Dumping pages ...
    2020-08-19 23:43:44 +0000 -- done
    2020-08-19 23:43:44 +0000 -- Dumping lfs objects ...
    2020-08-19 23:43:44 +0000 -- done
    2020-08-19 23:43:44 +0000 -- Dumping container registry images ...
    2020-08-19 23:43:44 +0000 -- [DISABLED]
    Creating backup archive: incremental_rsyncable_gitlab_backup.tar ... done
    Uploading backup archive to remote storage ... skipped
    Deleting tmp directories ... done
    done
    done
    done
    done
    done
    done
    done
    Deleting old backups ... skipping
    Warning: Your gitlab.rb and gitlab-secrets.json files contain sensitive data
    and are not included in this backup. You will need these files to restore a backup.
    Please back them up manually.
    Backup task is done.
    # ls -l incrementa*
    -rw------- 1 git git 184320 Aug 19 23:43 incremental_rsyncable_gitlab_backup.tar
    #
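Neither the run time nor the output differs noticeably from the first backup. Since scp was used for the previous copy, this time we use rsync, which the official documentation points to for incremental transfer (GZIP_RSYNCABLE=yes exists for exactly this purpose). The first transfer gives: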
    # rsync -vzrtopg --progress incremental_rsyncable_gitlab_backup.tar 192.168.163.132:/root/gitlab/data/backups
    root@192.168.163.132's password:
    sending incremental file list
    incremental_rsyncable_gitlab_backup.tar
    184,320 100% 25.02MB/s 0:00:00 (xfr#1, to-chk=0/1)
    sent 131 bytes received 1,619 bytes 388.89 bytes/sec
    total size is 184,320 speedup is 105.33
    #
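For reference, the short options in -vzrtopg mean: -v verbose, -z compress in transit, -r recurse, -t preserve modification times, and -o/-p/-g preserve owner, permissions, and group (-a would cover -rtopg plus symlinks and devices). By default rsync skips a file whose size and modification time match on both sides (its "quick check"), and for a changed file it sends only the blocks that differ. Because the target already held an archive of the same name, copied with scp in step 2, even this first rsync run could use delta transfer. A wrapper sketch with those flags; sync_archive, DEST, and DRY_RUN are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for the transfer above.
# -v verbose, -z compress in transit, -r recurse, -t keep mtimes,
# -o/-p/-g keep owner/permissions/group. Keeping mtimes lets rsync's
# default quick check (size + mtime) skip an unchanged archive.
DEST=${DEST:-192.168.163.132:/root/gitlab/data/backups}

sync_archive() {
  local cmd=(rsync -vzrtopg --progress "$@" "$DEST")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${cmd[*]}"   # print instead of executing
  else
    "${cmd[@]}"
  fi
}
```

Usage: `sync_archive incremental_rsyncable_gitlab_backup.tar`.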
Second transfer (gitlab-backup was not run again in between):
    # rsync -vzrtopg --progress incremental_rsyncable_gitlab_backup.tar 192.168.163.132:/root/gitlab/data/backups
    root@192.168.163.132's password:
    sending incremental file list
    sent 88 bytes received 12 bytes 22.22 bytes/sec
    total size is 184,320 speedup is 1,843.20
    #
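rsync's "speedup" is the total file size divided by the bytes actually sent plus received, so the figures above can be checked directly: 184,320 / (131 + 1,619) ≈ 105.33 and 184,320 / (88 + 12) = 1,843.20. In other words, a high speedup says how little crossed the network, not that the backup itself was incremental. The same arithmetic as a tiny helper (rsync_speedup is a hypothetical name):

```bash
#!/usr/bin/env bash
# speedup = total size / (bytes sent + bytes received), as rsync reports it
rsync_speedup() {
  local total=$1 sent=$2 received=$3
  awk -v t="$total" -v s="$sent" -v r="$received" \
    'BEGIN { printf "%.2f\n", t / (s + r) }'
}

rsync_speedup 184320 131 1619   # first transfer  -> 105.33
rsync_speedup 184320 88 12      # second transfer -> 1843.20
```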
Run the backup again. Nothing in the output shows how a differential backup would come into play:
    # date
    Wed Aug 19 23:54:46 UTC 2020
    # gitlab-backup create BACKUP=incremental_rsyncable GZIP_RSYNCABLE=yes
    2020-08-19 23:55:44 +0000 -- Dumping database ...
    Dumping PostgreSQL database gitlabhq_production ... [DONE]
    2020-08-19 23:55:51 +0000 -- done
    2020-08-19 23:55:51 +0000 -- Dumping repositories ...
     * root/webhookproject (@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b) ... [DONE]
    [SKIPPED] Wiki
    2020-08-19 23:55:52 +0000 -- done
    2020-08-19 23:55:52 +0000 -- Dumping uploads ...
    2020-08-19 23:55:52 +0000 -- done
    2020-08-19 23:55:52 +0000 -- Dumping builds ...
    2020-08-19 23:55:52 +0000 -- done
    2020-08-19 23:55:52 +0000 -- Dumping artifacts ...
    2020-08-19 23:55:52 +0000 -- done
    2020-08-19 23:55:52 +0000 -- Dumping pages ...
    2020-08-19 23:55:52 +0000 -- done
    2020-08-19 23:55:52 +0000 -- Dumping lfs objects ...
    2020-08-19 23:55:53 +0000 -- done
    2020-08-19 23:55:53 +0000 -- Dumping container registry images ...
    2020-08-19 23:55:53 +0000 -- [DISABLED]
    Creating backup archive: incremental_rsyncable_gitlab_backup.tar ... done
    Uploading backup archive to remote storage ... skipped
    Deleting tmp directories ... done
    done
    done
    done
    done
    done
    done
    done
    Deleting old backups ... skipping
    Warning: Your gitlab.rb and gitlab-secrets.json files contain sensitive data
    and are not included in this backup. You will need these files to restore a backup.
    Please back them up manually.
    Backup task is done.
    # date
    Wed Aug 19 23:55:57 UTC 2020
    #
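To see how much two consecutive archives really differ, instead of inferring it from the backup log, one option is to keep a copy of the previous archive and compare the two byte by byte. A sketch using `cmp -l`, which prints one line per differing byte; archive_diff_bytes is a hypothetical name, and the size difference is added when the archives differ in length. Some difference is expected even with no repository changes, since the database dump is regenerated and the tar headers carry fresh file timestamps on every run.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count differing bytes between two archives.
# cmp -l lists every differing byte position within the common length;
# if the sizes differ, the extra bytes of the longer file are added.
archive_diff_bytes() {
  local old=$1 new=$2 size_old size_new diff extra
  size_old=$(wc -c <"$old")
  size_new=$(wc -c <"$new")
  diff=$(cmp -l -- "$old" "$new" 2>/dev/null | wc -l)
  extra=$(( size_new > size_old ? size_new - size_old : size_old - size_new ))
  echo $(( diff + extra ))
}
```

For example, copy incremental_rsyncable_gitlab_backup.tar to a .prev file before the next backup, then run `archive_diff_bytes incremental_rsyncable_gitlab_backup.tar.prev incremental_rsyncable_gitlab_backup.tar`.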
Run the rsync transfer again. The only gain is rsync's speedup:
    # rsync -vzrtopg --progress incremental_rsyncable_gitlab_backup.tar 192.168.163.132:/root/gitlab/data/backups
    root@192.168.163.132's password:
    sending incremental file list
    incremental_rsyncable_gitlab_backup.tar
    184,320 100% 10.78MB/s 0:00:00 (xfr#1, to-chk=0/1)
    sent 2,057 bytes received 1,619 bytes 816.89 bytes/sec
    total size is 184,320 speedup is 50.14
    #

Summary
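For now, one option for incremental GitLab backups is to use SKIP=tar combined with rsync, which may work better. Even so, only the transfer becomes incremental: each gitlab-backup run still dumps all of the data.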
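The SKIP=tar plus rsync idea could look roughly like this. It is an untested sketch under stated assumptions: with SKIP=tar, gitlab-backup leaves the individual backup parts unpacked in the backup directory instead of packing them into one .tar; that directory is /var/opt/gitlab/backups in the container, mounted from /root/gitlab/data/backups on the host as in this experiment; and the next run overwrites those parts, so each run is synced right away. incremental_sync, HOST_BACKUP_DIR, SYNC_DEST, run, and DRY_RUN are hypothetical names. Whether to add rsync's --delete, so that parts removed at the source also disappear at the target, is left open.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "SKIP=tar + rsync"; see the assumptions above.
CONTAINER=${CONTAINER:-gitlab_gitlab_1}
HOST_BACKUP_DIR=${HOST_BACKUP_DIR:-/root/gitlab/data/backups}
SYNC_DEST=${SYNC_DEST:-192.168.163.132:/root/gitlab/data/backups/}

# Execute a command, or only print it when DRY_RUN=1
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

incremental_sync() {
  # 1. Dump all parts but skip packing them into a single .tar
  run docker exec "$CONTAINER" gitlab-backup create SKIP=tar GZIP_RSYNCABLE=yes || return 1
  # 2. Mirror the unpacked parts. Unchanged files are skipped by rsync's
  #    quick check; changed ones go through its delta transfer.
  #    The trailing slash copies the directory's contents, not the directory.
  run rsync -az "$HOST_BACKUP_DIR/" "$SYNC_DEST"
}
```

As the summary says, this makes only the transfer incremental; the dump on the GitLab side is still complete every time.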