PostgreSQL Backup and Incremental Recovery

Backup · PostgreSQL · Published 2018-10-14 19:45:58


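In the earlier article "PostgreSQL Primary-Standby Asynchronous Streaming Replication", we deployed a PostgreSQL primary/standby environment using asynchronous streaming replication. Replication keeps a copy of the data and provides high availability and fault tolerance. Below is a brief introduction to the common backup and recovery approach we use when operating PostgreSQL databases.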

Incremental Backup

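When PostgreSQL performs a write, every change to the data files is first recorded in the WAL (write-ahead log) and only then applied physically to the data files. If the server loses power and restarts, PostgreSQL reads the WAL at startup and uses it to repair the data files. Therefore, in principle, if we have a base backup of the database (also called a full backup) together with all WAL generated since that backup, we can restore the database to any point in time after the backup was taken.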

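This point is important, because the common form of incremental backup is exactly this: restore a base backup, then replay the incremental WAL on top of it.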
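To make the "base backup plus WAL replay" idea concrete, here is a toy Python model. It is only an illustration: the "WAL" is a hypothetical list of (operation, key, value) tuples with named restore-point markers, not PostgreSQL's real record format, and recover() is a name invented for this sketch. Recovery copies the base backup and replays records in order, stopping when it reaches the requested restore point.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy record: (operation, key, value); NOT PostgreSQL's WAL format.
Record = Tuple[str, str, Optional[int]]


def recover(base: Dict[str, Optional[int]], wal: List[Record],
            target_name: Optional[str] = None) -> Dict[str, Optional[int]]:
    """Replay toy WAL records on a copy of the base backup.

    Replay stops at the restore point called target_name; with no
    target, every record is replayed (recovery to the end of the WAL).
    """
    state = dict(base)  # the base backup itself is never modified
    for op, key, value in wal:
        if op == "restore_point":
            if key == target_name:
                break  # stop at the named restore point
        elif op == "insert":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return state


if __name__ == "__main__":
    base_backup = {"1": 1, "2": 2}
    wal = [
        ("insert", "3", 3),                      # change made after the base backup
        ("restore_point", "domac-1014", None),
        ("delete", "1", None),
        ("delete", "2", None),
        ("delete", "3", None),
    ]
    print(recover(base_backup, wal, "domac-1014"))  # {'1': 1, '2': 2, '3': 3}
    print(recover(base_backup, wal))                # {}
```

Replaying without a target corresponds to recovering to the end of the available WAL, which here includes the deletes.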

Incremental Backup Setup

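To demonstrate, we use the pghost1 server from the environment built in "PostgreSQL Primary-Standby Asynchronous Streaming Replication" and create the directories we need.

Switch to the postgres user first: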

mkdir -p /data/pg10/backups
mkdir -p /data/pg10/archive_wals

The backups directory holds the base backups.

The archive_wals directory holds the archived WAL files.

Next, change the relevant settings in postgresql.conf:

wal_level = replica

archive_mode = on

archive_command = '/usr/bin/lz4 -q -z %p /data/pg10/archive_wals/%f.lz4'
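If you prefer a script to the one-line lz4 command, the following hypothetical Python helper (the name archive_wal.py and its path are made up for this sketch) shows the contract archive_command has to honour: PostgreSQL substitutes %p (path of the WAL file) and %f (its file name), treats exit status zero as "archived", and treats anything else as a failure it will retry later. The sketch swaps lz4 for the standard library's gzip to avoid a third-party dependency, refuses to overwrite an existing archive file, and writes to a temporary name before renaming. It does not fsync the result, which a production script should do.

```python
#!/usr/bin/env python3
"""Hypothetical archive_command helper: a sketch, not part of PostgreSQL.

Assumed postgresql.conf entry (path made up for this example):
    archive_command = 'python3 /data/pg10/archive_wal.py %p %f'
"""
import gzip
import os
import shutil
import sys

ARCHIVE_DIR = "/data/pg10/archive_wals"  # directory created earlier


def archive(wal_path: str, wal_name: str, archive_dir: str = ARCHIVE_DIR) -> int:
    """Gzip one WAL file into archive_dir and return a process exit status."""
    dest = os.path.join(archive_dir, wal_name + ".gz")
    if os.path.exists(dest):
        # Never overwrite an archived segment; a non-zero status makes
        # PostgreSQL keep the WAL file and retry archiving later.
        print(f"archive_wal: {dest} already exists", file=sys.stderr)
        return 1
    tmp = dest + ".tmp"
    try:
        with open(wal_path, "rb") as src, gzip.open(tmp, "wb", compresslevel=5) as out:
            shutil.copyfileobj(src, out)
        os.replace(tmp, dest)  # publish the file only once it is complete (no fsync here)
    except OSError as exc:
        print(f"archive_wal: failed to archive {wal_name}: {exc}", file=sys.stderr)
        if os.path.exists(tmp):
            os.remove(tmp)
        return 1
    return 0  # zero tells PostgreSQL the segment is safely archived


# Only act when called with both placeholders, as archive_command would do.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(archive(sys.argv[1], sys.argv[2]))
```

With this script the restore side would need gzip decompression (for example gunzip -c) instead of lz4 -d.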

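The archive_command parameter defaults to an empty string. Its value can be a single shell command or a call to a more complex shell script.

Inside archive_command, %p is replaced by the path of the WAL file to be archived (relative to the data directory, i.e. under pg_wal/), and %f is replaced by the file name alone, without any path.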
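The placeholder substitution can be sketched in a few lines of Python. expand_archive_command() is a name invented here; besides %p and %f it also turns %% into a literal %, and, as an assumption of this sketch, copies any other %-sequence through unchanged. The example path and file name are illustrative values.

```python
def expand_archive_command(template: str, wal_path: str, wal_name: str) -> str:
    """Substitute %p, %f and %% in an archive_command-style template."""
    replacements = {"p": wal_path, "f": wal_name, "%": "%"}
    out = []
    i = 0
    while i < len(template):
        nxt = template[i + 1] if i + 1 < len(template) else ""
        if template[i] == "%" and nxt in replacements:
            out.append(replacements[nxt])
            i += 2  # consume the two-character placeholder
        else:
            out.append(template[i])  # ordinary character, copied as-is
            i += 1
    return "".join(out)


cmd = expand_archive_command(
    "/usr/bin/lz4 -q -z %p /data/pg10/archive_wals/%f.lz4",
    "pg_wal/000000020000000000000016",
    "000000020000000000000016",
)
print(cmd)
# /usr/bin/lz4 -q -z pg_wal/000000020000000000000016 /data/pg10/archive_wals/000000020000000000000016.lz4
```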

Changes to wal_level and archive_mode take effect only after a database restart, while a change to archive_command only needs a configuration reload, for example:

postgres=# SELECT pg_reload_conf();

postgres=# show archive_command ;
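Whether a change needs a restart or only a reload depends on the parameter's context, which PostgreSQL reports in the context column of the pg_settings view: postmaster parameters such as wal_level and archive_mode need a restart, while sighup parameters such as archive_command and archive_timeout only need a reload. The small Python table below is hand-copied for these four parameters only, and required_action() is a hypothetical helper; the live pg_settings view remains the authoritative source.

```python
from typing import Iterable, List, Tuple

# Hand-copied from pg_settings.context for the parameters used in this article.
PARAM_CONTEXT = {
    "wal_level": "postmaster",     # restart required
    "archive_mode": "postmaster",  # restart required
    "archive_command": "sighup",   # reload is enough
    "archive_timeout": "sighup",   # reload is enough
}


def required_action(changed: Iterable[str]) -> Tuple[str, List[str]]:
    """Return ("restart", parameters that need it) or ("reload", [])."""
    names = list(changed)
    unknown = [p for p in names if p not in PARAM_CONTEXT]
    if unknown:
        raise ValueError(f"no context recorded for: {', '.join(unknown)}")
    restart = sorted(p for p in names if PARAM_CONTEXT[p] == "postmaster")
    return ("restart", restart) if restart else ("reload", [])


print(required_action(["archive_command"]))               # ('reload', [])
print(required_action(["archive_command", "wal_level"]))  # ('restart', ['wal_level'])
```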

Creating a Base Backup

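We create the base backup with the pg_basebackup command introduced earlier. The base backup is essential, since no recovery is possible without it, so we recommend taking base backups periodically according to your business's backup policy.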

$ pg_basebackup -Ft -Pv -Xf -z -Z5 -p 25432 -D /data/pg10/backups/

With that, the base backup has been created.
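For the periodic base backups recommended above, a scheduler such as cron could call a small wrapper. The Python sketch below is hypothetical (basebackup_cmd() and run_basebackup() are invented names): it rebuilds the same pg_basebackup command, but writes each run into its own timestamped subdirectory of /data/pg10/backups, because pg_basebackup refuses to write into a directory that already exists and is not empty. The comments explain the flags used in the command above.

```python
import datetime
import os
import subprocess
from typing import List, Optional

BACKUP_ROOT = "/data/pg10/backups"  # directory created earlier


def basebackup_cmd(now: datetime.datetime, port: int = 25432,
                   root: str = BACKUP_ROOT) -> List[str]:
    """Build the pg_basebackup command above, aimed at a fresh subdirectory."""
    target = os.path.join(root, now.strftime("%Y%m%d%H%M%S"))
    return [
        "pg_basebackup",
        "-Ft",         # tar format (base.tar, plus one tar per tablespace)
        "-P", "-v",    # show progress, verbose messages
        "-Xf",         # fetch the required WAL at the end and include it
        "-z", "-Z5",   # gzip the tar output, compression level 5
        "-p", str(port),
        "-D", target,  # must not exist yet, or must be empty
    ]


def run_basebackup(now: Optional[datetime.datetime] = None) -> None:
    """Take one base backup; raises CalledProcessError if pg_basebackup fails."""
    subprocess.run(basebackup_cmd(now or datetime.datetime.now()), check=True)
```

With this layout, the file to extract during recovery becomes /data/pg10/backups/<timestamp>/base.tar.gz instead of a file directly under /data/pg10/backups.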

Setting a Restore Point

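Typically, we create a restore point right before an important event, so that the base backup plus the archived WAL can bring the database back to its state just before that event.

The system function that creates a restore point is pg_create_restore_point; it is called like this: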

postgres=# SELECT pg_create_restore_point('domac-201810141800');
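The name 'domac-201810141800' appears to follow a prefix-plus-timestamp pattern (YYYYMMDDHHMM). Assuming that convention, the hypothetical helpers below build such a name and the matching statement for psql, doubling any single quote so the string literal stays valid (this relies on standard_conforming_strings, which is on by default). From application code, the database driver's parameter binding is preferable to building SQL text.

```python
import datetime


def restore_point_name(prefix: str, when: datetime.datetime) -> str:
    """Build a restore point name such as 'domac-201810141800'."""
    return f"{prefix}-{when:%Y%m%d%H%M}"


def restore_point_sql(name: str) -> str:
    """Return a psql statement creating the restore point, quoting the name."""
    literal = "'" + name.replace("'", "''") + "'"
    return f"SELECT pg_create_restore_point({literal});"


name = restore_point_name("domac", datetime.datetime(2018, 10, 14, 18, 0))
print(restore_point_sql(name))
# SELECT pg_create_restore_point('domac-201810141800');
```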

Recovering to a Specific Restore Point

Next, we walk through an example that restores the data to a restore point we created.

First, create a test table:

CREATE TABLE test_restore(
id SERIAL PRIMARY KEY,
ival INT NOT NULL DEFAULT 0,
description TEXT,
created_time TIMESTAMPTZ NOT NULL DEFAULT now()
);

Insert some test rows as the baseline data:

postgres=# INSERT INTO test_restore (ival) VALUES (1);
INSERT 0 1
postgres=# INSERT INTO test_restore (ival) VALUES (2);
INSERT 0 1
postgres=# INSERT INTO test_restore (ival) VALUES (3);
INSERT 0 1
postgres=# INSERT INTO test_restore (ival) VALUES (4);
INSERT 0 1

postgres=# select * from test_restore;
 id | ival | description |         created_time
----+------+-------------+-------------------------------
  1 |    1 |             | 2018-10-14 11:13:41.57154+00
  2 |    2 |             | 2018-10-14 11:13:44.250221+00
  3 |    3 |             | 2018-10-14 11:13:46.311291+00
  4 |    4 |             | 2018-10-14 11:13:48.820479+00
(4 rows)

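Then take a base backup as described above. When testing, note that a WAL segment file is archived only once it is full (16 MB by default), and a test may write very little, so after the base backup completes you can force a WAL switch manually, for example: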

postgres=# select pg_switch_wal();
 pg_switch_wal
---------------
 0/1D01B858
(1 row)

Alternatively, set the archive_timeout parameter, which forces a switch to a new WAL segment once the timeout has elapsed.
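archive_timeout should not be set too low: a segment switched early because of the timeout is still archived as a full-size file (16 MB by default). Assuming there is at least some activity in every interval (the server does not force switches on a completely idle system), the uncompressed volume produced by forced switches alone is roughly (86400 / archive_timeout) × 16 MB per day. The Python calculation below uses a function name invented for this sketch; compression can reduce the real figure, by an amount that depends on the segment contents.

```python
SECONDS_PER_DAY = 24 * 60 * 60
WAL_SEGMENT_MB = 16  # default WAL segment size


def forced_switch_mb_per_day(archive_timeout_s: int, segment_mb: int = WAL_SEGMENT_MB) -> int:
    """Rough daily archive volume (MB, before compression) from timeout-forced switches."""
    if archive_timeout_s <= 0:
        raise ValueError("archive_timeout must be > 0 (0 disables forced switches)")
    switches = SECONDS_PER_DAY // archive_timeout_s
    return switches * segment_mb


for timeout in (60, 300, 3600):
    print(f"archive_timeout = {timeout}s -> about {forced_switch_mb_per_day(timeout)} MB/day")
```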

Next, create a restore point:

postgres=# select pg_create_restore_point('domac-1014');
 pg_create_restore_point
-------------------------
 0/1E0001A8
(1 row)
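WAL positions such as 0/1D01B858 (returned by pg_switch_wal()) and 0/1E0001A8 (returned by pg_create_restore_point()) map directly to segment file names: the name is the timeline ID followed by the segment number split into two 8-hex-digit halves. The Python sketch below assumes the default 16 MB segment size and timeline 2, the timeline visible in the recovery log later in this article; the function names are invented. Inside the server, pg_walfile_name() performs a similar mapping for the current timeline.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default segment size (16 MB)


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '0/1D01B858' into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def walfile_name(lsn: str, timeline: int, segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file that contains the given LSN."""
    segno = parse_lsn(lsn) // segment_size
    segments_per_id = 0x100000000 // segment_size  # 256 with 16 MB segments
    return "%08X%08X%08X" % (timeline, segno // segments_per_id, segno % segments_per_id)


print(walfile_name("0/1D01B858", 2))  # 00000002000000000000001D
print(walfile_name("0/1E0001A8", 2))  # 00000002000000000000001E
```

The second name is the last segment the recovery log restores before it stops at the restore point, consistent with the restore point record having been written into segment 1E.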

Now we change the data by deleting every row from test_restore:

postgres=# delete from test_restore;
DELETE 4

Next, we recover to the restore point named "domac-1014":

Stop the database:

$ pg_ctl stop -D /data/pg10/db

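Remove the old data directory, recreate it empty, extract the base backup into it, and copy in the sample recovery configuration: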

$ rm -rf /data/pg10/db

$ mkdir -p /data/pg10/db && chmod 0700 /data/pg10/db

$ tar -xvf /data/pg10/backups/base.tar.gz -C /data/pg10/db

$ cp $PGHOME/share/recovery.conf.sample /data/pg10/db/recovery.conf

$ chmod 0600 /data/pg10/db/recovery.conf

Edit recovery.conf and set the following options:

restore_command = '/usr/bin/lz4 -d /data/pg10/archive_wals/%f.lz4 %p'
recovery_target_name = 'domac-1014'
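If you script the restore, recovery.conf can be generated rather than edited by hand. The Python sketch below is hypothetical (render_recovery_conf() and write_recovery_conf() are invented names) and targets the PostgreSQL 10 recovery.conf format used in this article. It doubles single quotes inside values and can optionally set recovery_target_action, whose default, pause, is why recovery stops and waits at the restore point.

```python
import os
from typing import Optional

VALID_ACTIONS = {"pause", "promote", "shutdown"}  # recovery_target_action values


def _quote(value: str) -> str:
    """Quote a value for a PostgreSQL configuration file."""
    return "'" + value.replace("'", "''") + "'"


def render_recovery_conf(restore_command: str, target_name: str,
                         target_action: Optional[str] = None) -> str:
    """Return recovery.conf text for recovery to a named restore point."""
    lines = [
        f"restore_command = {_quote(restore_command)}",
        f"recovery_target_name = {_quote(target_name)}",
    ]
    if target_action is not None:
        if target_action not in VALID_ACTIONS:
            raise ValueError(f"invalid recovery_target_action: {target_action}")
        lines.append(f"recovery_target_action = {_quote(target_action)}")
    return "\n".join(lines) + "\n"


def write_recovery_conf(data_dir: str, text: str) -> str:
    """Write recovery.conf into data_dir and restrict it to the owner (0600)."""
    path = os.path.join(data_dir, "recovery.conf")
    with open(path, "w") as fh:
        fh.write(text)
    os.chmod(path, 0o600)
    return path


print(render_recovery_conf("/usr/bin/lz4 -d /data/pg10/archive_wals/%f.lz4 %p", "domac-1014"), end="")
```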

Then start the database; it enters recovery. Watch the log:

bash-4.2$ pg_ctl start -D /data/pg10/db
waiting for server to start....2018-10-14 11:26:56.949 UTC [8397] LOG:listening on IPv4 address "0.0.0.0", port 25432
2018-10-14 11:26:56.949 UTC [8397] LOG:listening on IPv6 address "::", port 25432
2018-10-14 11:26:56.952 UTC [8397] LOG:listening on Unix socket "/tmp/.s.PGSQL.25432"
2018-10-14 11:26:56.968 UTC [8398] LOG:database system was interrupted; last known up at 2018-10-14 09:26:59 UTC
2018-10-14 11:26:57.049 UTC [8398] LOG:starting point-in-time recovery to "domac-1014"
/data/pg10/archive_wals/00000002.history.lz4: No such file or directory
2018-10-14 11:26:57.052 UTC [8398] LOG:restored log file "00000002.history" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.077 UTC [8398] LOG:restored log file "000000020000000000000016" from archive
2018-10-14 11:26:57.191 UTC [8398] LOG:redo starts at 0/16000060
2018-10-14 11:26:57.193 UTC [8398] LOG:consistent recovery state reached at 0/16000130
2018-10-14 11:26:57.193 UTC [8397] LOG:database system is ready to accept read only connections
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.217 UTC [8398] LOG:restored log file "000000020000000000000017" from archive
 done
server started
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.384 UTC [8398] LOG:restored log file "000000020000000000000018" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.513 UTC [8398] LOG:restored log file "000000020000000000000019" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.699 UTC [8398] LOG:restored log file "00000002000000000000001A" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.805 UTC [8398] LOG:restored log file "00000002000000000000001B" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.982 UTC [8398] LOG:restored log file "00000002000000000000001C" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:58.116 UTC [8398] LOG:restored log file "00000002000000000000001D" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:58.310 UTC [8398] LOG:restored log file "00000002000000000000001E" from archive
2018-10-14 11:26:58.379 UTC [8398] LOG:recovery stopping at restore point "domac-1014", time 2018-10-14 11:17:20.680941+00
2018-10-14 11:26:58.379 UTC [8398] LOG:recovery has paused
2018-10-14 11:26:58.379 UTC [8398] HINT:Execute pg_wal_replay_resume() to continue.
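When recovery is run from a script, it helps to check the log for the same milestones we read above. The Python sketch below is hypothetical (summarize_recovery_log() is an invented name) and matches the message texts exactly as they appear in this PostgreSQL 10 log; other versions or a non-English lc_messages setting may word them differently. It collects the restored WAL segments (skipping .history files), the restore point where recovery stopped, and whether recovery is paused.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Message texts as printed by PostgreSQL 10 in the log above (English locale).
RESTORED = re.compile(r'restored log file "([0-9A-F]{24})" from archive')
STOPPED = re.compile(r'recovery stopping at restore point "([^"]+)"')


def summarize_recovery_log(lines: Iterable[str]) -> Tuple[List[str], Optional[str], bool]:
    """Return (restored WAL segments, restore point reached, paused?)."""
    segments: List[str] = []
    stop_point: Optional[str] = None
    paused = False
    for line in lines:
        match = RESTORED.search(line)
        if match:
            segments.append(match.group(1))  # 24-hex-digit segment names only
        match = STOPPED.search(line)
        if match:
            stop_point = match.group(1)
        if "recovery has paused" in line:
            paused = True
    return segments, stop_point, paused
```

Because the function accepts any iterable of lines, an open log file can be passed to it directly.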

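Once the server has started (it is still in recovery and accepts read-only connections), query test_restore to check whether the data was restored: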

postgres=# select * from test_restore;
 id | ival | description |         created_time
----+------+-------------+-------------------------------
  1 |    1 |             | 2018-10-14 11:13:41.57154+00
  2 |    2 |             | 2018-10-14 11:13:44.250221+00
  3 |    3 |             | 2018-10-14 11:13:46.311291+00
  4 |    4 |             | 2018-10-14 11:13:48.820479+00
(4 rows)

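The data has been restored to the specified restore point, domac-1014.

Because recovery_target_action defaults to pause, recovery is now paused at the restore point, as the HINT in the log says. recovery.conf must not stay in the data directory, otherwise the next restart will recover the data to this restore point again. Ending recovery with SELECT pg_wal_replay_resume(); takes care of this and opens the database for writes: when recovery finishes, PostgreSQL renames recovery.conf to recovery.done.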

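Summary

Backup and recovery are among the most important tasks in database administration. In day-to-day operations, we should take backups according to a policy suited to our needs and regularly test restoring them to make sure the data is safe.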