在PG中pg_control文件类似oracle数据库的control文件(控制文件),在Oracle中如果该文件丢失/损坏,可以通过alter database create controlfile命令进行创建,对于PG数据库来说也可以通过pg_resetwal命令来实现创建,由于pg_control文件损坏,需要人工指定一些参数完成pg_resetwal相关操作
pg_resetwal 使用说明
-bash-4.2$ pg_resetwal --help
pg_resetwal resets the PostgreSQL write-ahead log.
Usage:
pg_resetwal [OPTION]... DATADIR
Options:
-c, --commit-timestamp-ids=XID,XID
set oldest and newest transactions bearing
commit timestamp (zero means no change)
[-D, --pgdata=]DATADIR data directory
-e, --epoch=XIDEPOCH set next transaction ID epoch
-f, --force force update to be done
-l, --next-wal-file=WALFILE set minimum starting location for new WAL
-m, --multixact-ids=MXID,MXID set next and oldest multitransaction ID
-n, --dry-run no update, just show what would be done
-o, --next-oid=OID set next OID
-O, --multixact-offset=OFFSET set next multitransaction offset
-u, --oldest-transaction-id=XID set oldest transaction ID
-V, --version output version information, then exit
-x, --next-transaction-id=XID set next transaction ID
--wal-segsize=SIZE size of WAL segments, in megabytes
-?, --help show this help, then exit
Report bugs to <pgsql-bugs@lists.postgresql.org>.
PostgreSQL home page: <https://www.postgresql.org/>
确认现在业务表记录情况
-bash-4.2$ psql
psql (14.3)
Type "help" for help.
postgres=# select count(1) from ac_event;
count
--------
246266
(1 row)
模拟pg_control文件异常
-bash-4.2$ ps -ef|grep postgres
postgres 37178 1 0 09:58 ? 00:00:00 /usr/pgsql-14/bin/postgres -D /var/lib/pgsql/14/data
postgres 37179 37178 0 09:58 ? 00:00:00 postgres: logger
postgres 37181 37178 0 09:58 ? 00:00:00 postgres: checkpointer
postgres 37182 37178 0 09:58 ? 00:00:00 postgres: background writer
postgres 37183 37178 0 09:58 ? 00:00:00 postgres: walwriter
postgres 37184 37178 0 09:58 ? 00:00:00 postgres: autovacuum launcher
postgres 37185 37178 0 09:58 ? 00:00:00 postgres: stats collector
postgres 37186 37178 0 09:58 ? 00:00:00 postgres: logical replication launcher
root 41368 41314 0 11:06 pts/1 00:00:00 su - postgres
postgres 41369 41368 0 11:06 pts/1 00:00:00 -bash
postgres 45071 41369 0 12:07 pts/1 00:00:00 ps -ef
postgres 45072 41369 0 12:07 pts/1 00:00:00 grep --color=auto postgres
-bash-4.2$ kill -9 37178
-bash-4.2$ ps -ef|grep postgres
root 41368 41314 0 11:06 pts/1 00:00:00 su - postgres
postgres 41369 41368 0 11:06 pts/1 00:00:00 -bash
postgres 45095 41369 0 12:08 pts/1 00:00:00 ps -ef
postgres 45096 41369 0 12:08 pts/1 00:00:00 grep --color=auto postgres
-bash-4.2$ pwd
/var/lib/pgsql/14/data/global
-bash-4.2$ ls -l pg_control
-rw-------. 1 postgres postgres 8192 May 30 12:04 pg_control
-bash-4.2$ rm -rf pg_control
-bash-4.2$ ls -l pg_control
ls: cannot access pg_control: No such file or directory
PG启动失败
-bash-4.2$ pg_ctl start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....postgres: could not find the database system
Expected to find it in the directory "/var/lib/pgsql/14/data",
but could not open file "/var/lib/pgsql/14/data/global/pg_control": No such file or directory
stopped waiting
pg_ctl: could not start server
Examine the log output.
创建空pg_control文件启动依旧失败
-bash-4.2$ touch /var/lib/pgsql/14/data/global/pg_control
-bash-4.2$ pg_ctl start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2022-05-30 12:09:43.953 CST [45215] PANIC:
could not read file "global/pg_control": read 0 of 296
stopped waiting
pg_ctl: could not start server
Examine the log output.
设置next-wal-file
-l, –next-wal-file=WALFILE,这个参数设置下一个新的WAL文件的最小值,这个值可以从$PGDATA/pg_wal目录下去看最后一个WAL 文件,这个文件的id+1即可
-bash-4.2$ pwd
/var/lib/pgsql/14/data/pg_wal
-bash-4.2$ ls -l
total 16384
-rw-------. 1 postgres postgres 16777216 May 30 12:04 000000010000000000000014
drwx------. 2 postgres postgres 6 May 24 02:20 archive_status
-bash-4.2$
这个文件+1,-l 000000010000000000000015
设置next-transaction
-x, –next-transaction-id=XID,这个参数设置pg_control中的下一个XID的值,这个值可以从pg_xact目录下的文件中查询
-bash-4.2$ pwd
/var/lib/pgsql/14/data/pg_xact
-bash-4.2$ ls -ltr
total 8
-rw-------. 1 postgres postgres 8192 May 30 12:03 0000
最后一个是0000,那么下一个XID就是0001,然后乘以 1048576 (0×100000),实际上后面直接加5个0就行了。注意,这个值是16进制的。-x 0×000100000
multixact-ids设置
-m, –multixact-ids=MXID1,MXID2,这个参数包含两个部分,MXID1和MXID2,都可以从$PGDATA/pg_multixact/offsets目录下获得。MXID1的值,首先找到最大值,+1,再乘以 65536 (0×10000,相当于后面加4个0)作为这个参数的前半部分。找到最小的值,后面加4个0,作为MXID2的值
-bash-4.2$ pwd
/var/lib/pgsql/14/data/pg_multixact/offsets
-bash-4.2$ ls -ltr
total 8
-rw-------. 1 postgres postgres 8192 May 29 22:06 0000
-bash-4.2$
-m 0×00010000, 0×00000000(由于oldest multitransaction ID不能为0,因此后续这个值需要适当调整)
multixact-offset设置
-O, –multixact-offset=OFFSET,这个参数可以从$PGDATA/pg_multixact/members目录下获得。找到最大值,+1,乘以 52352 (0xCC80)
-bash-4.2$ pwd
/var/lib/pgsql/14/data/pg_multixact/members
-bash-4.2$ ls -ltr
total 8
-rw-------. 1 postgres postgres 8192 May 24 02:20 0000
-O 0xCC80
尝试执行pg_resetwal
-bash-4.2$ pg_resetwal -l 000000010000000000000015 -x 0x000100000 -m 0x00010000,0x00000000 -O 0xCC80 $PGDATA
pg_resetwal: error: oldest multitransaction ID (-m) must not be 0
multixact-ids值不对,进行调整后处理
postmaster.pid文件需要清理
由于PG库异常关闭,需要人工清理掉该文件
-bash-4.2$ pg_resetwal -l 000000010000000000000015 -x 0x000100000 -m 0x00020000,0x00010000 -O 0xCC80 $PGDATA
pg_resetwal: error: lock file "postmaster.pid" exists
-bash-4.2$ rm -rf postmaster.pid
pg_resetwal结果预览
-bash-4.2$ pg_resetwal -l 000000010000000000000015 -x 0x000100000 -m 0x00020000,0x00010000 -O 0xCC80 $PGDATA
pg_resetwal: warning: pg_control exists but is broken or wrong version; ignoring it
Guessed pg_control values:
pg_control version number: 1300
Catalog version number: 202107181
Database system identifier: 7103392535324046312
Latest checkpoint's TimeLineID: 1
Latest checkpoint's full_page_writes: off
Latest checkpoint's NextXID: 0:3
Latest checkpoint's NextOID: 12000
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 3
Latest checkpoint's oldestXID's DB: 0
Latest checkpoint's oldestActiveXID: 0
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 0
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 0
Values to be changed:
First log segment after reset: 000000010000000000000015
NextMultiXactId: 131072
OldestMultiXid: 65536
OldestMulti's DB: 0
NextMultiOffset: 52352
NextXID: 1048576
OldestXID: 3
OldestXID's DB: 0
If these values seem acceptable, use -f to force reset.
pg_resetwal进行创建pg_control并启动PG
-bash-4.2$ pg_resetwal -l 000000010000000000000015 -x 0x000100000 -m 0x00020000,0x00010000 -O 0xCC80 -f $PGDATA
pg_resetwal: warning: pg_control exists but is broken or wrong version; ignoring it
Write-ahead log reset
-bash-4.2$ pg_ctl start
waiting for server to start....2022-05-30 13:33:28.266 CST [51437] LOG:
redirecting log output to logging collector process
2022-05-30 13:33:28.266 CST [51437] HINT: Future log output will appear in directory "log".
done
server started
验证数据
-bash-4.2$ psql
psql (14.3)
Type "help" for help.
postgres=# select count(1) from ac_event;
count
--------
245275
(1 row)
这种方法恢复之后,建议理解dump数据,然后导入到新库中