ORA-00257: archiver error的另外一种原因

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-00257: archiver error的另外一种原因

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

今天遇到一个相对特殊的案例,拿出来和大家分享,数据库报错为不能归档ORA-00257

SQL> conn system/xxxxxx
ERROR:
ORA-00257: archiver error. Connect internal only, until freed.

SQL> archive log list;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            /oracle/PRD/oraarch
Oldest online log sequence     11479
Current log sequence           11481

alert日志报错

sapprddb1:oraprd 73> tail -f alert_prd.log
Master archival failure: 19502
Master archival failure: 19502
Master archival failure: 19502
Master archival failure: 19502
Wed Oct 12 09:27:41 2022
Master archival failure: 19502
Master archival failure: 19502
Master archival failure: 19502
Wed Oct 12 09:27:41 2022
Master archival failure: 19502

该错误的含义

sapprddb1:oraprd 74>oerr ora 19502
19502, 00000, "write error on file \"%s\", block number %s (block size=%s)"
// *Cause:  write error on output file
// *Action: check the file

从报错初步看是由于归档目录空间满了导致,查看发现归档空间剩余很多

sapprddb1:/oracle/PRD/sapdata1 # df -h
Filesystem                          Size  Used Avail Use% Mounted on
/dev/sda1                            99G   25G   70G  26% /
udev                                253G  240K  253G   1% /dev
tmpfs                               426G   72K  426G   1% /dev/shm
/dev/sda3                           388G   12G  357G   4% /backup
/dev/mapper/SAPVG-oraclelv          197G   22G  165G  12% /oracle
/dev/mapper/SAPVG-mirrlogA           20G  773M   18G   5% /oracle/PRD/mirrlogA
/dev/mapper/SAPVG-mirrlogBlv         20G  773M   18G   5% /oracle/PRD/mirrlogB
/dev/mapper/SAPVG-oraarchlv         180G  2.2G  169G   2% /oracle/PRD/oraarch    <----剩余很多
/dev/mapper/SAPVG-origlogAlv         20G  894M   18G   5% /oracle/PRD/origlogA
/dev/mapper/SAPVG-origlogBlv         20G  894M   18G   5% /oracle/PRD/origlogB
/dev/mapper/SAPVG-sapdata1lv        591G  561G     0 100% /oracle/PRD/sapdata1
/dev/mapper/SAPVG-sapdata2lv        1.4T  1.4T     0 100% /oracle/PRD/sapdata2
/dev/mapper/SAPVG-sapdata3lv        788G  748G     0 100% /oracle/PRD/sapdata3
/dev/mapper/SAPVG-sapdata4lv        788G  748G     0 100% /oracle/PRD/sapdata4
sapprd:/sapmnt/PRD                   30G   12G   17G  42% /sapmnt/PRD
sapprd:/usr/sap/trans               105G 1005M   99G   1% /usr/sap/trans
/dev/mapper/VGSAP2-lvsapdata5       1.5T  340G  1.1T  25% /oracle/PRD/sapdata5
/dev/mapper/VGSAP2-lvsapdata6       1.5T  706G  696G  51% /oracle/PRD/sapdata6

尝试人工归档

SQL> alter system archive log current
  2  /
alter system archive log current
*
ERROR at line 1:
ORA-19502: write error on file "/oracle/PRD/sapdata1/cntrl/cntrlPRD.dbf", block
number 4837 (block size=16384)
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 28: No space left on device
Additional information: -1
Additional information: 442368

到这里基本上明确了,由于控制文件所在的分区磁盘使用100%,导致归档的时候无法写如记录到控制文件从而导致数据库报ORA-00257错误,对/oracle/PRD/sapdata1中的某个文件进行稍微收缩,数据库恢复正常

记录一次rman备份ORA-19502/ORA-27063错误原因分析

联系:手机/微信(+86 17813235971) QQ(107644445)

标题:记录一次rman备份ORA-19502/ORA-27063错误原因分析

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

rman备份出现ORA-19502/ORA-27063错误

RMAN> 2> 3> 4> 5> 6> 7> 8> 9> 10> 11> 12> 13> 14> 15> 16> 17> 18> 19> 20>
allocated channel: t11
channel t11: sid=824 instance=ncdb1 devtype=DISK
allocated channel: t12
channel t12: sid=838 instance=ncdb1 devtype=DISK
allocated channel: t13
channel t13: sid=809 instance=ncdb1 devtype=DISK
allocated channel: t14
channel t14: sid=886 instance=ncdb1 devtype=DISK
allocated channel: t15
channel t15: sid=620 instance=ncdb1 devtype=DISK
allocated channel: t16
channel t16: sid=599 instance=ncdb1 devtype=DISK
allocated channel: t17
channel t17: sid=482 instance=ncdb1 devtype=DISK
allocated channel: t18
channel t18: sid=506 instance=ncdb1 devtype=DISK
一共开通8个通道
channel t12: starting full datafile backupset
channel t12: specifying datafile(s) in backupset
input datafile fno=00008 name=/dev/rnc32g_39
input datafile fno=00016 name=/dev/rnc32g_47
input datafile fno=00024 name=/dev/rnc32g_57
input datafile fno=00032 name=/dev/rnc32g_25
input datafile fno=00040 name=/dev/rnc32g_33
input datafile fno=00048 name=/dev/rnc32g_3
input datafile fno=00056 name=/dev/rnc32g_11
input datafile fno=00064 name=/dev/rnc32g_19
input datafile fno=00072 name=/dev/rnc32g_67
input datafile fno=00080 name=/dev/rnc32g_106
input datafile fno=00088 name=/dev/rnc32g_114
input datafile fno=00096 name=/dev/rnc32g_87
input datafile fno=00104 name=/dev/rnc32g_95
input datafile fno=00112 name=/dev/rnc32g_103
input datafile fno=00120 name=/dev/rnc32g_75
input datafile fno=00003 name=/dev/rnc50_sysaux
input datafile fno=00130 name=/dev/rnc32g_119
channel t12: starting piece 1 at 14-MAY-12
--通道12备份数据文件
channel t17: starting full datafile backupset
channel t17: specifying datafile(s) in backupset
input datafile fno=00002 name=/dev/rnc32g_22
input datafile fno=00013 name=/dev/rnc32g_44
input datafile fno=00021 name=/dev/rnc32g_54
input datafile fno=00029 name=/dev/rnc32g_62
input datafile fno=00037 name=/dev/rnc32g_30
input datafile fno=00045 name=/dev/rnc32g_38
input datafile fno=00053 name=/dev/rnc32g_8
input datafile fno=00061 name=/dev/rnc32g_16
input datafile fno=00069 name=/dev/rnc32g_64
input datafile fno=00077 name=/dev/rncundo_33g_4
input datafile fno=00085 name=/dev/rnc32g_111
input datafile fno=00093 name=/dev/rnc32g_84
input datafile fno=00101 name=/dev/rnc32g_92
input datafile fno=00109 name=/dev/rnc32g_100
input datafile fno=00117 name=/dev/rnc32g_72
input datafile fno=00006 name=/dev/rnc50_4g_1
channel t17: starting piece 1 at 14-MAY-12
--通道17备份数据文件
channel t15: finished piece 1 at 15-MAY-12
piece handle=/rman/db_mpnb04jl_1_1 tag=TAG20120514T204954 comment=NONE
channel t15: backup set complete, elapsed time: 06:07:59
channel t11: finished piece 1 at 15-MAY-12
piece handle=/rman/db_mlnb04jk_1_1 tag=TAG20120514T204954 comment=NONE
channel t11: backup set complete, elapsed time: 06:17:25
channel t16: finished piece 1 at 15-MAY-12
piece handle=/rman/db_mqnb04jm_1_1 tag=TAG20120514T204954 comment=NONE
channel t16: backup set complete, elapsed time: 06:34:49
channel t14: finished piece 1 at 15-MAY-12
piece handle=/rman/db_monb04jl_1_1 tag=TAG20120514T204954 comment=NONE
channel t14: backup set complete, elapsed time: 06:40:05
channel t18: finished piece 1 at 15-MAY-12
piece handle=/rman/db_msnb04jn_1_1 tag=TAG20120514T204954 comment=NONE
channel t18: backup set complete, elapsed time: 06:43:38
channel t13: finished piece 1 at 15-MAY-12
piece handle=/rman/db_mnnb04jl_1_1 tag=TAG20120514T204954 comment=NONE
channel t13: backup set complete, elapsed time: 07:40:56
--这里可以看出rman的备份完成了通道11/13/14/15/16/18,也就是说目前为止通道12/17未完成.
RMAN-03009: failure of backup command on t12 channel at 05/15/2012 04:39:58
ORA-19502: write error on file "/rman/db_mmnb04jl_1_1", blockno 30481025 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: -1
Additional information: 1048576
ORA-19502: write error on file "/rman/db_mmnb04jl_1_1", blockno 30480897 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
channel t12 disabled, job failed on it will be run on another channel
--通道12报错(硬盘空间不足)
channel t11: starting full datafile backupset
channel t11: specifying datafile(s) in backupset
input datafile fno=00008 name=/dev/rnc32g_39
input datafile fno=00016 name=/dev/rnc32g_47
input datafile fno=00024 name=/dev/rnc32g_57
input datafile fno=00032 name=/dev/rnc32g_25
input datafile fno=00040 name=/dev/rnc32g_33
input datafile fno=00048 name=/dev/rnc32g_3
input datafile fno=00056 name=/dev/rnc32g_11
input datafile fno=00064 name=/dev/rnc32g_19
input datafile fno=00072 name=/dev/rnc32g_67
input datafile fno=00080 name=/dev/rnc32g_106
input datafile fno=00088 name=/dev/rnc32g_114
input datafile fno=00096 name=/dev/rnc32g_87
input datafile fno=00104 name=/dev/rnc32g_95
input datafile fno=00112 name=/dev/rnc32g_103
input datafile fno=00120 name=/dev/rnc32g_75
input datafile fno=00003 name=/dev/rnc50_sysaux
input datafile fno=00130 name=/dev/rnc32g_119
channel t11: starting piece 1 at 15-MAY-12
--在通道12报错后,通道11已经完成了上次备份,所以启动备份通道12出错的数据文件
RMAN-03009: failure of backup command on t17 channel at 05/15/2012 04:39:58
ORA-19502: write error on file "/rman/db_mrnb04jm_1_1", blockno 30753793 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: -1
Additional information: 1048576
ORA-19502: write error on file "/rman/db_mrnb04jm_1_1", blockno 30753665 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
channel t17 disabled, job failed on it will be run on another channel
--通道17也因为磁盘空间报错
channel t13: starting full datafile backupset
channel t13: specifying datafile(s) in backupset
input datafile fno=00002 name=/dev/rnc32g_22
input datafile fno=00013 name=/dev/rnc32g_44
input datafile fno=00021 name=/dev/rnc32g_54
input datafile fno=00029 name=/dev/rnc32g_62
input datafile fno=00037 name=/dev/rnc32g_30
input datafile fno=00045 name=/dev/rnc32g_38
input datafile fno=00053 name=/dev/rnc32g_8
input datafile fno=00061 name=/dev/rnc32g_16
input datafile fno=00069 name=/dev/rnc32g_64
input datafile fno=00077 name=/dev/rncundo_33g_4
input datafile fno=00085 name=/dev/rnc32g_111
input datafile fno=00093 name=/dev/rnc32g_84
input datafile fno=00101 name=/dev/rnc32g_92
input datafile fno=00109 name=/dev/rnc32g_100
input datafile fno=00117 name=/dev/rnc32g_72
input datafile fno=00006 name=/dev/rnc50_4g_1
channel t13: starting piece 1 at 15-MAY-12
--通道13也尝试备份通道17失败的数据文件
RMAN-03009: failure of backup command on t11 channel at 05/15/2012 04:39:59
ORA-19504: failed to create file "/rman/db_mtnb104u_1_1"
ORA-27044: unable to write the header block of file
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: 3
Addition
--因为当前没有空闲空间,通道11终止,
--这个时候rman异常终止,导致后续的通道13终止记录未打印到日志

阅读完rman日志,很好理解因为存放rman备份的磁盘空间不足导致了一系列错误

检查磁盘剩余空间

Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4          10.00      9.75    3%     6548     1% /
/dev/hd2          10.00      4.55   55%    84383     8% /usr
/dev/hd9var        5.00      4.04   20%     6290     1% /var
/dev/hd3           5.00      3.87   23%     1551     1% /tmp
/dev/hd1          10.00      9.91    1%      382     1% /home
/proc                 -         -    -         -     -  /proc
/dev/hd10opt       5.00      4.89    3%     3502     1% /opt
/dev/archalv      99.00     82.98   17%       96     1% /archa
/dev/fslv01       40.00     19.49   52%    72324     2% /ora10
/dev/fslv00     1800.00    467.25   75%       10     1% /rman

这下让人迷糊了,磁盘空间还剩余467.25G,怎么会报错呢?

分析原因

RMAN-03009: failure of backup command on t12 channel at 05/15/2012 04:39:58
ORA-19502: write error on file "/rman/db_mmnb04jl_1_1", blockno 30481025 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: -1
Additional information: 1048576
ORA-19502: write error on file "/rman/db_mmnb04jl_1_1", blockno 30480897 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
channel t12 disabled, job failed on it will be run on another channel
RMAN-03009: failure of backup command on t17 channel at 05/15/2012 04:39:58
ORA-19502: write error on file "/rman/db_mrnb04jm_1_1", blockno 30753793 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: -1
Additional information: 1048576
ORA-19502: write error on file "/rman/db_mrnb04jm_1_1", blockno 30753665 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
channel t17 disabled, job failed on it will be run on another channel

这两个通道在写入rman备份到磁盘中的时候,在05/15/2012 04:39:58发现磁盘空间不足,两个通道分别准备写入30480897/30753665号块的时候出错,那么当时这两个通道分别写入的数据块数为30480896/30753664,写入文件大小为(30480896+30753664)*8192/1024/1024/1024=467.1826171875G.这里可以看出磁盘剩余空间467.25G,其实当时已经写入了467.1826171875G,继续写入的时候出错.然后rman为了保证备份的正确性,自动删除了当时已经备份的467.1826171875G错误的备份文件.从而在备份结束后看到磁盘空间还有大量剩余而rman包空间不足的现象.