ALERT: Disable Transparent HugePages on SLES11, RHEL6, OEL6 and UEK2 Kernels

随着去ioe的潮流越来越汹涌,随着内存越来越便宜,使用Liunx大内存跑核心业务的系统越来越多。因为大内存,多会话,使得配置HugePages也成了必然的选择,在以前的文章中介绍过怎么去配置HugePages(Liunx系统中Oracle使用HugePages配置)以及因为没有配置Hugepage导致的故障(因未配置Hugepage会话数添增悲剧案例).但是随着Linux 6(RedHat 6, OEL 6, SLES 11 and UEK2 kernels等)的广泛应用,他们所在HugePages方面所具有的新特性(Transparent HugePages)很多人可能还没有注意到,Oracle强烈建议无论是rac还是单实例都关闭该特性.
检查是否启用Transparent HugePages方法

To check if the Transparent HugePages are enabled in your server execute the following:
Default/Enabled setting is  [always]:
# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] never
 Disabled setting is [never]:
# cat /sys/kernel/mm/transparent_hugepage/enabled
always [never]
If "enabled" is NOT set to "[never]", the Transparent HugePages are being used.
You can also issue:
# grep AnonHugePages /proc/meminfo
If the output contains a line like "AnonHugepages: xxxx kB", with a value > 0kB the kernel is using Transparent HugePages.

禁用Transparent HugePages方法

Add the following to the kernel boot line in /etc/grub.conf (this is the preferred method) and reboot the server:
transparent_hugepage=never
Once modified the line will read similar to the following example:
title Oracle Linux Server (2.6.32-300.25.1.el6uek.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.25.1.el6uek.x86_64 ro root=LABEL=/ transparent_hugepage=never
        initrd /initramfs-2.6.32-300.25.1.el6uek.x86_64.img
OR
Add the following lines in /etc/rc.local and reboot the server:
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
   echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
   echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi

具体见文档:ALERT Disable Transparent HugePages on SLES11 RHEL6 OEL6 and UEK2 Kernels(Doc ID 1557478.1)

oerr ora code本质探讨

oerr ora code 命令

[oracle@q9-gg02 ~]$ oerr ora 10047
10047, 00000, "trace switching of sessions"
// *Cause:
// *Action:

unix/linux系统下$ORACLE_HOME/rdbms/mesg/oraus.msg包含ora-记录

[oracle@q9-gg02 ~]$ vi $ORACLE_HOME/rdbms/mesg/oraus.msg
…………
10047, 00000, "trace switching of sessions"
// *Cause:
// *Action:
10048, 00000, "Undo segment shrink"
// *Cause:
// *Action:
10049, 00000, "protect library cache memory heaps"
// *Cause:
// *Action: Use the OS memory protection (if available) to protect library
//          cache memory heaps that are pinned.

oerr ora code本质

[oracle@q9-gg02 ~]$ strace oerr ora 10047
execve("/u01/oracle/app/product/10.2/db_1/bin/oerr", ["oerr", "ora", "10047"], [/* 34 vars */]) = 0
brk(0)                                  = 0x1c239000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60db3fe000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60db3ff000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/u01/oracle/app/product/10.2/db_1/lib/tls/x86_64/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/lib/tls/x86_64", 0x7fff7a3cb830) = -1 ENOENT (No such file or directory)
open("/u01/oracle/app/product/10.2/db_1/lib/tls/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/lib/tls", 0x7fff7a3cb830) = -1 ENOENT (No such file or directory)
open("/u01/oracle/app/product/10.2/db_1/lib/x86_64/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/lib/x86_64", 0x7fff7a3cb830) = -1 ENOENT (No such file or directory)
open("/u01/oracle/app/product/10.2/db_1/lib/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/lib", {st_mode=S_IFDIR|0750, st_size=12288, ...}) = 0
open("tls/x86_64/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/libtermcap.so.2", O_RDONLY)   = -1 ENOENT (No such file or directory)
open("x86_64/libtermcap.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("libtermcap.so.2", O_RDONLY)       = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=154281, ...}) = 0
mmap(NULL, 154281, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b60db400000
close(3)                                = 0
open("/lib64/libtermcap.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\20 \214>\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=15840, ...}) = 0
mmap(0x3e8c200000, 2108944, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3e8c200000
mprotect(0x3e8c203000, 2093056, PROT_NONE) = 0
mmap(0x3e8c402000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x3e8c402000
close(3)                                = 0
open("/u01/oracle/app/product/10.2/db_1/lib/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/x86_64/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/libdl.so.2", O_RDONLY)        = -1 ENOENT (No such file or directory)
open("x86_64/libdl.so.2", O_RDONLY)     = -1 ENOENT (No such file or directory)
open("libdl.so.2", O_RDONLY)            = -1 ENOENT (No such file or directory)
open("/lib64/libdl.so.2", O_RDONLY)     = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\16`\214>\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=23360, ...}) = 0
mmap(0x3e8c600000, 2109696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3e8c600000
mprotect(0x3e8c602000, 2097152, PROT_NONE) = 0
mmap(0x3e8c802000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x3e8c802000
close(3)                                = 0
open("/u01/oracle/app/product/10.2/db_1/lib/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/x86_64/libc.so.6", O_RDONLY)  = -1 ENOENT (No such file or directory)
open("tls/libc.so.6", O_RDONLY)         = -1 ENOENT (No such file or directory)
open("x86_64/libc.so.6", O_RDONLY)      = -1 ENOENT (No such file or directory)
open("libc.so.6", O_RDONLY)             = -1 ENOENT (No such file or directory)
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332\341\213>\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1722328, ...}) = 0
mmap(0x3e8be00000, 3502424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3e8be00000
mprotect(0x3e8bf4f000, 2093056, PROT_NONE) = 0
mmap(0x3e8c14e000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14e000) = 0x3e8c14e000
mmap(0x3e8c153000, 16728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3e8c153000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60db426000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60db427000
arch_prctl(ARCH_SET_FS, 0x2b60db426fe0) = 0
mprotect(0x3e8c802000, 4096, PROT_READ) = 0
mprotect(0x3e8c14e000, 16384, PROT_READ) = 0
mprotect(0x3e8bc1c000, 4096, PROT_READ) = 0
munmap(0x2b60db400000, 154281)          = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/dev/tty", O_RDWR|O_NONBLOCK)     = 3
close(3)                                = 0
brk(0)                                  = 0x1c239000
brk(0x1c25a000)                         = 0x1c25a000
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=56458208, ...}) = 0
mmap(NULL, 56458208, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b60db428000
close(3)                                = 0
getuid()                                = 1000
getgid()                                = 1000
geteuid()                               = 1000
getegid()                               = 1000
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/proc/meminfo", O_RDONLY)         = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b60dea00000
read(3, "MemTotal:     132051308 kB\nMemFr"..., 1024) = 781
close(3)                                = 0
munmap(0x2b60dea00000, 4096)            = 0
rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGQUIT, {0x1, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
uname({sys="Linux", node="q9-gg02.800best.com", ...}) = 0
stat("/home/oracle", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
stat(".", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
getpid()                                = 25928
open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=25464, ...}) = 0
mmap(NULL, 25464, PROT_READ, MAP_SHARED, 3, 0) = 0x2b60dea00000
close(3)                                = 0
getppid()                               = 25927
getpgrp()                               = 25927
rt_sigaction(SIGCHLD, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
open("/u01/oracle/app/product/10.2/db_1/bin/oerr", O_RDONLY) = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff7a3cc060) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "#!/bin/sh\n#\n# $Id: oerr 28-aug-2"..., 80) = 80
lseek(3, 0, SEEK_SET)                   = 0
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=64*1024}) = 0
dup2(3, 255)                            = 255
close(3)                                = 0
fcntl(255, F_SETFD, FD_CLOEXEC)         = 0
fcntl(255, F_GETFL)                     = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fstat(255, {st_mode=S_IFREG|0755, st_size=2365, ...}) = 0
lseek(255, 0, SEEK_CUR)                 = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "#!/bin/sh\n#\n# $Id: oerr 28-aug-2"..., 2365) = 2365
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
pipe([3, 4])                            = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
lseek(255, -973, SEEK_CUR)              = 1392
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b60db427070) = 25929
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGCHLD, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
close(4)                                = 0
read(3, "ora:rdbms:*:\n", 128)          = 13
read(3, "", 128)                        = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25929
wait4(-1, 0x7fff7a3cb804, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0)                         = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "if [ $? -ne 0 ]\nthen\n\techo \"oerr"..., 2365) = 973
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
pipe([3, 4])                            = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
lseek(255, -645, SEEK_CUR)              = 1720
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b60db427070) = 25931
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGCHLD, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, {0x436360, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
close(4)                                = 0
read(3, "Facility=ora\nComponent=rdbms\n", 128) = 29
read(3, "", 128)                        = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 25931
wait4(-1, 0x7fff7a3cb904, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0)                         = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "if [ -z \"$Facility\" -o -z \"$Comp"..., 2365) = 645
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
--检查oraus.msg文件是否存在并且有权限访问
stat("/u01/oracle/app/product/10.2/db_1/rdbms/mesg/oraus.msg", {st_mode=S_IFREG|0644, st_size=3789794, ...}) = 0
geteuid()                               = 1000
getegid()                               = 1000
getuid()                                = 1000
getgid()                                = 1000
--访问oraus.msg文件
access("/u01/oracle/app/product/10.2/db_1/rdbms/mesg/oraus.msg", R_OK) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
stat(".", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
--查找awk命令位置
stat("/u01/oracle/app/product/10.2/db_1/jre/1.4.2/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
stat("/u01/oracle/app/product/10.2/db_1/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/qt-3.3/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
stat("/usr/kerberos/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/awk", 0x7fff7a3cbe80) = -1 ENOENT (No such file or directory)
--找到awk命令
stat("/bin/awk", {st_mode=S_IFREG|0755, st_size=338744, ...}) = 0
access("/bin/awk", X_OK)                = 0
access("/bin/awk", R_OK)                = 0
stat("/bin/awk", {st_mode=S_IFREG|0755, st_size=338744, ...}) = 0
access("/bin/awk", X_OK)                = 0
access("/bin/awk", R_OK)                = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [INT CHLD], 8) = 0
rt_sigprocmask(SIG_SETMASK, [INT CHLD], NULL, 8) = 0
lseek(255, -8, SEEK_CUR)                = 2357
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b60db427070) = 25934
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
--通过awk命令获取oraus.msg中相关命令
wait4(-1, 10047, 00000, "trace switching of sessions"
// *Cause:
// *Action:
[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 25934
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, 0x7fff7a3cbb54, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0xffffffffffffffff)        = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3e8be302d0}, {0x436f40, [], SA_RESTORER, 0x3e8be302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "\nexit 0\n", 2365)           = 8
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
exit_group(0)                           = ?

通过这里可以知道oerr其实就是通过awk对$ORACLE_HOME/rdbms/mesg/oraus.msg文件进行筛选,显示出来对应ORA-错误号的提示

记录一次数据库异常导致ipc未释放案例

很多时候,linux/unix中的ipc未正常释放,表现的现象是再次启动实例的时候表现出来内存不足等类似现象,但是今天在数据库启动过程中因为数据库后台报ora-600[6002]错误,导致数据库启动失败,但是后续的一些列奇怪现象让我吃惊:
1.启动数据库异常的session一直处于卡主状态,任何办法无法终止
2.开打新会话,登录数据库提示进行操作提示not logging,但是abort会出现和1中一样的hang住
数据库open过程卡主 会话无法退出,crtl+c依然无效

SQL> startup
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
Database mounted.
^C
^C^C^C^C^C

alert报错

Tue Aug 27 20:33:57 2013
SMON: enabling cache recovery
Archived Log entry 29 added for thread 1 sequence 29 ID 0xfb920b77 dest 1:
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_4992.trc  (incident=201905):
ORA-00600: internal error code, arguments: [6002], [6], [6], [1], [0], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/ora11g/ora11g/incident/incdir_201905/ora11g_ora_4992_i201905.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Undo initialization errored: err:600 serial:0 start:1524754 end:1528104 diff:3350 (33 seconds)
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_4992.trc:
ORA-00600: internal error code, arguments: [6002], [6], [6], [1], [0], [], [], [], [], [], [], []
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_4992.trc:
ORA-00600: internal error code, arguments: [6002], [6], [6], [1], [0], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 4992): terminating the instance due to error 600
Instance terminated by USER, pid = 4992
ORA-1092 signalled during: alter database Open...

新开启会话登录

[oracle@xifenfei ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Tue Aug 27 20:27:38 2013
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected.
SQL> startup mount
ORA-01012: not logged on
SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup mount
ORA-01012: not logged on

alert日志提示

Tue Aug 27 20:53:18 2013
ORA-1092 : opitsk aborting process
Tue Aug 27 20:53:44 2013
ORA-1092 : opitsk aborting process
Tue Aug 27 20:54:21 2013

shutdown abort 依然卡主

SQL> shutdown abort
^C
^C^C

查看ipc
根据alert日志提示,数据库已经异常关闭了,但是sqlplus中显示异常,怀疑是ipc未被正常释放

[oracle@xifenfei ~]$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 2228224    oracle    640        8388608    1
0x00000000 2260993    oracle    640        167772160  1
0xee5c29c4 2293762    oracle    640        2097152    1
[oracle@xifenfei ~]$ ps -ef|grep pmon
oracle    4843  4785  0 20:29 pts/3    00:00:00 grep pmon

释放ipc

[oracle@xifenfei ~]$ ipcrm -m 2260993
[oracle@xifenfei ~]$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 2228224    oracle    640        8388608    1
0x00000000 2260993    oracle    640        167772160  1          dest
0xee5c29c4 2293762    oracle    640        2097152    1
[oracle@xifenfei ~]$ ipcrm -m 2228224
[oracle@xifenfei ~]$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 2228224    oracle    640        8388608    1          dest
0x00000000 2260993    oracle    640        167772160  1          dest
0xee5c29c4 2293762    oracle    640        2097152    1
[oracle@xifenfei ~]$ ipcrm -m 2293762
[oracle@xifenfei ~]$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 2228224    oracle    640        8388608    1          dest
0x00000000 2260993    oracle    640        167772160  1          dest
0x00000000 2293762    oracle    640        2097152    1          dest

重新启动数据库

[oracle@xifenfei ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Tue Aug 27 20:33:35 2013
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup mount;
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
Database mounted.

再次查看ipc

[oracle@xifenfei ~]$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 2228224    oracle    640        8388608    1          dest
0x00000000 2260993    oracle    640        167772160  1          dest
0x00000000 2293762    oracle    640        2097152    1          dest
0x00000000 2490374    oracle    640        8388608    29
0x00000000 2523143    oracle    640        167772160  29
0xee5c29c4 2555912    oracle    640        2097152    29

模拟基表事务未提交数据库crash,undo丢失恢复异常恢复

对于基表特别是bootstrap$中的表,如果进行了dml操作,但是没有提交,这个时候数据库crash,而且undo异常.对于类似的场景,都是很多数据库恢复的人都有畏惧,因为_corrupted_rollback_segments参数不能起作用.本实验dml修改seq$表,最终却引起了undo$和I_UNDO1异常(他们都在bootstrap$中)
模拟seq$表事务未提交

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> select sysdate "www.xifenfei.com" from dual;
www.xifenfei
------------
31-AUG-13
SQL> CREATE SEQUENCE seq_xifenfei INCREMENT BY 1 START WITH 1 CACHE 3;
Sequence created.
SQL> select object_id from dba_objects where object_name='SEQ_XIFENFEI';
 OBJECT_ID
----------
     77282
SQL> SELECT MINVALUE,HIGHWATER,INCREMENT$, CYCLE#,ORDER$,CACHE from seq$ where obj#=77282;
  MINVALUE  HIGHWATER INCREMENT$     CYCLE#     ORDER$      CACHE
---------- ---------- ---------- ---------- ---------- ----------
         1          1          1          0          0          3
SQL> update seq$ set  CYCLE#=3 where obj#=77282;
1 row updated.
--另外一个会话abort数据库
SQL> shutdown abort;
ORACLE instance shut down.
--人工rm掉undotbs01.dbf文件
--启动数据库
SQL> startup
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 3 - see DBWR trace file
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
--离线数据文件
SQL> alter database datafile 3 offline;
Database altered.
--尝试open数据库
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Process ID: 17864
Session ID: 125 Serial number: 5

alert日志

SMON: enabling cache recovery
Mon Aug 26 17:08:49 2013
ARC3 started with pid=25, OS id=17902
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Archived Log entry 13 added for thread 1 sequence 13 ID 0xfb920b77 dest 1:
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_17864.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_17864.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Error 704 happened during db open, shutting down database
USER (ospid: 17864): terminating the instance due to error 704
Instance terminated by USER, pid = 17864
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (17864) as a result of ORA-1092

做10046跟踪

SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup nomount;
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
SQL> alter database mount;
Database altered.
SQL> alter session set events  '10046 trace name context forever,level 12';
Session altered.
SQL> alter session set db_file_multiblocK_read_count=1;
Session altered.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Process ID: 17979
Session ID: 125 Serial number: 3

trace文件内容

PARSING IN CURSOR #3064524800 len=142 dep=3 uid=0 oct=3 lid=0 tim=1377508386082495 hv=361892850 ad='265c66d0' sqlid='7bd391hat42zk'
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
END OF STMT
PARSE #3064524800:c=2999,e=2162,p=0,cr=0,cu=0,mis=1,r=0,dep=3,og=3,plh=0,tim=1377508386082492
BINDS #3064524800:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=b6a8ebd0  bln=22  avl=02  flg=05
  value=27
EXEC #3064524800:c=1999,e=2030,p=0,cr=0,cu=0,mis=1,r=0,dep=3,og=3,plh=906473769,tim=1377508386084833
WAIT #3064524800: nam='db file sequential read' ela= 65 file#=1 block#=321 blocks=1 obj#=34 tim=1377508386085083
WAIT #3064524800: nam='db file sequential read' ela= 21 file#=1 block#=225 blocks=1 obj#=15 tim=1377508386085208
FETCH #3064524800:c=0,e=375,p=2,cr=2,cu=0,mis=0,r=1,dep=3,og=3,plh=906473769,tim=1377508386085319
STAT #3064524800 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=2 pw=0 time=362 us)'
STAT #3064524800 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=1 pw=0 time=223 us)'
CLOSE #3064524800:c=0,e=12,dep=3,type=0,tim=1377508386085657
WAIT #3064511164: nam='control file sequential read' ela= 33 file#=0 block#=1 blocks=1 obj#=15 tim=1377508386085816
WAIT #3064511164: nam='control file sequential read' ela= 18 file#=0 block#=15 blocks=1 obj#=15 tim=1377508386085889
WAIT #3064511164: nam='control file sequential read' ela= 18 file#=0 block#=17 blocks=1 obj#=15 tim=1377508386085994
WAIT #3064511164: nam='control file sequential read' ela= 18 file#=0 block#=23 blocks=1 obj#=15 tim=1377508386086058
WAIT #3064511164: nam='control file sequential read' ela= 17 file#=0 block#=1 blocks=1 obj#=15 tim=1377508386086135
WAIT #3064511164: nam='control file sequential read' ela= 15 file#=0 block#=15 blocks=1 obj#=15 tim=1377508386086196
WAIT #3064511164: nam='control file sequential read' ela= 15 file#=0 block#=17 blocks=1 obj#=15 tim=1377508386086255
WAIT #3064511164: nam='control file sequential read' ela= 18 file#=0 block#=32 blocks=1 obj#=15 tim=1377508386086317
DDE rules only execution for: ORA 1110
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
----- START DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (Async) -----
Successfully dispatched
----- END DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (SUCCESS, 0 csec) -----
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
FETCH #3064511164:c=8998,e=9239,p=5,cr=6,cu=0,mis=0,r=0,dep=2,og=4,plh=2853959010,tim=1377508386088336
STAT #3064511164 id=1 cnt=0 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=0 pr=0 pw=0 time=82 us)'
STAT #3064511164 id=2 cnt=1 pid=1 pos=1 obj=37 op='INDEX RANGE SCAN I_OBJ2 (cr=3 pr=3 pw=0 time=1166 us)'
=====================
PARSE ERROR #3064512008:len=60 dep=1 uid=0 oct=3 lid=0 tim=1377508386088729 err=604
SELECT NULL FROM PROPS$ WHERE NAME='BOOTSTRAP_UPGRADE_ERROR'
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'

_corrupted_rollback_segments无法打开数据库

SQL> show parameter _corrupted_rollback_segments;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
_corrupted_rollback_segments         string      _SYSSMU22_1366820418$, _SYSSMU
                                                 23_2165404374$, _SYSSMU24_1842
                                                 376483$, _SYSSMU25_2373003110$
                                                 , _SYSSMU26_3635237253$, _SYSS
                                                 MU27_1730246918$, _SYSSMU28_31
                                                 59173804$, _SYSSMU29_141006113
                                                 4$, _SYSSMU30_2299069988$, _SY
                                                 SSMU31_290584066$
SQL> startup pfile='/tmp/pfile' mount
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
Database mounted.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Process ID: 17979
Session ID: 125 Serial number: 3

dump block225

Block header dump:  0x004000e1
 Object id on Block? Y
 seg/obj: 0xf  csc: 0xb8c.3c0c297b  itc: 1  flg: O  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 Itl           Xid                  Uba        Flag  Lck        Scn/Fsc
0x01   0x0000.001.00000034  0x00400221.0034.08  --U-    1  fsc 0x0000.3c0c297c
bdba: 0x004000e1
data_block_dump,data header at 0xb6b8a644

bbed 提交225事务

[oracle@xifenfei ~]$ bbed password=blockedit filename='/u01/oracle/oradata/ora11g/system01.dbf' blocksize=8192
BBED: Release 2.0.0.0.0 - Limited Production on Mon Aug 26 17:31:09 2013
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
************* !!! For Oracle Internal Use only !!! ***************
BBED> set block 225
        BLOCK#          225
BBED> map
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 225                                   Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Table/Cluster)
 struct kcbh, 20 bytes                      @0
 struct ktbbh, 48 bytes                     @20
 struct kdbh, 14 bytes                      @68
 struct kdbt[1], 4 bytes                    @82
 sb2 kdbr[32]                               @86
 ub1 freespace[59]                          @150
 ub1 rowdata[7979]                          @209
 ub4 tailchk                                @8188
BBED> p ktbbh
struct ktbbh, 48 bytes                      @20
   ub1 ktbbhtyp                             @20       0x01 (KDDBTDATA)
   union ktbbhsid, 4 bytes                  @24
      ub4 ktbbhsg1                          @24       0x0000000f
      ub4 ktbbhod1                          @24       0x0000000f
   struct ktbbhcsc, 8 bytes                 @28
      ub4 kscnbas                           @28       0x3c0c297b
      ub2 kscnwrp                           @32       0x0b8c
   b2 ktbbhict                              @36      -2047
   ub1 ktbbhflg                             @38       0x03 (KTBFONFL)
   ub1 ktbbhfsl                             @39       0x00
   ub4 ktbbhfnx                             @40       0x00000000
   struct ktbbhitl[0], 24 bytes             @44
      struct ktbitxid, 8 bytes              @44
         ub2 kxidusn                        @44       0x0000
         ub2 kxidslt                        @46       0x0001
         ub4 kxidsqn                        @48       0x00000034
      struct ktbituba, 8 bytes              @52
         ub4 kubadba                        @52       0x00400221
         ub2 kubaseq                        @56       0x0034
         ub1 kubarec                        @58       0x08
      ub2 ktbitflg                          @60       0x2001 (KTBFUPB)
      union _ktbitun, 2 bytes               @62
         b2 _ktbitfsc                       @62       0
         ub2 _ktbitwrp                      @62       0x0000
      ub4 ktbitbas                          @64       0x3c0c297c
BBED> set mode edit
        MODE            Edit
BBED> m /x 0180 offset 60
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 225              Offsets:   60 to   91           Dba:0x00000000
------------------------------------------------------------------------
 01800000 7c290c3c 00012000 ffff5200 8d003017 30170000 20008e1f 4106f805
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 225:
current = 0x55db, required = 0x55db
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/system01.dbf
BLOCK = 225
Block Checking: DBA = 4194529, Block Type = KTB-managed data block
data header at 0xb6773244
kdbchk: row locked by non-existent transaction
        table=0   slot=20
        lockid=1   ktbbhitc=1
Block 225 failed with check code 6101
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 1
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
BBED> p *kdbr[20]
rowdata[608]
------------
ub1 rowdata[608]                            @817      0x2c
BBED> dump
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 225              Offsets:  817 to  848           Dba:0x00000000
------------------------------------------------------------------------
 2c011102 c115145f 53595353 4d553230 5f343035 34303430 31322402 c10202c1
 <32 bytes per line>
BBED> m /x 002c
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 225              Offsets:  817 to  848           Dba:0x00000000
------------------------------------------------------------------------
 002c1102 c115145f 53595353 4d553230 5f343035 34303430 31322402 c10202c1
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 225:
current = 0x79f6, required = 0x79f6
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/system01.dbf
BLOCK = 225
Block Checking: DBA = 4194529, Block Type = KTB-managed data block
data header at 0xb6773244
kdbchk: row locked by non-existent transaction
        table=0   slot=20
        lockid=44   ktbbhitc=1
Block 225 failed with check code 6101
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 1
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
BBED> p *kdbr[20]
rowdata[608]
------------
ub1 rowdata[608]                            @817      0x00
BBED> d
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 225              Offsets:  817 to  848           Dba:0x00000000
------------------------------------------------------------------------
 002c1102 c115145f 53595353 4d553230 5f343035 34303430 31322402 c10202c1
 <32 bytes per line>
BBED> m /x 2c00
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 225              Offsets:  817 to  848           Dba:0x00000000
------------------------------------------------------------------------
 2c001102 c115145f 53595353 4d553230 5f343035 34303430 31322402 c10202c1
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 225:
current = 0x55da, required = 0x55da
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/system01.dbf
BLOCK = 225
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0

dump 321 block

SQL> alter system dump datafile '/u01/oracle/oradata/ora11g/system01.dbf' block 321;
System altered.
SQL> select value from v$diag_info where name='Default Trace File';
VALUE
--------------------------------------------------------------------------------
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_19704.trc
--trace文件
Block header dump:  0x00400141
 Object id on Block? Y
 seg/obj: 0x22  csc: 0xb8c.3c0c28ec  itc: 2  flg: -  typ: 2 - INDEX
     fsl: 0  fnx: 0x0 ver: 0x01
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0x0000.038.00000033  0x0040021e.0033.3d  --U-    1  fsc 0x0000.3c0c28ed

bbed提交321 block 事务

BBED> set block 321
        BLOCK#          321
BBED> map
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 321                                   Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Index Leaf)
 struct kcbh, 20 bytes                      @0
 struct ktbbh, 72 bytes                     @20
 struct kdxle, 32 bytes                     @92
 b2 kd_off[32]                              @124
 ub1 freespace[7581]                        @188
 ub1 rowdata[351]                           @7769
 ub4 tailchk                                @8188
BBED> p ktbbh
struct ktbbh, 72 bytes                      @20
   ub1 ktbbhtyp                             @20       0x02 (KDDBTINDEX)
   union ktbbhsid, 4 bytes                  @24
      ub4 ktbbhsg1                          @24       0x00000022
      ub4 ktbbhod1                          @24       0x00000022
   struct ktbbhcsc, 8 bytes                 @28
      ub4 kscnbas                           @28       0x3c0c28ec
      ub2 kscnwrp                           @32       0x0b8c
   b2 ktbbhict                              @36      -2046
   ub1 ktbbhflg                             @38       0x02 (NONE)
   ub1 ktbbhfsl                             @39       0x00
   ub4 ktbbhfnx                             @40       0x00000000
   struct ktbbhitl[0], 24 bytes             @44
      struct ktbitxid, 8 bytes              @44
         ub2 kxidusn                        @44       0x0000
         ub2 kxidslt                        @46       0x0000
         ub4 kxidsqn                        @48       0x00000000
      struct ktbituba, 8 bytes              @52
         ub4 kubadba                        @52       0x00000000
         ub2 kubaseq                        @56       0x0000
         ub1 kubarec                        @58       0x00
      ub2 ktbitflg                          @60       0x0000 (NONE)
      union _ktbitun, 2 bytes               @62
         b2 _ktbitfsc                       @62       0
         ub2 _ktbitwrp                      @62       0x0000
      ub4 ktbitbas                          @64       0x00000000
   struct ktbbhitl[1], 24 bytes             @68
      struct ktbitxid, 8 bytes              @68
         ub2 kxidusn                        @68       0x0000
         ub2 kxidslt                        @70       0x0038
         ub4 kxidsqn                        @72       0x00000033
      struct ktbituba, 8 bytes              @76
         ub4 kubadba                        @76       0x0040021e
         ub2 kubaseq                        @80       0x0033
         ub1 kubarec                        @82       0x3d
      ub2 ktbitflg                          @84       0x2001 (KTBFUPB)
      union _ktbitun, 2 bytes               @86
         b2 _ktbitfsc                       @86       0
         ub2 _ktbitwrp                      @86       0x0000
      ub4 ktbitbas                          @88       0x3c0c28ed
BBED> m /x 0180 offset 84
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 321              Offsets:   84 to  595           Dba:0x00000000
------------------------------------------------------------------------
 01800000 ed280c3c 00008001 00000000 20006400 011e9d1d 00000000 00000000
 00000000 06000000 601f0000 561f4b1f 401f351f 2a1f1f1f 141f091f fe1ef31e
 e81edd1e d21ec71e bc1eb11e a61e9b1e 901e851e 7a1e6f1e 641e591e 4e1e431e
 381e2d1e 221e171e 0c1e011e 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 321:
current = 0x990e, required = 0x990e
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/system01.dbf
BLOCK = 321
Block Checking: DBA = 4194625, Block Type = KTB-managed data block
**** actual rows locked by itl 2  = 1 != # in trans. header = 0
---- end index block validation
Block 321 failed with check code 6401
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 1
Total Blocks Failing   (Index): 1
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
BBED> p kdxle
struct kdxle, 32 bytes                      @92
   struct kdxlexco, 16 bytes                @92
      ub1 kdxcolev                          @92       0x00
      ub1 kdxcolok                          @93       0x00
      ub1 kdxcoopc                          @94       0x80
      ub1 kdxconco                          @95       0x01
      ub4 kdxcosdc                          @96       0x00000000
      sb2 kdxconro                          @100      32
      b2 kdxcofbo                           @102      100
      b2 kdxcofeo                           @104      7681
      b2 kdxcoavs                           @106      7581
   b2 kdxlespl                              @108      0
   sb2 kdxlende                             @110      0
   ub4 kdxlenxt                             @112      0x00000000
   ub4 kdxleprv                             @116      0x00000000
   ub1 kdxledsz                             @120      0x06
   ub1 kdxleunuse                           @121      0x00
BBED> m /x 00 offset 95
 File: /u01/oracle/oradata/ora11g/system01.dbf (0)
 Block: 321              Offsets:   95 to  606           Dba:0x00000000
------------------------------------------------------------------------
 00000000 00200064 00011e9d 1d000000 00000000 00000000 00060000 00601f00
 00561f4b 1f401f35 1f2a1f1f 1f141f09 1ffe1ef3 1ee81edd 1ed21ec7 1ebc1eb1
 1ea61e9b 1e901e85 1e7a1e6f 1e641e59 1e4e1e43 1e381e2d 1e221e17 1e0c1e01
 1e000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 <32 bytes per line>
BBED> sum apply
Check value for File 0, Block 321:
current = 0x380e, required = 0x380e
BBED> verify
DBVERIFY - Verification starting
FILE = /u01/oracle/oradata/ora11g/system01.dbf
BLOCK = 321
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 1
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0

bbed提交事务open依旧报错

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/oracle/oradata/ora11g/undotbs01.dbf'
Process ID: 20229
Session ID: 125 Serial number: 5

做10046的trace文件错误依旧

=====================
PARSING IN CURSOR #3065238148 len=142 dep=3 uid=0 oct=3 lid=0 tim=1377519352653391 hv=361892850 ad='269a8064' sqlid='7bd391hat42zk'
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
END OF STMT
PARSE #3065238148:c=2999,e=3152,p=0,cr=0,cu=0,mis=1,r=0,dep=3,og=3,plh=0,tim=1377519352653388
BINDS #3065238148:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=b6b32268  bln=22  avl=02  flg=05
  value=27
EXEC #3065238148:c=2999,e=3182,p=0,cr=0,cu=0,mis=1,r=0,dep=3,og=3,plh=906473769,tim=1377519352657030
WAIT #3065238148: nam='db file sequential read' ela= 169 file#=1 block#=321 blocks=1 obj#=34 tim=1377519352657400
WAIT #3065238148: nam='db file sequential read' ela= 25 file#=1 block#=225 blocks=1 obj#=15 tim=1377519352657546
FETCH #3065238148:c=1000,e=578,p=2,cr=2,cu=0,mis=0,r=1,dep=3,og=3,plh=906473769,tim=1377519352657738
STAT #3065238148 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=2 pw=0 time=538 us)'
STAT #3065238148 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=1 pw=0 time=338 us)'
CLOSE #3065238148:c=0,e=13,dep=3,type=0,tim=1377519352657942
WAIT #3065201888: nam='control file sequential read' ela= 111 file#=0 block#=1 blocks=1 obj#=15 tim=1377519352658170
WAIT #3065201888: nam='control file sequential read' ela= 19 file#=0 block#=16 blocks=1 obj#=15 tim=1377519352658269
WAIT #3065201888: nam='control file sequential read' ela= 18 file#=0 block#=18 blocks=1 obj#=15 tim=1377519352658329
WAIT #3065201888: nam='control file sequential read' ela= 19 file#=0 block#=24 blocks=1 obj#=15 tim=1377519352658391
WAIT #3065201888: nam='control file sequential read' ela= 17 file#=0 block#=1 blocks=1 obj#=15 tim=1377519352658469
WAIT #3065201888: nam='control file sequential read' ela= 16 file#=0 block#=16 blocks=1 obj#=15 tim=1377519352658528
WAIT #3065201888: nam='control file sequential read' ela= 28 file#=0 block#=18 blocks=1 obj#=15 tim=1377519352658787
WAIT #3065201888: nam='control file sequential read' ela= 20 file#=0 block#=32 blocks=1 obj#=15 tim=1377519352658883
DDE rules only execution for: ORA 1110
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
----- START DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (Async) -----
Successfully dispatched
----- END DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (SUCCESS, 0 csec) -----
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
FETCH #3065201888:c=11998,e=12032,p=5,cr=6,cu=0,mis=0,r=0,dep=2,og=4,plh=2853959010,tim=1377519352661044
STAT #3065201888 id=1 cnt=0 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=0 pr=0 pw=0 time=19 us)'
STAT #3065201888 id=2 cnt=1 pid=1 pos=1 obj=37 op='INDEX RANGE SCAN I_OBJ2 (cr=3 pr=3 pw=0 time=1001 us)'

通过观察每次的10046文件,发现总是在查询select /*+ rule */ name,file#, block#,status$,user#,undosqn,xactsqn, scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1并且总是us#=27号回滚段上过不去,通过使用bbed或者dul修改数据文件,让数据库启动时候不再扫描27号回滚段(主要是修改undo$.status$=1,block位置可以通过10046确定,也可以通过odu来确定,亦或者找个相同版本的库查看)

继续open数据库

SQL> startup pfile='/tmp/pfile' mount
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
Database mounted.
SQL> alter database open;
Database altered.

删除有问题undo,重建新undo

SQL> drop tablespace undotbs1;
drop tablespace undotbs1
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified
SQL> !oerr ora 01561
01561, 00000, "failed to remove all objects in the tablespace specified"
// *Cause: Failed to remove all objects when dropping a tablespace
// *Action: Retry the drop tablespace until all objects are dropped
SQL> drop tablespace undotbs1 including contents;
drop tablespace undotbs1 including contents
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified
SQL> drop tablespace undotbs1 including contents and datafiles;
drop tablespace undotbs1 including contents and datafiles
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified
-------------------------------------------------
因为前面使用了bbed/dul之类的工具,修改了undo$表,
导致undo$中的标记27号回滚段与seg$中不一致
解决方法使得修改undo$.status$为3或者2
-------------------------------------------------
SQL> update undo$ set status$=3 where us#=27;
1 row updated.
SQL> commit;
Commit complete.
SQL> drop tablespace undotbs1 ;
drop tablespace undotbs1
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified
SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup pfile='/tmp/pfile'
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
Database mounted.
Database opened.
SQL> drop tablespace undotbs1 ;
Tablespace dropped.
SQL> create undo tablespace undotbs1 datafile '/u01/oracle/oradata/ora11g/undotbs1.dbf' size 10M;
Tablespace created.

直接修改基表,bbed/dul修改数据文件等操作都是危险级别非常高的操作,而且oracle not support,在没有十足把握之前,一定不要在生产环境中执行,如果数据库挂了需要类似方法恢复,也请一定保留现场,如果你搞不定能够还原现场
部分操作可以参考:通过bbed修改回滚段状态解决ORA-00704故障

重建控制文件丢失undo异常恢复—ORA-01173模拟与恢复

数据库异常关闭,使用resetlogs方式重建控制文件,不包含undo表空间相关数据库,然后尝试resetlogs打开数据库,非常容易出现ORA-01173: data dictionary indicates missing data file from system tablespace之类的错误,本文通过模拟该错误,并简单提供解决方法
模拟ORA-01173错误

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> shutdown abort;
ORACLE instance shut down.
--除掉undo表空间相关数据库然后使用resetlogs模式重建控制文件
SQL> startup nomount;
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
SQL> @/tmp/ctl.sql
Control file created.
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/oracle/oradata/ora11g/system01.dbf'
SQL> recover database until cancel;
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done
SQL> recover database until cancel using backup controlfile;
ORA-00279: change 12696930512547 generated at 08/26/2013 14:43:13 needed for
thread 1
ORA-00289: suggestion : /u01/oracle/oradata/ora11g/archivelog/1_8_824475918.dbf
ORA-00280: change 12696930512547 for thread 1 is in sequence #8
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel;
ORA-00308: cannot open archived log 'cancel;'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/oracle/oradata/ora11g/system01.dbf'
ORA-01112: media recovery not started
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/oracle/oradata/ora11g/system01.dbf'
SQL> recover database until cancel using backup controlfile;
ORA-00279: change 12696930512547 generated at 08/26/2013 14:43:13 needed for
thread 1
ORA-00289: suggestion : /u01/oracle/oradata/ora11g/archivelog/1_8_824475918.dbf
ORA-00280: change 12696930512547 for thread 1 is in sequence #8
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/oracle/oradata/ora11g/redo02.log
Log applied.
Media recovery complete.
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-01173: data dictionary indicates missing data file from system tablespace
Process ID: 15053
Session ID: 125 Serial number: 3

alert日志

Mon Aug 26 15:22:20 2013
alter database open resetlogs
RESETLOGS after complete recovery through change 12696930514503
Archived Log entry 1 added for thread 1 sequence 7 ID 0xfb91eb36 dest 1:
Archived Log entry 2 added for thread 1 sequence 8 ID 0xfb91eb36 dest 1:
Archived Log entry 3 added for thread 1 sequence 6 ID 0xfb91eb36 dest 1:
Clearing online redo logfile 1 /u01/oracle/oradata/ora11g/redo01.log
Clearing online log 1 of thread 1 sequence number 7
Clearing online redo logfile 1 complete
Clearing online redo logfile 2 /u01/oracle/oradata/ora11g/redo02.log
Clearing online log 2 of thread 1 sequence number 8
Clearing online redo logfile 2 complete
Clearing online redo logfile 3 /u01/oracle/oradata/ora11g/redo03.log
Clearing online log 3 of thread 1 sequence number 6
Clearing online redo logfile 3 complete
Resetting resetlogs activation ID 4220644150 (0xfb91eb36)
Online log /u01/oracle/oradata/ora11g/redo01.log: Thread 1 Group 1 was previously cleared
Online log /u01/oracle/oradata/ora11g/redo02.log: Thread 1 Group 2 was previously cleared
Online log /u01/oracle/oradata/ora11g/redo03.log: Thread 1 Group 3 was previously cleared
Mon Aug 26 15:22:26 2013
Setting recovery target incarnation to 2
Using SCN growth rate of 16384 per second
Mon Aug 26 15:22:26 2013
Assigning activation ID 4220652407 (0xfb920b77)
LGWR: STARTING ARCH PROCESSES
Mon Aug 26 15:22:26 2013
ARC0 started with pid=20, OS id=15131
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Mon Aug 26 15:22:27 2013
ARC1 started with pid=21, OS id=15135
Mon Aug 26 15:22:27 2013
ARC2 started with pid=22, OS id=15139
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /u01/oracle/oradata/ora11g/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Aug 26 15:22:27 2013
SMON: enabling cache recovery
Mon Aug 26 15:22:27 2013
ARC3 started with pid=23, OS id=15143
ARC1: Archival started
ARC2: Archival started
ARC2: Becoming the 'no FAL' ARCH
ARC2: Becoming the 'no SRL' ARCH
ARC1: Becoming the heartbeat ARCH
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_15053.trc:
ORA-01173: data dictionary indicates missing data file from system tablespace
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_15053.trc:
ORA-01173: data dictionary indicates missing data file from system tablespace
Error 1173 happened during db open, shutting down database
USER (ospid: 15053): terminating the instance due to error 1173
Instance terminated by USER, pid = 15053
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (15053) as a result of ORA-1092
Mon Aug 26 15:22:29 2013
ORA-1092 : opitsk aborting process

解决方案
解决方案设置相关异常回滚段,然后屏蔽掉,数据库即可正常起来

--参数文件修改(使用event或者工具,系统命令等)
undo_management='manual'
_corrupted_rollback_segments
--重启数据库使得_corrupted_rollback_segments生效
shutdown immediate;
startup
--尝试open数据库
recover database;
alter database open;
--新建undo
create undo tablespace undo_new datafile '' size 100m autoextend on next 10m maxsize 30G;
--删除老undo
drop tablespace old_undo
--修改参数
shutdonw immediate
undo_management='auto'
undo_tablespace='unod_new'
--启动数据库
startup
--逻辑导出数据,导入新库

姊妹篇:重建控制文件丢失undo异常恢复—ORA-600[25025]模拟与恢复

重建控制文件丢失undo异常恢复—ORA-600 25025模拟与恢复

在现实的情况中,有些人因为某种原因重建控制文件(丢失undo[有意或者无意]),然后又resetlogs库尝试恢复,这样的操作可能导致各种比较麻烦的恢复,这里模拟ORA-600[25025]异常恢复
模拟ORA-600[25025]错误

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> alter database backup controlfile to trace as '/tmp/ctl';
Database altered.
SQL> create table chf.t_xifenfei_www as select * from dba_objects where 1=0;
Table created.
SQL> insert into chf.t_xifenfei_www select * from dba_objects;
74749 rows created.
--另外一个会话abort
SQL> shutdown abort;
ORACLE instance shut down.
SQL> STARTUP NOMOUNT
ORACLE instance started.
Total System Global Area  175775744 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                6656000 bytes
SQL> !vi /tmp/ctl.sql
CREATE CONTROLFILE REUSE DATABASE "ORA11G" NORESETLOGS  ARCHIVELOG
    MAXLOGFILES 16
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 292
LOGFILE
  GROUP 1 '/u01/oracle/oradata/ora11g/redo01.log'  SIZE 50M BLOCKSIZE 512,
  GROUP 2 '/u01/oracle/oradata/ora11g/redo02.log'  SIZE 50M BLOCKSIZE 512,
  GROUP 3 '/u01/oracle/oradata/ora11g/redo03.log'  SIZE 50M BLOCKSIZE 512
DATAFILE
  '/u01/oracle/oradata/ora11g/system01.dbf',
  '/u01/oracle/oradata/ora11g/sysaux01.dbf',
 -- '/u01/oracle/oradata/ora11g/undo02.dbf',
  '/u01/oracle/oradata/ora11g/users01.dbf',
  '/u01/oracle/oradata/ora11g/dbfs01.dbf',
  '/u01/oracle/oradata/ora11g/tts_xifenfei02.dbf',
  '/u01/oracle/oradata/ora11g/tts_xifenfei01.dbf',
  '/u01/oracle/oradata/ora11g/system02.dbf',
  '/u01/oracle/oradata/ora11g/czum01.dbf',
  '/u01/oracle/oradata/ora11g/undotbs02.dbf',
  '/u01/oracle/oradata/sp2008',
  '/u01/oracle/oradata/sp_2009',
  '/u01/oracle/oradata/sp_2010',
  '/u01/oracle/oradata/sp_2011',
  '/u01/oracle/oradata/sp_2012',
  '/u01/oracle/oradata/sp_2013',
  '/u01/oracle/oradata/sp_2014',
  '/u01/oracle/oradata/sp_2015',
  '/u01/oracle/oradata/sp_2016',
  '/u01/oracle/oradata/sp_2017',
  '/u01/oracle/oradata/sp_2018',
  '/u01/oracle/oradata/sp_2019',
  '/u01/oracle/oradata/sp_2020',
  '/u01/oracle/oradata/sp_2021',
  '/u01/oracle/oradata/sp_2022',
  '/u01/oracle/oradata/sp_2023',
  '/u01/oracle/oradata/sp_2024',
  '/u01/oracle/oradata/sp_2025',
  '/u01/oracle/oradata/sp_20max'
CHARACTER SET ZHS16GBK
;
"/tmp/ctl.sql" [New] 43L, 1519C written
SQL> @/tmp/ctl.sql
Control file created.
SQL> recover database using backup controlfile until cancel;
ORA-00279: change 12696930343864 generated at 05/18/2013 01:17:54 needed for
thread 1
ORA-00289: suggestion :
/u01/oracle/oradata/ora11g/archivelog/1_38_805394597.dbf
ORA-00280: change 12696930343864 for thread 1 is in sequence #38
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/oracle/oradata/ora11g/redo01.log
ORA-00310: archived log contains sequence 37; sequence 38 required
ORA-00334: archived log: '/u01/oracle/oradata/ora11g/redo01.log'
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/oracle/oradata/ora11g/system01.dbf'
SQL>  recover database using backup controlfile until cancel;
ORA-00279: change 12696930343864 generated at 05/18/2013 01:17:54 needed for
thread 1
ORA-00289: suggestion :
/u01/oracle/oradata/ora11g/archivelog/1_38_805394597.dbf
ORA-00280: change 12696930343864 for thread 1 is in sequence #38
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/oracle/oradata/ora11g/redo03.log
ORA-00310: archived log contains sequence 39; sequence 38 required
ORA-00334: archived log: '/u01/oracle/oradata/ora11g/redo03.log'
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/oracle/oradata/ora11g/system01.dbf'
SQL> recover database using backup controlfile until cancel;
ORA-00279: change 12696930343864 generated at 05/18/2013 01:17:54 needed for
thread 1
ORA-00289: suggestion :
/u01/oracle/oradata/ora11g/archivelog/1_38_805394597.dbf
ORA-00280: change 12696930343864 for thread 1 is in sequence #38
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/oracle/oradata/ora11g/redo02.log
ORA-00279: change 12696930370956 generated at 08/26/2013 13:00:25 needed for
thread 1
ORA-00289: suggestion :
/u01/oracle/oradata/ora11g/archivelog/1_39_805394597.dbf
ORA-00280: change 12696930370956 for thread 1 is in sequence #39
ORA-00278: log file '/u01/oracle/oradata/ora11g/redo02.log' no longer needed
for this recovery
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/oracle/oradata/ora11g/redo03.log
Log applied.
Media recovery complete.
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [25025], [3], [], [], [], [], [],
[], [], [], [], []
Process ID: 12603
Session ID: 125 Serial number: 3

alert日志信息

ORA-279 signalled during: ALTER DATABASE RECOVER  database using backup controlfile until cancel  ...
ALTER DATABASE RECOVER    LOGFILE '/u01/oracle/oradata/ora11g/redo02.log'
Media Recovery Log /u01/oracle/oradata/ora11g/redo02.log
Mon Aug 26 13:05:00 2013
ORA-279 signalled during: ALTER DATABASE RECOVER    LOGFILE '/u01/oracle/oradata/ora11g/redo02.log'  ...
Mon Aug 26 13:05:10 2013
ALTER DATABASE RECOVER    LOGFILE '/u01/oracle/oradata/ora11g/redo03.log'
Media Recovery Log /u01/oracle/oradata/ora11g/redo03.log
Mon Aug 26 13:05:10 2013
Incomplete recovery applied all redo ever generated.
Recovery completed through change 12696930370973 time 08/26/2013 13:00:56
Media Recovery Complete (ora11g)
Completed: ALTER DATABASE RECOVER    LOGFILE '/u01/oracle/oradata/ora11g/redo03.log'
alter database open resetlogs
RESETLOGS after complete recovery through change 12696930370973
Archived Log entry 1 added for thread 1 sequence 37 ID 0xfa6fa6cb dest 1:
Archived Log entry 2 added for thread 1 sequence 38 ID 0xfa6fa6cb dest 1:
Archived Log entry 3 added for thread 1 sequence 39 ID 0xfa6fa6cb dest 1:
Clearing online redo logfile 1 /u01/oracle/oradata/ora11g/redo01.log
Clearing online log 1 of thread 1 sequence number 37
Mon Aug 26 13:05:22 2013
Clearing online redo logfile 1 complete
Clearing online redo logfile 2 /u01/oracle/oradata/ora11g/redo02.log
Clearing online log 2 of thread 1 sequence number 38
Clearing online redo logfile 2 complete
Clearing online redo logfile 3 /u01/oracle/oradata/ora11g/redo03.log
Clearing online log 3 of thread 1 sequence number 39
Clearing online redo logfile 3 complete
Resetting resetlogs activation ID 4201621195 (0xfa6fa6cb)
Online log /u01/oracle/oradata/ora11g/redo01.log: Thread 1 Group 1 was previously cleared
Online log /u01/oracle/oradata/ora11g/redo02.log: Thread 1 Group 2 was previously cleared
Online log /u01/oracle/oradata/ora11g/redo03.log: Thread 1 Group 3 was previously cleared
Mon Aug 26 13:05:33 2013
Setting recovery target incarnation to 2
Mon Aug 26 13:05:33 2013
Using SCN growth rate of 16384 per second
Mon Aug 26 13:05:33 2013
Assigning activation ID 4220644150 (0xfb91eb36)
LGWR: STARTING ARCH PROCESSES
Mon Aug 26 13:05:33 2013
ARC0 started with pid=20, OS id=12679
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Mon Aug 26 13:05:35 2013
ARC1 started with pid=21, OS id=12683
Mon Aug 26 13:05:35 2013
ARC2 started with pid=22, OS id=12687
Mon Aug 26 13:05:36 2013
ARC3 started with pid=24, OS id=12691
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /u01/oracle/oradata/ora11g/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Aug 26 13:05:36 2013
SMON: enabling cache recovery
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_12603.trc  (incident=146705):
ORA-00600: internal error code, arguments: [25025], [3], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/ora11g/ora11g/incident/incdir_146705/ora11g_ora_12603_i146705.trc
Mon Aug 26 13:05:45 2013
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Undo initialization errored: err:600 serial:0 start:57601994 end:57610584 diff:8590 (85 seconds)
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_12603.trc:
ORA-00600: internal error code, arguments: [25025], [3], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_12603.trc:
ORA-00600: internal error code, arguments: [25025], [3], [], [], [], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 12603): terminating the instance due to error 600
Instance terminated by USER, pid = 12603
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (12603) as a result of ORA-1092
Mon Aug 26 13:05:47 2013
ORA-1092 : opitsk aborting process

trace文件

*** 2013-08-26 13:05:38.945
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=7j16t46cacjt9) -----
alter database open resetlogs
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+41        call     kgdsdst()            BF8A34E4 ? 2 ?
ksedst1()+77         call     skdstdst()           BF8A34E4 ? 0 ? 1 ? 8592C38 ?
                                                   8592AB6 ? 8592C38 ?
ksedst()+33          call     ksedst1()            0 ? 1 ?
dbkedDefDump()+2704  call     ksedst()             0 ? BF8A40E7 ? 47EF1FF4 ?
                                                   BF8A3D66 ? 0 ? BF8A3BF4 ?
ksedmp()+47          call     dbkedDefDump()       3 ? 2 ?
ksfdmp()+59          call     ksedmp()             3EB ? BF8A5458 ? F1DADED ?
                                                   116056E0 ? 3EB ? 116056E0 ?
dbgexPhaseII()+1725  call     00000000             116056E0 ? 3EB ?
dbgexProcessError()  call     dbgexPhaseII()       B6C515AC ? B6A0C890 ?
+2550                                              BF8A8D30 ?
dbgeExecuteForError  call     dbgexProcessError()  B6C515AC ? B6A0C890 ? 1 ? 0 ?
()+65                                              B6C515AC ? B6A0C890 ?
dbgePostErrorKGE()+  call     dbgeExecuteForError  B6C515AC ? B6A0C890 ? 0 ? 1 ?
1794                          ()                   0 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   116056E0 ? B6C330D4 ? 258 ?
50
kgeade()+324         call     00000000             116056E0 ? B6C330D4 ? 258 ?
kgeriv_int()+107     call     kgeade()             116056E0 ? 11605808 ?
                                                   B6C330D4 ? 258 ? 0 ? 61C1 ?
kgeriv()+25          call     kgeriv_int()         116056E0 ? B6C330D4 ? 61C1 ?
                                                   0 ? 1 ? BF8A96B8 ?
kgesiv()+98          call     kgeriv()             116056E0 ? B6C330D4 ? 61C1 ?
                                                   1 ? BF8A96B8 ?
ksesic1()+48         call     kgesiv()             116056E0 ? B6C330D4 ? 61C1 ?
                                                   1 ? BF8A96B8 ? 61C1 ? 1 ?krta2rfn()+78
                                                   call     ksesic1()            61C1 ? 0 ? 3 ? 0 ?
kttsinfo()+496       call     krta2rfn()           3 ? 0 ? 0 ? 0 ? 0 ? 0 ?
ktusmout_online_ut(  call     kttsinfo()           9 ? 0 ? 0 ? BF8A9948 ?
)+810
ktusmiut_init_ut()+  call     ktusmout_online_ut(  B000 ? 89E57F8 ? 9 ?
                                                   BF8A96B8 ?
348                           )
ktuini()+518         call     ktusmiut_init_ut()   0 ? 0 ? 0 ? 0 ? 1 ? 0 ?
adbdrv()+46345       call     ktuini()             0 ? BF8A9DE0 ? 1004BF42 ?
                                                   BF8AA258 ? B6A0BFFC ?
                                                   10EA1F20 ?
opiexe()+16835       call     adbdrv()             25C8F1F8 ? 0 ? 0 ? 2A34F528 ?
                                                   2A0400C8 ? BF8AF07C ?
opiosq0()+3437       call     opiexe()             4 ? 0 ? BF8AFE8C ?
kpooprx()+239        call     opiosq0()            3 ? E ? BF8B0184 ? A4 ? 0 ?
kpoal8()+607         call     kpooprx()            BF8B2D6C ? BF8B10AC ? 1D ?
                                                   1 ? 0 ? A4 ?
opiodr()+962         call     00000000             5E ? 1C ? BF8B2D68 ?
ttcpip()+1930        call     00000000             5E ? 1C ? BF8B2D68 ? 0 ?
opitsk()+1355        call     ttcpip()             11616580 ? 5E ? BF8B2D68 ?
                                                   0 ? BF8B29F8 ? BF8B2E90 ?
                                                   FDEBA80 ? 0 ? BF8B2EBC ?
opiino()+827         call     opitsk()             0 ? 0 ?
opiodr()+962         call     00000000             3C ? 4 ? BF8B3E2C ?
opidrv()+479         call     opiodr()             3C ? 4 ? BF8B3E2C ? 0 ?
sou2o()+80           call     opidrv()             3C ? 4 ? BF8B3E2C ?
opimai_real()+109    call     sou2o()              BF8B3E10 ? 3C ? 4 ?
                                                   BF8B3E2C ?
ssthrdmain()+212     call     00000000             2 ? BF8B3F58 ? 0 ? 47DA6F14 ?
                                                   BF8B3F14 ? 47D9A670 ?
main()+147           call     ssthrdmain()         2 ? BF8B3F58 ?
__libc_start_main()  call     00000000             2 ? BF8B4054 ? BF8B4060 ?
+220                                               47D9A828 ? 0 ? 1 ?
_start()+33          call     __libc_start_main()  856F3B4 ? 2 ? BF8B4054 ?
                                                   BCC1EA0 ? BCC1E90 ?
                                                   47D8C790 ?
--------------------- Binary Stack Dump ---------------------

MOS中有类似描述ORA-600 [25025] [25] While Opening the Clone Database in Resetlog Mode (Doc ID 603100.1),该解决方案是重建控制文件增加所有数据文件,在本次测试中,我就是人为除掉了undo,模拟undo丢失[其实数据库已经resetlogs过了,就算加入undo重建控制文件也不会成功(人工修改undo文件头除外)],又做了不正确的重建控制文件操作的故障,我提供解决方案如下

解决办法

--参数文件修改
undo_management='manual'
--尝试open数据库
recover database;
alter database open;
--新建undo
create undo tablespace undo_new datafile '' size 100m autoextend on next 10m maxsize 30G;
--屏蔽需要恢复回滚段
select tablespace_name,segment_name,status from dba_rollback_segs;
_corrupted_rollback_segments
--重启数据库使得_corrupted_rollback_segments生效
shutdown immediate;
startup
--删除老undo
drop tablespace old_undo
--修改参数
shutdonw immediate
undo_management='auto'
undo_tablespace='unod_new'
--启动数据库
startup
--导出数据,导入新库

姊妹篇:重建控制文件丢失undo异常恢复—ORA-01173模拟与恢复

ORACLE db top 命令(oratop)

oracle 也推出来了数据库的top命令,叫做oratap,是使用c语言写的,目前只支持linux x86与x64的11gR2与12cR1
使用说明

[oracle@xifenfei tmp]$ ./oratap  -h
oratop: Release 13.2.4
Usage:
         oratop [ [Options] [Logon] ]
         Logon:
                {username[/password][@connect_identifier] | / }
                [AS {SYSDBA|SYSOPER}]
         Options:
             -i : Interval Delay (requires value in seconds, default: 3s)
             -f : Long format for header & section 4 (default: 80 column)
             -r : IORL mode for Section 2 (default is IOPS)
             -d : Real-Time Top 5 Wait Events (default: Cumulative)
             -m : MODULE/ACTION mode for Section 4 (default: USER/PROGRAM_NAME)
             -b : Batch mode
             -n : maximum number of iterations (requires number)
             -h : Help

使用说明
1. Log in to the system as the Oracle RDBMS software installation owner
2. Stage the oratop executable on the server on which the tool will be executed. On a RAC system it is only necessary to stage the executable on one node as it is RAC aware.
3. Change the name of the executable, eg.

$ mv oratop* oratop

4. Validate the permissions for oratop are 755 (-rwxr-xr-x). If the permissions are not currently set to 755, set the permissions on oratop as follows:

$ chmod 755 oratop

5. Configure the execution environment as follows (if not already set accordingly)

$ export TERM=xterm #or vt100
$ export ORACLE_HOME=<11.2 database home>
$ export LD_LIBRARY_PATH=$ORACLE_HOME/lib
$ export PATH=$ORACLE_HOME/bin:$PATH
$ export ORACLE_SID=<local 11.2 database SID to be monitored> #only needed if connecting to a local database

6. Invoke the tool to monitor a local database as follows (interval can be any interval in seconds):

$ ./oratop -i 10 / as sysdba

7. Non-dba privileged user requires specific grants to use the tool. See User Guide for grants required.
8. Databases can be monitored remotely using a tns alias as described in the oratop Users Guide .

$ ./oratop -i 10 username/password@tns_alias
$ ./oratop -i 10 system/manager@tns_alias

9. To exit the program simply press the keyboard key “q”. To abort, CTRL-C.
Troubleshooting:
If you receive an error similar to
./oratop: error while loading shared libraries: libclntsh.so.11.1: cannot open shared object file: No such file or directory that is an indication that libclntsh.so.11.1 was not found in the path defined in LD_LIBRARY_PATH similar to

$ export LD_LIBRARY_PATH=$ORACLE_HOME/lib (ie., the location of libclntsh.so.11.1)

使用效果

[oracle@xifenfei tmp]$ ./oratap  -i 10 chf/xifenfei@ora11g_p
oratop: Release 13.2.4 Production on Mon Aug 26 03:10:17 2013
Copyright (c) 2011, Oracle.  All rights reserved.
Processing ...
oratop 1:  148 ora1 03:10:13 up  3.3m,  1 ins, 301M mt,   3 sn,  1 us,   6% db
ID %CU HLD MBPS IOPS %FR PGAU ASC ASI ASW ASP   AAS USN  TPS UCPS SSRT DBC DBW
 1   0   0    0    1   5 133M   1   0   0   0   0.1   3    0    2   0u   0   0
EVENT     (Cumulative)           TOT WAITS  TIME(s)  AVG_MS PCT  WAIT_CLASS
DB CPU                                          218          68
control file parallel write           5535       30     5.5   9  System I/O
db file sequential read              10080       27     2.7   8  User I/O
os thread startup                      198       25   124.9   8  Concurrency
db file async I/O submit              2575       21     8.0   7  System I/O
ID   SID  SPID USR PROG  PGA OPN SQLID/BLOCKER  E/T STATUS STE WAIT_EVENT  W/T
 1   148  8254 CHF DEDI   2M SEL 4agz3g5aajkdc    0 ACTIVE CPU wa for cpu   0u

参考文档:
oratop – utility for near real-time monitoring of databases, RAC and Single Instance (Doc ID 1500864.1)

使用_allow_resetlogs_corruption导致ORA-00704/ORA-01555故障

以前写过一篇乱用_allow_resetlogs_corruption参数导致悲剧的文章,昨天晚上又遇到一个朋友不谨慎使用_allow_resetlogs_corruption导致ORA-00704/ORA-01555故障
环境描述
系统环境:solaris
数据库版本:10.2.0.5.7
数据存储方式:ASM
数据量:15T以上
补充事宜:数据库SCN距离headroom只有54天

报ORA-00020错误,实例crash
数据库因为超过了系统的进程数,出现dbwn进程写数据文件异常

Sun Aug 25 16:00:41 CST 2013
Errors in file /opt/oracle/admin/orcl/bdump/orcl_dbw0_7490.trc:
ORA-01148: 无法刷新数据文件 22 的文件大小
ORA-01110: 数据文件 22: '+DATA/orcl/datafile/index_jh.dbf'
ORA-00020: 超出最大进程数 ()
Sun Aug 25 16:00:41 CST 2013
Errors in file /opt/oracle/admin/orcl/bdump/orcl_dbw0_7490.trc:
ORA-01242: 数据文件出现介质故障: 数据库处于 NOARCHIVELOG 模式
ORA-01110: 数据文件 22: '+DATA/orcl/datafile/index_jh.dbf'
Sun Aug 25 16:00:41 CST 2013
DBW0: terminating instance due to error 1242
Termination issued to instance processes. Waiting for the processes to exit
Sun Aug 25 16:00:51 CST 2013
Instance termination failed to kill one or more processes
Instance terminated by DBW0, pid = 7490

ORA-00600[kcbtema_10]
实例恢复出现ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []

Sun Aug 25 19:19:23 CST 2013
ALTER DATABASE OPEN
Sun Aug 25 19:19:38 CST 2013
Beginning crash recovery of 1 threads
 parallel recovery started with 16 processes
Sun Aug 25 19:19:40 CST 2013
Started redo scan
Sun Aug 25 19:20:07 CST 2013
Completed redo scan
 12016413 redo blocks read, 93405 data blocks need recovery
Sun Aug 25 19:20:19 CST 2013
Started redo application at
 Thread 1: logseq 53681, block 1091966
Sun Aug 25 19:20:19 CST 2013
Recovery of Online Redo Log: Thread 1 Group 1 Seq 53681 Reading mem 0
  Mem# 0: +DATA/orcl/onlinelog/redo_1_1.log
  Mem# 1: +DATA/orcl/onlinelog/redo_1_2.log
Sun Aug 25 19:20:21 CST 2013
Errors in file /opt/oracle/admin/orcl/bdump/orcl_p011_16944.trc:
ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []
Sun Aug 25 19:20:23 CST 2013
Errors in file /opt/oracle/admin/orcl/bdump/orcl_p011_16944.trc:
ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []
Sun Aug 25 19:20:23 CST 2013
Aborting crash recovery due to slave death, attempting serial crash recovery
Sun Aug 25 19:20:23 CST 2013
Beginning crash recovery of 1 threads
Sun Aug 25 19:20:23 CST 2013
Started redo scan
Sun Aug 25 19:20:47 CST 2013
Completed redo scan
 12016413 redo blocks read, 93405 data blocks need recovery
Sun Aug 25 19:20:54 CST 2013
Started redo application at
 Thread 1: logseq 53681, block 1091966
Sun Aug 25 19:20:54 CST 2013
Recovery of Online Redo Log: Thread 1 Group 1 Seq 53681 Reading mem 0
  Mem# 0: +DATA/orcl/onlinelog/redo_1_1.log
  Mem# 1: +DATA/orcl/onlinelog/redo_1_2.log
Sun Aug 25 19:20:54 CST 2013
Errors in file /opt/oracle/admin/orcl/udump/orcl_ora_16751.trc:
ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []
Sun Aug 25 19:20:56 CST 2013
Aborting crash recovery due to error 600
Sun Aug 25 19:20:56 CST 2013
Errors in file /opt/oracle/admin/orcl/udump/orcl_ora_16751.trc:
ORA-00600: 内部错误代码, 参数: [kcbtema_10], [1], [], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...

使用隐含参数

ALTER SYSTEM SET _allow_resetlogs_corruption=TRUE SCOPE=SPFILE;

报ORA-00704/ORA-01555
因为在前面的恢复中进行了不完全恢复,因此这里加入隐含参数,然后尝试resetlogs,然后报如下错误

Sun Aug 25 20:11:54 CST 2013
alter database open resetlogs
Sun Aug 25 20:12:10 CST 2013
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 13429649847189
Resetting resetlogs activation ID 1312390734 (0x4e397e4e)
Sun Aug 25 20:16:25 CST 2013
Setting recovery target incarnation to 2
Sun Aug 25 20:16:42 CST 2013
************************************************************
Warning: The SCN headroom for this database is only 54 days!
************************************************************
Sun Aug 25 20:16:43 CST 2013
Assigning activation ID 1352200163 (0x5098efe3)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: +DATA/orcl/onlinelog/redo_1_1.log
  Current log# 1 seq# 1 mem# 1: +DATA/orcl/onlinelog/redo_1_2.log
Successful open of redo thread 1
Sun Aug 25 20:16:43 CST 2013
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sun Aug 25 20:16:52 CST 2013
SMON: enabling cache recovery
Sun Aug 25 20:16:52 CST 2013
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0c36.d582339b):
Sun Aug 25 20:16:52 CST 2013
select ctime, mtime, stime from obj$ where obj# = :1
Sun Aug 25 20:16:52 CST 2013
Errors in file /opt/oracle/admin/orcl/udump/orcl_ora_2859.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 143 (名称为 "_SYSSMU143$") 过小
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Termination issued to instance processes. Waiting for the processes to exit
Sun Aug 25 20:17:02 CST 2013
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 2859
ORA-1092 signalled during: alter database open resetlogs...

数据库当前SCN

SQL > select CHECKPOINT_CHANGE# from v$database;
CHECKPOINT_CHANGE#
------------------
    13429649947222
SQL > select distinct CHECKPOINT_CHANGE# from v$datafile_header;
CHECKPOINT_CHANGE#
------------------
    13429649947222

解决方法
因为该数据库版本为10.2.0.5.7,已经包含了scn patch,因此不能使用event或者隐含参数来修改scn,而且该库容量15T以上(asm),因此也无法使用bbed修改数据文件头,最后决定使用ordebug来解决该问题
使用oradebug DUMPvar SGA kcsgscn_
使用oradebug poke

sqlplus / as sysdba
startup mount
oradebug setmypid
oradebug DUMPvar SGA kcsgscn_
oradebug poke
recover database;
alter database open;

事后总结
查询MOS,发现ORA-00600[kcbtema_10] Raised During Recovery Operations (Doc ID 472282.1)

--故障原因
The cause of this problem has been identified and verified in unpublished Bug 5184359 ORA-600 [KCBTEMA_10].
Due to this bug, during recovery, the class designation of a data block has changed.
--处理方法
SQL>startup mount
SQL>recover database;
SQL>alter database open;

因为MOS上给的解决思路在该数据库中已经无法尝试,不能确定该方法一定可行,但是对于本次的恢复过程中,没有任何直接recover database操作(只有一次不完全恢复)确实让人有无限的遗憾和可惜。对于本次应该先查询MOS,尝试该种方法,慎重使用_allow_resetlogs_corruption参数

ORACLE 12C dbms_utility.expand_sql_text 查看SQL视图基表

在ORACLE 12C之间的版本,如果一条sql中包含了N多视图,我们要查询这条sql访问了哪些基表,以及他们的关联条件是一件比较麻烦的时候,需要进入一个个视图然后一个个去分析,纯体力活,到了12C之后,ORACLE 提供了dbms_utility.expand_sql_text来获得某条sql所对应的全部基表,不用我们再一级一级的去看.

dbms_utility.expand_sql_text


创建测试表
三个表分别来自数据库的v$datafile,v$tablespace,v$tempfile视图

CDB_PDB@CHF> create table datafile
  2  as
  3  select * from v$datafile;
表已创建。
CDB_PDB@CHF> create table tablespace
  2  as
  3  select * from v$tablespace;
表已创建。
CDB_PDB@CHF> create table tempfile
  2  as
  3  select * from v$tempfile;
表已创建。

创建视图
创建三个视图,datafile与tablespace,tempfile与tablespace,这两个视图然后做union all产生新视图

CDB_PDB@CHF> create view v_xifenfei1 as
  2  select d.name dname,t.name tname from datafile d,tablespace t where d.ts#=t.ts#;
视图已创建。
CDB_PDB@CHF> create view v_xifenfei2 as
  2  select d.name dname,t.name tname from tempfile d,tablespace t where d.ts#=t.ts#;
视图已创建。
CDB_PDB@CHF> create view v_xifenfei
  2  as
  3  select * from v_xifenfei1
  4  union all
  5  select * from v_xifenfei2;
视图已创建。

找出查询视图sql对应基表

CDB_PDB@CHF>  set linesize 32767 pagesize 0 serveroutput on
CDB_PDB@CHF>  declare
  2     original_sql clob :='select * from v_xifenfei';
  3     expanded_sql clob := empty_clob();
  4  begin
  5      dbms_utility.expand_sql_text(original_sql,expanded_sql);
  6      dbms_output.put_line(expanded_sql);
  7  end;
  8  /
SELECT "A1"."DNAME" "DNAME","A1"."TNAME" "TNAME" FROM  ( (SELECT "A4"."DNAME" "DNAME","A4"."TNAME" "TNAME" FROM  (SELECT
 "A6"."NAME" "DNAME","A5"."NAME" "TNAME" FROM CHF."DATAFILE" "A6",CHF."TABLESPACE" "A5" WHERE "A6"."TS#"="A5"."TS#") "A4
") UNION ALL  (SELECT "A3"."DNAME" "DNAME","A3"."TNAME" "TNAME" FROM  (SELECT "A8"."NAME" "DNAME","A7"."NAME" "TNAME" FR
OM CHF."TEMPFILE" "A8",CHF."TABLESPACE" "A7" WHERE "A8"."TS#"="A7"."TS#") "A3")) "A1"
PL/SQL 过程已成功完成。

格式化sql语句

/* Formatted on 2013/8/24 22:33:33 (QP5 v5.227.12220.39754) */
SELECT "A1"."DNAME" "DNAME", "A1"."TNAME" "TNAME"
  FROM ( (SELECT "A4"."DNAME" "DNAME", "A4"."TNAME" "TNAME"
            FROM (SELECT "A6"."NAME" "DNAME", "A5"."NAME" "TNAME"
                    FROM CHF."DATAFILE" "A6", CHF."TABLESPACE" "A5"
                   WHERE "A6"."TS#" = "A5"."TS#") "A4")
        UNION ALL
        (SELECT "A3"."DNAME" "DNAME", "A3"."TNAME" "TNAME"
           FROM (SELECT "A8"."NAME" "DNAME", "A7"."NAME" "TNAME"
                   FROM CHF."TEMPFILE" "A8", CHF."TABLESPACE" "A7"
                  WHERE "A8"."TS#" = "A7"."TS#") "A3")) "A1"

这里就非常清晰的看到是datafile与tablespace、tempfile与tablespace做union all的sql语句

ORACLE 12C Windows-Linux 部署DATAGURAD

环境描述
win 64中的ORACLE 12C(primary)与Linux 64中的ORACLE 12C(standby)搭建datagurad

primary force logging

C:\Users\XIFENFEI>sqlplus / as sysdba
SQL*Plus: Release 12.1.0.1.0 Production on 星期六 8月 24 16:59:53 2013
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
连接到:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
CDB_CDB$ROOT@SYS> select banner from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
PL/SQL Release 12.1.0.1.0 - Production
CORE    12.1.0.1.0      Production
TNS for 64-bit Windows: Version 12.1.0.1.0 - Production
NLSRTL Version 12.1.0.1.0 - Production
CDB_CDB$ROOT@SYS> alter database force logging ;
数据库已更改。

primary rman backup

backup filesperset = 5 as compressed backupset database format 'd:/backup/rman/full_%U.rman';

primary 生成standby controlfile

CDB_CDB$ROOT@SYS> ALTER DATABASE CREATE STANDBY CONTROLFILE AS 'c:/control01.ctl';
数据库已更改。
CDB_CDB$ROOT@SYS> create pfile='e:/pfile.txt' from spfile;
文件已创建。

standby 参数文件

DB_CREATE_FILE_DEST='+DATA'
db_create_online_log_dest_1='+DATA'
db_unique_name='cdb_dg'
service_names='cdb'
log_archive_dest_1='LOCATION=/u02/app/oracle/archivelog/ valid_for=(all_logfiles,all_roles)'
log_archive_dest_2='service=primary lgwr async valid_for=(online_logfiles,primary_role) db_unique_name=cdb'
log_archive_config='dg_config=(cdb,cdb_dg)'
standby_file_management=auto
db_file_name_convert='E:\APP\XIFENFEI\ORADATA\','+DATA\CDB_DG\DATAFILE\'
log_file_name_convert='E:\APP\XIFENFEI\ORADATA\','+DATA\CDB_DG\LOGFILE\'
fal_server=primary

primary 修改参数

CDB_CDB$ROOT@SYS> alter system set log_archive_config='dg_config=(cdb,cdb_dg)';
系统已更改。
CDB_CDB$ROOT@SYS> alter system set log_archive_dest_2='
 2 service=standby lgwr async valid_for=(online_logfiles,primary_role) db_unique_name=cdb_dg';
系统已更改。
CDB_CDB$ROOT@SYS> alter system set log_archive_dest_1=
  2 'LOCATION=USE_DB_RECOVERY_FILE_DEST valid_for=(all_logfiles,all_roles)';
系统已更改。
CDB_CDB$ROOT@SYS> alter system set fal_server=standby;
系统已更改。
CDB_CDB$ROOT@SYS> alter system set standby_file_management=auto;
系统已更改。
CDB_CDB$ROOT@SYS> alter system set db_file_name_convert='+DATA\CDB_DG\DATAFILE\','E:\APP\XIFENFEI\ORADATA\' scope=spfile;
系统已更改。
CDB_CDB$ROOT@SYS> alter system set log_file_name_convert='+DATA\CDB_DG\LOGFILE\','E:\APP\XIFENFEI\ORADATA\' scope=spfile;
系统已更改。

primary and standby tns

STANDBY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.30.32 )(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = cdb)
    )
  )
PRIMARY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.30.1 )(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = cdb)
    )
  )

standby restore controlfile

ASMCMD> cp /tmp/CONTROL01.CTL control01.ctl
copying /tmp/CONTROL01.CTL -> +data/cdb/control01.ctl

standby password file

[oracle@xifenfei dbs]$ pwd
/u01/app/oracle/product/12.1/db_1/dbs
[oracle@xifenfei dbs]$ cp /tmp/rman/PWDcdb.ora  orapwcdb

standby mount

SYS% cdb> create spfile from pfile='initcdb.ora';
File created.
SYS% cdb> startup mount;
ORACLE instance started.
Total System Global Area  521936896 bytes
Fixed Size                  2290264 bytes
Variable Size             314576296 bytes
Database Buffers          197132288 bytes
Redo Buffers                7938048 bytes
Database mounted.
SYS% cdb> select banner from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
PL/SQL Release 12.1.0.1.0 - Production
CORE    12.1.0.1.0      Production
TNS for Linux: Version 12.1.0.1.0 - Production
NLSRTL Version 12.1.0.1.0 - Production
SYS% cdb> select name from v$datafile;
NAME
--------------------------------------------------------------------------------
E:\APP\XIFENFEI\ORADATA\CDB\SYSTEM01.DBF
E:\APP\XIFENFEI\ORADATA\CDB\PDBSEED\SYSTEM01.DBF
E:\APP\XIFENFEI\ORADATA\CDB\SYSAUX01.DBF
E:\APP\XIFENFEI\ORADATA\CDB\PDBSEED\SYSAUX01.DBF
E:\APP\XIFENFEI\ORADATA\CDB\UNDOTBS01.DBF
E:\APP\XIFENFEI\ORADATA\CDB\USERS01.DBF
E:\APP\XIFENFEI\ORADATA\CDB\PDB\SYSTEM01.DBF
E:\APP\XIFENFEI\ORADATA\CDB\PDB\SYSAUX01.DBF
E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF
9 rows selected.

standby rman restore

--清理控制文件中的备份集垃圾
DELETE noprompt OBSOLETE;
crosscheck backup;
delete noprompt expired backup;
--注册新备份集
catalog start with '/tmp/rman/';
--还原数据文件
run
{
set newname for database to '+data';
restore database;
switch datafile all;
}

standby clear redo

SYS% cdb> select group# from v$log;
    GROUP#
----------
         4
         6
         5
SYS% cdb> alter database clear logfile group 4;
Database altered.
SYS% cdb> alter database clear logfile group 5;
Database altered.
SYS% cdb> alter database clear logfile group 6;
Database altered.

standby add standby redolog

SYS% cdb> ALTER DATABASE ADD STANDBY LOGFILE GROUP 10  size 50M;
Database altered.
SYS% cdb> ALTER DATABASE ADD STANDBY LOGFILE GROUP 11  size 50M;
Database altered.
SYS% cdb> ALTER DATABASE ADD STANDBY LOGFILE GROUP 12  size 50M;
Database altered.
SYS% cdb> ALTER DATABASE ADD STANDBY LOGFILE GROUP 13  size 50M;
Database altered.

primary add standby redolog

CDB_CDB$ROOT@SYS> ALTER DATABASE ADD STANDBY LOGFILE GROUP 10  'E:\APP\XIFENFEI\ORADATA\CDB\std_redo10.log' size 50M;
数据库已更改。
CDB_CDB$ROOT@SYS> ALTER DATABASE ADD STANDBY LOGFILE GROUP 11  'E:\APP\XIFENFEI\ORADATA\CDB\std_redo11.log' size 50M;
数据库已更改。
CDB_CDB$ROOT@SYS> ALTER DATABASE ADD STANDBY LOGFILE GROUP 12  'E:\APP\XIFENFEI\ORADATA\CDB\std_redo12.log' size 50M;
数据库已更改。
CDB_CDB$ROOT@SYS> ALTER DATABASE ADD STANDBY LOGFILE GROUP 13  'E:\APP\XIFENFEI\ORADATA\CDB\std_redo13.log' size 50M;
数据库已更改。

standby readonly

SYS% cdb> ALTER DATABASE OPEN READ ONLY;
Database altered.

standby start mrp

SYS% cdb> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
Database altered.

DATAGURAD 是否正常检查

--primary
CDB_CDB$ROOT@SYS> archive log list;
数据库日志模式            存档模式
自动存档             启用
存档终点            USE_DB_RECOVERY_FILE_DEST
最早的联机日志序列     374
下一个存档日志序列   376
当前日志序列           376
--standby
Mon Aug 12 13:56:51 2013
All non-current ORLs have been archived.
Mon Aug 12 13:56:53 2013
Media Recovery Log /u02/app/oracle/archivelog/1_370_820595806.dbf
Mon Aug 12 13:56:57 2013
Media Recovery Log /u02/app/oracle/archivelog/1_371_820595806.dbf
Mon Aug 12 13:57:02 2013
Media Recovery Log /u02/app/oracle/archivelog/1_372_820595806.dbf
Mon Aug 12 13:57:04 2013
Media Recovery Log /u02/app/oracle/archivelog/1_373_820595806.dbf
Mon Aug 12 13:57:05 2013
Media Recovery Log /u02/app/oracle/archivelog/1_374_820595806.dbf
Media Recovery Waiting for thread 1 sequence 375
Mon Aug 12 13:57:19 2013
Primary database is in MAXIMUM PERFORMANCE mode
RFS[2]: Assigned to RFS process (PID:26114)
RFS[2]: No standby redo logfiles created for thread 1
RFS[2]: Opened log for thread 1 sequence 376 dbid 1937199326 branch 820595806
Mon Aug 12 13:57:19 2013
RFS[3]: Assigned to RFS process (PID:26118)
RFS[3]: Opened log for thread 1 sequence 375 dbid 1937199326 branch 820595806
Mon Aug 12 13:57:19 2013
Archived Log entry 16 added for thread 1 sequence 375 rlc 820595806 ID 0x7377d8de dest 2:
Mon Aug 12 13:57:22 2013
Media Recovery Log /u02/app/oracle/archivelog/1_375_820595806.dbf
Media Recovery Waiting for thread 1 sequence 376 (in transit)