drdb脑裂分析解决

1、脑裂原因
1.1)drdb两边的磁盘数据不一致,并且不知道自动恢复,举例说明产生该现象原因:
a是从节点、b是主节点
1.1.1)a节点磁盘不能正常写入数据(磁盘、主机、网络等原因)
1.1.2)a、b两个节点角色发生改变(a为主节点,b磁盘不可用,两边的数据未一致),a中对磁盘进行了操作
1.1.3)此时b磁盘恢复可用,但是因为a、b磁盘不一致,出现脑裂
1.2)官方描述
DRBD detects split brain at the time connectivity becomes available again and the peer nodes exchange the initial DRBD protocol handshake. If DRBD detects that both nodes are (or were at some point, while disconnected) in the primary role, it immediately tears down the replication connection. The tell-tale sign of this is a message like the following appearing in the system log:
Split-Brain detected, dropping connection!
2、出现脑裂的现象
2.1)开机界面

2.2)进入系统查看drdb状态
2.2.1)备节点
[root@node1 ~]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:13:40
0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r—-
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:620
2.2.2)主节点
[root@node1 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:13:40
m:res cs ro ds p mounted fstype
0:r0 StandAlone Secondary/Unknown UpToDate/DUnknown r—-
3、处理脑裂
需要选择一个节点为主节点(本实验选择node2为主节点)
3.1)从节点上
[root@node1 ~]# drbdadm secondary r0
[root@node1 ~]# drbdadm — –discard-my-data connect r0
[root@node1 ~]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:13:40
0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r—-
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:620
3.2)主节点,通过cat /proc/drbd查看状态,如果不是WFConnection状态,需要再手动连接
[root@node2 ~]# drbdadm connect r0
4、再次查看drdb状态
4.1)查看从节点
[root@node1 ~]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:13:40
0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r—-
ns:0 nr:1156 dw:1156 dr:0 al:0 bm:16 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
4.2)查看主节点
[root@node2 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:13:40
m:res cs ro ds p mounted fstype
0:r0 Connected Primary/Secondary UpToDate/UpToDate C /opt/mysql ext3
通过查看drbd状态发现,脑裂问题解决,node1中的数据和node2中的相同

heartbeat+drbd+mysql高可用配置

一、heartbeat
1、安装heartbeat
Heartbeat安装及简单配置
2、配置参数
2.1)/etc/ha.d/ha.cf
logfile /var/log/ha-log
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
ucast eth1 10.10.10.2
auto_failback off
watchdog /dev/watchdog
node node1
node node2
ping 192.168.1.1
respawn hacluster /usr/lib/heartbeat/ipfail
2.2)/etc/ha.d/haresources
node1 IPaddr::192.168.1.100/24/eth0/ drbddisk Filesystem::/dev/drbd0::/opt/mysql::ext3 mysql
2.3)/etc/ha.d/authkeys
auth 1
1 crc
二、DRBD
1、DRBD安装
DRBD安装配置说明
2、/etc/drbd.conf

global {
usage-count yes;
}
common {
  syncer { rate 10M; }
}
resource r0 {
protocol C;
disk {
    on-io-error   detach;
  }
net {
after-sb-0pri disconnect;
after-sb-1pri disconnect;
after-sb-2pri disconnect;
rr-conflict   disconnect;
}
on node1 {
    device     /dev/drbd0;
    disk       /dev/sdb1;
    address    10.10.10.1:7788;
    meta-disk  internal;
  }
on node2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.10.10.2:7788;
    meta-disk internal;
  }
}

三、mysql
1、mysql安装
mysql 5.5二进制文件安装
四、服务配置
1、配置mysql服务器(不要设置开机启动,由heartbeat控制)
修改mysql.server文件
basedir=/opt/mysql/product/5.5
datadir=/opt/mysql/mysqldata
mv /opt/mysql/product/5.5/support-files/mysql.server /etc/init.d/mysql
2、配置heartbeat
chkconfig –add heartbeat
chkconfig heartbeat on

mysql 5.5二进制文件安装

1、创建相关目录用户
su – root
groupadd mysql
useradd -g mysql -p xifenfei -s /bin/bash -m mysql -d /opt/mysql
MYSQL_BASE=/opt/mysql
mkdir -p $MYSQL_BASE/product/5.5
mkdir -p $MYSQL_BASE/mysqldata
mkdir -p $MYSQL_BASE/mysqllog
mkdir -p /var/run/mysqld/
chmod -R 777 $MYSQL_BASE
chown -R mysql:mysql $MYSQL_BASE
chown -R mysql:mysql /var/run/mysqld
3、环境变量配置
export MYSQL_BASE=/opt/mysql
export basedir=$MYSQL_BASE/product/5.5
export datadir=$MYSQL_BASE/mysqldata
export LD_LIBRARY_PATH=$basedir/lib:/lib:/usr/lib:/usr/local/lib
export TMPDIR=/tmp
export PATH=${PATH}:/usr/bin:/bin:/usr/bin/X11:/usr/local/bin
export PATH=$basedir:$basedir/bin:${PATH}:$LD_LIBRARY_PATH
4、调整mysql 用户系统限制(可选)
编辑文件:/etc/security/limits.conf 加入以下语句:
mysql soft nproc 2047
mysql hard nproc 16384
mysql soft nofile 1024
mysql hard nofile 65536
确认如下语句是否存在于/etc/pam.d/login,如果不存在请增加:
session required pam_limits.so
如果用户SHELL 用的是Bourne, Bash, 或者Korn shell 修改/etc/profile,并增加如
下内容。(在SuSe 操作系统上,则需要修改/etc/profile.local):
if [ $USER = “mysql” ]; then
if [ $SHELL = “/bin/ksh” ]; then
ulimit -p 16384
ulimit -n 65536
else
ulimit -u 16384 -n 65536
fi
fi
5、安装数据库
5.1)解压数据库至/opt/mysql/product/5.5中
5.2)安装数据
[mysql@localhost ~]$ cd product/5.5/scripts/
[mysql@localhost scripts]$ ./mysql_install_db –basedir=$basedir –datadir=$datadir –user=mysql
5.3)设置my.cnf参数
vi /etc/my.cnf

[mysqladmin]
socket =/var/run/mysqld/mysqld.sock
[client]
port = 3306
socket =/var/run/mysqld/mysqld.sock
[mysqld]
port = 3306
socket = /var/run/mysqld/mysqld.sock
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
basedir = /opt/mysql/product/5.5
datadir = /opt/mysql/mysqldata
skip-external-locking = 1
interactive_timeout = 1200
wait_timeout = 1200
character-set-server = utf8
back_log = 500
default_time_zone = '+08:00'
max_connections = 1000
max_connect_errors = 1000
connect_timeout = 300
table_open_cache = 100
max_allowed_packet = 16M
binlog_cache_size = 8M
max_heap_table_size = 64M
sort_buffer_size = 8M
join_buffer_size = 8M
thread_cache_size = 32
thread_concurrency = 16
query_cache_size = 0M
default-storage-engine = INNODB
thread_stack = 192K
transaction_isolation = READ-COMMITTED
tmp_table_size = 64M
log-bin =/opt/mysql/mysqllog/mysqlbin
log-error=/opt/mysql/mysqllog/mysqld.err
expire_logs_days = 7
binlog_format = ROW
max_binlog_size = 300M
slow_query_log = 1
slow_query_log_file =/opt/mysql/mysqllog/mysqld-slow
long_query_time = 10
tmpdir = /tmp
server-id = 1
key_buffer_size = 8M
read_buffer_size = 1M
read_rnd_buffer_size = 1M
bulk_insert_buffer_size = 1M
myisam_sort_buffer_size = 128K
innodb_additional_mem_pool_size = 16M
innodb_buffer_pool_size = 2G
innodb_data_file_path = ibdata1:1024M:autoextend
innodb_autoextend_increment = 64
innodb_file_per_table = 1
innodb_data_home_dir = /opt/mysql/mysqldata
innodb_file_io_threads = 4
innodb_thread_concurrency = 16
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 8M
innodb_log_file_size = 512M
innodb_log_files_in_group = 3
innodb_max_dirty_pages_pct = 40
innodb_lock_wait_timeout = 120
innodb_locks_unsafe_for_binlog = 1
innodb_autoinc_lock_mode = 2
skip-name-resolve
lower_case_table_names=1
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
auto-rehash
[myisamchk]
key_buffer_size = 512M
sort_buffer_size = 512M
read_buffer = 8M
write_buffer = 8M
[mysqlhotcopy]
interactive-timeout
[mysqld_safe]
open-files-limit = 8192

5.4)异常处理
5.4.1)直接通过mysqld_safe不能正常启动数据库
ln -s /opt/mysql/product/5.5 /usr/local/mysql
或者
ln -s /opt/mysql/product/5.5/bin/mysqld /usr/local/mysql/bin/mysqld
5.4.2)mysql_secure_installation不能正常执行
1)
执行:/opt/mysql/product/5.5/bin/mysql_secure_installation出现以下错误,解决方法(ln -s /var/run/mysqld/mysqld.sock /tmp/mysql.sock)
Enter current password for root (enter for none):
ERROR 2002 (HY000): Can’t connect to local MySQL server through socket ‘/tmp/mysql.sock’ (2)
2)
修改mysql_secure_installation 脚本的do_query 函数(推荐处理方法)
do_query() {
echo “$1″ >$command
#sed ‘s,^,> ,’ < $command # Debugging # mysql --defaults-file=$config <$command ----此行修改如下: mysql --defaults-extra-file=$config <$command return $? } 6、添加开机启动服务
修改mysql.server文件
basedir=/opt/mysql/product/5.5
datadir=/opt/mysql/mysqldata
[root@localhost ~]# cp /opt/mysql/product/5.5/support-files/mysql.server /etc/init.d/mysqld
[root@localhost ~]# chkconfig –add mysqld
[root@localhost ~]# chkconfig mysqld on
[root@localhost ~]# chkconfig –list mysqld
mysqld 0:off 1:off 2:on 3:on 4:on 5:on 6:off

oracle 插入含&字符串

1、创建表
SQL> create table t(id number,name varchar2(20));
表已创建。
2、常规方式插入
SQL> insert into t values(1,’a&b’);
输入 b 的值: a&b
原值 1: insert into t values(1,’a&b’)
新值 1: insert into t values(1,’aa&b’)
已创建 1 行。
3、chr(38)插入
SQL> insert into t values(2,’c’||chr(38)||’d’);
已创建 1 行。
4、set define
SQL> set define `
SQL> insert into t values(3,’e&f’);
已创建 1 行。
5、查看结果
SQL> select * from t;
ID NAME
———- ——————–
1 aa&b
2 c&d
3 e&f
6、说明
第一种方法不能得到正常的,我想要的结果,而且第一种方法在应用程序中也比较难实现
第二种比较适合应用程序实现
第三种比较适合sqlplus进行批量处理数据

DRBD安装配置说明

一、安装前说明
DRBD(Distributed Replicated Block Device),分布式复制块设备,是一种通过TCP/IP网络实现块设备数据实时镜像的方案。利用这种方案,单一主节点模式(single primary mode)双机系统能够实时地将业务数据保存在主备节点的磁盘中,正常情况下两个节点的数据是一模一样的。
根据官方的说明文档,如果系统内核(linux)版本低于2.6.33,在安装软件之前需要加载DRBD模块,如果高于(或等于)2.6.33,则只安装客户端软件。
如果没有安装DRBD模块,会在启动drbd时出现如下错误:
[root@node1 tmp]# /etc/init.d/drbd start
Starting DRBD resources: Can not load the drbd module.
其实就是安装一个kmod-drbd包即可
二、安装前环境设定
主机名        IP地址       DRBD使用磁盘
node1     10.10.10.1    /dev/sdb1
node2     10.10.10.2    /dev/sdb1
版本信息
[root@node1 /]# uname -a
Linux node1 2.6.9-89.0.0.0.1.ELsmp #1 SMP Tue May 19 04:23:49 EDT 2009 i686 i686 i386 GNU/Linux
[root@node2 test]# uname -a
Linux node2 2.6.9-89.0.0.0.1.ELsmp #1 SMP Tue May 19 04:23:49 EDT 2009 i686 i686 i386 GNU/Linux
挂载点:/drbd
三、安装步骤(两个节点相同)
1、安装kmod-drbd包
[root@node1 ~]# rpm -ivh /tmp/kmod-drbd83-smp-8.3.8-1.el4_8.i686.rpm
警告:/tmp/kmod-drbd83-smp-8.3.8-1.el4_8.i686.rpm: V3 DSA 簽章:NOKEY, key ID 443e1821
準備中… ########################################### [100%]
1:kmod-drbd83-smp ########################################### [100%]
2、安装drbd包
[root@node1 ~]# rpm -ivh /tmp/drbd83-8.3.8-1.el4_8.i386.rpm
警告:/tmp/drbd83-8.3.8-1.el4_8.i386.rpm: V3 DSA 簽章:NOKEY, key ID 443e1821
準備中… ########################################### [100%]
1:drbd83 ########################################### [100%]
四、fdisk磁盘(两个节点均要)
[root@node1 ~]# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won’t be recoverable.
The number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-2610, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-2610, default 2610): 200
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
五、/etc/drbd.conf配置(两个节点一样)

global {
usage-count yes;
}
common {
  syncer { rate 10M; }
}
resource r0 {
protocol C;
disk {
    on-io-error   detach;
  }
net {
after-sb-0pri disconnect;
after-sb-1pri disconnect;
after-sb-2pri disconnect;
rr-conflict   disconnect;
}
on node1 {
    device     /dev/drbd0;
    disk       /dev/sdb1;
    address    10.10.10.1:7788;
    meta-disk  internal;
  }
on node2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.10.10.2:7788;
    meta-disk internal;
  }
}

六、创建DRBD资源
[root@node1 tmp]# drbdadm create-md r0 //r0为配置文件中定义的资源名
[root@node2 tmp]# drbdadm create-md r0
七、DRBD的启动、关闭以及测试
1、启动DRBD
[root@node1 ~]# /etc/init.d/drbd start
[root@node2 ~]# /etc/init.d/drbd start
2、查看状态(两种方法均可)
[root@node1 tmp]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:13:40
0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r—-
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1606380
[root@node2 ~]# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:13:40
m:res cs ro ds p mounted fstype
0:r0 Connected Secondary/Secondary Inconsistent/Inconsistent C
3、设置主机
[root@node1 tmp]# drbdsetup /dev/drbd0 primary -o
4、再查看状态
[root@node1 tmp]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:13:40
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r—-
ns:15520 nr:0 dw:0 dr:15520 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1590860
[>………………..] sync’ed: 1.3% (1590860/1606380)K delay_probe: 0
finish: 0:03:23 speed: 7,760 (7,760) K/sec
[root@node2 ~]# /etc/init.d/drbd status //表示数据初始化已经完成
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:13:40
m:res cs ro ds p mounted fstype
0:r0 Connected Secondary/Primary UpToDate/UpToDate C
5、主节点磁盘操作
[root@node1 tmp]# mkfs.ext3 /dev/drbd0
mke2fs 1.35 (28-Feb-2004)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
200928 inodes, 401595 blocks
20079 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=415236096
13 block groups
32768 blocks per group, 32768 fragments per group
15456 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 29 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
[root@node1 tmp]# mkdir /drbd
[root@node1 tmp]# mount /dev/drbd0 /drbd
[root@node1 tmp]# df -h
Filesystem 容量 已用 可用 已用% 挂载点
/dev/sda2 18G 9.1G 7.7G 55% /
/dev/sda1 99M 13M 82M 14% /boot
none 395M 0 395M 0% /dev/shm
/dev/drbd0 1.6G 35M 1.4G 3% /drbd
[root@node1 tmp]# cd /drbd/
[root@node1 drbd]# ll
总用量 16
drwx—— 2 root root 16384 7月 14 15:01 lost+found
[root@node1 drbd]# mkdir test
[root@node1 drbd]# mkdir test
[root@node1 drbd]# cd test
[root@node1 test]# vi xifenfei.cf
drbd 测试 //vi 写入xifenfei.cf中数据
6、启用备机磁盘
[root@node1 /]# umount /drbd/
[root@node1 /]# drbdsetup 0 secondary
[root@node2 ~]# drbdadm primary r0
[root@node2 ~]# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:13:40
m:res cs ro ds p mounted fstype
0:r0 Connected Primary/Secondary UpToDate/UpToDate C
[root@node2 ~]# mkdir /drbd
[root@node2 ~]# mount /dev/drbd0 /drbd
[root@node2 ~]# df -h
Filesystem 容量 已用 可用 已用% 挂载点
/dev/sda2 18G 9.1G 7.7G 55% /
/dev/sda1 99M 13M 82M 14% /boot
none 395M 0 395M 0% /dev/shm
/dev/drbd0 1.6G 35M 1.4G 3% /drbd
[root@node2 ~]# cd /drbd
[root@node2 drbd]# ll
总用量 20
drwx—— 2 root root 16384 7月 14 15:01 lost+found
drwxr-xr-x 2 root root 4096 7月 14 15:05 test
[root@node2 drbd]# cd test/
[root@node2 test]# ll
总用量 4
-rw-r–r– 1 root root 10 7月 14 15:05 chengfei.cf
[root@node2 test]# more xifenfei.cf
drbd 测试 //主节点中 vi 写入xifenfei.cf中数据

heartbeat中ha.cf参数说明

#debugfile /var/log/ha-debug
说明:调试日志文件文件,取默认值
logfile /var/log/ha-log
说明:系统运行日志文件,取默认值
#logfacility local0
说明:用于syslog()/logger的设备
keepalive 2
说明:心跳频率,自己设定。1:表示1秒;200ms:表示200毫秒
deadtime 30
说明:节点死亡时间阀值,就是从节点在过了10后还没有收到心跳就认为主节点死亡,自己设定
warntime 10
说明:发出警告时间,自己设定
udpport 694
说明:心跳信息传递的udp端口,自己设定
#bcast eth0 # Linux
说明:采用udp广播播来通知心跳,建议在副节点不只一台时使用
ucast eth0 172.30.31.68
说明:采用网卡eth0的udp单播来通知心跳,eth0的IP
#mcast eth0 225.0.0.1 694 1 0
说明:采用udp多播播来通知心跳,建议在副节点不只一台时使用
auto_failback off
说明:主节点重启成功后,资源是自动拿回到主节点还是等到副节点down调后拿回资源
node heartbeat1
说明:主节点名称,与uname –n保持一致。排在第一的默认为主节点,所以不要搞措顺序
node heartbeat2
说明:副节点名称,与uname –n保持一致
watchdog /dev/watchdog
说明:看门狗。如果本节点在超过一分钟后还没有发出心跳,那么本节点自动重启
以上这些是我个人认为必配项,下面这些是可选项
stonith baytech /etc/ha.d/conf/stonith.baytech
说明:主/副等所有节点的一种校验。
respawn userid /path/name/to/run
说明:和heartbeat必须一起启动的本地服务
ping 10.10.10.254
说明:伪节点IP,伪节点就是其失效时主/副节点不会正常工作但本身不是主/副节点之一。
respawn hacluster /usr/lib/heartbeat/ipfail
说明:与ping选项一起使用,取默认值。
baud 19200
说明:串口波特率,与serial一起使用。
serial /dev/ttyS0 # Linux
说明:采用串口来传递心跳信息。

Heartbeat安装及简单配置

1、创建用户和组
[root@node1 ~]# groupadd -g 694 haclient
[root@node1 ~]# useradd -u 694 -g haclient hacluster

2、Heartbeat安装
1)安装libnet
[root@node1 software]# pwd
/tmp/software
[root@node1 software]# ll
总计 4200
-rw-r–r– 1 root root 3267773 08-16 18:51 heartbeat-2.0.8.tar.gz
-rw-r–r– 1 root root 1021236 08-16 18:51 libnet-1.1.2.1.tar.gz
[root@node1 software]# tar xf libnet-1.1.2.1.tar.gz
[root@node1 sofeware]# cd libnet
[root@node1 sofeware]#./configure
[root@node1 sofeware]#make
[root@node1 sofeware]#make install
2)安装heartbeat
[root@node1 software]# tar xf heartbeat-2.0.8.tar.gz
[root@node1 sofeware]# cd heartbeat-2.0.8
[root@node1 heartbeat-2.0.8]#./ConfigureMe configure –disable-swig  –disable-snmp-subagent
[root@node1 heartbeat-2.0.8]#make
[root@node1 heartbeat-2.0.8]#make install
[root@node1 heartbeat-2.0.8]# cp doc/ha.cf doc/authkeys doc/haresources /etc/ha.d/
#Heartbeat的主要配置文件有ha.cf,authkeys和haresources,在Heartbeat安装后,默认并没有这3个文件,可以从官网上下载,也可以从解压出来的源码目录中找到,所以我们这里直接在源码目录中拷贝即可。
3、修改参数
1)ha.cf
#logfacility     local0                 #可注释掉此选项,开启下面的日志路径;
logfile        /var/log/ha-log          #设置heartbeat日志存放位置;
keepalive 2 #设定心跳(监测)时间时间为2秒;
warntime 5 #连续多长时间联系不上后开始警告提示;
deadtime 20 #连续多长时间联系不上后认为对方挂掉了(单位是妙);
initdead 120 #这里主要是给重启后预留的一段忽略时间段(比如:重启后启动网络等,如果在网络还没有通,keepalive检测肯定通不过,但这时候并不能切换),此值至少为deadtime的两倍;
udpport 694 #设置广播通信的端口,默认为694;
baud 19200        #设置串行通讯的波特率;
bcast eth1 #指明心跳使用以太网的广播方式,并且在eth1口上进行广播;
ucast   eth1 10.10.10.2  #单播(广播,单播选择其一)
auto_failback off #恢复正常后是否需要再自动切换回来,此处off说明恢复后不需要切换;
node node1        #主节点主机名,可以通过“uname -n”查看;
node node2     #备用节点主机名;
ping 192.168.0.254 #测试网络连通性,此处自定义,一般设为网关地址,但要保证是通的;
respawn hacluster /usr/lib/heartbeat/ipfail #可选,列出和heartbeat一起启动和关闭的进程;
2)Haresources
node1 IPaddr::192.168.1.100/24/eth0/ Filesystem::/dev/sdc1::/shared::ext3 cups
node-name resource1 resource2 … resourceN
其中node-name即为集群中某一节点的名称,必须与uname –n相同,
后面的资源组resource1 resource2 …resourceN中每一个资源都是一个shell脚本,它们的搜索路径为/etc/init.d/和/usr/local/etc/ha.d/resource.d(该路径根据你所安装heartbeat的路径有所不同),heartbeat为我们提供了一个非常好的资源扩展框架,如果我们需要控制一种自己的资源,只需要实现一个支持start和stop参数的shell脚本就可以了,目前heartbeat所支持的资源脚本可以在我提供的上述路径中去查看。
1) 资源组的第一列是我们在ha.cf配置文件中的node之一,而且应该是当前准备作为primary节点的那一个node;
2)每一行代表一个资源组,如果一行写不下可以用” “换行;
3)资源组启动顺序是从左往右,关闭的顺序是从右往左;
4)脚本的参数通过::来分隔和传递;
5)一个资源组里面不同资源之间以空格分隔;
6)不同的资源组之间没有必然关系;
7)每个资源都是一个角本,可以是在/etc/init.d目录下面的,也可以是/usr/local/etc/ha.d/resource.d目录下面的角本。这些角本必须要支持xxx start;xxx stop;模式;

3)Authkeys
[root@node1 ~]# vim /etc/ha.d/authkeys
auth 1
1 crc
[root@node1 ~]#chmod 600 /etc/ha.d/authkeys
我们如果要采用sha1算法,只需要将authkeys中的auth 指令(去掉注释符)改为2,而对应的2 sha1行则需要去掉注释符(#),后面的密钥自己改变(两节点上必须相同)

ORA-01578坏块解决(2)

ORA-01578坏块解决(1)续集
如果在坏块之前,有rman备份,可以使用rman的备份来进行恢复,确保数据不会被丢失
1、使用rman进行恢复
[oracle@ECP-UC-DB1 ~]$ $ORACLE_HOME/bin/rman target /
Recovery Manager: Release 10.2.0.4.0 – Production on Sun Aug 14 22:21:13 2011
Copyright (c) 1982, 2007, Oracle. All rights reserved.
connected to target database: TEST (DBID=2056006906)
RMAN> blockrecover datafile 6 block 1477;
Starting blockrecover at 2011-08-14 22:21:16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=145 devtype=DISK
channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00006
channel ORA_DISK_1: reading from backup piece /tmp/0fmk0ii5_1_1
channel ORA_DISK_1: restored block(s) from backup piece 1
piece handle=/tmp/0fmk0ii5_1_1 tag=TAG20110814T213357
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:02
starting media recovery
media recovery complete, elapsed time: 00:00:03
Finished blockrecover at 2011-08-14 22:21:23
2、检查坏块是否被恢复
RMAN> backup check logical validate datafile 6;
Starting backup at 2011-08-14 22:22:11
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00006 name=/opt/oracle/oradata/test/xifenfei01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 2011-08-14 22:22:12
RMAN> exit
Recovery Manager complete.
[oracle@ECP-UC-DB1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 – Production on Sun Aug 14 22:22:17 2011
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> select file#,block#,blocks from v$database_block_corruption;
no rows selected
SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@ECP-UC-DB1 ~]$ dbv file =/opt/oracle/oradata/test/xifenfei01.dbf
DBVERIFY: Release 10.2.0.4.0 – Production on Sun Aug 14 22:22:38 2011
Copyright (c) 1982, 2007, Oracle. All rights reserved.
DBVERIFY – Verification starting : FILE = /opt/oracle/oradata/test/xifenfei01.dbf
DBVERIFY – Verification complete
Total Pages Examined : 2560
Total Pages Processed (Data) : 1372
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing (Index): 0
Total Pages Processed (Other): 48
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 1140
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 1256690 (0.1256690)
3、验证数据是否正确
[oracle@ECP-UC-DB1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 – Production on Sun Aug 14 22:34:18 2011
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> select count(*) from t_rep;
COUNT(*)
———-
49857
ORA-01578坏块解决(1)中的模拟环境比较,数据恢复正确,坏块问题解决

ORA-01578坏块解决(1)

一、创建测试表
SQL> create table t_rep as
2  select * from all_objects;
Table created.
SQL> select count(*) from  t_rep;
COUNT(*)
———-
49857
二、使用bbed修改数据块
三、错误现象
1、sqlplus窗口
SQL> select count(*) from  t_rep;
select count(*) from  t_rep
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 6, block # 1477)
ORA-01110: data file 6: ‘/opt/oracle/oradata/test/xifenfei01.dbf’
2、alert.log文件中
Sun Aug 14 22:01:14 2011
Hex dump of (file 6, block 1477) in trace file /opt/oracle/admin/test/udump/test_ora_10785.trc
Corrupt block relative dba: 0x018005c5 (file 6, block 1477)
Bad check value found during buffer read
Data in bad block:
type: 6 format: 2 rdba: 0x018005c5
last change scn: 0x0000.001328ef seq: 0x2 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x28ef0602
check value in block header: 0x493
computed block checksum: 0x44b9
Reread of rdba: 0x018005c5 (file 6, block 1477) found same corrupted data
Sun Aug 14 22:01:15 2011
Corrupt Block Found
TSN = 6, TSNAME = XFF
RFN = 6, BLK = 1477, RDBA = 25167301
OBJN = 52727, OBJD = 52728, OBJECT = T_REP, SUBOBJECT =
SEGMENT OWNER = SYS, SEGMENT TYPE = Table Segment
四、验证是否真的坏块
1、dbv验证
[oracle@ECP-UC-DB1 ~]$ dbv file =/opt/oracle/oradata/test/xifenfei01.dbf
DBVERIFY: Release 10.2.0.4.0 – Production on Sun Aug 14 22:08:37 2011
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
DBVERIFY – Verification starting : FILE = /opt/oracle/oradata/test/xifenfei01.dbf
Page 1477 is marked corrupt
Corrupt block relative dba: 0x018005c5 (file 6, block 1477)
Bad check value found during dbv:
Data in bad block:
type: 6 format: 2 rdba: 0x018005c5
last change scn: 0x0000.001328ef seq: 0x2 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x28ef0602
check value in block header: 0x493
computed block checksum: 0x44b9
DBVERIFY – Verification complete
Total Pages Examined         : 2560
Total Pages Processed (Data) : 1371
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 48
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 1140
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 0
Highest block SCN            : 1256043 (0.1256043)
2、rman验证
RMAN> backup check logical validate datafile 6;
Starting backup at 2011-08-14 22:09:51
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=157 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00006 name=/opt/oracle/oradata/test/xifenfei01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 2011-08-14 22:09:53
RMAN> exit
Recovery Manager complete.
[oracle@ECP-UC-DB1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 – Production on Sun Aug 14 22:10:00 2011
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL>  select file#,block#,blocks from v$database_block_corruption;SQL> exec dbms_repair.skip_corrupt_blocks(‘SYS’,’T_REP’);
FILE#     BLOCK#     BLOCKS
———- ———- ———-
6       1477          1
五、跳过坏块读取其他数据
SQL> exec dbms_repair.skip_corrupt_blocks(‘SYS’,’T_REP’);
PL/SQL procedure successfully completed.
SQL> select skip_corrupt from dba_tables where table_name=’T_REP’;
SKIP_COR
——–
ENABLED
SQL> select count(*) from t_rep;
COUNT(*)
———-
49794
说明:数据发生丢失6号文件的1477块中的数据丢失

物化视图on prebuilt table

1、ORA-12059
执行语句:
CREATE MATERIALIZED VIEW mv_t2
on prebuilt TABLE
AS
SELECT * FROM SCOTT.emp
错误提示:
ORA-12059: prebuilt table “CHF”.”MV_T2″ does not exist
错误原因:
物化视图对应的表不存在(物化视图需要和表同名,结构相同)
解决方法:
CREATE TABLE mv_t2 AS
SELECT * FROM scott.emp WHERE 1=0;
2、ORA-23413
执行语句:
CREATE MATERIALIZED VIEW mv_t2
on prebuilt TABLE
WITH REDUCED PRECISION
refresh FAST on demand
AS
SELECT * FROM SCOTT.emp
错误提示:
ORA-23413: table “SCOTT”.”EMP” does not have a materialized view log
错误原因:
使用fast模式刷新物化视图,需要有物化视图日志
解决方法:
1)创建物化视图日志
CREATE MATERIALIZED VIEW LOG ON scott.emp ;
2)刷新模式改为force,其实实质是采用了complete刷新模式
3、ORA-12058
执行语句(mv log是基于rowid,表无主键)
CREATE MATERIALIZED VIEW mv_t2
on prebuilt TABLE
WITH REDUCED PRECISION
refresh FAST on DEMAND
WITH ROWID
AS
SELECT * FROM SCOTT.emp
错误提示:
ORA-12058: materialized view cannot use prebuilt table
错误原因:
fast刷新模式不能基于rowid进行
解决方法:
1)删除mv log,表添加主键,采用基于主键模式重新建mv log和物化视图
DROP MATERIALIZED VIEW mv_t1;
ALTER TABLE mv_t2 ADD PRIMARY KEY (EMPNO);
CREATE MATERIALIZED VIEW LOG ON scott.emp;
CREATE MATERIALIZED VIEW mv_t2
on prebuilt TABLE
WITH REDUCED PRECISION
refresh FAST on DEMAND
–WITH ROWID
AS
SELECT * FROM SCOTT.emp;
2)刷新模式改为force,其实实质是采用了complete刷新模式
4、总结
1)在执行创建物化视图之前,需要先创建同名、同结构表
2)如果使用fast模式刷新(疑惑在线重定义可以通过rowid实现,为什么直接通过物化视图就不可以),物化视图同表需要主键,需要物化视图日志
3)通过drop mv,发现同名表还存在,和on prebuilt table本质区别