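Contact: mobile/WeChat (+86 17813235971) QQ (107644445)
Title: Recovering Oracle after "mount: wrong fs type, bad option, bad superblock"
Author: 惜分飞 © All rights reserved [May not be reproduced in any form without my consent; otherwise I reserve the right to pursue legal liability.]
A friend came to us saying that after shrinking an LV, the file system could no longer be mounted and reported a corrupted superblock.
Attempting to mount fails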
[root@GZGSDB data]# mount /dev/vg_gzgsdb/lv_home /home
mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg_gzgsdb-lv_home,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
Errors in the system log
Aug 16 21:23:24 GZGSDB kernel: EXT4-fs (dm-5): ext4_check_descriptors: Block bitmap for group 1 not in group (block 0)!
Aug 16 21:23:24 GZGSDB kernel: EXT4-fs (dm-5): group descriptors corrupted!
This makes the mistake obvious: the LV was shrunk directly, without shrinking the file system first.
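For contrast, the safe order is the reverse of what happened here: shrink the file system first, and only then the LV. The following is a minimal sketch, assuming an ext4 file system that can be unmounted. The device path, mount point and target size are hypothetical placeholders, and the function only prints the commands instead of running them.

```shell
#!/bin/sh
# Print (do not run) the command order for shrinking an ext4 file system on LVM.
# All arguments are hypothetical placeholders; no real device is touched.
print_shrink_plan() {
    dev=$1      # logical volume path, e.g. /dev/vg_example/lv_example
    mnt=$2      # mount point of the file system
    size=$3     # target size, e.g. 800G
    echo "umount $mnt"
    echo "e2fsck -f $dev"            # resize2fs wants a forced check before shrinking
    echo "resize2fs $dev $size"      # step 1: shrink the file system
    echo "lvreduce -L $size $dev"    # step 2: only then shrink the LV
    echo "mount $dev $mnt"
}

print_shrink_plan /dev/vg_example/lv_example /mnt/example 800G
```

lvreduce also has a -r (--resizefs) option that resizes the file system together with the LV, which avoids getting the order wrong by hand. Either way, a verified backup should exist before any shrink.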
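It is also clear that many more operations were performed after the failure, including fsck and testdisk. None of them recovered the data, and the customer then asked us for help.
Locating the backup superblocks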
[root@GZGSDB dul]# dumpe2fs /dev/vg_gzgsdb/lv_home |grep superblock
dumpe2fs 1.41.12 (17-May-2010)
ext2fs_read_bb_inode: A block group is missing an inode table
  Primary superblock at 0, Group descriptors at 1-52
  Backup superblock at 32768, Group descriptors at 32769-32820
  Backup superblock at 98304, Group descriptors at 98305-98356
  Backup superblock at 163840, Group descriptors at 163841-163892
  Backup superblock at 229376, Group descriptors at 229377-229428
  Backup superblock at 294912, Group descriptors at 294913-294964
  Backup superblock at 819200, Group descriptors at 819201-819252
  Backup superblock at 884736, Group descriptors at 884737-884788
  Backup superblock at 1605632, Group descriptors at 1605633-1605684
  Backup superblock at 2654208, Group descriptors at 2654209-2654260
  Backup superblock at 4096000, Group descriptors at 4096001-4096052
  Backup superblock at 7962624, Group descriptors at 7962625-7962676
  Backup superblock at 11239424, Group descriptors at 11239425-11239476
  Backup superblock at 20480000, Group descriptors at 20480001-20480052
  Backup superblock at 23887872, Group descriptors at 23887873-23887924
  Backup superblock at 71663616, Group descriptors at 71663617-71663668
  Backup superblock at 78675968, Group descriptors at 78675969-78676020
  Backup superblock at 102400000, Group descriptors at 102400001-102400052
  Backup superblock at 214990848, Group descriptors at 214990849-214990900
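The locations in this listing follow the ext2/3/4 sparse_super layout: backups sit in block group 1 and in groups whose numbers are powers of 3, 5 and 7. The sketch below recomputes them, which can help when dumpe2fs itself can no longer read the device. The 32768 blocks per group is taken from the first backup location above; the group count of 6568 is an assumption (roughly 821G divided by 128M per group), not a value reported by the tool.

```shell
#!/bin/sh
# List the block numbers of backup superblocks under the sparse_super layout:
# group 1 plus every group number that is a power of 3, 5 or 7.
backup_superblocks() {
    ngroups=$1      # number of block groups (assumed in the example call)
    per_group=$2    # blocks per group
    {
        echo 1
        for base in 3 5 7; do
            g=$base
            while [ "$g" -lt "$ngroups" ]; do
                echo "$g"
                g=$((g * base))
            done
        done
    } | sort -n | while read -r g; do
        echo $((g * per_group))    # first block of group g holds the backup
    done
}

backup_superblocks 6568 32768
```

With these two inputs the function prints the same 18 block numbers as the dumpe2fs listing, from 32768 up to 214990848. Any of them can be passed to e2fsck with -b.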
Repairing with fsck
[root@GZGSDB data]# fsck -y /dev/vg_gzgsdb/lv_home
fsck from util-linux-ng 2.17.2
e2fsck 1.41.12 (17-May-2010)
fsck.ext4: Group descriptors look bad... trying backup blocks...
fsck.ext4: The ext2 superblock is corrupt when using the backup blocks
fsck.ext4: going back to original superblock
fsck.ext4: Device or resource busy while trying to open /dev/mapper/vg_gzgsdb-lv_home
Filesystem mounted or opened exclusively by another program?
-- Repair with an explicitly specified superblock
[root@GZGSDB data]# fsck -y -b 102400000 /dev/vg_gzgsdb/lv_home
…………
Illegal block #0 (1296647770) in inode 354315.  CLEARED.
Illegal block #1 (1398362886) in inode 354315.  CLEARED.
Illegal block #3 (453538936) in inode 354315.  CLEARED.
Illegal block #5 (808333361) in inode 354315.  CLEARED.
Illegal block #6 (775434798) in inode 354315.  CLEARED.
Illegal block #8 (1180306180) in inode 354315.  CLEARED.
Illegal block #9 (1413893971) in inode 354315.  CLEARED.
Illegal block #10 (1229347423) in inode 354315.  CLEARED.
Illegal block #11 (1498613325) in inode 354315.  CLEARED.
Illegal indirect block (1296389203) in inode 354315.  CLEARED.
Inode 354315 is too big.  Truncate? yes
Block #1074301965 (69632) causes directory to be too big.  CLEARED.
Warning... fsck.ext4 for device /dev/mapper/vg_gzgsdb-lv_home exited with signal 11.
[root@GZGSDB data]# mount /dev/vg_gzgsdb/lv_home /home
mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg_gzgsdb-lv_home,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
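A less invasive way to test a backup superblock is a read-only mount with the sb= option, which writes nothing to the device. mount counts sb= in 1K units, while the dumpe2fs listing counts in file system blocks, so the number has to be converted. The sketch below does that arithmetic and prints the resulting command without running it. The 4096-byte block size is an inference from the 32768 blocks per group, and nothing in this case suggests such a mount would have succeeded.

```shell
#!/bin/sh
# Convert a backup superblock location from file system blocks to the
# 1K units expected by "mount -o sb=".
sb_option() {
    block=$1         # block number as listed by dumpe2fs
    block_size=$2    # file system block size in bytes (4096 is inferred here)
    echo $((block * block_size / 1024))
}

# Print, but do not run, a read-only mount attempt using the first backup.
echo "mount -o ro,sb=$(sb_option 32768 4096) /dev/vg_gzgsdb/lv_home /home"
```

For block 32768 on a 4K file system this gives sb=131072.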
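At this point we could essentially conclude that the chance of repairing the file system directly and getting it to mount normally was very small, so we went straight to recovery at the LUN level.
Parsing the LUN with a tool
The result is telling: after the customer's series of operations, the LV is now 1.03T while the file system is only 821G. That clearly contradicts the operations the customer described to me (which should have left the LV at 821G), and it shows that the customer performed many additional operations that had already damaged the file system.
Because the file system was so badly damaged, the files the tool extracted had no names; the data was read directly from the inodes. With that data in hand, we used Oracle's own characteristics to work out which file number each file corresponded to (many files turned out to be duplicates). A few files could not be recovered through their inodes, so we rebuilt them by reassembling low-level fragments and then recovered the data inside them (see "asm disk header 彻底损坏恢复", recovery from a completely corrupted ASM disk header). This customer was fairly lucky: the SYSTEM datafile was on a different partition, otherwise the workload would have been much larger.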
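A reminder once more: be careful with any LV operation, especially lvreduce. And once a mistake has been made, the first priority is to preserve the scene rather than trying random operations found through Baidu, which can make the failure even worse.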
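One way to preserve the scene is to image the damaged volume and verify the copy before any repair tool is allowed to write. The sketch below shows the idea on a scratch file standing in for the LV. The function name and file names are hypothetical, and in real use the source would be the block device and the image would go to separate storage with enough free space.

```shell
#!/bin/sh
# Copy a source (a block device in real use, a scratch file here) to an image
# file and confirm the copy is byte-identical before any repair is attempted.
image_and_verify() {
    src=$1    # source to protect
    img=$2    # destination image file
    dd if="$src" of="$img" bs=1048576 2>/dev/null || return 1
    cmp -s "$src" "$img"    # exit status 0 only if the copy matches
}

# Demonstration on a 2 MiB scratch file standing in for the damaged LV.
workdir=$(mktemp -d)
dd if=/dev/urandom of="$workdir/fake_lv" bs=1048576 count=2 2>/dev/null
if image_and_verify "$workdir/fake_lv" "$workdir/fake_lv.img"; then
    echo "image verified"
fi
rm -rf "$workdir"
```

Repair attempts such as fsck -y can then be run against the image, or after the image is safely stored, so that a failed attempt does not destroy the only copy.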