peterzil Posted October 19, 2020

Hi all,

For an unknown reason the storage pool has crashed. 2 of the 3 disks now show the status "Initialized", although they are completely healthy. How can I repair it?
flyride Posted October 25, 2020

Ugh. btrfs stores its superblocks in three different places and we just tried to look for all of them, but the btrfs binary keeps crashing (on #1 and #3; #2 returned unusable data). For the sake of completeness, please post a dmesg to see if there is any kernel-related log information about the last crash. Because of the btrfs crashes, we have not positively proven that all three superblocks are inaccessible.

Really the only thing left to try now is to install a new Linux system, connect the drives to it, and see if a new Linux kernel and the latest btrfs utilities are able to read anything useful without core dumping. I suppose you could also try to reinstall DSM (maybe using the DS918 platform since it has the newest kernel) and see if that makes a difference, but I don't hold out much hope for that.

Barring that result, whatever happened to your drives has caused them to return data that are so corrupted that there is probably no filesystem recovery possible without forensic tools. However, we haven't written over the filesystem areas of the disks, so forensic recovery should still be possible. And the new metadata we created for the array will help a forensic lab know the correct order of the disks, should you decide to go in that direction.

If you decide to abandon the array and remake it, test the two failed drives very carefully before putting production data on them again, because this could be the result of controller or drive failure (although two drives failing in this way at the same time seems unlikely).

We did everything reasonable to recover this data. I'm sorry that the results were not better.
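If you go the separate-Linux-system route, a minimal sketch of what to try there, assuming the array assembles on that machine under the same /dev/md3 name (it may come up under a different md number) and that a recent btrfs-progs is installed; /mnt/restore-target is a placeholder directory, not a path from this system:

# dmesg | tail -n 200                                # kernel messages from the most recent btrfs crash
# mdadm --assemble --scan                            # let the new kernel assemble the existing md arrays
# cat /proc/mdstat                                   # confirm which md device the RAID5 came up as
# btrfs rescue super-recover -v /dev/md3             # retry superblock recovery with current btrfs-progs
# btrfs restore -D -v /dev/md3 /mnt/restore-target   # dry run: list what a file-level restore could still reach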
flyride Posted October 19, 2020

Log in via ssh and post output of "cat /proc/mdstat"
peterzil (Author) Posted October 19, 2020

admin@XP:/$ cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [raidF1]
md3 : active raid5 sdi3[1]
      11711401088 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/1] [_U_]
md4 : active raid1 sda1[0] sdb1[1]
      117216192 blocks super 1.2 [2/2] [UU]
md2 : active raid1 sdg3[0]
      3902196544 blocks super 1.2 [1/1] [U]
md1 : active raid1 sdg2[0] sdh2[1] sdi2[2] sdj2[3]
      2097088 blocks [12/4] [UUUU________]
md0 : active raid1 sdg1[0] sdh1[1] sdi1[2] sdj1[3]
      2490176 blocks [12/4] [UUUU________]
unused devices: <none>
flyride Posted October 19, 2020

Well this does look a little odd. I would expect your /dev/sda and /dev/sdb in md0 and md1. Is this a baremetal install?

Post the output of the following commands. You'll need to be root (sudo -i):

# mdadm --detail /dev/md3
# mdadm --examine /dev/sd[hij]3 | egrep 'Event|/dev/sd'
peterzil (Author) Posted October 19, 2020

root@XP:~# mdadm --detail /dev/md3
/dev/md3:
        Version : 1.2
  Creation Time : Sat Nov 16 12:10:31 2019
     Raid Level : raid5
     Array Size : 11711401088 (11168.86 GiB 11992.47 GB)
  Used Dev Size : 5855700544 (5584.43 GiB 5996.24 GB)
   Raid Devices : 3
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Mon Oct 19 22:13:04 2020
          State : clean, FAILED
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : XPEH:2
           UUID : 22a4b5c5:8103a815:1de617b2:3f23ee03
         Events : 376

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       1       8      131        1      active sync   /dev/sdi3
       -       0        0        2      removed

root@XP:~# mdadm --examine /dev/sd[hij]3 | egrep 'Event|/dev/sd'
mdadm: No md superblock detected on /dev/sdh3.
mdadm: No md superblock detected on /dev/sdj3.
/dev/sdi3:
          Events : 376

Thank you
flyride Posted October 19, 2020

So the two missing RAID5 disks exist, but the partition looks badly damaged. Is the partition structure even still present for the RAID5?

# fdisk -l /dev/sdh
# fdisk -l /dev/sdj
peterzil (Author) Posted October 19, 2020

I think so (like on pic 1)

root@XP:~# fdisk -l /dev/sdh
Disk /dev/sdh: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 987F18E3-DCA2-431A-9174-AADC0F9C53EC

Device       Start         End     Sectors  Size Type
/dev/sdh1     2048     4982527     4980480  2.4G Linux RAID
/dev/sdh2  4982528     9176831     4194304    2G Linux RAID
/dev/sdh3  9437184 11720840351 11711403168  5.5T Linux RAID

root@XP:~# fdisk -l /dev/sdj
Disk /dev/sdj: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: D6A7A561-92BA-4C7C-B9AA-7C6DA547F406

Device       Start         End     Sectors  Size Type
/dev/sdj1     2048     4982527     4980480  2.4G Linux RAID
/dev/sdj2  4982528     9176831     4194304    2G Linux RAID
/dev/sdj3  9437184 11720840351 11711403168  5.5T Linux RAID
flyride Posted October 19, 2020

Ok, here's where things are. The system knows it has a RAID5 array and sees one valid array member. Something has happened to the other two drives that seems to have overwritten the md superblocks that identify partition #3 as part of the RAID5 array.

There is no graceful way of recovering this. But we can force the system to think that those disks are part of the array. This is a non-reversible operation and may still not result in the recovery of any data. For example, if the filesystem base structures were overwritten, it may be impossible to get the volume to mount. Or some of your files or directories may be corrupted or missing. Needless to say, I hope you have this data backed up somewhere.

So before you decide to do any of that, let's get some answers to the following:

1. What happened before, or to cause this? You said "unknown reasons", but is there any information at all about the circumstances?
2. Is there anything that was deliberately done to get rid of the /md1 and /md0 partition members for the SSDs?
3. Is there anything else you think is relevant, should you decide to try to brute-force recover your array?
peterzil (Author) Posted October 20, 2020

What happened before? I think the volume overflowed. Web access to the storage returned a message like "the system cannot display the page" (Synology's error message, not a standard browser error). After a reboot the system asked me to install DSM again, and after the installation I got this situation. No additional actions were performed.

I agree that trying to manually add the disks back to the RAID is the only option with some chance of success. I will be grateful if you can tell me how to do this. Thank you
peterzil (Author) Posted October 24, 2020

Does anyone have a technical manual for DSM? I want to try to add the disks back into the RAID via the Linux configuration files. Thanks
IG-88 Posted October 24, 2020

8 hours ago, peterzil said:
    Do someone have a technical manual for DMS ?

DSM uses just the normal mdadm and lvm stuff from Linux:
https://wiki.archlinux.org/index.php/RAID
https://www.thomas-krenn.com/en/wiki/Mdadm_recovery_and_resync
https://www.thomas-krenn.com/en/wiki/Mdadm_recover_degraded_Array_procedure
https://www.thomas-krenn.com/en/wiki/Partition_Alignment_detailed_explanation

8 hours ago, peterzil said:
    I want to try to add the disks in the RAID in Linux configuration files.

If you have never done this before, I'd suggest waiting for flyride, or at least reading one or two of his threads where he helped people with recovery. It's most important to know the reason for the failure, to prevent it from happening again, especially when doing things that can't be reverted and there is no backup. In really important recovery cases you would keep an image file of every disk, so you can try more than one thing (and it would all be running on approved and tested hardware). I do remember a case where he tried to help someone and the hardware in question did not work reliably - an interesting read, as long as it's not your own data on those disks and you are just on the fence watching (and learning).
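On the point about disk images: a minimal sketch of how that could be done from a separate Linux machine, assuming GNU ddrescue is installed there; the /dev/sdX source name and the /mnt/backup paths are placeholders, and the target needs at least as much free space as each source disk:

# ddrescue -n /dev/sdX /mnt/backup/sdX.img /mnt/backup/sdX.map    # first pass, skip bad areas
# ddrescue -r3 /dev/sdX /mnt/backup/sdX.img /mnt/backup/sdX.map   # retry remaining bad sectors up to 3 times
# losetup -fP --show /mnt/backup/sdX.img                          # expose the image as a loop device for experiments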
flyride Posted October 24, 2020

Sorry, I've had a crazy work schedule for the last couple of days and have not been able to get back to this. It isn't rocket science, but I will post more detailed instructions when I get back from work in about 8 hours.
flyride Posted October 24, 2020

I had a few minutes, so here's a plan:

1. Retrieve the current array-to-filesystem relationship
2. Stop the array
3. Force (re-)create the array
4. Check the array for proper configuration before doing anything else (or report the exact failure response)

Assumptions based on the prior posts:

1. Array members are on disks h (8), i (9), j (10) and the array is ordered in that sequence
2. Data corruption has at least damaged the array superblocks (/dev/md3 RAID5) - but the extent is unknown

Comments and caveats:

Note that this is an irreversible operation. Any metadata on the disks containing the array state will be overwritten. Files on the disks are not damaged by this operation (so you could, in theory, still send the disks for forensic recovery). It's possible that the create operation will fail without zeroing the array superblocks first; I don't like doing that unless it's absolutely necessary. If corruption is extensive, the array will start but it will not be possible to mount the filesystem (we'll try and check that after the array creates correctly).

Feel free to question, research, and verify this suggestion prior to executing. At the end of the day it's your data, and your decision to follow the free advice you obtain here. Again, I hope you have a backup, because we already know there is some amount of data loss.

Commands, execute in sequence:

# cat /etc/fstab
# mdadm --stop /dev/md3
# mdadm -v --create --assume-clean -e1.2 -n3 -l5 /dev/md3 /dev/sdh3 /dev/sdi3 /dev/sdj3 -u22a4b5c5:8103a815:1de617b2:3f23ee03
# cat /proc/mdstat

Post the output, including error messages, from each of these.
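One optional precaution before the create step: since it rewrites the only surviving md superblock on /dev/sdi3, you could first record what that superblock currently reports (data offset, chunk size, UUID) so the original parameters stay on file. This is a sketch of one way to do it; the output paths are just examples:

# mdadm --examine /dev/sdi3 > /root/sdi3-examine-before-create.txt
# fdisk -l /dev/sdh /dev/sdi /dev/sdj > /root/raid5-partitions-before-create.txt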
peterzil (Author) Posted October 24, 2020

Thank you. This is the log of the commands:

root@XP:~# cat /etc/fstab
none /proc proc defaults 0 0
/dev/root / ext4 defaults 1 1
/dev/md3 /volume2 btrfs  0 0
/dev/mapper/cachedev_0 /volume1 ext4 usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,synoacl,relatime 0 0

root@XP:~# mdadm --stop /dev/md3
mdadm: stopped /dev/md3

root@XP:~# mdadm -v --create --assume-clean -e1.2 -n3 -l5 /dev/md3 /dev/sdh3 /dev/sdi3 /dev/sdj3 -u22a4b5c5:8103a815:1de617b2:3f23ee03
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: /dev/sdi3 appears to be part of a raid array:
       level=raid5 devices=3 ctime=Sat Nov 16 12:10:31 2019
mdadm: size set to 5855700544K
Continue creating array? y
mdadm: array /dev/md3 started.

root@XP:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [raidF1]
md3 : active raid5 sdj3[2] sdi3[1] sdh3[0]
      11711401088 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
md4 : active raid1 sda1[0] sdb1[1]
      117216192 blocks super 1.2 [2/2] [UU]
md2 : active raid1 sdg3[0]
      3902196544 blocks super 1.2 [1/1] [U]
md1 : active raid1 sdg2[0] sdh2[1] sdi2[2] sdj2[3]
      2097088 blocks [12/4] [UUUU________]
md0 : active raid1 sdg1[0] sdh1[1] sdi1[2] sdj1[3]
      2490176 blocks [12/4] [UUUU________]
unused devices: <none>

The storage pool was repaired, but the volume is still "crashed".
flyride Posted October 25, 2020

Yep. There is corruption to the extent that DSM thinks it must be an ext4 volume, because it cannot find the initial btrfs superblock. Your fstab says it was previously mounted as a btrfs volume; do you concur with that?

If so, try to recover the btrfs superblock with:

# btrfs rescue super-recover -v /dev/md3

If it errors out, post the error. If it suggests that it may have fixed the superblock, try mounting the volume in recovery mode:

# mount -vs -t btrfs -o ro,recovery,errors=continue /dev/md3 /volume2
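If that recovery mount were to succeed, a sensible next move would be to copy everything off the read-only volume before attempting any further repair. A sketch, assuming a destination with enough free space; the target path below is only an example (e.g. an external USB disk mounted by DSM):

# rsync -aH --progress /volume2/ /volumeUSB1/usbshare/recovered/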
peterzil (Author) Posted October 25, 2020

You are right - it was btrfs.

root@XP:~# btrfs rescue super-recover -v /dev/md3
No valid Btrfs found on /dev/md3
Usage or syntax errors
Segmentation fault (core dumped)
flyride Posted October 25, 2020

Ok, let's see if there is any valid superblock in btrfs.

# btrfs ins dump-super -fFa /dev/md3
peterzil (Author) Posted October 25, 2020

root@XP:~# btrfs ins dump-super -fFa /dev/md3
superblock: bytenr=65536, device=/dev/md3
---------------------------------------------------------
btrfs: ctree.h:2183: btrfs_super_csum_size: Assertion `!(t >= (sizeof(btrfs_csum_sizes) / sizeof((btrfs_csum_sizes)[0])))' failed.
csum 0xAborted (core dumped)
flyride Posted October 25, 2020

Ok, btrfs is crashing before we have tested all three superblocks. So let's try to reach the other two directly:

# btrfs ins dump-super -Ffs 67108864 /dev/md3
# btrfs ins dump-super -Ffs 274877906944 /dev/md3
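For context on the two -s values: btrfs keeps its superblock at fixed byte offsets, with the primary copy at 64 KiB (the one dump-super reads by default, which is what crashed above) and mirrors at 64 MiB and 256 GiB. The numbers passed to -s are exactly those mirror offsets:

# echo $((64 * 1024))                  # 65536        - primary superblock
# echo $((64 * 1024 * 1024))           # 67108864     - first mirror
# echo $((256 * 1024 * 1024 * 1024))   # 274877906944 - second mirror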
peterzil (Author) Posted October 25, 2020

This is the log of the commands:

root@XP:~# btrfs ins dump-super -Ffs 67108864 /dev/md3
superblock: bytenr=67108864, device=/dev/md3
---------------------------------------------------------
csum                    0x00000000 [DON'T MATCH]
bytenr                  0
flags                   0x0
magic                   ........ [DON'T MATCH]
fsid                    00000000-0000-0000-0000-000000000000
label
generation              0
root                    0
sys_array_size          0
chunk_root_generation   0
root_level              0
chunk_root              0
chunk_root_level        0
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             0
bytes_used              0
sectorsize              0
nodesize                0
leafsize                0
stripesize              0
root_dir                0
num_devices             0
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x0
csum_type               0
csum_size               4
cache_generation        0
uuid_tree_generation    0
dev_item.uuid           00000000-0000-0000-0000-000000000000
dev_item.fsid           00000000-0000-0000-0000-000000000000 [match]
dev_item.type           0
dev_item.total_bytes    0
dev_item.bytes_used     0
dev_item.io_align       0
dev_item.io_width       0
dev_item.sector_size    0
dev_item.devid          0
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
sys_chunk_array[2048]:
backup_roots[4]:

root@XP:~# btrfs ins dump-super -Ffs 274877906944 /dev/md3
superblock: bytenr=274877906944, device=/dev/md3
---------------------------------------------------------
btrfs: ctree.h:2183: btrfs_super_csum_size: Assertion `!(t >= (sizeof(btrfs_csum_sizes) / sizeof((btrfs_csum_sizes)[0])))' failed.
csum 0xAborted (core dumped)
peterzil (Author) Posted October 25, 2020

Despite the unfortunate result, I really appreciate your help. You did a lot for me. I wish there were more professionals like you. Have a nice day.