DSM 6.2 on ESXi 6,7, storage pool crashed (Raid5, ext4)

Rihc0 · November 21, 2020

so I did the first 3 commands, and with the third command I got this.

output of: # mdadm -v --create --assume-clean -e1.2 -n5 -l5 /dev/md3 /dev/sdg3 /dev/sde3 /dev/sdf3 /dev/sdh3 missing -uff64862b:9edfe233:c498ea84:9d4b9ffd

ash-4.3# mdadm -v --create --assume-clean -e1.2 -n5 -l5 /dev/md3 /dev/sdg3 /dev/sde3 /dev/sdf3 /dev/sdh3 missing -uff64862b:9edfe233:c498ea84:9d4b9ffd
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: /dev/sdg3 appears to be part of a raid array:
       level=raid5 devices=5 ctime=Sat Jun 20 00:46:08 2020
mdadm: /dev/sde3 appears to be part of a raid array:
       level=raid5 devices=5 ctime=Sat Jun 20 00:46:08 2020
mdadm: cannot open /dev/sdf3: No such file or directory
ash-4.3#

flyride · November 22, 2020

The /dev/sdf3 error should not happen. Did you reboot the system since the last time you posted the drive stats?? It looks like your drives may have reassigned? Please repeat "sudo fdisk -l /dev/sd*"

Rihc0 · November 22, 2020

ash-4.3# fdisk -l /dev/sd*
Disk /dev/sdb: 16 GiB, 17179869184 bytes, 33554432 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x22d5f435

Device     Boot   Start      End  Sectors  Size Id Type
/dev/sdb1          2048  4982527  4980480  2.4G fd Linux raid autodetect
/dev/sdb2       4982528  9176831  4194304    2G fd Linux raid autodetect
/dev/sdb3       9437184 33349631 23912448 11.4G fd Linux raid autodetect
Disk /dev/sdb1: 2.4 GiB, 2550005760 bytes, 4980480 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sdb2: 2 GiB, 2147483648 bytes, 4194304 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sdb3: 11.4 GiB, 12243173376 bytes, 23912448 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sdc: 16 GiB, 17179869184 bytes, 33554432 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x8504927a

Device     Boot   Start      End  Sectors  Size Id Type
/dev/sdc1          2048  4982527  4980480  2.4G fd Linux raid autodetect
/dev/sdc2       4982528  9176831  4194304    2G fd Linux raid autodetect
/dev/sdc3       9437184 33349631 23912448 11.4G fd Linux raid autodetect
Disk /dev/sdc1: 2.4 GiB, 2550005760 bytes, 4980480 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sdc2: 2 GiB, 2147483648 bytes, 4194304 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sdc3: 11.4 GiB, 12243173376 bytes, 23912448 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sdd: 3.7 TiB, 4000225165312 bytes, 7812939776 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sde: 3.7 TiB, 4000225165312 bytes, 7812939776 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: EB6537B4-CC88-4A1B-99A4-C235A327CFB6

Device       Start        End    Sectors  Size Type
/dev/sde1     2048    4982527    4980480  2.4G Linux RAID
/dev/sde2  4982528    9176831    4194304    2G Linux RAID
/dev/sde3  9437184 7812734975 7803297792  3.6T Linux RAID
Disk /dev/sde1: 2.4 GiB, 2550005760 bytes, 4980480 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sde2: 2 GiB, 2147483648 bytes, 4194304 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sde3: 3.6 TiB, 3995288469504 bytes, 7803297792 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Disk /dev/sdf: 3.7 TiB, 4000225165312 bytes, 7812939776 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: E3D1C69D-D406-4A97-BF00-168262F1025C

Device       Start        End    Sectors  Size Type
/dev/sdf1     2048    4982527    4980480  2.4G Linux RAID
/dev/sdf2  4982528    9176831    4194304    2G Linux RAID
/dev/sdf3  9437184 7812734975 7803297792  3.6T Linux RAID
Disk /dev/sdg: 3.7 TiB, 4000225165312 bytes, 7812939776 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: AB3D10CC-2A35-4075-AF8B-135C47B30870

Device       Start        End    Sectors  Size Type
/dev/sdg1     2048    4982527    4980480  2.4G Linux RAID
/dev/sdg2  4982528    9176831    4194304    2G Linux RAID
/dev/sdg3  9437184 7812734975 7803297792  3.6T Linux RAID
Disk /dev/sdg1: 2.4 GiB, 2550005760 bytes, 4980480 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sdg2: 2 GiB, 2147483648 bytes, 4194304 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sdg3: 3.6 TiB, 3995288469504 bytes, 7803297792 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sdh: 3.7 TiB, 4000225165312 bytes, 7812939776 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: F681C773-F2AE-452F-8E71-55511FA5AEE0

Device       Start        End    Sectors  Size Type
/dev/sdh1     2048    4982527    4980480  2.4G Linux RAID
/dev/sdh2  4982528    9176831    4194304    2G Linux RAID
/dev/sdh3  9437184 7812734975 7803297792  3.6T Linux RAID
Disk /dev/sdh1: 2.4 GiB, 2550005760 bytes, 4980480 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sdh2: 2 GiB, 2147483648 bytes, 4194304 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sdh3: 3.6 TiB, 3995288469504 bytes, 7803297792 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sdm3: 4 MiB, 4177408 bytes, 8159 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

I don't know what happend but according to the log it rebooted yea

IG-88 · November 22, 2020

7 hours ago, Rihc0 said:

I don't know what happend but according to the log it rebooted yea

disk count ~~and layout~~ are now different

~~according to the disk identifier the disks now have different positions aka sdX arrangement~~

~~the essential 4 disks are still there so the plan needs to be adjusted to the new sdX assignments~~

edit: in the 1st listing there where two more 3.7TB unusedable disks

former sdd (identifier 88AB940C-A74C-425C-B303-3DD15285C607) is gone and is place is now occupied by former sdi (no identifier)

both are not part of the recovery effort but just by rebooting the VM there should be no change in the disks amount, there must me more going on

~~edit2: here the list before the reboot and from the last listih of devices~~

~~sde -> sde~~

~~sdf -> sdg~~

~~sdg -> sdh~~

~~sdh -> sdf~~

Edited November 22, 2020 by IG-88

Rihc0 · November 22, 2020

The whole server rebooted because i got a purple screen view days ago. Also removed a few hard drives but those did not belong to the original raid, maybe i added them accidentally to the virtual machine en removed them when i pulled them out of the server.

you still think this can work ?

IG-88 · November 22, 2020

6 hours ago, Rihc0 said:

The whole server rebooted because i got a purple screen view days ago. Also removed a few hard drives but those did not belong to the original raid, maybe i added them accidentally to the virtual machine en removed them when i pulled them out of the server.

shouldn't you have mentioned that before you continued (as if nothing had changed)?

as flyride is afk we can at least gather information

so please do again

cat /proc/mdstat

mdadm --detail /dev/md3

mdadm --examine /dev/sd[efgh]3

6 hours ago, Rihc0 said:

you still think this can work ?

i guess so, as the plan was to put the disk manually into the raid set and if not working disassemble it and put it together in the next one (as in the table)

when applying the "translation" from above you could still use the table flyride made

lets see what the details about /dev/md3 say now

and we can also translate the table from before into the new disk arrangement too

Edited November 22, 2020 by IG-88

IG-88 · November 22, 2020

~~in theory the new table looks like this~~

image.png.ec2fd54d656fbee25021274e9b6967c4.png

~~the new "mdadm --examine /dev/sd[efgh]3" should show disks sdh and sde having "Events : 39101" as these are the two valid members of the 5 disk set~~

~~so the "transformed" line that did not work would be like this~~

~~mdadm -v --create --assume-clean -e1.2 -n5 -l5 /dev/md3 /dev/sdh3 /dev/sde3 /dev/sdg3 /dev/sdf3 missing -uff64862b:9edfe233:c498ea84:9d4b9ffd~~

~~but first we will need to see what happened on the try before (that was based on the old disk arrangement and can not result in a valid md3 device as the order of disks is wrong)~~

Edited November 22, 2020 by IG-88

IG-88 · November 22, 2020

4 hours ago, Rihc0 said:

The whole server rebooted because i got a purple screen view days ago.

that's not supposed to happen

maybe that server is unreliable because hardware problems?

when doing recovery it should be on a stable running hardware

Quote

Also removed a few hard drives but those did not belong to the original raid,

lets hope you did remove the right disk as it seemed to be two 4TB disks without a valid partition table and you removed one of them

in theory its possible to "stamp" the partition layout on that former 5th disk (partition table entry's created manually) and then try to see if the 3rd partition checks out as possible raid member when the attempt with the 4 disks does not work (that's based on the assumption that only a small part of the disk was overwritten when using it with the hardware raid controller) - "in theory" because i don't have a tested method to do this with linux, so i would not be able give you exact information on how to do it

before doing that you would examine the disks in question with a disk editor to see if the structures for the mdadm raid are still in the place in question and see which of the two disks are the right one (the basic layout can be seen on the two valid members of the raid set for comparison), but i guess that's nothing you can do so easily by yourself (at least not in short time as you would need to learn new skills and practice/test them)

Edited November 22, 2020 by IG-88

Rihc0 · November 22, 2020

Sorry for not saying the purple screen happend, totally forgot it due some personal issues. I have removed the gpu which cause the purple screen and will be uploading the output of the commands. And wont touch the server until you guys say so . Sorry for the trouble, I appreciate you guys helping me.

Rihc0 · November 22, 2020

I shut down the server and removed the GPU that was causing problems. I started the server and the virtual machine and this is the output of the commands you have send me.

this is with the harddrives the xpenology VM had in the first place

I won't touch it unless you say so ^^. sorry for doing many things wrong

output of "cat /proc/mdstat"

ash-4.3# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [raidF1] 
md3 : active raid5 sdg3[0] sde3[1]
      15606591488 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/2] [UU___]
      
md2 : active raid1 sdb3[0] sdc3[1]
      11955200 blocks super 1.2 [2/2] [UU]
      
md127 : active raid1 sde1[1] sdg1[0]
      2490176 blocks [12/2] [UU__________]
      
md1 : active raid1 sdb2[0] sdc2[1] sde2[2] sdg2[3] sdh2[4]
      2097088 blocks [12/5] [UUUUU_______]
      
md0 : active raid1 sdb1[0] sdc1[1]
      2490176 blocks [12/2] [UU__________]
      
unused devices: <none>

output of "mdadm --detail /dev/md3"

ash-4.3# mdadm --detail /dev/md3
/dev/md3:
        Version : 1.2
  Creation Time : Sat Jun 20 00:46:08 2020
     Raid Level : raid5
     Array Size : 15606591488 (14883.61 GiB 15981.15 GB)
  Used Dev Size : 3901647872 (3720.90 GiB 3995.29 GB)
   Raid Devices : 5
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sun Nov 22 18:24:28 2020
          State : clean, FAILED 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : Dabadoo:2
           UUID : ff64862b:9edfe233:c498ea84:9d4b9ffd
         Events : 39112

    Number   Major   Minor   RaidDevice State
       0       8       99        0      active sync   /dev/sdg3
       1       8       67        1      active sync   /dev/sde3
       -       0        0        2      removed
       -       0        0        3      removed
       -       0        0

output of "mdadm --examine /dev/sd[efgh]3"

ash-4.3# mdadm --examine /dev/sd[efgh]3
/dev/sde3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ff64862b:9edfe233:c498ea84:9d4b9ffd
           Name : Dabadoo:2
  Creation Time : Sat Jun 20 00:46:08 2020
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 7803295744 (3720.90 GiB 3995.29 GB)
     Array Size : 15606591488 (14883.61 GiB 15981.15 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=0 sectors
          State : clean
    Device UUID : a5de7554:193b06b9:b1f1a8df:3c917e8b

    Update Time : Sun Nov 22 18:24:28 2020
       Checksum : 381624 - correct
         Events : 39112

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 1
   Array State : AA... ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ff64862b:9edfe233:c498ea84:9d4b9ffd
           Name : Dabadoo:2
  Creation Time : Sat Jun 20 00:46:08 2020
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 7803295744 (3720.90 GiB 3995.29 GB)
     Array Size : 15606591488 (14883.61 GiB 15981.15 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=0 sectors
          State : clean
    Device UUID : e0c37824:42d56226:4bb0cdcc:d29cca2f

    Update Time : Sun Nov 22 18:24:28 2020
       Checksum : cf085fb5 - correct
         Events : 39112

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 0
   Array State : AA... ('A' == active, '.' == missing, 'R' == replacing)
mdadm: No md superblock detected on /dev/sdh3.
ash-4.3#

Edited November 22, 2020 by Rihc0

IG-88 · November 22, 2020

as it looked not as what i expected i re-checked and it looks like there is no change in drive assignments

i guess i took a older fdisk from the 1st page of the thread

so forget about the translation table from above

maybe i should not try to help and safe time and should keep watching

Rihc0 · November 22, 2020

Okay, I'll wait for flyride

IG-88 · November 22, 2020

i have no idea how to integrate /dev/sdf and /dev/sdh into the raid as it looks like there are no raid related information found by "mdadm --examine" on these two disks

imho its also a bad sign as the assumption was that only information at the beginning of the disk might have been altered, if the mdadm raid information's are lost or rendered unusable it indicates that the disks where altered in the area we are interested in and that more information inside the raid partition was altered and as there are no redundancy information left (only 4 disks with a valid partition table left) every bad/wrong data there are is destroyed/lost data as the information put together are not the correct one anymore (and without the parity to check there is no way of telling whats still good, so even when the file system could be accessed there is no way to be sure what files will have a valid content)

edit: for me that's the point where the 5th (maybe even more damaged?) disks comes into play and that together is way above my practical experience and in no way i would be able to remote assist by only text messages, i would need to poke my nose in places with a disk editor and would need to learn about mdadm and its data structures

thats the point i was about in a former comment:

On 11/13/2020 at 12:52 AM, IG-88 said:

if you are a data recovery specialist working for kroll ontrack you could

you could asks a professional recovery company and describe your case in details but my guess would be the recovery would cost you at least a one digit thousand bugs

Edited November 22, 2020 by IG-88

Rihc0 · November 24, 2020

Rightnow I use reclaime pro recovery software from a friend of mine and I can see the data, and try to get it of the raid. hopefully it works

flyride · November 24, 2020

Work has kept me away from monitoring what happens here, please DM me if you decide to bring the disks back to DSM for recovery.

DSM 6.2 on ESXi 6,7, storage pool crashed (Raid5, ext4)

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation