fix md0 (E) flag and system partition failed state for disks

Hello

 

I am having trouble bringing my disks (virtual VMDKs) back to a normal working state.

 

What I have done so far:

- searched the internet for hours to find a solution!!! :mad:
- figured out that Synology uses a custom disk state flag, (E), in combination with mdadm:

DiskStation> cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active linear sdb3[0](E) sdd3[2](E) sdc3[1](E)
     3207050304 blocks super 1.2 64k rounding [3/3] [EEE]
md2 : active raid1 sda3[0](E)
     3666240 blocks super 1.2 [1/1] [E]
md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3]
     2097088 blocks [12/4] [UUUU________]
md0 : active raid1 sda1[0](E)
     2490176 blocks [12/1] [E___________]
unused devices: <none>


- figured out that mdadm --stop and mdadm --examine set the state only for the first RAID volume (this might be a bug that occurs when using JBOD) http://forum.synology.com/enu/viewtopic ... 39&t=32159
- finally managed to repair md2 and md3: (E) --> (U)
- cannot fix md0 because it is mounted as root (/) :sad:
 

DiskStation> df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md0               2451064    761308   1587356  32% /
/tmp                   1028420       340   1028080   0% /tmp
/run                   1028420      2428   1025992   0% /run
/dev/shm               1028420         0   1028420   0% /dev/shm
/dev/md2               3608608   1214156   2292052  35% /volume1
/dev/md3             3156710304 1720929688 1435678216  55% /volume2


- spent far too much time getting a live Linux to boot in the ESXi VM to get access to md0
- managed that, but had no success removing the faulty flag because "sudo mdadm --assemble --scan" brings up only md2 and md3 in the live Linux system
- volumes 1 and 2 are now back in a normal state and I have write access again, but all disks still show "system partition failed"
- I am using DSM 5.1, and there is no option to "repair the system partition" as mentioned in several threads, which always refer to an older DSM
http://forum.synology.com/enu/viewtopic ... 15#p311355

 

Is there any secret, undocumented command to check the disks and set the state back to normal? (I assume there will be no problem, because I can access all files and folders stored on the volumes.)
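
For anyone following along, here is a minimal sketch for listing which arrays still carry the (E) flag. It assumes the flag always appears as "(E)" appended to a member on the array's header line, as in the mdstat output above. `list_error_arrays` is a hypothetical helper name, not a Synology tool; it only reads text and changes nothing.

```shell
#!/bin/sh
# list_error_arrays: read /proc/mdstat-style text on stdin and print the name
# of every md array whose header line contains at least one "(E)" member.
# Assumption: the flag is attached to the member, e.g. "sda1[0](E)".
list_error_arrays() {
    awk '/^md[0-9]+ :/ && /\(E\)/ { print $1 }'
}

# Usage on the NAS itself (read-only):
#   list_error_arrays < /proc/mdstat
```

Run against the first mdstat listing in this post, it should report md3, md2 and md0, and only md0 after the data volumes were repaired.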


I know Synology support can fix it over SSH, but I haven't seen it mentioned anywhere publicly how they do it. I had this happen a while back and was forced to just back up my data and start over.

 

It looks like you have 4 disks. All 4 disks should be listed in each of the md#'s. Each storage disk's first partition is for the OS (DSM), and they are all mirrors of each other. Each disk should have a listing under md0, similar to my system below:

 

DiskStation> cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sda5[0] sdd5[3] sdc5[2] sdb5[1]
     8776594944 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3]
     2097088 blocks [12/4] [UUUU________]

md0 : active raid1 sda1[0] sdb1[1] sdc1[2] sdd1[3]
     2490176 blocks [12/4] [UUUU________]

unused devices: <none>
DiskStation>

 

You're also missing a disk from your storage array... so probably not all of your data will be there. The disk listed with md2 should most likely be listed with the ones in md3.


Although I do not have the output of cat /proc/mdstat from before the error showed up, I am sure everything is just as it should be:

1 x IDE drive for the bootloader

1 x 8 GB disk for the system partition

3 x 1 TB storage disks

 

I configured two volumes in DSM: volume 1 for the system partition (8 GB disk, no RAID) and volume 2 as JBOD (no RAID). I use the RAID functionality of the host (ESXi).

 

DiskStation> fdisk -l

Disk /dev/sda: 16 MB, 16515072 bytes
4 heads, 32 sectors/track, 252 cylinders
Units = cylinders of 128 * 512 = 65536 bytes

  Device Boot      Start         End      Blocks  Id System
/dev/sda1   *           1         252       16096+  e Win95 FAT16 (LBA)

Disk /dev/sdc: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

  Device Boot      Start         End      Blocks  Id System
/dev/sdc1               1         311     2490240  fd Linux raid autodetect
Partition 1 does not end on cylinder boundary
/dev/sdc2             311         572     2097152  fd Linux raid autodetect
Partition 2 does not end on cylinder boundary
/dev/sdc3             588        1044     3667338  fd Linux raid autodetect

Disk /dev/sdd: 1099.5 GB, 1099511627776 bytes
255 heads, 63 sectors/track, 133674 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

  Device Boot      Start         End      Blocks  Id System
/dev/sdd1               1         311     2490240  fd Linux raid autodetect
Partition 1 does not end on cylinder boundary
/dev/sdd2             311         572     2097152  fd Linux raid autodetect
Partition 2 does not end on cylinder boundary
/dev/sdd3             588      133674  1069017813  fd Linux raid autodetect

Disk /dev/sde: 1099.5 GB, 1099511627776 bytes
255 heads, 63 sectors/track, 133674 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

  Device Boot      Start         End      Blocks  Id System
/dev/sde1               1         311     2490240  fd Linux raid autodetect
Partition 1 does not end on cylinder boundary
/dev/sde2             311         572     2097152  fd Linux raid autodetect
Partition 2 does not end on cylinder boundary
/dev/sde3             588      133674  1069017813  fd Linux raid autodetect

Disk /dev/sdf: 1099.5 GB, 1099511627776 bytes
255 heads, 63 sectors/track, 133674 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

  Device Boot      Start         End      Blocks  Id System
/dev/sdf1               1         311     2490240  fd Linux raid autodetect
Partition 1 does not end on cylinder boundary
/dev/sdf2             311         572     2097152  fd Linux raid autodetect
Partition 2 does not end on cylinder boundary
/dev/sdf3             588      133674  1069017813  fd Linux raid autodetect


Then it makes sense that you have md2 and md3.

 

I'm still unsure about your md0. Normally it has a partition on every disk in the system (besides the bootloader)... but maybe it's different if you start with a single-disk volume and then create another volume with other disks. But then again, the swap partition (md1) is mirrored on each of your disks...


What does the following look like on each of your disks?

 

DiskStation> sfdisk -l /dev/sda
/dev/sda1                   256         4980735         4980480  fd
/dev/sda2               4980736         9175039         4194304  fd
/dev/sda5               9453280      5860519007      5851065728  fd

 

I'm curious whether each disk has the OS partition (sdX1).


Here is what I get:

 

DiskStation> sfdisk -l /dev/sd[abcdef]
/dev/sda1                    63           32255           32193   e


/dev/sdc1                   256         4980735         4980480  fd
/dev/sdc2               4980736         9175039         4194304  fd
/dev/sdc3               9437184        16771859         7334676  fd


/dev/sdd1                   256         4980735         4980480  fd
/dev/sdd2               4980736         9175039         4194304  fd
/dev/sdd3               9437184      2147472809      2138035626  fd


/dev/sde1                   256         4980735         4980480  fd
/dev/sde2               4980736         9175039         4194304  fd
/dev/sde3               9437184      2147472809      2138035626  fd


/dev/sdf1                   256         4980735         4980480  fd
/dev/sdf2               4980736         9175039         4194304  fd
/dev/sdf3               9437184      2147472809      2138035626  fd


 

I came across these two pages while crawling the net for solutions... As described above, I managed to fix the (E) flag for all arrays except md0, which is mounted as root and cannot be unmounted while I am accessing the system (telnet or SSH). My attempts to fix it from a booted live Linux were also unsuccessful.

 

I did not come across anyone who described how to fix md0. That is why I opened this post, hoping someone with more experience than I have can point me to a solution. :wink:


OK, let's assume I only have to fix the degraded state of md0. How can I mount md0 on another system?

 

As you can see here /dev/md0 is degraded (state):

DiskStation> mdadm --detail /dev/md0
/dev/md0:
       Version : 0.90
 Creation Time : Sat Jan  1 01:00:03 2000
    Raid Level : raid1
    Array Size : 2490176 (2.37 GiB 2.55 GB)
 Used Dev Size : 2490176 (2.37 GiB 2.55 GB)
  Raid Devices : 12
 Total Devices : 1
Preferred Minor : 0
   Persistence : Superblock is persistent

   Update Time : Wed May 27 22:11:16 2015
         State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
 Spare Devices : 0

          UUID : c7e6d0f9:bdefe38c:3017a5a8:c86610be (local to host DiskStation)
        Events : 0.4360955

   Number   Major   Minor   RaidDevice State
      0       8       33        0      active sync   /dev/sdc1
      1       0        0        1      removed
      2       0        0        2      removed
      3       0        0        3      removed
      4       0        0        4      removed
      5       0        0        5      removed
      6       0        0        6      removed
      7       0        0        7      removed
      8       0        0        8      removed
      9       0        0        9      removed
     10       0        0       10      removed
     11       0        0       11      removed

 

The disk /dev/sdc1 itself is clean:

DiskStation> mdadm --examine /dev/sdc1
/dev/sdc1:
         Magic : a92b4efc
       Version : 0.90.00
          UUID : c7e6d0f9:bdefe38c:3017a5a8:c86610be (local to host DiskStation)
 Creation Time : Sat Jan  1 01:00:03 2000
    Raid Level : raid1
 Used Dev Size : 2490176 (2.37 GiB 2.55 GB)
    Array Size : 2490176 (2.37 GiB 2.55 GB)
  Raid Devices : 12
 Total Devices : 1
Preferred Minor : 0

   Update Time : Wed May 27 22:14:57 2015
         State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 11
 Spare Devices : 0
      Checksum : b5fe3aa3 - correct
        Events : 4360957


     Number   Major   Minor   RaidDevice State
this     0       8       33        0      active sync   /dev/sdc1

  0     0       8       33        0      active sync   /dev/sdc1
  1     1       0        0        1      faulty removed
  2     2       0        0        2      faulty removed
  3     3       0        0        3      faulty removed
  4     4       0        0        4      faulty removed
  5     5       0        0        5      faulty removed
  6     6       0        0        6      faulty removed
  7     7       0        0        7      faulty removed
  8     8       0        0        8      faulty removed
  9     9       0        0        9      faulty removed
 10    10       0        0       10      faulty removed
 11    11       0        0       11      faulty removed

 

Booting Linux from CD-ROM, installing mdadm, and trying to assemble md0 fails with the following error:

root@ubuntu:/home/ubuntu# blkid /dev/sdb1
/dev/sdb1: UUID="c7e6d0f9-bdef-e38c-3017-a5a8c86610be" TYPE="linux_raid_member"
root@ubuntu:/home/ubuntu# mdadm --examine /dev/sdb1
mdadm: No md superblock detected on /dev/sdb1.
root@ubuntu:/home/ubuntu# mdadm --assemble --force -v /dev/md0 /dev/sdb1
mdadm: looking for devices for /dev/md0
mdadm: no recogniseable superblock on /dev/sdb1
mdadm: /dev/sdb1 has no superblock - assembly aborted


md0 is degraded because you're not using all 12 slots. Examine md0, md1, md2, md3, etc., and you'll see they all say degraded. I think you can ignore that. Here's what mine looks like:

 

DiskStation> mdadm --detail /dev/md0
/dev/md0:
       Version : 0.90
 Creation Time : Fri Dec 31 19:00:03 1999
    Raid Level : raid1
    Array Size : 2490176 (2.37 GiB 2.55 GB)
 Used Dev Size : 2490176 (2.37 GiB 2.55 GB)
  Raid Devices : 12
 Total Devices : 4
Preferred Minor : 0
   Persistence : Superblock is persistent

   Update Time : Wed May 27 17:54:13 2015
         State : clean, degraded
Active Devices : 4
Working Devices : 4
Failed Devices : 0
 Spare Devices : 0

          UUID : cb413e5c:819a4ff3:3017a5a8:c86610be (local to host DiskStation)
        Events : 0.762227

   Number   Major   Minor   RaidDevice State
      0       8        1        0      active sync   /dev/hda1
      1       8       17        1      active sync   /dev/sdb1
      2       8       33        2      active sync   /dev/sdc1
      3       8       49        3      active sync   /dev/hdd1
      4       0        0        4      removed
      5       0        0        5      removed
      6       0        0        6      removed
      7       0        0        7      removed
      8       0        0        8      removed
      9       0        0        9      removed
     10       0        0       10      removed
     11       0        0       11      removed
DiskStation>

 

I'm not sure why some of my disks are labeled hdX vs sdX... but this isn't the first time I've seen it. Kinda odd...

 

I'm not sure why you can't mount it in Linux. I've never tried doing that on anything but the storage array. Maybe just try mounting the first partition of the disk, /dev/sdb1 :?: If you can, set aside your disks, make a set of new test disks configured the way your current system is, and see how the md#'s are configured.
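
To make the "12 slots" point concrete, here is a small sketch that boils `mdadm --detail` output down to the numbers that matter. It assumes the "Raid Devices", "Active Devices" and "Failed Devices" lines look like the ones posted in this thread. `summarize_md_detail` is a made-up helper name, and it only parses text.

```shell
#!/bin/sh
# summarize_md_detail: read `mdadm --detail` output on stdin and print
# "<active>/<slots> active, <failed> failed".
# Assumption: the field labels match the output shown in this thread.
summarize_md_detail() {
    awk -F' *: *' '
        /Raid Devices/   { slots  = $2 }
        /Active Devices/ { active = $2 }
        /Failed Devices/ { failed = $2 }
        END { printf "%d/%d active, %d failed\n", active, slots, failed }
    '
}

# Usage on the NAS (read-only):
#   mdadm --detail /dev/md0 | summarize_md_detail
```

With 12 configured slots and fewer than 12 disks, the state line will always say "degraded", so the active and failed counts are the more useful values to compare between systems.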


But I assume I cannot ignore the (E) flag/state of md0:

DiskStation> cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active linear sdb3[0] sdd3[2] sdc3[1]
     3207050304 blocks super 1.2 64k rounding [3/3] [UUU]

md2 : active raid1 sda3[0]
     3666240 blocks super 1.2 [1/1] [U]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3]
     2097088 blocks [12/4] [UUUU________]

md0 : active raid1 sda1[0](E)
     2490176 blocks [12/1] [E___________]

unused devices: <none>

 

I did not find any other way to fix it except using mdadm --stop and mdadm --assemble --force -v, as described in the web pages linked above.

 

Is it possible to overwrite the system partition without losing settings and data? I am thinking of the procedure used for a DSM upgrade or migration.


I'm pretty sure all your DSM settings are on that partition. But you could always dump a copy with 'dd' so you can revert, etc. You could also try editing the version number and then upgrading to the same version, to see if that fixes it and gives you the option to migrate your settings and retain your data. But if I were you, I'd remove my data array before trying stuff, so it doesn't get messed up somehow.
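
A rough sketch of the 'dd' dump, for illustration only. `backup_partition` is a hypothetical wrapper, and the source and target paths in the usage comment are assumptions about this particular setup. Ideally the source is not mounted read-write while it is being copied, otherwise the image may be inconsistent.

```shell
#!/bin/sh
# backup_partition SRC DEST: copy SRC (a partition or md device) into the
# image file DEST using dd. Refuses to overwrite an existing image.
backup_partition() {
    src=$1
    dest=$2
    if [ -e "$dest" ]; then
        echo "refusing to overwrite existing file: $dest" >&2
        return 1
    fi
    # bs=64k just speeds up the copy; the image is byte-for-byte identical.
    dd if="$src" of="$dest" bs=64k 2>/dev/null
}

# Usage (paths are assumptions, adjust to your system):
#   backup_partition /dev/md0 /volume1/md0-backup.img
```

Restoring would be the same dd call with source and target swapped, run from a system where the target partition is not in use.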


Upgraded (migration) DSM to 5.2-5565, which did not fix the system partition :sad:

md0 : active raid1 sda1[0](E)
     2490176 blocks [12/1] [E___________]

Is there a way to place some code / commands in the boot process which are executed before md0 is mounted? This will give me the chance to fix md0.


Hello community

 

Coming back to my last question: Is there a way to place some code / commands in the boot process which are executed before md0 is mounted? This will give me the chance to fix md0.

 

Where can I inject this code in the boot process?

 

Can anyone help?


Hello community,

 

After many hours I found a way to mount the system (root) partition with Ubuntu.

 

I got a hint on another forum that the /dev/sda1 partition can be mounted manually with:

mount -t ext4 /dev/sda1 /mnt

 

From there it was a small step to read the mdadm manual again to look for a way to start a raid1 array without any metadata or superblocks.

 

Use the Build option:

mdadm --build /dev/md0 --level=1 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1

 

Now I was able to mount:

mount -t ext4 /dev/md0 /mnt

 

Hope this still helps somebody.
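
A cautious variant of the steps above, as a sketch. The mdadm --detail output earlier in the thread shows md0 using metadata version 0.90, which keeps its superblock at the end of each member, so the filesystem starts at the beginning of the partition. That is presumably why the plain mount and --build both work here. `build_and_mount_ro` is a hypothetical helper that only prints the commands, with the mount switched to read-only (`-o ro` is my addition, not part of the steps above) so a first look at the partition does not write to it.

```shell
#!/bin/sh
# build_and_mount_ro MOUNTPOINT MEMBER...: print the commands that would build
# a superblock-less RAID1 from the given members and mount it read-only.
# Nothing is executed here; review the output before running it as root.
build_and_mount_ro() {
    mnt=$1
    shift
    # After shift, $# is the number of member partitions and $* lists them.
    echo "mdadm --build /dev/md0 --level=1 --raid-devices=$# $*"
    echo "mount -t ext4 -o ro /dev/md0 $mnt"
}

# Usage in the live system:
#   build_and_mount_ro /mnt /dev/sda1 /dev/sdb1 /dev/sdc1
```

Printing instead of executing is a deliberate choice: device names differ between the NAS and the live system (sdc1 on the NAS showed up as sdb1 in Ubuntu earlier in this thread), so it is worth checking them with blkid first.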
