XPEnology Community

RAID6 Crashes on Rebuild


Logjammer


I have an XPEnology build running DSM 6.1.7-15284 Update 3 on an older Dell/Compellent SC030 Storage Controller.  The specs are as follows:

 

CPU:  Intel(R) Xeon(R) CPU E5240  @ 3.00GHz

RAM: 3172 MB

Disk Controllers: 2 x Fujitsu D2607-8i crossflashed to LSI 9211-8i with P20 firmware in IT mode.

There are 16 x 2TB IBM SAS drives (a mix of IBM-ESXS HUS723020ALS64 and IBM-XIV ST2000NM0043 A4) attached to the controllers via the SAS-836TQ backplane.
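
In case it matters for troubleshooting, I can confirm the firmware level and IT mode on both HBAs with the LSI flash utility, run from the same environment I used for the crossflash. A rough sketch, assuming the sas2flash binary that ships with the P20 package:

# List every LSI controller with its firmware, NVDATA and BIOS revisions
sas2flash -listall
# Show full details (including IT/IR mode) for the first controller
sas2flash -c 0 -list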

 

This system previously ran a mix of 16 x 1TB SATA drives in a RAID5 configuration and had no issues beyond bad sectors on the older drives.

 

With the new setup, each time the RAID6 array rebuilds, it crashes right at the end and takes out 10 drives.

Eight of the drives show Status=Crashed and S.M.A.R.T. Status=Normal, with no bad sectors.

Two of the drives show Status=Initialized and S.M.A.R.T. Status=Normal, and no longer belong to the RAID Group.

When I reboot the device, the eight crashed drives report that they are working normally and rejoin the group.  I am then prompted to rebuild the array with the other two drives.
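
The next time it crashes I will try to capture the kernel log before rebooting, since that should show whether the controller is resetting or the disks are genuinely dropping out. A rough sketch of what I plan to run over SSH (the grep patterns are just my guess at the relevant messages):

# SAS controller resets, aborted commands, or members being kicked from md2
dmesg | grep -iE "mpt2sas|reset|md2" | tail -n 200
# DSM also logs kernel and md events here
grep -iE "md2|kicking|disk failure" /var/log/messages | tail -n 50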

 

When installing the new drives, I booted from the USB stick in re-installation mode and built the RAID from scratch.

 

I have backups of the data, but am unsure of how to proceed.

 

Please let me know what I can do to prevent the array from crashing.

 

The volume is formatted as BTRFS.

 

Edited by Logjammer

I decided to restart the device, and now the drives indicate they are okay.  cat /proc/mdstat shows the following, with no intervention on my part:

ash-4.3# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [raidF1]
md2 : active raid6 sda3[0] sdk3[17] sdi3[16] sdp3[14] sdj3[13] sdl3[12] sdm3[11] sde3[9] sdo3[8] sdf3[7] sdn3[6] sdg3[5] sdh3[4] sdb3[3] sdc3[2] sdd3[1]
      27281695616 blocks super 1.2 level 6, 64k chunk, algorithm 2 [16/14] [UUUUUUUUUU_UUUU_]
      [>....................]  recovery =  0.5% (10132308/1948692544) finish=3215.7min speed=10046K/sec
md1 : active raid1 sda2[0] sdb2[15] sdc2[1] sdd2[2] sde2[3] sdf2[4] sdg2[5] sdh2[6] sdi2[7] sdj2[8] sdk2[9] sdl2[10] sdm2[11] sdn2[12] sdo2[13] sdp2[14]
      2097088 blocks [16/16] [UUUUUUUUUUUUUUUU]
md0 : active raid1 sda1[0] sdb1[12](S) sdc1[13](S) sdd1[1] sde1[8] sdf1[10] sdg1[14](S) sdh1[15](S) sdi1[7] sdj1[4] sdk1[2] sdl1[5] sdm1[6] sdn1[11] sdo1[9] sdp1[3]
      2490176 blocks [12/12] [UUUUUUUUUUUU]
unused devices: <none>

So I guess I will wait out the roughly 53 hours for the rebuild to complete.
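
While it rebuilds I will also keep an eye on the member states. Something along these lines should show which partitions are active, spare or faulty, and how far the recovery has progressed (just a sketch; md2 is the data array on this box):

# Per-member state, event counts and rebuild progress for the data array
mdadm --detail /dev/md2
# Current sync action (recover/resync/idle) and completion counter
cat /sys/block/md2/md/sync_action
cat /sys/block/md2/md/sync_completed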

Edited by Polanskiman
Please use code tags when posting command line, code or logs.

As I have all the data backed up, I am going to try SHR.  I have made the changes in synoinfo.conf and am creating the volume again.  Maybe the RAID failed because of the differences in HDD models, but either way it failed at completion or very close to it; I was unable to monitor it in real time.
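
For reference, the synoinfo.conf change I made is the commonly documented SHR tweak for DSM 6.x; the exact keys can differ by loader and version, so treat this as a sketch of my edit rather than a recipe:

# In /etc.defaults/synoinfo.conf (mirrored to /etc/synoinfo.conf), then reboot:
# original line, commented out so RAID Group mode no longer hides SHR
#supportraidgroup="yes"
# added so SHR shows up in Storage Manager
support_syno_hybrid_raid="yes"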


Not sure if this is important, but after the crash, md0 and md1 rebuild/resync just fine.  It only fails on md2 and takes the drives offline.  I still have 10 hours to go on a fresh SHR attempt.  Is there any more information I could provide regarding drive/controller health?
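
In the meantime, here is the health information I can pull myself and post if it helps. A sketch assuming DSM's bundled smartctl and the device letters from the mdstat output above:

# Quick pass/fail health verdict per disk (repeat for sdb through sdp)
smartctl -H /dev/sda
# Full detail, including the grown defect list and error counters on SAS drives
smartctl -a /dev/sda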

