golge13

Degraded mode 5/6 failed rebuild, files missing

Question

Hi!

One of my six hard drives gave up and the volume became degraded. I swapped the broken HDD and started repairing the volume, but the new HDD broke down in the meantime, and now an entire folder is missing. On the volume itself the used space is still the same as before, but some files are gone (I can't see them).

After I repaired the volume, the files were still missing.

I run DSM 5.0-4493 Update 5 on an HP N54L, with an SHR-1 raid of 6 HDDs.

 

In a panic I did a factory reset, so I'm now using an Ubuntu live stick, as well as a Windows 10 live stick with UFS Explorer. I apologize for my bad computer skills and English.

 

I saved the space_history_ log files from when the volume was degraded and from now; it seems the partitions have moved a little bit between them.

 

Is there anyone who could help me fix my problem?
I would be very grateful!


space_history_20171224_131310.xml

space_history_20180306_181156.xml


4 answers to this question

Recommended Posts


24.12.2017
sda    HDS722020ALA330    2TB
sdb    DT01ACA300         3TB
sdc    WD40EFRX           4TB
sdd    WD40EFRX           4TB
sde    ???                ???
sdf    HD103SI            1TB

 

What disk size was sde? The assumption would be 1, 2, or 3 TB.

Ignoring sde, as if it was never there, an SHR-1 raid built according to the xml file would look like this (3 x RAID5, 1 x RAID1):


(1) raid 5 - 5 x 1TB = 4TB
(2) raid 5 - 4 x 1TB = 3TB
(3) raid 5 - 3 x 1TB = 2TB
(4) raid 1 - 2 x 1TB = 1TB
-> 10TB usable space

 

Depending on sde and its size, (1) to (3) might get additional 1TB chunks.

The raid sets would look like this (disks are the columns a-f, left to right; each row is one raid set).
The raid sets are "stitched" together as a logical volume, resulting in one big volume that is formatted with the file system.

 

    4 4
  3 3 3 ?
2 2 2 2 ?
1 1 1 1 ? 1
a b c d e f
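The capacity sums above can be sketched with a bit of shell arithmetic (illustrative only; each chunk is treated as exactly 1 TB):

```shell
# Usable space per SHR-1 raid set: RAID5 keeps n-1 of n chunks,
# a 2-way RAID1 keeps 1 chunk.
raid5_usable() { echo $(( $1 - 1 )); }

set1=$(raid5_usable 5)        # (1) 5 x 1TB RAID5 -> 4TB
set2=$(raid5_usable 4)        # (2) 4 x 1TB RAID5 -> 3TB
set3=$(raid5_usable 3)        # (3) 3 x 1TB RAID5 -> 2TB
set4=1                        # (4) 2 x 1TB RAID1 -> 1TB

total=$(( set1 + set2 + set3 + set4 ))
echo "usable: ${total}TB"     # -> usable: 10TB
```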


06.03.2018 (there seems to be no failed disk, as there are 6 of 6 disks present in this configuration - at least at first glance)

sda    HDS722020ALA330    2TB
sdb    DT01ACA300         3TB
sdc    WD40EFRX           4TB
sdd    WD40EFRX           4TB
sde    ST1000LM035        1TB
sdf    HD103SI            1TB

 

The sde 1TB is in the first RAID5 set, as expected in SHR-1.
BUT the last raid set, formerly 2 x 1TB (the last 1TB chunks of both 4TB disks), is now a RAID5 set, as if there had been a raid extension: two disks make a complete RAID1, and a raid set of 3 disks automatically becomes a RAID5 in SHR-1. But as there are only 2 disks in this set and it is a RAID5, this raid set is at best incomplete and will be degraded (though 2 disks of 3 is still an accessible raid set that can be used).

 

The missing piece would be mdstat, to see the status of the raid sets. Maybe three RAID5 sets are incomplete and only (1) was repaired with the 1TB disk? The mystery is the change from RAID1 to RAID5 in the time between the two status files.

Too much information is missing to come to a conclusive result.

 

1. What was the original sde disk (size), missing in the 24.12.2017 config?
2. What was the 1st replacement disk that failed (size)?
3. Did the 1st rebuild succeed and the disk fail later?
4. Most important: what does mdstat give back? Are all raids only degraded, or are there failed raids (broken, data loss)?

5. Did you do any repairs besides the raid rebuilds DSM tried, like repairing the file system or the logical volume?

 

https://raid.wiki.kernel.org/index.php/Mdstat
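As a sketch of what to look for there: a healthy set shows something like [6/6] [UUUUUU], a degraded one shows a lower second number and an underscore per missing disk. A small grep can flag the degraded sets (the sample below is made up for illustration, not from this system):

```shell
# Illustrative /proc/mdstat excerpt, saved to a file for the demo
cat > mdstat.sample <<'EOF'
md3 : active raid5 sdb6[0] sdd6[3] sdc6[2] sda6[1]
      2930228352 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
md5 : active raid5 sdd8[0] sdc8[1]
      1953485568 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/2] [UU_]
EOF

# A status field containing "_" means a disk slot is missing (degraded)
degraded=$(grep -o '\[U*_[U_]*\]' mdstat.sample)
echo "degraded: $degraded"    # -> degraded: [UU_]
```

On a live system the same grep would be run against /proc/mdstat directly.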

 

As long as the raid situation is not cleared up, no clean logical volume is possible, so checking the logical volume (vgdisplay) or even the file system makes no sense for now.

 

Posted (edited)

Thank you for the detailed answer! 

 

1. The original disk was also 1 TB

 

2. First I tried to repair with the same disk; it got maybe 30-40% before it was disconnected. Then I tried with a new 10TB disk, but it did not even initialize before it became completely useless (CRC error or something; I got a new one on warranty). I did not want to take chances with the other 10TB, so I bought a new 1TB. With that the repair worked well, but the files were already gone.

 

3. Now I'm unsure; I'll be back to check if I can see it in the log file.

 

4. Now mdstat says everything is okay, but I checked in the DiskStation when I had it and I attached the picture here. I don't know if it should look like that for md0 and md1. What I think it means is that I had 12 slots in the 3612xs and only used 6, but I'm definitely not sure. I cannot see md0 and md1 in Ubuntu?

mdstat dsm 06-03.PNG

root@ubuntu:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] 
md2 : active raid5 sda5[0] sdd5[6] sdc5[5] sdf5[3] sde5[7] sdb5[1]
      4860138240 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      
md5 : active raid5 sdd8[0] sdc8[1]
      976742784 blocks super 1.2 level 5, 64k chunk, algorithm 2 [2/2] [UU]
      
md4 : active raid5 sdc7[0] sdd7[2] sdb7[1]
      1953485568 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
      
md3 : active raid5 sdb6[0] sdd6[3] sdc6[2] sda6[1]
      2930228352 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      
unused devices: <none>

5. No, I do not think so.

Edited by golge13

root@ubuntu:~# vgdisplay -v
    Using volume group(s) on command line.
  --- Volume group ---
  VG Name               vg1000
  System ID             
  Format                lvm2
  Metadata Areas        4
  Metadata Sequence No  15
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                4
  Act PV                4
  VG Size               9.98 TiB
  PE Size               4.00 MiB
  Total PE              2617331
  Alloc PE / Size       2617331 / 9.98 TiB
  Free  PE / Size       0 / 0   
  VG UUID               0mJfTu-HpKa-N00X-ecjA-ZWmh-nIsT-i7bDo2
   
  --- Logical volume ---
  LV Path                /dev/vg1000/lv
  LV Name                lv
  VG Name                vg1000
  LV UUID                SQcxkP-D3DV-cxNj-Zkr2-RZ5k-EmfX-Zcy9vq
  LV Write Access        read/write
  LV Creation host, time , 
  LV Status              available
  # open                 1
  LV Size                9.98 TiB
  Current LE             2617331
  Segments               9
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     1280
  Block device           253:0
   
  --- Physical volumes ---
  PV Name               /dev/md2     
  PV UUID               UaZBcq-yq6p-UQk7-HKwn-t53m-WiyO-ST7XfK
  PV Status             allocatable
  Total PE / Free PE    1186557 / 0
   
  PV Name               /dev/md3     
  PV UUID               2MCtMR-GdSF-UufZ-m16o-Fx7c-c4GU-687R78
  PV Status             allocatable
  Total PE / Free PE    715387 / 0
   
  PV Name               /dev/md4     
  PV UUID               tUpwsB-N0ue-Pk1q-VTe0-MJwF-Pt8U-O2wFMl
  PV Status             allocatable
  Total PE / Free PE    476925 / 0
   
  PV Name               /dev/md5     
  PV UUID               RBUtMK-zd4z-cNns-i8c5-jOoL-k8kx-vwfSuk
  PV Status             allocatable
  Total PE / Free PE    238462 / 0

I removed insignificant items from the log file and uploaded it below. I see that I tried to fix the disk more times than I remember. :?

synosys.log avskalad.rtf


md0 and md1 are of no importance at this point; they only contain the DSM system partition and the swap partition.

I thought maybe there was a raid extension after the repair (with a 4TB or bigger disk) and md3-md5 all got one more 1TB chunk from disk sde; that would explain the change of raid level for md5 from 1 to 5 (2 disks -> 3 disks). But with mdstat we can see that md3 and md4 still have the "normal" number of disks as before (24.12.2017).

There is no explanation for how the raid level of /dev/md5 changed, and more problematic: a RAID5 needs at least 3 disks, yet your md5 RAID5 consists of 2 disks with no disk missing. That's kind of impossible (for a "normal" RAID5).

 

To be a "normal" RAID5 (like /dev/md4), I think it would have to look like this:

 

md5 : active raid5 sdd8[0] sdc8[1]

1953485568 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/2] [UU_]

 

The number of blocks would be that of 2 data disks, and there would be 3 disks with one missing (UU_).

 

When creating a RAID5 with mdadm you can create it with two disks (just the data disks without any redundancy) and add a 3rd afterwards (that is used when migrating a raid, like RAID1 to RAID5), but from what you write there were no steps like this (migrations). Also, if it were a RAID5 of 2 disks, then the number of blocks should be the same as for /dev/md4, I guess.

 

So, from the history you give and how it looks, /dev/md5 has a wrong raid type (it was not migrated to RAID5): the disks are two RAID1 disks but are being used as RAID5 disks. The data you would see in this raid set would be a randomly mashed-up mess, and if you assemble that into a logical volume, there would be a 1TB part of that volume consisting of wrong data. How this looks in a file system depends on the file system you use, but you can see that something is missing, so that might be kind of proof that the file system is not consistent (you can also check the file system of the logical volume without correcting it; there should be a good amount of wrong data when 1TB of 10TB is wrong).
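Checking a file system without correcting anything is done with fsck in read-only mode; -n answers "no" to every repair prompt. Here is a self-contained sketch on a small scratch image (assuming ext4, which DSM 5 uses by default; on the real system the target would be the logical volume, /dev/vg1000/lv, and only ever with -n while recovering):

```shell
# Build a tiny ext4 image just to demonstrate a read-only check
dd if=/dev/zero of=scratch.img bs=1M count=8 2>/dev/null
mkfs.ext4 -q scratch.img

# -f forces a full check, -n opens read-only and answers "no" to all
# repair prompts; exit status 0 means the file system is clean
fsck.ext4 -f -n scratch.img
echo "fsck exit status: $?"
```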

LVM cannot see or know anything about this; it just puts the four blocks md2-md5 together (all have the right size to do so).
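That stitching can be sanity-checked with the vgdisplay numbers from above: the Total PE of the four PVs must add up exactly to the VG total (PE size is 4 MiB here):

```shell
# Total PE per PV, taken from the vgdisplay output above
md2_pe=1186557
md3_pe=715387
md4_pe=476925
md5_pe=238462
vg_pe=2617331                 # Total PE of vg1000

sum=$(( md2_pe + md3_pe + md4_pe + md5_pe ))
echo "PV sum: $sum, VG total: $vg_pe"
# md5's share: 238462 PE x 4 MiB is about 931 GiB, i.e. the suspect ~1TB part
```

The two numbers match exactly, which is the point: LVM only checks sizes, not content, so it happily stitches in a raid set full of wrong data.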

 

You could do a

sudo mdadm --detail /dev/md5

to see a better status, but I guess, as md5 is in a kind of impossible state, there is no use in whatever it shows you.

 

As long as you have not written to that file system (the logical volume), things might still be recoverable.

The point is how to change md5 to take the 2 disks as RAID1 without changing the content of the disks, like manually assigning a raid type.

And that's the point where I would leave off, because I haven't done something like that before (only some ordinary mdadm stuff like repairing a regular raid set with a failed disk).

There might be two ways: one would be to change the metadata of md5 to RAID1 (kind of the reverse of how it presumably came into existence); the other would be to disassemble md5 and assemble it as RAID1 using mdadm (or only re-create it as RAID1 without destroying it first?).

 

I also think it might already be too late. You wrote that after the try with the 10TB disk the data was already missing, so there was already something wrong, and repairing with another disk might have been a problem; the recovery should have started at that point. Also, you mentioned you reset the system (reinstalled DSM), which is another point where it's hard to guess what happened.

 

Usually when doing data recovery you work with copies of the originals, so if something goes wrong you can try again (but most people don't have multi-TB disks lying around).

You can read this to understand what to do:

https://raid.wiki.kernel.org/index.php/RAID_Recovery

(or search the internet for other good sources; don't rush things)

 

Everything in the following is pure theory and you should assume there are errors in it. You should understand the process and see the commands as an example you need to translate to your case. This assumes the system was started from a recovery/live Linux and the logical volume was not started or in use.

Also, maybe wait a few days to see if someone else here has a comment about it (I could be wrong on more than one point).

 

 

To preserve the information:
 

mdadm --examine /dev/sdc8 >> raid.status

mdadm --examine /dev/sdd8 >> raid.status 

 

To re-create it as RAID1, I would assume something like this:
 

mdadm --stop /dev/md5

mdadm --create /dev/md5 --assume-clean --metadata=1.2 --level=1 --raid-devices=2 /dev/sdc8 missing

That leaves out /dev/sdd8 (the "missing" slot), as we don't know if both partitions are the same; as a former RAID1 they should have the same content, but if not? So we decide on one disk to be used.

After assembling the logical volume again, hopefully all data is there (check the file system without correcting anything). If that does not check out correctly, you could do the same (stop and create) with the other disk, sdd8, and check if the file system is better with that disk.

The incomplete RAID1 (containing only one disk) then gets its missing 2nd disk by adding it:

mdadm --add /dev/md5 /dev/sdd8
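After the --add, md resyncs the mirror onto sdd8, and /proc/mdstat gains a recovery progress line. Here is a sketch of pulling out just the percentage, run against a made-up sample (since /proc/mdstat only exists on the running system, and the numbers here are invented):

```shell
# Illustrative mdstat excerpt during a RAID1 resync (numbers invented)
cat > mdstat.resync <<'EOF'
md5 : active raid1 sdd8[2] sdc8[0]
      976742784 blocks super 1.2 [2/1] [U_]
      [==>..................]  recovery = 12.6% (123100032/976742784) finish=88.3min
EOF

progress=$(grep -o 'recovery = [0-9.]*%' mdstat.resync)
echo "md5 $progress"          # -> md5 recovery = 12.6%
```

On the live system you would just watch `cat /proc/mdstat` until the recovery line disappears and the set shows [2/2] [UU].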

 

