• 0

Crashed volume - SHR


Solved by flyride

Question

sigh... my NAS is now reporting a crashed volume.

 


 

if I do the following 'cat /proc/mdstat' it reports:
 

cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid5 sdf6[9] sda6[6] sdb6[7] sdc6[10] sdd6[2] sde6[8]
      4883714560 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md4 : active raid1 sda7[0] sdb7[1]
      7811854208 blocks super 1.2 [2/2] [UU]

md2 : active raid5 sda5[11] sde5[6] sdf5[7] sdd5[8] sdc5[9] sdb5[10]
      14627177280 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3] sde2[4] sdf2[5]
      2097088 blocks [12/6] [UUUUUU______]

md0 : active raid1 sda1[0] sdb1[1] sdc1[5] sdd1[3] sde1[2] sdf1[4]
      2490176 blocks [12/6] [UUUUUU______]

unused devices: <none>

Any assistance, or suggestions of articles to read, would be most appreciated.

 

Thank you.

 

  • 0
  • Solution
19 minutes ago, flyride said:

I still don't know what exactly has been done that was destructive, and would hope to identify the action that caused the corruption before attempting any irreversible action for recovery.

 

This thread (starting with the linked post) details a specific recovery using lvm's backup.  However, your underlying problem (which is not really known) is different from that of the thread's original poster. https://xpenology.com/forum/topic/41307-storage-pool-crashed/?do=findComment&comment=195342

 

The folder with the backup data is /etc/lvm/backup

The restore command involved is vgcfgrestore (or in your case lvm vgcfgrestore)

 

You are running older lvm (because of DSM 5.2) than this example so there might be other differences.
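
As a rough sketch of how that restore could be approached carefully, the snippet below walks through listing the backups, rehearsing the restore, then doing it and activating the VG. The run and restore_vg_metadata helpers and the DRY_RUN switch are made up for illustration, and vg1000 is assumed from the fstab posted in this thread. By default it only prints the commands. The --list and --test options exist in current lvm2, but I am assuming the older lvm on DSM 5.2 accepts them too, so check that first.

```shell
# Hypothetical helper script: prints each command, and only executes it when DRY_RUN=0.
VG=${VG:-vg1000}          # volume group name, assumed from /etc/fstab in this thread
DRY_RUN=${DRY_RUN:-1}     # 1 = print only (default), 0 = really run the commands

run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

restore_vg_metadata() {
    run ls -l /etc/lvm/backup &&            # the automatic backup folder
    run lvm vgcfgrestore --list "$VG" &&    # show the backups lvm knows for this VG
    run lvm vgcfgrestore --test "$VG" &&    # rehearse the restore without writing metadata
    run lvm vgcfgrestore "$VG" &&           # the actual restore
    run lvm vgchange -ay "$VG"              # activate the logical volume
}
```

Calling restore_vg_metadata as-is just shows the plan. Running it with DRY_RUN=0 executes each step and stops at the first one that fails.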

  • 0
Posted (edited)

Hi Flyride, I know I definitely have an ext4 filesystem, and if I SSH in I have no volumes listed.  I do not have the 'dump' command.

 

vi /etc/fstab:


none /proc proc defaults 0 0
/dev/root / ext4 defaults 1 1
/dev/vg1000/lv /volume1 ext4  0 0


 

 cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid5 sdf6[9] sda6[6] sdb6[7] sdc6[10] sdd6[2] sde6[8]
      4883714560 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md4 : active raid1 sda7[0] sdb7[1]
      7811854208 blocks super 1.2 [2/2] [UU]

md2 : active raid5 sda5[11] sde5[6] sdf5[7] sdd5[8] sdc5[9] sdb5[10]
      14627177280 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3] sde2[4] sdf2[5]
      2097088 blocks [12/6] [UUUUUU______]

md0 : active raid1 sda1[0] sdb1[1] sdc1[5] sdd1[3] sde1[2] sdf1[4]
      2490176 blocks [12/6] [UUUUUU______]

unused devices: <none>
 

 


 


 

vgdisplay shows absolutely nothing.

 

I hope this helps?

 

thank you.

Edited by gizmomelb
  • 0

Well generally, you need to validate the members of your volume group and start it - i.e. sudo vgchange -ay

 

When you do get it to start, then try to mount the volume per the fstab.  If it won't mount, then fsck is the tool that is needed to attempt to fix your volume.  Any post that references btrfs does not apply to you.
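
One cheap check before reaching for mount or fsck, as a minimal sketch: confirm the device node from fstab actually exists. check_lv is a hypothetical helper name, not an lvm or DSM command.

```shell
# Hypothetical helper: succeed only if the given path is an existing block device.
check_lv() {
    if [ -b "$1" ]; then
        echo "$1 exists, ok to try mount or fsck"
    else
        echo "$1 is missing: the VG/LV is not active yet, so mount and fsck cannot work"
        return 1
    fi
}
```

For example, `check_lv /dev/vg1000/lv && mount /dev/vg1000/lv /volume1` only attempts the mount once the lv has been started.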

 

Investigate your pv/lv starting with this post, and really any linux lvm data recovery thread.

https://xpenology.com/forum/topic/14337-volume-crash-after-4-months-of-stability/?do=findComment&comment=107971

 

 

  • 0
Posted (edited)

Hi Flyride,

 

vgchange -ay   displays nothing, just goes to the next command line.

 

I followed all the steps in the thread you mentioned and posted the results of the questions asked in that thread as well.

 

mount /dev/vg1000/lv /volume1
mount: open failed, msg:No such file or directory
mount: mounting /dev/vg1000/lv on /volume1 failed: No such device
 

mount -o clear_cache /dev/vg1000/lv /volume1
mount: open failed, msg:No such file or directory
mount: mounting /dev/vg1000/lv on /volume1 failed: No such device
 

mount -o recovery /dev/vg1000/lv /volume1
mount: open failed, msg:No such file or directory
mount: mounting /dev/vg1000/lv on /volume1 failed: No such device
 

 

fsck.ext4 -v /dev/vg1000/lv
e2fsck 1.42.6 (21-Sep-2012)
fsck.ext4: No such file or directory while trying to open /dev/vg1000/lv
Possibly non-existent device?


Looks like I need to rebuild the volume group?

Edited by gizmomelb
  • 0
39 minutes ago, flyride said:

Well generally, you need to validate the members of your volume group

 

You kind of skipped the first step.  Figure out what's wrong here, otherwise you are just blindly following random tasks and pushing buttons.

  • 0
Posted (edited)

Hi,

 

yes I am not a *nix expert..  I ran the vgchange -ay command and it literally does nothing: it displays nothing and goes to a new command line

I tried vgscan but I do not have that command.



ok maybe I'm going in the wrong direction (please tell me if I am) but looking at this thread - https://community.synology.com/enu/forum/17/post/84956

the array seems to still be intact but has a size of '0'.

 

fdisk -l
fdisk: device has more than 2^32 sectors, can't use all of them

Disk /dev/sda: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks  Id System
/dev/sda1               1      267350  2147483647+ ee EFI GPT
fdisk: device has more than 2^32 sectors, can't use all of them

Disk /dev/sdb: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks  Id System
/dev/sdb1               1      267350  2147483647+ ee EFI GPT
fdisk: device has more than 2^32 sectors, can't use all of them

Disk /dev/sdc: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks  Id System
/dev/sdc1               1      267350  2147483647+ ee EFI GPT
fdisk: device has more than 2^32 sectors, can't use all of them

Disk /dev/sdd: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks  Id System
/dev/sdd1               1      267350  2147483647+ ee EFI GPT
fdisk: device has more than 2^32 sectors, can't use all of them

Disk /dev/sde: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks  Id System
/dev/sde1               1      267350  2147483647+ ee EFI GPT
fdisk: device has more than 2^32 sectors, can't use all of them

Disk /dev/sdf: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks  Id System
/dev/sdf1               1      267350  2147483647+ ee EFI GPT

Disk /dev/synoboot: 8382 MB, 8382316544 bytes
4 heads, 32 sectors/track, 127904 cylinders
Units = cylinders of 128 * 512 = 65536 bytes

        Device Boot      Start         End      Blocks  Id System
/dev/synoboot1   *           1         384       24544+  e Win95 FAT16 (LBA)
 

 

cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sda5[11] sde5[6] sdf5[7] sdd5[8] sdc5[9] sdb5[10]
      14627177280 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md3 : active raid5 sdf6[9] sda6[6] sdb6[7] sdc6[10] sdd6[2] sde6[8]
      4883714560 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md4 : active raid1 sda7[0] sdb7[1]
      7811854208 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3] sde2[4] sdf2[5]
      2097088 blocks [12/6] [UUUUUU______]

md0 : active raid1 sda1[0] sdb1[1] sdc1[5] sdd1[3] sde1[2] sdf1[4]
      2490176 blocks [12/6] [UUUUUU______]

unused devices: <none>
 

 

mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Sat Aug  6 21:35:22 2016
     Raid Level : raid5
     Array Size : 14627177280 (13949.56 GiB 14978.23 GB)
  Used Dev Size : 2925435456 (2789.91 GiB 2995.65 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

    Update Time : Fri Jun  4 12:39:36 2021
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : DiskStation:2
           UUID : dc74ec9b:f2e2c601:4cf78ff1:38dc0feb
         Events : 1211707

    Number   Major   Minor   RaidDevice State
      11       8        5        0      active sync   /dev/sda5
      10       8       21        1      active sync   /dev/sdb5
       9       8       37        2      active sync   /dev/sdc5
       8       8       53        3      active sync   /dev/sdd5
       7       8       85        4      active sync   /dev/sdf5
       6       8       69        5      active sync   /dev/sde5
 

 

 

mdadm --detail /dev/md3
/dev/md3:
        Version : 1.2
  Creation Time : Mon Jul 23 12:51:46 2018
     Raid Level : raid5
     Array Size : 4883714560 (4657.47 GiB 5000.92 GB)
  Used Dev Size : 976742912 (931.49 GiB 1000.18 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

    Update Time : Fri Jun  4 12:40:26 2021
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : GIZNAS01:3  (local to host GIZNAS01)
           UUID : 9717fe13:d84d4533:3a7153e2:17f9a9d0
         Events : 434333

    Number   Major   Minor   RaidDevice State
       9       8       86        0      active sync   /dev/sdf6
       8       8       70        1      active sync   /dev/sde6
       2       8       54        2      active sync   /dev/sdd6
      10       8       38        3      active sync   /dev/sdc6
       7       8       22        4      active sync   /dev/sdb6
       6       8        6        5      active sync   /dev/sda6
 

 

 mdadm --detail /dev/md4
/dev/md4:
        Version : 1.2
  Creation Time : Tue Dec 15 10:41:07 2020
     Raid Level : raid1
     Array Size : 7811854208 (7449.96 GiB 7999.34 GB)
  Used Dev Size : 7811854208 (7449.96 GiB 7999.34 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Fri Jun  4 12:40:31 2021
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : GIZNAS01:4  (local to host GIZNAS01)
           UUID : 65bfef18:22c472bd:99421137:9958e284
         Events : 4

    Number   Major   Minor   RaidDevice State
       0       8        7        0      active sync   /dev/sda7
       1       8       23        1      active sync   /dev/sdb7
 

 

If I am missing the obvious, please let me know as it is not obvious to me sorry.

 

Thank you for your help and time.

 

 

 

 

Edited by gizmomelb
  • 0
Posted (edited)
7 hours ago, gizmomelb said:

yes I am not a *nix expert..  I ran the vgchange -ay command and it literally does nothing: it displays nothing and goes to a new command line

I tried vgscan but I do not have that command.

 

I understand.  You really need to become a student of mdadm, lvm and ext4 in order for this to go well.  I urge you to take the time and understand the commands you are typing in and not just blindly follow someone else's (potentially misinformed) process.

 

There are many permutations of lvm commands; just do some googling.  For example, pvscan is an alias for "lvm pvscan".

 

FWIW, vgchange -ay is the command that starts the logical volume if all its physical members are working. It would make your /dev/vg1000/lv device start working if the members were intact and functioning.

 

7 hours ago, gizmomelb said:

the array seems to still be intact but has a size of '0'.

 

Yes, your array appears fine (per your original post) so don't mess with that. The array is your Disk Group.  That is the same size it always was.  The DSM volume (filesystem) within it has a size of 0 because DSM cannot recognize a filesystem at the moment.
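
If you want to confirm that from the shell rather than by eye, here is a minimal sketch. md_degraded is a made-up helper name (not an mdadm or DSM tool). It reads /proc/mdstat-style text and prints any of the arrays you name whose status field contains an underscore, meaning a missing member. It only checks the arrays passed as arguments, because md0 and md1 on this system list 12 slots with 6 in use and would otherwise always be reported.

```shell
# Hypothetical helper: print each named md array whose status shows a missing member ("_").
# Reads /proc/mdstat-style text on stdin; the array names to check are the arguments.
md_degraded() {
    awk -v want=" $* " '
        /^md[0-9]+ :/ { name = $1 }              # remember which array we are in
        /\[[U_]+\][ \t]*$/ {                     # status line, e.g. [6/6] [UUUUUU]
            if (index(want, " " name " ") && $NF ~ /_/) print name
        }
    '
}
```

Usage would be `md_degraded md2 md3 md4 < /proc/mdstat`. With the mdstat output posted in this thread it prints nothing, since all three data arrays show every member up.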

 

7 hours ago, gizmomelb said:

ok maybe I'm going in the wrong direction (please tell me if I am) but looking at this thread - https://community.synology.com/enu/forum/17/post/84956

 

There is a lot of stuff in that thread that has to do with the array; it does not apply to you and would be quite dangerous if you followed it.  Again, right now you need to figure out what is going on with the "physical" storage devices (/dev/md2, /dev/md3, /dev/md4) that are part of the lv.  To be clear, lv = logical volume.  It is the storage layer, not the DSM volume that lives within it and is called a "volume" in the UI.

 

The entire LVM metadata configuration can be recreated from an automatic backup that is saved within your /etc structure.  But you have to figure out what is happening with the devices first.

Edited by flyride
  • 0

Is this the same system?  Did you do something to try and expand with your 10TB disk and toast your volume?

 

If so, you should explain exactly what was attempted and what happened.  Also it would have been helpful if you had mentioned the system was a 5.2 build.

  • 0
Posted (edited)

Hi yes that was my post about expanding the volume before the volume crashed - I tried expanding the volume following your instructions here:

 

 

My apologies for not saying this was DSM 5.2 earlier - I don't know what I need to post and that is why I was asking for assistance.  I was looking at similar issues across many forums but most solutions involved earlier versions of DSM with resize being available.

Edited by gizmomelb
  • 0

Generally DSM automatically expands when physical drives are added.

 

The link you posted is a quite different use case using virtual drives on ESXi, where the space gets expanded by modifying the VM underlying storage, without adding drives.

 

So, do you know what was executing when it crashed?  Also, any luck investigating the physical members of the lv - i.e. lvm pvscan
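
When reading the pvscan result, the part to look at is the summary line at the end. As a small sketch (pvs_without_vg is a hypothetical helper name, and it assumes the "in no VG: N" wording that this lvm version prints), this pulls out the number of physical volumes that lvm does not associate with any volume group:

```shell
# Hypothetical helper: extract N from the "in no VG: N [...]" part of the pvscan summary on stdin.
pvs_without_vg() {
    sed -n 's/.*in no VG: \([0-9][0-9]*\).*/\1/p'
}
```

Usage would be `lvm pvscan | pvs_without_vg`. If the md devices are listed as PVs but this count is not 0, that suggests the PV labels are readable and it is the volume group metadata that lvm is not finding.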

  • 0
Posted (edited)

Hi Flyride,

 

thank you for continuing to assist me - it's been a busy day but I will make the time to read up more about how mdadm works (it makes sense to me a little already).

 

I think the damage I caused was in step 10 or 11 as detailed here:
 

9. Inform lvm that the physical device got bigger.

$ sudo pvresize /dev/md2
Physical volume "/dev/md2" changed
1 physical volume(s) resized / 0 physical volume(s) not resized

If you re-run vgdisplay now, you should see some free space.

 

10. Extend the lv

$ sudo lvextend -l +100%FREE /dev/vg1/volume_1
Size of logical volume vg1/volume_1 changed from 15.39 GiB (3939 extents) to 20.48 GiB (5244 extents).
Logical volume volume_1 successfully resized.

 

11. Finally, extend the filesystem (this is for ext4, there is a different command for btrfs)

$ sudo resize2fs -f /dev/vg1/volume_1

I was trying to extend the partition for sdc (the replacement 10TB drive for the original 4TB drive) as DSM didn't automatically resize the new hdd partition after the rebuild finished.
 

as requested:

 

lvm pvscan
  PV /dev/md2                      lvm2 [13.62 TB]
  PV /dev/md3                      lvm2 [4.55 TB]
  PV /dev/md4                      lvm2 [7.28 TB]
  Total: 3 [25.45 TB] / in use: 0 [0   ] / in no VG: 3 [25.45 TB]
 

Thank you.

Edited by gizmomelb
  • 0

Ahh, I just found a screenshot I made last night. It doesn't appear to be destructive though, just testing the filesystem for errors.  This is what I had typed:

 

syno_poweroff_task -d

vgchange -ay

fsck.ext4 -pvf -C 0 /dev/vg1000/lv
 

then I executed these commands:

 

vgchange -an vg1000

sync

init 6

 

that looks to be it.. I deactivated the VG, which explains why there isn't a VG when I execute vgs or vgdisplay etc.

To re-activate the VG I need to execute 'vgchange -ay vg1000' - but I'll wait until it is confirmed.

 

Thank you.

 

  • 0
Posted (edited)

So if you review the response about trying to expand your physical volume on the other thread, you can see the broken logic.

DSM didn't automatically expand your SHR because it isn't possible with the SHR structure and rules.

 

Now understand that to the filesystem (ext4), this is all done under the covers:

/dev/md2 + /dev/md3 + /dev/md4 = continuous storage (vg1000) which ext4 then writes a filesystem upon.
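
To make that concrete with numbers from this thread: the PE counts below are the ones pvdisplay reports for the three members further down, and 4 MiB is the PE size that vgdisplay shows. A quick shell sketch of the sum (the variable names are just for illustration):

```shell
# PE (physical extent) counts per member, as reported by pvdisplay in this thread.
pe_md2=3571088
pe_md3=1192312
pe_md4=1907190
pe_size_mib=4     # PE size of 4096 KByte, i.e. 4 MiB per extent

total_pe=$((pe_md2 + pe_md3 + pe_md4))
total_mib=$((total_pe * pe_size_mib))

echo "total PE:  $total_pe"
echo "total MiB: $total_mib"
# Convert MiB to TiB (lvm prints this unit as "TB").
awk -v m="$total_mib" 'BEGIN { printf "total TB:  %.2f\n", m / 1048576 }'
```

The total of 6670590 extents matches the Total PE that vgdisplay reports for vg1000, and it works out to the 25.45 TB shown for the lv.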

 

The storage expansion procedure that you apparently were following is a special use case (which was stated in the thread header, along with admonishment for backup).  A virtual disk can be grown in ESXi, but it does not get registered as a disk change event with DSM, so its automatic expansion doesn't occur.  So my procedure adds storage to the end of an existing partition, which is relatively simple to inform lvm and DSM, and doesn't disturb any existing data.

 

I think essentially what you have attempted is to add storage in the middle of the lvm.  DSM does this as part of its expansion logic, but I don't know exactly how it is accomplished.  If done indiscriminately, this corrupts ext4, because parts of the filesystem that hold data end up at different LBAs.  But we cannot even see your lv's, so something else, seemingly more catastrophic, has happened.

 

Given that you want more space - if you delete and remake the SHR, you will get 32TB.  If you have a backup of your data, that may be preferable.

 


 

If you don't have a backup, you might try and restore the lvm from its automatic backup.  I still don't know what exactly has been done that was destructive, and would hope to identify the action that caused the corruption before attempting any irreversible action for recovery.

Edited by flyride
  • 0
Posted (edited)

GIZNAS01> vgcfgrestore vg1000
  Restored volume group vg1000
GIZNAS01> lvm vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "vg1000" using metadata type lvm2
GIZNAS01> lvm lvscan
  inactive          '/dev/vg1000/lv' [25.45 TB] inherit
GIZNAS01>

 

GIZNAS01> pvs
  PV         VG     Fmt  Attr PSize  PFree
  /dev/md2   vg1000 lvm2 a-   13.62T    0
  /dev/md3   vg1000 lvm2 a-    4.55T    0
  /dev/md4   vg1000 lvm2 a-    7.28T    0
GIZNAS01> vgs
  VG     #PV #LV #SN Attr   VSize  VFree
  vg1000   3   1   0 wz--n- 25.45T    0
GIZNAS01> lvs
  LV   VG     Attr   LSize  Origin Snap%  Move Log Copy%  Convert
  lv   vg1000 -wi--- 25.45T
GIZNAS01>
 

GIZNAS01> vgchange -ay vg1000
  1 logical volume(s) in volume group "vg1000" now active
GIZNAS01>
 

GIZNAS01> pvdisplay
  --- Physical volume ---
  PV Name               /dev/md2
  VG Name               vg1000
  PV Size               13.62 TB / not usable 320.00 KB
  Allocatable           yes (but full)
  PE Size (KByte)       4096
  Total PE              3571088
  Free PE               0
  Allocated PE          3571088
  PV UUID               HO2fWh-RA9j-oqGv-kiCI-z9pY-41R1-9IJbgt

  --- Physical volume ---
  PV Name               /dev/md3
  VG Name               vg1000
  PV Size               4.55 TB / not usable 3.94 MB
  Allocatable           yes (but full)
  PE Size (KByte)       4096
  Total PE              1192312
  Free PE               0
  Allocated PE          1192312
  PV UUID               nodvRa-r0cq-NjRs-eW9K-xEuO-YPUT-t7NHbP

  --- Physical volume ---
  PV Name               /dev/md4
  VG Name               vg1000
  PV Size               7.28 TB / not usable 3.88 MB
  Allocatable           yes (but full)
  PE Size (KByte)       4096
  Total PE              1907190
  Free PE               0
  Allocated PE          1907190
  PV UUID               8qIF1n-Wf00-CD79-M3M5-Dx5l-Lsx2-HZ9qle

 

GIZNAS01> vgdisplay
  --- Volume group ---
  VG Name               vg1000
  System ID
  Format                lvm2
  Metadata Areas        3
  Metadata Sequence No  23
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                3
  Act PV                3
  VG Size               25.45 TB
  PE Size               4.00 MB
  Total PE              6670590
  Alloc PE / Size       6670590 / 25.45 TB
  Free  PE / Size       0 / 0
  VG UUID               BTXwFZ-flEc-6vgu-qTpk-82eW-k6wQ-NNha6o

 

GIZNAS01> lvdisplay
  --- Logical volume ---
  LV Name                /dev/vg1000/lv
  VG Name                vg1000
  LV UUID                6IB6tO-tatM-LsAF-1ll1-yOnx-V2D1-KkFuIG
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                25.45 TB
  Current LE             6670590
  Segments               3
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     384
  Block device           253:0
 


GIZNAS01> cat /etc/fstab
none /proc proc defaults 0 0
/dev/root / ext4 defaults 1 1
/dev/vg1000/lv /volume1 ext4  0 0
GIZNAS01>
 

 

seems to be getting closer.

 

 

Edited by gizmomelb
  • 0

good news!!  yes I rebooted and the volume mounts and my data is there (whether it is intact is another thing, but it should be!)

 

I don't know if this is too early but thank you so much for your help recovering the volume.

 

  • 0

Excellent, very glad it seems to be working out given the circumstances.

 

My advice after a corruption and recovery event like this is always to offload all the data, delete everything (volume, storage pool, shr, etc) and then rebuild it from scratch.  Otherwise something we did not find may bite you in the future.

 

You may want to do this anyway in order to get more space as detailed prior.

  • 0

I know a backup would be best but I don't have the storage space to be able to do that (data is non essential, but a pain to have to re-rip all my DVDs, CDs, blurays etc. and at least another few months of work).

 

If it's possible to expand the 4TB rebuilt partition to the 10TB capacity of the actual replacement drive it'd be a nice win.

But also many, many thanks for sharing your time and knowledge helping me out and for my learning a little more how mdadm handles LVs and VGs.

 
