Crashed volume - SHR

gizmomelb · June 3, 2021

sigh... my NAS is now reporting a crashed volume.

if I do the following 'cat /proc/mdstat' it reports:

cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid5 sdf6[9] sda6[6] sdb6[7] sdc6[10] sdd6[2] sde6[8]
4883714560 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md4 : active raid1 sda7[0] sdb7[1]
7811854208 blocks super 1.2 [2/2] [UU]

md2 : active raid5 sda5[11] sde5[6] sdf5[7] sdd5[8] sdc5[9] sdb5[10]
14627177280 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3] sde2[4] sdf2[5]
2097088 blocks [12/6] [UUUUUU______]

md0 : active raid1 sda1[0] sdb1[1] sdc1[5] sdd1[3] sde1[2] sdf1[4]
2490176 blocks [12/6] [UUUUUU______]

unused devices: <none>

Any assistance or suggestions of articles to read would be most appreciated please.

Thank you.

flyride · June 4, 2021

19 minutes ago, flyride said:

I still don't know what exactly has been done that was destructive, and would hope to identify the action that caused the corruption before attempting any irreversible action for recovery.

This thread (starting with the linked post) details a specific recovery using lvm's backup. However, your underlying problem (which is not really known) is different from that of the thread's original poster. https://xpenology.com/forum/topic/41307-storage-pool-crashed/?do=findComment&comment=195342

The folder with the backup data is /etc/lvm/backup

The restore command involved is vgcfgrestore (or in your case lvm vgcfgrestore)

You are running older lvm (because of DSM 5.2) than this example so there might be other differences.
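For anyone following along, here is a rough sketch of that restore sequence, written as a dry run so that nothing is changed until the printed commands have been reviewed. The helper names run and restore_vg are made up for illustration. It assumes the backup for the volume group sits under its default name in /etc/lvm/backup, and that this lvm build accepts the --list option; an older lvm such as the one in DSM 5.2 may behave differently.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: restore LVM metadata for one volume group.
restore_vg() {
    vg=$1
    run lvm vgcfgrestore --list "$vg"   # show which backups lvm knows about
    run lvm vgcfgrestore "$vg"          # restore from /etc/lvm/backup/<vg>
    run lvm vgscan                      # the volume group should be found again
    run lvm lvscan                      # the lv should be listed, still inactive
    run lvm vgchange -ay "$vg"          # activate it
}

# Example: restore_vg vg1000
```

Setting DRY_RUN=0 makes the helper actually run the commands, so it is worth reading the printed list first and confirming the arrays themselves are healthy.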

flyride · June 3, 2021

If your array is healthy, then you have a vg or filesystem issue. Make sure you know which filesystem type the volume was (btrfs or ext4). Dumping /etc/fstab will tell you.
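"Dumping" here just means printing the file, so cat /etc/fstab is enough. As a small sketch, assuming the usual fstab layout where the second field is the mount point and the third is the filesystem type, something like this pulls out the type for one mount point. The function name fs_type_for is only an illustration.

```shell
# Print the filesystem type recorded in an fstab file for a given mount point.
# Usage: fs_type_for <mount-point> [fstab-path]
fs_type_for() {
    mnt=$1
    fstab=${2:-/etc/fstab}
    # Field 2 is the mount point, field 3 the filesystem type; skip comments.
    awk -v m="$mnt" '$1 !~ /^#/ && $2 == m { print $3 }' "$fstab"
}

# Example: fs_type_for /volume1
```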

This thread may help serve as a template for data recovery: https://xpenology.com/forum/topic/14337-volume-crash-after-4-months-of-stability

gizmomelb · June 3, 2021

Hi Flyride, I know I definitely have an ext4 filesystem, and if I SSH in I have no volumes listed. I do not have the 'dump' command.

vi /etc/fstab:

none /proc proc defaults 0 0
/dev/root / ext4 defaults 1 1
/dev/vg1000/lv /volume1 ext4 0 0

cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid5 sdf6[9] sda6[6] sdb6[7] sdc6[10] sdd6[2] sde6[8]
4883714560 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md4 : active raid1 sda7[0] sdb7[1]
7811854208 blocks super 1.2 [2/2] [UU]

md2 : active raid5 sda5[11] sde5[6] sdf5[7] sdd5[8] sdc5[9] sdb5[10]
14627177280 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3] sde2[4] sdf2[5]
2097088 blocks [12/6] [UUUUUU______]

md0 : active raid1 sda1[0] sdb1[1] sdc1[5] sdd1[3] sde1[2] sdf1[4]
2490176 blocks [12/6] [UUUUUU______]

unused devices: <none>

vgdisplay shows absolutely nothing.

I hope this helps?

thank you.

Edited June 3, 2021 by gizmomelb

flyride · June 3, 2021

Well generally, you need to validate the members of your volume group and start it - i.e. sudo vgchange -ay

When you do get it to start, then try to mount the volume per the fstab. If it won't mount, then fsck is the tool that is needed to attempt to fix your volume. Any post that references btrfs does not apply to you.

Investigate your pv/lv starting with this post, and really any linux lvm data recovery thread.

https://xpenology.com/forum/topic/14337-volume-crash-after-4-months-of-stability/?do=findComment&comment=107971
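Before trying mount or fsck, it may help to confirm that the logical volume's device node exists at all, since both tools fail with "No such file or directory" when it does not. A minimal sketch of that check follows. The function name lv_node_status is hypothetical, and /dev/vg1000/lv is assumed to be the device named in your fstab.

```shell
# Report whether a device path exists and is a block device.
# Prints one of: present, not-a-block-device, missing
lv_node_status() {
    dev=$1
    if [ -b "$dev" ]; then
        echo "present"
    elif [ -e "$dev" ]; then
        echo "not-a-block-device"
    else
        echo "missing"
    fi
}

# Example: lv_node_status /dev/vg1000/lv
```

If this prints "missing", the volume group has not been activated (or is not known to lvm), and mounting or checking the filesystem cannot work yet.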

gizmomelb · June 3, 2021

Hi Flyride,

vgchange -ay displays nothing, just goes to the next command line.

I followed all the steps in the thread you mentioned and posted the results of the questions asked in that thread as well.

mount /dev/vg1000/lv /volume1
mount: open failed, msg:No such file or directory
mount: mounting /dev/vg1000/lv on /volume1 failed: No such device

mount -o clear_cache /dev/vg1000/lv /volume1
mount: open failed, msg:No such file or directory
mount: mounting /dev/vg1000/lv on /volume1 failed: No such device

mount -o recovery /dev/vg1000/lv /volume1
mount: open failed, msg:No such file or directory
mount: mounting /dev/vg1000/lv on /volume1 failed: No such device

fsck.ext4 -v /dev/vg1000/lv
e2fsck 1.42.6 (21-Sep-2012)
fsck.ext4: No such file or directory while trying to open /dev/vg1000/lv
Possibly non-existent device?

Looks like I need to rebuild the volume group?

Edited June 3, 2021 by gizmomelb

flyride · June 3, 2021

39 minutes ago, flyride said:

Well generally, you need to validate the members of your volume group

You kind of skipped the first step. Figure out what's wrong here, otherwise you are just blindly following random tasks and pushing buttons.

gizmomelb · June 3, 2021

Hi,

yes I am not a *nix expert.. I ran the vgchange -ay command and it literally does nothing: it displays nothing and goes to the new command line

I tried vgscan but I do not have that command.

ok maybe I'm going in the wrong direction (please tell me if I am) but looking at this thread - https://community.synology.com/enu/forum/17/post/84956

the array seems to still be intact but has a size of '0'.

fdisk -l
fdisk: device has more than 2^32 sectors, can't use all of them

Disk /dev/sda: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 1 267350 2147483647+ ee EFI GPT
fdisk: device has more than 2^32 sectors, can't use all of them

Disk /dev/sdb: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 1 267350 2147483647+ ee EFI GPT
fdisk: device has more than 2^32 sectors, can't use all of them

Disk /dev/sdc: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdc1 1 267350 2147483647+ ee EFI GPT
fdisk: device has more than 2^32 sectors, can't use all of them

Disk /dev/sdd: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdd1 1 267350 2147483647+ ee EFI GPT
fdisk: device has more than 2^32 sectors, can't use all of them

Disk /dev/sde: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sde1 1 267350 2147483647+ ee EFI GPT
fdisk: device has more than 2^32 sectors, can't use all of them

Disk /dev/sdf: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdf1 1 267350 2147483647+ ee EFI GPT

Disk /dev/synoboot: 8382 MB, 8382316544 bytes
4 heads, 32 sectors/track, 127904 cylinders
Units = cylinders of 128 * 512 = 65536 bytes

Device Boot Start End Blocks Id System
/dev/synoboot1 * 1 384 24544+ e Win95 FAT16 (LBA)

cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sda5[11] sde5[6] sdf5[7] sdd5[8] sdc5[9] sdb5[10]
14627177280 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md3 : active raid5 sdf6[9] sda6[6] sdb6[7] sdc6[10] sdd6[2] sde6[8]
4883714560 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md4 : active raid1 sda7[0] sdb7[1]
7811854208 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3] sde2[4] sdf2[5]
2097088 blocks [12/6] [UUUUUU______]

md0 : active raid1 sda1[0] sdb1[1] sdc1[5] sdd1[3] sde1[2] sdf1[4]
2490176 blocks [12/6] [UUUUUU______]

unused devices: <none>

mdadm --detail /dev/md2
/dev/md2:
Version : 1.2
Creation Time : Sat Aug 6 21:35:22 2016
Raid Level : raid5
Array Size : 14627177280 (13949.56 GiB 14978.23 GB)
Used Dev Size : 2925435456 (2789.91 GiB 2995.65 GB)
Raid Devices : 6
Total Devices : 6
Persistence : Superblock is persistent

Update Time : Fri Jun 4 12:39:36 2021
State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 64K

Name : DiskStation:2
UUID : dc74ec9b:f2e2c601:4cf78ff1:38dc0feb
Events : 1211707

Number Major Minor RaidDevice State
11 8 5 0 active sync /dev/sda5
10 8 21 1 active sync /dev/sdb5
9 8 37 2 active sync /dev/sdc5
8 8 53 3 active sync /dev/sdd5
7 8 85 4 active sync /dev/sdf5
6 8 69 5 active sync /dev/sde5

mdadm --detail /dev/md3
/dev/md3:
Version : 1.2
Creation Time : Mon Jul 23 12:51:46 2018
Raid Level : raid5
Array Size : 4883714560 (4657.47 GiB 5000.92 GB)
Used Dev Size : 976742912 (931.49 GiB 1000.18 GB)
Raid Devices : 6
Total Devices : 6
Persistence : Superblock is persistent

Update Time : Fri Jun 4 12:40:26 2021
State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 64K

Name : GIZNAS01:3 (local to host GIZNAS01)
UUID : 9717fe13:d84d4533:3a7153e2:17f9a9d0
Events : 434333

Number Major Minor RaidDevice State
9 8 86 0 active sync /dev/sdf6
8 8 70 1 active sync /dev/sde6
2 8 54 2 active sync /dev/sdd6
10 8 38 3 active sync /dev/sdc6
7 8 22 4 active sync /dev/sdb6
6 8 6 5 active sync /dev/sda6

mdadm --detail /dev/md4
/dev/md4:
Version : 1.2
Creation Time : Tue Dec 15 10:41:07 2020
Raid Level : raid1
Array Size : 7811854208 (7449.96 GiB 7999.34 GB)
Used Dev Size : 7811854208 (7449.96 GiB 7999.34 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Update Time : Fri Jun 4 12:40:31 2021
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Name : GIZNAS01:4 (local to host GIZNAS01)
UUID : 65bfef18:22c472bd:99421137:9958e284
Events : 4

Number Major Minor RaidDevice State
0 8 7 0 active sync /dev/sda7
1 8 23 1 active sync /dev/sdb7

If I am missing the obvious, please let me know as it is not obvious to me sorry.

Thank you for your help and time.

Edited June 3, 2021 by gizmomelb

flyride · June 3, 2021

7 hours ago, gizmomelb said:

yes I am not a *nix expert.. I ran the vgchange -ay command and it literally does nothing: it displays nothing and goes to the new command line

I tried vgscan but I do not have that command.

I understand. You really need to become a student of mdadm, lvm and ext4 in order for this to go well. I urge you to take the time and understand the commands you are typing in and not just blindly follow someone else's (potentially misinformed) process.

There are many permutations of lvm commands; just do some googling. For example, pvscan is an alias for "lvm pvscan".
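As a small sketch of that idea, the helper below works out how a given lvm tool would be invoked on the current system: directly if the standalone command exists, otherwise through the lvm front end. The name resolve_lvm_cmd is made up for illustration. It only prints the command line and does not run anything.

```shell
# Print the command line that would run an LVM tool on this system:
# the standalone command if it exists, otherwise the "lvm <tool>" form.
resolve_lvm_cmd() {
    tool=$1
    shift
    if command -v "$tool" >/dev/null 2>&1; then
        set -- "$tool" "$@"
    elif command -v lvm >/dev/null 2>&1; then
        set -- lvm "$tool" "$@"
    else
        echo "neither '$tool' nor 'lvm' found in PATH" >&2
        return 127
    fi
    echo "$*"
}

# Example: resolve_lvm_cmd vgscan
```

On a box where vgscan is missing but lvm is present, the example would print "lvm vgscan".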

FWIW, vgchange -ay is the command that activates the logical volume if all its physical members are working. It would make your /dev/vg1000/lv device start working if the members were intact and functioning.

7 hours ago, gizmomelb said:

the array seems to still be intact but has a size of '0'.

Yes, your array appears fine (per your original post) so don't mess with that. The array is your Disk Group. That is the same size it always was. The DSM volume (filesystem) within it has a size of 0 because DSM cannot recognize a filesystem at the moment.
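For reference, a rough way to read array health out of /proc/mdstat is to look for "_" in the member map at the end of each status line. The sketch below (the function name mdstat_missing_members is hypothetical) prints every array that has an empty or failed slot, as working/slots. As far as I know, DSM creates the system and swap arrays md0 and md1 with more slots than installed drives, so those two normally show up here even when nothing is wrong; the data arrays (md2, md3 and md4 in this thread) are the ones that matter.

```shell
# Read /proc/mdstat-style text on stdin and print "<array> <working>/<slots>"
# for every array whose member map contains "_" (an empty or failed slot).
mdstat_missing_members() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ {
            # The field before the map looks like [12/6]: slots/working.
            counts = $(NF - 1)
            gsub("[^0-9/]", "", counts)
            split(counts, c, "/")
            print name, c[2] "/" c[1]
        }
    '
}

# Example: mdstat_missing_members < /proc/mdstat
```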

7 hours ago, gizmomelb said:

ok maybe I'm going in the wrong direction (please tell me if I am) but looking at this thread - https://community.synology.com/enu/forum/17/post/84956

There is a lot of stuff in that thread that has to do with the array, does not apply to you and would be quite dangerous if you followed it. Again, right now you need to figure out what is going on with the "physical" storage devices (/dev/md2, /dev/md3, /dev/md4) that are part of the lv. To be clear, lv = logical volume. This is the underlying storage, not the DSM volume (filesystem) that lives within it and is called a "volume" in the UI.

The entire LVM metadata configuration can be recreated from an automatic backup that is saved within your /etc structure. But you have to figure out what is happening with the devices first.

Edited June 4, 2021 by flyride

flyride · June 3, 2021

Is this the same system? Did you do something to try and expand with your 10TB disk and toast your volume?

If so, you should explain exactly what was attempted and what happened. Also it would have been helpful if you had mentioned the system was a 5.2 build.

gizmomelb · June 4, 2021

Hi yes that was my post about expanding the volume before the volume crashed - I tried expanding the volume following your instructions here:

My apologies for not saying this was DSM 5.2 earlier - I don't know what I need to post and that is why I was asking for assistance. I was looking at similar issues across many forums but most solutions involved earlier versions of DSM with resize being available.

Edited June 4, 2021 by gizmomelb

flyride · June 4, 2021

Generally DSM automatically expands when physical drives are added.

The link you posted is a quite different use case using virtual drives on ESXi, where the space gets expanded by modifying the VM underlying storage, without adding drives.

So, do you know what was executing when it crashed? Also, any luck investigating the physical members of the lv - i.e. lvm pvscan

gizmomelb · June 4, 2021

Hi Flyride,

thank you for continuing to assist me - it's been a busy day but I will make the time to read up more about how mdadm works (it makes sense to me a little already).

I think the damage I caused was in step 10 or 11 as detailed here:

9. Inform lvm that the physical device got bigger.

$ sudo pvresize /dev/md2
Physical volume "/dev/md2" changed
1 physical volume(s) resized / 0 physical volume(s) not resized

If you re-run vgdisplay now, you should see some free space.

10. Extend the lv

$ sudo lvextend -l +100%FREE /dev/vg1/volume_1
Size of logical volume vg1/volume_1 changed from 15.39 GiB (3939 extents) to 20.48 GiB (5244 extents).
Logical volume volume_1 successfully resized.

11. Finally, extend the filesystem (this is for ext4, there is a different command for btrfs)

$ sudo resize2fs -f /dev/vg1/volume_1

I was trying to extend the partition for sdc (the replacement 10TB drive for the original 4TB drive) as DSM didn't automatically resize the new HDD partition after the rebuild finished.

as requested:

lvm pvscan
PV /dev/md2 lvm2 [13.62 TB]
PV /dev/md3 lvm2 [4.55 TB]
PV /dev/md4 lvm2 [7.28 TB]
Total: 3 [25.45 TB] / in use: 0 [0 ] / in no VG: 3 [25.45 TB]
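A note on reading this output: the summary line says all three physical volumes are "in no VG", meaning lvm can see the devices but no longer has volume group metadata tying them together. A minimal sketch for pulling out such devices from pvscan output follows. The function name pvs_without_vg is hypothetical, and it assumes that a PV which does belong to a group is printed as "PV <device> VG <name> ...", which is how pvscan normally formats it.

```shell
# Read pvscan output on stdin and print the devices that are not in any VG.
# Lines for PVs that belong to a group carry "VG <name>" after the device.
pvs_without_vg() {
    awk '$1 == "PV" && $3 != "VG" { print $2 }'
}

# Example: lvm pvscan | pvs_without_vg
```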

Thank you.

Edited June 4, 2021 by gizmomelb

gizmomelb · June 4, 2021

Ahh, I just found a screenshot I made last night. It doesn't appear to be destructive though, just testing the filesystem for errors. This is what I had typed:

syno_poweroff_task -d

vgchange -ay

fsck.ext4 -pvf -C 0 /dev/vg1000/lv

then I executed these commands:

vgchange -an vg1000

sync

init 6

that looks to be it.. I deactivated the VG which explains why there isn't a VG when I execute vgs or vgdisplay etc.

To re-activate the VG I need to execute 'vgchange -ay vg1000' - but I'll wait until it is confirmed.

Thank you.

flyride · June 4, 2021

Ok, but we did a vgchange -ay some time ago. Try a lvm vgscan and lvm lvscan to see what is there first.

gizmomelb · June 4, 2021

GIZNAS01> lvm vgscan
Reading all physical volumes. This may take a while...
GIZNAS01>
GIZNAS01> lvm lvscan
GIZNAS01>

nothing

flyride · June 4, 2021

So if you review the response about trying to expand your physical volume on the other thread, you can see the broken logic.

DSM didn't automatically expand your SHR because it isn't possible with the SHR structure and rules.

Now understand that to the filesystem (ext4), this is all done under the covers:

/dev/md2 + /dev/md3 + /dev/md4 = continuous storage (vg1000) which ext4 then writes a filesystem upon.

The storage expansion procedure that you apparently were following is a special use case (which was stated in the thread header, along with admonishment for backup). A virtual disk can be grown in ESXi, but it does not get registered as a disk change event with DSM, so its automatic expansion doesn't occur. So my procedure adds storage to the end of an existing partition, which is relatively simple to inform lvm and DSM, and doesn't disturb any existing data.

I think essentially what you have attempted is to add storage in the middle of the lvm. DSM does this as part of its expansion logic, but I don't know exactly how it is accomplished. If done indiscriminately, this corrupts ext4, because parts of the filesystem that hold data end up at different LBAs. But we cannot even see your lv's, so something else seems to have happened that is more catastrophic.

Given that you want more space - if you delete and remake the SHR, you will get 32TB. If you have a backup of your data, that may be preferable.

If you don't have a backup, you might try and restore the lvm from its automatic backup. I still don't know what exactly has been done that was destructive, and would hope to identify the action that caused the corruption before attempting any irreversible action for recovery.
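One way to look for the last action lvm recorded is to read the header of the backup file itself. LVM metadata backups are plain text, and the header normally includes a description line naming the command that triggered the backup, plus a creation_time. The sketch below prints those lines along with the device hints for each physical volume. The function name lvm_backup_summary is made up, and the path /etc/lvm/backup/vg1000 is an assumption based on the default location. Older copies may also exist under /etc/lvm/archive if this lvm build keeps them.

```shell
# Print the lines of an LVM metadata backup that say when it was written,
# which command triggered it, and which devices the physical volumes map to.
lvm_backup_summary() {
    file=$1
    grep -E '^[[:space:]]*(description|creation_time|device)[[:space:]]*=' "$file"
}

# Example: lvm_backup_summary /etc/lvm/backup/vg1000
```

This only reads the file, so it is safe to run before deciding whether to restore.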

Edited June 4, 2021 by flyride

gizmomelb · June 4, 2021

where is the automatic backup and how do I restore it please? Thank you.

gizmomelb · June 4, 2021

GIZNAS01> vgcfgrestore vg1000
Restored volume group vg1000
GIZNAS01> lvm vgscan
Reading all physical volumes. This may take a while...
Found volume group "vg1000" using metadata type lvm2
GIZNAS01> lvm lvscan
inactive '/dev/vg1000/lv' [25.45 TB] inherit
GIZNAS01>

GIZNAS01> pvs
PV VG Fmt Attr PSize PFree
/dev/md2 vg1000 lvm2 a- 13.62T 0
/dev/md3 vg1000 lvm2 a- 4.55T 0
/dev/md4 vg1000 lvm2 a- 7.28T 0
GIZNAS01> vgs
VG #PV #LV #SN Attr VSize VFree
vg1000 3 1 0 wz--n- 25.45T 0
GIZNAS01> lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
lv vg1000 -wi--- 25.45T
GIZNAS01>

GIZNAS01> vgchange -ay vg1000
1 logical volume(s) in volume group "vg1000" now active
GIZNAS01>

GIZNAS01> pvdisplay
--- Physical volume ---
PV Name /dev/md2
VG Name vg1000
PV Size 13.62 TB / not usable 320.00 KB
Allocatable yes (but full)
PE Size (KByte) 4096
Total PE 3571088
Free PE 0
Allocated PE 3571088
PV UUID HO2fWh-RA9j-oqGv-kiCI-z9pY-41R1-9IJbgt

--- Physical volume ---
PV Name /dev/md3
VG Name vg1000
PV Size 4.55 TB / not usable 3.94 MB
Allocatable yes (but full)
PE Size (KByte) 4096
Total PE 1192312
Free PE 0
Allocated PE 1192312
PV UUID nodvRa-r0cq-NjRs-eW9K-xEuO-YPUT-t7NHbP

--- Physical volume ---
PV Name /dev/md4
VG Name vg1000
PV Size 7.28 TB / not usable 3.88 MB
Allocatable yes (but full)
PE Size (KByte) 4096
Total PE 1907190
Free PE 0
Allocated PE 1907190
PV UUID 8qIF1n-Wf00-CD79-M3M5-Dx5l-Lsx2-HZ9qle

GIZNAS01> vgdisplay
--- Volume group ---
VG Name vg1000
System ID
Format lvm2
Metadata Areas 3
Metadata Sequence No 23
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 0
Max PV 0
Cur PV 3
Act PV 3
VG Size 25.45 TB
PE Size 4.00 MB
Total PE 6670590
Alloc PE / Size 6670590 / 25.45 TB
Free PE / Size 0 / 0
VG UUID BTXwFZ-flEc-6vgu-qTpk-82eW-k6wQ-NNha6o

GIZNAS01> lvdisplay
--- Logical volume ---
LV Name /dev/vg1000/lv
VG Name vg1000
LV UUID 6IB6tO-tatM-LsAF-1ll1-yOnx-V2D1-KkFuIG
LV Write Access read/write
LV Status available
# open 0
LV Size 25.45 TB
Current LE 6670590
Segments 3
Allocation inherit
Read ahead sectors auto
- currently set to 384
Block device 253:0

GIZNAS01> cat /etc/fstab
none /proc proc defaults 0 0
/dev/root / ext4 defaults 1 1
/dev/vg1000/lv /volume1 ext4 0 0
GIZNAS01>
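As a sanity check on the restored metadata, the extent counts of the three physical volumes should add up to the Total PE of the volume group, and multiplying by the 4 MiB extent size should give the reported size. A quick sketch of that arithmetic, using the numbers from the output above (the "TB" printed by this lvm version is assumed to mean TiB):

```shell
# Extent counts copied from the pvdisplay output above.
md2_pe=3571088
md3_pe=1192312
md4_pe=1907190
pe_mib=4   # PE Size is 4096 KByte = 4 MiB

total_pe=$((md2_pe + md3_pe + md4_pe))
total_mib=$((total_pe * pe_mib))

echo "total PE:  $total_pe"
# awk handles the fractional TiB figure, which shell arithmetic cannot.
awk -v mib="$total_mib" 'BEGIN { printf "total TiB: %.2f\n", mib / 1024 / 1024 }'
```

This prints 6670590 extents and 25.45 TiB, matching the Total PE and VG Size lines in vgdisplay.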

seems to be getting closer.

Edited June 4, 2021 by gizmomelb

flyride · June 4, 2021

Try and reboot and see if your volume mounts.

gizmomelb · June 4, 2021

good news!! yes I rebooted and the volume mounts and my data is there (whether it is intact is another thing, but it should be!)

I don't know if this is too early but thank you so much for your help recovering the volume.

flyride · June 4, 2021

Excellent, very glad it seems to be working out given the circumstances.

My advice after a corruption and recovery event like this is always to offload all the data, delete everything (volume, storage pool, shr, etc) and then rebuild it from scratch. Otherwise something we did not find may bite you in the future.

You may want to do this anyway in order to get more space as detailed prior.

gizmomelb · June 4, 2021

I know a backup would be best but I don't have the storage space to be able to do that (data is non essential, but a pain to have to re-rip all my DVDs, CDs, blurays etc. and at least another few months of work).

If it's possible to expand the 4TB rebuilt partition to the 10TB capacity of the actual replacement drive it'd be a nice win.

But also many, many thanks for sharing your time and knowledge helping me out and for my learning a little more how mdadm handles LVs and VGs.

Polanskiman · October 12, 2022

The question(s) in this topic have been answered and/or the topic author has resolved their issue. This topic is now closed. If you have other questions, please open a new topic.
