Storage Pool Crashed

pdavey · February 23, 2021

I have a Synology DS415+ and the storage pool crashed without warning.

All drives were showing Healthy, and I need help identifying the problem and recovering the data (if possible).

I was going to follow Synology's suggestion: boot Ubuntu on my PC and try to restore the volume using the following:

root@ubuntu:~$ mdadm -Asf && vgchange -ay

$ mount ${device_path} ${mount_point} -o ro

Before attempting this, I came across this forum and read some of the advice given to others about putting the drives back into the NAS and repairing the volume locally.

Any advice or guidance on my journey to recover the data and identify the problem would be gratefully received.

My NAS is configured as follows:

Storage Pool 1: single 1.81 TB drive (Drive 4), Healthy

Storage Pool 2: three drives, Crashed

Drive 1: 4.5 TB, System Partition Failed (40 bad sectors), passed extended SMART test
Drive 2: 5.5 TB, Healthy
Drive 3: 2.7 TB, System Partition Failed

root@DS415:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdc5[3] sdb5[5]
      3897366528 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/2] [U_U]

md4 : active raid5 sdc6[0] sdb6[1]
      1953485824 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/2] [UU_]

md3 : active raid1 sdb7[0]
      1953494912 blocks super 1.2 [2/1] [U_]

md5 : active raid1 sdd5[0]
      1948683456 blocks super 1.2 [1/1] [U]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3]
      2097088 blocks [4/4] [UUUU]

md0 : active raid1 sdb1[3] sdd1[2]
      2490176 blocks [4/2] [__UU]

unused devices: <none>

pdavey · February 23, 2021

root@DS415:~# mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Mon Apr 20 12:20:53 2015
     Raid Level : raid5
     Array Size : 3897366528 (3716.82 GiB 3990.90 GB)
  Used Dev Size : 1948683264 (1858.41 GiB 1995.45 GB)
   Raid Devices : 3
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sun Feb 21 18:23:01 2021
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : DS415:2 (local to host DS415)
           UUID : 20106822:98678da8:508d800e:b196f334
         Events : 610085

    Number   Major   Minor   RaidDevice State
       3       8       37        0      active sync   /dev/sdc5
       -       0        0        1      removed
       5       8       21        2      active sync   /dev/sdb5

pdavey · February 23, 2021

root@DS415:/# fdisk -l /dev/sda
Disk /dev/sda: 4.6 TiB, 5000981078016 bytes, 9767541168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 3AB98EE9-A4F3-4746-B2BC-AA0BEA61A05B

Device Start End Sectors Size Type
/dev/sda1 2048 4982527 4980480 2.4G Linux RAID
/dev/sda2 4982528 9176831 4194304 2G Linux RAID
/dev/sda5 9453280 3906822239 3897368960 1.8T Linux RAID
/dev/sda6 3906838336 5860326239 1953487904 931.5G Linux RAID
/dev/sda7 5860342336 9767334239 3906991904 1.8T Linux RAID

root@DS415:/# fdisk -l /dev/sdb
Disk /dev/sdb: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: F27E8497-B367-4862-864C-39FDB67E8EB2

Device Start End Sectors Size Type
/dev/sdb1 2048 4982527 4980480 2.4G Linux RAID
/dev/sdb2 4982528 9176831 4194304 2G Linux RAID
/dev/sdb5 9453280 3906822239 3897368960 1.8T Linux RAID
/dev/sdb6 3906838336 5860326239 1953487904 931.5G Linux RAID
/dev/sdb7 5860342336 9767334239 3906991904 1.8T Linux RAID

root@DS415:/# fdisk -l /dev/sdc
Disk /dev/sdc: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 604C99D8-6C7A-4465-91A4-591796CEBE56

Device Start End Sectors Size Type
/dev/sdc1 2048 4982527 4980480 2.4G Linux RAID
/dev/sdc2 4982528 9176831 4194304 2G Linux RAID
/dev/sdc5 9453280 3906822239 3897368960 1.8T Linux RAID
/dev/sdc6 3906838336 5860326239 1953487904 931.5G Linux RAID

flyride · February 23, 2021

So looking this over (nice forensic data gathering, by the way), it would appear that /dev/sda disconnected or otherwise became unavailable to the system.

You have three dissimilar drives, which has resulted in SHR creating three different arrays (/dev/md2, /dev/md3, /dev/md4) to maximize the space available (high complexity). All three arrays have to be working and healthy for your lvm to bind them, and your volume to mount.
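To make the tiering concrete, here is a simplified Python model of how an SHR-style layout carves mixed-size drives into slices. It is an assumption for illustration, not Synology's actual algorithm: each distinct drive size marks a slice boundary, a slice present on three or more drives becomes RAID5, on two drives RAID1, and on one drive it goes unused. The sizes are the TiB figures from the fdisk output above, and `shr_tiers` is a hypothetical helper name.

```python
def shr_tiers(drive_sizes):
    """Simplified SHR-style tiering (an assumption, not Synology's real algorithm).

    drive_sizes: {drive: size_in_TiB}. Each distinct size marks a slice boundary;
    a slice on 3+ drives -> RAID5, on 2 drives -> RAID1, on 1 drive -> unused.
    """
    tiers = []
    previous = 0.0
    for level in sorted(set(drive_sizes.values())):
        slice_size = level - previous
        members = sorted(d for d, size in drive_sizes.items() if size >= level)
        if len(members) >= 3:
            raid, usable = "raid5", slice_size * (len(members) - 1)  # one slice of parity
        elif len(members) == 2:
            raid, usable = "raid1", slice_size  # plain mirror
        else:
            raid, usable = "unused", 0.0  # nowhere to replicate it
        tiers.append({
            "slice": round(slice_size, 2),
            "members": members,
            "raid": raid,
            "usable": round(usable, 2),
        })
        previous = level
    return tiers


for tier in shr_tiers({"sda": 4.6, "sdb": 5.5, "sdc": 2.7}):
    print(tier)
```

With these inputs the model yields a 2.7 TiB RAID5 tier across all three drives, a 1.9 TiB RAID1 tier on sda/sdb and 0.9 TiB unused on sdb, roughly 7.3 TiB usable, in line with vg1000's 7.27 TiB. On the real NAS the first tier is split into md2 and md4 (created in 2015 and 2019 respectively), presumably because the pool was expanded over time.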

So please run mdadm --detail on /dev/md3 and /dev/md4 as well and post the output.

pdavey · February 23, 2021

Sorry, I'd already done them but forgot to include them.

I got sidetracked reading the documentation so I could understand what's going on.

I figured out it was /dev/sda, and I can see it's missing from both /dev/md2 and /dev/md4, but I don't understand how it's linked to /dev/md3, which is RAID 1.

Is one array using partitions {sda5, sdb5, sdc5} (1.8T), another {sda6, sdb6, sdc6} (931G), and another {sda7, sdb7} (1.8T)?

I am a bit lost now. Does each one have a separate volume, and the combined total is the pool?

root@DS415:~# mdadm --detail /dev/md4
/dev/md4:
        Version : 1.2
  Creation Time : Mon Dec 2 08:31:27 2019
     Raid Level : raid5
     Array Size : 1953485824 (1862.99 GiB 2000.37 GB)
  Used Dev Size : 976742912 (931.49 GiB 1000.18 GB)
   Raid Devices : 3
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sun Feb 21 18:23:01 2021
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : DS415:4 (local to host DS415)
           UUID : 0f8073d8:3666a524:faf4218d:785d611c
         Events : 12147

    Number   Major   Minor   RaidDevice State
       0       8       38        0      active sync   /dev/sdc6
       1       8       22        1      active sync   /dev/sdb6
       -       0        0        2      removed
root@DS415:~#

root@DS415:~# mdadm --detail /dev/md3
/dev/md3:
        Version : 1.2
  Creation Time : Tue Dec 10 02:52:19 2019
     Raid Level : raid1
     Array Size : 1953494912 (1863.00 GiB 2000.38 GB)
  Used Dev Size : 1953494912 (1863.00 GiB 2000.38 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Sun Feb 21 18:22:58 2021
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : DS415:3 (local to host DS415)
           UUID : a508b67e:6933bca7:1bf77190:96030000
         Events : 32

    Number   Major   Minor   RaidDevice State
       0       8       23        0      active sync   /dev/sdb7
       -       0        0        1      removed

pdavey · February 23, 2021

I think I understand the RAID1 part now: the most efficient way to provide RAID5-style redundancy over two drives is simply mirroring.

So that explains the removed member of /dev/md3: it's /dev/sda7.

Why didn't the NAS recognise the healthy drive sda when I put it back in? Is there a flag that tells the system it's dirty?

flyride · February 23, 2021

This grid might help you see things more clearly: [grid image not preserved]

md3 isn't a RAID5; it's a RAID1. 0.9 TiB of space is wasted on /dev/sdb, the largest drive, since there is nowhere to replicate the data.

md0 is the DSM operating system spanned (via RAID1) across all disks, and md1 is the Linux swap partition, similarly configured. The "System Partition" error is because there are two members (sda1 and sdc1) that should be participating in md0 but are not. On the Storage Manager Overview screen, there should be a button to repair the System Partition, and those members will be restored.

I am not 100% certain why your Storage Pool 2 is showing as crashed, as all of its arrays are clean (albeit degraded).

After /dev/sda went offline for whatever reason, if the arrays are written to, they are no longer consistent with the missing disk, so it won't be reinserted automatically. All writes are serialized (each member's superblock carries an event count), so the system knows this for sure. If we have an array that is crashed because there are not enough members to start it, we can evaluate the event counts and determine how much risk we will incur by forcing it back into service. But since you have a consistent and operating array set (albeit degraded), we should try to start the Storage Pool and give you an opportunity to offload your data before doing anything else.
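Evaluating the event counts means reading each member's superblock with `mdadm --examine /dev/sdX5` (read-only) and comparing the Events lines. A minimal Python sketch of that comparison; the helper names are hypothetical, and the sda5 figure in the example is made up, since its --examine output isn't in this thread (only the 610085 count for md2 comes from the output above).

```python
import re

# Matches the "Events : 610085" line printed by mdadm --detail / --examine
EVENTS_RE = re.compile(r"^\s*Events\s*:\s*(\d+)\s*$", re.MULTILINE)


def events_from_examine(text):
    """Pull the event counter out of one member's `mdadm --examine` output."""
    match = EVENTS_RE.search(text)
    return int(match.group(1)) if match else None


def stale_members(events):
    """Return {member: how many events it is behind the freshest member}."""
    newest = max(events.values())
    return {dev: newest - count for dev, count in sorted(events.items()) if count < newest}


# Hypothetical counts: sdb5/sdc5 taken from md2 above, sda5 invented for illustration
print(stale_members({"sdb5": 610085, "sdc5": 610085, "sda5": 609990}))
```

Here the call returns {"sda5": 95}. The counter advances on superblock updates (for example clean/active transitions) rather than on every data write, so treat the gap as a rough staleness indicator, not a count of lost writes.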

So if you don't have SSH access to your system already, please turn it on. Then try the following (obviously stop if you see something you don't like):

$ sudo -i

# vgchange -ay

# mount

And post the results of each.

Edited February 23, 2021 by flyride

pdavey · February 23, 2021

root@DS415:/# vgchange -ay
1 logical volume(s) in volume group "vg1001" now active
Couldn't find device with uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9.
Refusing activation of partial LV vg1000/lv. Use '--activationmode partial' to override.
0 logical volume(s) in volume group "vg1000" now active

flyride · February 23, 2021

Well, that would be why it is crashed. Some more forensic investigation is in order:

# pvs

# vgs

# lvs

# pvdisplay

# vgdisplay

# lvdisplay

flyride · February 23, 2021

Also, while you are at it:

# cat /etc/fstab

pdavey · February 23, 2021

root@DS415:/# pvs
Couldn't find device with uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9.
PV VG Fmt Attr PSize PFree
/dev/md2 vg1000 lvm2 a-- 3.63t 0
/dev/md4 vg1000 lvm2 a-- 1.82t 0
/dev/md5 vg1001 lvm2 a-- 1.81t 0
unknown device vg1000 lvm2 a-m 1.82t 0

root@DS415:/# vgs
Couldn't find device with uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9.
VG #PV #LV #SN Attr VSize VFree
vg1000 3 1 0 wz-pn- 7.27t 0
vg1001 1 1 0 wz--n- 1.81t 0

root@DS415:/# lvs
Couldn't find device with uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9.
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
lv vg1000 -wi-----p- 7.27t
lv vg1001 -wi-ao---- 1.81t

root@DS415:/# pvdisplay
--- Physical volume ---
PV Name /dev/md5
VG Name vg1001
PV Size 1.81 TiB / not usable 3.19 MiB
Allocatable yes (but full)
PE Size 4.00 MiB
Total PE 475752
Free PE 0
Allocated PE 475752
PV UUID p7cJsO-la6l-vXp7-ga51-4ugu-he7H-SeoHgZ

Couldn't find device with uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9.
--- Physical volume ---
PV Name /dev/md2
VG Name vg1000
PV Size 3.63 TiB / not usable 1.44 MiB
Allocatable yes (but full)
PE Size 4.00 MiB
Total PE 951505
Free PE 0
Allocated PE 951505
PV UUID 2NkG9U-9Rh6-5xFW-M1iM-GA0f-nbOd-aJHEUS

--- Physical volume ---
PV Name /dev/md4
VG Name vg1000
PV Size 1.82 TiB / not usable 128.00 KiB
Allocatable yes (but full)
PE Size 4.00 MiB
Total PE 476925
Free PE 0
Allocated PE 476925
PV UUID nolEeP-0392-QvMt-ZOkW-1JDr-COn4-QybVK7

--- Physical volume ---
PV Name unknown device
VG Name vg1000
PV Size 1.82 TiB / not usable 1.31 MiB
Allocatable yes (but full)
PE Size 4.00 MiB
Total PE 476927
Free PE 0
Allocated PE 476927
PV UUID bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9

root@DS415:/# vgdisplay
--- Volume group ---
VG Name vg1001
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 2
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 1
Max PV 0
Cur PV 1
Act PV 1
VG Size 1.81 TiB
PE Size 4.00 MiB
Total PE 475752
Alloc PE / Size 475752 / 1.81 TiB
Free PE / Size 0 / 0
VG UUID pHhunz-cg0H-Fkcg-na1y-AAcT-D9fU-gdDTet

Couldn't find device with uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9.
--- Volume group ---
VG Name vg1000
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 12
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 0
Max PV 0
Cur PV 3
Act PV 2
VG Size 7.27 TiB
PE Size 4.00 MiB
Total PE 1905357
Alloc PE / Size 1905357 / 7.27 TiB
Free PE / Size 0 / 0
VG UUID kPgiVt-X4fO-Eoxr-f0GL-rsKm-s4fE-Zl6u4Z

root@DS415:/# lvdisplay
--- Logical volume ---
LV Path /dev/vg1001/lv
LV Name lv
VG Name vg1001
LV UUID Pl33so-ldeW-HS2w-QGeE-3Zwh-QLuG-pqC1TE
LV Write Access read/write
LV Creation host, time ,
LV Status available
# open 1
LV Size 1.81 TiB
Current LE 475752
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 4096
Block device 253:0

Couldn't find device with uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9.
--- Logical volume ---
LV Path /dev/vg1000/lv
LV Name lv
VG Name vg1000
LV UUID Zqg7q0-2u5X-oQcl-ejyL-zh1Y-iUA5-531Ls9
LV Write Access read/write
LV Creation host, time ,
LV Status NOT available
LV Size 7.27 TiB
Current LE 1905357
Segments 4
Allocation inherit
Read ahead sectors auto

root@DS415:/# cat /etc/fstab
none /proc proc defaults 0 0
/dev/root / ext4 defaults 1 1
/dev/vg1000/lv /volume1 btrfs auto_reclaim_space,synoacl,relatime 0 0
/dev/vg1001/lv /volume2 btrfs auto_reclaim_space,synoacl,relatime 0 0
root@DS415:/#

Edited February 23, 2021 by pdavey

pdavey · February 23, 2021

I am guessing the PV Name should be /dev/md3:

PV Name unknown device
VG Name vg1000
PV Size 1.82 TiB / not usable 1.31 MiB
Allocatable yes (but full)
PE Size 4.00 MiB
Total PE 476927
Free PE 0
Allocated PE 476927
PV UUID bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9

pdavey · February 23, 2021

What does the m flag (a-m in the Attr column) signify on the PV?

root@DS415:/# pvs
Couldn't find device with uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9.
PV VG Fmt Attr PSize PFree
/dev/md2 vg1000 lvm2 a-- 3.63t 0
/dev/md4 vg1000 lvm2 a-- 1.82t 0
/dev/md5 vg1001 lvm2 a-- 1.81t 0
unknown device vg1000 lvm2 a-m 1.82t 0

flyride · February 23, 2021

m signifies "missing": the LVM metadata says there should be a device with that UUID, but LVM can't find it. The PV name comes from the device itself, which is why it shows as "unknown device."
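The pvs Attr column is positional. Per the pvs(8) man page, the first character is (a)llocatable (newer releases also use (d)uplicate and (u)sed there), the second e(x)ported and the third (m)issing. A minimal Python sketch that decodes it and picks the missing PVs out of pvs output; the helper names are hypothetical and it assumes the default column layout shown above.

```python
# Position -> {letter: meaning} for the three pv_attr characters shown by `pvs`
PV_ATTR = (
    {"a": "allocatable", "d": "duplicate", "u": "used"},
    {"x": "exported"},
    {"m": "missing"},
)


def decode_pv_attr(attr):
    """Translate e.g. 'a-m' into ['allocatable', 'missing']."""
    flags = []
    for position, char in enumerate(attr[: len(PV_ATTR)]):
        if char != "-":
            flags.append(PV_ATTR[position].get(char, f"unknown flag {char!r}"))
    return flags


def missing_from_pvs(pvs_output):
    """List (pv_name, vg) for every PV whose attr has the 'm' (missing) bit."""
    missing = []
    for line in pvs_output.splitlines():
        fields = line.split()
        # Data rows end with: VG Fmt Attr PSize PFree; this skips headers and warnings
        if len(fields) < 6 or fields[-4] != "lvm2":
            continue
        name, vg, attr = " ".join(fields[:-5]), fields[-5], fields[-3]
        if "missing" in decode_pv_attr(attr):
            missing.append((name, vg))
    return missing
```

Run against the pvs output above, `missing_from_pvs` returns only ("unknown device", "vg1000"); the PV name is rebuilt from several fields because "unknown device" contains a space.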

Have you rebooted the NAS since this happened? We have a couple of options to try to get /dev/md3 back into working order. First, let's see if lvm can figure it out with a scan of devices.

# lvm pvscan

Please post the results as usual.

flyride · February 23, 2021

Also, look for files in /etc/lvm/backup and post them as attachments here if you find any.

pdavey · February 24, 2021

Yes, I have rebooted the NAS.

I took out sda and rebooted.

Then put it back and rebooted to see if it would offer me a repair.

root@DS415:/# lvm pvscan
Couldn't find device with uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9.
PV /dev/md5 VG vg1001 lvm2 [1.81 TiB / 0 free]
PV /dev/md2 VG vg1000 lvm2 [3.63 TiB / 0 free]
PV /dev/md4 VG vg1000 lvm2 [1.82 TiB / 0 free]
PV unknown device VG vg1000 lvm2 [1.82 TiB / 0 free]
Total: 4 [9.08 TiB] / in use: 4 [9.08 TiB] / in no VG: 0 [0 ]

root@DS415:/etc/lvm/backup# dir
total 20
drwxr-xr-x 2 root root 4096 Dec 10 2019 .
drwxr-xr-x 5 root root 4096 May 27 2020 ..
-rw-r--r-- 1 root root 2261 Dec 10 2019 vg1000
-rw-r--r-- 1 root root 1215 Dec 10 2019 vg1001
-rw-r--r-- 1 root root 1422 Apr 10 2015 vg3

root@DS415:/etc/lvm/backup# cat vg1000
# Generated by LVM2 version 2.02.132(2)-git (2015-09-22): Tue Dec 10 02:52:40 2019

contents = "Text Format Volume Group"
version = 1

description = "Created *after* executing '/sbin/lvextend --alloc inherit /dev/vg1000/lv -l100%VG'"

creation_host = "DS415" # Linux DS415 3.10.105 #24922 SMP Wed Jul 3 16:37:24 CST 2019 x86_64
creation_time = 1575946360 # Tue Dec 10 02:52:40 2019

vg1000 {
id = "kPgiVt-X4fO-Eoxr-f0GL-rsKm-s4fE-Zl6u4Z"
seqno = 12
format = "lvm2" # informational
status = ["RESIZEABLE", "READ", "WRITE"]
flags = []
extent_size = 8192 # 4 Megabytes
max_lv = 0
max_pv = 0
metadata_copies = 0

physical_volumes {

pv0 {
id = "2NkG9U-9Rh6-5xFW-M1iM-GA0f-nbOd-aJHEUS"
device = "/dev/md2" # Hint only

status = ["ALLOCATABLE"]
flags = []
dev_size = 7794731904 # 3.6297 Terabytes
pe_start = 1152
pe_count = 951505 # 3.6297 Terabytes
}

pv1 {
id = "nolEeP-0392-QvMt-ZOkW-1JDr-COn4-QybVK7"
device = "/dev/md4" # Hint only

status = ["ALLOCATABLE"]
flags = []
dev_size = 3906970496 # 1.81932 Terabytes
pe_start = 1152
pe_count = 476925 # 1.81932 Terabytes
}

pv2 {
id = "bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9"
device = "/dev/md3" # Hint only

status = ["ALLOCATABLE"]
flags = []
dev_size = 3906988672 # 1.81933 Terabytes
pe_start = 1152
pe_count = 476927 # 1.81933 Terabytes
}
}

logical_volumes {

lv {
id = "Zqg7q0-2u5X-oQcl-ejyL-zh1Y-iUA5-531Ls9"
status = ["READ", "WRITE", "VISIBLE"]
flags = []
segment_count = 4

segment1 {
start_extent = 0
extent_count = 951505 # 3.6297 Terabytes

type = "striped"
stripe_count = 1 # linear

stripes = [
"pv0", 0
]
}
segment2 {
start_extent = 951505
extent_count = 238462 # 931.492 Gigabytes

type = "striped"
stripe_count = 1 # linear

stripes = [
"pv1", 0
]
}
segment3 {
start_extent = 1189967
extent_count = 476927 # 1.81933 Terabytes

type = "striped"
stripe_count = 1 # linear

stripes = [
"pv2", 0
]
}
segment4 {
start_extent = 1666894
extent_count = 238463 # 931.496 Gigabytes

type = "striped"
stripe_count = 1 # linear

stripes = [
"pv1", 238462
]
}
}
}
}

root@DS415:/etc/lvm/backup# cat vg1001
# Generated by LVM2 version 2.02.132(2)-git (2015-09-22): Tue Dec 10 10:09:16 2019

contents = "Text Format Volume Group"
version = 1

description = "Created *after* executing '/sbin/lvcreate /dev/vg1001 -n lv -l100%FREE'"

creation_host = "DS415" # Linux DS415 3.10.105 #24922 SMP Wed Jul 3 16:37:24 CST 2019 x86_64
creation_time = 1575972556 # Tue Dec 10 10:09:16 2019

vg1001 {
id = "pHhunz-cg0H-Fkcg-na1y-AAcT-D9fU-gdDTet"
seqno = 2
format = "lvm2" # informational
status = ["RESIZEABLE", "READ", "WRITE"]
flags = []
extent_size = 8192 # 4 Megabytes
max_lv = 0
max_pv = 0
metadata_copies = 0

physical_volumes {

pv0 {
id = "p7cJsO-la6l-vXp7-ga51-4ugu-he7H-SeoHgZ"
device = "/dev/md5" # Hint only

status = ["ALLOCATABLE"]
flags = []
dev_size = 3897366912 # 1.81485 Terabytes
pe_start = 1152
pe_count = 475752 # 1.81485 Terabytes
}
}

logical_volumes {

lv {
id = "Pl33so-ldeW-HS2w-QGeE-3Zwh-QLuG-pqC1TE"
status = ["READ", "WRITE", "VISIBLE"]
flags = []
segment_count = 1

segment1 {
start_extent = 0
extent_count = 475752 # 1.81485 Terabytes

type = "striped"
stripe_count = 1 # linear

stripes = [
"pv0", 0
]
}
}
}
}

root@DS415:/etc/lvm/backup# cat vg3
# Generated by LVM2 version 2.02.38 (2008-06-11): Fri Apr 10 03:35:38 2015

contents = "Text Format Volume Group"
version = 1

description = "Created *after* executing '/sbin/lvcreate /dev/vg3 -n volume_3 -l100%FREE'"

creation_host = "DS415" # Linux DS415 3.2.40 #5022 SMP Wed Jan 7 14:19:49 CST 2015 x86_64
creation_time = 1428629738 # Fri Apr 10 03:35:38 2015

vg3 {
id = "B3OU2S-jllW-8UN6-M9s6-hVwO-q8aK-Lqu8rE"
seqno = 3
status = ["RESIZEABLE", "READ", "WRITE"]
extent_size = 8192 # 4 Megabytes
max_lv = 0
max_pv = 0

physical_volumes {

pv0 {
id = "3eYQCk-V8MJ-3LVt-m7Th-1oAM-CKWY-vJ8Txd"
device = "/dev/md4" # Hint only

status = ["ALLOCATABLE"]
dev_size = 5850870528 # 2.72452 Terabytes
pe_start = 1152
pe_count = 714217 # 2.72452 Terabytes
}
}

logical_volumes {

syno_vg_reserved_area {
id = "Epdama-h0ex-swu2-jsyB-nsVq-43rJ-hW4m4I"
status = ["READ", "WRITE", "VISIBLE"]
segment_count = 1

segment1 {
start_extent = 0
extent_count = 3 # 12 Megabytes

type = "striped"
stripe_count = 1 # linear

stripes = [
"pv0", 0
]
}
}

volume_3 {
id = "889ZKw-87Vv-nVY9-PvBk-ytwK-ZvJX-AaHcue"
status = ["READ", "WRITE", "VISIBLE"]
segment_count = 1

segment1 {
start_extent = 0
extent_count = 714214 # 2.72451 Terabytes

type = "striped"
stripe_count = 1 # linear

stripes = [
"pv0", 3
]
}
}
}
}

flyride · February 24, 2021

So we haven't done anything irreversible yet. But all possible steps going forward have us modifying something, which is potentially destructive.

I guess I should point out you have a real Synology device, and you have the option of engaging them for remote support - this is the sort of thing they can fix.

If you want to continue on your own using advice from some yahoo on the Internet, that's fine. The next thing to do is to force the lvm UUID back onto /dev/md3.

# pvcreate --uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9 /dev/md3

# vgcfgrestore vg1000

# pvs

pdavey · February 24, 2021

Yahoo it is.

(I tried Synology, but they refused to do anything using the CLI on the grounds that I might do irreversible damage, as if!) BTW, they also told me my HDDs weren't compatible with my NAS; that's when I went down the SMR route.

Fingers crossed

pdavey · February 24, 2021

root@DS415:/# pvcreate --uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9 /dev/md3
--restorefile is required with --uuid
Run `pvcreate --help' for more information.

root@DS415:/# pvcreate --help
pvcreate: Initialize physical volume(s) for use by LVM

pvcreate
[--norestorefile]
[--restorefile file]
[--commandprofile ProfileName]
[-d|--debug]
[-f[f]|--force [--force]]
[-h|-?|--help]
[--labelsector sector]
[-M|--metadatatype 1|2]
[--pvmetadatacopies #copies]
[--bootloaderareasize BootLoaderAreaSize[bBsSkKmMgGtTpPeE]]
[--metadatasize MetadataSize[bBsSkKmMgGtTpPeE]]
[--dataalignment Alignment[bBsSkKmMgGtTpPeE]]
[--dataalignmentoffset AlignmentOffset[bBsSkKmMgGtTpPeE]]
[--setphysicalvolumesize PhysicalVolumeSize[bBsSkKmMgGtTpPeE]
[-t|--test]
[-u|--uuid uuid]
[-v|--verbose]
[-y|--yes]
[-Z|--zero {y|n}]
[--version]
PhysicalVolume [PhysicalVolume...]

flyride · February 24, 2021

Hmm. Synology's LVM build is a little different from generic Linux. I'm stepping away for a short while and will update when I get back.

pdavey · February 24, 2021

OK, no problem... many thanks, BTW.

I could take the drives out and mount on a Linux box if that would be better?

flyride · February 24, 2021

No, that is usually a mistake despite the many tutorials that seem to encourage it. If you can't get things to boot, that's one thing, but troubleshooting in situ is usually safest.

# blkid | grep "/dev/md."

pdavey · February 24, 2021

root@DS415:/# blkid | grep "/dev/md."
/dev/md0: LABEL="1.42.6-5004" UUID="1dd621d5-e876-4e53-81e7-b9855ac902f0" TYPE="ext4"
/dev/md1: UUID="bf6d195b-d017-42af-ac08-5c33cf88fb75" TYPE="swap"
/dev/md5: UUID="p7cJsO-la6l-vXp7-ga51-4ugu-he7H-SeoHgZ" TYPE="LVM2_member"
/dev/md4: UUID="nolEeP-0392-QvMt-ZOkW-1JDr-COn4-QybVK7" TYPE="LVM2_member"
/dev/md2: UUID="2NkG9U-9Rh6-5xFW-M1iM-GA0f-nbOd-aJHEUS" TYPE="LVM2_member"

Forgive my ignorance, but surely the whole point of a RAID is that it can cope with the loss of one drive.

Why then, when I remove sda, does it not give me the option to repair and rebuild the array on a new drive? Is it because sdc is missing also?

Also, if we use pvcreate, does it create a physical partition on the disk or a logical volume inside a partition?

If it's the former, I am not sure how it can do that without destroying the rest of the data on the disk.

Would it be better to manually remove the partition and remove the LV from the array?

flyride · February 24, 2021

All reasonable questions. To recap:

RAID1 and RAID5 as technologies can cope with the loss of one drive. You have SHR, which is a concatenation of three arrays (using lvm). This creates a stateful dependency among all three, a risk that is not present in a simple RAID1 or RAID5 array.

All three of your arrays are currently impacted due to /dev/sda failing in some way, but are (according to mdadm) all intact in a degraded state. That has no bearing on the quality of the data inside each array (garbage in, garbage out), nor does it mean that the arrays are consistent in their stateful relationships. It just means that mdadm did not see any loss of integrity in each individual array.

If the lvm started up normally, you could just repair all three arrays with /dev/sda and be done with it. Unfortunately, while the /dev/md3 array says it is intact, it seems to be missing the UUID signature it needs to participate in the lvm. This is a corruption of some type on /dev/md3 and is why the Volume is reporting crashed. The corruption could be minor or significant. And if we don't have an intact volume/filesystem, we do not want to try to repair the array; otherwise we will sacrifice potentially usable redundant information.
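The missing signature is visible in the blkid output posted earlier: /dev/md2, /dev/md4 and /dev/md5 report TYPE="LVM2_member", but /dev/md3 reports nothing. A small Python sketch that cross-checks the PV UUIDs recorded in /etc/lvm/backup/vg1000 against blkid output; the helper names are hypothetical and the regex assumes blkid's default KEY="value" line format.

```python
import re

# PV UUIDs that /etc/lvm/backup/vg1000 expects, with their device hints
EXPECTED = {
    "2NkG9U-9Rh6-5xFW-M1iM-GA0f-nbOd-aJHEUS": "/dev/md2",
    "nolEeP-0392-QvMt-ZOkW-1JDr-COn4-QybVK7": "/dev/md4",
    "bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9": "/dev/md3",
}

# '/dev/md4: UUID="..." TYPE="LVM2_member"'
BLKID_LINE = re.compile(r'^(/dev/\S+?):.*?\bUUID="([^"]+)".*?\bTYPE="LVM2_member"')


def lvm_members(blkid_output):
    """Map PV UUID -> device for every LVM2_member line in blkid output."""
    found = {}
    for line in blkid_output.splitlines():
        match = BLKID_LINE.match(line.strip())
        if match:
            found[match.group(2)] = match.group(1)
    return found


def missing_pvs(blkid_output, expected=EXPECTED):
    """Return the expected PVs (uuid -> device hint) that blkid did not see."""
    found = lvm_members(blkid_output)
    return {uuid: hint for uuid, hint in expected.items() if uuid not in found}
```

On the NAS the input could come from `subprocess.run(["blkid"], capture_output=True, text=True).stdout` run as root. With the output posted above, it flags only the bocvSr-... PV, whose hint is /dev/md3.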

So we are first trying to correct the /dev/md3 device in the hopes that it can be recognized by the lvm. The pvcreate command we are trying to use is not going to zero the disk, it will just write the missing metadata signature into the appropriate reserved area. Hopefully the rest of the data on the disk is intact. If so and we can then start the lvm and mount the filesystem, I will strongly advise you to offload/recover all files from the system at that moment. Afterward, if you are satisfied with your data recovery, you'll be advised to rebuild the entire array structure from scratch (delete Storage Pool and recreate) to ensure there are no other downstream issues from the corruption.
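What pvcreate restores can be inspected first, read-only. An LVM2 PV label normally sits in one of the first four 512-byte sectors: it starts with the magic `LABELONE`, carries the type string `LVM2 001` at byte 24, and points (via a little-endian offset at byte 20) to a header whose first 32 bytes are the PV UUID without dashes. The Python sketch below looks for that structure in a byte buffer. It assumes the standard on-disk LVM2 label layout, the function names are hypothetical, and it never writes anything.

```python
import struct

SECTOR = 512
UUID_GROUPS = (6, 4, 4, 4, 4, 4, 6)  # how LVM prints a 32-character PV UUID


def format_pv_uuid(raw):
    """Insert dashes: 'bocvSrhmj0...' -> 'bocvSr-hmj0-...'."""
    parts, pos = [], 0
    for size in UUID_GROUPS:
        parts.append(raw[pos:pos + size])
        pos += size
    return "-".join(parts)


def find_lvm_label(head):
    """Scan the first four sectors of `head` (bytes) for an LVM2 label.

    Returns (sector_index, pv_uuid), or None if no label is found.
    """
    for index in range(4):
        sector = head[index * SECTOR:(index + 1) * SECTOR]
        if len(sector) < SECTOR or sector[:8] != b"LABELONE":
            continue
        if sector[24:32] != b"LVM2 001":  # label type string
            continue
        # offset_xl (u32 at byte 20) points at the pv_header,
        # whose first 32 bytes are the PV UUID without dashes
        offset = struct.unpack_from("<I", sector, 20)[0]
        raw = sector[offset:offset + 32].decode("ascii", errors="replace")
        return index, format_pv_uuid(raw)
    return None
```

As root, `with open("/dev/md3", "rb") as dev: print(find_lvm_label(dev.read(2048)))` only reads the device. Given that blkid reports no signature on /dev/md3, a result of None would be consistent with the diagnosis above, while /dev/md2 would be expected to return the 2NkG9U-... UUID.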

If we can't get lvm to accept /dev/md3 or if we cannot extract files from the filesystem, we can attempt to force the other member of /dev/md3 (/dev/sda7) into service. It was flagged as stale by mdadm, but we can verify exactly how far out of date it is, and more importantly, it may also not suffer from the corruption we are trying to fix.

If neither approach works to get to the filesystem, then the files that are on the /dev/md3 array are probably lost. Files contained in the other two arrays could be recovered with forensic recovery tools (typically expensive).
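Because vg1000 is a linear concatenation, the backup file shows exactly which part of the logical volume lives on /dev/md3. A Python sketch using the segment table transcribed from /etc/lvm/backup/vg1000 above (extent_size = 8192 sectors, i.e. 4 MiB); the helper names are hypothetical.

```python
EXTENT_BYTES = 8192 * 512  # extent_size from the backup: 8192 sectors = 4 MiB

# (start_extent, extent_count, pv) transcribed from /etc/lvm/backup/vg1000
SEGMENTS = [
    (0, 951505, "pv0"),        # /dev/md2
    (951505, 238462, "pv1"),   # /dev/md4, first part
    (1189967, 476927, "pv2"),  # /dev/md3, the missing PV
    (1666894, 238463, "pv1"),  # /dev/md4, second part
]


def segments_on(pv, segments=SEGMENTS):
    """(start_extent, extent_count) of every LV segment stored on `pv`."""
    return [(start, count) for start, count, name in segments if name == pv]


def lv_fraction_on(pv, segments=SEGMENTS):
    """Share of the LV's extents that live on `pv`."""
    total = sum(count for _, count, _ in segments)
    return sum(count for _, count in segments_on(pv, segments)) / total


def byte_range(start, count):
    """Byte offsets [begin, end) inside the LV covered by a segment."""
    return start * EXTENT_BYTES, (start + count) * EXTENT_BYTES


for start, count in segments_on("pv2"):
    print(byte_range(start, count), f"{lv_fraction_on('pv2'):.2%}")
```

segment3 covers roughly a quarter of the 7.27 TiB LV, starting about 4.5 TiB in, with /dev/md4's two segments on either side of it. Since btrfs can place a file's data anywhere in the volume, files are not guaranteed to sit entirely on one array.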

Here's some reference information if you'd like to validate yourself:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/4/html/cluster_logical_volume_manager/mdatarecover

So, if you want to proceed:

# pvcreate --uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9 --restorefile /etc/lvm/backup/vg1000 /dev/md3

# vgcfgrestore vg1000

# pvs

Edited February 24, 2021 by flyride

pdavey · February 24, 2021

Thanks for the explanation. I'm not sure SHR was a wise choice; my ignorance has come back to bite me.

I was just reading the following, and it confirms your wise advice.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_logical_volumes/troubleshooting-lvm_configuring-and-managing-logical-volumes

Here goes
