pdavey Posted February 23, 2021 Share #1 Posted February 23, 2021 I have a Synology DS415+ and the storage pool crashed without warning. All drives were showing Healthy and I need help identifying what the problem is and how to recover the data (if possible). I was going to try what Synology suggest and load Ubuntu on my PC and try and restore the volume using the following. root@ubuntu:~$ mdadm -Asf && vgchange -ay $ mount ${device_path} ${mount_point} -o ro Before attempting this, I came across this forum and read some of the advice given to others about putting the drives back into the NAS and repairing the volume locally. Any advice or guidance on my journey to recover the data and identify the problem would be gratefully received. My NAS is configured as follows: Storage Pool 1 single 1.81 TB Drive 4 Healthy Storage Pool 2 three Drives - Crashed Drive 1. - 4.5TB System Partition Failed (40 Bad Sectors) passed Extended SMART Drive 2 - 5.5 TB Healthy Drive 3 – 2.7TB System Partition Failed root@DS415:~# cat /proc/mdstat Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] md2 : active raid5 sdc5[3] sdb5[5] 3897366528 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/2] [U_U] md4 : active raid5 sdc6[0] sdb6[1] 1953485824 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/2] [UU_] md3 : active raid1 sdb7[0] 1953494912 blocks super 1.2 [2/1] [U_] md5 : active raid1 sdd5[0] 1948683456 blocks super 1.2 [1/1] [U] md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3] 2097088 blocks [4/4] [UUUU] md0 : active raid1 sdb1[3] sdd1[2] 2490176 blocks [4/2] [__UU] unused devices: <none> Quote Link to comment Share on other sites More sharing options...
pdavey Posted February 23, 2021 Author Share #2 Posted February 23, 2021 root@DS415:~# mdadm --detail /dev/md2 /dev/md2: Version : 1.2 Creation Time : Mon Apr 20 12:20:53 2015 Raid Level : raid5 Array Size : 3897366528 (3716.82 GiB 3990.90 GB) Used Dev Size : 1948683264 (1858.41 GiB 1995.45 GB) Raid Devices : 3 Total Devices : 2 Persistence : Superblock is persistent Update Time : Sun Feb 21 18:23:01 2021 State : clean, degraded Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 Layout : left-symmetric Chunk Size : 64K Name : DS415:2 (local to host DS415) UUID : 20106822:98678da8:508d800e:b196f334 Events : 610085 Number Major Minor RaidDevice State 3 8 37 0 active sync /dev/sdc5 - 0 0 1 removed 5 8 21 2 active sync /dev/sdb5 Quote Link to comment Share on other sites More sharing options...
pdavey Posted February 23, 2021 Author Share #3 Posted February 23, 2021 root@DS415:/# fdisk -l /dev/sda Disk /dev/sda: 4.6 TiB, 5000981078016 bytes, 9767541168 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 4096 bytes I/O size (minimum/optimal): 4096 bytes / 4096 bytes Disklabel type: gpt Disk identifier: 3AB98EE9-A4F3-4746-B2BC-AA0BEA61A05B Device Start End Sectors Size Type /dev/sda1 2048 4982527 4980480 2.4G Linux RAID /dev/sda2 4982528 9176831 4194304 2G Linux RAID /dev/sda5 9453280 3906822239 3897368960 1.8T Linux RAID /dev/sda6 3906838336 5860326239 1953487904 931.5G Linux RAID /dev/sda7 5860342336 9767334239 3906991904 1.8T Linux RAID root@DS415:/# fdisk -l /dev/sdb Disk /dev/sdb: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 4096 bytes I/O size (minimum/optimal): 4096 bytes / 4096 bytes Disklabel type: gpt Disk identifier: F27E8497-B367-4862-864C-39FDB67E8EB2 Device Start End Sectors Size Type /dev/sdb1 2048 4982527 4980480 2.4G Linux RAID /dev/sdb2 4982528 9176831 4194304 2G Linux RAID /dev/sdb5 9453280 3906822239 3897368960 1.8T Linux RAID /dev/sdb6 3906838336 5860326239 1953487904 931.5G Linux RAID /dev/sdb7 5860342336 9767334239 3906991904 1.8T Linux RAID root@DS415:/# fdisk -l /dev/sdc Disk /dev/sdc: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 4096 bytes I/O size (minimum/optimal): 4096 bytes / 4096 bytes Disklabel type: gpt Disk identifier: 604C99D8-6C7A-4465-91A4-591796CEBE56 Device Start End Sectors Size Type /dev/sdc1 2048 4982527 4980480 2.4G Linux RAID /dev/sdc2 4982528 9176831 4194304 2G Linux RAID /dev/sdc5 9453280 3906822239 3897368960 1.8T Linux RAID /dev/sdc6 3906838336 5860326239 1953487904 931.5G Linux RAID Quote Link to comment Share on other sites More sharing options...
flyride Posted February 23, 2021 Share #4 Posted February 23, 2021 So looking this over (nice forensic data gathering, by the way), it would appear that /dev/sda disconnected or otherwise became unavailable to the system. You have three dissimilar drives, which has resulted in SHR creating three different arrays (/dev/md2, /dev/md3, /dev/md4) to maximize the space available (high complexity). All three arrays have to be working and healthy for your lvm to bind them, and your volume to mount. So please do a mdadm --detail on /dev/md3 and /dev/md4 as well and post those. Quote Link to comment Share on other sites More sharing options...
pdavey Posted February 23, 2021 Author Share #5 Posted February 23, 2021 Sorry, I'd already done them but forgot to include them. I got side tracked reading the documentation so I could understand whats going on. I figured out it was /dev/sda and I can see its missing from both /dev/md2 and /dev/md4 but I dont understand how its linked to /dev/md3 which is Raid 1? Is one array using partitions {sda5, sdb5, sdc5} (1.8T), another {sda6,sdb6,sdc6} (931G) and another {sda7, sdb7} (1.8T) and another I am bit lost now, does each one have a separate volume and the total combined is the pool? root@DS415:~# mdadm --detail /dev/md4 /dev/md4: Version : 1.2 Creation Time : Mon Dec 2 08:31:27 2019 Raid Level : raid5 Array Size : 1953485824 (1862.99 GiB 2000.37 GB) Used Dev Size : 976742912 (931.49 GiB 1000.18 GB) Raid Devices : 3 Total Devices : 2 Persistence : Superblock is persistent Update Time : Sun Feb 21 18:23:01 2021 State : clean, degraded Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 Layout : left-symmetric Chunk Size : 64K Name : DS415:4 (local to host DS415) UUID : 0f8073d8:3666a524:faf4218d:785d611c Events : 12147 Number Major Minor RaidDevice State 0 8 38 0 active sync /dev/sdc6 1 8 22 1 active sync /dev/sdb6 - 0 0 2 removed root@DS415:~# root@DS415:~# mdadm --detail /dev/md3 /dev/md3: Version : 1.2 Creation Time : Tue Dec 10 02:52:19 2019 Raid Level : raid1 Array Size : 1953494912 (1863.00 GiB 2000.38 GB) Used Dev Size : 1953494912 (1863.00 GiB 2000.38 GB) Raid Devices : 2 Total Devices : 1 Persistence : Superblock is persistent Update Time : Sun Feb 21 18:22:58 2021 State : clean, degraded Active Devices : 1 Working Devices : 1 Failed Devices : 0 Spare Devices : 0 Name : DS415:3 (local to host DS415) UUID : a508b67e:6933bca7:1bf77190:96030000 Events : 32 Number Major Minor RaidDevice State 0 8 23 0 active sync /dev/sdb7 - 0 0 1 removed Quote Link to comment Share on other sites More sharing options...
pdavey Posted February 23, 2021 Author Share #6 Posted February 23, 2021 I think I understand the RAID1 thing now, the most efficient way to put RAID5 over two drives is just mirroring. So that explains the removed drive from /dev/md3 its /dev/sda7 Why didn't the NAS recognise the healthy drive sda when I put it back in. Is there a flag to tell the system its dirty? Quote Link to comment Share on other sites More sharing options...
flyride Posted February 23, 2021 Share #7 Posted February 23, 2021 (edited) This grid might help you see things more clearly: md3 isn't a RAID5, it's a RAID1. 0.9TiB of space is wasted on /dev/sdb as the largest drive, since there is nowhere available to replicate the data. md0 is the DSM operating system spanned (via RAID1) across all disks, and md1 is the Linux swap partition similarly configured. The "System Partition" error is because there are two members that are not participating in md0 that should be. On the Storage Manager Overview screen, there should be a button to repair the System Partition and those members will be restored. I am not 100% certain why your Storage Pool 2 is indicating crashed as all the members are clean and degraded. After /dev/sda went offline for whatever reason, if the arrays are written to, they are no longer consistent with the missing disk so it won't be reinserted automatically. All writes are serialized so that the system knows this for sure. If we have an ARRAY that is crashed because there are not enough members to start it, we can evaluate the serials and determine how much risk we will incur by forcing it back into service. But since you have a consistent and operating array set (albeit degraded) we should try and start the Storage Pool and give you an opportunity to offload your data before doing anything else. So if you don't have ssh access to your system already, please turn it on. Then try the following (obviously stop if you see something you don't like): $ sudo -i # vgchange -ay # mount And post the results of each. Edited February 23, 2021 by flyride Quote Link to comment Share on other sites More sharing options...
pdavey Posted February 23, 2021 Author Share #8 Posted February 23, 2021 root@DS415:/# vgchange -ay 1 logical volume(s) in volume group "vg1001" now active Couldn't find device with uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9. Refusing activation of partial LV vg1000/lv. Use '--activationmode partial' to override. 0 logical volume(s) in volume group "vg1000" now active Quote Link to comment Share on other sites More sharing options...
flyride Posted February 23, 2021 Share #9 Posted February 23, 2021 Well, that would be why it is crashed. Some more forensic investigation is in order: # pvs # vgs # lvs # pvdisplay # vgdisplay # lvdisplay Quote Link to comment Share on other sites More sharing options...
flyride Posted February 23, 2021 Share #10 Posted February 23, 2021 Also, why you are at it: # cat /etc/fstab Quote Link to comment Share on other sites More sharing options...
pdavey Posted February 23, 2021 Author Share #11 Posted February 23, 2021 (edited) root@DS415:/# pvs Couldn't find device with uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9. PV VG Fmt Attr PSize PFree /dev/md2 vg1000 lvm2 a-- 3.63t 0 /dev/md4 vg1000 lvm2 a-- 1.82t 0 /dev/md5 vg1001 lvm2 a-- 1.81t 0 unknown device vg1000 lvm2 a-m 1.82t 0 root@DS415:/# vgs Couldn't find device with uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9. VG #PV #LV #SN Attr VSize VFree vg1000 3 1 0 wz-pn- 7.27t 0 vg1001 1 1 0 wz--n- 1.81t 0 root@DS415:/# lvs Couldn't find device with uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9. LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert lv vg1000 -wi-----p- 7.27t lv vg1001 -wi-ao---- 1.81t root@DS415:/# pvdisplay --- Physical volume --- PV Name /dev/md5 VG Name vg1001 PV Size 1.81 TiB / not usable 3.19 MiB Allocatable yes (but full) PE Size 4.00 MiB Total PE 475752 Free PE 0 Allocated PE 475752 PV UUID p7cJsO-la6l-vXp7-ga51-4ugu-he7H-SeoHgZ Couldn't find device with uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9. --- Physical volume --- PV Name /dev/md2 VG Name vg1000 PV Size 3.63 TiB / not usable 1.44 MiB Allocatable yes (but full) PE Size 4.00 MiB Total PE 951505 Free PE 0 Allocated PE 951505 PV UUID 2NkG9U-9Rh6-5xFW-M1iM-GA0f-nbOd-aJHEUS --- Physical volume --- PV Name /dev/md4 VG Name vg1000 PV Size 1.82 TiB / not usable 128.00 KiB Allocatable yes (but full) PE Size 4.00 MiB Total PE 476925 Free PE 0 Allocated PE 476925 PV UUID nolEeP-0392-QvMt-ZOkW-1JDr-COn4-QybVK7 --- Physical volume --- PV Name unknown device VG Name vg1000 PV Size 1.82 TiB / not usable 1.31 MiB Allocatable yes (but full) PE Size 4.00 MiB Total PE 476927 Free PE 0 Allocated PE 476927 PV UUID bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9 root@DS415:/# vgdisplay --- Volume group --- VG Name vg1001 System ID Format lvm2 Metadata Areas 1 Metadata Sequence No 2 VG Access read/write VG Status resizable MAX LV 0 Cur LV 1 Open LV 1 Max PV 0 Cur PV 1 Act PV 1 VG Size 1.81 TiB PE Size 4.00 MiB Total PE 475752 Alloc PE / Size 475752 / 1.81 TiB Free PE / Size 0 / 0 VG UUID pHhunz-cg0H-Fkcg-na1y-AAcT-D9fU-gdDTet Couldn't find device with uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9. --- Volume group --- VG Name vg1000 System ID Format lvm2 Metadata Areas 2 Metadata Sequence No 12 VG Access read/write VG Status resizable MAX LV 0 Cur LV 1 Open LV 0 Max PV 0 Cur PV 3 Act PV 2 VG Size 7.27 TiB PE Size 4.00 MiB Total PE 1905357 Alloc PE / Size 1905357 / 7.27 TiB Free PE / Size 0 / 0 VG UUID kPgiVt-X4fO-Eoxr-f0GL-rsKm-s4fE-Zl6u4Z root@DS415:/# lvdisplay --- Logical volume --- LV Path /dev/vg1001/lv LV Name lv VG Name vg1001 LV UUID Pl33so-ldeW-HS2w-QGeE-3Zwh-QLuG-pqC1TE LV Write Access read/write LV Creation host, time , LV Status available # open 1 LV Size 1.81 TiB Current LE 475752 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 4096 Block device 253:0 Couldn't find device with uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9. --- Logical volume --- LV Path /dev/vg1000/lv LV Name lv VG Name vg1000 LV UUID Zqg7q0-2u5X-oQcl-ejyL-zh1Y-iUA5-531Ls9 LV Write Access read/write LV Creation host, time , LV Status NOT available LV Size 7.27 TiB Current LE 1905357 Segments 4 Allocation inherit Read ahead sectors auto root@DS415:/# cat /etc/fstab none /proc proc defaults 0 0 /dev/root / ext4 defaults 1 1 /dev/vg1000/lv /volume1 btrfs auto_reclaim_space,synoacl,relatime 0 0 /dev/vg1001/lv /volume2 btrfs auto_reclaim_space,synoacl,relatime 0 0 root@DS415:/# Edited February 23, 2021 by pdavey Quote Link to comment Share on other sites More sharing options...
pdavey Posted February 23, 2021 Author Share #12 Posted February 23, 2021 I am guessing the PV Name should be /dev/md3 PV Name unknown device VG Name vg1000 PV Size 1.82 TiB / not usable 1.31 MiB Allocatable yes (but full) PE Size 4.00 MiB Total PE 476927 Free PE 0 Allocated PE 476927 PV UUID bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9 Quote Link to comment Share on other sites More sharing options...
pdavey Posted February 23, 2021 Author Share #13 Posted February 23, 2021 What does the -m flag signify on the PV? root@DS415:/# pvs Couldn't find device with uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9. PV VG Fmt Attr PSize PFree /dev/md2 vg1000 lvm2 a-- 3.63t 0 /dev/md4 vg1000 lvm2 a-- 1.82t 0 /dev/md5 vg1001 lvm2 a-- 1.81t 0 unknown device vg1000 lvm2 a-m 1.82t 0 Quote Link to comment Share on other sites More sharing options...
flyride Posted February 23, 2021 Share #14 Posted February 23, 2021 m signifies "missing" - so it thinks it should have a device with that UUID, but can't find it. The pvname comes from the device itself, that's why it's "unknown." Have you rebooted the NAS since this happened? We have a couple of options to try to get /dev/md3 back into working order. First, let's let's see if lvm can figure it out with a scan of devices. # lvm pvscan Please post the results as usual. Quote Link to comment Share on other sites More sharing options...
flyride Posted February 23, 2021 Share #15 Posted February 23, 2021 Also, look for files in /etc/lvm/backup and post them as attachments here if you find any. Quote Link to comment Share on other sites More sharing options...
pdavey Posted February 24, 2021 Author Share #16 Posted February 24, 2021 Yes I have rebooted the NAS. I took out sda and rebooted. Then put it back and rebooted to see if it would offer me a repair. root@DS415:/# lvm pvscan Couldn't find device with uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9. PV /dev/md5 VG vg1001 lvm2 [1.81 TiB / 0 free] PV /dev/md2 VG vg1000 lvm2 [3.63 TiB / 0 free] PV /dev/md4 VG vg1000 lvm2 [1.82 TiB / 0 free] PV unknown device VG vg1000 lvm2 [1.82 TiB / 0 free] Total: 4 [9.08 TiB] / in use: 4 [9.08 TiB] / in no VG: 0 [0 ] root@DS415:/etc/lvm/backup# dir total 20 drwxr-xr-x 2 root root 4096 Dec 10 2019 . drwxr-xr-x 5 root root 4096 May 27 2020 .. -rw-r--r-- 1 root root 2261 Dec 10 2019 vg1000 -rw-r--r-- 1 root root 1215 Dec 10 2019 vg1001 -rw-r--r-- 1 root root 1422 Apr 10 2015 vg3 root@DS415:/etc/lvm/backup# cat vg1000 # Generated by LVM2 version 2.02.132(2)-git (2015-09-22): Tue Dec 10 02:52:40 2019 contents = "Text Format Volume Group" version = 1 description = "Created *after* executing '/sbin/lvextend --alloc inherit /dev/vg1000/lv -l100%VG'" creation_host = "DS415" # Linux DS415 3.10.105 #24922 SMP Wed Jul 3 16:37:24 CST 2019 x86_64 creation_time = 1575946360 # Tue Dec 10 02:52:40 2019 vg1000 { id = "kPgiVt-X4fO-Eoxr-f0GL-rsKm-s4fE-Zl6u4Z" seqno = 12 format = "lvm2" # informational status = ["RESIZEABLE", "READ", "WRITE"] flags = [] extent_size = 8192 # 4 Megabytes max_lv = 0 max_pv = 0 metadata_copies = 0 physical_volumes { pv0 { id = "2NkG9U-9Rh6-5xFW-M1iM-GA0f-nbOd-aJHEUS" device = "/dev/md2" # Hint only status = ["ALLOCATABLE"] flags = [] dev_size = 7794731904 # 3.6297 Terabytes pe_start = 1152 pe_count = 951505 # 3.6297 Terabytes } pv1 { id = "nolEeP-0392-QvMt-ZOkW-1JDr-COn4-QybVK7" device = "/dev/md4" # Hint only status = ["ALLOCATABLE"] flags = [] dev_size = 3906970496 # 1.81932 Terabytes pe_start = 1152 pe_count = 476925 # 1.81932 Terabytes } pv2 { id = "bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9" device = "/dev/md3" # Hint only status = ["ALLOCATABLE"] flags = [] dev_size = 3906988672 # 1.81933 Terabytes pe_start = 1152 pe_count = 476927 # 1.81933 Terabytes } } logical_volumes { lv { id = "Zqg7q0-2u5X-oQcl-ejyL-zh1Y-iUA5-531Ls9" status = ["READ", "WRITE", "VISIBLE"] flags = [] segment_count = 4 segment1 { start_extent = 0 extent_count = 951505 # 3.6297 Terabytes type = "striped" stripe_count = 1 # linear stripes = [ "pv0", 0 ] } segment2 { start_extent = 951505 extent_count = 238462 # 931.492 Gigabytes type = "striped" stripe_count = 1 # linear stripes = [ "pv1", 0 ] } segment3 { start_extent = 1189967 extent_count = 476927 # 1.81933 Terabytes type = "striped" stripe_count = 1 # linear stripes = [ "pv2", 0 ] } segment4 { start_extent = 1666894 extent_count = 238463 # 931.496 Gigabytes type = "striped" stripe_count = 1 # linear stripes = [ "pv1", 238462 ] } } } } root@DS415:/etc/lvm/backup# cat vg1001 # Generated by LVM2 version 2.02.132(2)-git (2015-09-22): Tue Dec 10 10:09:16 2019 contents = "Text Format Volume Group" version = 1 description = "Created *after* executing '/sbin/lvcreate /dev/vg1001 -n lv -l100%FREE'" creation_host = "DS415" # Linux DS415 3.10.105 #24922 SMP Wed Jul 3 16:37:24 CST 2019 x86_64 creation_time = 1575972556 # Tue Dec 10 10:09:16 2019 vg1001 { id = "pHhunz-cg0H-Fkcg-na1y-AAcT-D9fU-gdDTet" seqno = 2 format = "lvm2" # informational status = ["RESIZEABLE", "READ", "WRITE"] flags = [] extent_size = 8192 # 4 Megabytes max_lv = 0 max_pv = 0 metadata_copies = 0 physical_volumes { pv0 { id = "p7cJsO-la6l-vXp7-ga51-4ugu-he7H-SeoHgZ" device = "/dev/md5" # Hint only status = ["ALLOCATABLE"] flags = [] dev_size = 3897366912 # 1.81485 Terabytes pe_start = 1152 pe_count = 475752 # 1.81485 Terabytes } } logical_volumes { lv { id = "Pl33so-ldeW-HS2w-QGeE-3Zwh-QLuG-pqC1TE" status = ["READ", "WRITE", "VISIBLE"] flags = [] segment_count = 1 segment1 { start_extent = 0 extent_count = 475752 # 1.81485 Terabytes type = "striped" stripe_count = 1 # linear stripes = [ "pv0", 0 ] } } } } root@DS415:/etc/lvm/backup# cat vg3 # Generated by LVM2 version 2.02.38 (2008-06-11): Fri Apr 10 03:35:38 2015 contents = "Text Format Volume Group" version = 1 description = "Created *after* executing '/sbin/lvcreate /dev/vg3 -n volume_3 -l100%FREE'" creation_host = "DS415" # Linux DS415 3.2.40 #5022 SMP Wed Jan 7 14:19:49 CST 2015 x86_64 creation_time = 1428629738 # Fri Apr 10 03:35:38 2015 vg3 { id = "B3OU2S-jllW-8UN6-M9s6-hVwO-q8aK-Lqu8rE" seqno = 3 status = ["RESIZEABLE", "READ", "WRITE"] extent_size = 8192 # 4 Megabytes max_lv = 0 max_pv = 0 physical_volumes { pv0 { id = "3eYQCk-V8MJ-3LVt-m7Th-1oAM-CKWY-vJ8Txd" device = "/dev/md4" # Hint only status = ["ALLOCATABLE"] dev_size = 5850870528 # 2.72452 Terabytes pe_start = 1152 pe_count = 714217 # 2.72452 Terabytes } } logical_volumes { syno_vg_reserved_area { id = "Epdama-h0ex-swu2-jsyB-nsVq-43rJ-hW4m4I" status = ["READ", "WRITE", "VISIBLE"] segment_count = 1 segment1 { start_extent = 0 extent_count = 3 # 12 Megabytes type = "striped" stripe_count = 1 # linear stripes = [ "pv0", 0 ] } } volume_3 { id = "889ZKw-87Vv-nVY9-PvBk-ytwK-ZvJX-AaHcue" status = ["READ", "WRITE", "VISIBLE"] segment_count = 1 segment1 { start_extent = 0 extent_count = 714214 # 2.72451 Terabytes type = "striped" stripe_count = 1 # linear stripes = [ "pv0", 3 ] } } } } Quote Link to comment Share on other sites More sharing options...
flyride Posted February 24, 2021 Share #17 Posted February 24, 2021 So we haven't done anything irreversible yet. But all possible steps going forward have us modifying something, which is potentially destructive. I guess I should point out you have a real Synology device, and you have the option of engaging them for remote support - this is the sort of thing they can fix. If you want to continue on your own using advice from some yahoo on the Internet, that's fine. The next thing to do is to force the lvm UUID back onto /dev/md3. # pvcreate --uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9 /dev/md3 # vgcfgrestore vg1000 # pvs Quote Link to comment Share on other sites More sharing options...
pdavey Posted February 24, 2021 Author Share #18 Posted February 24, 2021 Yahoo .. it is . (I tried Synology but they refused to do anything using the CLI on the grounds I might do irreversible damage, as if! ) BTW they also told me my HDD werent compatible with my NAS that's when I went down the SMR route. Fingers crossed Quote Link to comment Share on other sites More sharing options...
pdavey Posted February 24, 2021 Author Share #19 Posted February 24, 2021 root@DS415:/# pvcreate --uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9 /dev/md3 --restorefile is required with --uuid Run `pvcreate --help' for more information. root@DS415:/# pvcreate --help pvcreate: Initialize physical volume(s) for use by LVM pvcreate [--norestorefile] [--restorefile file] [--commandprofile ProfileName] [-d|--debug] [-f[f]|--force [--force]] [-h|-?|--help] [--labelsector sector] [-M|--metadatatype 1|2] [--pvmetadatacopies #copies] [--bootloaderareasize BootLoaderAreaSize[bBsSkKmMgGtTpPeE]] [--metadatasize MetadataSize[bBsSkKmMgGtTpPeE]] [--dataalignment Alignment[bBsSkKmMgGtTpPeE]] [--dataalignmentoffset AlignmentOffset[bBsSkKmMgGtTpPeE]] [--setphysicalvolumesize PhysicalVolumeSize[bBsSkKmMgGtTpPeE] [-t|--test] [-u|--uuid uuid] [-v|--verbose] [-y|--yes] [-Z|--zero {y|n}] [--version] PhysicalVolume [PhysicalVolume...] Quote Link to comment Share on other sites More sharing options...
flyride Posted February 24, 2021 Share #20 Posted February 24, 2021 hmm. synology compile is a little different than generic linux. I'm stepping away for a short while and will update when I get back. Quote Link to comment Share on other sites More sharing options...
pdavey Posted February 24, 2021 Author Share #21 Posted February 24, 2021 OK No Problem ... many thanks BTW. I could take the drives out and mount on a Linux box if that would be better? Quote Link to comment Share on other sites More sharing options...
flyride Posted February 24, 2021 Share #22 Posted February 24, 2021 No, that is usually a mistake despite the many tutorials that seem to encourage it. If you can't get things to boot, that's one thing, but troubleshooting in situ is usually safest. # blkid | grep "/dev/md." Quote Link to comment Share on other sites More sharing options...
pdavey Posted February 24, 2021 Author Share #23 Posted February 24, 2021 root@DS415:/# blkid | grep "/dev/md." /dev/md0: LABEL="1.42.6-5004" UUID="1dd621d5-e876-4e53-81e7-b9855ac902f0" TYPE="ext4" /dev/md1: UUID="bf6d195b-d017-42af-ac08-5c33cf88fb75" TYPE="swap" /dev/md5: UUID="p7cJsO-la6l-vXp7-ga51-4ugu-he7H-SeoHgZ" TYPE="LVM2_member" /dev/md4: UUID="nolEeP-0392-QvMt-ZOkW-1JDr-COn4-QybVK7" TYPE="LVM2_member" /dev/md2: UUID="2NkG9U-9Rh6-5xFW-M1iM-GA0f-nbOd-aJHEUS" TYPE="LVM2_member" Forgive my ignorance but surely the whole point of a RAID is that it can cope with the loss of 1 drive. Why then when I remove sda does it not give me the option to repair and rebuild the array on a new drive? Is its because sdc is missing also? Also if we use PVCREATE does this create a physical partition on the disk or a logical volume inside a partition. If its the former I am not sure how the HDD will do it without destroying the rest of the data on the disk. Would it be better manually removing the Partition and remove the LV from the array Quote Link to comment Share on other sites More sharing options...
flyride Posted February 24, 2021 Share #24 Posted February 24, 2021 (edited) All reasonable questions. To recap: RAID1 and RAID5 as technologies can cope with the loss of one drive. You have a SHR, which is a concatenation of three arrays (using lvm). This creates a stateful dependency among all three - a risk that is not present in a simple RAID1 or RAID5 array. All three of your arrays are currently impacted due to /dev/sda failing in some way, but are (according to mdadm) all intact in degraded state. That has no bearing on the quality of the data inside each array (garbage in, garbage out), nor does it mean that the arrays are consistent in their stateful relationships. it just means that the mdadm job did not see any loss of integrity of each individual array. If the lvm started up normally, you could just repair all three arrays with /dev/sda and be done with it. Unfortunately while /dev/md3 array says it is intact, it seems to be missing the UUID signature to participate in the lvm. This is a corruption of some type on /dev/md3 and is why the Volume is reporting crashed. The corruption could be minor or significant. And if we don't have an intact volume/filesystem, we do not want to try and repair the array otherwise we will sacrifice potentially usable redundant information. So we are first trying to correct the /dev/md3 device in the hopes that it can be recognized by the lvm. The pvcreate command we are trying to use is not going to zero the disk, it will just write the missing metadata signature into the appropriate reserved area. Hopefully the rest of the data on the disk is intact. If so and we can then start the lvm and mount the filesystem, I will strongly advise you to offload/recover all files from the system at that moment. Afterward, if you are satisfied with your data recovery, you'll be advised to rebuild the entire array structure from scratch (delete Storage Pool and recreate) to ensure there are no other downstream issues from the corruption. If we can't get lvm to accept /dev/md3 or if we cannot extract files from the filesystem, we can attempt to force the other member of /dev/md3 (/dev/sda7) into service. It was flagged as stale by mdadm, but we can verify exactly how far out of date it is, and more importantly, it may also not suffer from the corruption we are trying to fix. If neither approach works to get to the filesystem, then the files that are on the /dev/md3 array are probably lost. Files contained in the other two arrays could be recovered with forensic recovery tools (typically expensive). Here's some reference information if you'd like to validate yourself: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/4/html/cluster_logical_volume_manager/mdatarecover So, if you want to proceed: # pvcreate --uuid bocvSr-hmj0-LUH0-BM8g-BXBS-TicT-LbjYQ9 --restorefile /etc/lvm/backup/vg1000 /dev/md3 # vgcfgrestore vg1000 # pvs Edited February 24, 2021 by flyride Quote Link to comment Share on other sites More sharing options...
pdavey Posted February 24, 2021 Author Share #25 Posted February 24, 2021 Thanks for the explanation, I'm not sure SHR was a wise choice., my ignorance has come back to bite me. I was just reading the following and they confirm your wise advice. https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_logical_volumes/troubleshooting-lvm_configuring-and-managing-logical-volumes Here goes Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.