jj-lolo (Author) Posted March 1, 2022 #1

Running DSM 6.2.2-24922 Update 3 under ESXi 7 on an HP MicroServer Gen8. Lost power today and got a crashed volume upon reboot. Would appreciate any help in recreating the volume/recovering the data. I do have a partial backup in case I have to go that route. Here are some screenshots/info; not sure what would help.

root@XPE_1:/# ls -l
total 52
lrwxrwxrwx   1 root root     7 Oct 12  2019 bin -> usr/bin
drwxr-xr-x   7 root root     0 Mar  1 10:24 config
drwxr-xr-x  10 root root 18840 Mar  1 12:24 dev
drwxr-xr-x  48 root root  4096 Mar  1 10:24 etc
drwxr-xr-x  43 root root  4096 Oct 12  2019 etc.defaults
drwxr-xr-x   2 root root  4096 May  9  2019 initrd
lrwxrwxrwx   1 root root     7 Oct 12  2019 lib -> usr/lib
lrwxrwxrwx   1 root root     9 Oct 12  2019 lib32 -> usr/lib32
lrwxrwxrwx   1 root root     7 Oct 12  2019 lib64 -> usr/lib
drwx------   2 root root  4096 May  9  2019 lost+found
drwxr-xr-x   2 root root  4096 May  9  2019 mnt
drwx--x--x   3 root root  4096 Oct 17  2019 opt
dr-xr-xr-x 376 root root     0 Mar  1 10:24 proc
drwx------   3 root root  4096 Feb 27  2021 root
drwxr-xr-x  25 root root  1280 Mar  1 16:00 run
lrwxrwxrwx   1 root root     8 Oct 12  2019 sbin -> usr/sbin
dr-xr-xr-x  12 root root     0 Mar  1 10:24 sys
drwxrwxrwt  12 root root  1280 Mar  1 16:07 tmp
drwxr-xr-x   2 root root  4096 Oct 12  2019 tmpRoot
drwxr-xr-x  11 root root  4096 May  9  2019 usr
drwxr-xr-x  17 root root  4096 Mar  1 10:24 var
drwxr-xr-x  14 root root  4096 Oct 12  2019 var.defaults
drwxr-xr-x   3 root root  4096 Mar  1 09:58 volume1
drwxr-xr-x   5 root root  4096 Mar  1 10:24 volumeSATA1

root@XPE_1:/# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [raidF1]
md2 : active raid6 sdb3[0] sde3[3] sdd3[2] sdc3[1]
      27335120896 blocks super 1.2 level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
md1 : active raid1 sdb2[0] sdc2[1] sdd2[2] sde2[3]
      2097088 blocks [12/4] [UUUU________]
md0 : active raid1 sdb1[0] sdc1[1] sdd1[2] sde1[3]
      2490176 blocks [12/4] [UUUU________]
unused devices: <none>
root@XPE_1:/# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Sat Oct 12 01:49:00 2019
     Raid Level : raid1
     Array Size : 2490176 (2.37 GiB 2.55 GB)
  Used Dev Size : 2490176 (2.37 GiB 2.55 GB)
   Raid Devices : 12
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent
    Update Time : Tue Mar  1 16:09:16 2022
          State : clean, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
           UUID : affa3cf2:4bc1be17:3017a5a8:c86610be
         Events : 0.22852208

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1
       -       0        0        4      removed
       -       0        0        5      removed
       -       0        0        6      removed
       -       0        0        7      removed
       -       0        0        8      removed
       -       0        0        9      removed
       -       0        0       10      removed
       -       0        0       11      removed

root@XPE_1:/# mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Sat Oct 12 01:49:03 2019
     Raid Level : raid1
     Array Size : 2097088 (2047.94 MiB 2147.42 MB)
  Used Dev Size : 2097088 (2047.94 MiB 2147.42 MB)
   Raid Devices : 12
  Total Devices : 4
Preferred Minor : 1
    Persistence : Superblock is persistent
    Update Time : Tue Mar  1 10:24:12 2022
          State : clean, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
           UUID : d913496f:3d84522e:3017a5a8:c86610be
         Events : 0.162375

    Number   Major   Minor   RaidDevice State
       0       8       18        0      active sync   /dev/sdb2
       1       8       34        1      active sync   /dev/sdc2
       2       8       50        2      active sync   /dev/sdd2
       3       8       66        3      active sync   /dev/sde2
       -       0        0        4      removed
       -       0        0        5      removed
       -       0        0        6      removed
       -       0        0        7      removed
       -       0        0        8      removed
       -       0        0        9      removed
       -       0        0       10      removed
       -       0        0       11      removed

root@XPE_1:/# mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Fri Oct 11 19:29:34 2019
     Raid Level : raid6
     Array Size : 27335120896 (26068.80 GiB 27991.16 GB)
  Used Dev Size : 13667560448 (13034.40 GiB 13995.58 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent
    Update Time : Tue Mar  1 10:24:21 2022
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 64K
           Name : XPE1:2
           UUID : 683881bb:0c6fecad:dc5ac778:4d0d9d2c
         Events : 4591

    Number   Major   Minor   RaidDevice State
       0       8       19        0      active sync   /dev/sdb3
       1       8       35        1      active sync   /dev/sdc3
       2       8       51        2      active sync   /dev/sdd3
       3       8       67        3      active sync   /dev/sde3
blue-label1989 Posted March 1, 2022 #2

Very strange. In ESXi, did you make Raw Device Mappings of your disks, or did you just connect the disks directly?
blue-label1989 Posted March 2, 2022 #3

Maybe you will find something useful on one of these sites:
https://www.vsam.pro/crashed-synology-volume-and-how-to-restore-ds415-play/
or a BTRFS Restore guide. Good luck.
jj-lolo (Author) Posted March 2, 2022 #4

1 hour ago, blue-label1989 said:
> In ESXi, did you make Raw Device Mappings of your disks, or did you just connect the disks directly?

Raw Device Mappings.
jj-lolo (Author) Posted March 2, 2022 #5

49 minutes ago, blue-label1989 said:
> Maybe you will find something useful on one of these sites: https://www.vsam.pro/crashed-synology-volume-and-how-to-restore-ds415-play/ or a BTRFS Restore guide. Good luck.

Thanks. I had found the first article, but when I do the LVM vgscan it doesn't find any volumes (filesystem corruption?), so I think his issue is more of a disk issue. I'll try to follow the second article and see what's useful.
jj-lolo (Author) Posted March 2, 2022 #6

When I follow the steps in the recovery guide, commands like vgdisplay return no info, and lvdisplay -v returns:

    Using logical volume(s) on command line.
    No volume groups found.

Help!
flyride Posted March 3, 2022 #7

It doesn't look like you're using LVM, just a plain RAID6. You need to know whether you are using btrfs or ext4. Post the output of:

    cat /etc/fstab
jj-lolo (Author) Posted March 3, 2022 #8

6 hours ago, flyride said:
> It doesn't look like you're using LVM, just a plain RAID6. You need to know whether you are using btrfs or ext4. Post the output of cat /etc/fstab

none /proc proc defaults 0 0
/dev/root / ext4 defaults 1 1
/dev/md2 /volume1 btrfs auto_reclaim_space,synoacl,relatime 0 0
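(For anyone checking their own box: the filesystem type is the third field of the /volume1 entry in /etc/fstab. A minimal sketch of pulling it out, using a pasted copy of the line above; on a live system you would read /etc/fstab itself instead of the echo:)

```shell
# Extract the filesystem type of /volume1 from an fstab-style entry.
# The here-line is a copy of the fstab output above; on the NAS,
# replace the echo with: awk '$2 == "/volume1" {print $3}' /etc/fstab
fstab_line='/dev/md2 /volume1 btrfs auto_reclaim_space,synoacl,relatime 0 0'
fstype=$(echo "$fstab_line" | awk '$2 == "/volume1" {print $3}')
echo "$fstype"   # prints "btrfs", which decides the recovery approach
```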
flyride Posted March 3, 2022 #9

OK, this confirms you have a simple RAID6 with btrfs. The link cited here is a reasonable one to follow, and it points at the post you should start with:
https://xpenology.com/forum/topic/14337-volume-crash-after-4-months-of-stability/?do=findComment&comment=107979
You can ignore any instructions involving LVM (lvdisplay/pvdisplay/vgchange) and just focus on the filesystem mounting and/or repair commands.
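(A hedged sketch of that mounting step: the device and mount point come from the fstab above, and btrfs's `recovery` mount option — renamed `usebackuproot` in newer kernels — asks the filesystem to fall back to an older tree root. The function only prints the command when the array device is absent, so the logic can be exercised safely off the NAS:)

```shell
# Guarded sketch of a btrfs recovery mount. Runs mount only when the
# md device actually exists as a block device; otherwise it prints
# what it would do, so this is safe to dry-run anywhere.
try_recovery_mount() {
    md="$1"; mnt="$2"
    if [ -b "$md" ]; then
        # "recovery" falls back to an older btrfs tree root; consider
        # adding "ro," first for a strictly read-only first look.
        mount -o recovery "$md" "$mnt"
    else
        echo "would run: mount -o recovery $md $mnt"
    fi
}

try_recovery_mount /dev/md2 /volume1
```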
jj-lolo (Author) Posted March 5, 2022 #10

Thanks so much for the tip to ignore the LVM stuff, as that was confusing me! I was able to do a recovery mount:

    sudo mount -o recovery /dev/md2 /volume1

A few questions, if you don't mind:

1. What do you think happened, and is there anything I can do to avoid it in the future? (I have a UPS, but it failed, so I need to order a new battery and figure out how to get ESXi to shut down XPEnology automatically.)
2. I am assuming the volume can't be fixed and I need to back up and recreate it. Is this correct?
3. I have another (real, but old) Synology NAS where most of the data is backed up ("most" because I don't have enough HDD space), and I want to make sure everything is backed up. If I back up /volume1/@appstore, will all the data for Plex, Docker, and Hyper Backup be backed up, or do they store info elsewhere?
4. Since it seems I need to do a restore anyway, this may be a good time to upgrade my DSM to 7.0.1. Would you recommend this, and is there a better guide than this one? https://www.tsunati.com/blog/xpenology-7-0-1-on-esxi-7-x (I use Raw Device Mappings, so I'll need to remember how I set that up a while back.)
5. Any recommendations on how to set up the new volume (e.g. SHR-2, write cache off)?

Thanks so much for your time and knowledge!
flyride Posted March 5, 2022 #11

11 hours ago, jj-lolo said:
> 1. What do you think happened, and is there anything I can do to avoid it in the future?

Couldn't say. Corruption happens, and a power outage seems as good a reason as any. It is quite possible to plug the UPS into the ESXi host and add that USB device to your XPe VM profile, so that the VM can see the power status.

11 hours ago, jj-lolo said:
> 2. I am assuming the volume can't be fixed and I need to back up and recreate it. Is this correct?

btrfs is not really designed for offline repair; if it can, it repairs itself in real time. Everything I have ever seen posted from Synology is to offload, re-create, and repopulate if there is sustained corruption on a btrfs volume. This is the opposite of the advice given for ext4 (bring it offline, run repair utilities, restart).

11 hours ago, jj-lolo said:
> 3. If I back up /volume1/@appstore, will all the data for Plex, Docker, and Hyper Backup be backed up, or do they store info elsewhere?

The typical method of backing up @appstore is Hyper Backup, as it is not accessible from the filesystem sharing toolsets. Syno apps typically store configuration data in @appstore and let you control where they store their data elsewhere, but it's not a hard and fast rule. It is one of the reasons I converted from Syno apps exclusively to Docker apps: to improve portability (and, in most cases, get better access to new releases).

11 hours ago, jj-lolo said:
> 4. Since it seems I need to do a restore anyway, this may be a good time to upgrade my DSM to 7.0.1. Would you recommend this?

If you are asking me personally: loader development and ongoing upgrade procedures are still in early beta status, and I haven't moved any production data to 7.0.1 yet. I also don't pay much attention to documentation authored or hosted elsewhere, so I can't comment on that guide. Your risk tolerance may be higher, and many are using 7.x.

11 hours ago, jj-lolo said:
> 5. Any recommendations on how to set up the new volume (e.g. SHR-2, write cache off)?

No idea what your disks and requirements are. I maintain all my datasets replicated on two XPe systems, so I use RAID5 only (actually RAIDF1), as two-disk redundancy seems overkill. I also prefer plain RAID over SHR (what you have now), as it makes recovery simpler, which you just experienced. But if you need the flexibility of SHR, there really isn't a substitute except making multiple Storage Pools.
jj-lolo (Author) Posted March 8, 2022 #12 (edited)

Thanks for all the help. What throughput should I be getting when backing up data from a recovery mount to another Synology box over a wired network? Using rsync, I've only copied 130 GB in 5 hours. Here is some detail:

I just installed a brand-new 14 TB Seagate Exos in my old Synology DS213j, and I'm using rsync -avux (also tried with -z) from my XPe. Backing up over a wired 1 Gb network to the DS213j, I seem to be getting only ~10 MB/s (network download speed; I was able to get at least 60+ MB/s from the XPe over the network before the volume crash). Is it the DS213j or the NAS in recovery mode that's causing this? I hadn't used the DS213j in 3 years; I updated it before trying the restore, and I never tested its throughput, as I only used it for a regular backup.

UPDATE: I did a test from a wireless laptop to the DS213j and got a throughput peak of over 30 MB/s, so I'm guessing it's the way things are mounted? Anything I can try to speed up the backup? At 10 MB/s it will take weeks to back up 14 TB. Unfortunately, I don't have any more bays on my HP MicroServer to insert another drive, so I also tried mounting a USB SSD directly on the ESXi host and passing it through to the XPEnology VM, but when I do an fdisk -l it doesn't show up.

Another thought, if I can't get the speed up higher: I have another server I can install XPEnology on, but 14 TB HDDs are expensive... I do have a partial backup on another Synology. Can I "break the RAID6" by taking two disks out, use those to configure a new machine, restore the partial backup to them, and then still access the broken RAID on the original XPEnology (the one in recovery)? Or is that too much risk?

Your thoughts as to what I should try next would be appreciated!

Edited March 8, 2022 by jj-lolo
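(The "weeks" estimate checks out. A quick sketch of the arithmetic, assuming decimal units — 1 TB = 1,000,000 MB:)

```shell
# Back-of-envelope transfer-time estimate in days:
# terabytes * 1,000,000 MB/TB / (MB per second) / 86,400 s/day
eta_days() {
    awk -v tb="$1" -v mbs="$2" \
        'BEGIN { printf "%.1f\n", tb * 1000000 / mbs / 86400 }'
}

eta_days 14 10   # 14 TB at the observed ~10 MB/s: about 16 days
eta_days 14 90   # at a healthy gigabit-wire ~90 MB/s: under 2 days
```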
jj-lolo (Author) Posted March 9, 2022 #13

Update, in case this is useful to anyone: I went ahead and broke the RAID by taking a drive out. It still read (albeit slowly), and I was able to back up what was missing. I then recreated the volume as RAID5/btrfs and am now restoring (at around 90 MB/s) from my old Synology DS1511, so it's going to take a while! I am also creating a new XPEnology server to act more as a long-term backup. Thanks for your help. Two final questions:

- For your Docker setup, do you use the Synology Docker package or something else (so I can make sure I back up everything)?
- I'd like my new backup Synology to be "hot swappable" with my main one. What's the best way to back up to it so that it's a mirror image of my primary NAS and can be used as my main NAS in case the primary fails again?

Thanks!
flyride Posted March 9, 2022 #14

30 minutes ago, jj-lolo said:
> For your Docker setup, do you use the Synology Docker package or something else?

Docker apps running on DSM use the Synology Docker package. You specify a folder for Docker, and that folder can be backed up or replicated in its entirety (this doesn't absolve you from properly configuring your Docker apps with a data folder mounted into the image).

32 minutes ago, jj-lolo said:
> I'd like my new backup Synology to be "hot swappable" with my main one. What's the best way to back up to it?

For true hot swappability you may want to look into the HA services that Synology offers, but that has been hit or miss with XPe, and I haven't really heard of anyone relying on it. I just use Snapshot Replication to keep my archive server in sync. This wasn't an option with your DS213j (it doesn't support btrfs/snapshots), but if you are running two XPe systems, the choice becomes available.
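(To make the "data folder mounted into the image" point concrete, here is a hypothetical sketch: the container name, image, and host paths are illustrative, not from the thread. The script only builds and prints the command, so nothing is actually started:)

```shell
# Illustrative only: container name, image tag, and host paths are made up.
# The point is the -v bind mounts, which keep the app's config and data
# under /volume1/docker, so backing up that share captures everything.
cmd='docker run -d --name pihole -v /volume1/docker/pihole/etc-pihole:/etc/pihole -v /volume1/docker/pihole/etc-dnsmasq.d:/etc/dnsmasq.d pihole/pihole:latest'
echo "$cmd"
```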
jj-lolo (Author) Posted March 10, 2022 #15

Thanks so much!
jj-lolo (Author) Posted March 12, 2022 #16 (edited)

OK, bad news. I finally got done with the restore, and before going at it hard again, I hooked up my UPS via USB passthrough to shut down DSM. The first time I tested it, it worked fine and shut down DSM, but I had accidentally ticked "shut down UPS" as well, so it shut the UPS down once DSM shut down. I removed that setting and restarted everything.

I then tested the UPS again by unplugging it, and this time DSM shut down (I saw the UPS message, then couldn't access it via the web) without shutting down the UPS. Success, I thought. WRONG. When I plugged the UPS in again, I waited for everything to boot, and when I went into the web interface I had a message saying DSM had been reset and I had to re-install it. So I did, BUT the new RAID5 volume crashed again. 😕

I wasn't accessing the shared drives from anywhere else, but on the Synology I had Plex server running (with no clients connected), as well as open-vm-tools, Pi-hole, Portainer, and CrashPlan under Docker.

Ugh, what am I doing wrong?

Edited March 12, 2022 by jj-lolo