Volume crashed after power outage



Running DSM 6.2.2-24922 Update 3 under ESXi 7 on an HP MicroServer Gen8. Lost power today and got a "volume crashed" error upon reboot.

 

Would appreciate any help in recreating the volume / recovering the data. I do have a partial backup in case I have to go that route.

 

Here are some screenshots/info; not sure what will help.

root@XPE_1:/# ls -l
total 52
lrwxrwxrwx   1 root root     7 Oct 12  2019 bin -> usr/bin
drwxr-xr-x   7 root root     0 Mar  1 10:24 config
drwxr-xr-x  10 root root 18840 Mar  1 12:24 dev
drwxr-xr-x  48 root root  4096 Mar  1 10:24 etc
drwxr-xr-x  43 root root  4096 Oct 12  2019 etc.defaults
drwxr-xr-x   2 root root  4096 May  9  2019 initrd
lrwxrwxrwx   1 root root     7 Oct 12  2019 lib -> usr/lib
lrwxrwxrwx   1 root root     9 Oct 12  2019 lib32 -> usr/lib32
lrwxrwxrwx   1 root root     7 Oct 12  2019 lib64 -> usr/lib
drwx------   2 root root  4096 May  9  2019 lost+found
drwxr-xr-x   2 root root  4096 May  9  2019 mnt
drwx--x--x   3 root root  4096 Oct 17  2019 opt
dr-xr-xr-x 376 root root     0 Mar  1 10:24 proc
drwx------   3 root root  4096 Feb 27  2021 root
drwxr-xr-x  25 root root  1280 Mar  1 16:00 run
lrwxrwxrwx   1 root root     8 Oct 12  2019 sbin -> usr/sbin
dr-xr-xr-x  12 root root     0 Mar  1 10:24 sys
drwxrwxrwt  12 root root  1280 Mar  1 16:07 tmp
drwxr-xr-x   2 root root  4096 Oct 12  2019 tmpRoot
drwxr-xr-x  11 root root  4096 May  9  2019 usr
drwxr-xr-x  17 root root  4096 Mar  1 10:24 var
drwxr-xr-x  14 root root  4096 Oct 12  2019 var.defaults
drwxr-xr-x   3 root root  4096 Mar  1 09:58 volume1
drwxr-xr-x   5 root root  4096 Mar  1 10:24 volumeSATA1
root@XPE_1:/# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [raidF1]
md2 : active raid6 sdb3[0] sde3[3] sdd3[2] sdc3[1]
      27335120896 blocks super 1.2 level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
      
md1 : active raid1 sdb2[0] sdc2[1] sdd2[2] sde2[3]
      2097088 blocks [12/4] [UUUU________]
      
md0 : active raid1 sdb1[0] sdc1[1] sdd1[2] sde1[3]
      2490176 blocks [12/4] [UUUU________]
      
unused devices: <none>
root@XPE_1:/# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Sat Oct 12 01:49:00 2019
     Raid Level : raid1
     Array Size : 2490176 (2.37 GiB 2.55 GB)
  Used Dev Size : 2490176 (2.37 GiB 2.55 GB)
   Raid Devices : 12
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Mar  1 16:09:16 2022
          State : clean, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

           UUID : affa3cf2:4bc1be17:3017a5a8:c86610be
         Events : 0.22852208

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1
       -       0        0        4      removed
       -       0        0        5      removed
       -       0        0        6      removed
       -       0        0        7      removed
       -       0        0        8      removed
       -       0        0        9      removed
       -       0        0       10      removed
       -       0        0       11      removed
root@XPE_1:/# mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Sat Oct 12 01:49:03 2019
     Raid Level : raid1
     Array Size : 2097088 (2047.94 MiB 2147.42 MB)
  Used Dev Size : 2097088 (2047.94 MiB 2147.42 MB)
   Raid Devices : 12
  Total Devices : 4
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Tue Mar  1 10:24:12 2022
          State : clean, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

           UUID : d913496f:3d84522e:3017a5a8:c86610be
         Events : 0.162375

    Number   Major   Minor   RaidDevice State
       0       8       18        0      active sync   /dev/sdb2
       1       8       34        1      active sync   /dev/sdc2
       2       8       50        2      active sync   /dev/sdd2
       3       8       66        3      active sync   /dev/sde2
       -       0        0        4      removed
       -       0        0        5      removed
       -       0        0        6      removed
       -       0        0        7      removed
       -       0        0        8      removed
       -       0        0        9      removed
       -       0        0       10      removed
       -       0        0       11      removed
root@XPE_1:/# mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Fri Oct 11 19:29:34 2019
     Raid Level : raid6
     Array Size : 27335120896 (26068.80 GiB 27991.16 GB)
  Used Dev Size : 13667560448 (13034.40 GiB 13995.58 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Tue Mar  1 10:24:21 2022
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : XPE1:2
           UUID : 683881bb:0c6fecad:dc5ac778:4d0d9d2c
         Events : 4591

    Number   Major   Minor   RaidDevice State
       0       8       19        0      active sync   /dev/sdb3
       1       8       35        1      active sync   /dev/sdc3
       2       8       51        2      active sync   /dev/sdd3
       3       8       67        3      active sync   /dev/sde3
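For reference, the [12/4] [UUUU________] status on md0/md1 is normal for DSM: the system and swap partitions are mirrored across up to 12 drive slots, so only four of twelve members exist on a four-disk box. The data array md2 shows [4/4] [UUUU], meaning the RAID layer itself survived the outage. A sketch of reading those [n/m] counters (my own helper, not a DSM tool):

```shell
# Illustrative helper: list data arrays whose [n/m] counters show missing
# members. md0/md1 are DSM's system/swap arrays and always reserve 12
# slots, so their [12/4] state is expected and skipped here.
degraded_arrays() {
  awk '
    /^md[0-9]+ :/ { current = $1; next }
    current != "" && current != "md0" && current != "md1" &&
    match($0, /\[[0-9]+\/[0-9]+\]/) {
      split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
      if (n[2] + 0 < n[1] + 0) print current
    }
  '
}
```

Piping the mdstat output above through this prints nothing, i.e. the "crashed" state is coming from the filesystem on top of the array, not from md itself.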

(screenshot: Screen Shot 2022-03-01 at 10.29.38 AM.png)

1 hour ago, blue-label1989 said:

Very strange. In ESXi, did you make RawDataMappings of your disks, or did you just connect the disks?

Raw data mappings.

49 minutes ago, blue-label1989 said:

Maybe you will find something useful on one of these sites:

https://www.vsam.pro/crashed-synology-volume-and-how-to-restore-ds415-play/

Or

BTRFS Restore

Good luck.

 

 

Thanks, I had found the first article, but when I run the LVM vgscan it doesn't find any volume groups (filesystem corruption?), so I think that author's issue was more of a disk issue.

 

I'll try to follow the second article and see what's useful.


When I follow the steps in the recovery guide, commands like vgdisplay return no info, and

 

# lvdisplay -v returns 

    Using logical volume(s) on command line.

    No volume groups found.

 

Help!

6 hours ago, flyride said:

    Doesn't look like you're using an LV, just plain RAID 6.

     

    You need to know if you are using btrfs or ext4.

     

    Post the output of cat /etc/fstab

    none /proc proc defaults 0 0

    /dev/root / ext4 defaults 1 1

    /dev/md2 /volume1 btrfs auto_reclaim_space,synoacl,relatime 0 0


OK, this confirms you have a simple RAID 6 and btrfs.

 

The link cited https://xpenology.com/forum/topic/14337-volume-crash-after-4-months-of-stability/?do=findComment&comment=107979

 

is a reasonable one to follow.  You can ignore any instructions involving lvm (lvdisplay/pvdisplay/vgchange) and just focus on the filesystem mounting and/or repair commands.

 

The link above is to the post that you should start with.
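The non-destructive end of that sequence looks roughly like the sketch below, adapted to this system's names (/dev/md2, btrfs, /volume1). This is not the guide's exact command list, and the RUN guard is my own addition so the script only prints the commands by default (set RUN= empty to execute them for real):

```shell
# Sketch only: prints the recovery commands instead of running them.
# Set RUN= (empty) to execute. DEV/MNT match this thread's system.
DEV=/dev/md2
MNT=/volume1
RUN=${RUN:-echo}

# 1. Read-only btrfs check first; do NOT reach for --repair until
#    everything recoverable has been copied off.
$RUN btrfs check --readonly "$DEV"

# 2. Try a plain read-only mount.
$RUN mount -o ro "$DEV" "$MNT"

# 3. If that fails, the btrfs 'recovery' mount option (renamed
#    'usebackuproot' on kernels >= 4.6) falls back to an older tree root.
$RUN mount -o recovery "$DEV" "$MNT"
```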


Thanks so much for the tip to ignore the LVM stuff, as that was confusing me!

 

I was able to do a recovery mount (sudo mount -o recovery /dev/md2 /volume1).

 

A few questions if you don't mind:

1. What do you think happened, and is there anything I can do to avoid it in the future? (I have a UPS but it failed, so I need to order a new battery and figure out how to get ESXi to shut down XPEnology automatically.)

2. I am assuming the volume can't be fixed and I need to back up/recreate the volume. Is this correct?

3. I have another (real, but old) Synology NAS where most of the data (I don't have enough HDD space for all of it) is backed up, and I want to make sure everything is backed up. If I back up /volume1/@appstore, will all the data for Plex, Docker, and Hyper Backup be backed up, or do they store info elsewhere?

4. Since it seems I need to do a restore, this may be a good time to upgrade my DSM to 7.0.1. Would you recommend this, and is there a better guide than this one? https://www.tsunati.com/blog/xpenology-7-0-1-on-esxi-7-x  I use RawDataMappings, so I'll need to remember how I set that up a while back.

5. Any recommendations on how to set up the new volume (e.g. SHR-2, write cache off)?

 

Thanks so much for your time and knowledge!

11 hours ago, jj-lolo said:

1. What do you think happened, and is there anything I can do to avoid it in the future? (I have a UPS but it failed, so I need to order a new battery and figure out how to get ESXi to shut down XPEnology automatically.)

Couldn't say. Corruptions happen, and a power outage seems as good a reason as any. It is quite possible to plug the UPS's USB cable into the ESXi host and pass that USB device through in your XPe VM profile, so that DSM can see the power status.

 

11 hours ago, jj-lolo said:

2. I am assuming the volume can't be fixed and I need to back up/recreate the volume. Is this correct?

btrfs is not really designed for offline repair. If it can, it repairs itself in real time. Everything I have ever seen posted from Synology says to offload, re-create, and repopulate if there is sustained corruption on a btrfs volume. This is the opposite of the advice given for ext4 (take it offline, run repair utilities, restart).

 

11 hours ago, jj-lolo said:

3. I have another (real, but old) Synology NAS where most of the data (I don't have enough HDD space for all of it) is backed up, and I want to make sure everything is backed up. If I back up /volume1/@appstore, will all the data for Plex, Docker, and Hyper Backup be backed up, or do they store info elsewhere?

The typical method of backing up @appstore is Hyper Backup, as it is not accessible from the filesystem sharing toolsets. Syno apps typically store configuration data in @appstore, and you control where they store their data elsewhere, but it's not a hard-and-fast rule. It is one of the reasons I converted from Syno apps exclusively to Docker apps: to improve portability (and, in most cases, get better access to new releases).

 

11 hours ago, jj-lolo said:

4. Since it seems I need to do a restore, this may be a good time to upgrade my DSM to 7.0.1. Would you recommend this, and is there a better guide than this one? https://www.tsunati.com/blog/xpenology-7-0-1-on-esxi-7-x  I use RawDataMappings, so I'll need to remember how I set that up a while back.

If you are asking me personally: loader development and ongoing upgrade procedures are still in early beta status, and I haven't moved any production data to 7.0.1 yet. I also don't pay much attention to documentation authored or hosted elsewhere, so I can't comment on that guide. Your risk tolerance may be higher, and many are using 7.x.

 

11 hours ago, jj-lolo said:

5. Any recommendations on how to set up the new volume (e.g. SHR-2, write cache off)?

No idea what your disks and requirements are. I keep all my datasets replicated across two XPe systems, so I use RAID 5 only (actually RAIDF1), as two-disk redundancy seems like overkill. I also prefer plain RAID over SHR (what you have now) because it makes recovery simpler, which you just experienced. But if you need the flexibility of SHR, there really isn't a substitute except making multiple Storage Pools.

Posted (edited)

Thanks for all the help.

 

What throughput should I be getting when backing up data from a recovery mount to another Synology box over a wired network?

 

Basically, using rsync, I've only copied 130GB in 5 hours. Here is some detail:

 

I just installed a brand-new 14TB Seagate Exos in my old Synology 213j. Using rsync -avux (I also tried -z) over a wired 1Gb network from the XPe box I'm trying to back up, I seem to be getting only ~10MB/s. (I was getting at least 60MB/s+ over the network from the XPe before the volume crash.)
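Two things often help when an old ARM NAS is the bottleneck: skip rsync's delta algorithm, which is CPU-bound on the 213j, and pick a lighter ssh cipher (-z usually makes things worse, since compression also burns CPU). A hedged sketch; the paths and hostname are placeholders, not this thread's actual shares:

```shell
# Placeholders: adjust SRC/DST for your own shares. The leading 'echo'
# makes this a dry sketch; remove it to run the copy for real.
SRC=/volume1/share/
DST=admin@ds213j:/volume1/backup/

# -a archive mode; -W copies whole files (skips the CPU-heavy delta
# algorithm); --inplace avoids a temp-file copy on the target;
# aes128-ctr is usually lighter than OpenSSH's default cipher.
echo rsync -aW --inplace --progress -e "ssh -c aes128-ctr" "$SRC" "$DST"
```

If the rsync service is enabled on the 213j, an rsync:// daemon target skips ssh encryption entirely, which is often the biggest win on that class of CPU.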

 

Is it the 213j, or the NAS that's in recovery mode, that's causing this? I hadn't used the 213j in three years; I updated it before trying the restore, but I don't remember ever testing its throughput, as I only used it for a regular backup.

 

UPDATE: I did a test from a wireless laptop to the 213j and got a throughput peak of over 30MB/s, so I'm guessing it's the way things are mounted? Is there anything I can try to speed up the backup? At 10MB/s it will take weeks to back up 14TB.
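For scale, the "weeks" figure is easy to sanity-check (decimal units and a sustained rate assumed; the helper is just illustrative):

```shell
# Days needed to copy a given number of TB at a given MB/s
# (decimal units, sustained throughput assumed).
transfer_days() {
  awk -v tb="$1" -v mbs="$2" 'BEGIN { printf "%.1f\n", tb * 1000000 / mbs / 86400 }'
}

# 14 TB at the observed ~10 MB/s vs. the pre-crash ~60 MB/s
transfer_days 14 10   # about 16 days
transfer_days 14 60   # under 3 days
```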

 

Unfortunately, I don't have any more bays in my HP MicroServer to insert another drive, so I also tried mounting a USB SSD directly to the ESXi host and passing it through to the XPenology VM, but when I do an fdisk -l it doesn't show up.

 

Another thought if I can't get the speed any higher: I have another server I can install XPenology on, but 14TB HDDs are expensive... I do have a partial backup on another Synology. Could I "break" the RAID 6 by taking two disks out, use those to configure a new machine, restore the partial backup to them, and still be able to access the broken RAID on the original XPenology box (the one in recovery)? Or is that too much risk?

 

Your thoughts on what I should try next would be appreciated!

Edited by jj-lolo

Update: in case this is useful to anyone, I went ahead and broke the RAID by taking a drive out, and it still read (albeit slowly), so I was able to back up what was missing. I then recreated the volumes as RAID 5/btrfs and am now restoring (at around 90MB/s) from my old Synology DS1511, so it's going to take a while!

 

I am also setting up a new XPenology server to act more as a long-term backup.

 

Thanks for your help.

 

Two final questions:

- For your Docker setup, do you use the Synology Docker package or something else (so I can make sure I back up everything)?

- I'd like my new backup Synology to be "hot swappable" with my main one; what's the best way to back up to it so that it's a mirror image of my primary NAS and can be used as my main NAS if the primary fails again?

 

Thanks!

 

 

30 minutes ago, jj-lolo said:

- For your Docker setup, do you use the Synology Docker package or something else (so I can make sure I back up everything)?

 

Docker apps running on DSM use the Synology Docker package. You specify a folder for Docker, and that folder can be backed up or replicated in its entirety (this doesn't absolve you from properly configuring your Docker apps with a data folder mounted into the container).
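A hypothetical compose fragment (the image and paths are placeholders, not this thread's actual setup) showing what "a data folder mounted into the container" means in practice; if everything the container must keep lives under /volume1/docker, backing up that share captures the app's state:

```yaml
# Hypothetical example: persistent state is bind-mounted under
# /volume1/docker, so a backup of that folder captures the container's
# configuration and data.
version: "3"
services:
  pihole:
    image: pihole/pihole:latest
    restart: unless-stopped
    volumes:
      - /volume1/docker/pihole/etc-pihole:/etc/pihole
      - /volume1/docker/pihole/etc-dnsmasq.d:/etc/dnsmasq.d
```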

 

32 minutes ago, jj-lolo said:

- I'd like my new backup Synology to be "hot swappable" with my main one; what's the best way to back up to it so that it's a mirror image of my primary NAS and can be used as my main NAS if the primary fails again?

For true hot-swappability you may want to look into the HA services that Synology offers, but that has been hit-or-miss with XPe and I haven't really heard of anyone relying on it. I just use Snapshot Replication to keep my archive server in sync. This wasn't an option with your 213j (it doesn't support btrfs/snapshots), but if you are running two XPe systems, the choice becomes available.

Posted (edited)

OK, bad news. I finally got done with the restore, and before going at it hard again, I hooked up my UPS via USB passthrough to shut down DSM. The first time I tested it, it worked fine and shut down DSM, but I had accidentally ticked "shut down UPS" as well, so it shut the UPS down once DSM shut down. I removed that setting and restarted everything.

 

I then tested the UPS again by unplugging it, and this time DSM shut down (I saw the UPS message, then couldn't access it via the web) without shutting down the UPS. Success, I thought. WRONG. When I plugged the UPS in again, I waited for everything to boot, but when I went into the web interface I had a message saying DSM had been reset and I had to re-install it. I did, BUT the new RAID 5 volume crashed again 😕

 

I wasn't accessing the shared drives from anywhere else, but on the Synology I had Plex Media Server running (with no clients connected), as well as open-vm-tools, Pi-hole, Portainer, and CrashPlan under Docker.

 

 

Ugh, what am I doing wrong?

Edited by jj-lolo
