Volume crashed after power outage



Running DSM 6.2.2-24922 Update 3 under ESXi 7 on an HP MicroServer Gen8. Lost power today and got a "volume crashed" error upon reboot.

 

Would appreciate any help in recreating the volume / recovering the data. I do have a partial backup in case I have to go that route.

 

Here are some screenshots/info; not sure what will help.

root@XPE_1:/# ls -l
total 52
lrwxrwxrwx   1 root root     7 Oct 12  2019 bin -> usr/bin
drwxr-xr-x   7 root root     0 Mar  1 10:24 config
drwxr-xr-x  10 root root 18840 Mar  1 12:24 dev
drwxr-xr-x  48 root root  4096 Mar  1 10:24 etc
drwxr-xr-x  43 root root  4096 Oct 12  2019 etc.defaults
drwxr-xr-x   2 root root  4096 May  9  2019 initrd
lrwxrwxrwx   1 root root     7 Oct 12  2019 lib -> usr/lib
lrwxrwxrwx   1 root root     9 Oct 12  2019 lib32 -> usr/lib32
lrwxrwxrwx   1 root root     7 Oct 12  2019 lib64 -> usr/lib
drwx------   2 root root  4096 May  9  2019 lost+found
drwxr-xr-x   2 root root  4096 May  9  2019 mnt
drwx--x--x   3 root root  4096 Oct 17  2019 opt
dr-xr-xr-x 376 root root     0 Mar  1 10:24 proc
drwx------   3 root root  4096 Feb 27  2021 root
drwxr-xr-x  25 root root  1280 Mar  1 16:00 run
lrwxrwxrwx   1 root root     8 Oct 12  2019 sbin -> usr/sbin
dr-xr-xr-x  12 root root     0 Mar  1 10:24 sys
drwxrwxrwt  12 root root  1280 Mar  1 16:07 tmp
drwxr-xr-x   2 root root  4096 Oct 12  2019 tmpRoot
drwxr-xr-x  11 root root  4096 May  9  2019 usr
drwxr-xr-x  17 root root  4096 Mar  1 10:24 var
drwxr-xr-x  14 root root  4096 Oct 12  2019 var.defaults
drwxr-xr-x   3 root root  4096 Mar  1 09:58 volume1
drwxr-xr-x   5 root root  4096 Mar  1 10:24 volumeSATA1
root@XPE_1:/# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [raidF1]
md2 : active raid6 sdb3[0] sde3[3] sdd3[2] sdc3[1]
      27335120896 blocks super 1.2 level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
      
md1 : active raid1 sdb2[0] sdc2[1] sdd2[2] sde2[3]
      2097088 blocks [12/4] [UUUU________]
      
md0 : active raid1 sdb1[0] sdc1[1] sdd1[2] sde1[3]
      2490176 blocks [12/4] [UUUU________]
      
unused devices: <none>
root@XPE_1:/# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Sat Oct 12 01:49:00 2019
     Raid Level : raid1
     Array Size : 2490176 (2.37 GiB 2.55 GB)
  Used Dev Size : 2490176 (2.37 GiB 2.55 GB)
   Raid Devices : 12
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Mar  1 16:09:16 2022
          State : clean, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

           UUID : affa3cf2:4bc1be17:3017a5a8:c86610be
         Events : 0.22852208

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1
       -       0        0        4      removed
       -       0        0        5      removed
       -       0        0        6      removed
       -       0        0        7      removed
       -       0        0        8      removed
       -       0        0        9      removed
       -       0        0       10      removed
       -       0        0       11      removed
root@XPE_1:/# mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Sat Oct 12 01:49:03 2019
     Raid Level : raid1
     Array Size : 2097088 (2047.94 MiB 2147.42 MB)
  Used Dev Size : 2097088 (2047.94 MiB 2147.42 MB)
   Raid Devices : 12
  Total Devices : 4
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Tue Mar  1 10:24:12 2022
          State : clean, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

           UUID : d913496f:3d84522e:3017a5a8:c86610be
         Events : 0.162375

    Number   Major   Minor   RaidDevice State
       0       8       18        0      active sync   /dev/sdb2
       1       8       34        1      active sync   /dev/sdc2
       2       8       50        2      active sync   /dev/sdd2
       3       8       66        3      active sync   /dev/sde2
       -       0        0        4      removed
       -       0        0        5      removed
       -       0        0        6      removed
       -       0        0        7      removed
       -       0        0        8      removed
       -       0        0        9      removed
       -       0        0       10      removed
       -       0        0       11      removed
root@XPE_1:/# mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Fri Oct 11 19:29:34 2019
     Raid Level : raid6
     Array Size : 27335120896 (26068.80 GiB 27991.16 GB)
  Used Dev Size : 13667560448 (13034.40 GiB 13995.58 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Tue Mar  1 10:24:21 2022
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : XPE1:2
           UUID : 683881bb:0c6fecad:dc5ac778:4d0d9d2c
         Events : 4591

    Number   Major   Minor   RaidDevice State
       0       8       19        0      active sync   /dev/sdb3
       1       8       35        1      active sync   /dev/sdc3
       2       8       51        2      active sync   /dev/sdd3
       3       8       67        3      active sync   /dev/sde3
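For reference, the [12/4] [UUUU________] status on md0/md1 is normal for DSM: the system and swap partitions are mirrored across up to 12 drive slots, so only four of twelve members exist on a four-disk box. The data array md2 shows [4/4] [UUUU], meaning the RAID layer itself survived the outage. A sketch of reading those [n/m] counters (my own helper, not a DSM tool):

```shell
# Illustrative helper: list data arrays whose [n/m] counters show missing
# members. md0/md1 are DSM's system/swap arrays and always reserve 12
# slots, so their [12/4] state is expected and skipped here.
degraded_arrays() {
  awk '
    /^md[0-9]+ :/ { current = $1; next }
    current != "" && current != "md0" && current != "md1" &&
    match($0, /\[[0-9]+\/[0-9]+\]/) {
      split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
      if (n[2] + 0 < n[1] + 0) print current
    }
  '
}
```

Piping the mdstat output above through this prints nothing, i.e. the "crashed" state is coming from the filesystem on top of the array, not from md itself.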

(screenshot: Screen Shot 2022-03-01 at 10.29.38 AM.png)

1 hour ago, blue-label1989 said:

Very strange. In ESXi, did you make RawDataMappings of your disks, or did you just connect the disks?

Raw data mappings.

49 minutes ago, blue-label1989 said:

Maybe you will find something useful on one of these sites:

https://www.vsam.pro/crashed-synology-volume-and-how-to-restore-ds415-play/

Or

BTRFS Restore

Good luck.

 

 

Thanks, I had found the first article, but when I run the LVM vgscan it doesn't find any volume groups (filesystem corruption?), so I think that author's issue was more of a disk issue.

 

I'll try to follow the second article and see what's useful.


When I follow the steps in the recovery guide, commands like vgdisplay return no info, and

 

# lvdisplay -v returns 

    Using logical volume(s) on command line.

    No volume groups found.

 

Help!

6 hours ago, flyride said:

    Doesn't look like you're using an LV, just plain RAID 6.

     

    You need to know if you are using btrfs or ext4.

     

    Post the output of cat /etc/fstab

    none /proc proc defaults 0 0

    /dev/root / ext4 defaults 1 1

    /dev/md2 /volume1 btrfs auto_reclaim_space,synoacl,relatime 0 0


OK, this confirms you have a simple RAID 6 and btrfs.

 

The link cited https://xpenology.com/forum/topic/14337-volume-crash-after-4-months-of-stability/?do=findComment&comment=107979

 

is a reasonable one to follow.  You can ignore any instructions involving lvm (lvdisplay/pvdisplay/vgchange) and just focus on the filesystem mounting and/or repair commands.

 

The link above is to the post that you should start with.
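The non-destructive end of that sequence looks roughly like the sketch below, adapted to this system's names (/dev/md2, btrfs, /volume1). This is not the guide's exact command list, and the RUN guard is my own addition so the script only prints the commands by default (set RUN= empty to execute them for real):

```shell
# Sketch only: prints the recovery commands instead of running them.
# Set RUN= (empty) to execute. DEV/MNT match this thread's system.
DEV=/dev/md2
MNT=/volume1
RUN=${RUN:-echo}

# 1. Read-only btrfs check first; do NOT reach for --repair until
#    everything recoverable has been copied off.
$RUN btrfs check --readonly "$DEV"

# 2. Try a plain read-only mount.
$RUN mount -o ro "$DEV" "$MNT"

# 3. If that fails, the btrfs 'recovery' mount option (renamed
#    'usebackuproot' on kernels >= 4.6) falls back to an older tree root.
$RUN mount -o recovery "$DEV" "$MNT"
```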


Thanks so much for the tip to ignore the LVM stuff, as that was confusing me!

 

I was able to do a recovery mount (sudo mount -o recovery /dev/md2 /volume1).

 

A few questions if you don't mind:

1. What do you think happened, and is there anything I can do to avoid it in the future? (I have a UPS but it failed, so I need to order a new battery and figure out how to get ESXi to shut down XPEnology automatically.)

2. I am assuming the volume can't be fixed and I need to back up/recreate the volume. Is this correct?

3. I have another (real, but old) Synology NAS where most of the data (I don't have enough HDD space for all of it) is backed up, and I want to make sure everything is backed up. If I back up /volume1/@appstore, will all the data for Plex, Docker, and Hyper Backup be backed up, or do they store info elsewhere?

4. Since it seems I need to do a restore, this may be a good time to upgrade my DSM to 7.0.1. Would you recommend this, and is there a better guide than this one? https://www.tsunati.com/blog/xpenology-7-0-1-on-esxi-7-x  I use RawDataMappings, so I'll need to remember how I set that up a while back.

5. Any recommendations on how to set up the new volume (e.g. SHR-2, write cache off)?

 

Thanks so much for your time and knowledge!

11 hours ago, jj-lolo said:

1. What do you think happened, and is there anything I can do to avoid it in the future? (I have a UPS but it failed, so I need to order a new battery and figure out how to get ESXi to shut down XPEnology automatically.)

Couldn't say. Corruptions happen, and a power outage seems as good a reason as any. It is quite possible to plug the UPS's USB cable into the ESXi host and pass that USB device through in your XPe VM profile, so that DSM can see the power status.

 

11 hours ago, jj-lolo said:

2. I am assuming the volume can't be fixed and I need to back up/recreate the volume. Is this correct?

btrfs is not really designed for offline repair. If it can, it repairs itself in real time. Everything I have ever seen posted from Synology says to offload, re-create, and repopulate if there is sustained corruption on a btrfs volume. This is the opposite of the advice given for ext4 (take it offline, run repair utilities, restart).

 

11 hours ago, jj-lolo said:

3. I have another (real, but old) Synology NAS where most of the data (I don't have enough HDD space for all of it) is backed up, and I want to make sure everything is backed up. If I back up /volume1/@appstore, will all the data for Plex, Docker, and Hyper Backup be backed up, or do they store info elsewhere?

The typical method of backing up @appstore is Hyper Backup, as it is not accessible from the filesystem sharing toolsets. Syno apps typically store configuration data in @appstore, and you control where they store their data elsewhere, but it's not a hard-and-fast rule. It is one of the reasons I converted from Syno apps exclusively to Docker apps: to improve portability (and, in most cases, get better access to new releases).

 

11 hours ago, jj-lolo said:

4. Since it seems I need to do a restore, this may be a good time to upgrade my DSM to 7.0.1. Would you recommend this, and is there a better guide than this one? https://www.tsunati.com/blog/xpenology-7-0-1-on-esxi-7-x  I use RawDataMappings, so I'll need to remember how I set that up a while back.

If you are asking me personally: loader development and ongoing upgrade procedures are still in early beta status, and I haven't moved any production data to 7.0.1 yet. I also don't pay much attention to documentation authored or hosted elsewhere, so I can't comment on that guide. Your risk tolerance may be higher, and many are using 7.x.

 

11 hours ago, jj-lolo said:

5. Any recommendations on how to set up the new volume (e.g. SHR-2, write cache off)?

No idea what your disks and requirements are. I keep all my datasets replicated across two XPe systems, so I use RAID 5 only (actually RAIDF1), as two-disk redundancy seems like overkill. I also prefer plain RAID over SHR (what you have now) because it makes recovery simpler, which you just experienced. But if you need the flexibility of SHR, there really isn't a substitute except making multiple Storage Pools.

Posted (edited)

Thanks for all the help.

 

What throughput should I be getting when backing up data from a recovery mount to another Synology box over a wired network?

 

Basically, using rsync, I've only copied 130GB in 5 hours. Here is some detail:

 

I just installed a brand-new 14TB Seagate Exos in my old Synology 213j. Using rsync -avux (I also tried -z) over a wired 1Gb network from the XPe box I'm trying to back up, I seem to be getting only ~10MB/s. (I was getting at least 60MB/s+ over the network from the XPe before the volume crash.)
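Two things often help when an old ARM NAS is the bottleneck: skip rsync's delta algorithm, which is CPU-bound on the 213j, and pick a lighter ssh cipher (-z usually makes things worse, since compression also burns CPU). A hedged sketch; the paths and hostname are placeholders, not this thread's actual shares:

```shell
# Placeholders: adjust SRC/DST for your own shares. The leading 'echo'
# makes this a dry sketch; remove it to run the copy for real.
SRC=/volume1/share/
DST=admin@ds213j:/volume1/backup/

# -a archive mode; -W copies whole files (skips the CPU-heavy delta
# algorithm); --inplace avoids a temp-file copy on the target;
# aes128-ctr is usually lighter than OpenSSH's default cipher.
echo rsync -aW --inplace --progress -e "ssh -c aes128-ctr" "$SRC" "$DST"
```

If the rsync service is enabled on the 213j, an rsync:// daemon target skips ssh encryption entirely, which is often the biggest win on that class of CPU.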

 

Is it the 213j, or the NAS that's in recovery mode, that's causing this? I hadn't used the 213j in three years; I updated it before trying the restore, but I don't remember ever testing its throughput, as I only used it for a regular backup.

 

UPDATE: I did a test from a wireless laptop to the 213j and got a throughput peak of over 30MB/s, so I'm guessing it's the way things are mounted? Is there anything I can try to speed up the backup? At 10MB/s it will take weeks to back up 14TB.
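For scale, the "weeks" figure is easy to sanity-check (decimal units and a sustained rate assumed; the helper is just illustrative):

```shell
# Days needed to copy a given number of TB at a given MB/s
# (decimal units, sustained throughput assumed).
transfer_days() {
  awk -v tb="$1" -v mbs="$2" 'BEGIN { printf "%.1f\n", tb * 1000000 / mbs / 86400 }'
}

# 14 TB at the observed ~10 MB/s vs. the pre-crash ~60 MB/s
transfer_days 14 10   # about 16 days
transfer_days 14 60   # under 3 days
```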

 

Unfortunately, I don't have any more bays in my HP MicroServer to insert another drive, so I also tried mounting a USB SSD directly to the ESXi host and passing it through to the XPenology VM, but when I do an fdisk -l it doesn't show up.

 

Another thought if I can't get the speed any higher: I have another server I can install XPenology on, but 14TB HDDs are expensive... I do have a partial backup on another Synology. Could I "break" the RAID 6 by taking two disks out, use those to configure a new machine, restore the partial backup to them, and still be able to access the broken RAID on the original XPenology box (the one in recovery)? Or is that too much risk?

 

Your thoughts on what I should try next would be appreciated!

Edited by jj-lolo

Update: in case this is useful to anyone, I went ahead and broke the RAID by taking a drive out, and it still read (albeit slowly), so I was able to back up what was missing. I then recreated the volumes as RAID 5/btrfs and am now restoring (at around 90MB/s) from my old Synology DS1511, so it's going to take a while!

 

I am also setting up a new XPenology server to act more as a long-term backup.

 

Thanks for your help.

 

Two final questions:

- For your Docker setup, do you use the Synology Docker package or something else (so I can make sure I back up everything)?

- I'd like my new backup Synology to be "hot swappable" with my main one; what's the best way to back up to it so that it's a mirror image of my primary NAS and can be used as my main NAS if the primary fails again?

 

Thanks!

 

 

30 minutes ago, jj-lolo said:

- For your Docker setup, do you use the Synology Docker package or something else (so I can make sure I back up everything)?

 

Docker apps running on DSM use the Synology Docker package. You specify a folder for Docker, and that folder can be backed up or replicated in its entirety (this doesn't absolve you from properly configuring your Docker apps with a data folder mounted into the container).
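A hypothetical compose fragment (the image and paths are placeholders, not this thread's actual setup) showing what "a data folder mounted into the container" means in practice; if everything the container must keep lives under /volume1/docker, backing up that share captures the app's state:

```yaml
# Hypothetical example: persistent state is bind-mounted under
# /volume1/docker, so a backup of that folder captures the container's
# configuration and data.
version: "3"
services:
  pihole:
    image: pihole/pihole:latest
    restart: unless-stopped
    volumes:
      - /volume1/docker/pihole/etc-pihole:/etc/pihole
      - /volume1/docker/pihole/etc-dnsmasq.d:/etc/dnsmasq.d
```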

 

32 minutes ago, jj-lolo said:

- I'd like my new backup Synology to be "hot swappable" with my main one; what's the best way to back up to it so that it's a mirror image of my primary NAS and can be used as my main NAS if the primary fails again?

For true hot-swappability you may want to look into the HA services that Synology offers, but that has been hit-or-miss with XPe and I haven't really heard of anyone relying on it. I just use Snapshot Replication to keep my archive server in sync. This wasn't an option with your 213j (it doesn't support btrfs/snapshots), but if you are running two XPe systems, the choice becomes available.

Posted (edited)

OK, bad news. I finally got done with the restore, and before going at it hard again, I hooked up my UPS via USB passthrough to shut down DSM. The first time I tested it, it worked fine and shut down DSM, but I had accidentally ticked "shut down UPS" as well, so it shut the UPS down once DSM shut down. I removed that setting and restarted everything.

 

I then tested the UPS again by unplugging it, and this time DSM shut down (I saw the UPS message, then couldn't access it via the web) without shutting down the UPS. Success, I thought. WRONG. When I plugged the UPS in again, I waited for everything to boot, but when I went into the web interface I had a message saying DSM had been reset and I had to re-install it. I did, BUT the new RAID 5 volume crashed again 😕

 

I wasn't accessing the shared drives from anywhere else, but on the Synology I had Plex Media Server running (with no clients connected), as well as open-vm-tools, Pi-hole, Portainer, and CrashPlan under Docker.

 

 

Ugh, what am I doing wrong?

Edited by jj-lolo
