XPEnology Community
  • 0

Volume Crashed Looking to Recover Data


drcrow_

Question

I have been trying to troubleshoot my volume crash myself, but I am at my wits' end. I am hoping someone can shed some light on what my issue is and how to fix it.

 

A couple of weeks ago I started to receive email alerts stating, “Checksum mismatch on NAS. Please check Log Center for more details.” I hopped on my NAS WebUI and did not really see much in the logs. After checking that my systems were still functioning properly and I could access my files, I figured something was wrong but that it was not a major issue... how wrong I was.

That brings us to today, when I noticed my NAS was in read-only mode, which I thought was really odd. I tried logging into the WebUI, but after I entered my username and password, I did not get the NAS's dashboard.

I figured I would reboot the NAS, thinking it would fix the issue. I have had problems with the WebUI being buggy in the past, and a reboot always seemed to take care of it.

But after the reboot I received the dreaded email, “Volume 1 (SHR, btrfs) on NAS has crashed”. I am unable to access the WebUI. Luckily, I have SSH enabled, so I logged on to the server, and that's where we are now.

 

Some info about my system:

12 x 10TB Drives
DSM 6.1.x as a DS3617xs

1 SSD Cache

24 GB of RAM

1 x Xeon CPU

 

Here is the output of some of the commands I have already tried (I had to edit some of the outputs due to spam detection):

 

Looks like the RAID comes up as md2. It seems to have all 12 drives active, but I am not 100% sure.

Quote

ash-4.3# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [raidF1]
md2 : active raid6 sdb5[0] sdm5[11] sdl5[10] sdk5[9] sdj5[8] sdi5[7] sdh5[6] sdg5[5] sdf5[4] sde5[3] sdd5[2] sdc5[1]
      97615989760 blocks super 1.2 level 6, 64k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]

md3 : active raid0 sda1[0]
      250049664 blocks super 1.2 64k chunks [1/1]

md1 : active raid1 sdb2[0] sdc2[1] sdd2[2] sde2[3] sdf2[4] sdg2[5] sdh2[6] sdi2[7] sdj2[8] sdk2[9] sdl2[10] sdm2[11]
      2097088 blocks [13/12] [UUUUUUUUUUUU_]

md0 : active raid1 sdb1[0] sdc1[2] sdd1[11] sde1[10] sdf1[8] sdg1[7] sdh1[6] sdi1[5] sdj1[4] sdk1[9] sdl1[3] sdm1[1]
      2490176 blocks [12/12] [UUUUUUUUUUUU]

unused devices: <none>

 

I received an error when running this command: GPT PMBR size mismatch (102399 != 60062499) will be corrected by w(rite). I think this might have something to do with the checksum errors I was getting before.

Quote

ash-4.3# fdisk -l
Disk /dev/sda: 238.5 GiB, 256060514304 bytes, 500118192 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0xa9b8704b

Device     Boot Start       End   Sectors   Size Id Type
/dev/sda1        2048 500103449 500101402 238.5G fd Linux raid autodetect


GPT PMBR size mismatch (102399 != 60062499) will be corrected by w(rite).

 

 

Quote

ash-4.3# vgdisplay
  --- Volume group ---
  VG Name               vg1000
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               90.91 TiB
  PE Size               4.00 MiB
  Total PE              23832028
  Alloc PE / Size       23832028 / 90.91 TiB
  Free  PE / Size       0 / 0
  VG UUID               rc3DXE-ddO3-qaOp-7gLC-6wll-hesC-yC5YFE

 

 

Quote

ash-4.3# lvdisplay -v
    Using logical volume(s) on command line.
  --- Logical volume ---
  LV Path                /dev/vg1000/lv
  LV Name                lv
  VG Name                vg1000
  LV UUID                NUab2g-gp1H-bmCu-Vie0-1qmK-ougT-uNop9i
  LV Write Access        read/write
  LV Creation host, time ,
  LV Status              available
  # open                 1
  LV Size                90.91 TiB
  Current LE             23832028
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  currently set to     2560
  Block device           253:0

 

When I try to interact with the LV, it says it couldn't open the file system.

 

Quote

ash-4.3# btrfs check  /dev/vg1000/lv
Couldn't open file system  

 

I tried to unmount the LV and/or remount it, but it gives me errors saying it's not mounted, already mounted, or busy.

 

Quote

ash-4.3# umount /dev/vg1000/lv
umount: /dev/vg1000/lv: not mounted
ash-4.3# mount -o recovery /dev/vg1000/lv /volume1
mount: /dev/vg1000/lv is already mounted or /volume1 busy

 

Can anyone comment on whether it is possible to recover the data? Am I going in the right direction?

 

Any help would be greatly appreciated!


11 answers to this question

Recommended Posts

  • 0

You might want to review this thread: https://xpenology.com/forum/topic/14337-volume-crash-after-4-months-of-stability

 

And in particular, recovering files per post #14: https://xpenology.com/forum/topic/14337-volume-crash-after-4-months-of-stability/?do=findComment&comment=108021

 

Mostly btrfs tends to self-heal, but there are not a lot of easy options on Synology to fix a btrfs volume once it has corrupted.  At least, none that are documented and functional.


  • 0

Thanks for responding @flyride. I was actually using that post to help troubleshoot my issue, but I ran into problems with your recovering-files comment. When I try to run:

 

Quote

sudo btrfs check --init-extent-tree /dev/vg1000/lv

sudo btrfs check --init-csum-tree /dev/vg1000/lv

sudo btrfs check --repair /dev/vg1000/lv

 

I get the following errors:
 

Quote

ash-4.3# btrfs check --init-extent-tree /dev/vg1000/lv
Couldn't open file system
ash-4.3# btrfs check --init-csum-tree /dev/vg1000/lv
Creating a new CRC tree
Couldn't open file system
ash-4.3#  btrfs check --repair /dev/vg1000/lv
enabling repair mode
Couldn't open file system

 

I can't seem to interact with the LV. Got any other steps/commands I should try?

I was really hoping you would respond, since you seemed to help out the other guy in the thread you linked. 

 

The checksum error emails, plus the GPT PMBR size mismatch (102399 != 60062499) will be corrected by w(rite) message I get when I do an fdisk -l, make me think the issue stems from there.

 

But I am willing to give anything you suggest a try!


  • 0

Honestly, I don't think any of the repair options are likely to help.  Your LV seems to be ok, but the FS is toast.  The post I linked specifically discussed the recovery option.  The FS does not have to mount in order to use that option to recover your files.

 

Your biggest challenge will be to find enough storage to perform the recovery.  I would probably build up another NAS and NFS mount it to the problem server.


  • 0

I see what you are talking about in the other post. You mean this, right?

 

Quote

Btrfs has a special option to dump the whole filesystem to completely separate location, even if the source cannot be mounted.  So if you have a free SATA port, install an adequately sized drive, create a second storage pool and set up a second volume to use as a recovery target.  Alternatively, you could build it on another NAS and NFS mount it.  Whatever you come up with has to be directly accessible on the problem system.

 

For example's sake, let's say that you have installed and configured /volume2.  This command should extract all the files from your broken btrfs filesystem and drop them on /volume2.  Note that /volume2 can be set up as btrfs or ext4 - the filesystem type does not matter.

sudo btrfs restore /dev/vg1000/lv /volume2

 

1 hour ago, flyride said:

Your biggest challenge will be to find enough storage to perform the recovery.  I would probably build up another NAS and NFS mount it to the problem server.

 

That is my biggest problem. I have another NAS with roughly 20TB of storage and a friend's NAS with 16TB. Is there a way to restore just the data, not the entire array? Meaning, does the target for btrfs restore /dev/vg1000/lv /volume2 need to be as big as the entire volume, ~90TB, or just as big as the data I stored on it, ~35TB?

Additionally, is that link all the info there is on btrfs restore: https://btrfs.wiki.kernel.org/index.php/Restore? I was hoping for some more information.

 

Ideally, I could use part of my 20TB NAS and part of my friend's NAS.

 

BTW, thanks for your help so far. Seems kind of grim. 


  • 0

I might have misspoken when I was looking at the size of my files; I am not sure. But when I do a mdadm -D /dev/md2, I get:

 

Quote

ash-4.3# mdadm -D /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Mon Dec 17 21:03:41 2018
     Raid Level : raid6
     Array Size : 97615989760 (93093.86 GiB 99958.77 GB)
  Used Dev Size : 9761598976 (9309.39 GiB 9995.88 GB)
   Raid Devices : 12
  Total Devices : 12
    Persistence : Superblock is persistent

    Update Time : Wed Jul  3 10:50:25 2019
          State : clean
 Active Devices : 12
Working Devices : 12
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : NAS:2  (local to host NAS)
           UUID : ddd4bb0a:74d2802b:7bc45a50:a8e617eb
         Events : 571

    Number   Major   Minor   RaidDevice State
       0       8       21        0      active sync   /dev/sdb5
       1       8       37        1      active sync   /dev/sdc5
       2       8       53        2      active sync   /dev/sdd5
       3       8       69        3      active sync   /dev/sde5
       4       8       85        4      active sync   /dev/sdf5
       5       8      101        5      active sync   /dev/sdg5
       6       8      117        6      active sync   /dev/sdh5
       7       8      133        7      active sync   /dev/sdi5
       8       8      149        8      active sync   /dev/sdj5
       9       8      165        9      active sync   /dev/sdk5
      10       8      181       10      active sync   /dev/sdl5
      11       8      197       11      active sync   /dev/sdm5

 

Do you know if Used Dev Size means the amount of space used? That would mean I have only used roughly 9.99TB, which would work in terms of using my NAS with 20TB.

 

I just want to confirm your thoughts before I go ahead and try to run the command.


  • 0

The restore only needs enough room to save the files stored on the volume, not the entire volume size, so potentially good news there.

 

"Used Dev Size" in the mdadm output on the previous post refers to the capacity the array uses on each member disk... it has nothing to do with how full the volume is.
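Not from the thread itself, but a quick arithmetic sketch (using the numbers from the mdadm -D /dev/md2 output above) shows how the per-device figure relates to the array size: RAID 6 spends two disks' worth of capacity on parity, so usable size is (12 − 2) × Used Dev Size.

```shell
# RAID6 usable capacity = (members - 2 parity) * per-device used size.
# Figures taken from the mdadm -D /dev/md2 output above (KiB blocks).
used_dev_size=9761598976
raid_devices=12
echo $(( (raid_devices - 2) * used_dev_size ))   # 97615989760, the Array Size
```

So the 9761598976 figure is per-disk capacity, not the amount of data actually stored on the volume.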


  • 0

Maybe, but I don't know how to do that without mounting it.  It will take a long time and it will write regular files to the destination, so you could probably move stuff off well in time to make room if you don't have quite enough space.


  • 0

@flyride Thanks for all the help. I think I was able to get the rough size of the data on my volume, ~21.6TB. You can see it in the output below: bytes_used 21619950809088. Hopefully that is the right amount.

 

Quote

ash-4.3# btrfs inspect-internal dump-super -f /dev/vg1000/lv
superblock: bytenr=65536, device=/dev/vg1000/lv
---------------------------------------------------------
csum                    0xd294ff59 [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    dcb0f8f7-432e-4039-8f60-e43cdb417df1
label                   2018.12.18-02:03:43 v15217
generation              263749
root                    1401334054912
sys_array_size          129
chunk_root_generation   255129
root_level              1
chunk_root              21004288
chunk_root_level        1
log_root                1401336610816
log_root_transid        0
log_root_level          0
total_bytes             99958770368512
bytes_used              21619950809088
sectorsize              4096
nodesize                16384
leafsize                16384
stripesize              4096
root_dir                6
num_devices             1
compat_flags            0x8000000000000000
compat_ro_flags         0x0
incompat_flags          0x16b
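As a sanity check on that figure, the bytes_used value from the dump-super output above converts as follows (a minimal sketch; awk is used only for the floating-point division):

```shell
# bytes_used from the dump-super output above, converted for readability.
bytes_used=21619950809088
awk -v b="$bytes_used" 'BEGIN {
    printf "%.2f TB (decimal)\n", b / 1e12    # 21.62 TB
    printf "%.2f TiB (binary)\n", b / 2^40    # 19.66 TiB
}'
```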

 

I have mounted my other NAS and I am running btrfs restore /dev/vg1000/lv /root/hope/, where /root/hope/ is my remote NFS mount to my other NAS. One question I did have: on the btrfs restore page they mention you can use the following flag:

 

Quote

--path-regex: Regex for files to restore. In order to restore only a single folder somewhere in the btrfs tree, it is unfortunately necessary to construct a slightly nontrivial regex, e.g.: '^/(|home(|/username(|/Desktop(|/.*))))$'

 

Now, I am not a regex wizard, but my file paths on my NAS were something like /Volume1/Media/TV, /Volume1/Media/Movies, and /Volume1/Media/Home Videos. Let's say I just wanted to restore my Movies folder; I think the regex should be ^/(|Media(|/Movies(|/.*)))$. But when I tried to do a dry run with that, btrfs restore -D --path-regex '^/(|Media(|/Movies(|/.*)))$' /dev/vg1000/lv /root/hope/, it did not seem to work.

Do you know if there is something wrong with my syntax? 
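For what it's worth, the pattern can be tested locally before another dry run. The sketch below assumes the paths inside the filesystem really do start at /Media (the /Volume1 prefix is the mount point, not part of the btrfs tree), and uses an equivalent form of the pattern written with ? instead of empty alternations, since some regex engines reject an empty branch:

```shell
# '^/(Media(/Movies(/.*)?)?)?$' is equivalent to the empty-alternation form
# '^/(|Media(|/Movies(|/.*)))$': each trailing ? makes one path level
# optional, so every parent directory of the target also matches.
pattern='^/(Media(/Movies(/.*)?)?)?$'

for p in / /Media /Media/Movies "/Media/Movies/Some Movie.mkv" /Media/TV; do
    if printf '%s\n' "$p" | grep -Eq "$pattern"; then
        echo "match:    $p"
    else
        echo "no match: $p"
    fi
done
```

Run on a regular Linux box, everything except /Media/TV should match. If the dry run still returns nothing even though the pattern tests clean, the top-level directory name inside the volume may differ from Media (matching is case-sensitive), or the btrfs-progs build shipped with DSM 6.1 may be too old to support --path-regex.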


  • 0

So I ran btrfs restore /dev/vg1000/lv /root/hope, but it got about 250GB in and hit this error. I tried googling it and nothing really came up. Do you have any idea what's going on?

Maybe the regex file path would help by skipping this section. It looks like it's just the recycle bin anyway, not something I care about.

Any chance you know how to fix the regex?

Capture.PNG


  • 0

I had a similar error after my last crash. Everything was green except the volume, so I mounted it read-only directly to the volume path and rebooted after that. Maybe it will help you too.

 

mount -t btrfs -o recovery,ro /dev/vg1000/lv /volumeX

(where X is the volume number it should be)

