Volume Crashed but RAID and all drives healthy - help?


Question

Hi,

 

Randomly yesterday I got an email from my Xpenology box saying my volume had crashed! After logging in to DSM I can see that the volume is indeed in a crashed state and appears to have been set to read-only. However, checking my HDD status, all drives are healthy, as is the RAID5 Storage Pool. This is very out of the ordinary, as everything has been running well for years on end now. I did recently attempt to update to TinyCore but couldn't get the drive mapping correct, so I gave up and went back to my Jun loader USB, which booted fine as usual, and everything has been working flawlessly since!

 

I have been checking similar recent posts from search (https://xpenology.com/forum/topic/57921-volume-crashed-after-power-outage/?tab=comments#comment-269420 and https://xpenology.com/forum/topic/14337-volume-crash-after-4-months-of-stability/#comment-107979), but after enabling SSH and running some of the check commands I'm not getting the same results.

I'm currently backing up what data I can to a USB drive just in case, but with 12TB of data this is going to take some time.

 

Some more details on my setup:

Jun's 1.03b DS3615xs loader running DSM 6.2.3 u3

Core i3-4130

8GB RAM

Gigabyte H81M-S2V motherboard

4x 6TB HGST Enterprise HUS726T6TALE6L4 in RAID5

Btrfs Volume

 

If anyone is able to offer some assistance on a possible repair of the volume that would be fantastic. I haven't tried simple things like a reboot yet as I wanted to recover as much data while I can!

Thanks

 

Matt

 

 


4 answers to this question


You gave no details about the btrfs stats, but if the disks and the mdadm RAID are OK, I'd expect btrfs to be the problem, and if that's the case there is often not much that can be done in the way of repairs. Usually I'd suspect RAM or disk/cable problems. Check /var/log/messages, and boot a live Linux and run memtest to check the RAM.
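For anyone landing here from search, the checks mentioned above can be run from an SSH session. A sketch assuming DSM's usual device naming (/dev/md2 for the data array, /volume1 for the btrfs volume); adjust for your own layout:

```shell
# RAID layer: confirm the mdadm array is clean with all members present
cat /proc/mdstat
mdadm --detail /dev/md2        # data array on a typical DSM install

# Filesystem layer: per-device btrfs error counters
# (non-zero read/write/corruption counts point at disks, cables or RAM)
btrfs device stats /volume1

# Kernel and system logs usually record why the volume
# was forced read-only in the first place
dmesg | grep -iE 'btrfs|md2'
grep -i btrfs /var/log/messages | tail -n 50
```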

 

 

 


Ok so I just finished backing up all of the data I could (the Docker and Downloads folders appeared to be inaccessible) - it took over a week via USB onto multiple drives!!


Rebooted the NAS after completing the backups before attempting a fresh install..... and everything now shows healthy!! What the hell 🤯😂 

Is there anything else I should be checking? The only anomaly I know of on my install is that the "Data Scrubbing" task won't run: it gets to around 1.7%, then stops and shows "Never performed" as the status. Nothing I've tried has been able to resolve this, and I even ran the scrub task via SSH. I'm wondering if the lack of data scrubbing could have been the cause of the potentially incorrect "Crashed volume"?
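For what it's worth, a scrub can also be started and monitored directly over SSH. A sketch assuming the volume is mounted at /volume1:

```shell
# Start a scrub, then poll its progress; if it consistently dies at
# the same point, the status output narrows down where it stops
btrfs scrub start /volume1
btrfs scrub status /volume1

# Checksum errors found by the scrub are logged by the kernel
dmesg | grep -i 'btrfs.*csum'
```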

 

Again - any assistance greatly appreciated


I've found the root cause - some files I was downloading at the time of the initial crash appear to be corrupt and seem to be causing the volume to crash. Trying to delete them sends the volume into the crashed state again, which can be resolved by rebooting.

 

I've tried deleting the suspect files via SMB and File Station with the same result. Is there any way I can delete them via SSH or similar to bypass the issues? Or run a file system check somehow? 
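In case it helps others, deleting over SSH and a read-only filesystem check can be sketched like this. The file path here is just a placeholder, and it assumes the data array is /dev/md2 as on a typical DSM install; adjust for your setup:

```shell
# Try removing the suspect files as root, bypassing SMB/File Station
sudo rm -v "/volume1/share/path/to/suspect-file"

# If the delete still crashes the volume, check the filesystem.
# btrfs check must run on an unmounted filesystem, so do this after
# stopping services and unmounting, or from a live Linux USB.
umount /volume1
btrfs check --readonly /dev/md2   # read-only: reports problems, changes nothing
```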

