Recommended Posts

After 4-5 years of happy usage of Debian 9 + Proxmox 5.2 + XPenology DSM 5.2, I upgraded six months ago to DSM 6.2.3, Proxmox 6.3-3, and Debian 10. All went well, until a few weeks ago...

One night at 3am my volume 2 crashed!

Since then I can read my shares from volume 2, but not write to it; it is read-only. I suspect the activation of the write-back cache in Proxmox is the reason.

The SMART status of all my hard drives and the RAID controller is just fine, so I suspect it is not a hardware issue.

After searching Google, this forum, and YouTube, I still have no solution.

 

The crashed volume holds all my music, videos, photos, and documents. I have a spare 3 TB WD Red HDD connected directly to the motherboard and have copied the most important files to it.

Although I have a backup of the most important files, I would prefer to restore the crashed volume, but I am starting to become a bit desperate.

 

I think the solution is to stop /dev/md3 and re-create it with the same values, but something keeps /dev/md3 busy and prevents me from doing so. Could someone please help me?
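For reference, this is a sketch of the commands I understand are typically used to find out what keeps an MD array busy before it can be stopped; the mount point and volume-group name below are guesses, not taken from my actual system:

```shell
# See the array state and what is stacked on top of it
cat /proc/mdstat              # MD arrays and their member disks
lsblk /dev/md3                # LVM volumes / filesystems sitting on the array

# An array usually stays "busy" because an LVM volume group is still
# active on it, or a filesystem on it is still mounted
lvs -o lv_name,vg_name,devices    # logical volumes and their backing devices
umount /volume2                   # unmount the filesystem first (path is a guess)
vgchange -an vg1000               # deactivate the volume group (name is a guess)

# Only after that will mdadm agree to stop the array
mdadm --stop /dev/md3
```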

 

Setup:

- Hardware: ASRock µATX Bxxx (I forgot the exact model), Adaptec 8158Z RAID controller with 4 x 3 TB WD Red drives (2 new, 2 second-hand that were already replaced because they broke within warranty)

- Software: XPenology DSM 6.2.3, Proxmox 6.3-3, Debian 10

- XPenology: Volume 1 is 12 GB and holds the OS (I guess?); Volume 2 is a single unprotected 8 TB ext4 volume with all my shares. 4 GB RAM and 2 cores.

 

Output:

see attachment

output.txt

19 hours ago, Poelie said:

Since then I can read my shares from volume 2, but not write. It is read-only.

 

If you can read from volume 2, copy everything off, delete it, and remake it.


Hi,

 

Thanks for your answer.

 

This is indeed a possible solution, but not my favorite one.

Although I already have backups in the cloud and on my spare HDD, I am in trouble if I have to restore everything every six months because of an out-of-the-blue crash.

At least this guy managed it, according to YouTube: How To: Reset a Crashed Volume in XPEnology/Synology - YouTube

 

I am thinking of deleting the LVM partition and re-creating it with the same values, and afterwards stopping the RAID array /dev/md3 and re-creating it with the same values as well.

I keep this post updated.


I strongly suspect that your planned courses of action will have no positive effect, as your filesystem is compromised but still accessible in read-only mode. This means that some internal structure of the filesystem is inconsistent.

 

MD binds multiple physical disk partitions into an array of raw storage. LVM binds multiple logical partitions (such as an MD array) together into a contiguous block of raw storage. The filesystem is built on top of all that. Issues with MD or LVM will either affect your redundancy or completely disable access to the filesystem; that is not the state of your system now.
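You can inspect each layer of that stack directly with read-only diagnostic commands; a sketch (device names like /dev/md3 will differ per system):

```shell
lsblk                            # whole stack: disks -> partitions -> md -> lvm -> fs
cat /proc/mdstat                 # MD layer: array health and member partitions
mdadm --detail /dev/md3          # per-array detail (state, degraded/failed members)
pvs; vgs; lvs                    # LVM layer: physical volumes, groups, logical volumes
dmesg | grep -iE 'ext4|md3'      # kernel messages hinting why the fs went read-only
```

None of these commands change anything, so they are safe to run on the crashed volume before deciding on a repair.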

 

There are ways to fix a damaged filesystem, but much more information is needed. ext4 cannot self-repair and needs a manual repair operation. This might be feasible, and I would have confidence in a repaired ext4 filesystem for long-term use. btrfs is designed to self-heal, and if it cannot, something is significantly wrong. Personally, I would not use a manually repaired btrfs filesystem, as there are likely still latent issues that will crop up later. Better to offload the data and rebuild the filesystem.
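A manual ext4 repair along those lines might look like the sketch below; the mount point and logical-volume path are assumptions for illustration, and e2fsck must never be run on a mounted filesystem:

```shell
umount /volume2                  # must be unmounted first (mount point is a guess)
e2fsck -f -v /dev/vg1000/lv      # forced full check; device path is an assumption
# e2fsck -p does automatic "preen" repairs; without it you answer each prompt,
# and -y accepts all proposed fixes unattended (use with care)
mount /volume2                   # remount and verify the volume is writable again
```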

  • 2 weeks later...

Hi,

 

I appreciate you answering my topic.

 

I deleted the crashed volume and rebuilt it.

After a few days I am still busy copying my backup files and re-installing the less important VMs.

 

I would have preferred the crashed volume to be repaired, but at least I can move on now. :)

This is the first time I have not been able to repair things myself, and I hope it is also the last time.

 

As mentioned, I appreciate your answer.

 

This can be marked as solved.

 

On 3/30/2021 at 10:55 PM, Poelie said:

One night at 3am my volume 2 crashed!

You did not post any logs; on bare-metal systems I'd expect to find clues in the logs from before the crash happened.

It might be essential to find out how this happened, or you might see the same thing again in the foreseeable future.

As it's a VM, check whether caching is active and whether there have been problems in that area (crashes where cache content was lost).

Some people use PCI passthrough of the controller, or raw mapping of the drives; that would prevent any host-related cache problems.
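On the Proxmox host, checking the cache mode and raw-mapping a drive by ID could look like this sketch (the VMID 100 and the disk ID are placeholders, not values from this thread):

```shell
# Inspect the VM's disk lines for a risky cache=writeback setting
qm config 100 | grep -E 'scsi|sata|virtio'

# cache=none (the default) is safer: a host crash cannot lose cached guest writes.
# Raw-map a physical drive into the VM by its stable by-id path:
qm set 100 --scsi1 /dev/disk/by-id/ata-WDC_WD30EFRX-xxxxxx,cache=none
```

With a raw mapping like this the guest talks to the drive without a host image file in between, which removes one layer where cached writes can be lost.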

