I recently migrated to Xpenology with DSM 7.2, and things went seriously wrong while expanding a storage pool. Here's the rundown of what I have done so far...
I was moving off of an Unraid array with 4 disks to Xpenology. I freed up one of the disks on my physical Unraid machine and moved it to my new Xpenology VM on ESXi 8.
Attached that disk to my Xpenology VM as an RDM (via a spare 4-port USB3 enclosure I had lying around), created a new SHR volume, and copied all my data over.
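For reference, the RDM pointer files were created on the ESXi host with vmkfstools, roughly like this (the device path and datastore path are just examples; yours will differ):

    # Find the enclosure's device identifiers as the host sees them
    ls /vmfs/devices/disks/
    # Create a physical-mode RDM pointer for one disk (example paths)
    vmkfstools -z /vmfs/devices/disks/mpx.vmhba33:C0:T0:L0 /vmfs/volumes/datastore1/xpenology/disk1-rdm.vmdk

Each pointer .vmdk then gets attached to the VM as an existing disk on a SATA controller.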
Freed another disk from Unraid, moved it to the Xpenology VM, added it to the storage pool, and waited for that to sync up. Great, now I have a RAID1 array on Xpenology with all my data.
Feeling confident that my data was now protected in Xpenology, I moved the remaining 2 disks from my Unraid array into Xpenology. Now I have all 4 disks in the USB3 enclosure, each mapped as an RDM to the Xpenology VM.
Went into DSM and started pool expansion using the 2 new disks that were just added.
The expansion ran for 12 hours and was maybe 20% complete. At that point I started looking into why it was taking so long and found out that I had accidentally attached the enclosure to a USB 2.0 port. Whoops. I did some reading and found that I could safely shut down the Xpenology VM via the shutdown option in DSM, and it should resume the expansion when powered back on.
Shut it down, moved the enclosure to a USB3 port, and remapped the RDMs, being careful to make sure they were attached to the VM on the exact same SATA addresses. Booted back up. As advertised, the expansion picked up where it left off and was chugging along much faster. It was now estimating another 15 hours to finish, which I felt much better about.
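(For anyone following along, the reshape progress is also visible from an SSH session inside DSM, something like:)

    # Show reshape progress and estimated finish time for all md arrays
    cat /proc/mdstat
    # Or poll it every 30 seconds
    watch -n 30 cat /proc/mdstat

The reshape line in there shows the percentage done and a finish estimate in minutes.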
After about 2 hours I got an email saying disk 1 (the one I originally created the pool on) had crashed. This was in the middle of my workday, so I didn't have time to investigate right then and there. I did see that the expansion was still going and I could still access my data, so at that point I crossed my fingers that it would complete and I would at least end up with a 3-disk SHR pool.
A couple hours later, I got another email saying the ENTIRE POOL had crashed. WTF. After some investigating I found that ESXi had completely dropped the USB connection to the enclosure, and I couldn't even see the devices anymore from ESXi's perspective.
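(If anyone wants to retrace this, I was checking from an SSH session on the ESXi host, along these lines:)

    # List the storage devices the host currently sees
    esxcli storage core device list
    # Look for USB disconnect / device-loss messages
    grep -i usb /var/log/vmkernel.log

The enclosure's disks had simply vanished from the device list.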
I have now gotten the USB connection stable again, but I cannot even boot into DSM. I've noticed that I can ping the VM's IP address while it is booting, but once it starts loading the kernel the pings drop and never come back (presumably because DSM never finishes initializing). I saw this same ping behavior previously while the system was healthy: it seems like the loader does an initial boot that brings networking online, then when DSM starts to load the pings drop until everything is up and running. So I'm guessing that part is normal even on a healthy system. The difference now is that networking never comes back, so something is going wrong while DSM is loading, and without networking I have no way to see what's going on inside the VM.
I can see that CPU usage is steady at about 30% when this happens, like it is stuck in a loop of some sort. There is no disk activity at this point.
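One thing I haven't tried yet: since networking never comes up, I'm thinking of adding a virtual serial port to the VM so I can watch the console output during boot (the Xpenology loader and DSM kernel log to the serial console). A minimal sketch, assuming the standard VMX network-serial options and an arbitrary port 2000:

    serial0.present = "TRUE"
    serial0.fileType = "network"
    serial0.fileName = "telnet://:2000"
    serial0.network.endPoint = "server"

Then telnet to the ESXi host on port 2000 while the VM boots. I believe the host firewall also has to allow the remote serial port traffic for this to work.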
Any ideas on what steps I should take from here?