XPEnology Community

Darkened

Transition Member
  • Posts

    6
  • Joined

  • Last visited


  1. Final update (for now). The array repair, RAID scrubbing and file system defragmentation all went through without a hitch. Sadly I did two things at once (moved the drive to a different bay, and thus a different SATA power cable, and disabled the HDD hibernation feature in DSM). One of those fixed the issue, but I just didn't want to mess around with a "production" server by testing them one by one. Next up I'll expand the server by swapping one of the 1 TB drives for a 3 TB WD Red. Hopefully everything goes well with that. Big thanks to IG-88 for pointing me in the right direction! Janne
  2. Morning update: The array repair was successful, and no weird behavior has occurred since the repair finished. Just now I started the RAID scrubbing that DSM suggested, and after that it will run the file system check. I'll report back once those are done, but I'm cautiously optimistic about this now. Janne
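While a repair or scrub like this runs, its progress can be watched from the md layer over SSH. A minimal sketch, parsing a sample `/proc/mdstat`; the sample contents below are illustrative, not taken from this system, and on the NAS you would read `/proc/mdstat` directly:

```shell
# Illustrative snapshot of /proc/mdstat during a check/scrub (not from this
# system); on the NAS, read the real /proc/mdstat instead.
cat > /tmp/mdstat_sample <<'EOF'
md2 : active raid6 sde5[4] sdd5[3] sdc5[2] sdb5[1] sda5[0]
      2925444480 blocks super 1.2 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
      [=====>...............]  check = 28.7% (280000000/975148160) finish=120.5min speed=95000K/sec
EOF

# Pull out just the scrub progress figure
grep -o 'check = [0-9.]*%' /tmp/mdstat_sample
```

Watching this number (e.g. with `watch cat /proc/mdstat`) shows whether the scrub is still advancing before any disk drops out.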
  3. Hey again, I think I found the issue in the log (disk_log.xml, for future reference). This was the first try, on 2017-10-01, when I tried expanding the array:

<kernel time="2017/10/01 03:58:04" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="590338" show="0">RecovComm Persist PHYRdyChg 10B8B </kernel>
<kernel time="2017/10/01 04:18:48" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<hotplug time="2017/10/01 04:39:55" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" show="1">plugout</hotplug>
<hotplug time="2017/10/01 04:39:55" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" show="1">plugin</hotplug>
<hotplug time="2017/10/01 05:01:59" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" show="1">plugout</hotplug>
<hotplug time="2017/10/01 05:01:59" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" show="1">plugin</hotplug>
<kernel time="2017/10/01 05:31:08" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66050" show="0">RecovComm Persist PHYRdyChg </kernel>
<kernel time="2017/10/01 05:52:41" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/10/01 06:13:44" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/10/01 06:36:21" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/10/01 06:57:24" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/10/01 07:18:03" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/10/01 07:39:06" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/10/01 07:59:42" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/10/01 08:20:22" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<hotplug time="2017/10/01 08:41:23" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" show="1">plugout</hotplug>
<hotplug time="2017/10/01 08:41:23" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" show="1">plugin</hotplug>
<kernel time="2017/10/01 09:02:25" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66050" show="0">RecovComm Persist PHYRdyChg </kernel>
<kernel time="2017/10/01 09:23:28" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/10/01 09:44:07" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/10/01 10:06:22" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/10/01 10:27:25" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>

And the second time, on 2017-11-12:

<kernel time="2017/11/12 03:50:24" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="590338" show="0">RecovComm Persist PHYRdyChg 10B8B </kernel>
<hotplug time="2017/11/12 04:11:05" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" show="1">plugout</hotplug>
<hotplug time="2017/11/12 04:11:05" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" show="1">plugin</hotplug>
<kernel time="2017/11/12 04:32:05" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66050" show="0">RecovComm Persist PHYRdyChg </kernel>
<kernel time="2017/11/12 05:06:43" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/11/12 05:27:45" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/11/12 05:48:49" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/11/12 06:09:25" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/11/12 06:30:13" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/11/12 06:56:13" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/11/12 07:17:42" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/11/12 07:38:44" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/11/12 08:00:24" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/11/12 08:34:54" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/11/12 08:55:56" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/11/12 09:16:37" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/11/12 09:38:24" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/11/12 09:59:00" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/11/12 10:19:43" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/11/12 10:40:28" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>
<kernel time="2017/11/12 11:02:24" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="66048" show="0">Persist PHYRdyChg </kernel>

The same issue repeated, but without the several plugout/plugin events. After this revelation I found a reference on the unRAID wiki, and it seems the PHYRdyChg / 10B8B errors are most probably caused by a bad connection in either the SATA data cable or the SATA power cable. The difference between the two tries was that I changed the SATA data cable, and I didn't get any CRC errors either time, which points to the SATA power cable/plug or the drive's connector; I didn't change the power cable between the two tries. There are still two other possibilities: too many hard drives on one lead/rail (although the WD Red was closest to the PSU), and the HDD hibernation I had enabled in DSM, which could mimic the hotplug events if the drive doesn't spin up fast enough. That could well be the culprit here, since the drive was fine until the expansion/repair was done, which took several hours both times.
So, according to this research:
  • Faulty SATA data cable (changed, no difference)
  • Faulty SATA power cable / plug (to be tested)
  • Faulty SATA connector on the drive (visual inspection OK)
  • Too many drives on one lead / rail
  • Software issue with WD Red drives and hibernation

For now I've disabled HDD hibernation, and I'll plug the drive into another slot / cable and proceed with the repair. I'll report back after the server has run the repair overnight. Janne
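The triage above can be scripted: counting the link-level (serror) events versus the surprise unplug events per drive makes the pattern easy to spot. A minimal sketch; the sample entries are copied from the log quoted above, and on a live DSM box you would point `LOG` at the real disk_log.xml (its exact path is an assumption here):

```shell
# Sample entries copied verbatim from the log quoted above; on the NAS,
# point LOG at the real disk_log.xml instead (path is an assumption).
LOG=/tmp/disk_log_sample.xml
cat > "$LOG" <<'EOF'
<kernel time="2017/10/01 03:58:04" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" type="serror" raw="590338" show="0">RecovComm Persist PHYRdyChg 10B8B </kernel>
<hotplug time="2017/10/01 04:39:55" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" show="1">plugout</hotplug>
<hotplug time="2017/10/01 04:39:55" path="/dev/sde" model="WD30EFRX-68EUZN0" SN="WD-WMC4N1830023" show="1">plugin</hotplug>
EOF

# Link-level SATA errors vs. surprise unplug events for the suspect drive
echo "serror:  $(grep -c 'type="serror"' "$LOG")"
echo "plugout: $(grep -c '>plugout<' "$LOG")"
```

Filtering on `path="/dev/sde"` as well would separate the suspect drive from the healthy ones if the log covers the whole array.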
  4. Hey IG-88, Sorry about the delay; I couldn't test your suggestions before today, but here goes. The WD Red ran through the DLG tests without any issues. This was done a few days ago. As for the C1E state, I don't think it exists in this motherboard's BIOS. C6 is enabled, but C1E is nowhere to be found. I must say the server works just fine, though, so if the C1E issue is more about getting XPEnology to boot, then it's not the issue here; I've been using the server with 4 × 1 TB drives for a while now without any problems. The hard drives are, and have been, in AHCI mode from the beginning. And I haven't had any issues connecting to the server, so I don't think there's anything wrong with the NIC. I'll boot up the server next and try to go through the logs via PuTTY. I'll get back to you with the results. Janne
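When going through the logs over SSH, the flaky-link symptom usually shows up in the kernel log as SError lines followed by link resets. A minimal sketch, filtering a sample of such messages; the sample lines are illustrative rather than from this system, and on the NAS you would pipe `dmesg` in instead:

```shell
# Illustrative kernel messages of the kind a flaky SATA link produces;
# on the NAS you would run something like: dmesg | grep 'ata[0-9]'
cat > /tmp/dmesg_sample.txt <<'EOF'
ata5: SError: { RecovComm Persist PHYRdyChg 10B8B }
ata5: hard resetting link
ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
EOF

# SError lines correspond to the type="serror" entries in disk_log.xml
grep -c 'SError' /tmp/dmesg_sample.txt
```

A steadily growing SError count on one ata port, with the others quiet, is the same cable/power signature seen in disk_log.xml.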
  5. Hey IG-88, I'm running the Asus E35M1-M without any add-ons. Other than that, 4 GB of G.Skill DDR3 RAM and a Corsair PSU. The mobo is running the latest available BIOS. DLG is running as I'm writing this, and at least the short test went through without an issue. I'll update with the result once the test completes. I'll also check the proper log files after the test is done; I can't get the NAS on the table at the moment. Janne
  6. Hey, I'm running XPEnology DSM 6.0.2-8451 Update 11 on a self-built computer. I started out with 4 × 1 TB older Samsung drives (HD103UJ & HD103SJ) in an SHR-2/Btrfs array (SHR enabled for DS3615xs). This setup hasn't had any issues, and I intended to expand the array with more 1 TB drives, but I decided to go with bigger drives since I had the chance. So I added a 3 TB WD Red and started expanding the volume; the goal was to replace the 1 TB drives one by one and end up with 5 × 3 TB WD Reds.

The expansion went OK, and so did the consistency check. Then, for some unknown reason, the newly added disk was restarted, degraded the swap system volume, then degraded the volume, was "inserted" and "removed" (although I didn't do anything), and finally degraded the root system volume on the disk. I tried repairing the volume, but it didn't help. I shut the server down, and no new data has been written to the array since this issue.

Yesterday I finally had time to do something about this, so I removed the disk, emptied everything on it and re-inserted it into the XPEnology server. I also changed the SATA data cable and power cable for the disk. The repair was successful, like the array expansion before, and so was the consistency check. After that had finished, I started the RAID scrub. Even the scrub went through just fine at 3:28:01, and then it did just about the same as when I was expanding the array: the disk restarted due to an "unknown error", the volumes degraded, and disk 5 was inserted into and removed from the array. That is the situation now.

The next step is of course to run diagnostics on that WD Red, but for some reason I don't think it's the disk that is causing this. I also have a few other WD Reds I could try out, but I'd need to empty them first. If you have any inkling of what could be causing this, it'd be appreciated. Best regards, Darkened aka Janne
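Before blaming the drive itself, a few SMART attributes help separate cabling problems from platter problems: attribute 199 (UDMA_CRC_Error_Count) grows with data-cable trouble, while 5 and 197 point at the media. A minimal sketch over sample `smartctl -A` output; the sample values are illustrative, not from this drive:

```shell
# Illustrative attribute rows; on the NAS you would run: smartctl -A /dev/sde
cat > /tmp/smart_sample.txt <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
EOF

# A non-zero raw value on 199 implicates the cable/connector, not the platters
awk '$2 == "UDMA_CRC_Error_Count" { print $NF }' /tmp/smart_sample.txt
```

Here all three raw values are 0, which matches the thread's finding: no CRC errors, so the fault is more likely power/connector-side (or hibernation behaviour) than the disk itself.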