CrazyFin Posted July 31, 2017 (edited)

Since DSM 6.1.2-15132 I have been experiencing random shutdowns of my Xpenology server. My feeling is that this happens when the disks (8 disks of 4 TB each) are doing a recovery scrub after an upgrade, or during any other very heavy file operation.

It might be my PSU that is starting to give up, and I'll most likely try replacing it this coming weekend. The PSU is a Corsair RM1000, and it has been installed for approximately a year in my barebone Xpenology server (Asus P7F-X with X3440 CPU, LSI9211 controller card with 8 x 4TB WD Green disks).

I cannot see any particular events in the kern.log file, and nothing indicates a PSU problem or a CPU overheating problem. The CPU seems to run at approximately 40-44 degrees Celsius when the shutdown happens.

These are the last lines I see in the kern.log file. There is no message about powering down, and the interrupt errors shown below also appear at other times without Xpenology shutting down:

2017-07-30T23:45:39+02:00 CrazyServer kernel: [214024.761531] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-30T23:45:40+02:00 CrazyServer kernel: [214026.041864] sky2 0000:03:00.0: error interrupt status=0x8
2017-07-30T23:45:44+02:00 CrazyServer kernel: [214029.548165] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-30T23:45:59+02:00 CrazyServer kernel: [214045.313551] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-30T23:46:01+02:00 CrazyServer kernel: [214046.695887] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-30T23:46:01+02:00 CrazyServer kernel: [214047.351521] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-30T23:46:02+02:00 CrazyServer kernel: [214047.559463] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-30T23:46:11+02:00 CrazyServer kernel: [214056.790955] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-30T23:46:12+02:00 CrazyServer kernel: [214058.117275] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-30T23:46:16+02:00 CrazyServer kernel: [214062.056383] sky2 0000:03:00.0: error interrupt status=0x8
2017-07-30T23:46:17+02:00 CrazyServer kernel: [214063.124915] sky2 0000:03:00.0: error interrupt status=0x8
2017-07-30T23:46:20+02:00 CrazyServer kernel: [214065.942479] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-30T23:47:33+02:00 CrazyServer kernel: [214139.442902] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-31T00:00:28+02:00 CrazyServer kernel: [214914.051214] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-31T00:10:25+02:00 CrazyServer kernel: [215510.528919] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-31T00:13:13+02:00 CrazyServer kernel: [215678.491358] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-31T00:20:10+02:00 CrazyServer kernel: [216095.276985] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-31T00:20:34+02:00 CrazyServer kernel: [216119.265336] sky2 0000:03:00.0: error interrupt status=0x8
2017-07-31T00:28:23+02:00 CrazyServer kernel: [216587.629802] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-31T00:44:50+02:00 CrazyServer kernel: [217574.558506] sky2 0000:03:00.0: error interrupt status=0x40000008

And here is where I powered on the server again:

2017-07-31T07:20:52+02:00 CrazyServer kernel: [ 0.000000] Linux version 3.10.102 (root@build1) (gcc version 4.9.3 20150311 (prerelease) (crosstool-NG 1.20.0) ) #15152 SMP Thu Jul 13 04:20:59 CST 2017

My gut feeling is that it happens only during highly disk-intensive operations such as scrubbing, repairing, or copying many large files. Anyway, my first step will be to replace the PSU.

Edited July 31, 2017 by CrazyFin
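A shutdown with no power-down message in kern.log can be spotted mechanically by looking at the last kernel line written before each boot banner and measuring the gap. The following is only a minimal sketch, assuming Python 3.7+ and log lines laid out exactly like the ones quoted in the post (ISO timestamp, hostname, `kernel:`, uptime in brackets). The function name `find_boot_gaps` is hypothetical and not part of DSM or any library.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the excerpt in the post:
# 2017-07-31T00:44:50+02:00 CrazyServer kernel: [217574.558506] sky2 ...
LINE_RE = re.compile(r"^(\S+)\s+\S+\s+kernel:\s+\[\s*(\d+\.\d+)\]\s+(.*)$")


def find_boot_gaps(lines):
    """For each boot banner, return a tuple of
    (time of last line before boot, boot time, gap in seconds, last message).
    """
    results = []
    previous = None  # (timestamp, message) of the most recent parsed line
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a kernel line in the assumed layout
        stamp = datetime.fromisoformat(match.group(1))
        uptime = float(match.group(2))
        message = match.group(3)
        # A "Linux version" banner at uptime 0 marks a fresh boot.
        if uptime == 0.0 and message.startswith("Linux version") and previous:
            gap = (stamp - previous[0]).total_seconds()
            results.append((previous[0], stamp, gap, previous[1]))
        previous = (stamp, message)
    return results
```

Fed the lines quoted above, this should report one boot whose preceding message is just another sky2 interrupt error, roughly six and a half hours earlier. If the last message before a boot is not a shutdown-related one, that is consistent with power being cut abruptly, though it does not say which component was responsible.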
Polanskiman Posted August 1, 2017

Do you have a UPS?
CrazyFin Posted August 1, 2017

Nope, no UPS. I have a couple of them waiting to be installed though... Hmmm, maybe it would be better to test with a UPS first to see if it actually is the PSU and not something else. In fact, I'll install the UPS tomorrow and start a scrub of my disk volume, which usually triggers the sporadic shutdowns.
Polanskiman Posted August 1, 2017

Yes, that would be the first thing to try. Sporadic shutdowns could be due to power instability. If you keep getting those unexpected shutdowns after trying with a UPS, then it could literally be anything hardware related (PSU, CPU, RAM, MOBO). If you have a spare PSU, try it first before buying a new one. You could also stress test the CPU under another OS with software such as https://www.mersenne.org/download/#stresstest
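While running a stress test or a scrub, it can help to write temperature readings to disk so that the last values before a shutdown survive the power loss. Below is a rough sketch, assuming the machine exposes sensors through the standard Linux hwmon sysfs files (`temp*_input`, in millidegrees Celsius). Whether a given DSM or Xpenology install exposes them is an assumption, and many PSUs do not report a temperature at all, so this may not cover every component. The function names `read_temperatures` and `log_once` are hypothetical.

```python
import glob
import os
import time


def read_temperatures(base="/sys/class/hwmon"):
    """Return {sensor_path: degrees_celsius} for every readable temp input."""
    readings = {}
    pattern = os.path.join(base, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as handle:
                # hwmon sysfs files report millidegrees Celsius.
                readings[path] = int(handle.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # sensor missing or unreadable; skip it
    return readings


def log_once(log_path, base="/sys/class/hwmon"):
    """Append one timestamped line of readings and force it to disk."""
    readings = read_temperatures(base)
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    fields = " ".join(
        f"{os.path.relpath(path, base)}={temp:.1f}"
        for path, temp in readings.items()
    )
    with open(log_path, "a") as handle:
        handle.write(f"{stamp} {fields}\n")
        handle.flush()
        os.fsync(handle.fileno())  # make sure the line survives a power cut
    return readings
```

Calling `log_once` from a simple loop with `time.sleep(30)` between calls, with a log path of your choosing, would leave a trail to compare against the time of the next unexpected shutdown.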
CrazyFin Posted August 24, 2017 (edited)

Alright, sorry for the late reply. I am quite embarrassed, but the solution became pretty clear when I opened up the case in order to replace the PSU... No need to replace the PSU.

While I was connecting the PSU to see whether the server was going down totally or whether it was just the PSU shutting down, I also decided to open up the case to prepare for a PSU replacement. I then realised that there was a dust filter at the bottom of the chassis that I had forgotten about. I always clean the dust filters on the chassis 1-2 times per month, but I had TOTALLY forgotten about this one at the bottom... and guess what... it was totally clogged with dust, so it is pretty clear that the PSU was shutting itself down due to overheating.

After cleaning it and then testing the server several times over a couple of days, for example with scrubbing operations, it is running perfectly well now.

Edited August 24, 2017 by CrazyFin
Polanskiman Posted August 29, 2017

Yes, dust is a killer. I usually clean the fans' dust collectors every 2-3 months.