CrazyFin

Xpenology / Synology server sometimes shutting down unexpectedly


Since DSM 6.1.2-15132 I have been experiencing random shutdowns of my Xpenology server. My feeling is that this happens when the disks (I have 8 disks in total, 4 TB each) are doing a recovery scrub after an upgrade, or during any other very heavy file operation.

It might be my PSU that is starting to give up, and I'll most likely try replacing the PSU this coming weekend.

The PSU is a Corsair RM1000 and it has been installed for approximately a year in my barebone Xpenology server (Asus P7F-X with X3440 CPU, LSI9211 controller card with 8 x 4TB WD Green disks).

 

I cannot see any particular events in the kern.log file, and there is nothing that indicates a PSU problem or a CPU overheating problem. The CPU seems to run at approximately 40-44 degrees Celsius when the shutdown happens.

 

These are the last lines I see in the kern.log file. There is no message about powering down, and the interrupt errors shown below also appear at other times without Xpenology shutting down:

2017-07-30T23:45:39+02:00 CrazyServer kernel: [214024.761531] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-30T23:45:40+02:00 CrazyServer kernel: [214026.041864] sky2 0000:03:00.0: error interrupt status=0x8
2017-07-30T23:45:44+02:00 CrazyServer kernel: [214029.548165] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-30T23:45:59+02:00 CrazyServer kernel: [214045.313551] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-30T23:46:01+02:00 CrazyServer kernel: [214046.695887] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-30T23:46:01+02:00 CrazyServer kernel: [214047.351521] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-30T23:46:02+02:00 CrazyServer kernel: [214047.559463] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-30T23:46:11+02:00 CrazyServer kernel: [214056.790955] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-30T23:46:12+02:00 CrazyServer kernel: [214058.117275] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-30T23:46:16+02:00 CrazyServer kernel: [214062.056383] sky2 0000:03:00.0: error interrupt status=0x8
2017-07-30T23:46:17+02:00 CrazyServer kernel: [214063.124915] sky2 0000:03:00.0: error interrupt status=0x8
2017-07-30T23:46:20+02:00 CrazyServer kernel: [214065.942479] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-30T23:47:33+02:00 CrazyServer kernel: [214139.442902] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-31T00:00:28+02:00 CrazyServer kernel: [214914.051214] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-31T00:10:25+02:00 CrazyServer kernel: [215510.528919] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-31T00:13:13+02:00 CrazyServer kernel: [215678.491358] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-31T00:20:10+02:00 CrazyServer kernel: [216095.276985] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-31T00:20:34+02:00 CrazyServer kernel: [216119.265336] sky2 0000:03:00.0: error interrupt status=0x8
2017-07-31T00:28:23+02:00 CrazyServer kernel: [216587.629802] sky2 0000:03:00.0: error interrupt status=0x40000008
2017-07-31T00:44:50+02:00 CrazyServer kernel: [217574.558506] sky2 0000:03:00.0: error interrupt status=0x40000008
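For anyone wanting to check whether these sky2 errors get more frequent before a shutdown, here is a minimal sketch that tallies them by status value. It assumes the line layout shown above; the function name `count_sky2_status` is made up for illustration and is not part of DSM or any tool.

```python
import re
from collections import Counter

# Matches the sky2 lines pasted above, e.g.
# "... kernel: [214024.761531] sky2 0000:03:00.0: error interrupt status=0x40000008"
SKY2_RE = re.compile(
    r"kernel: \[\s*\d+\.\d+\] sky2 \S+ error interrupt status=(0x[0-9a-fA-F]+)"
)

def count_sky2_status(lines):
    """Count sky2 'error interrupt' lines per status value (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = SKY2_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It can be fed an open file, for example `count_sky2_status(open("kern.log"))`, where the path is whatever location your kern.log lives in. If the counts look the same on days without a shutdown, the sky2 errors are probably unrelated noise.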

 

And here is where I powered on the server again:
 

2017-07-31T07:20:52+02:00 CrazyServer kernel: [    0.000000] Linux version 3.10.102 (root@build1) (gcc version 4.9.3 20150311 (prerelease) (crosstool-NG 1.20.0) ) #15152 SMP Thu Jul 13 04:20:59 CST 2017
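The pattern above (log just stops, then the kernel uptime counter restarts at 0) can be spotted automatically. This is only a rough sketch under the assumption that every kernel line starts with an ISO timestamp followed by the `[uptime]` field as in the lines I pasted; `find_boot_gaps` is a name I made up.

```python
import re
from datetime import datetime

# Timestamp at line start, then "<host> kernel: [<uptime seconds>]"
ENTRY_RE = re.compile(r"^(\S+) \S+ kernel: \[\s*(\d+\.\d+)\]")

def find_boot_gaps(lines):
    """Return (last_seen, booted_at, downtime) for each point where the
    kernel uptime counter goes backwards, i.e. the machine was rebooted."""
    gaps = []
    previous = None  # (timestamp, uptime) of the last parsed kernel line
    for line in lines:
        match = ENTRY_RE.match(line)
        if not match:
            continue
        timestamp = datetime.fromisoformat(match.group(1))
        uptime = float(match.group(2))
        if previous is not None and uptime < previous[1]:
            gaps.append((previous[0], timestamp, timestamp - previous[0]))
        previous = (timestamp, uptime)
    return gaps
```

Note that this only shows when the log went silent and when the box came back. It cannot tell a clean shutdown from a power cut, so you still have to look at the lines just before each gap for a power-down message.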

 

My gut feeling is that it happens only during highly disk-intensive operations such as scrubbing, repairing, or copying many large files, but anyway, my first step will be to replace the PSU.

 


Nope, no UPS.

I have a couple of them waiting to be installed though... :oops:

Hmmm, maybe it would be better to test with a UPS first to see if it actually is the PSU and not something else. In fact, I'll install the UPS tomorrow and start a scrub of my disk volume, which usually triggers the sporadic shutdowns.

 


Yes, that would be the first thing to try. Sporadic shutdowns could be due to power instability. If you keep getting those unexpected shutdowns after trying with a UPS, then it could literally be anything hardware related (PSU, CPU, RAM, MOBO). If you have a spare PSU, try it first before buying a new one. You could also stress test the CPU under another OS with software such as https://www.mersenne.org/download/#stresstest


Alright, sorry for the late reply.

 

I am quite embarrassed, but the solution became pretty clear when I started to open up the case in order to replace the PSU... No need to replace the PSU...

 

While I was connecting the PSU to see whether the whole server was going down or just the PSU was shutting down, I also decided to open up the case to prepare for a PSU replacement. That is when I realised there was a dust filter at the bottom of the chassis that I had forgotten about.

 

I always clean the dust filters on the chassis 1-2 times per month, but I had TOTALLY forgotten about this dust filter at the bottom of the chassis... and guess what... it was totally clogged with dust, so it is pretty clear that the PSU was shutting itself down due to overheating.

 

After cleaning the filter and then testing the server several times over a couple of days with, for example, scrubbing operations, it is running perfectly well now. :-)
