• 0
Crusher55

My NAS is dead every couple of weeks

Question

Posted (edited)

So, this has happened to me twice.

After a week or two, something goes wrong and my XPenology baremetal NAS is simply broken.

It still boots automatically as scheduled (I have set up boot and shutdown times), but none of the services work.

And when I open DSM it just tells me: "Sorry, the page you are looking for is not found."

I cannot access any services: Plex, Web Station, anything.

SSH still works, but when I log in, a lot of files seem to be gone.

Volume1 (my only mounted HDD), for example, is empty.

 

The first time around I lost a lot of my files, so I set up daily backups in case it happened again. Still, having to reinstall the NAS every couple of weeks is really annoying.

Is this normal for XPEnology baremetal setups, or am I doing something wrong?

 

My baremetal server is custom built having:

 

I am running DSM 6.1.5-15254 Update 1 on JUN'S LOADER v1.02b - DS916+

For the second installation I used a serial number from an old DS216j (or something similar) to try to avoid issues.

For the first installation I used a randomly generated serial number.

The MAC address was set correctly, and I carefully followed the installation guide both times.

Everything works splendidly for a while (1-2 weeks), and then, out of the blue, it just breaks.

 

Furthermore, I had the following apps running:

  • Plex Server
  • Web Station
  • Several Docker containers running things like a Minecraft server, Headphones, CouchPotato and Home Assistant
  • Synology Drive
  • FHEM (only the second time)
  • some other Synology apps
  • some other third-party apps

 

The following might also be of interest:

  • I have Optware installed (via a third-party bootstrapper package) to use IPKG
  • I have an automatic boot/shutdown schedule running; it shuts down the NAS every evening and boots it early in the morning
  • At 6 am it runs a full automatic backup to a USB drive using Hyper Backup (this has worked nicely)
  • Once or twice I tried to log in with my Synology account, which of course did not work (this was before I knew that it wouldn't work anyway and would only cause trouble)
  • I spent many hours correctly configuring the access rights for all Docker and non-Docker apps
  • I disabled root SSH access and only allow public-key authentication for SSH
  • Telnet is disabled
  • The password for the admin account that I use is very strong and random (32 characters, with all security requirements enabled)
  • Every minute I run a shell script that checks whether new files have arrived on an external server and then copies them over using SFTP
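A job that fires every minute can start again while the previous SFTP transfer is still running. A common safeguard is an exclusive lock, so that overlapping runs are simply skipped. The Python sketch below shows that pattern; it is not the actual script from this post (which is not shown), and the lock path /tmp/sftp-sync.lock and the check_and_copy() function are hypothetical placeholders.

```python
import errno
import fcntl
import sys

# Hypothetical lock-file location; any path the job can write to will do.
LOCK_PATH = "/tmp/sftp-sync.lock"


def try_acquire_lock(path):
    """Return an open file holding an exclusive flock on `path`,
    or None if another holder already has it."""
    lock_file = open(path, "a")
    try:
        # LOCK_NB: fail immediately instead of waiting for the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        lock_file.close()
        if exc.errno in (errno.EAGAIN, errno.EACCES):
            return None
        raise
    return lock_file


def check_and_copy():
    """Hypothetical placeholder for the real work: look for new files on
    the external server and copy them over SFTP."""


def main():
    lock = try_acquire_lock(LOCK_PATH)
    if lock is None:
        # The previous run is still busy; skip this minute rather than pile up.
        print("previous run still active, skipping", file=sys.stderr)
        return False
    try:
        check_and_copy()
    finally:
        lock.close()  # closing the file releases the lock
    return True


if __name__ == "__main__":
    main()
```

With this pattern, an instance that finds the lock already held exits immediately, and the kernel releases a flock when the file is closed or the process exits, so a crashed run does not leave a stale lock behind.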

 

Can anybody please help me? I can provide any data or information you need.

 

Thanks in advance!

Edited by Crusher55


8 answers to this question


  • 0
39 minutes ago, Crusher55 said:

Is this normal for XPEnology baremetal setups, or am I doing something wrong?

 

If this were normal, I don't think anyone in their right mind would be using XPEnology. That being said, I have a few questions for you. Do you have a UPS? Have you checked that your machine is not overheating or clogged with dust?

  • 0
Posted (edited)

Yes, I have a UPS, but the NAS is not connected to it (just noticed that... ugh...). Could a sudden power loss cause this? We haven't had any outages in the last few weeks, so that would be really weird, and as far as I know my power supply is strong enough. It can't be too dusty since it is brand new: I built it a couple of weeks ago and have been careful to remove every bit of dust since.

I don't think it is overheating either. I checked DSM a couple of times (not daily, however), and it reported the CPU temperature as Normal.

I have a decent CPU cooler (not the stock Intel one) and one front case fan. The DSM fan mode was set to Quiet.

Would an overheating CPU cause data loss like this?

By the way, after the first time this happened I thought the HDD was broken, so I ordered a new one. Now that it has happened twice, I suspect it is not the HDD.

Might it have something to do with the loader version I use? Should I avoid the DS916+ one?

Edited by Crusher55

  • 0

CPU overheating leads to a hard reset, i.e. a hard shutdown to protect the hardware. This could corrupt data if the disk was writing at that moment. However, given what you describe, it seems unlikely that your issue is related to overheating.

I would suggest running a torture test (RAM + CPU + general hardware) on that machine under a different OS, such as Linux or Windows or whatever you have handy, to test hardware stability. It could be the RAM, the PSU or even the motherboard.

I don't think this has to do with the loader; otherwise other people would already have reported similar issues. Any reason you are using DS916+ instead of DS3615xs or DS3617xs?

  • 0

No particular reason.

I thought it wouldn't matter, so I just picked the DS916+ one.

I will, however, try the DS3617xs one after running a stability test.

So are you sure that it is hardware-related and not some kind of software problem?

I have a few thoughts:

  • Could Optware be the problem?
  • Could Synology have done this?
  • Could some App have caused this?

I will be home in a week, and I will post an update after running the stability test.

  • 0

I recommend using DS3615xs. It is the most used and supported version.

 

No, I am not sure (especially since I am not physically next to the machine and have no access to it), but you have to eliminate the most likely source first, which for me is hardware. It could be software (perhaps a compiled module). I don't think an app would cause that.

Did Volume1 also appear empty the second time this happened?

  • 0

Yes, Volume1 is empty. It exists, but when I open it in an SSH session, there is nothing in it.

The first time this happened, I booted the machine from a Linux USB stick and checked the HDD.

However, the HDD did not automount, so I tried to mount it manually, but it refused, saying it was already mounted.

When I tried to unmount it, it said the drive was not mounted.
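An empty Volume1 in an SSH session can also mean that the volume is not mounted at all and you are looking at the bare mount-point directory. Below is a minimal Python sketch for telling the two cases apart on Linux, assuming the /volume1 path from this thread; it only reads /proc/mounts and calls os.path.ismount(), and does not modify anything.

```python
import os


def mounted_filesystems(mounts_file="/proc/mounts"):
    """Map each mount point to its (device, fstype) as listed in mounts_file."""
    table = {}
    with open(mounts_file) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 3:
                continue
            device, mount_point, fstype = fields[0], fields[1], fields[2]
            # /proc/mounts escapes spaces in paths as \040.
            mount_point = mount_point.replace("\\040", " ")
            # Later entries win, i.e. the topmost of stacked mounts.
            table[mount_point] = (device, fstype)
    return table


def describe(path, mounts_file="/proc/mounts"):
    """Report whether `path` is a mounted filesystem or just a plain directory."""
    if not os.path.isdir(path):
        return f"{path}: does not exist"
    table = mounted_filesystems(mounts_file)
    if path in table:
        device, fstype = table[path]
        return f"{path}: mounted from {device} ({fstype})"
    # Fallback in case the path is spelled differently in the mounts table.
    if os.path.ismount(path):
        return f"{path}: is a mount point"
    entries = os.listdir(path)
    return f"{path}: NOT mounted, plain directory with {len(entries)} entries"


if __name__ == "__main__":
    print(describe("/volume1"))
```

If this reports "NOT mounted", the data may still be intact on the disk, which would fit the later observation that the volume mounted fine from a live Linux system.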

 

This time I cannot tell yet since I am not home, but I will update this post as soon as I get back.

I am currently on vacation, so all I can do is access my network remotely through a VPN server.

  • 0
Posted (edited)

So, I am home now.

I booted the machine from a live Ubuntu USB stick and mounted volume1. It looks just fine.

All files are in place; nothing seems to be missing.

Last time I tried to mount volume1 it wouldn't work, so this is new.

I ran a 30-minute CPU and memory stress test using "stress", and it completed without errors.

I also ran an extended S.M.A.R.T. test, which reported that my disk is in perfect shape.

CPU and HDD temperatures are also normal, so overheating is not the issue.

I have now switched to the DS3615xs loader (carefully following your tutorial), and my DiskStation is up and running normally again. I also connected the NAS to the UPS.

I am however not entirely convinced that this is the end of this problem.

Any more ideas?

 

And thanks for the help so far!

Edited by Crusher55

  • 0

Stress tests should normally be left running for several hours; 30 minutes is not enough. Personally, when I want to make sure hardware is defect-free, I let a stress test run for at least 6 hours, ideally overnight. For the CPU I usually use Prime95, and for RAM I use MemTest86+.

 

Have you looked at the logs to see if there was anything unusual?
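As a starting point for the log check, the Python sketch below scans a log file for kernel messages that often accompany disk or filesystem trouble. The default path /var/log/messages and the pattern list are assumptions, not a definitive list of DSM log messages; pass a different file as the first argument if your logs live elsewhere.

```python
import re
import sys

# Assumed log location; adjust if your system logs elsewhere.
DEFAULT_LOG = "/var/log/messages"

# Illustrative kernel-message patterns for disk/filesystem trouble (not exhaustive).
PATTERNS = {
    "ext4 error": re.compile(r"EXT4-fs error", re.IGNORECASE),
    "btrfs error": re.compile(r"BTRFS.*error", re.IGNORECASE),
    "I/O error": re.compile(r"I/O error", re.IGNORECASE),
    "read-only remount": re.compile(r"remounting.*read-only", re.IGNORECASE),
}


def scan(lines):
    """Return {label: [matching lines]} for each pattern that matched."""
    hits = {}
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.setdefault(label, []).append(line.rstrip("\n"))
    return hits


def main(argv):
    path = argv[1] if len(argv) > 1 else DEFAULT_LOG
    try:
        # errors="replace" keeps going if the log contains undecodable bytes.
        with open(path, encoding="utf-8", errors="replace") as fh:
            hits = scan(fh)
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
        return
    if not hits:
        print(f"no matching lines in {path}")
    for label, matches in hits.items():
        print(f"{label}: {len(matches)} line(s), last one: {matches[-1]}")


if __name__ == "__main__":
    main(sys.argv)
```

Saved as, say, scan_log.py, it can be run over SSH with python3 scan_log.py /var/log/messages; a cluster of matches just before the services stopped working would point towards disk or filesystem trouble rather than an app.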

 

If the machine crashes every 1-2 weeks, I think a hardware problem is more likely; otherwise I would expect it to happen more often and at regular, predictable intervals. Of course, software can't be completely ruled out. Also make sure your UPS is not responsible: could it be defective, or could its battery be damaged?
