Random shutdown when using NVME cache with sequential IO option enabled


Hello.

I'm currently running DSM on bare metal and it's working great. Here are some details about my setup:

  • Baremetal install
  • DSM Ver 6.2.3-25426
  • Jun's v1.04 loader
  • extra.lzma
  • nvme patch

 

Since I wanted better performance when working with the NAS, I decided to add a read-write cache using 2 NVMe drives.

Both drives are properly detected, I can create the cache and attach it to the drive pool just fine, and the system is stable.

 

I then also wanted to enable the "sequential IO" option so that large file transfers hit the NVMe write cache as well.

If I do that, everything seems to work as intended at first. I can transfer at much higher than HDD speeds.

However, after 10 minutes or so (I haven't timed it), whether the system is idle or I'm doing something, the machine just shuts off.

 

I've checked /var/log/messages and nothing stands out as to what the problem could be.

If any of you have ideas on other things I could try, let me know.

 

Here's part of the log file from just before the NAS shuts down:

 

Nas synoddsmd: utils.cpp:35 Fail to get synology account
Nas synoddsmd: user.cpp:129 get account info fail [100]
Nas synoddsmd: synoddsm-hostd.cpp:227 Fail to get DDSM licenses, errCode: 0x100
Nas synosnmpcd: snmp_get_client_data.cpp:150 Align history time success
Nas synopoweroff: system_sys_init.c:95 synopoweroff: System is going to poweroff
Nas [  972.515703] init: synonetd main process (6584) killed by TERM signal
Nas [  972.519283] init: synostoraged main process (12775) terminated with status 15
Nas [  972.524128] init: hotplugd main process (13271) killed by TERM signal
Nas [  972.524501] init: smbd main process (14435) killed by TERM signal
Nas synodisklatencyd: synodisklatencyd.cpp:659 Stop disk latency monitor daemon by SIGTERM
Nas syno_poweroff_task: System is acting poweroff.
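
The bracketed numbers in the kernel lines above are seconds since boot, so they give a rough idea of how long the machine stayed up before the poweroff. A minimal Python sketch, assuming the log lines are available as plain text (the function name `uptime_at_shutdown` is made up for illustration and is not part of DSM):

```python
import re

# Matches a kernel timestamp such as "[  972.515703]" (seconds since boot).
KERNEL_TS = re.compile(r"\[\s*(\d+\.\d+)\]")


def uptime_at_shutdown(log_lines):
    """Return the first kernel timestamp seen after the poweroff message, or None."""
    seen_poweroff = False
    for line in log_lines:
        if "going to poweroff" in line:
            seen_poweroff = True
            continue
        match = KERNEL_TS.search(line)
        if seen_poweroff and match:
            return float(match.group(1))
    return None


# Two lines copied from the excerpt above.
sample = [
    "Nas synopoweroff: system_sys_init.c:95 synopoweroff: System is going to poweroff",
    "Nas [  972.515703] init: synonetd main process (6584) killed by TERM signal",
]

seconds = uptime_at_shutdown(sample)
print(f"{seconds:.0f} s since boot, about {seconds / 60:.0f} minutes")
```

For the excerpt above this works out to roughly 16 minutes of uptime, which is in line with the "10 minutes or so" estimate.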

 

Edited by luckcolors

Did you install using DDSM PAT file or with the proper DS918+ PAT file?  If DDSM, its validation check is failing and it is behaving as expected (shutting down to deny you services). DDSM is only for Synology VMM.


It's been a while since I did the install, so I don't remember.
Is there a way I can check which one is currently installed?

 

Under system settings > update it says: DS918+ DSM 6.2.3-25426 Update 2.

 


That's the right one.  May as well install update 3 (not 6.2.4!) and see if it overwrites whatever is going wrong.

https://global.download.synology.com/download/DSM/criticalupdate/update_pack/25426-3/synology_apollolake_918%2B.pat

 

Barring that I would just run a migration install of 6.2.3 from DS Assistant and see if that fixes things.

https://global.download.synology.com/download/DSM/release/6.2.3/25426/DSM_DS918%2B_25426.pat

Edited by flyride

I removed the cache and then updated, to be safe.

The update went fine.

I recreated the cache (I didn't seem to need to reinstall the NVMe patch, as I could already select the drives in Storage Manager).

Aaaaaand about 5 minutes after I enabled the option, it shut down again.

 

I'm not sure reinstalling will solve anything, as I've never used SSH for anything other than interacting with Docker and installing the patch.

If you think it's going to help, I'll try.

Are there any other log files you think I could check?


So there's something new that wasn't there the first time it happened.

 

From the storage pool log: "Disk overheat: Disk [Cache device 1] had reached 70°C, shutdown system now."

This really seems to be the culprit, right?

 

It doesn't make sense for the drive to get this hot, though.

I'll poke around in the case a bit. Maybe it really is getting this hot, but it shouldn't, since it's a well-ventilated case.

Could it be that the NAS is reading the temperatures incorrectly?
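
The storage pool log message quoted above has a regular shape, so it can be picked out of a log export automatically. A rough Python sketch, assuming the message format is exactly as quoted (the function name `parse_overheat` is hypothetical, and other DSM versions may word the message differently):

```python
import re

# Pattern built from the single message quoted in this thread; treat it as an assumption.
OVERHEAT = re.compile(
    r"Disk overheat: Disk \[(?P<disk>[^\]]+)\] had reached (?P<temp>\d+)°C"
)


def parse_overheat(line):
    """Return (disk name, temperature in °C) for an overheat message, else None."""
    match = OVERHEAT.search(line)
    if match is None:
        return None
    return match.group("disk"), int(match.group("temp"))


message = "Disk overheat: Disk [Cache device 1] had reached 70°C, shutdown system now."
print(parse_overheat(message))
```

Running this over every line of an exported log would show which cache device triggers the shutdown and at what reported temperature.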

 


I think the temperature readings are correct, which means I'll need some better cooling.

Thanks for the help.

 

I guess this solves the problem. It also explains why it would shut down cleanly after a bit.

 
