TemplarB

HP Gen 8 problems with ‘System health’ etc


I’m a noob to NASes and thus what I’ll write below may sound stupid or inconsistent, but keep reading :)

I have an HP Gen8 with a bare-metal installation of DSM_RS3617xs_15152 and all critical updates up to DSM 6.1.3-15152 Update 8. The system has 4x 1.5 TB Samsung disks in RAID 5, with all SMART diagnostics showing healthy.

I installed the latest critical update yesterday and the system restarted normally (or at least without reporting any errors or warnings in the log). After about 12 hours of the system running smoothly I tried to open Download Station, but was asked to set it up from scratch (default folders for files, etc.). At the same time it was still seeding torrents (I checked the trackers), so I decided just to reboot the system.

The system still hadn’t rebooted after 10 minutes, so I did a hard reboot. After this DSM started without complaint (except for the warning ‘System booted up from an improper shutdown.’). Download Station started properly and without issues. However, the ‘System health’ widget shows no info, and when I try to open ‘Storage Manager’ it doesn’t show any disks and auto-closes (crashes?) a few seconds after showing a small message window with no text and an ‘OK’ button. See the screenshots below. Another reboot hasn’t helped: it still had to be done the hard way and there is still no info in ‘System health’, etc.

 

I can still access files and folders with Windows Explorer, disk station, audio and video station, so it seems there is no corruption. Logs are clear. Physically there are also no warning signs like whining HDDs.

What can it be and what can be done?

[Screenshots: hdd.png, sys-health.png]
Edited by TemplarB

8 minutes ago, sbv3000 said:

What update were you on before applying U8?

U7 for a week or so, w/o visible problems


Link to the same problem: https://xpenology.com/forum/topic/8066-dsm-613-15152-update-5/?page=2&tab=comments#comment-78231 but on U5, and one post below it even on U4. So it is probably not related to updates, or at least not to the latest updates.

 

I had the same problem a week ago on U7, but thought it was a HW issue.

 

The migration workaround will probably work, but it’s better to find the root cause, otherwise it might happen again and again...

Edited by DorianGray


Another observation: most such cases were reported on the HP MicroServer Gen8; the others didn’t specify their HW (i.e. it may also be Gen8).

Perhaps it’s a Gen8-specific issue?


It seems possible that the problem is specific to HPE Gen8.

The relevant part of dmesg:

[    8.332102] scsi 6:0:0:0: Direct-Access     HP iLO   Internal SD-CARD  2.10 PQ: 0 ANSI: 0
[    8.333991] sd 6:0:0:0: [synoboot] 7862272 512-byte logical blocks: (4.02 GB/3.74 GiB)
[    8.335357] sd 6:0:0:0: [synoboot] Write Protect is off
[    8.335361] sd 6:0:0:0: [synoboot] Mode Sense: 23 00 00 00
[    8.336618] sd 6:0:0:0: [synoboot] No Caching mode page found
[    8.336762] sd 6:0:0:0: [synoboot] Assuming drive cache: write through
[    8.342635] sd 6:0:0:0: [synoboot] No Caching mode page found
[    8.342779] sd 6:0:0:0: [synoboot] Assuming drive cache: write through
[    8.358119]  synoboot: synoboot1 synoboot2 synoboot3
[    8.364334] sd 6:0:0:0: [synoboot] No Caching mode page found
[    8.364495] sd 6:0:0:0: [synoboot] Assuming drive cache: write through
[    8.364656] sd 6:0:0:0: [synoboot] Attached SCSI disk
[   11.059264] md: Autodetecting RAID arrays.
[   11.097858] md: invalid raid superblock magic on sda3
[   11.097990] md: sda3 does not have a valid v0.90 superblock, not importing!
[   11.127825] md: invalid raid superblock magic on sdb3
[   11.127951] md: sdb3 does not have a valid v0.90 superblock, not importing!
[   11.166628] md: invalid raid superblock magic on sdc3
[   11.166753] md: sdc3 does not have a valid v0.90 superblock, not importing!
[   11.227293] md: invalid raid superblock magic on sdd3
[   11.227417] md: sdd3 does not have a valid v0.90 superblock, not importing!
[   11.227422] md: Scanned 12 and added 8 devices.
[   11.227423] md: autorun ...
[   11.227425] md: considering sda1 ...
[   11.227429] md:  adding sda1 ...
[   11.227432] md: sda2 has different UUID to sda1
[   11.227435] md:  adding sdb1 ...
[   11.227437] md: sdb2 has different UUID to sda1
[   11.227440] md:  adding sdc1 ...
[   11.227443] md: sdc2 has different UUID to sda1
[   11.227446] md:  adding sdd1 ...
[   11.227448] md: sdd2 has different UUID to sda1
[   11.227459] md: created md0
[   11.227461] md: bind<sdd1>
[   11.227477] md: bind<sdc1>
[   11.227486] md: bind<sdb1>
[   11.227494] md: bind<sda1>
[   11.227501] md: running: <sda1><sdb1><sdc1><sdd1>
[   11.227687] md/raid1:md0: active with 4 out of 12 mirrors
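
To pull just the skipped partitions out of a longer dmesg, something like the following could be used. It is only a sketch: skipped_md_partitions is a made-up helper name, and it assumes the message wording matches the lines above exactly.

```shell
# Read dmesg text on stdin and print each partition that md autodetect
# reported as "not importing" because it lacks a v0.90 superblock.
skipped_md_partitions() {
    sed -n 's/^.*md: \([a-z0-9]*\) does not have a valid v0\.90 superblock.*$/\1/p'
}

# Example: dmesg | skipped_md_partitions
```

On the log above this would list sda3, sdb3, sdc3 and sdd3, i.e. only the data partitions.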
 


Update on the topic.

As was mentioned above, a similar problem occurred more than once after critical updates on HP Gen8 [supposedly with ds3617, but maybe 3615 too]. My pet theory is that the problem is caused not by a specific update but by the server reboot that follows it: people most likely reboot their servers rarely, so while the updates that led to the failure differ (4, 5, 7 and 8 according to the forum history, and it happened only to some users, not all), the reboot is the common factor. Possibly an update changes some config files or the like; I’m not enough of a tech person to see the exact reason.

However, I assume that the problem starts when the system decides after reboot to autodetect RAID arrays. According to the Linux manuals this is an old feature that shouldn’t be used. It can detect only 0.90 superblocks, but I have a 1.2 superblock according to this:

cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [raidF1]
md2 : active raid5 sda3[0] sdd3[3] sdc3[2] sdb3[1]
      4380946368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3]
      2097088 blocks [12/4] [UUUU________]

md0 : active raid1 sda1[0] sdb1[1] sdc1[2] sdd1[3]
      2490176 blocks [12/4] [UUUU________]

unused devices: <none>

So it is not surprising that the log says:

sda3 does not have a valid v0.90 superblock, not importing!
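
For what it’s worth, the superblock version of every array can be read from mdstat in one go. As far as I know, /proc/mdstat prints a "super X.Y" field only for arrays that do not use the old 0.90 format, which would make md0 and md1 above 0.90 and md2 1.2. A minimal sketch under that assumption; md_superblock_versions is just a name I made up:

```shell
# Print "<array> <superblock version>" for each md array in an mdstat file.
# Assumption: a status line without a "super X.Y" field means the 0.90 format.
md_superblock_versions() {
    mdstat_file="${1:-/proc/mdstat}"
    awk '
        /^md[0-9]+ :/ { name = $1; next }      # remember the array name
        name != "" && /blocks/ {               # its status line follows
            version = "0.90"
            for (i = 1; i < NF; i++) {
                if ($i == "super") { version = $(i + 1) }
            }
            print name, version
            name = ""
        }
    ' "$mdstat_file"
}

# Example: md_superblock_versions            (reads /proc/mdstat)
#          md_superblock_versions saved.txt  (reads a saved copy)
```

If that reading is right, the "not importing" lines for sda3..sdd3 may simply be autodetect skipping a 1.2 array that gets assembled later anyway, since md2 does show up as active.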

 

Any thoughts?


Problem solved!

Ok, what happened and what people should try.

While I had the problems described in the first post, the system kind of worked, so I planned to migrate it at the weekend. Then on Friday I got a warning from Security Advisor that some DSM system files have incorrect hash values, more precisely libsynostoragemgmt.so.

The file had the same size and date as on another, perfectly working server, namely:

/usr/lib$ ls -l libsynostoragemgmt.so
-rw-r--r-- 1 root root 649896 Jul 13 01:51 libsynostoragemgmt.so


However, when I replaced it with the version from the working server, the system was able to reboot normally (previously only a hard reboot helped) and has been OK ever since.

So, if you have a similar problem:

1. Scan your system with Security Advisor

2. If there are wrong hashes, replace the files it names and check that their attributes are correct

3. Reboot
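
Note that matching size and date did not mean matching content in my case, so a byte-for-byte comparison against the known-good copy is the safer check, both before and after replacing a file. A minimal sketch; files_match is a hypothetical helper name, and it assumes cmp is available on your box:

```shell
# Return 0 only if both files exist and are byte-for-byte identical.
files_match() {
    [ -f "$1" ] && [ -f "$2" ] && cmp -s "$1" "$2"
}

# Example (the second path is a placeholder for wherever your good copy lives):
# files_match /usr/lib/libsynostoragemgmt.so /path/to/good/libsynostoragemgmt.so \
#     && echo "identical" || echo "DIFFERENT"
```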

 


I had another system failure. This time the GUI was out and, what was worse, volume1 showed no directories over SSH, so I wasn’t able to access my archive of usr/lib with the correct files. Here are the steps to restore the system:

 

Get PuTTY or another program to connect over SSH

Log in over SSH

Run dmesg to see the boot log and errors

 

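Get a working copy of usr/lib, either from someone with a working DSM or from your own archive. To make the archive, run this in PuTTY on the working DSM (the path may differ):

sudo zip -r9 /volume1/lib9.zip /usr/lib/

Unpack the archive where the broken system can read it (the command below assumes /volume1/usr/lib/), then list the files that differ:

sudo diff --brief -r /usr/lib/ /volume1/usr/lib/ | grep ' differ'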

 

You’ll get a list of bad files like:

 

Files /usr/lib/libsynoshare.so.6 and /volume1/usr/lib/libsynoshare.so.6 differ
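
If the list is long, the bare file names can be cut out of that output so they are easier to work through. A small sketch, assuming none of the file names contain spaces (true for the .so files here); differing_files is a made-up name and -rq is just the short form of --brief -r:

```shell
# Print the names (relative to the first directory) of files whose content differs.
# Files that exist on only one side are ignored.
differing_files() {
    live_dir="${1%/}"   # strip one trailing slash so the prefix matches diff output
    good_dir="${2%/}"
    # LC_ALL=C keeps diff messages in English so the awk match works.
    LC_ALL=C diff -rq "$live_dir" "$good_dir" |
        awk -v prefix="$live_dir/" '$1 == "Files" && $NF == "differ" {
            print substr($2, length(prefix) + 1)
        }'
}

# Example: sudo sh -c '. ./helpers.sh; differing_files /usr/lib /volume1/usr/lib'
# (helpers.sh is a placeholder for wherever you saved the function)
```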

 

Go to lib by running cd /usr/lib

 

Rename the bad file like so (you’ll be asked for the password again):

sudo mv libsynoshare.so.6 ~libsynoshare.so.6

 

Replace it with good copy:

sudo cp /volume1/usr/lib/libsynoshare.so.6 .

(note the dot at the end)

 

Set the permissions as needed:

sudo chmod 644 libsynoshare.so.6

 

Reboot:

sudo reboot
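
If several files differ, the rename / copy / chmod steps above can be wrapped in a small function and repeated per file before the reboot. A rough sketch that I would treat as untested on DSM itself: replace_lib is a made-up name, it has to run as root to touch /usr/lib, and the directories are passed in as arguments so you can try it on a scratch copy first.

```shell
# Replace one file with a known-good copy, keeping the old one as "~<name>".
# Arguments: <target dir> <known-good dir> <file name>
replace_lib() {
    target_dir="$1"
    good_dir="$2"
    name="$3"
    # Refuse to touch anything if the good copy is missing.
    if [ ! -f "$good_dir/$name" ]; then
        echo "no good copy of $name in $good_dir" >&2
        return 1
    fi
    # Keep the suspect file under a "~" prefix, as in the manual steps.
    if [ -e "$target_dir/$name" ]; then
        mv "$target_dir/$name" "$target_dir/~$name" || return 1
    fi
    cp "$good_dir/$name" "$target_dir/$name" || return 1
    chmod 644 "$target_dir/$name"
}

# Example (as root): replace_lib /usr/lib /volume1/usr/lib libsynoshare.so.6
```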

 

I hope it will help you as it helped me!

Edited by TemplarB
technical error