XPEnology Community

HP MicroServer Gen8 - ESXi v6 - Disk Errors


Schumi


Hi Everyone.

 

First of all, XPEnology is a great, fun project. I hope it never ends, that Synology won't block us all, and that we can use future releases (DSM 6) on our custom systems.

I bought an HP MicroServer Gen8 a month ago. What a beautiful and useful little system. But I have some weird problems. Mine came with a G1640T and 4 GB of memory. After some trials I changed some parts:

 

Intel Xeon E3-1270 v2 - 4 x 3.5 GHz

2 x 8 GB Kingston PC3-10600 ECC

4 x 4TB Western Digital Red NAS Disks

1 x 240 GB SSD

AHCI Mode

 

At first I used the HP custom ESXi 6 Update 1 image from January 2016 on a MicroSD card. But sometimes when I rebooted the system it would not see the card. I know that is a card or iLO problem. Then I moved ESXi to a USB stick, but a couple of times the USB stick caused problems and I had to reinstall. Now I'm using another USB stick. At that stage I still had the G1640T and 4 GB of RAM.

 

Last week ESXi 6 Update 2 rolled out, and I made a fresh install on USB. It found my datastore on the 240 GB SSD. I created physical RDM files for my 4 x 4 TB disks and created a DSM 5.2 VM. So far so good. I had good speed (100-110 MB/s write over Samba) and I copied almost all the data from my old Zyxel NAS to DSM 5.2. Then I rebooted ESXi and DSM would not start. It said it could not lock a file, or something like that. After a couple of restarts it started, but then it froze. The VMware Embedded Web Client did not freeze, but I could not even stop the DSM VM. Even after a couple more restarts nothing changed. Then I looked at the ESXi logs and saw lots of errors.

 

2016-03-23T23:50:04.959Z cpu5:32796)ScsiDeviceIO: 2613: Cmd(0x439d80362680) 0x85, CmdSN 0x2f from world 35341 to dev "t10.ATA_____WDC_WD40EFRX2D68WT0N0_________________________WD2DWCC4E4XPHS2P" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x0 0x0.
2016-03-23T23:50:04.964Z cpu5:32796)ScsiDeviceIO: 2613: Cmd(0x439d80362680) 0x85, CmdSN 0x2f from world 35341 to dev "t10.ATA_____WDC_WD40EFRX2D68WT0N0_________________________WD2DWCC4E4XPHS2P" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x0 0x0.
2016-03-23T23:50:05.022Z cpu5:32796)ScsiDeviceIO: 2613: Cmd(0x439d80362680) 0x85, CmdSN 0x2f from world 35341 to dev "t10.ATA_____WDC_WD40EFRX2D68WT0N0_________________________WD2DWCC4E4XPHS2P" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x0 0x0.
2016-03-23T23:50:05.035Z cpu5:32796)ScsiDeviceIO: 2613: Cmd(0x439d80362680) 0x85, CmdSN 0x2f from world 35341 to dev "t10.ATA_____WDC_WD40EFRX2D68WT0N0_________________________WD2DWCC4E4XPHS2P" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x0 0x0.
2016-03-23T23:50:05.063Z cpu5:32796)ScsiDeviceIO: 2613: Cmd(0x439d80362680) 0x85, CmdSN 0x2f from world 35341 to dev "t10.ATA_____WDC_WD40EFRX2D68WT0N0_________________________WD2DWCC4E4XPHS2P" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x0 0x0.
2016-03-23T23:50:05.076Z cpu5:32796)ScsiDeviceIO: 2613: Cmd(0x439d80362680) 0x85, CmdSN 0x2f from world 35341 to dev "t10.ATA_____WDC_WD40EFRX2D68WT0N0_________________________WD2DWCC4E4XPHS2P" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x0 0x0.
2016-03-23T23:50:05.106Z cpu5:32796)ScsiDeviceIO: 2613: Cmd(0x439d80362680) 0x85, CmdSN 0x2f from world 35341 to dev "t10.ATA_____WDC_WD40EFRX2D68WT0N0_________________________WD2DWCC4E4XPHS2P" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x0 0x0.
2016-03-23T23:50:05.115Z cpu5:32796)ScsiDeviceIO: 2613: Cmd(0x439d80362680) 0x85, CmdSN 0x2f from world 35341 to dev "t10.ATA_____WDC_WD40EFRX2D68WT0N0_________________________WD2DWCC4E4XPHS2P" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x0 0x0.
2016-03-23T23:50:05.157Z cpu5:32796)ScsiDeviceIO: 2613: Cmd(0x439d80362680) 0x85, CmdSN 0x2f from world 35341 to dev "t10.ATA_____WDC_WD40EFRX2D68WT0N0_________________________WD2DWCC4E4XPHS2P" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x0 0x0.
2016-03-23T23:50:05.161Z cpu5:32796)ScsiDeviceIO: 2613: Cmd(0x439d80362680) 0x85, CmdSN 0x2f from world 35341 to dev "t10.ATA_____WDC_WD40EFRX2D68WT0N0_________________________WD2DWCC4E4XPHS2P" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x0 0x0.

 

I get these errors for all of my hard drives except the SSD. I even see errors for the USB stick, which is only used for booting. Even if I delete the RDM files and just restart ESXi without starting any VM, I still see these errors. After lots of errors (around 2000) it starts resetting the AHCI ports. I tried switching to B120i RAID mode and creating a RAID 0 for each drive, but I see similar "failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x0 0x0." errors. Not as many as in AHCI mode, but they are still there. I can't be sure whether the problem is the second-hand Xeon CPU, the new RAM, or ESXi, because even when I remove all the HDDs I still see these errors for the USB stick.
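To see which devices produce the errors and how often, the log can be tallied per device. The following is a minimal Python sketch, not an ESXi tool: the regular expression is an assumption derived only from the log lines quoted in this thread, and `count_failures`, `DISK`, `USB` and `SAMPLE` are names made up for the illustration.

```python
import re
from collections import Counter

# Device name and H/D/P status triple, as they appear in the vmkernel
# lines quoted in this thread (ScsiDeviceIO and NMP formats).
# Assumption: other ESXi builds may format these lines differently.
FAILURE_RE = re.compile(
    r'to dev "(?P<dev>[^"]+)".*?failed:? '
    r'H:(?P<host>0x[0-9a-f]+) D:(?P<device>0x[0-9a-f]+) P:(?P<plugin>0x[0-9a-f]+)',
    re.IGNORECASE,
)


def count_failures(lines):
    """Return a Counter mapping device name to number of failed commands."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


# Sample input built from lines quoted in the thread.
DISK = "t10.ATA_____WDC_WD40EFRX2D68WT0N0_________________________WD2DWCC4E4XPHS2P"
USB = "mpx.vmhba32:C0:T0:L0"
SAMPLE = [
    f'2016-03-23T23:50:04.959Z cpu5:32796)ScsiDeviceIO: 2613: Cmd(0x439d80362680) 0x85, CmdSN 0x2f from world 35341 to dev "{DISK}" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x0 0x0.',
    f'2016-03-23T23:50:04.964Z cpu5:32796)ScsiDeviceIO: 2613: Cmd(0x439d80362680) 0x85, CmdSN 0x2f from world 35341 to dev "{DISK}" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x0 0x0.',
    f'2016-03-24T00:01:01.316Z cpu5:32796)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x9e (0x439d80429b80, 0) to dev "{USB}" on path "vmhba32:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE',
]

if __name__ == "__main__":
    for dev, n in count_failures(SAMPLE).most_common():
        print(n, dev)
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example a copy of the host's vmkernel log saved locally (the file name is up to you).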

 

2016-03-24T00:01:01.316Z cpu5:32796)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x9e (0x439d80429b80, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
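The hex fields in these lines are standard SCSI codes, so they can be decoded into names. Below is a small Python sketch that does this for the two kinds of line quoted in this thread. The lookup tables contain only a handful of well-known SCSI values, the regular expression is an assumption based on these two lines, and `decode`, `SCSI_LINE` and `NMP_LINE` are illustrative names.

```python
import re

# Small subsets of the standard SCSI tables, enough for the lines in this thread.
OPCODES = {0x85: "ATA PASS-THROUGH (16)", 0x9E: "SERVICE ACTION IN (16)"}
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY"}
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0xB: "ABORTED COMMAND",
}
ADDITIONAL_SENSE = {
    (0x00, 0x00): "NO ADDITIONAL SENSE INFORMATION",
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
}

# Assumption: the opcode follows "Cmd", optionally after a "(0x...)" handle,
# and the three sense values are sense key, ASC and ASCQ, in that order.
LINE_RE = re.compile(
    r"Cmd(?:\(0x[0-9a-f]+\))? (?P<op>0x[0-9a-f]+).*?"
    r"D:(?P<status>0x[0-9a-f]+) P:0x[0-9a-f]+ "
    r"Valid sense data: (?P<key>0x[0-9a-f]+) (?P<asc>0x[0-9a-f]+) (?P<ascq>0x[0-9a-f]+)",
    re.IGNORECASE,
)


def decode(line):
    """Translate the SCSI fields of one log line into names, or return None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    op, status, key, asc, ascq = (
        int(m.group(name), 16) for name in ("op", "status", "key", "asc", "ascq")
    )
    return {
        "command": OPCODES.get(op, f"unknown opcode {op:#04x}"),
        "status": DEVICE_STATUS.get(status, f"unknown status {status:#04x}"),
        "sense_key": SENSE_KEYS.get(key, f"unknown sense key {key:#x}"),
        "additional_sense": ADDITIONAL_SENSE.get(
            (asc, ascq), f"ASC/ASCQ {asc:#04x}/{ascq:#04x}"
        ),
    }


SCSI_LINE = '2016-03-23T23:50:04.959Z cpu5:32796)ScsiDeviceIO: 2613: Cmd(0x439d80362680) 0x85, CmdSN 0x2f from world 35341 to dev "t10.ATA_____WDC_WD40EFRX2D68WT0N0_________________________WD2DWCC4E4XPHS2P" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x0 0x0.'
NMP_LINE = '2016-03-24T00:01:01.316Z cpu5:32796)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x9e (0x439d80429b80, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE'

if __name__ == "__main__":
    for line in (SCSI_LINE, NMP_LINE):
        print(decode(line))
```

Decoded this way, the hard drive lines are ATA pass-through commands that ended in ABORTED COMMAND, and the USB stick line is a command the device rejected as an invalid operation code. Opcode 0x85 is what tools use to send raw ATA commands (SMART queries, for example) through a SCSI layer, so these lines may be rejected pass-through requests rather than failed reads or writes. That is one possible reading, not something this thread confirms.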

 

For now I'm booting the XPEnology ISO directly on the real hardware. But I really want ESXi, and I don't want to go the Hyper-V route. I know lots of people use the same MicroServers with ESXi + XPEnology. Has anyone seen similar problems? Can anyone check their vmkernel logs for me? I will try reinstalling ESXi 6 Update 2 on another USB stick.

 

By the way, I just installed the latest iLO 2.4, which is not released yet. The HP site shows 1 April as the release date. If anyone is interested, here is the link.

 


Hi,

 

We have almost the same setup.

 

Do you have the 40 degree Celsius bug, where iLO always shows 40 degrees Celsius for the CPU temperature?

 

I also have some issues but not the same one you are experiencing.

Mine are more related to volumes crashing even though I have had the same setup running in the past without any issues.

Mine started when I upgraded to the new XPEnoboot, but even after I went back I still have the same issues.

BTW, I use VMDK files, not RDM.

 

I am now running on a new host (HP ML310e Gen8 v2) without RAID, just a couple of big volumes, to see if that stays intact.


Yes, I think I have the same 40 degree bug.

 

Today I removed my SSD, set the SATA controller to B120i RAID, and created 4 x RAID 0, one for each hard drive. Then I mounted the XPEnology ISO through iLO.

So I booted XPEnology on the real hardware. When I tried to log on via the browser, it wanted to reinstall. Not just install: reinstall.

This means it finally found the volume, but something was not right. I reinstalled with the DSM 5.2 .pat file, and that fixed it.

Then I connected my SSD to the 5th (ODD) SATA port and created another RAID 0 for the SSD. I put the ESXi USB stick back into the internal USB port and booted ESXi.

I created the DSM virtual machine, created RDM files for each disk via SSH, and added these disks to the DSM virtual machine in the same order as the iLO storage page shows them.
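For reference, the RDM step is typically done on the ESXi shell with `vmkfstools`, where `-z` creates a physical compatibility RDM pointer and `-r` a virtual compatibility one. The Python sketch below only builds the command strings so they can be reviewed before pasting them into the SSH session. The datastore path `/vmfs/volumes/datastore1/DSM`, the `rdm_disk<N>.vmdk` file names and the example device IDs are hypothetical and not taken from this thread.

```python
import shlex

# Directory where ESXi exposes raw disk devices.
DISK_DIR = "/vmfs/devices/disks"


def rdm_commands(device_ids, vm_dir, physical=True):
    """Build one vmkfstools command per disk, in the order given.

    physical=True  -> "-z" (physical compatibility RDM)
    physical=False -> "-r" (virtual compatibility RDM)
    """
    flag = "-z" if physical else "-r"
    commands = []
    for index, device in enumerate(device_ids, start=1):
        source = f"{DISK_DIR}/{device}"
        # Hypothetical naming scheme for the RDM pointer files.
        pointer = f"{vm_dir}/rdm_disk{index}.vmdk"
        commands.append(
            " ".join(["vmkfstools", flag, shlex.quote(source), shlex.quote(pointer)])
        )
    return commands


if __name__ == "__main__":
    # Hypothetical device IDs; real ones are listed under /vmfs/devices/disks.
    disks = ["t10.EXAMPLE_DISK_1", "t10.EXAMPLE_DISK_2"]
    for command in rdm_commands(disks, "/vmfs/volumes/datastore1/DSM"):
        print(command)
```

Generating the commands from an ordered list keeps the pointer file numbering aligned with the disk order, which matters here because the disks are added to the VM in the same order as iLO shows them.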

I mounted the XPEnology ISO in the DSM virtual machine and booted the VM. But it shows install, not reinstall. I think this means it did not find the disks, or the volume on the disks.

If I click install, will I lose my volume? How can I check the disks and volumes when DSM boots?


By the way, I just installed the latest iLO 2.4, which is not released yet. The HP site shows 1 April as the release date. If anyone is interested, here is the link.

 

 

You might not know this, but the iLO update only updates the iLO chip, not the BIOS. For that you need the Service Pack for ProLiant (SPP).

 

 

You can get the latest version here (2015.10.0): ftp://ftp.hp.com/pub/softlib2/software1 ... 8/v113584/

 

Burn the ISO to a USB stick or DVD and boot it, select the auto update, and the software on the ISO does the rest.

The SPP contains many fixes for all sorts of issues, including for ESXi:

(Revision) VMware - Recovered Paths Are Not Restored When Using an HPE Smart Array Controller Connected to a Shared Storage RAID Enclosure on VMware ESXi 5.0, ESXi 5.1, ESXi 5.5 or ESXi 6.0

Source: http://h17007.www1.hp.com/us/en/enterpr ... x#tab=TAB2


Did you try passing the SATA controller through so the VM can access the disks directly, instead of using RDM?

I have a similar setup and don't pass through the controllers. I just create a VMFS datastore and create virtual disks, which I then attach to the XPEnology VMs. It has been performing well and stable for a very long time.


I migrated from an N54L to a Gen8 and the first month or so was hell because of this. I ended up losing the majority of my data trying to resolve it.

The basics of it are: when the RAID controller in the Gen8 is configured for AHCI mode, it will not cooperate with ESXi, which causes port resets and crashes the VMs (and sometimes the whole system).

The agreed workaround is to enable the RAID controller and, if you want to add a single drive, configure it as an array consisting of a single disk in RAID 0 mode (a single disk as RAID 0 is treated as passthrough).

I have 5 single-disk RAID 0 arrays. 4 of these are passed through as single disks to ESXi 6, and since I configured it this way I have not had a single crash or error.


 

You might not know this, but the iLO update only updates the iLO chip, not the BIOS. For that you need the Service Pack for ProLiant (SPP).

 

 

You can get the latest version here (2015.10.0): ftp://ftp.hp.com/pub/softlib2/software1 ... 8/v113584/

 

Burn the ISO to a USB stick or DVD and boot it, select the auto update, and the software on the ISO does the rest.

The SPP contains many fixes for all sorts of issues, including for ESXi:

(Revision) VMware - Recovered Paths Are Not Restored When Using an HPE Smart Array Controller Connected to a Shared Storage RAID Enclosure on VMware ESXi 5.0, ESXi 5.1, ESXi 5.5 or ESXi 6.0

Source: http://h17007.www1.hp.com/us/en/enterpr ... x#tab=TAB2

 

I know the iLO update covers only iLO and does not include the BIOS, but my brand new unit already shipped with the latest J06 BIOS (2015.07.16). I have also already run the SPP on my system. But I just realized a new SPP has come out: 2016.04.0 http://h17007.www1.hp.com/us/en/enterpr ... x#tab=TAB1

 

Did you try passing the SATA controller through so the VM can access the disks directly, instead of using RDM?

I have a similar setup and don't pass through the controllers. I just create a VMFS datastore and create virtual disks, which I then attach to the XPEnology VMs. It has been performing well and stable for a very long time.

 

I thought about that. But if I pass through the whole B120i RAID controller or AHCI controller, then I also pass through my SSD. Then I have no other drive for the ESXi datastore, which also holds the XPEnology VM.

I migrated from an N54L to a Gen8 and the first month or so was hell because of this. I ended up losing the majority of my data trying to resolve it.

The basics of it are: when the RAID controller in the Gen8 is configured for AHCI mode, it will not cooperate with ESXi, which causes port resets and crashes the VMs (and sometimes the whole system).

The agreed workaround is to enable the RAID controller and, if you want to add a single drive, configure it as an array consisting of a single disk in RAID 0 mode (a single disk as RAID 0 is treated as passthrough).

I have 5 single-disk RAID 0 arrays. 4 of these are passed through as single disks to ESXi 6, and since I configured it this way I have not had a single crash or error.

 

I'm already doing this: 5 RAID 0 arrays, one for each disk. But my disks contain data. I know that if I had other large-capacity disks or another NAS, I could back up all my data and start from scratch with the B120i, a separate RAID 0 for each disk, and RDM for all disks, and it would work. But my concern is this: if I can't use my AHCI disks as RAID 0 arrays on ESXi, the reverse will also fail. If my MicroServer Gen8 breaks, I want my RAID 0 disks to be usable on another system. I want to know why my disks, which were previously attached via AHCI and contain data, are not truly passed through to XPEnology via B120i RAID 0 arrays and RDM.

 

An alternative would be to install a PCI-E HBA card and run the disks off that.

 

I don't want to use a PCI-E HBA card in this system. I will add an NVIDIA card for CUDA.
