extraman

XPE in ESXi with RDM passthrough and SMART


Hi,

 

There has been a lot of discussion in the past about SMART support and ESXi RDMs.
So I did some deep research into how Synology DSM handles SMART. I'm sharing my results here, because I can confirm that SMART can be used inside a virtual machine using physical RDMs in ESXi.

 

- The first result is about SMART support inside virtual machines. Anyone can repeat the test by installing a recent Linux version inside a virtual machine and attaching a physical RDM disk to it. Inside this VM you can run “smartctl -a -d sat /dev/sda” and verify that it gets the SMART info from the hard disk connected to the first SCSI port of the virtual controller. However, three requirements must be met: the physical disk needs to be SATA; the virtual controller needs to be SCSI; and the RDM needs to be configured as physical (physical compatibility mode).

 

- The second result is about SMART support inside XPEnology running inside a virtual machine. For this I configured a complex (nested) environment: XPE inside a VM running in Proxmox, and Proxmox itself running inside a VM in ESXi. With this nested virtualization (very slow, by the way), the AHCI layer is virtualized in Proxmox on top of a physical RDM in ESXi. So the XPE VM sees an AHCI disk, not a SCSI disk. In this case the DSM UI shows SMART info for the disk. Unfortunately the data aren’t true values, as the Proxmox driver passes default hardcoded values. Still, this test demonstrates that XPE can receive SMART data while running inside a virtual machine.

 

- The third result comes from a kernel debug of XPE. I discovered that Synology uses smartmontools in library form. So instead of calling “smartctl” each time, for discovery and other tasks it uses the library compiled statically into its binaries. The files involved are “synostoraged” and “strace-synostoraged-disk”. Roughly, what DSM does to handle SMART info is: discover the type of disk, then use kernel calls to the controller driver to send SMART requests and read the answer.

 

So why is the SMART info of SATA disks fine when XPE is installed on bare metal, but not inside a virtual machine?

 

The answer is quite simple: the Synology strategy identifies the disk incorrectly. As a consequence, the SMART commands it uses can’t be handled by the RDM passthrough driver.

Let me explain this in more detail. To collect SMART info from a disk you need to use SMART commands, and these commands differ depending on the type of disk. For example, a SATA disk uses different commands than a SCSI disk. The issue is more complex still, because not only the disk is involved but the controller too: the controller's device driver can either pass the correct SMART commands to the disk, or translate them into a form the disk can handle. All of this is the inherent complexity that the smartmontools project deals with.
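
To make the difference concrete, here is a rough sketch that only builds the request bytes and sends nothing to any device. The SCSI side is a LOG SENSE CDB for the Informational Exceptions log page, and the ATA side is the register set for SMART READ DATA. The function names are made up for this illustration, and the field layout follows my reading of the SCSI and ATA command sets, so please check it against the specs before relying on it.

```python
import struct


def scsi_log_sense_cdb(page_code=0x2F, alloc_len=252):
    """Build a 10-byte SCSI LOG SENSE CDB (hypothetical helper).

    Page 0x2F is the Informational Exceptions log page, which is
    where a SCSI disk reports its health status.
    """
    page_control = 1  # cumulative values
    return struct.pack(
        ">BBBBBHHB",
        0x4D,                                   # LOG SENSE opcode
        0x00,                                   # SP and PPC bits cleared
        (page_control << 6) | (page_code & 0x3F),
        0x00,                                   # subpage code
        0x00,                                   # reserved
        0x0000,                                 # parameter pointer
        alloc_len,                              # allocation length
        0x00,                                   # control
    )


def ata_smart_read_data_regs():
    """Return the ATA register values for SMART READ DATA (hypothetical helper)."""
    return {
        "command": 0xB0,   # SMART
        "feature": 0xD0,   # SMART READ DATA subcommand
        "count": 1,        # one 512-byte sector comes back
        "lba_low": 0x00,
        "lba_mid": 0x4F,   # fixed SMART signature, low part
        "lba_high": 0xC2,  # fixed SMART signature, high part
        "device": 0x00,
    }


if __name__ == "__main__":
    cdb = scsi_log_sense_cdb()
    print("SCSI LOG SENSE CDB:", " ".join(f"{b:02x}" for b in cdb))
    print("ATA SMART READ DATA:", ata_smart_read_data_regs())
```

The two requests share nothing: a disk that only speaks ATA has no way to answer the first one unless some layer in between translates it.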

 

Knowing all of this, the question is: is it possible to support SMART over RDMs in a virtual machine running XPE? And the answer is: yes, it’s possible.

 

Here is my solution. The central point is the controller device driver, and the key is to put the code for a correct SMART command translation inside this driver. For the XPE project, the best candidate is the “pvscsi” device driver. Why? Because this driver isn’t native to DSM, and we are already compiling it for XPE. Furthermore, we can attach any RDM disk to this controller, and we also get the best performance with this paravirtual device. So, with the modified driver we would only need to load it with some added parameter (like “insmod vmw_pvscsi.ko --smart-translation”) and SMART support would be enabled!

 

To achieve this, the necessary work comes down to modifying the “pvscsi” device driver as follows. At present, DSM recognizes any RDM disk as “SCSI”, so the native DSM tools use SCSI commands for SMART, but the ESXi virtual controller can’t handle these commands because the target disk is SATA. In fact, the first IOCTL with HDIO_DRIVE_CMD always fails with “-1” when XPE tries to identify a virtual disk connected to a “pvscsi” controller with an RDM target. So the solution could be: intercept these commands and translate them into regular AHCI SATA commands for SMART. With this translation, the virtual controller will pass the commands to the physical target disk and the driver will receive the answer. The translation is transparent, and DSM will think it is using a regular SCSI disk with SMART support.
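
A minimal sketch of the kind of translation meant here, written in Python only to show the byte layout (the real work would be C inside the driver). It takes the 4-byte header of an HDIO_DRIVE_CMD request and wraps it in a 16-byte ATA PASS-THROUGH CDB, which is the form that “-d sat” relies on. The mapping is modeled on how I remember the Linux libata code handling this ioctl, so treat it as an assumption to verify against the kernel source; the function name is hypothetical.

```python
ATA_16 = 0x85           # ATA PASS-THROUGH (16) opcode
ATA_CMD_SMART = 0xB0    # ATA SMART command
SMART_LBA_MID = 0x4F    # fixed SMART signature bytes
SMART_LBA_HIGH = 0xC2


def drive_cmd_to_ata16(args):
    """Wrap an HDIO_DRIVE_CMD header in an ATA PASS-THROUGH (16) CDB.

    args is [command, sector_number, feature, sector_count].
    Hypothetical helper; layout assumed from libata and the SAT spec.
    """
    command, sector, feature, count = args
    cdb = bytearray(16)
    cdb[0] = ATA_16
    if count:
        cdb[1] = 4 << 1   # protocol: PIO data-in
        cdb[2] = 0x0E     # data from device, in blocks, length in count field
    else:
        cdb[1] = 3 << 1   # protocol: non-data
        cdb[2] = 0x20     # ask for the ATA status registers back
    cdb[4] = feature
    if command == ATA_CMD_SMART:
        cdb[6] = count
        cdb[8] = sector
        cdb[10] = SMART_LBA_MID
        cdb[12] = SMART_LBA_HIGH
    else:
        cdb[6] = sector
    cdb[14] = command
    return bytes(cdb)


if __name__ == "__main__":
    # SMART READ DATA: command 0xB0, feature 0xD0, one sector of data
    cdb = drive_cmd_to_ata16([0xB0, 0x00, 0xD0, 0x01])
    print(" ".join(f"{b:02x}" for b in cdb))
```

Whether the ESXi virtual controller forwards such a CDB to the RDM target is exactly what the “smartctl -a -d sat” test from the first result checks.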

 

However, although this solution is feasible, I don’t have enough time to implement it. Sorry! So I’m asking whether someone else would like to do it. The development can be done inside a regular Linux virtual machine, and it only touches the “vmw_pvscsi” device driver. The goal is to intercept SMART commands and provide the translation, and the kernel provides some useful libraries for this task. To test the result you only need to load the modified driver inside XPE running in a virtual machine, run “smartctl -a /dev/sda”, and get the same response as with “smartctl -a -d sat /dev/sda”. Once that is true, the driver is doing the correct translation and SMART will be available in a virtualized XPE. The test can be done on a regular Linux too, as this translation can be useful not only for XPE but for other similar projects.
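
The acceptance test described above could be automated roughly like this. It is a sketch under assumptions: smartctl is on the PATH, the script runs as root, and the only line expected to differ between the two runs is the timestamp. The helper names are made up, and in practice other lines may need to be ignored as well.

```python
import subprocess

# Lines that legitimately change between two runs (assumed list).
VOLATILE_PREFIXES = ("Local Time is:",)


def normalize(report):
    """Drop blank and time-dependent lines from a smartctl report."""
    kept = []
    for line in report.splitlines():
        line = line.rstrip()
        if not line or line.startswith(VOLATILE_PREFIXES):
            continue
        kept.append(line)
    return kept


def same_report(plain, sat):
    """True when both reports match after normalization."""
    return normalize(plain) == normalize(sat)


def run_smartctl(device, extra=()):
    """Run smartctl -a on a device and return its standard output."""
    cmd = ["smartctl", "-a", *extra, device]
    return subprocess.run(cmd, capture_output=True, text=True).stdout
```

With the modified driver loaded, `same_report(run_smartctl("/dev/sda"), run_smartctl("/dev/sda", ("-d", "sat")))` returning True would mean the plain call and the “-d sat” call see the same disk data.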

 

I hope all this helps to clarify the problem and that someone can finally implement a solution. In my opinion, running XPE inside a VM is preferable to running it on bare metal. But SMART support is a must-have for a NAS, and physical passthrough of a SATA controller isn’t possible in a lot of environments. So it would be very positive for the project to implement this solution.

 

Regards.
E.

 

Posted (edited)

Hi E.

Thanks for your post and explanation.

I created the RDM disk with vmkfstools -z <disk> <name.vmdk> and checked inside XPE whether I can read the SMART info with smartctl -a -d sat /dev/sd<x>, and I get the following output:

 

smartctl -a -d sat /dev/sde
smartctl 6.5 (build date Oct 26 2018) [x86_64-linux-4.4.59+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     VMware Virtual SATA Hard Drive
Serial Number:    10000000000000000001
LU WWN Device Id: 5 000c29 caf352b61
Firmware Version: 00000001
User Capacity:    21,474,836,480 bytes [21.4 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA/ATAPI-6 T13/1410D revision 0
Local Time is:    Tue Jan  8 19:05:03 2019 CET
SMART support is: Unavailable - device lacks SMART capability.

So, how were you able to get the SMART info from an RDM disk?

 

regards,

fa2k

Edited by fa2k
typos


If the necessary drivers (MPT2SAS etc.) are there (as with 1.03b/ds3617xs) and you use the ESXi LSI SAS controller, SMART works fine with no workarounds. On 1.04b/ds918+ the MPT2SAS support is missing, and the existing MPT3SAS driver doesn’t recognize or support the ESXi LSI SAS controller.


As a result of a bit of research on the matter I can add more clarity. 

 

As I understand it, the ESXi LSI Logic Parallel and the ParaVirtual controllers have no provision for translating SMART commands into something a SCSI drive can understand, while the LSI Logic SAS controller does have this capability. I’ve confirmed that SMART works fine on a ds3617xs VM as long as it’s using the LSI SAS controller. However, the SAS controller requires the mpt2sas/mptsas set of drivers, which are present in the 1.03b/ds3617xs combination but not in 1.04b/ds918+. The controller is not recognized by the mpt3sas driver, and therefore the ESXi SAS controller doesn’t work in a ds918+ VM.

 

Two possibilities I can think of: 1. inject mpt2sas support into the ds918+ image, or 2. somehow patch the Synology disk data collector daemon (which appears to issue SMART commands directly to the drives without using the smartctl utility).

 

As I understand it, as of DSM 6.2.2 module security has been turned on, preventing the use of new modules not compiled by Synology, so community-generated modules will no longer work once you update past 6.2.2. That’s something Jun mentioned, and I’d like to get confirmation that I understood it correctly.

30 minutes ago, wingspinner said:

As I understand it, the ESXi LSI Logic Parallel and the ParaVirtual controllers have no provision for translating SMART commands into something a SCSI drive can understand, while the LSI Logic SAS controller does have this capability. I’ve confirmed that SMART works fine on a ds3617xs VM as long as it’s using the LSI SAS controller. However, the SAS controller requires the mpt2sas/mptsas set of drivers, which are present in the 1.03b/ds3617xs combination but not in 1.04b/ds918+. The controller is not recognized by the mpt3sas driver, and therefore the ESXi SAS controller doesn’t work in a ds918+ VM.

 

Your discovery is consistent with this report here.  Note that 6.2.x doesn't like creating storage pools when the devices are attached to a SAS controller - but once they are built they can be moved to it and seem to work fine.  Probably not compatible with most users' expectations of reliability and supportability.

