
Physical RDM with ESXi: queue_depth problem with PVSCSI driver



Hi,

 

I need your help to fix this problem: I'm using the Physical RDM (pRDM) functionality of ESXi to pass SATA disks directly to DSM. I discovered that with this pass-through I can read the real SMART values using the "smartctl -d sat" mode. You can check the difference by calling smartctl twice on the same disk, once with "-d sat" and once with "-d ata". If the disk is a SATA disk connected through the PVSCSI controller as pRDM, you will see the difference between the "fake" SMART values (-d ata) and the "real" SMART values (-d sat).

 

The problem is the QUEUE_DEPTH of the disks connected in this mode. You can see the values with "lsscsi -l" (if you have the tool installed) or with "cat /sys/block/sd*/device/queue_depth". The queue_depth values you can obtain are:

  • Virtual VMDK disk over PVSCSI: 31 (OK)
  • pRDM SATA disk over SATA: 31 (OK, but with troubles)
  • pRDM SATA disk over PVSCSI: 1 (INCORRECT)

 

If you compare the performance, you will see that with pRDM disks on the PVSCSI controller inside DSM the write performance is low (~50%). With the SATA (AHCI) controller the performance of a single disk is OK, but the latency increases a lot (~500%). Furthermore, with multiple pRDM disks on the SATA controller the hypervisor can crash with a purple screen of death. Therefore, the best option is pRDM over PVSCSI in any case (or vRDM if you don't need SMART).

 

I'm now trying to "fix" the queue_depth value. I've been reading the kernel sources looking for the root cause that forces the value to 1. So far I suspect that the internal scsi_device struct has "tagged_supported" set to 0, because that forces the qdepth to 1 in the "vmw_pvscsi.c" driver, in the function pvscsi_change_queue_depth(). I mention this because executing "echo 31 > /sys/block/sd*/device/queue_depth" doesn't change the value.
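For reference, the clamp I'm talking about looks roughly like this in the 3.x-era driver (simplified from memory, so check the exact DSM kernel tree before trusting the details):

static int pvscsi_change_queue_depth(struct scsi_device *sdev,
				     int qdepth, int reason)
{
	int max_depth = sdev->host->can_queue;

	/* only the default reason is supported by this driver */
	if (reason != SCSI_QDEPTH_DEFAULT)
		return -EOPNOTSUPP;

	/* this is the clamp: without tagged_supported the depth
	 * can never go above 1, no matter what you echo to sysfs */
	if (!sdev->tagged_supported)
		max_depth = 1;
	if (qdepth > max_depth)
		qdepth = max_depth;

	scsi_adjust_queue_depth(sdev, scsi_get_tag_type(sdev), qdepth);
	return sdev->queue_depth;
}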

 

So I'm asking for help to find a solution.

Any idea to debug the DSM kernel?

Any help with this problem?

I should mention that when booting a recent Linux LiveCD on the same VM, with the same disk configuration, the queue depth is correct (31). Therefore, this is not a fixed/required value and the problem can be "fixed".

 


Hi,

 

Some news after a lot of debugging of the Linux kernel:

 

- When using RDMs with the PVSCSI controller, the SCSI INQUIRY response differs depending on whether the connection is physical or virtual:

- With pRDM (passthrough, with SMART support for a SATA disk) the INQUIRYDATA Byte[7] is 49 (0b00110001).

- With vRDM (the same disk connected "virtually", without SMART support) the INQUIRYDATA Byte[7] is 114 (0b01110010).

 

This struct shows the layout of the INQUIRYDATA message:

typedef struct _INQUIRYDATA {
  UCHAR              DeviceType : 5;
  UCHAR              DeviceTypeQualifier : 3;
  UCHAR              DeviceTypeModifier : 7;
  UCHAR              RemovableMedia : 1;
  UCHAR              Versions;
  UCHAR              ResponseDataFormat : 4;
  UCHAR              HiSupport : 1;
  UCHAR              NormACA : 1;
  UCHAR              ReservedBit : 1;
  UCHAR              AERC : 1;
  UCHAR              AdditionalLength;
  UCHAR              Reserved[2];

  /* Byte[7]: the flag byte quoted above; CommandQueue (CmdQue) is bit 1 */
  UCHAR              SoftReset : 1;
  UCHAR              CommandQueue : 1;
  UCHAR              Reserved2 : 1;
  UCHAR              LinkedCommands : 1;
  UCHAR              Synchronous : 1;
  UCHAR              Wide16Bit : 1;
  UCHAR              Wide32Bit : 1;
  UCHAR              RelativeAddressing : 1;

  UCHAR              VendorId[8];
  UCHAR              ProductId[16];
  UCHAR              ProductRevisionLevel[4];
  UCHAR              VendorSpecific[20];
  UCHAR              Reserved3[2];
  VERSION_DESCRIPTOR VersionDescriptors[8];
  UCHAR              Reserved4[30];
} INQUIRYDATA, *PINQUIRYDATA;

 

You can see here that with pRDM the CommandQueue flag is FALSE, while with vRDM it is TRUE.
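Just to make the bit arithmetic explicit, here is a tiny stand-alone check of the two observed values (remember that CommandQueue is bit 1 of Byte[7]):

#include <stdio.h>

int main(void)
{
	unsigned char prdm = 49;   /* 0b00110001, observed with pRDM */
	unsigned char vrdm = 114;  /* 0b01110010, observed with vRDM */

	/* CommandQueue (CmdQue) is bit 1 of INQUIRY byte 7 */
	printf("pRDM CmdQue: %d\n", (prdm >> 1) & 1);  /* prints 0 */
	printf("vRDM CmdQue: %d\n", (vrdm >> 1) & 1);  /* prints 1 */
	return 0;
}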

 

My assumption is that ESXi clears the CmdQue flag for pRDM because it doesn't probe the type of the connected disk (to stay on the safe side). But a SATA disk with NCQ support can handle multiple queued commands, and in fact it works: on a recent Linux kernel (>3.x) the queue_depth can be set anywhere between 1 and 31, and performance only degrades when you set it to 1.

 

However, inside XPEnology it seems that DSM enforces a queue_depth of 1 because it checks the "tagged_supported" field of the "scsi_device" structure. This field is set to 1 only if the INQUIRYDATA has the CommandQueue flag set. You can check the source file "scsi_scan.c" in the kernel, search for the function "scsi_add_lun()", and you will see something like:

 

	if ((sdev->scsi_level >= SCSI_2) && (inq_result[7] & 2) &&
	    !(*bflags & BLIST_NOTQ)) {
		sdev->tagged_supported = 1;
		sdev->simple_tags = 1;
	}
    

 

Therefore, my idea is one of these:

- In the redpill-lkm module, create a new shim function that filters this INQUIRYDATA response and turns the CmdQue flag on. Perhaps it could even be done automatically (checking the disk, and only if the controller is PVSCSI). A rough sketch is shown after this list.

- Another option is to modify the PVSCSI driver to do similar filtering. Perhaps that makes more sense, because only users of this controller need it. Then, with a boot parameter (like pvscsi=cq:2,3,5), you could manually enable the filtering for the disks that you are sure have NCQ support.
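To illustrate the first idea, this is a minimal sketch of the kind of filtering I have in mind. It assumes a hypothetical hook (called shim_filter_inquiry() here) that redpill-lkm would invoke with the completed INQUIRY data buffer; the hook name and the way it gets called are my assumptions, not existing redpill-lkm code:

#include <stdbool.h>

#define INQ_CMDQUE_BYTE  7     /* INQUIRY byte that carries the CmdQue flag */
#define INQ_CMDQUE_BIT   0x02  /* CmdQue is bit 1 of that byte              */

/* Hypothetical hook: flip the CmdQue bit on for a pRDM SATA disk behind
 * the PVSCSI controller, so the kernel treats it as queue-capable.      */
static void shim_filter_inquiry(unsigned char *inq, unsigned int len,
                                bool disk_has_ncq)
{
	if (len <= INQ_CMDQUE_BYTE || !disk_has_ncq)
		return;

	/* Pretend the device supports command queuing so that
	 * scsi_add_lun() later sets sdev->tagged_supported = 1. */
	inq[INQ_CMDQUE_BYTE] |= INQ_CMDQUE_BIT;
}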

What do you think?

 

And why do I want this? I'm trying to enable full SMART support over PVSCSI. I've read a lot and I think I can add some new code to the SCSI shim that uses the SAT protocol to pass the SMART commands through, instead of the clumsy fake emulation used now. But before implementing that, the queue_depth problem needs to be solved, because it severely limits performance over the pRDM connection. If we can fix it, then implementing full SMART passthrough afterwards will make sense, because the performance will be the same whether the connection is virtual or physical.
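For context, the SAT path boils down to wrapping the ATA SMART command in an ATA PASS-THROUGH (16) CDB (opcode 0x85). The sketch below only shows how such a CDB could be built for SMART READ DATA; how the shim would actually submit it to the device is left out, and the helper name is my own:

#include <string.h>

/* Sketch only: build a SAT ATA PASS-THROUGH (16) CDB that wraps the
 * ATA SMART READ DATA command (one 512-byte block of attributes).   */
static void build_smart_read_data_cdb(unsigned char cdb[16])
{
	memset(cdb, 0, 16);

	cdb[0]  = 0x85;      /* ATA PASS-THROUGH (16)                    */
	cdb[1]  = 4 << 1;    /* protocol 4 = PIO Data-In                 */
	cdb[2]  = 0x0e;      /* t_dir = from device, byte_block = 1,
	                        t_length = use the sector count field    */
	cdb[4]  = 0xd0;      /* FEATURES: SMART READ DATA                */
	cdb[6]  = 1;         /* SECTOR COUNT: one data block             */
	cdb[10] = 0x4f;      /* LBA mid:  SMART signature                */
	cdb[12] = 0xc2;      /* LBA high: SMART signature                */
	cdb[14] = 0xb0;      /* ATA COMMAND: SMART                       */
}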

 

Do you want to help me do it?

 
