PnoT Posted September 6, 2015 #1 (edited)

I'm seeing slow performance from the SSDs in my SuperMicro chassis and can't pinpoint the source. Writes to the four-drive RAID 0 array have been in the sub-200MB/sec range for the entire array; today the numbers came up a bit, but they're still well below what I'd expect from a RAID 0 of SSDs on this setup. I've pieced together as much information as I can in this post in the hope that someone can help me diagnose the problem. One thing I couldn't find is a reliable way to determine the negotiated link speed of each drive; most of the commands I found on the net came back with "" (see the smartctl sketch after the drive info below).

Current Setup

X8SIL-F
Xeon X3450
16GB ECC RAM
SuperMicro SC846 chassis, SAS2 backplane
IBM M1015 flashed to an LSI 9211-8i in IT mode, running P19 firmware
Single cable from M1015 P0 to PRI_J0 on the backplane
DSM 5.2-5592 Update 3

Drive / RAID layout

8 x 4TB WD Red + 2 x 5TB WD Red in SHR
4 x 256GB Samsung 850 Pro in RAID 0

Drive Info (hdparm -I):

/dev/sdq:

ATA device, with non-removable media
        Model Number:       Samsung SSD 850 PRO 256GB
        Serial Number:      S1SUNSAFC81422B
        Firmware Revision:  EXM02B6Q
        Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
        Used: unknown (minor revision code 0x0039)
        Supported: 9 8 7 6 5
        Likely used: 9
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  268435455
        LBA48  user addressable sectors:  500118192
        Logical  Sector size:                   512 bytes
        Physical Sector size:                   512 bytes
        Logical Sector-0 offset:                  0 bytes
        device size with M = 1024*1024:      244198 MBytes
        device size with M = 1000*1000:      256060 MBytes (256 GB)
        cache/buffer size  = unknown
        Nominal Media Rotation Rate: Solid State Device
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, no device specific minimum
        R/W multiple sector transfer: Max = 1   Current = 1
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
                SET_MAX security extension
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    WRITE_{DMA|MULTIPLE}_FUA_EXT
           *    64-bit World wide name
                Write-Read-Verify feature set
           *    WRITE_UNCORRECTABLE_EXT command
           *    {READ,WRITE}_DMA_EXT_GPL commands
           *    Segmented DOWNLOAD_MICROCODE
           *    Gen1 signaling speed (1.5Gb/s)
           *    Gen2 signaling speed (3.0Gb/s)
           *    Gen3 signaling speed (6.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Phy event counters
           *    unknown 76[15]
           *    DMA Setup Auto-Activate optimization
                Device-initiated interface power management
           *    Asynchronous notification (eg. media change)
           *    Software settings preservation
                unknown 78[8]
           *    SMART Command Transport (SCT) feature set
           *    SCT LBA Segment Access (AC2)
           *    SCT Error Recovery Control (AC3)
           *    SCT Features Control (AC4)
           *    SCT Data Tables (AC5)
           *    reserved 69[4]
           *    DOWNLOAD MICROCODE DMA command
           *    SET MAX SETPASSWORD/UNLOCK DMA commands
           *    WRITE BUFFER DMA command
           *    READ BUFFER DMA command
           *    Data Set Management TRIM supported (limit 8 blocks)
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
                supported: enhanced erase
        2min for SECURITY ERASE UNIT. 2min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50025388a08df889
        NAA             : 5
        IEEE OUI        : 002538
        Unique ID       : 8a08df889
Checksum: correct
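Note that hdparm -I only lists the signaling speeds the drive supports (Gen1/Gen2/Gen3 above), not what was actually negotiated with the HBA. If smartctl on DSM can talk to the drives through the LSI's SCSI-ATA translation, something like this should print the negotiated speed per SSD. A rough sketch, untested on this box:

for d in /dev/sd[q-t]; do
    echo "== $d =="
    # Prints e.g. "SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)";
    # the value in parentheses is the negotiated link speed.
    smartctl -i "$d" | grep -i 'SATA Version'
done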
{ /volume3}-> dmesg | grep "Write cache"
[    8.714585] sd 0:0:15:0: [sdp] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    8.714698] sd 0:0:8:0: [sdi] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    8.715425] sd 0:0:9:0: [sdj] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    8.716042] sd 0:0:10:0: [sdk] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    8.716334] sd 0:0:12:0: [sdm] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    8.716425] sd 0:0:11:0: [sdl] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    8.716564] sd 0:0:14:0: [sdo] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    8.769484] sd 0:0:13:0: [sdn] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    8.772838] sd 0:0:5:0: [sdf] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    8.773529] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    8.773881] sd 0:0:6:0: [sdg] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    8.775287] sd 0:0:2:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    8.778207] sd 0:0:3:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    8.778531] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    8.779942] sd 0:0:4:0: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    8.790654] sd 0:0:7:0: [sdh] Write cache: enabled, read cache: enabled, supports DPO and FUA
[   66.494905] sd 7:0:0:0: [synoboot] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[236816.271293] sd 0:0:17:0: [sdq] Write cache: enabled, read cache: enabled, supports DPO and FUA
[236830.021295] sd 0:0:18:0: [sdr] Write cache: enabled, read cache: enabled, supports DPO and FUA
[236838.521456] sd 0:0:19:0: [sds] Write cache: enabled, read cache: enabled, supports DPO and FUA
[236927.771667] sd 0:0:20:0: [sdt] Write cache: enabled, read cache: enabled, supports DPO and FUA

The hdparm results per drive, and for the overall array, look right on the money, so why are the dd tests so horrible?

Disk /dev/sdq: 256GB
Disk /dev/sdr: 256GB
Disk /dev/sds: 256GB
Disk /dev/sdt: 256GB

hdparm -tT --direct /dev/sdr

/dev/sdr:
 Timing O_DIRECT cached reads:    946 MB in  2.00 seconds = 472.71 MB/sec
 Timing O_DIRECT disk reads:     1468 MB in  3.00 seconds = 489.13 MB/sec

hdparm -tT --direct /dev/sds

/dev/sds:
 Timing O_DIRECT cached reads:    966 MB in  2.00 seconds = 482.10 MB/sec
 Timing O_DIRECT disk reads:     1476 MB in  3.00 seconds = 491.79 MB/sec

hdparm -tT --direct /dev/sdt

/dev/sdt:
 Timing O_DIRECT cached reads:    962 MB in  2.00 seconds = 480.54 MB/sec
 Timing O_DIRECT disk reads:     1464 MB in  3.00 seconds = 487.95 MB/sec

hdparm -tT --direct /dev/sdq

/dev/sdq:
 Timing O_DIRECT cached reads:    964 MB in  2.00 seconds = 481.62 MB/sec
 Timing O_DIRECT disk reads:     1466 MB in  3.00 seconds = 488.58 MB/sec

hdparm -tT --direct /dev/vg3/volume_3

/dev/vg3/volume_3:
 Timing O_DIRECT cached reads:   2880 MB in  2.00 seconds = 1439.44 MB/sec
 Timing O_DIRECT disk reads:     4570 MB in  3.00 seconds = 1523.11 MB/sec

Here is the query on my LSI controller to determine its link speed, which looks like x8:

lspci -vvv -d 1000:0072

02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
        Subsystem: Device 1028:1f1c
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: I/O ports at c000 [size=11]
        Region 1: Memory at fb3b0000 (64-bit, non-prefetchable) [size=64K]
        Region 3: Memory at fb3c0000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at fb400000 [disabled] [size=1M]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [d0] Vital Product Data
                Unknown small resource type 00, will not decode more.
        Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [c0] MSI-X: Enable+ Count=15 Masked-
                Vector table: BAR=1 offset=0000e000
                PBA: BAR=1 offset=0000f800
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [138 v1] Power Budgeting <?>
        Kernel driver in use: mpt2sas
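To pull just the link fields out of that wall of output, a grep like this works:

lspci -vv -d 1000:0072 | grep -E 'LnkCap|LnkSta'

LnkCap and LnkSta both show Speed 5GT/s, Width x8, so the card negotiated its full PCIe 2.0 x8 link. That's roughly 4GB/s of usable bandwidth per direction after encoding overhead, so the slot itself shouldn't be what's capping the array.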
If you look at the performance of volume2, which consists of the WD Reds in SHR, it's not bad when using dd, and the speeds are very consistent. Keep in mind that this image only shows 7 of the 9 drives in the array, due to a limitation of the Resource Monitor tool, so if you add another 2 drives at ~90MB/sec, that's well over 1GB/sec.

dd if=/dev/zero of=/volume2/test.bin bs=1M count=500M

The same command on the RAID 0 of Samsung 850 Pros nets some pretty crappy results. If you look closely you can see huge swings in performance, from 50MB/sec to almost 200MB/sec per drive, which is completely the opposite of how the SHR spinning disks behave.

dd if=/dev/zero of=/volume3/test.bin bs=1M count=500M
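Side note on methodology: dd from /dev/zero with no direct or sync flags measures the page cache and writeback as much as the disks, so some of the swings could just be flushing behaviour. Something along these lines should give steadier numbers; a sketch, assuming DSM ships a GNU dd with the direct flags, and the sizes are arbitrary:

# Write test that bypasses the page cache.
dd if=/dev/zero of=/volume3/test.bin bs=1M count=4096 oflag=direct

# Alternative: let the cache work, but include the final flush in the timing.
dd if=/dev/zero of=/volume3/test.bin bs=1M count=4096 conv=fdatasync

# Raw sequential read from a single SSD (read-only, but double-check
# the device name before pointing anything at a raw device).
dd if=/dev/sdq of=/dev/null bs=1M count=4096 iflag=direct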
PnoT (Author) Posted September 7, 2015 #2

I updated the original post with a ton of information, as I felt it was lacking quite a bit initially.