PnoT

LSI SAS controller timeout issues with 5.2-5592.2


I'm on the latest and greatest (5.2-5592.2) and am seeing what look like timeouts when expanding a volume.

 

During these timeouts none of my volumes are accessible, and the NAS pretty much freezes until it's over, which takes about 10-45 seconds.

 

Aug 15 11:06:00 SYN kernel: [686987.368171] cdb[0]=0x28: 28 00 d3 8c 03 e0 00 03 80 00
Aug 15 11:06:02 SYN kernel: [686987.857817] cdb[0]=0x28: 28 00 d3 8c 07 60 00 00 80 00
Aug 15 11:06:02 SYN kernel: [686987.857847] cdb[0]=0x28: 28 00 d3 8c 07 e0 00 01 00 00
Aug 15 11:06:02 SYN kernel: [686987.857861] cdb[0]=0x28: 28 00 d3 8c 08 e0 00 00 80 00
Aug 15 11:06:02 SYN kernel: [686987.857874] cdb[0]=0x28: 28 00 d3 8c 09 60 00 02 80 00
Aug 15 11:06:02 SYN kernel: [686987.857887] cdb[0]=0x28: 28 00 d3 8c 0b e0 00 02 80 00
Aug 15 11:06:02 SYN kernel: [686987.857900] cdb[0]=0x28: 28 00 d3 8c 0e 60 00 02 80 00
Aug 15 11:06:02 SYN kernel: [686987.857914] cdb[0]=0x28: 28 00 d3 8c 10 e0 00 01 00 00
Aug 15 11:06:02 SYN kernel: [686987.857927] cdb[0]=0x28: 28 00 d3 8c 11 e0 00 00 80 00
Aug 15 11:06:02 SYN kernel: [686987.857941] cdb[0]=0x28: 28 00 d3 8c 12 60 00 00 80 00
Aug 15 11:06:02 SYN kernel: [686987.857954] cdb[0]=0x28: 28 00 d3 8c 12 e0 00 00 78 00
Aug 15 11:06:02 SYN kernel: [686987.857967] cdb[0]=0x28: 28 00 d3 8c 13 58 00 00 08 00
Aug 15 11:06:02 SYN kernel: [686987.857980] cdb[0]=0x28: 28 00 d3 8c 13 60 00 00 80 00
Aug 15 11:06:02 SYN kernel: [686987.857994] cdb[0]=0x28: 28 00 d3 8c 13 e0 00 01 00 00
Aug 15 11:06:02 SYN kernel: [686987.858037] cdb[0]=0x28: 28 00 d3 8c 14 e0 00 00 80 00
Aug 15 11:06:02 SYN kernel: [686987.858050] cdb[0]=0x28: 28 00 d3 8c 15 60 00 00 80 00
Aug 15 11:06:02 SYN kernel: [686987.858064] cdb[0]=0x28: 28 00 d3 8c 15 e0 00 02 80 00
Aug 15 11:06:02 SYN kernel: [686987.858077] cdb[0]=0x28: 28 00 d3 8c 18 60 00 00 80 00
Aug 15 11:06:02 SYN kernel: [686987.858091] cdb[0]=0x28: 28 00 d3 8c 18 e0 00 02 80 00
Aug 15 11:06:02 SYN kernel: [686987.858104] cdb[0]=0x28: 28 00 d3 8c 1b 60 00 00 80 00
Aug 15 11:06:02 SYN kernel: [686987.858117] cdb[0]=0x28: 28 00 d3 8c 1b e0 00 01 00 00
Aug 15 11:06:02 SYN kernel: [686987.858130] cdb[0]=0x28: 28 00 d3 8c 1c e0 00 01 00 00
Aug 15 11:06:02 SYN kernel: [686987.858144] cdb[0]=0x28: 28 00 d3 8c 1d e0 00 00 80 00
Aug 15 11:06:02 SYN kernel: [686987.858157] cdb[0]=0x28: 28 00 d3 8c 1e 60 00 00 80 00
Aug 15 11:06:02 SYN kernel: [686987.858170] cdb[0]=0x28: 28 00 d3 8c 1e e0 00 00 80 00
Aug 15 11:06:02 SYN kernel: [686987.858184] cdb[0]=0x28: 28 00 d3 8c 1f 60 00 00 80 00
Aug 15 11:06:02 SYN kernel: [686987.858197] cdb[0]=0x28: 28 00 d3 8c 1f e0 00 00 80 00
Aug 15 11:06:02 SYN kernel: [686987.858210] cdb[0]=0x28: 28 00 d3 8c 20 60 00 00 80 00
Aug 15 11:06:02 SYN kernel: [686987.858224] cdb[0]=0x28: 28 00 d3 8c 20 e0 00 00 80 00
Aug 15 11:06:02 SYN kernel: [686987.858237] cdb[0]=0x28: 28 00 d3 8c 21 60 00 01 00 00
Aug 15 11:06:02 SYN kernel: [686987.858250] cdb[0]=0x28: 28 00 d3 8c 22 60 00 00 80 00
Aug 15 11:06:02 SYN kernel: [686987.858263] cdb[0]=0x28: 28 00 d3 8c 22 e0 00 04 00 00
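For anyone trying to read those log lines: opcode 0x28 is a SCSI READ(10) command, so each `cdb[0]=0x28` entry is a stalled read. A minimal sketch of pulling the LBA and transfer length out of one of those CDBs (field offsets per the SCSI READ(10) layout; the function name is just for illustration):

```python
# Decode a SCSI READ(10) CDB like the ones in the kernel log above.
# READ(10) layout: byte 0 = opcode (0x28), bytes 2-5 = big-endian
# logical block address, bytes 7-8 = big-endian transfer length in blocks.

def decode_read10(cdb_hex: str) -> dict:
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    assert cdb[0] == 0x28, "not a READ(10) command"
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return {"lba": lba, "blocks": blocks}

# First CDB from the log: 28 00 d3 8c 03 e0 00 03 80 00
print(decode_read10("28 00 d3 8c 03 e0 00 03 80 00"))
# -> {'lba': 3549168608, 'blocks': 896}
```

The LBAs are all close together, which fits ordinary sequential reads getting stuck rather than any one bad sector.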

 

This is also showing up in the logs, and it looks like some type of timeout:

 

Aug 15 05:15:53 SYN kernel: [665966.766989] mpt2sas0: log_info(0x31120436): originator(PL), code(0x12), sub_code(0x0436)

 

It looks like the LSI SAS card is having issues.


 

Any ideas?


I've finally figured out what was happening, and it has to do with the 2TB Samsung F3s. These drives have been rock solid since the dawn of time, but for some odd reason they time out the controller under XPEnology. I've had no issues with them in my 1812/1815, so at this point I'm not sure whether it's the LSI driver that was recently updated in XPEnology or some odd incompatibility between the controller and the drives. The firmware on the controller and the drives is the latest, and the drives themselves show no failures.


The LSI log info code decodes to an IO Aborted issue. The error is most likely caused by a bad cable, but it can also be due to a bad port. I suggest checking the cables first, as they are cheaper to replace :smile:

 

If it's a SAS disk, you can look at the SAS counters (how you view them depends on the OS) and see on which cable/port/phy the issue exists.
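On a Linux box, one place those per-phy counters show up is the kernel's SAS transport class under /sys/class/sas_phy/. A hedged sketch of dumping them (attribute names are the standard sysfs ones; this returns an empty result on machines without SAS hardware):

```python
# Dump the per-phy SAS error counters exposed by the Linux SAS transport
# class. A counter climbing on one phy points at that cable/port.
from pathlib import Path

SAS_PHY_DIR = Path("/sys/class/sas_phy")
COUNTERS = [
    "invalid_dword_count",
    "running_disparity_error_count",
    "loss_of_dword_sync_count",
    "phy_reset_problem_count",
]

def sas_phy_errors() -> dict:
    """Return {phy name: {counter: value}}, or {} if no SAS phys exist."""
    result = {}
    if not SAS_PHY_DIR.is_dir():
        return result  # no SAS transport class on this machine
    for phy in sorted(SAS_PHY_DIR.glob("phy-*")):
        result[phy.name] = {
            c: int((phy / c).read_text()) for c in COUNTERS if (phy / c).exists()
        }
    return result

print(sas_phy_errors())
```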

 

I've written a tool to decode the LSI log info codes to help troubleshoot such problems. You can find the relevant links at:

* http://blog.disksurvey.org/knowledge-base/lsi-loginfo/ -- pre decoded list

* https://github.com/baruch/lsi_decode_loginfo -- command line tool that essentially generated that list and can help with the unknown codes
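For reference, the split into fields that the kernel already prints for that `log_info(0x31120436)` line can be sketched like this (bit layout as used by the mpt2sas driver; the function name is just for illustration):

```python
# Split an mpt2sas log_info value into the fields the kernel prints:
# bits 27:24 = originator, bits 23:16 = code, bits 15:0 = sub_code.

ORIGINATORS = {0x0: "IOP", 0x1: "PL", 0x2: "IR"}

def decode_loginfo(log_info: int) -> dict:
    return {
        "originator": ORIGINATORS.get((log_info >> 24) & 0x0F, "unknown"),
        "code": (log_info >> 16) & 0xFF,
        "sub_code": log_info & 0xFFFF,
    }

info = decode_loginfo(0x31120436)
print(f"originator({info['originator']}), "
      f"code({info['code']:#04x}), sub_code({info['sub_code']:#06x})")
# -> originator(PL), code(0x12), sub_code(0x0436)
```

What the code/sub_code pairs *mean* is the part LSI never fully published, which is where the pre-decoded list above comes in.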

 

LSI never openly documented all the codes, so the list I have is incomplete.

 

Hope this helps and let me know if you need some more help figuring it out.


 

Wow, thank you for helping out. I've bookmarked those sites for future use; that's pretty amazing.

 

My fix was to remove the Samsung drives, as there are known issues with them dropping out of RAID sets with LSI cards, and since then I haven't had a single problem. I will swap the cable out, try a different slot, and give the batch of drives another try.

 

I don't know which version you mean by "latest", but I strongly recommend NOT using the P20 firmware...

 

viewtopic.php?f=2&t=4985&p=29300

 

I should have been more specific and said I was on P19, as I've seen the issues revolving around P20, but thank you for pointing it out.
