snoopy78

WARNING - do not flash LSI-9201-16i HBA with P20


Hi all,

 

just a warning for those who, like me, use the LSI 9201-16i HBA.

 

DO NOT flash the latest firmware, P20, as there is a bug in it (LSI confirmed this AFTER my system crashed).

Use of the P19 FW seems to be fine and is recommended by LSI.

 

The issue shows up as random I/O errors reported on your drives, so at some point DSM will drop all volumes and mark the drives as faulty, which makes a normal recovery via the web GUI impossible.

Manual recovery via the CLI is needed and can be successful.

 

BR

snoopy78


C'mon.

Flashed my 9211-8i to P20 yesterday and after reboot, volume 1 was crashed. 3 disks show crashed and 2 not initialized.

How can I recover this?

And why didn't LSI pull the firmware if it is faulty?


I didn't do it by myself... luckily one of my colleagues was able to help me ^^

 

My setup for each volume is max. 4 drives using SHR, so we had to do the following:

(One of my disaster scenarios was to have a spare system with an HBA and XPEnology available, which we used, but the original system should be fine too.)

 

My 4 drives were labelled sdg5/sdh5/sdi5/sdj5.

 

=> stop the LVM

=> add the drives back to raid

=> rebuild

=> restart LVM/Server

 

These should be the commands for my system... !! Be advised: know what you are doing or ALL is gone !!

 

"

mdadm --manage --stop /dev/vg1002/lv

mdadm --examine /dev/sdg5

mdadm --examine /dev/sdi5

mdadm --examine /dev/sdh5

mdadm --examine /dev/sdj5

vgchange -an vg1001

mdadm --stop /dev/md3

mdadm --query --detail /dev/md3

cat /proc/mdstat

mdadm --verbose --create /dev/md3 --chunk=64 --level=5 --raid-devices=4 /dev/sdi5 /dev/sdj5 missing /dev/sdh5

mdadm --manage /dev/md3 --add /dev/sdg5

cat /proc/mdstat

"
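While a rebuild like the one above is running, it helps to see at a glance which arrays are still degraded. The following is only a minimal sketch, not part of the original recovery: it assumes the usual /proc/mdstat layout, where each array's status line ends with a member map such as [UU_U], and the function name degraded_arrays is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array whose member map
# (e.g. [UU_U]) contains "_", i.e. a member that is missing or rebuilding.
# Reads /proc/mdstat-formatted text on stdin.
degraded_arrays() {
    awk '
        # Array header line, e.g. "md3 : active raid5 sdg5[4] sdh5[3] ..."
        /^md[0-9]+ :/ { name = $1 }
        # Status line carrying the member map, e.g. "... [4/3] [UU_U]"
        match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_") > 0) print name
        }
    '
}
```

On a live system it would be called as `degraded_arrays < /proc/mdstat`; no output means every array reports all members present.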

 

THIS is LSI's reply to my issue report:

 

"

There is an issue with P20. We are expecting a fixed version any day now.

 

I recommend you downgrade to P19 until then.

 

You have to erase P20 to downgrade and this can only be done in DOS or UEFI.

 

Doc attached.

 

Data Center Solutions Group

 

Avago

4165 Shackleford Road

Norcross, GA 30093

"

 

BR


Thanks for heads up!

 

I updated to P20 on a new system not too long ago. I haven't done much with it but test stuff... but I just downgraded to P19.


As long as they don't provide the new version, downgrading to P19 seems to be the only solution.

For me, since I went back to P19 the system has been working fine and without issues.


When you download the firmware (e.g. the DOS version), there is a manual included...

 

e.g. for the 9201-16i:

http://docs.avagotech.com/docs/12350436

 

All required commands are in there too.

You just need to erase the existing firmware before installing the other version, as AVAGO/LSI told me:

 

"

You have to erase P20 to downgrade and this can only be done in DOS or UEFI.

Doc attached.

"

http://sc836.lindem.de/update.pdf

 

br

snoopy78


In DOS I get the following error when trying to launch SASFLSH.exe:

ERROR: Failed to initialize PAL. Exiting Program

 

...and EFI Shell hangs when trying to --listall... I guess I'll have to try DOS on a NON-UEFI PC.

 

Edit: Flashed the HBA from DOS on a NON-UEFI PC and it worked right out of the box. Hopefully the change in FW doesn't screw up the volume. Thanks for the tip on downgrading, highly appreciated! :smile:

 

I'll try to stress test the HBA later this weekend to see if this actually fixes the I/O issues...


I can confirm that this is not related to the firmware version. Even on P19 I get the I/O error when doing a data scrub!


Nope, all drives are new and tested.

The only thing it might be is a bad cable... even though I've now replaced all of them too without solving the issue...

 

No idea, but as this only happens on the initiation of a data scrub or other really high I/O activities, I guess it's nothing to really worry about for now. Will look into it in the days to come.


This is a heat-related issue.

 

It has nothing to do with the firmware revision, except that perhaps P20 is more sensitive to high-heat conditions. It could also just be that P20 runs that much hotter.

 

In any case, install a fan on or over your HBA's PU.


These cards are only ever cold while they're powered off. So it's quite obvious you've made no real attempt to cool them.


That's my point exactly. Even when my server has been powered off, I can force this issue within just a few minutes after the system has booted. The server and card never get hot in my testing.


You might want to check your P20 sub-version. The original release (20.00.00.00) had I/O bugs according to LSI. There have been more releases since, 20.00.02.00 and 20.20.04.00, which fix it according to folks who had issues with this on FreeNAS.
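As a rough sketch of that check: the snippet below sorts a firmware version string into the cases mentioned in this thread. It assumes you have already read the version from your flash utility's listing (for example the "Firmware Version" line); the function name classify_fw is made up for illustration, and the "reported buggy" label only reflects what was said above, not an official list.

```shell
#!/bin/sh
# Hypothetical helper: classify an LSI firmware version string
# (format PP.xx.yy.zz) according to the reports in this thread.
classify_fw() {
    case "$1" in
        20.00.00.00) echo "original P20 (reported buggy)" ;;
        20.*)        echo "later P20 build" ;;
        19.*)        echo "P19" ;;
        *)           echo "other" ;;
    esac
}
```

For example, `classify_fw 20.00.00.00` prints the "original P20" label, which per the posts above is the release to move away from.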
