[OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?
rt at steait.net
Thu Mar 5 19:44:35 UTC 2015
They are QLogic QMH2562 across the board... just figured the emlxs.conf had something to say, since I had to edit it to get COMSTAR into target mode.
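For reference, on illumos/OmniOS it is normally the Emulex driver (emlxs) that gets flipped into target mode via emlxs.conf; QLogic cards usually get there by rebinding the device from the qlc initiator driver to the qlt target driver instead. A minimal sketch, assuming the common 8G ISP2532 PCI ID (verify yours with prtconf -pv before touching anything):

    # Unbind the initiator driver and bind the COMSTAR target driver
    # ("pciex1077,2532" is the usual 8G QLogic ID; check yours first)
    update_drv -d -i '"pciex1077,2532"' qlc
    update_drv -a -i '"pciex1077,2532"' qlt
    # After a reboot, the target ports should show up here:
    stmfadm list-target -v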
From: Johan Kragsterman [mailto:johan.kragsterman at capvert.se]
Sent: Thursday, March 05, 2015 11:07 AM
To: Rune Tipsmark
Cc: 'Nate Smith'; omnios-discuss at lists.omniti.com
Subject: Re: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?
-----Rune Tipsmark <rt at steait.net> wrote: -----
To: 'Johan Kragsterman' <johan.kragsterman at capvert.se>
From: Rune Tipsmark <rt at steait.net>
Date: 2015-03-05 19:38
Cc: 'Nate Smith' <nsmith at careyweb.com>, "omnios-discuss at lists.omniti.com" <omnios-discuss at lists.omniti.com>
Subject: RE: Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?
Pls see below >>
Haven’t tried iSCSI, but I had similar issues with InfiniBand… more frequent due to higher I/O load, but no console error messages.
This only happened on my Supermicro server and never on my HP server… what brand are you running?
This is interesting, only on Supermicro, and never on HP? I'd like to know some more details here...
First, when you say "server", do you mean the SAN head? Not the hosts?
>> SAN Head yes
Second: Can you specify the exact model of the Supermicro and the HP?
>> Supermicro X9DRE-TF+, and the HP is actually an HP P4500 SAN that I removed the shitty HP software and controller from, replaced with an LSI 9207, and installed OmniOS on. I have tested on other HP and SM servers too; all exhibit the same behavior (3 SM and 2 HP tested).
Third: Did you pay attention to BIOS settings on the two different servers? Like C-states and other settings... how about IRQ settings? And how about the physical PCIe buses the HBAs are sitting on? This often causes problems if you don't know the layout of the PCIe buses.
>> Both are set to max performance in BIOS and afaik all C-states are disabled. I am 100% sure on the HP and 99.99% sure on the SM. I didn't touch the IRQ settings, but I actually have two 8Gbit FC cards in the SM server and both exhibit the problem. I have tried to swap things around too, with no luck. I do use every available PCIe slot though... L2ARC, SLOG, etc.
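For anyone wanting to double-check the slot layout from the OS rather than the BIOS, a minimal sketch (output depends on the platform's SMBIOS data):

    # Map each card to its physical PCIe slot
    prtdiag -v | grep -i slot
    # Walk the PCIe device tree to see which root port each HBA hangs off
    prtconf -pv | less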
Fourth: When you say you can cause it with Windows as initiator, do you mean Windows on hardware, and not Windows as a VM? And when you say you can NOT cause it on VMware, you mean you can run a Windows VM on VMware with direct LUN access without problems? And is this true for both hardware platforms, HP and Supermicro?
>> Windows on hardware, yes. All I have to do is zone a block device over to Windows and copy, say, 100 4GB ISO files onto it, or a 200GB backup file... no doubt it will cause this issue. When I say I cannot cause this on VMware, I mean any VM hosted on my ESXi host (5.1 or 5.5) with a guest operating system and no raw device mapping - I have not tested whether I can cause this using RDM.
It is also true for both HP and SM - both behave just fine using VMware and Fibre Channel. However, VMware can cause issues with InfiniBand on the SM, but I think that's a different issue and has to do with Mellanox and their terrible drivers that are never updated and perpetually half-beta.
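For anyone trying to reproduce this, a rough COMSTAR sketch of zoning a test block device to a Windows initiator (pool and volume names are just examples):

    # Create a test zvol and register it as a COMSTAR logical unit
    zfs create -V 200g tank/testlun
    sbdadm create-lu /dev/zvol/rdsk/tank/testlun
    # Expose it to all initiators (lab only; use host groups in production),
    # using the GUID printed by sbdadm list-lu
    stmfadm add-view <GUID>

Then push a few hundred GB at it from the Windows side and watch the console.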
Since it appears on one hardware platform and not another, it is difficult to blame any specific software, but we just had a discussion here about iSCSI/COMSTAR, where Garrett suspected COMSTAR of handling certain things badly. I don't know whether that has anything to do with this.
>> I think it could be a mix of both; it would be interesting to see if something in COMSTAR could be fixed...
ESX+FC+SM = problem
ESX+FC+HP = no problem
Win+FC+SM = problem
Win+FC+HP = not tested
ESX+IB+SM = problem
ESX+IB+HP = no problem
Win+IB+SM = not tested, SRP not supported in Win
Win+IB+HP = not tested, SRP not supported in Win
Anyway, it all led me to some information on 8Gbit FC - in particular portCfgFillWord. Maybe this affects some of this... Google will reveal a good bit of info; I also found some links: http://virtualkenneth.com/2012/05/21/random-esxi-lun-loss-in-a-8gb-fc-brocade-infrastructure/
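The fill-word setting lives per-port on the Brocade switch, not on the host; a hedged example from a Fabric OS session (the port number is arbitrary):

    # Show the current per-port configuration, including the fill word
    portcfgshow
    # Set fill word mode 3 on port 4 (ARB, falling back to IDLE/ARB),
    # the mode usually suggested for 8G QLogic HBAs; the port re-inits
    portcfgfillword 4,3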
What about tuning emlxs.conf? Can anything be done there to get better performance?
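The stock /kernel/drv/emlxs.conf documents its knobs inline; a hedged sketch of two commonly touched entries, assuming an Emulex card (parameter names per the shipped file, values need checking against your hardware):

    # /kernel/drv/emlxs.conf
    target-mode=1;      # run the port as a COMSTAR target, not an initiator
    pci-max-read=2048;  # PCI maximum read byte count; match the chipset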
Are you using Emulex HBAs? That would explain things... I have never used Emulex in production. Tried them a few times in lab environments, but they always turned out to behave strangely...
From: Nate Smith [mailto:nsmith at careyweb.com]
Sent: Thursday, March 05, 2015 8:10 AM
To: Rune Tipsmark; omnios-discuss at lists.omniti.com
Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?
Do you see the same problem with Windows and iSCSI as an initiator? I wish there were a way to turn up debugging to figure this out.
From: Rune Tipsmark [mailto:rt at steait.net]
Sent: Thursday, March 05, 2015 11:08 AM
To: 'Nate Smith'; omnios-discuss at lists.omniti.com
Subject: RE: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?
Same problem here… have noticed I can cause this easily by using Windows as initiator… I cannot cause this using VMware as initiator…
No idea how to fix, but a big problem.
From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On Behalf Of Nate Smith
Sent: Thursday, March 05, 2015 6:01 AM
To: omnios-discuss at lists.omniti.com
Subject: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?
I’ve had this problem for a while, and I have no way to diagnose what is going on, but occasionally when system I/O gets high (I’ve seen it happen especially during backups), I will lose connectivity on my Fibre Channel cards, which serve up Fibre Channel LUNs to a VM cluster. All hell breaks loose, and then connectivity gets restored. I don’t get an error that it’s dropped, at least not on the OmniOS system, but I get a notice when it’s restored (which makes no sense). I’m wondering if the cards are just overheating, and if heat sinks with a fan on the I/O chip would help.
Mar 5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G
Mar 5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 20100, topology Fabric Pt-to-Pt,speed 8G
Mar 5 02:00:13 newstorm last message repeated 1 time
Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, portid 10000, topology Fabric Pt-to-Pt,speed 8G
Mar 5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, portid 20000, topology Fabric Pt-to-Pt,speed 8G
Mar 5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 10100, topology Fabric Pt-to-Pt,speed 8G
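Before reaching for heat sinks, the FMA error telemetry and the BMC temperature sensors are worth a look; a minimal sketch (ipmitool assumes the box has a BMC):

    # Any PCIe/HBA error telemetry logged around the disconnects?
    fmdump -eV | tail -40
    # Card-adjacent temperatures, if the chassis BMC exposes them
    ipmitool sdr type Temperature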