[OmniOS-discuss] infiniband

Michael Rasmussen mir at miras.org
Sun Nov 9 23:06:26 UTC 2014


On Fri, 7 Nov 2014 23:49:51 +0100
Michael Rasmussen <mir at miras.org> wrote:

> 
> 2) Performance is much slower than on Linux (Debian Stable). iperf
> between two Linux boxes shows 7.29 Gbps, which is max performance.
> Between OmniOS and Linux I am not able to pass 2.77 Gbps, which is
> disappointing. Is there some tunable I can fiddle with to increase
> performance?
> 
I might have found something here (same HCA on both OmniOS and Linux).
From OmniOS:
08:00.0 InfiniBand: Mellanox Technologies MT25408 [ConnectX VPI - IB SDR / 10GigE] (rev a0)
        Subsystem: Mellanox Technologies Device 0003
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at fe600000 (64-bit, non-prefetchable)
        Region 2: Memory at fc000000 (64-bit, prefetchable)
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] Vital Product Data
                Not readable
        Capabilities: [9c] MSI-X: Enable+ Count=256 Masked-
                Vector table: BAR=0 offset=0007c000
                PBA: BAR=0 offset=0007d000
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #8, Speed 2.5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x2, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-


From Linux:
01:00.0 InfiniBand: Mellanox Technologies MT25408 [ConnectX VPI - IB SDR / 10GigE] (rev a0)
        Subsystem: Mellanox Technologies Device 0003
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 28
        Region 0: Memory at fe800000 (64-bit, non-prefetchable) [size=1M]
        Region 2: Memory at fb800000 (64-bit, prefetchable) [size=8M]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] Vital Product Data
                Product Name: Eagle IB DDR
                Read-only fields:
                        [PN] Part number: HCA-00001
                        [EC] Engineering changes: A5
                        [SN] Serial number: ML4209003684
                        [V0] Vendor specific: HCA 500Ex-D
                        [RV] Reserved: checksum good, 0 byte(s) reserved
                Read/write fields:
                        [V1] Vendor specific: N/A
                        [YA] Asset tag: N/A
                        [RW] Read-write area: 107 byte(s) free
                End
        Capabilities: [9c] MSI-X: Enable+ Count=256 Masked-
                Vector table: BAR=0 offset=0007c000
                PBA: BAR=0 offset=0007d000
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #8, Speed 2.5GT/s, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 1
                ARICtl: MFVC- ACS-, Function Group: 0
        Kernel driver in use: mlx4_core

As can be seen above, both OmniOS and Linux report a LnkCap of
2.5GT/s, width x8. But the LnkSta is different:
Linux: Speed 2.5GT/s, Width x8
OmniOS: Speed 2.5GT/s, Width x2

This explains the difference in speed, since a 2.5GT/s x2 link carries
at most 4 Gbps after 8b/10b encoding, which leaves roughly 3.5 Gbps
over IPoIB.
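
The arithmetic can be checked with a short Python sketch. It assumes only
that a 2.5GT/s (PCIe Gen1) lane uses 8b/10b encoding; the function name
pcie_gen1_gbps is made up for illustration. The result is the data rate
per direction before packet headers and IP overhead, so real throughput
will be somewhat lower.

```python
# Data bandwidth of a 2.5 GT/s PCIe link, per direction, before packet overhead.
# Assumption: 8b/10b encoding, i.e. 8 data bits for every 10 bits on the wire.
ENCODING_EFFICIENCY = 8 / 10


def pcie_gen1_gbps(lanes, gt_per_s=2.5):
    """Return the data rate in Gbit/s for the given number of lanes."""
    return gt_per_s * lanes * ENCODING_EFFICIENCY


for lanes in (2, 8):
    print(f"x{lanes}: {pcie_gen1_gbps(lanes):.1f} Gbit/s")
```

With two lanes the ceiling is 4 Gbit/s, which fits the 2.77 Gbps measured
against the OmniOS box; with eight lanes the PCIe link is no longer the
limiting factor for the 7.29 Gbps seen between the Linux boxes.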

Can anybody explain why OmniOS only uses 2 lanes when the PCIe slot and
HCA are capable of 8 lanes?
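
For anyone who wants to check their own box, here is a rough Python sketch
that pulls the capable and negotiated widths out of pasted lspci output
for a single device. The helper name link_widths and the sample text are
only for illustration, and the regular expressions assume the
"LnkCap: ... Width xN" and "LnkSta: ... Width xN" wording shown above.

```python
import re


def link_widths(lspci_text):
    """Return (capable_width, negotiated_width) for one device's lspci dump."""
    # re.S lets the match continue across line breaks in wrapped output.
    cap = re.search(r"LnkCap:.*?Width x(\d+)", lspci_text, re.S)
    sta = re.search(r"LnkSta:.*?Width x(\d+)", lspci_text, re.S)
    if not (cap and sta):
        raise ValueError("LnkCap or LnkSta not found")
    return int(cap.group(1)), int(sta.group(1))


# Shortened sample taken from the OmniOS output in this mail.
sample = """LnkCap: Port #8, Speed 2.5GT/s, Width x8, ASPM L0s
LnkSta: Speed 2.5GT/s, Width x2, TrErr- Train- SlotClk-"""

capable, negotiated = link_widths(sample)
if negotiated < capable:
    print(f"link downgraded: x{negotiated} of x{capable}")
```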

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael <at> rasmussen <dot> cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir <at> datanom <dot> net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir <at> miras <dot> org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--------------------------------------------------------------
/usr/games/fortune -es says:
Single tasking: Just Say No.