
TrueNAS Scale – Infiniband

March 26, 2022 (updated April 29, 2022)

Well, I have been wanting to get InfiniBand working for a while, and I decided today was as good a time as any. So, I made it happen.

Summary

  1. Got InfiniBand working.
  2. IPoIB performance is utter garbage.
  3. I didn’t want to hack up TrueNAS further to support RDMA.
  4. Gave up and went back to my Chelsio 40GbE adapter.

Step 1. Install a few packages.

By default, TrueNAS Scale has apt “disabled”. Please see THIS POST on how to re-enable it.

Once you have re-enabled the ability to install packages and added the Debian repos, we need to install a few packages to assist.
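To cover the tools used in the rest of this post (ibstat, ibhosts, mstconfig), a package set along these lines should work; the exact names are my assumption based on current Debian packaging, not an official list:

```shell
# Assumed Debian package names (not confirmed against any official list):
#   infiniband-diags -> ibstat, ibhosts, ibnetdiscover
#   mstflint         -> mstconfig (config/firmware tool for Mellanox cards)
#   rdma-core        -> userspace RDMA libraries and helpers
apt-get update
apt-get install -y infiniband-diags mstflint rdma-core
```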


At this point, you should be able to run ibstat and get a status. If you don’t get a status, make sure your card is installed. I am using a ConnectX-3 (non-Pro).

To note, I have one port set as InfiniBand, and the other port set as Ethernet.

root@truenas:~# ibstat
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 2
        Firmware version: 2.42.5000
        Hardware version: 1
        Node GUID: 0x506b4b03007bfc50
        System image GUID: 0x506b4b03007bfc53
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02514868
                Port GUID: 0x506b4b03007bfc51
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Disabled
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x00010000
                Port GUID: 0x526b4bfffe7bfc52
                Link layer: Ethernet

How to change configuration parameters

# First- get the PCI address of your card.
root@truenas:~# lspci | grep Mell
03:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

# In this case- 03:00.0 is my address.

# Show current configuration
root@truenas:~# mstconfig -d 03:00.0 query

Device #1:
----------

Device type:    ConnectX3
Device:         03:00.0

Configurations:                              Next Boot
         SRIOV_EN                            True(1)
         NUM_OF_VFS                          8
         LINK_TYPE_P1                        IB(1)
         LINK_TYPE_P2                        ETH(2)
         LOG_BAR_SIZE                        3
         BOOT_PKEY_P1                        0
         BOOT_PKEY_P2                        0
         BOOT_OPTION_ROM_EN_P1               False(0)
         BOOT_VLAN_EN_P1                     False(0)
         BOOT_RETRY_CNT_P1                   0
         LEGACY_BOOT_PROTOCOL_P1             PXE(1)
         BOOT_VLAN_P1                        1
         BOOT_OPTION_ROM_EN_P2               False(0)
         BOOT_VLAN_EN_P2                     False(0)
         BOOT_RETRY_CNT_P2                   0
         LEGACY_BOOT_PROTOCOL_P2             PXE(1)
         BOOT_VLAN_P2                        1
         IP_VER_P1                           IPv4(0)
         IP_VER_P2                           IPv4(0)
         CQ_TIMESTAMP                        True(1)


# Set port mode to infiniband
mstconfig -d 03:00.0 set LINK_TYPE_P1=1

# Set port mode to ethernet
mstconfig -d 03:00.0 set LINK_TYPE_P1=2

# You can also set multiple properties in the same command
mstconfig -d 03:00.0 set LINK_TYPE_P1=1 LINK_TYPE_P2=1

# Disable boot rom, if you desire. I don't have plans on using it.
mstconfig -d 03:00.0 set LINK_TYPE_P1=1 BOOT_OPTION_ROM_EN_P1=0 BOOT_OPTION_ROM_EN_P2=0

Step 2. Ping across the network

This guide ASSUMES you already have a working physical link established between two InfiniBand cards.

In my case, I have a Windows client and a TrueNAS server.

Next, let’s list the connected hosts:

root@truenas:~# ibhosts
ibwarn: [2618545] get_abi_version: can't read ABI version from /sys/class/infiniband_mad/abi_version (No such file or directory): is ib_umad module loaded?
ibwarn: [2618545] mad_rpc_open_port: can't open UMAD port ((null):0)
../libibnetdisc/ibnetdisc.c:802; can't open MAD port ((null):0)
/usr/sbin/ibnetdiscover: iberror: failed: discover failed

Seems… a kernel module is missing. Let’s add it and try again.

root@truenas:~# modprobe ib_umad
# I also went ahead and loaded the IP-over-InfiniBand (IPoIB) module.
root@truenas:~# modprobe ib_ipoib
root@truenas:~# ibhosts
Ca      : 0x0002c90300a29060 ports 2 "XTREMEOWNAGE-PC"
Ca      : 0x506b4b03007bfc50 ports 2 "MT25408 ConnectX Mellanox Technologies"

Well, look at that. The two PCs can see each other over InfiniBand.

The next step was to allocate an IP address for the InfiniBand adapter. Checking the interface:

ibp3s0: flags=4098<BROADCAST,MULTICAST>  mtu 4092
        unspec 80-00-02-28-FE-80-00-00-00-00-00-00-00-00-00-00  txqueuelen 256  (UNSPEC)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

As we can see, it doesn’t have an address. I provisioned the IP address through the TrueNAS WebGUI.
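The WebGUI is the way to make the address stick across reboots on TrueNAS; for reference, a non-persistent shell equivalent (a sketch using the interface name and addresses from this setup) would be:

```shell
# Sketch only: assigns the same /30 used in this post directly to the
# IPoIB interface. This does NOT persist through a TrueNAS reboot.
ifconfig ibp3s0 10.100.255.1 netmask 255.255.255.252 up

# iproute2 equivalent:
#   ip addr add 10.100.255.1/30 dev ibp3s0
#   ip link set ibp3s0 up
```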

After saving and applying the changes, I was able to ping the local IP, but not the remote IP.

root@truenas:~# ping 10.100.255.1
PING 10.100.255.1 (10.100.255.1) 56(84) bytes of data.
64 bytes from 10.100.255.1: icmp_seq=1 ttl=64 time=0.052 ms
^C
--- 10.100.255.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.052/0.052/0.052/0.000 ms
root@truenas:~# ping 10.100.255.2
PING 10.100.255.2 (10.100.255.2) 56(84) bytes of data.
^C

Doing a bit of troubleshooting, I ran ibstat on both ends and determined the link was stuck in the “Initializing” state.

After a bit of research, I determined that a subnet manager needs to exist, even for a point-to-point link.

So…

root@truenas:~# apt-get install opensm
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  opensm
...

At this point, I decided to reboot, to ensure all of the modules get properly loaded at boot.

Or… to see if all of the above work would be reset.

After rebooting, all of the configuration and packages were still in place; however, the kernel modules were not loaded back in. So I reloaded the modules manually and added a config to load them at boot.

root@truenas:~# modprobe ib_ipoib ib_umad ib_uverbs rdma_ucm
root@truenas:~# nano /etc/modules-load.d/infiniband.conf
# File Contents:
ib_umad
ib_ipoib
ib_uverbs
rdma_ucm

The next step, was to get opensm up and running.

root@truenas[/etc/modules-load.d]# systemctl status opensm
● opensm.service - Starts the OpenSM InfiniBand fabric Subnet Managers
     Loaded: loaded (/lib/systemd/system/opensm.service; enabled; vendor preset: enabled)
     Active: inactive (dead)
  Condition: start condition failed at Sat 2022-03-26 13:26:55 CDT; 8min ago
       Docs: man:opensm(8)

Mar 26 13:26:55 truenas.local.xtremeownage.com systemd[1]: Condition check resulted in Starts the OpenSM InfiniBand fabric Subnet Managers being skipped.


## I assume it failed to start due to the missing kernel modules. Let's start it.

root@truenas[/etc/modules-load.d]# systemctl start opensm
root@truenas[/etc/modules-load.d]# systemctl status opensm
● opensm.service - Starts the OpenSM InfiniBand fabric Subnet Managers
     Loaded: loaded (/lib/systemd/system/opensm.service; enabled; vendor preset: enabled)
     Active: active (exited) since Sat 2022-03-26 13:35:38 CDT; 1s ago
       Docs: man:opensm(8)
    Process: 296934 ExecCondition=/bin/sh -c if test "$PORTS" = NONE; then echo "opensm is disabled via PORTS=NONE."; exit 1; fi (code=exited, status=0/SUCCESS)
    Process: 296935 ExecStart=/bin/sh -c if test "$PORTS" = ALL; then PORTS=$(/usr/sbin/ibstat -p); if test -z "$PORTS"; then echo "No InfiniBand ports found."; exi>
   Main PID: 296935 (code=exited, status=0/SUCCESS)

Mar 26 13:35:38 truenas.local.xtremeownage.com systemd[1]: Starting Starts the OpenSM InfiniBand fabric Subnet Managers...
Mar 26 13:35:38 truenas.local.xtremeownage.com sh[296935]: Starting opensm on following ports: 0x506b4b03007bfc51
Mar 26 13:35:38 truenas.local.xtremeownage.com sh[296935]: 0x526b4bfffe7bfc52
Mar 26 13:35:38 truenas.local.xtremeownage.com systemd[1]: Finished Starts the OpenSM InfiniBand fabric Subnet Managers.


## Much better.

At this point, I checked the stats for the infiniband nics.

root@truenas[/etc/modules-load.d]# ibstat
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 2
        Firmware version: 2.42.5000
        Hardware version: 1
        Node GUID: 0x506b4b03007bfc50
        System image GUID: 0x506b4b03007bfc53
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 1
                LMC: 0
                SM lid: 1
                Capability mask: 0x0251486a
                Port GUID: 0x506b4b03007bfc51
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Disabled
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x00010000
                Port GUID: 0x526b4bfffe7bfc52
                Link layer: Ethernet


## Boom, good to go on the truenas side.

C:\> ibstat
CA 'ibv_device0'
        CA type:
        Number of ports: 2
        Firmware version: 2.42.5000
        Hardware version: 0x0
        Node GUID: 0x0002c90300a29060
        System image GUID: 0x0002c90300a29063
    Port 1:
        State: Down
        Physical state: Disabled
        Rate: 1
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x80500000
        Port GUID: 0x0202c9fffea29061
        Link layer: Ethernet
        Transport: RoCE v1.25
    Port 2:
        State: Active
        Physical state: LinkUp
        Rate: 40
        Real rate: 32.00 (QDR)
        Base lid: 2
        LMC: 0
        SM lid: 1
        Capability mask: 0x90580000
        Port GUID: 0x0002c90300a29062
        Link layer: IB
        Transport: IB

Excellent, the link is active on both sides!

Since the module was not active when the machine rebooted, the interface was down. I used ifconfig to bring it up.

root@truenas[/etc/modules-load.d]# ifconfig ibp3s0 up
root@truenas[/etc/modules-load.d]# ifconfig ibp3s0
ibp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2044
        inet 10.100.255.1  netmask 255.255.255.252  broadcast 10.100.255.3
        inet6 fe80::526b:4b03:7b:fc51  prefixlen 64  scopeid 0x20<link>
        unspec 80-00-02-28-FE-80-00-00-00-00-00-00-00-00-00-00  txqueuelen 256  (UNSPEC)
        RX packets 2  bytes 212 (212.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 37  bytes 11759 (11.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

root@truenas[/etc/modules-load.d]# ping 10.100.255.1
PING 10.100.255.1 (10.100.255.1) 56(84) bytes of data.
64 bytes from 10.100.255.1: icmp_seq=1 ttl=64 time=0.056 ms
64 bytes from 10.100.255.1: icmp_seq=2 ttl=64 time=0.049 ms
^C
--- 10.100.255.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1030ms
rtt min/avg/max/mdev = 0.049/0.052/0.056/0.003 ms

We can now ping over IP. Let’s do a speed test.

## TrueNAS doesn't ship with iperf2 (it does include iperf3; however, getting consistent
## speed results above 10G with iperf3 is difficult, due to its single-threaded nature).
## So, I prefer iperf2 for testing over 10G.
root@truenas[/etc/modules-load.d]# apt-get install iperf
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  iperf
0 upgraded, 1 newly installed, 0 to remove and 105 not upgraded.
...
root@truenas[/etc/modules-load.d]# iperf -v
iperf version 2.0.14a (2 October 2020) pthreads
root@truenas[/etc/modules-load.d]# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------


## On the Windows side:
C:\> iperf.exe -c 10.100.255.1
------------------------------------------------------------
Client connecting to 10.100.255.1, TCP port 5001
TCP window size:  208 KByte (default)
------------------------------------------------------------
[  3] local 10.100.255.2 port 23277 connected with 10.100.255.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  7.69 GBytes  6.60 Gbits/sec

C:\> iperf.exe -c 10.100.255.1 -P 10
------------------------------------------------------------
Client connecting to 10.100.255.1, TCP port 5001
TCP window size:  208 KByte (default)
------------------------------------------------------------
[ 12] local 10.100.255.2 port 23297 connected with 10.100.255.1 port 5001
[ 10] local 10.100.255.2 port 23295 connected with 10.100.255.1 port 5001
[ 11] local 10.100.255.2 port 23296 connected with 10.100.255.1 port 5001
[  9] local 10.100.255.2 port 23294 connected with 10.100.255.1 port 5001
[  7] local 10.100.255.2 port 23292 connected with 10.100.255.1 port 5001
[  6] local 10.100.255.2 port 23291 connected with 10.100.255.1 port 5001
[  8] local 10.100.255.2 port 23293 connected with 10.100.255.1 port 5001
[  5] local 10.100.255.2 port 23290 connected with 10.100.255.1 port 5001
[  4] local 10.100.255.2 port 23289 connected with 10.100.255.1 port 5001
[  3] local 10.100.255.2 port 23288 connected with 10.100.255.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[ 12]  0.0-10.0 sec  1.07 GBytes   916 Mbits/sec
[ 10]  0.0-10.0 sec   899 MBytes   753 Mbits/sec
[  6]  0.0-10.0 sec  1.01 GBytes   870 Mbits/sec
[  8]  0.0-10.0 sec   220 MBytes   185 Mbits/sec
[  5]  0.0-10.0 sec   561 MBytes   470 Mbits/sec
[  9]  0.0-10.1 sec   695 MBytes   578 Mbits/sec
[ 11]  0.0-10.1 sec   674 MBytes   557 Mbits/sec
[  7]  0.0-10.2 sec   346 MBytes   285 Mbits/sec
[  4]  0.0-10.2 sec   253 MBytes   208 Mbits/sec
[  3]  0.0-10.2 sec   681 MBytes   560 Mbits/sec
[SUM]  0.0-10.2 sec  6.31 GBytes  5.31 Gbits/sec


Well, that was extremely disappointing!

Troubleshooting poor performance

The first step is always to check the MTU. I found the server was configured for 8000, but my client only supported a max of 4092.

root@truenas# ifconfig ibp3s0 mtu 4092 up

Just to confirm this wasn’t an iperf issue, I did a quick iSCSI performance test.

Only ~5 Gbit/s… far short of the expected 40/56 Gbit/s.

Well… let’s double-check the MTU.

root@truenas[/etc/modules-load.d]# ifconfig ibp3s0 mtu 4092
root@truenas[/etc/modules-load.d]# ifconfig ibp3s0
ibp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2044
        inet 10.100.255.1  netmask 255.255.255.252  broadcast 10.100.255.3
        inet6 fe80::526b:4b03:7b:fc51  prefixlen 64  scopeid 0x20<link>
        unspec 80-00-02-28-FE-80-00-00-00-00-00-00-00-00-00-00  txqueuelen 256  (UNSPEC)
        RX packets 23373377  bytes 47020734796 (43.7 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6652816  bytes 6411049427 (5.9 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

## Appears the card on my server only supports 2044 MTU... this might explain the issues.

So… I set the MTU on both ends to 2044.
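On the Linux side, that change (again, non-persistent) looks like the following; on the Windows side, the MTU is set through the adapter driver’s advanced properties. 2044 matches what this card reports, and is the usual IPoIB datagram-mode ceiling:

```shell
# Match both ends to the lowest common MTU. 2044 is the typical IPoIB
# datagram-mode maximum (2048 minus the 4-byte IPoIB header).
ifconfig ibp3s0 mtu 2044 up

# iproute2 equivalent:
#   ip link set dev ibp3s0 mtu 2044
```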

Sadly, this did not help performance.

Conclusion.

I DID successfully manage to get InfiniBand up and running.

I DID learn that IPoIB (IP over InfiniBand) is actually quite slow. To realize huge performance gains over using the NIC in ETH mode, you would need to leverage RDMA/RoCE.

Since I have hacked up my TrueNAS install enough for the day, I decided to switch everything back to Ethernet mode and call it a day. I can easily get 40G out of Ethernet mode.

How to switch back to Ethernet mode.

If you recall, I left one interface in ETH mode earlier… I am just going to switch to the other physical port, and move the IP address over, so I don’t need to reboot my PC.

ifconfig ibp3s0 0.0.0.0
ifconfig enp3s0d1 10.100.255.1/30 mtu 9000 up
root@truenas[/etc/modules-load.d]# ifconfig enp3s0d1
enp3s0d1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet 10.100.255.1  netmask 255.255.255.252  broadcast 10.100.255.3
        inet6 fe80::526b:4bff:fe7b:fc52  prefixlen 64  scopeid 0x20<link>

## On the windows side, I also physically swapped the adaptors over.

root@truenas[/etc/modules-load.d]# ping 10.100.255.2
PING 10.100.255.2 (10.100.255.2) 56(84) bytes of data.
64 bytes from 10.100.255.2: icmp_seq=1 ttl=128 time=0.482 ms
64 bytes from 10.100.255.2: icmp_seq=2 ttl=128 time=0.162 ms

## Connection is good. Let's test performance.
[SUM] 0.0000-10.0208 sec  10.7 GBytes  9.15 Gbits/sec

## Appears my client is only connected at 10G.... interesting.
root@truenas[/etc/modules-load.d]# ethtool enp3s0d1
Settings for enp3s0d1:
        Supported ports: [ FIBRE ]
        Supported link modes:   10000baseKX4/Full
                                40000baseCR4/Full
                                40000baseSR4/Full
                                56000baseCR4/Full
                                56000baseSR4/Full
                                1000baseX/Full
                                10000baseCR/Full
                                10000baseSR/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10000baseKX4/Full
                                40000baseCR4/Full
                                40000baseSR4/Full
                                1000baseX/Full
                                10000baseCR/Full
                                10000baseSR/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: 10000Mb/s
        Duplex: Full
        Auto-negotiation: off
        Port: FIBRE
        PHYAD: 0
        Transceiver: internal
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000014 (20)
                               link ifdown
        Link detected: yes

## Wonder what happens if I, uh, FORCE it to 40G.

root@truenas[/etc/modules-load.d]# ethtool -s enp3s0d1 speed 40000 duplex full autoneg off

## Turns out, the connection on the Windows side dropped... I am starting to think this particular ConnectX-3 does not support 40G Ethernet.
## I know the adapter on the server side supports 40G, because it's the exact same adapter I used to benchmark 40G in a previous post!

I found documentation on my particular NIC here: https://downloads.dell.com/manuals/all-products/esuprt_ser_stor_net/esuprt_pedge_srvr_ethnt_nic/mellanox-adapters_users-guide_en-us.pdf

Based on the specifications, it “should” support 40G Ethernet mode…

Edit: Turns out, it supported 40G InfiniBand, but only 10GbE Ethernet.

Yup. Oh well. Putting the Chelsio NIC back in my PC. It had no problems doing 40G.
