Introduction
Wait, the R720XD doesn’t support bifurcation?
And you would be correct in saying that.
However, there are special PCIe cards you can acquire which will perform the splitting for you.
Credit for this idea goes to THIS POST, which was brought to my attention by a fellow redditor. (I couldn't find the comment from a month ago.)
Why bifurcate?
By default, without bifurcation, one PCIe slot can host only one device. This means that despite having a 16-lane slot, you can only use a single NVMe (which only requires 4 lanes). With bifurcation, you can split those 16 lanes across 4 individual devices.
My R720XD has… 6 usable PCIe slots. Two are 16 lanes; the rest are 8 lanes. My current needs involve 4x NVMe devices, a dual-port 40G NIC, and a USB 3.1 Type-C card. In my current setup, this consumes all of my available slots, despite leaving many lanes unused.
Bifurcation isn't only for NVMe; however, this is generally its most popular use.
I would explain exactly how this works on a motherboard without bifurcation support… however, I would just be repeating what I read in the original Serve The Home post. So, please look at the original post.
I will note that what I am doing below isn't actually bifurcation, but PCIe switching. If you are interested in the technical details, I recommend reading the PEX 8748 Product Brief or the PEX 8724 Product Brief.
Getting Started
I purchased two cards for this project.
- https://www.aliexpress.com/item/1005001344675103.html
  - This is the one used for now.
- https://www.aliexpress.com/item/1005001889076788.html
  - I picked up this one as a backup.
Regarding shipping, both cards ended up arriving at my house in the same package, only 2 weeks later. For shipping from China, I was quite amazed at how quickly they arrived. The packaging was good.
Of note, these cards do seem to go out of stock. However, see the Serve The Home post for other potential cards.
Step 1. Remove the old.
I started the process by removing all of the riser boards from my server. As you can see, I previously had a lot of adaptors, each using an entire PCIe slot for a single NVMe drive.




After a few minutes…. I had my pile of NVMe devices ready to go into the new card.



After removing the 6 screws on the back of the new device, I installed all 4 of my NVMe devices. The backside of the aluminum cover does have heat-sink pads, which hopefully should help keep my devices running a tad cooler.



And… all buttoned up, ready to be installed.



The final part of this process was to reinstall all of the devices into the chassis.



After adding everything back into the chassis, I still have three open PCIe slots.
In the future, when I decide to add more stripes to my NVMe/Flash pool, I will move the USB Type-C adaptor over to the half-width slots. But, for now, this is fine.
If the mspaint labels are not readable: the card on the left is the new quad NVMe switch card; in the middle, I have a dual-port 40Gbit ConnectX-3 NIC, with a USB 3.1 Type-C card on top. The riser on the right has all three slots completely empty.
Does it boot?
Yes!
TrueNAS booted right up, without any issues. My existing Flash/Cache pools were imported right away.
As well, all of the NVMe disks were detected as if nothing had changed.



Does it perform?
Without going too deep into benchmarks….
iSCSI Benchmark
iSCSI across my flash pool benchmarked exactly the same as in my Previous Benchmarks after removing my Brocade ICX-6610.



As noted before, I am bottlenecked by my 10GbE connection. I still have not tracked down the issue limiting all of my iSCSI write speeds to 570MB/s.
FIO Sequential Benchmark
truenas[/mnt/Cache]# fio --name=seqread --rw=read --direct=1 --ioengine=libaio --bs=8k --numjobs=8 --size=1G --runtime=600 --group_reporting
seqread: (g=0): rw=read, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=1
...
fio-3.25
Starting 8 processes
seqread: Laying out IO file (1 file / 1024MiB)
seqread: Laying out IO file (1 file / 1024MiB)
seqread: Laying out IO file (1 file / 1024MiB)
seqread: Laying out IO file (1 file / 1024MiB)
seqread: Laying out IO file (1 file / 1024MiB)
seqread: Laying out IO file (1 file / 1024MiB)
seqread: Laying out IO file (1 file / 1024MiB)
seqread: Laying out IO file (1 file / 1024MiB)
Jobs: 8 (f=8)
seqread: (groupid=0, jobs=8): err= 0: pid=1319720: Sat Jan 8 11:52:23 2022
  read: IOPS=686k, BW=5361MiB/s (5622MB/s)(8192MiB/1528msec)
    slat (usec): min=3, max=661, avg= 9.33, stdev=20.50
    clat (nsec): min=661, max=59010, avg=878.93, stdev=459.30
     lat (usec): min=4, max=664, avg=10.34, stdev=20.69
    clat percentiles (nsec):
     |  1.00th=[  748],  5.00th=[  764], 10.00th=[  772], 20.00th=[  780],
     | 30.00th=[  788], 40.00th=[  796], 50.00th=[  804], 60.00th=[  812],
     | 70.00th=[  828], 80.00th=[  844], 90.00th=[ 1020], 95.00th=[ 1352],
     | 99.00th=[ 1656], 99.50th=[ 1960], 99.90th=[ 9792], 99.95th=[11456],
     | 99.99th=[15936]
   bw (  MiB/s): min= 5423, max= 5743, per=100.00%, avg=5583.70, stdev=23.59, samples=16
   iops        : min=694204, max=735222, avg=714713.00, stdev=3019.26, samples=16
  lat (nsec)   : 750=0.84%, 1000=89.04%
  lat (usec)   : 2=9.66%, 4=0.31%, 10=0.06%, 20=0.09%, 50=0.01%
  lat (usec)   : 100=0.01%
  cpu          : usr=17.78%, sys=82.04%, ctx=47, majf=16, minf=153
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1048576,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=5361MiB/s (5622MB/s), 5361MiB/s-5361MiB/s (5622MB/s-5622MB/s), io=8192MiB (8590MB), run=1528-1528msec
truenas[/mnt/Cache]#
Based on fio, I am receiving the full expected performance from a single drive. But this doesn't tell me whether I am receiving full performance from multiple drives. So… let's benchmark my Flash pool, which is mirrored across two NVMes on this card.
truenas[/mnt/Flash]# fio --name=seqread --rw=read --direct=1 --ioengine=libaio --bs=8k --numjobs=24 --size=1G --runtime=600 --group_reporting
seqread: (g=0): rw=read, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=1
...
fio-3.25
Starting 24 processes
seqread: Laying out IO file (1 file / 1024MiB)
seqread: Laying out IO file (1 file / 1024MiB)
seqread: Laying out IO file (1 file / 1024MiB)
seqread: Laying out IO file (1 file / 1024MiB)
seqread: Laying out IO file (1 file / 1024MiB)
seqread: Laying out IO file (1 file / 1024MiB)
seqread: Laying out IO file (1 file / 1024MiB)
seqread: Laying out IO file (1 file / 1024MiB)
Jobs: 24 (f=21): [R(8),f(1),R(6),f(2),R(7)][-.-%][r=7881MiB/s][r=1009k IOPS][eta 00m:00s]
seqread: (groupid=0, jobs=24): err= 0: pid=1663699: Sat Jan 8 12:06:40 2022
  read: IOPS=959k, BW=7490MiB/s (7854MB/s)(24.0GiB/3281msec)
    slat (usec): min=3, max=11418, avg=20.72, stdev=52.72
    clat (nsec): min=735, max=582467, avg=1466.79, stdev=1155.76
     lat (usec): min=4, max=11431, avg=22.42, stdev=53.14
    clat percentiles (nsec):
     |  1.00th=[  804],  5.00th=[  820], 10.00th=[  828], 20.00th=[  852],
     | 30.00th=[  876], 40.00th=[  980], 50.00th=[ 1224], 60.00th=[ 1384],
     | 70.00th=[ 1576], 80.00th=[ 1880], 90.00th=[ 2448], 95.00th=[ 3056],
     | 99.00th=[ 4768], 99.50th=[ 5600], 99.90th=[ 8384], 99.95th=[12224],
     | 99.99th=[27008]
   bw (  MiB/s): min= 6418, max= 9478, per=100.00%, avg=7725.06, stdev=53.06, samples=135
   iops        : min=821626, max=1213256, avg=988808.13, stdev=6791.08, samples=135
  lat (nsec)   : 750=0.01%, 1000=40.74%
  lat (usec)   : 2=41.92%, 4=15.39%, 10=1.88%, 20=0.04%, 50=0.02%
  lat (usec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%
  cpu          : usr=13.68%, sys=85.88%, ctx=1005, majf=0, minf=522
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=3145728,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=7490MiB/s (7854MB/s), 7490MiB/s-7490MiB/s (7854MB/s-7854MB/s), io=24.0GiB (25.8GB), run=3281-3281msec
Nearly 8GB/s. I am satisfied with this performance. Since 40Gbit works out to roughly 5GB/s, this should have no issues at all saturating a 40G connection with iSCSI traffic.
Lastly, I wanted to perform a simple write test to see how well it performs. Since there is ZFS overhead when writing (checksums, etc.), and a mirror has to write to both disks at once, I am not expecting dramatic numbers here.
truenas[/mnt/Flash]# fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=8k --numjobs=16 --size=1G --runtime=600 --group_reporting
seqwrite: (g=0): rw=write, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=1
...
fio-3.25
Starting 16 processes
seqwrite: Laying out IO file (1 file / 1024MiB)
seqwrite: Laying out IO file (1 file / 1024MiB)
seqwrite: Laying out IO file (1 file / 1024MiB)
seqwrite: Laying out IO file (1 file / 1024MiB)
seqwrite: Laying out IO file (1 file / 1024MiB)
seqwrite: Laying out IO file (1 file / 1024MiB)
seqwrite: Laying out IO file (1 file / 1024MiB)
seqwrite: Laying out IO file (1 file / 1024MiB)
seqwrite: Laying out IO file (1 file / 1024MiB)
seqwrite: Laying out IO file (1 file / 1024MiB)
seqwrite: Laying out IO file (1 file / 1024MiB)
seqwrite: Laying out IO file (1 file / 1024MiB)
seqwrite: Laying out IO file (1 file / 1024MiB)
seqwrite: Laying out IO file (1 file / 1024MiB)
seqwrite: Laying out IO file (1 file / 1024MiB)
seqwrite: Laying out IO file (1 file / 1024MiB)
Jobs: 16 (f=16): [W(16)][100.0%][w=3218MiB/s][w=412k IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=16): err= 0: pid=1680991: Sat Jan 8 12:07:53 2022
  write: IOPS=398k, BW=3110MiB/s (3261MB/s)(16.0GiB/5268msec); 0 zone resets
    slat (usec): min=6, max=108192, avg=36.06, stdev=429.09
    clat (nsec): min=834, max=17356k, avg=1526.14, stdev=15762.08
     lat (usec): min=7, max=108208, avg=37.90, stdev=429.63
    clat percentiles (nsec):
     |  1.00th=[  916],  5.00th=[  940], 10.00th=[  956], 20.00th=[  988],
     | 30.00th=[ 1048], 40.00th=[ 1160], 50.00th=[ 1304], 60.00th=[ 1448],
     | 70.00th=[ 1592], 80.00th=[ 1768], 90.00th=[ 2128], 95.00th=[ 2608],
     | 99.00th=[ 4320], 99.50th=[ 5600], 99.90th=[11840], 99.95th=[16768],
     | 99.99th=[33536]
   bw (  MiB/s): min= 2074, max= 3972, per=100.00%, avg=3145.32, stdev=37.36, samples=158
   iops        : min=265492, max=508464, avg=402599.37, stdev=4781.81, samples=158
  lat (nsec)   : 1000=22.22%
  lat (usec)   : 2=64.94%, 4=11.60%, 10=1.10%, 20=0.13%, 50=0.01%
  lat (usec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%
  cpu          : usr=8.90%, sys=66.20%, ctx=41092, majf=0, minf=402
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2097152,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=3110MiB/s (3261MB/s), 3110MiB/s-3110MiB/s (3261MB/s-3261MB/s), io=16.0GiB (17.2GB), run=5268-5268msec
Even so, I am quite pleased to report around 3GB/s.
Based on previous benchmarks I have performed against 970 Evo NVMes, these numbers are extremely good.
Results
Based on this small amount of performance testing, I believe this card is meeting my performance expectations. Everything seems to be running at the same speed I was achieving before, and I successfully freed up 3 entire PCIe slots on my server.
You may or may not have noticed, but in a previous post I powered down my Brocade ICX6610, which was providing my 40Gbit connectivity. You may have also noticed that I reinstalled the 40G ConnectX-3 NIC during this post…
And if you have paid attention to my reddit comments, you may have picked up on plans for future 40Gbit connectivity to my main computer. To confirm, I do have 50 feet of fiber cable in the mail to complete this project… So, if anyone is curious to see iSCSI benchmarks over 40Gbit, stay tuned! Also, you may have noticed earlier in this post that I ordered TWO quad NVMe cards… More to come on this as well.
Very cool. Question: are you actually booting off one of these SSDs? I’m currently booting off the dual SD adapter in my R720xd, but I’d love to set up one of these cards with some SSD boot volumes off a PCIe slot vs. having to use the Clover bootloader or something of that nature.
I do not boot from them.
I personally have a pair of cheap 15k SAS drives in the rear 2.5″ bays, from which I have a mirrored boot pool.
If your goal was to boot from them, you would still need to use Clover. Or, if SSDs are preferred, you can install a pair of SATA SSDs in the rear 2.5″ bays and boot from them pretty easily.
I’ve got a pair of SSDs in my flex bays on an H710 in one of the PCIe slots for an ESXi vmstore, but now that I think about it, I have not tried booting off of it. When I go to swap over to TrueNAS Scale, I guess I’ll see if I can actually boot off of those and go that route, but knowing I can toss a bunch of M.2 cards in a single slot for cache purposes is pretty great. Thanks for the post!
Sometime later this week, I am going to add another pair of 1TB 980 Pro NVMes… and a 100GbE Chelsio T6 NIC.
The extra places to hold NVMe drives are going to come in pretty handy for me.
It looks like all the x8 M.2 cards are sold out (except the ASMedia one, but that needs bifurcation). Would one of the x16 cards work in an x8 slot?
Thanks to the complete and utter lack of any documentation, it's hard to say.
The AliExpress page just says PCIe x16. Based on the data sheet for the listed PLX chip here: https://docs.broadcom.com/doc/12351855
I would say there is a possibility it would work. But I cannot confirm nor deny whether it actually would.