WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

caramb
Posts: 13
Joined: Sun Oct 18, 2020 8:32 pm
languages_spoken: english french
ODROIDs: Odroid-H2+
Has thanked: 0
Been thanked: 3 times
Contact:

WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by caramb »

Hi there,

I'm sharing a workaround for the terrible network performance seen when two Realtek 8125B NICs (running on Linux) talk to each other through a 1Gb/s switch.
Performance seen using the Realtek module (version 9.003.05-1 DKMS in my case) can be as low as 100Mb/s over a 1Gb/s link.

I spent a fair amount of time trying to understand what the problem was.
The test topology is quite simple : 2 H2+, 1 third-party server (HP Microserver Gen8), 1 Gigabit switch (an entry-level TP-Link TL-SG108E).
Performance between the third-party server and either of the H2+ is okay (>800/900Mb/s).
Performance between the two H2+ is terrible (~100Mb/s).
These results would puzzle any experienced network engineer...

Thanks to these posts
viewtopic.php?t=39323
viewtopic.php?t=39979
I understood it was a driver problem ; several users claimed the issue is fixed by the driver from the beta Linux kernel 5.9.

As others already stated, please do not waste money buying different Gigabit switches ; it surely won't help.

Surprisingly, nobody seems to have backported the fix to the stable kernel branches ; that is what I'm going to focus on in this post.
In fact, it is not a fix : it is about adding support for the Realtek 8125 chipset, HW revision B, to the r8169 kernel module so we can get rid of the Realtek r8125 module, which seems to have a broken implementation.

Note : Please don't blame me if something goes wrong. I'm not a kernel developer ; not even a developer ; merely a dirty patcher. I have my 2 H2+ nodes running for more than 1 week without experiencing any issue. Please do not consider this production-ready without performing extensive stability tests beforehand.

I split my post into pieces :
- Part 1 : Load ready-to-use modules on a Proxmox system.
- Part 2 : Build the modules for your system from the patched sources.
- Part 3 : Directions to patch by yourself.
- Part 4 : Performance figures ; before/after

Edit 1 [20201024] : Realtek released a new version of their module : version 9.004.01 (thank you lhb035 for pointing this out). As of now, I have not tested it, so I cannot state whether the problem is still present.
Edit 2 [20201120] : Built the modules against Proxmox kernel 5.4.73-1-pve and attached them to the "Part 1" post.
Edit 3 [20201203] : Built the modules against Proxmox kernel 5.4.78-1-pve and attached them to the "Part 1" post.
Edit 4 [20201215] : Built the modules against Proxmox kernel 5.4.78-2-pve and attached them to the "Part 1" post.
Edit 5 [20210223] : WARNING WARNING : Proxmox kernel 5.4.98-1-pve switched from the ubuntu-focal to the ubuntu-hirsute kernel source tree. This kernel includes the "reworked" realtek module (kernel 5.9/5.10 style), thus my patch is no longer compatible. I'm investigating this ; I already found that it should be safe to directly import the r8169_main.c file from stable kernel 5.10.18. For the time being, the modules I built fail to load due to missing symver, so it is better to stick with 5.4.78-2-pve. I did not try the latest DKMS driver from Realtek.
Edit 6 [20210224] : Ok, found the root cause. Proxmox started developing a 5.10 kernel ; thus cloning the master git branch was in fact cloning the wrong kernel source, so I was building a 5.10 module (and pve kernel 5.4.98-1 is still based on ubuntu-focal sources). Fixed that by specifying the 5.4 branch in the git clone command (I updated the "Part 2" post to reflect that). Thus my regular patched source for rtl8125B properly compiles against the 5.4.98 pve kernel. Built the modules against Proxmox kernel 5.4.98-1-pve and attached them to the "Part 1" post.
Edit 7 [20210313] : Built the modules against Proxmox kernel 5.4.103-1-pve and attached them to the "Part 1" post.
Edit 8 [20210326] : Built the modules against Proxmox kernel 5.4.106-1-pve and attached them to the "Part 1" post.
Last edited by caramb on Sat Mar 27, 2021 2:37 am, edited 32 times in total.
These users thanked the author caramb for the post:
doughnut (Sun Jan 10, 2021 5:35 am)


Part 1 : Easy way for Proxmox 5.4.65-1-pve kernel

Post by caramb »

For those who are running proxmox kernel 5.4.65-1-pve on the H2+ board, find attached the ready-to-use modules.

Proceeding should be straightforward.

1) First, confirm what driver you are running.

Code: Select all

lspci -k
should show you something similar to

Code: Select all

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8125 (rev 04)
        Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
        Kernel driver in use: r8125
        Kernel modules: r8169, r8125
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8125 (rev 04)
        Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
        Kernel driver in use: r8125
        Kernel modules: r8169, r8125
The driver in use should be r8125 if you installed the driver provided by Realtek following the Odroid application note.
If no "Kernel driver in use" line appears, you are running the stock kernel with the stock r8169 module, which does not recognize revision B of the 8125 chipset.
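
If you prefer a one-line summary over reading the full listing, the check above can be sketched as a small filter. This is only an illustration : the function name realtek_drivers is made up for this post, and it assumes the usual "lspci -k" layout shown above.

```shell
# Hypothetical helper (not part of any package): reads `lspci -k` output on
# stdin and prints "<PCI address> <driver>" for each Realtek Ethernet
# controller. Prints "none" as driver when no "Kernel driver in use" line
# follows the device.
realtek_drivers() {
    awk '
        /^[0-9a-f:.]+ / {                   # device header, e.g. "02:00.0 Ethernet controller: ..."
            if (addr != "") print addr, drv
            addr = ""; drv = "none"
            if ($0 ~ /Ethernet controller: Realtek/) addr = $1
        }
        /Kernel driver in use:/ && addr != "" { drv = $NF }
        END { if (addr != "") print addr, drv }
    '
}
```

Typical use would be "lspci -k | realtek_drivers" ; on a board still running the vendor module, each adapter line should end with r8125.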

2) Prerequisite : Ensure you have access to the console of the H2+. You'll lose remote access while the new modules are being tested (except of course if you access the box through a non-Realtek network adapter).

3) Please download the modules attached to this post (and upload them to the H2+).
There are 2 modules :
- realtek.ko : contains the drivers for the PHY part of the network drivers.
- r8169.ko : drivers for many realtek chips including 8125.
and a companion firmware.

4) Unzip the archive in the home directory of user root

Code: Select all

unzip ./modules.zip
You may need to install the unzip binary

Code: Select all

apt-get install unzip
5) Install the new drivers and backup the old ones :

Code: Select all

mv /lib/modules/5.4.65-1-pve/kernel/drivers/net/phy/realtek.ko /lib/modules/5.4.65-1-pve/kernel/drivers/net/phy/realtek.ko.orig
mv ~root/realtek.ko /lib/modules/5.4.65-1-pve/kernel/drivers/net/phy/
mv /lib/modules/5.4.65-1-pve/kernel/drivers/net/ethernet/realtek/r8169.ko /lib/modules/5.4.65-1-pve/kernel/drivers/net/ethernet/realtek/r8169.ko.orig
mv ~root/r8169.ko /lib/modules/5.4.65-1-pve/kernel/drivers/net/ethernet/realtek/
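
The four commands above hard-code the kernel version and will overwrite the .orig backup if you run them a second time. A slightly more defensive variant could look like the sketch below ; install_module is a name I made up, and it copies (cp) rather than moves the new file, which is a deliberate difference from the commands above.

```shell
# Hypothetical helper: replace a module file, keeping a one-time ".orig" backup.
# Usage: install_module <new file> <destination path>
install_module() {
    new=$1
    dest=$2
    [ -f "$new" ] || { echo "missing source: $new" >&2; return 1; }
    # Back up only once, so a second run cannot overwrite the stock module.
    if [ -f "$dest" ] && [ ! -e "$dest.orig" ]; then
        mv "$dest" "$dest.orig"
    fi
    cp "$new" "$dest"
}

# Example (run as root, paths as in the commands above):
#   KVER=$(uname -r)
#   install_module ~root/realtek.ko "/lib/modules/$KVER/kernel/drivers/net/phy/realtek.ko"
#   install_module ~root/r8169.ko   "/lib/modules/$KVER/kernel/drivers/net/ethernet/realtek/r8169.ko"
```

Using $(uname -r) only makes sense if the modules you downloaded were built for the kernel you are currently running.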
5bis) Another requirement is the firmware associated with the 8125B chip.
The required file is : rtl8125b-2.fw (firmware version 2 for 8125 revision B).
You may use the copy provided in the zip file you downloaded or fetch it from the kernel git :
Online method :

Code: Select all

cd /lib/firmware/rtl_nic/
wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/rtl_nic/rtl8125b-2.fw
Offline method :

Code: Select all

mv ~root/rtl8125b-2.fw /lib/firmware/rtl_nic/
6) Quick integrity check : You may want to check the sha256 hashes of your files against the ones below :

Code: Select all

sha256sum /lib/modules/5.4.65-1-pve/kernel/drivers/net/phy/realtek.ko
1b7e7aa833e3d4b0d2f0081591e638da82520593c9f0cca5175cabeb4282625c  /lib/modules/5.4.65-1-pve/kernel/drivers/net/phy/realtek.ko

sha256sum /lib/modules/5.4.65-1-pve/kernel/drivers/net/ethernet/realtek/r8169.ko
bc86d4a1fd477603ea14f6911b2d8d95b69b3dbdf10a3f409d9fe4f2794c1d0d  /lib/modules/5.4.65-1-pve/kernel/drivers/net/ethernet/realtek/r8169.ko

sha256sum /lib/firmware/rtl_nic/rtl8125b-2.fw
529bf1c25c97ff52b401090d00ff89cc22351012336e5a0c9662728a3ee909ef  /lib/firmware/rtl_nic/rtl8125b-2.fw
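
Instead of comparing the hashes by eye, sha256sum can do the comparison itself with its -c option. A minimal sketch, assuming you saved the three "hash  path" lines above into a file (the file name expected.sha256 and the function name verify_hashes are just examples) :

```shell
# Hypothetical helper: check files against a list of "<sha256>  <path>" lines,
# which is the same format sha256sum prints. Returns non-zero if any file
# differs from its expected hash or is missing.
verify_hashes() {
    sha256sum -c "$1"
}

# Example:
#   verify_hashes expected.sha256
```

Each file is reported as OK or FAILED. Keep in mind the hashes listed above are only valid for the modules built for 5.4.65-1-pve.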
6bis) Confirm module r8169 is not blacklisted (you may have done so when installing the r8125 module).

Code: Select all

grep r8169 /etc/modprobe.d/*
Should give no result ; if it does, remove the matching line from the corresponding file.
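
The plain grep also matches commented-out lines and unrelated options. If you want to see only active blacklist entries, something like the following sketch can help (find_r8169_blacklist is a made-up name ; it only looks at *.conf files because those are the ones modprobe reads in that directory) :

```shell
# Hypothetical helper: list active "blacklist r8169" lines.
# $1 : directory to scan (defaults to /etc/modprobe.d)
# Exit status is grep's: 0 if a blacklist entry was found, non-zero otherwise.
find_r8169_blacklist() {
    dir=${1:-/etc/modprobe.d}
    # Anchored at line start (after optional spaces), so "# blacklist r8169" is ignored.
    grep -HnE '^[[:space:]]*blacklist[[:space:]]+r8169([[:space:]]|$)' "$dir"/*.conf 2>/dev/null
}
```

Every hit is printed as file:line:text, which tells you exactly which line to remove.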

7) Let's give it a try : switch to the console of the H2+ board.

!!! WARNING WARNING !!! : This may hurt your Proxmox ; especially if you are running a cluster and/or a Ceph cluster. Please ensure you are in a position where you can safely break the network connectivity before proceeding. I strongly suggest you gracefully shut down any running VM on this host and confirm quorum will be maintained if running a cluster.

Let's unload the old modules first :

Code: Select all

rmmod r8169
rmmod r8125
rmmod realtek
Check :

Code: Select all

lsmod | grep r8
lsmod | grep ^realtek
should give no result.

Load the new modules :

Code: Select all

modprobe realtek
modprobe r8169
8) Confirm the new driver has properly recognized the chip :

Code: Select all

lsmod | grep r8
should show you something similar to :

Code: Select all

r8169                  90112  0

Code: Select all

lsmod | grep ^realtek
should show you something similar to :

Code: Select all

realtek                24576  2
And most importantly, the kernel should tell you it uses the new drivers :

Code: Select all

lspci -k

Code: Select all

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8125 (rev 04)
        Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
        Kernel driver in use: r8169
        Kernel modules: r8169, r8125
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8125 (rev 04)
        Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
        Kernel driver in use: r8169
        Kernel modules: r8169, r8125
If the kernel states "Kernel driver in use: r8169" then it's a win !

9) Confirm the system shows the Ethernet adapters :

Code: Select all

ip link list
you should see enp2s0 and enp3s0 back. If not, dmesg should give you additional information.

As we are manually playing with the network, they may come back in a disabled state ; enable them manually :

Code: Select all

ip link set enp2s0 up
ip link set enp3s0 up
10) You should now be able to perform basic ping tests and performance tests.
You should also note that Proxmox is smoothly recovering.

10bis) At this point, if something went wrong, a simple reboot of the system will switch back to the r8125 module (assuming you were running it before).

11) Assuming everything is okay, we just need to make this persistent across reboots.
This requires updating the init ramdisk.
If you previously installed the r8125 module, you don't have to worry : r8169 takes precedence ; there is no need to uninstall it.


Update the ramdisk :

Code: Select all

update-initramfs -u
12) Reboot and enjoy !
Attachments
modules-5.4.106-1-pve.zip
(527.87 KiB) Downloaded 16 times
modules-5.4.103-1-pve.zip
(527.87 KiB) Downloaded 11 times
modules-5.4.98-1-pve.zip
(527.81 KiB) Downloaded 24 times
modules-5.4.78-2-pve.zip
(527.87 KiB) Downloaded 52 times
modules-5.4.78-1-pve.zip
(527.87 KiB) Downloaded 24 times
modules-5.4.73-1-pve.zip
(528.04 KiB) Downloaded 45 times
modules.zip
(529.86 KiB) Downloaded 69 times
Last edited by caramb on Sat Mar 27, 2021 2:36 am, edited 30 times in total.
These users thanked the author caramb for the post:
CarminaBurana (Wed Dec 09, 2020 10:37 pm)


Part 2 : almost DIY section

Post by caramb »

If you are not running Proxmox, or as soon as Proxmox updates its kernel, the modules provided in the Part 1 post will stop working.
(Building a DKMS-aware module should solve that, but I'm not familiar with DKMS and I lack the time to do this.)
Thus this post explains the process I went through to build the modules.

Huge greetings go to a lot of people who shared very valuable information through various posts. Picking from each led me to a successful build.
viewtopic.php?t=39323
viewtopic.php?t=39979
https://askubuntu.com/questions/1259947 ... g-on-20-04
https://forum.proxmox.com/threads/compi ... eck.36374/
https://stackoverflow.com/questions/284 ... nux-kernel
https://yoursunny.com/t/2018/one-kernel-module/

My assumption is the sources I'm providing will build against any 5.4 sources and probably other kernel branches.
They won't build against branch 5.9 because there was a rework of the realtek driver code.

For steps 6 and 7, I cannot provide instructions that will fit every Linux distribution ; I can only give directions...

Here we go :

1) Access your Linux machine (proxmox node or whatever other distro you play with).
To avoid any confusion later in the process, you may check the kernel version you are currently running :

Code: Select all

uname -a
2) Download the zip file attached to this post containing the modules sources.

3) Upload the zip file to your Linux box in the root homedir

4) Unzip the file.

5) Ensure your system is up-to-date
Warning : Please do not perform blind updates if the system is running even moderately critical services.
Debian/Proxmox

Code: Select all

apt-get update
Centos/Fedora

Code: Select all

yum update   # or : dnf update
6) Make your system capable of building the kernel : the method will vary depending on the distro you use. The command below works for Proxmox ; the alternative should work for Debian :
Proxmox :

Code: Select all

apt-get install git nano screen patch fakeroot build-essential devscripts libncurses5 libncurses5-dev libssl-dev bc flex bison libelf-dev libaudit-dev libgtk2.0-dev libperl-dev asciidoc xmlto gnupg gnupg2 rsync lintian debhelper libdw-dev libnuma-dev libslang2-dev sphinx-common asciidoc-base automake cpio dh-python file gcc kmod libiberty-dev libpve-common-perl libtool perl-modules python-minimal sed tar zlib1g-dev lz4
Debian

Code: Select all

apt install build-essential fakeroot dpkg-dev perl libssl-dev bc gnupg dirmngr libncurses5-dev libelf-dev flex bison lsb-release rsync

7) Retrieve your kernel source (the method depends on your Linux distribution) ; it's the trickiest part as you must get the source of the exact kernel build you are running (because every module includes a vermagic string that records which kernel version it was built for) :
The following instructions work for Proxmox :

Code: Select all

cd /usr/src/
#git clone git://git.proxmox.com/git/pve-kernel.git
# As there are various kernel branches I recommend specifying the branch. Not doing so, and thus cloning the master branch, may clone the wrong kernel.
git clone --branch 'pve-kernel-5.4' git://git.proxmox.com/git/pve-kernel.git
cd /usr/src/pve-kernel
make
This step requires a fair amount of time ; however, we don't need to wait for the entire compilation to finish ; we just need to be sure the kernel sources are there and up-to-date (patched).
You'll see the process start downloading the kernel (ubuntu-focal) sources, then some zfs source... wait until it really starts building the kernel and interrupt it (ctrl-C).

For other distributions, please search the web (this step also assumes you retrieved the kernel config).

8) If you came to this point, we probably went over the most difficult part.
In the export statement below, please replace "/usr/src/pve-kernel/build/ubuntu-focal" with your kernel source directory.

Code: Select all

export MYKERNELROOTDIR="/usr/src/pve-kernel/build/ubuntu-focal"
This sets the MYKERNELROOTDIR variable to your kernel directory to accommodate your specific situation.
Please be sure to run all the next commands in the same Linux terminal.

Before jumping to the next step, please check that :

Code: Select all

echo $MYKERNELROOTDIR
outputs the directory of your kernel source and that a :

Code: Select all

ls $MYKERNELROOTDIR 
gives no error and lists the content of the kernel source root dir.
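
A slightly stricter check than a plain ls is to confirm that the files replaced in the next step are really present in the tree. A minimal sketch (the function name check_kernel_tree is made up) :

```shell
# Hypothetical helper: succeed only if $1 looks like a kernel source tree
# that contains the top-level Makefile and the two driver sources.
check_kernel_tree() {
    root=$1
    for f in Makefile drivers/net/phy/realtek.c drivers/net/ethernet/realtek/r8169_main.c; do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $root/$f" >&2
            return 1
        fi
    done
}

# Example:
#   check_kernel_tree "$MYKERNELROOTDIR" && echo "kernel tree looks fine"
```

If a file is reported missing, the kernel sources were probably not fully downloaded or MYKERNELROOTDIR points to the wrong directory.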

8bis) So now, let's put the new source in place.

Code: Select all

mv $MYKERNELROOTDIR/drivers/net/phy/realtek.c $MYKERNELROOTDIR/drivers/net/phy/realtek.c.orig
mv ~root/realtek.c $MYKERNELROOTDIR/drivers/net/phy/
mv $MYKERNELROOTDIR/drivers/net/ethernet/realtek/r8169_main.c $MYKERNELROOTDIR/drivers/net/ethernet/realtek/r8169_main.c.orig
mv ~root/r8169_main.c $MYKERNELROOTDIR/drivers/net/ethernet/realtek/
9) Let's selectively build the modules (avoiding the time-consuming task of building the entire kernel).
We cannot ask for a single module compilation but we can ask for a single directory compilation, which saves us hours :

Code: Select all

cd $MYKERNELROOTDIR/
make scripts prepare modules_prepare
make -C . M=drivers/net/phy
make -C . M=drivers/net/ethernet/realtek
10) If build was successful, we can check if the vermagic is correct using modinfo :

Code: Select all

modinfo $MYKERNELROOTDIR/drivers/net/phy/realtek.ko
modinfo $MYKERNELROOTDIR/drivers/net/ethernet/realtek/r8169.ko
The "vermagic:" line should expose the same version as the output of a "uname -r"
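
This comparison can be scripted : the first word of the vermagic string is the kernel release the module was built for. The sketch below is only an illustration (vermagic_matches is a made-up name) ; "modinfo -F vermagic" prints just that field.

```shell
# Hypothetical helper: compare the release embedded in a vermagic string
# with a kernel release.
# $1 : vermagic string, e.g. the output of `modinfo -F vermagic file.ko`
# $2 : kernel release,  e.g. the output of `uname -r`
vermagic_matches() {
    built_for=${1%% *}      # keep everything before the first space
    [ "$built_for" = "$2" ]
}

# Example:
#   vermagic_matches "$(modinfo -F vermagic "$MYKERNELROOTDIR/drivers/net/ethernet/realtek/r8169.ko")" "$(uname -r)" \
#       && echo "vermagic OK" || echo "vermagic MISMATCH"
```

A mismatch here is the typical reason for the "Exec format error" at modprobe time.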

11) As a final step, retrieve the modules and follow the instructions of post "Part 1" to install them.
The 2 files are :

Code: Select all

$MYKERNELROOTDIR/drivers/net/phy/realtek.ko
$MYKERNELROOTDIR/drivers/net/ethernet/realtek/r8169.ko
Attachments
sources.zip
(42.8 KiB) Downloaded 39 times
Last edited by caramb on Thu Feb 25, 2021 4:57 pm, edited 24 times in total.


Part 3 : DIY section (the hard way)

Post by caramb »

If the previous "Part 2 : almost DIY" method fails or if you'd like to patch yourself, the best I can do for you is to share what I did to modify the sources.

Reminder : only 3 files need modification :
- r8169_main.c : main driver.
- realtek.c : PHY subpart of the driver.
- rtl8125b-2.fw : specific firmware required by the driver.

rtl8125b-2.fw : easy, if file does not exist, just add it to your system and that's it (please refer to post "Part 1").

realtek.c : Should be easy. After manually comparing the file from my local Linux distro (/<kernel source>/drivers/net/phy/realtek.c) to the kernel 5.9 source git (https://git.kernel.org/pub/scm/linux/ke ... h=v5.9-rc8), I came to the conclusion that the local file can safely be replaced by the 5.9 kernel source (the file from 5.9 only contains additional device support).

r8169_main.c : This one needs more work.

Your (assumed not impossible) mission is to backport the following patch to your kernel : https://git.kernel.org/pub/scm/linux/ke ... 6d140ce2ef

If your source already has support for the 8125A chip, it will be "quite" straightforward. It is not a strict prerequisite, but not having 8125A support will force you to backport additional code and greatly increase the difficulty.

Here are some tips for patching the source :
- Most kernels do not have the r8169_phy_config.c file ; changes take place directly in r8169_main.c
- To make the code cleaner and avoid any confusion, you'll have to rename existing functions used by the 8125A code from <whatever>8125<whatever> to <whatever>8125a<whatever>
- Due to some code rework/refactoring, some functions in the 5.9 code do not exist in the 5.4 branch ; here are some mappings :
- rtl_wait_txrx_fifo_empty => rtl_hw_init_8125
- rtl8169_cleanup => rtl_hw_initialize

You may also start from the source I've patched attached to previous post "Part 2".

Please follow the "Part 2" post instructions to build your modified modules.

Tips for debugging :
- Always check the dmesg output to get the feedback from the kernel.
- If your test fails with a "realtek.ko not loaded, maybe it needs to be added to initramfs?" error message, then you are not running the proper version of the realtek.ko module (the PHY module). Do not even try to update the r8169.ko module without also updating the realtek.ko module ; they are tightly coupled.
- If modprobe fails with an "Exec format error" error message, your module may be built for the wrong kernel version. Another cause (but I'm not 100% sure about it) would be that you stripped the module.

Good luck !
Last edited by caramb on Tue Oct 20, 2020 3:33 am, edited 14 times in total.


Part 4 : Results / performance.

Post by caramb »

Here are some figures regarding the performance.

Test setup :
- pve node : Proxmox on HP Microserver Gen8 (Intel 1gig onboard adapter)
- pve2 node : Proxmox on Odroid-H2+
- pve3 node : Proxmox on Odroid-H2+
- 1GbE switch : TPLink TL-SG108E
- test software : regular iperf3
- All 3 nodes running Linux kernel 5.4.65-1-pve
- In any case, the nodes are not completely idle ; they are running a proxmox cluster, some VMs and a Ceph cluster.
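
The figures below are the bitrates reported by iperf3 at the end of each run. If you want to collect them for several node pairs, the summary line can be extracted with a small filter ; this is only a sketch, the function name iperf3_receiver_rate is made up, and it assumes the default text output of iperf3.

```shell
# Hypothetical helper: read iperf3 text output on stdin and print the
# receiver-side bitrate (value and unit) from the final summary.
iperf3_receiver_rate() {
    awk '$NF == "receiver" {
        for (i = 1; i <= NF; i++)
            if ($i ~ /bits\/sec$/) { print $(i - 1), $i; exit }
    }'
}

# Example (with "iperf3 -s" already running on pve3):
#   iperf3 -c pve3 | iperf3_receiver_rate
```

Adding -R to the client command measures the reverse direction without swapping the client and server roles.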


Results with module r8125 (realtek-r8125-9.003.05) (DKMS version) :
- pve to pve2 : 874Mb/s
- pve2 to pve : 725Mb/s
- pve to pve3 : 858Mb/s
- pve3 to pve : 735Mb/s
- pve2 to pve3 : 97Mb/s !!!!!! :twisted:
- pve3 to pve2 : 105Mb/s !!!!!! :twisted:

As I'm running a Ceph cluster (yes, the H2+ can do that assuming you don't put heavy load on it), this was killing the performance of the cluster.


Result with r8169 patched-by-myself module :
- pve to pve2 : 942Mb/s
- pve2 to pve : 836Mb/s
- pve to pve3 : 943Mb/s
- pve3 to pve : 846Mb/s
- pve2 to pve3 : 890Mb/s :D
- pve3 to pve2 : 899Mb/s :D

These patched modules also bring flow-control support to the r8125B as well as greater ethtool support.


You may think sub-900Mb/s over a 1Gig link is a suboptimal result. It may or may not be a problem, and I probably won't dig into it (it may be due to flow-control, which was enabled).
My concern was the 100Mb/s performance I was achieving with the vendor module ; relative to this point : job is done.


As I don't own a 2.5GbE capable switch I was not able to test this topology.
However, I tested the patched module in a back-to-back 2.5GbE configuration (thus pve2 to pve3 direct connection).
Iperf3 gives something around 2.3Gb/s, which sounds reasonable to me.


Sidenote regarding the side effect of this network performance on the Ceph performance :
- I'm running a Win10pro VM on each of the proxmox nodes ; all storage on the Ceph cluster.
- Due to the asymmetrical performance, I was getting weird results when running CrystalDiskMark on the Windows VMs.
- The VM on node pve was getting 150/50MB/s read/write performance.
- The 2 VMs on nodes pve2 and pve3 were getting 35/15MB/s only.
- After switching to the patched r8169 module, VMs on pve2 and pve3 got similar performance to the pve node.
- Please do not troll, these figures are still fairly low but acceptable considering the low cost of the entire system, the very low power consumption, the almost no-noise advantage (the 2 H2+ are fanless, the HP fan runs <20%)... and it is just mainly a homelab. Hummm, the H2+ is a really great and very capable x86 SBC ; probably the only one supporting 32GB of ram.
Last edited by caramb on Mon Oct 19, 2020 3:03 pm, edited 19 times in total.


Part 5 : Footnotes

Post by caramb »

As of now : nothing to say
Except that I miserably failed posting this in the N2/N2+ section instead of H2/H2+ ; asked the admin to fix that :lol: .... and the great admin did that :P
Last edited by caramb on Wed Mar 17, 2021 1:25 am, edited 3 times in total.

odroid
Site Admin
Posts: 37286
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English, Korean
ODROIDs: ODROID
Has thanked: 1749 times
Been thanked: 1127 times
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by odroid »

Moved from N2 sub-forum.
Thank you for sharing a nice instruction for Proxmox users.

lhb035
Posts: 5
Joined: Thu Aug 27, 2020 5:09 pm
languages_spoken: Chinese
ODROIDs: H2+
Has thanked: 0
Been thanked: 4 times
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by lhb035 »

Hello caramb,

I followed your tutorial, but my H2+ shows "Kernel modules: r8169", not "Kernel modules: r8169, r8125".

I use OpenWrt to test ; NAT speed is only 20MB/s.



Part 1 : Easy way for Proxmox 5.4.65-1-pve kernel------8) Confirm the new driver has properly recognized the chip :


# uname -a
Linux h2pve 5.4.65-1-pve #1 SMP PVE 5.4.65-1 (Mon, 21 Sep 2020 15:40:22 +0200) x86_64 GNU/Linux

# lspci -k

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8125 (rev 04)
Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
Kernel driver in use: r8169
Kernel modules: r8169
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8125 (rev 04)
Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
Kernel driver in use: r8169
Kernel modules: r8169




Also, Realtek updated the driver to 9.004.01 :
https://www.realtek.com/en/component/zo ... s-software
2.5G Ethernet LINUX driver r8125 for kernel up to 5.6 9.004.01 2020/10/19 75 KB


Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by caramb »

Hi lhb035,

First, thank you for pointing me to the new Realtek driver ; I may give it a try later ; as of now, I'm satisfied with the native kernel module I patched.

Regarding your "lspci -k" output, I would say it looks good.
You have "Kernel driver in use: r8169", which means your Linux kernel found the modified r8169 module and considered it able to drive the Realtek chip of the H2+.
The lack of r8125 is harmless ; it does suggest you just did not install the realtek module ; but we don't need it.

To clarify, I would say r8169 module and r8125 module are competitors ; the first is the native Linux kernel module, the other is the vendor (realtek) proprietary module.
(Don't be confused by the module name, the r8169 module has support for numerous Realtek chips including 8125...)
You may have the two modules installed on your system (it was my case) but you'll use only one.

If you want to try the native kernel module (r8169) I patched, you are on the right track. As the module is loaded, "ip link list" should show you the interfaces. If not, check the output of "dmesg".


If you want to try the Realtek module (r8125), it's a different story : just forget about my "Part x" posts.
Hardkernel provided instructions for that here : https://wiki.odroid.com/odroid-h2/appli ... _on_h2plus

Below is what I did to initially install the DKMS version of the Realtek module :

Prepare your system :

Code: Select all

apt-get install dkms
apt install pve-headers-$(uname -r)
Download the debian package on your H2+ :

Code: Select all

wget https://github.com/awesometic/realtek-r8125-dkms/releases/download/9.004.01-1/realtek-r8125-dkms_9.004.01-1_amd64.deb
Install the dkms module :

Code: Select all

dpkg -i ./realtek-r8125-dkms_9.004.01-1_amd64.deb
Verify :

Code: Select all

lspci -k
Make the module available to all your kernel versions and persistent across reboots :

Code: Select all

update-initramfs -k all -u
When there is a kernel update on the proxmox side, one needs to retrigger the dkms build process :

Code: Select all

apt install pve-headers-$(uname -r)
ls /var/lib/initramfs-tools | sudo xargs -n1 /usr/lib/dkms/dkms_autoinstaller start
update-initramfs -k all -u


Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by lhb035 »

Hi caramb,
Thanks for your guidance and help,

After installing the 9.004.01-1 DKMS driver, I tested again : NAT speed is always slow...

environment:
H2+ 16GB ram*2 500G NVME SSD
AX88179 usb3.0 1000Mbps network adapter
PVE 6.2 5.4.65-1-pve

Openwrt 19.07.4 run in LXC

WAN port to linux bridge eth0 AX88179
LAN port to linux bridge eth1 RTL8125
NAT speed 70MB/s
ksoftirqd 60% cpu used


WAN port to linux bridge eth0 AX88179
LAN port to linux bridge eth1 AX88179
NAT speed 80MB/s

WAN port to linux bridge eth0 RTL8125
LAN port to linux bridge eth1 RTL8125
NAT speed 20MB/s



The driver of the 8125 network card is too bad, I can only wait for realtek to update it...

Trilom
Posts: 1
Joined: Mon Nov 02, 2020 8:24 am
languages_spoken: english
ODROIDs: H2+
Has thanked: 0
Been thanked: 0
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by Trilom »

Thank you for taking your time to work on this. I have a dump of information that I hope helps you. The short version is that the r8169 improves performance on 1GbE networks without a doubt. When hooking up to at least my 10GbE network it is much better (actually works) than the r8125 driver however when it pushes it has issues with my configuration. I have not torn down the ovs_system off but can see the interfaces communicating as expected but the packet size on the vLANs in/out of the h2 seem funky and demonstrate weirdness as described below after r8169.

In the end this driver improves 1 GbE performance, but to use the 2.5 GbE pipe on my existing 10 GbE equipment it is not worthwhile. Do you have any indication as to what might be the issue? The only difference in configuration I notice is that the interface on ovs-vsctl is explicitly set to 100000 on the 10GbE interface on r1 but is left unconfigured on the 2.5GbE interface on h2.

## with r8169 on 10GbE switch

One thing that frustrated me with the r8125 drivers was that I could not use the ports on a 10GbE switch since there is a lack of affordable 2.5GbE switches. With that being said I have some mixed results. With r8125 no matter what it wouldn't work on 10GbE switch, with r8169 and switch port set to auto-negotiation then it can connect. Whenever you push between h2 devices there are issues however when the h2 pushes to another device(r1) everything is fine(full 2.5GbE on 1500 and 9000 networks). If another device(r1) pushes to h2 then there are issues.

- Switch is a Unifi US-16-XG.
- [SFP+ interfaces are these iopolex ones that are pretty universal.](https://www.amazon.com/gp/product/B01M5LIUK5/)
- When interface is in autonegotiate it works fine.
- When interface is explicitly set to 2.5GbE it does not work.
- Going from r1 (10GbE host) to h2a(2.5GbE host on 10GbE interface) or h2b(2.5GbE host on 10GbE interface).
What do packets look like?
- when going to/from vlan23 (h2 to h2) in/out doesn't seem to get larger than 16000 or so
- when going to/from vlan23 (r1 to h2) vlan23(eno3 at 9k) sends at 9k from r1 and receives at 14/15k on h2a:vlan23(enp3s0 at 9k)
- when going to/from vlan23 (h2 to r1) vlan23(enp3s0 at 9k) sends at 62k from h2a and receives at 26k on r1:vlan23(eno3 at 9k)

### h2a to/from h2b

Code: Select all

# iperf3 (other host pushing to h2) on 9000 net is same as return
root@h2b:~# iperf3 -c h2a.s.ceph.trilhome.lan
Connecting to host h2a.s.ceph.trilhome.lan, port 5201
[  5] local 172.16.12.5 port 39072 connected to 172.16.12.4 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  17.1 MBytes   144 Mbits/sec  105   61.2 KBytes
[  5]   1.00-2.00   sec  15.4 MBytes   129 Mbits/sec   91   69.9 KBytes
[  5]   2.00-3.00   sec  16.5 MBytes   138 Mbits/sec   91   43.7 KBytes
[  5]   3.00-4.00   sec  14.6 MBytes   123 Mbits/sec  108   43.7 KBytes
[  5]   4.00-5.00   sec  15.0 MBytes   126 Mbits/sec  116   52.4 KBytes
[  5]   5.00-6.00   sec  16.1 MBytes   135 Mbits/sec  110   43.7 KBytes
[  5]   6.00-7.00   sec  14.5 MBytes   121 Mbits/sec   93   35.0 KBytes
[  5]   7.00-8.00   sec  15.2 MBytes   127 Mbits/sec  108   43.7 KBytes
[  5]   8.00-9.00   sec  14.6 MBytes   123 Mbits/sec   83   35.0 KBytes
[  5]   9.00-10.00  sec  15.7 MBytes   132 Mbits/sec  104   61.2 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   155 MBytes   130 Mbits/sec  1009             sender
[  5]   0.00-10.00  sec   154 MBytes   129 Mbits/sec                  receiver

iperf Done.
# iperf3 in the return path (h2 is pushing to other host) on 9000 is same as send
root@h2b:~#  iperf3 -c h2a.s.ceph.trilhome.lan -R
Connecting to host h2a.s.ceph.trilhome.lan, port 5201
Reverse mode, remote host h2a.s.ceph.trilhome.lan is sending
[  5] local 172.16.12.5 port 39120 connected to 172.16.12.4 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  15.4 MBytes   129 Mbits/sec
[  5]   1.00-2.00   sec  14.1 MBytes   118 Mbits/sec
[  5]   2.00-3.00   sec  14.3 MBytes   120 Mbits/sec
[  5]   3.00-4.00   sec  15.4 MBytes   129 Mbits/sec
[  5]   4.00-5.00   sec  14.4 MBytes   121 Mbits/sec
[  5]   5.00-6.00   sec  16.3 MBytes   137 Mbits/sec
[  5]   6.00-7.00   sec  15.8 MBytes   133 Mbits/sec
[  5]   7.00-8.00   sec  14.3 MBytes   120 Mbits/sec
[  5]   8.00-9.00   sec  16.0 MBytes   134 Mbits/sec
[  5]   9.00-10.00  sec  15.2 MBytes   127 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   152 MBytes   127 Mbits/sec  972             sender
[  5]   0.00-10.00  sec   151 MBytes   127 Mbits/sec                  receiver

iperf Done.
# ping to 1500 net as 9000 fails
root@h2b:~# ping -f -M do -s 8972 h2a.ceph.trilhome.lan
PING h2a.ceph.trilhome.lan (192.168.11.4) 8972(9000) bytes of data.
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE^C
--- h2a.ceph.trilhome.lan ping statistics ---
74 packets transmitted, 0 received, +74 errors, 100% packet loss, time 189ms

# ping to 9000 net as 9000 works
root@h2b:~# ping -f -M do -s 8972 h2a.s.ceph.trilhome.lan
PING h2a.s.ceph.trilhome.lan (172.16.12.4) 8972(9000) bytes of data.
.^C
--- h2a.s.ceph.trilhome.lan ping statistics ---
1242 packets transmitted, 1241 received, 0.0805153% packet loss, time 710ms
rtt min/avg/max/mdev = 1.425/2.795/3.357/0.263 ms, ipg/ewma 2.987/2.876 ms

# ping to 9000 net as 1500 works
root@h2b:~# ping -f -M do -s 1472 h2a.s.ceph.trilhome.lan
PING h2a.s.ceph.trilhome.lan (172.16.12.4) 1472(1500) bytes of data.
.^C
--- h2a.s.ceph.trilhome.lan ping statistics ---
1578 packets transmitted, 1577 received, 0.0633714% packet loss, time 379ms
rtt min/avg/max/mdev = 1.191/2.601/3.198/0.309 ms, ipg/ewma 2.774/2.646 ms

# ping to 1500 net as 1500 works
root@h2b:~# ping -f -M do -s 1472 h2a.ceph.trilhome.lan
PING h2a.ceph.trilhome.lan (192.168.11.4) 1472(1500) bytes of data.
.^C
--- h2a.ceph.trilhome.lan ping statistics ---
2072 packets transmitted, 2071 received, 0.0482625% packet loss, time 564ms
rtt min/avg/max/mdev = 0.763/2.495/3.359/0.390 ms, ipg/ewma 2.684/2.523 ms

# here is relevant network config
# ip l on h2b
root@h2b:~# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
6: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
10: vlan23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
11: vlan24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
# ip l on h2a
root@h2a:~# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
6: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
10: vlan23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
11: vlan24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff

### r1 to/from h2a

Code: Select all

# iperf3 (other host pushing to h2) on 9000 net is very slow
root@r1:~# iperf3 -c h2a.s.ceph.trilhome.lan
Connecting to host h2a.s.ceph.trilhome.lan, port 5201
[  5] local 172.16.12.2 port 45260 connected to 172.16.12.4 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  16.1 MBytes   135 Mbits/sec  181   26.2 KBytes
[  5]   1.00-2.00   sec  17.1 MBytes   143 Mbits/sec  162   17.5 KBytes
[  5]   2.00-3.00   sec  16.3 MBytes   137 Mbits/sec  164   17.5 KBytes
[  5]   3.00-4.00   sec  15.8 MBytes   132 Mbits/sec  178   35.0 KBytes
[  5]   4.00-5.00   sec  15.9 MBytes   134 Mbits/sec  155   17.5 KBytes
[  5]   5.00-6.00   sec  15.8 MBytes   132 Mbits/sec  164   17.5 KBytes
[  5]   6.00-7.00   sec  15.8 MBytes   132 Mbits/sec  161   17.5 KBytes
[  5]   7.00-8.00   sec  16.1 MBytes   135 Mbits/sec  166   26.2 KBytes
[  5]   8.00-9.00   sec  16.3 MBytes   137 Mbits/sec  174   17.5 KBytes
[  5]   9.00-10.00  sec  15.9 MBytes   134 Mbits/sec  157   17.5 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   161 MBytes   135 Mbits/sec  1662             sender
[  5]   0.00-10.00  sec   161 MBytes   135 Mbits/sec                  receiver

iperf Done.
# iperf3 in the return path (h2 is pushing to other host) on 9000 is perfect
root@r1:~# iperf3 -c h2a.s.ceph.trilhome.lan -R
Connecting to host h2a.s.ceph.trilhome.lan, port 5201
Reverse mode, remote host h2a.s.ceph.trilhome.lan is sending
[  5] local 172.16.12.2 port 45336 connected to 172.16.12.4 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   294 MBytes  2.47 Gbits/sec
[  5]   1.00-2.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   2.00-3.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   3.00-4.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   4.00-5.00   sec   294 MBytes  2.47 Gbits/sec
[  5]   5.00-6.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   6.00-7.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   7.00-8.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   8.00-9.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   9.00-10.00  sec   295 MBytes  2.47 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.88 GBytes  2.47 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  2.88 GBytes  2.47 Gbits/sec                  receiver

iperf Done.
# ping to 1500 net as 9000 fails
root@r1:~# ping -f -M do -s 8972 h2a.ceph.trilhome.lan
PING h2a.ceph.trilhome.lan (192.168.11.4) 8972(9000) bytes of data.
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE^C
--- h2a.ceph.trilhome.lan ping statistics ---
51 packets transmitted, 0 received, +51 errors, 100% packet loss, time 802ms

# ping to 9000 net as 9000 works
root@r1:~# ping -f -M do -s 8972 h2a.s.ceph.trilhome.lan
PING h2a.s.ceph.trilhome.lan (172.16.12.4) 8972(9000) bytes of data.
.^C
--- h2a.s.ceph.trilhome.lan ping statistics ---
4834 packets transmitted, 4833 received, 0.0206868% packet loss, time 723ms
rtt min/avg/max/mdev = 0.208/1.115/3.012/0.337 ms, ipg/ewma 1.183/1.301 ms

# ping to 9000 net as 1500 works
root@r1:~# ping -f -M do -s 1472 h2a.s.ceph.trilhome.lan
PING h2a.s.ceph.trilhome.lan (172.16.12.4) 1472(1500) bytes of data.
.^C
--- h2a.s.ceph.trilhome.lan ping statistics ---
7169 packets transmitted, 7168 received, 0.0139489% packet loss, time 925ms
rtt min/avg/max/mdev = 0.112/0.910/1.985/0.319 ms, ipg/ewma 0.965/0.925 ms

# ping to 1500 net as 1500 works
root@r1:~# ping -f -M do -s 1472 h2a.ceph.trilhome.lan
PING h2a.ceph.trilhome.lan (192.168.11.4) 1472(1500) bytes of data.
.^C
--- h2a.ceph.trilhome.lan ping statistics ---
8743 packets transmitted, 8743 received, 0% packet loss, time 352ms
rtt min/avg/max/mdev = 0.099/0.899/1.999/0.346 ms, ipg/ewma 0.954/0.864 ms

# here is relevant network config
# ip l on r1
root@r1:~# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
4: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
5: eno4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
8: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
15: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
16: vlan23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
17: vlan24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
20: bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
# ip l on h2a
root@h2a:~# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
6: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
10: vlan23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
11: vlan24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff

## with r8169 (after reboot)

Code: Select all

# nmon now shows relatively normal traffic: in and out are both 9k
# iperf3 (other host pushing to h2) on 9000 net is still problematic, with retries
root@r2:~# iperf3 -c h2d.s.ceph.trilhome.lan
Connecting to host h2d.s.ceph.trilhome.lan, port 5201
[  5] local 172.16.12.3 port 60800 connected to 172.16.12.7 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   119 MBytes   997 Mbits/sec  566    192 KBytes
[  5]   1.00-2.00   sec   118 MBytes   988 Mbits/sec  564    253 KBytes
[  5]   2.00-3.00   sec   118 MBytes   989 Mbits/sec  604    184 KBytes
[  5]   3.00-4.00   sec   118 MBytes   989 Mbits/sec  559    210 KBytes
[  5]   4.00-5.00   sec   118 MBytes   987 Mbits/sec  557    184 KBytes
[  5]   5.00-6.00   sec   118 MBytes   991 Mbits/sec  571    201 KBytes
[  5]   6.00-7.00   sec   118 MBytes   990 Mbits/sec  625    201 KBytes
[  5]   7.00-8.00   sec   118 MBytes   987 Mbits/sec  561    271 KBytes
[  5]   8.00-9.00   sec   118 MBytes   990 Mbits/sec  583    166 KBytes
[  5]   9.00-10.00  sec   118 MBytes   994 Mbits/sec  561    262 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.15 GBytes   990 Mbits/sec  5751             sender
[  5]   0.00-10.21  sec  1.15 GBytes   969 Mbits/sec                  receiver

iperf Done.
# iperf3 in the return path (h2 is pushing to other host) on 9000 net has slowed down
# the only thing I notice is that when the packet arrives on vlan23 on the other host
# it seems to be larger than 9k, in the 12-13k range based on nmon
# this issue is also present on another host identical to the "other host"
root@r2:~# iperf3 -c h2d.s.ceph.trilhome.lan -R
Connecting to host h2d.s.ceph.trilhome.lan, port 5201
Reverse mode, remote host h2d.s.ceph.trilhome.lan is sending
[  5] local 172.16.12.3 port 60332 connected to 172.16.12.7 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  77.2 MBytes   647 Mbits/sec
[  5]   1.00-2.00   sec  77.2 MBytes   647 Mbits/sec
[  5]   2.00-3.00   sec  76.4 MBytes   641 Mbits/sec
[  5]   3.00-4.00   sec  76.2 MBytes   639 Mbits/sec
[  5]   4.00-5.00   sec  72.6 MBytes   609 Mbits/sec
[  5]   5.00-6.00   sec  76.4 MBytes   641 Mbits/sec
[  5]   6.00-7.00   sec  77.8 MBytes   653 Mbits/sec
[  5]   7.00-8.00   sec  77.4 MBytes   649 Mbits/sec
[  5]   8.00-9.00   sec  76.8 MBytes   644 Mbits/sec
[  5]   9.00-10.00  sec  77.9 MBytes   654 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   767 MBytes   643 Mbits/sec    1             sender
[  5]   0.00-10.00  sec   766 MBytes   643 Mbits/sec                  receiver

iperf Done.
# ping to 1500 net as 9000 fails
root@r2:~# ping -f -M do -s 8972 h2d.ceph.trilhome.lan
PING h2d.ceph.trilhome.lan (192.168.11.7) 8972(9000) bytes of data.
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE^C
--- h2d.ceph.trilhome.lan ping statistics ---
119 packets transmitted, 0 received, +119 errors, 100% packet loss, time 924ms

# ping to 9000 net as 9000 works
root@r2:~# ping -f -M do -s 8972 h2d.s.ceph.trilhome.lan
PING h2d.s.ceph.trilhome.lan (172.16.12.7) 8972(9000) bytes of data.
^C
--- h2d.s.ceph.trilhome.lan ping statistics ---
4798 packets transmitted, 4798 received, 0% packet loss, time 258ms
rtt min/avg/max/mdev = 0.367/1.241/2.582/0.343 ms, ipg/ewma 1.303/1.235 ms

# ping to 9000 net as 1500 works
root@r2:~# ping -f -M do -s 1472 h2d.s.ceph.trilhome.lan
PING h2d.s.ceph.trilhome.lan (172.16.12.7) 1472(1500) bytes of data.
.^C
--- h2d.s.ceph.trilhome.lan ping statistics ---
7063 packets transmitted, 7062 received, 0.0141583% packet loss, time 286ms
rtt min/avg/max/mdev = 0.116/0.847/1.950/0.259 ms, ipg/ewma 0.889/0.896 ms

# ping to 1500 net as 1500 works
root@r2:~# ping -f -M do -s 1472 h2d.ceph.trilhome.lan
PING h2d.ceph.trilhome.lan (192.168.11.7) 1472(1500) bytes of data.
.^C
--- h2d.ceph.trilhome.lan ping statistics ---
7032 packets transmitted, 7031 received, 0.0142207% packet loss, time 994ms
rtt min/avg/max/mdev = 0.095/0.811/3.098/0.257 ms, ipg/ewma 0.851/0.826 ms

# here is relevant network config
# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
6: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
10: vlan23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
11: vlan24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
# one thing I also notice is that this issue might be isolated to this network
# here are some iperfs on the other networks

# this is a vlan on enp2s0 (1500)
root@r2:~# iperf3 -c h2d.ceph.trilhome.lan -R
Connecting to host h2d.ceph.trilhome.lan, port 5201
Reverse mode, remote host h2d.ceph.trilhome.lan is sending
[  5] local 192.168.11.3 port 52156 connected to 192.168.11.7 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   103 MBytes   867 Mbits/sec
[  5]   1.00-2.00   sec   103 MBytes   868 Mbits/sec
[  5]   2.00-3.00   sec   105 MBytes   884 Mbits/sec
[  5]   3.00-4.00   sec   112 MBytes   938 Mbits/sec
[  5]   4.00-5.00   sec   110 MBytes   919 Mbits/sec
[  5]   5.00-6.00   sec   105 MBytes   879 Mbits/sec
[  5]   6.00-7.00   sec   103 MBytes   864 Mbits/sec
[  5]   7.00-8.00   sec   104 MBytes   871 Mbits/sec
[  5]   8.00-9.00   sec   103 MBytes   864 Mbits/sec
[  5]   9.00-10.00  sec   109 MBytes   918 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.03 GBytes   889 Mbits/sec    5             sender
[  5]   0.00-10.00  sec  1.03 GBytes   887 Mbits/sec                  receiver

iperf Done.

# this is a vlan on enp2s0 (1500)
root@r2:~# iperf3 -c h2d.trilhome.lan -R
Connecting to host h2d.trilhome.lan, port 5201
Reverse mode, remote host h2d.trilhome.lan is sending
[  5] local 192.168.10.227 port 57664 connected to 192.168.10.231 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   104 MBytes   876 Mbits/sec
[  5]   1.00-2.00   sec   109 MBytes   911 Mbits/sec
[  5]   2.00-3.00   sec   105 MBytes   884 Mbits/sec
[  5]   3.00-4.00   sec   104 MBytes   870 Mbits/sec
[  5]   4.00-5.00   sec   111 MBytes   934 Mbits/sec
[  5]   5.00-6.00   sec   110 MBytes   920 Mbits/sec
[  5]   6.00-7.00   sec   110 MBytes   921 Mbits/sec
[  5]   7.00-8.00   sec   108 MBytes   910 Mbits/sec
[  5]   8.00-9.00   sec   111 MBytes   933 Mbits/sec
[  5]   9.00-10.00  sec   112 MBytes   937 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.06 GBytes   912 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.06 GBytes   909 Mbits/sec                  receiver

iperf Done.

# this is a vlan on enp3s0 (1500)
root@r2:~# iperf3 -c h2d.c2.trilhome.lan -R
Connecting to host h2d.c2.trilhome.lan, port 5201
Reverse mode, remote host h2d.c2.trilhome.lan is sending
[  5] local 172.16.12.131 port 35092 connected to 172.16.12.135 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  94.6 MBytes   793 Mbits/sec
[  5]   1.00-2.00   sec  97.5 MBytes   818 Mbits/sec
[  5]   2.00-3.00   sec   101 MBytes   844 Mbits/sec
[  5]   3.00-4.00   sec   103 MBytes   865 Mbits/sec
[  5]   4.00-5.00   sec  96.5 MBytes   809 Mbits/sec
[  5]   5.00-6.00   sec  97.7 MBytes   819 Mbits/sec
[  5]   6.00-7.00   sec  95.0 MBytes   797 Mbits/sec
[  5]   7.00-8.00   sec  98.3 MBytes   825 Mbits/sec
[  5]   8.00-9.00   sec  96.8 MBytes   812 Mbits/sec
[  5]   9.00-10.00  sec  98.5 MBytes   826 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   981 MBytes   822 Mbits/sec    0             sender
[  5]   0.00-10.00  sec   978 MBytes   821 Mbits/sec                  receiver

iperf Done.

## with r8169 (before reboot)

Code: Select all

# nmon now shows normal traffic: in and out are both 9k
# iperf3 (other host pushing to h2) on 9000 net is still problematic, with retries
root@r2:~# iperf3 -c h2d.s.ceph.trilhome.lan
Connecting to host h2d.s.ceph.trilhome.lan, port 5201
[  5] local 172.16.12.3 port 55648 connected to 172.16.12.7 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   119 MBytes   999 Mbits/sec  531    201 KBytes
[  5]   1.00-2.00   sec   117 MBytes   985 Mbits/sec  586    192 KBytes
[  5]   2.00-3.00   sec   118 MBytes   991 Mbits/sec  590    288 KBytes
[  5]   3.00-4.00   sec   118 MBytes   992 Mbits/sec  569    184 KBytes
[  5]   4.00-5.00   sec   118 MBytes   988 Mbits/sec  572    131 KBytes
[  5]   5.00-6.00   sec   118 MBytes   989 Mbits/sec  598    175 KBytes
[  5]   6.00-7.00   sec   118 MBytes   990 Mbits/sec  594    192 KBytes
[  5]   7.00-8.00   sec   118 MBytes   986 Mbits/sec  565    201 KBytes
[  5]   8.00-9.00   sec   118 MBytes   993 Mbits/sec  606    192 KBytes
[  5]   9.00-10.00  sec   118 MBytes   986 Mbits/sec  560    210 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.15 GBytes   990 Mbits/sec  5771             sender
[  5]   0.00-10.00  sec  1.15 GBytes   989 Mbits/sec                  receiver

iperf Done.
# iperf3 in the return path (h2 is pushing to other host) on 9000 net works
root@r2:~# iperf3 -c h2d.s.ceph.trilhome.lan -R
Connecting to host h2d.s.ceph.trilhome.lan, port 5201
Reverse mode, remote host h2d.s.ceph.trilhome.lan is sending
[  5] local 172.16.12.3 port 56144 connected to 172.16.12.7 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   101 MBytes   844 Mbits/sec
[  5]   1.00-2.00   sec   101 MBytes   845 Mbits/sec
[  5]   2.00-3.00   sec   101 MBytes   846 Mbits/sec
[  5]   3.00-4.00   sec  99.3 MBytes   833 Mbits/sec
[  5]   4.00-5.00   sec   101 MBytes   844 Mbits/sec
[  5]   5.00-6.00   sec  99.6 MBytes   835 Mbits/sec
[  5]   6.00-7.00   sec  99.9 MBytes   838 Mbits/sec
[  5]   7.00-8.00   sec   100 MBytes   840 Mbits/sec
[  5]   8.00-9.00   sec  99.9 MBytes   838 Mbits/sec
[  5]   9.00-10.00  sec  99.8 MBytes   837 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1003 MBytes   841 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1001 MBytes   840 Mbits/sec                  receiver

iperf Done.
# ping to 1500 net as 9000 fails
root@r2:~# ping -f -M do -s 8972 h2d.ceph.trilhome.lan
PING h2d.ceph.trilhome.lan (192.168.11.7) 8972(9000) bytes of data.
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE^C
--- h2d.ceph.trilhome.lan ping statistics ---
90 packets transmitted, 0 received, +90 errors, 100% packet loss, time 468ms

# ping to 9000 net as 9000 works
root@r2:~# ping -f -M do -s 8972 h2d.s.ceph.trilhome.lan
PING h2d.s.ceph.trilhome.lan (172.16.12.7) 8972(9000) bytes of data.
.^C
--- h2d.s.ceph.trilhome.lan ping statistics ---
4154 packets transmitted, 4153 received, 0.0240732% packet loss, time 373ms
rtt min/avg/max/mdev = 0.357/1.231/1.978/0.338 ms, ipg/ewma 1.292/1.367 ms

# ping to 9000 net as 1500 works
root@r2:~# ping -f -M do -s 1472 h2d.s.ceph.trilhome.lan
PING h2d.s.ceph.trilhome.lan (172.16.12.7) 1472(1500) bytes of data.
.^C
--- h2d.s.ceph.trilhome.lan ping statistics ---
6950 packets transmitted, 6949 received, 0.0143885% packet loss, time 759ms
rtt min/avg/max/mdev = 0.100/0.786/4.617/0.293 ms, ipg/ewma 0.828/0.914 ms

# ping to 1500 net as 1500 works
root@r2:~# ping -f -M do -s 1472 h2d.ceph.trilhome.lan
PING h2d.ceph.trilhome.lan (192.168.11.7) 1472(1500) bytes of data.
.^C
--- h2d.ceph.trilhome.lan ping statistics ---
9190 packets transmitted, 9189 received, 0.0108814% packet loss, time 137ms
rtt min/avg/max/mdev = 0.109/0.841/2.751/0.245 ms, ipg/ewma 0.884/0.856 ms

# here is relevant network config
# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
6: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
10: vlan23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
11: vlan24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
13: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff

## with r8125

While messing around with the h2's I noticed that performance with the r8125 driver is poor.

**Outbound (h2 to something) works because it can handle 4k.**

**Inbound (something to h2) doesn't work because it can't handle 9k.**
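
To compare the two directions by numbers rather than by eye, iperf3 can write its report as JSON (`-J`), and the end-of-test summary can be pulled out of it. The following is only a sketch: it assumes a TCP test, it assumes the report has the usual `end.sum_sent` and `end.sum_received` objects, and the helper name `summarize_iperf3` is invented here.

```python
import json


def summarize_iperf3(report_json: str) -> dict:
    """Extract the end-of-test summary from an `iperf3 -J` TCP report.

    Assumes the report contains end.sum_sent and end.sum_received.
    The retransmit count is optional and comes back as None if absent.
    """
    end = json.loads(report_json)["end"]
    sent = end["sum_sent"]
    received = end["sum_received"]
    return {
        "sent_mbps": sent["bits_per_second"] / 1e6,
        "received_mbps": received["bits_per_second"] / 1e6,
        "retransmits": sent.get("retransmits"),
    }


def slower_direction(forward: dict, reverse: dict) -> str:
    """Name the direction with the lower received throughput."""
    if forward["received_mbps"] <= reverse["received_mbps"]:
        return "forward"
    return "reverse"
```

One possible workflow: save `iperf3 -c <host> -J` and `iperf3 -c <host> -R -J` to two files, pass each file's contents to `summarize_iperf3`, then hand both summaries to `slower_direction`.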

Code: Select all

# the issue from nmon shows that when the traffic arrives on enp3s0
# from the remote host the packet size is 9k as expected
# however, when the traffic is being sent to the remote host it
# is sized as 4k and not 9k as expected
# iperf3 (other host pushing to h2) on 9000 net is problematic
root@r2:~# iperf3 -c h2d.s.ceph.trilhome.lan
Connecting to host h2d.s.ceph.trilhome.lan, port 5201
[  5] local 172.16.12.3 port 46780 connected to 172.16.12.7 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  20.1 MBytes   169 Mbits/sec  142   43.7 KBytes
[  5]   1.00-2.00   sec  22.9 MBytes   192 Mbits/sec  153   35.0 KBytes
[  5]   2.00-3.00   sec  19.4 MBytes   162 Mbits/sec  139   35.0 KBytes
[  5]   3.00-4.00   sec  18.6 MBytes   156 Mbits/sec  132   52.4 KBytes
[  5]   4.00-5.00   sec  18.5 MBytes   155 Mbits/sec  155   35.0 KBytes
[  5]   5.00-6.00   sec  19.4 MBytes   163 Mbits/sec  168   35.0 KBytes
[  5]   6.00-7.00   sec  52.6 MBytes   441 Mbits/sec  227    201 KBytes
[  5]   7.00-8.00   sec  24.4 MBytes   204 Mbits/sec  181   43.7 KBytes
[  5]   8.00-9.00   sec  19.5 MBytes   163 Mbits/sec  157   69.9 KBytes
[  5]   9.00-10.00  sec  18.9 MBytes   158 Mbits/sec  145   35.0 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   234 MBytes   197 Mbits/sec  1599             sender
[  5]   0.00-10.00  sec   233 MBytes   195 Mbits/sec                  receiver

iperf Done.
# iperf3 in the return path (h2 is pushing to other host) on 9000 net works
root@r2:~# iperf3 -c h2d.s.ceph.trilhome.lan -R
Connecting to host h2d.s.ceph.trilhome.lan, port 5201
Reverse mode, remote host h2d.s.ceph.trilhome.lan is sending
[  5] local 172.16.12.3 port 46314 connected to 172.16.12.7 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   101 MBytes   844 Mbits/sec
[  5]   1.00-2.00   sec   100 MBytes   839 Mbits/sec
[  5]   2.00-3.00   sec  99.4 MBytes   834 Mbits/sec
[  5]   3.00-4.00   sec   100 MBytes   839 Mbits/sec
[  5]   4.00-5.00   sec   100 MBytes   840 Mbits/sec
[  5]   5.00-6.00   sec   100 MBytes   842 Mbits/sec
[  5]   6.00-7.00   sec   100 MBytes   842 Mbits/sec
[  5]   7.00-8.00   sec   100 MBytes   841 Mbits/sec
[  5]   8.00-9.00   sec  98.4 MBytes   826 Mbits/sec
[  5]   9.00-10.00  sec  99.6 MBytes   836 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1001 MBytes   839 Mbits/sec    7             sender
[  5]   0.00-10.00  sec   999 MBytes   838 Mbits/sec                  receiver

iperf Done.
# ping to 1500 net as 9000 fails
root@r2:~# ping -f -M do -s 8972 h2d.ceph.trilhome.lan
PING h2d.ceph.trilhome.lan (192.168.11.7) 8972(9000) bytes of data.
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE^C
--- h2d.ceph.trilhome.lan ping statistics ---
214 packets transmitted, 0 received, +214 errors, 100% packet loss, time 476ms

# ping to 9000 net as 9000 works
root@r2:~# ping -f -M do -s 8972 h2d.s.ceph.trilhome.lan
PING h2d.s.ceph.trilhome.lan (172.16.12.7) 8972(9000) bytes of data.
.^C
--- h2d.s.ceph.trilhome.lan ping statistics ---
3656 packets transmitted, 3655 received, 0.0273523% packet loss, time 515ms
rtt min/avg/max/mdev = 0.475/1.708/4.443/0.232 ms, ipg/ewma 1.780/1.776 ms

# ping to 9000 net as 1500 works
root@r2:~# ping -f -M do -s 1472 h2d.s.ceph.trilhome.lan
PING h2d.s.ceph.trilhome.lan (172.16.12.7) 1472(1500) bytes of data.
.^C
--- h2d.s.ceph.trilhome.lan ping statistics ---
3541 packets transmitted, 3540 received, 0.0282406% packet loss, time 650ms
rtt min/avg/max/mdev = 0.482/1.536/4.723/0.391 ms, ipg/ewma 1.594/1.101 ms

# ping to 1500 net as 1500 works
root@r2:~# ping -f -M do -s 1472 h2d.ceph.trilhome.lan
PING h2d.ceph.trilhome.lan (192.168.11.7) 1472(1500) bytes of data.
.^C
--- h2d.ceph.trilhome.lan ping statistics ---
4190 packets transmitted, 4189 received, 0.0238663% packet loss, time 709ms
rtt min/avg/max/mdev = 0.162/1.537/8.674/0.406 ms, ipg/ewma 1.600/1.750 ms

# here is relevant network config
# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
5: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
6: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
9: vlan22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
10: vlan23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
11: vlan24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff

djsashaz

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by djsashaz »

I'm using Ubuntu 18 and have similar network performance issues: I get occasional drops in my network stream that disrupt video delivery over IP.

lhb035
ODROIDs: H2+
Has thanked: 0
Been thanked: 4 times
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by lhb035 »

Hello,

Someone found the cause. In the BIOS, go to Chipset / South cluster configuration / PCI Express configuration / PCI Express root port (1 & 2) / ASPM and change it to Disable.

After disabling ASPM, iperf3 over the 1GbE link reaches 80MB/s.


Source (in Chinese):
https://www.right.com.cn/FORUM/thread-4053662-1-1.html
These users thanked the author lhb035 for the post (total 3):
gofaster (Wed Dec 09, 2020 8:42 am) • CarminaBurana (Wed Dec 09, 2020 10:37 pm) • domih (Wed Feb 03, 2021 8:56 am)

henrikno
Posts: 3
Joined: Mon Dec 07, 2020 10:43 am
languages_spoken: english
Has thanked: 0
Been thanked: 5 times
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by henrikno »

I also had similar performance issues in one direction but not the other (testing with iperf). I used Wireshark and saw a lot of retransmits. I also noticed, using ethtool -S enp3s0, that rx_missed was high and climbing during iperf.

Code: Select all

     rx_missed: 24938
Googling that led me to some threads about other Realtek chips (e.g. r8169) that require disabling ASPM.
https://bugzilla.redhat.com/show_bug.cgi?id=1679140
https://bugs.launchpad.net/ubuntu/+sour ... ug/1880076
https://www.spinics.net/lists/netdev/msg548397.html
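To see whether rx_missed is really climbing during a test, the counter can be sampled twice and the difference printed. This is only a sketch: the helper names are made up for illustration, and it assumes the driver exposes a counter named rx_missed in ethtool -S, as the r8125 driver does in this post.

```shell
#!/bin/sh
# Extract the rx_missed counter from `ethtool -S` output read on stdin.
# rx_missed_from_stats is a hypothetical helper name.
rx_missed_from_stats() {
    awk -F: '$1 ~ /^[ \t]*rx_missed$/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Print how much rx_missed grew on an interface over a number of seconds.
# Needs ethtool installed; rx_missed_delta is a hypothetical helper name.
rx_missed_delta() {
    iface=$1
    seconds=${2:-10}
    before=$(ethtool -S "$iface" | rx_missed_from_stats)
    sleep "$seconds"
    after=$(ethtool -S "$iface" | rx_missed_from_stats)
    echo $((after - before))
}

# Example: start iperf in another terminal, then run
#   rx_missed_delta enp3s0 10
```

A delta that stays at 0 while iperf is running suggests the receiver is no longer dropping frames.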
Running

Code: Select all

echo "performance" > /sys/module/pcie_aspm/parameters/policy
Improved things a lot for me (to make it permanent, add pcie_aspm.policy=performance to the kernel options)
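To check which policy is active before and after the echo, the same sysfs file can be read back: the kernel lists all policies and puts the active one in square brackets. A minimal sketch, with a made-up helper name; the optional argument is only there so it can be pointed at another file.

```shell
#!/bin/sh
# Print the active PCIe ASPM policy, i.e. the entry shown in [brackets].
# aspm_active_policy is a hypothetical helper name.
aspm_active_policy() {
    file=${1:-/sys/module/pcie_aspm/parameters/policy}
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$file"
}

# Example:
#   aspm_active_policy
```

If it does not print performance after a reboot, the kernel option was probably not picked up.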

For reference I'm running 5.4.60-1-pve

Code: Select all

# ethtool -i enp3s0
driver: r8125
version: 9.003.05-NAPI
These users thanked the author henrikno for the post (total 3):
odroid (Mon Dec 07, 2020 11:36 am) • CarminaBurana (Wed Dec 09, 2020 10:37 pm) • domih (Wed Feb 03, 2021 8:55 am)

doughnut
Posts: 18
Joined: Mon Aug 31, 2015 5:10 am
languages_spoken: english
ODROIDs: Odroid C1+ C4 XU4 H2 (dead) H2+
Location: So. Fla. USA
Has thanked: 1 time
Been thanked: 4 times
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by doughnut »

Great post! Great subject. Thanks for posting this. I was just looking for posts regarding poor 2.5GbE performance and came across this one. It prompted me to go to Realtek and make sure I had the latest driver (12302020) loaded in Windows. After updating the driver, a simple file copy went from a sporadic, all-over-the-place 30-80Mbps to a much more respectable 130-160 Mbps on my 1/2.5/5/10GbE network (Netgear MS510TX). I even hit a peak of over 2Gb/s in Performance Monitor.

domih
Posts: 396
Joined: Mon Feb 11, 2019 4:48 pm
languages_spoken: English, French
ODROIDs: UX4, HC2, N2, N2+, H2, H2+, C4, HC4 - 1GbE, 2.5GbE, 10GbE, 40+GbE
Location: San Francisco Bay Area
Has thanked: 152 times
Been thanked: 152 times
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by domih »

I confirm that disabling ASPM fixes the speed on a 1GbE subnet. See more details there: viewtopic.php?p=320336#p320336

Post by caramb »

Hi there,

Greetings and many thanks to all the contributors on this thread who provided very valuable feedback & information ; I apologize for not replying to everyone.

Let me share the current status on my side :
1) The root issue still exists, as the kernel 5.4 branch still completely lacks RTL8125B support.
2) I have been running my patched driver for months now without any stability issue.
3) Although this fixed my initial issue (bandwidth between 2 H2+ through a Gigabit switch), I found I still had performance problems as soon as I turned on jumbo frames (MTU set to 9000).
4) As shared by some people here (lhb035 & henrikno & domih), disabling power management for PCIe (ASPM) was the key to fixing this remaining issue.
5) I did not test the latest DKMS driver from Realtek but, according to posts in this thread, (Realtek DKMS driver + ASPM off) is an alternative solution.
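When testing jumbo frames as in point 3, it helps to first confirm that 9000-byte frames actually pass end to end. One common way is a ping with fragmentation forbidden and a payload sized to fill the MTU exactly. The sketch below only prints the command; the helper names are made up, and it assumes IPv4 and the iputils version of ping.

```shell
#!/bin/sh
# Largest ICMP payload that fits in one IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
ping_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Print (do not run) a ping command that forbids fragmentation, so it only
# succeeds if every hop accepts the full frame size. Hypothetical helper name.
jumbo_ping_cmd() {
    host=$1
    mtu=${2:-9000}
    echo "ping -M do -c 3 -s $(ping_payload_for_mtu "$mtu") $host"
}

# Example:
#   jumbo_ping_cmd 192.168.100.202
```

If that ping fails while a normal ping works, something on the path (NIC, bridge, VLAN interface or switch) is still at a smaller MTU.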

To permanently disable ASPM, edit your /etc/default/grub file

Code: Select all

GRUB_CMDLINE_LINUX_DEFAULT="pcie_aspm.policy=performance"
Alternatively, you can disable power management for the PCIe ports in the H2+ BIOS settings, as explained by 'odroid' in the post below.
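Note that the GRUB line above, pasted as is, replaces whatever options were already there (for example quiet). A small sketch that appends the parameter instead, assuming a Debian-style /etc/default/grub and GNU sed; add_grub_param is a made-up helper name, not a standard tool.

```shell
#!/bin/sh
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT, keeping existing options.
# add_grub_param is a hypothetical helper; the parameter must not contain "|" or "&".
add_grub_param() {
    file=$1
    param=$2
    # Do nothing if the parameter is already on the GRUB_CMDLINE_LINUX_DEFAULT line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote (GNU sed -i).
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $param\"|" "$file"
}

# Example (as root):
#   add_grub_param /etc/default/grub pcie_aspm.policy=performance
```

On Debian-based systems the change only takes effect after running update-grub and rebooting.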
Last edited by caramb on Sun Mar 14, 2021 3:38 am, edited 8 times in total.

odroid
Site Admin
Posts: 37286
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English, Korean
ODROIDs: ODROID
Has thanked: 1749 times
Been thanked: 1127 times
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by odroid »

Or you can disable the ASPM feature in the BIOS settings.
BIOS / Chipset / South cluster configuration / PCI Express configuration / PCI Express root port(1&2) / ASPM / Disable
Note: do it for root port 1 and root port 2, which are the two root ports used by the Ethernet controllers.
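After changing the BIOS setting, the result can be checked from Linux: lspci -vv prints a LnkCtl line per PCIe device that says either "ASPM Disabled" or which ASPM states are enabled. A minimal sketch that pairs each device with that state; aspm_states is a made-up helper name, and the parsing assumes the usual lspci -vv layout.

```shell
#!/bin/sh
# Read `lspci -vv` output on stdin and print one line per PCIe device
# with the ASPM state taken from its LnkCtl line.
# aspm_states is a hypothetical helper name.
aspm_states() {
    awk '
        /^[0-9a-f]/ { dev = $0 }                # device header line, e.g. "03:00.0 Ethernet controller: ..."
        /LnkCtl:/ {
            state = $0
            sub(/.*LnkCtl:[ \t]*/, "", state)   # drop everything up to the field name
            sub(/;.*/, "", state)               # keep only the part before the first ";"
            print dev " -> " state
        }
    '
}

# Example (root is needed to read the link capabilities):
#   sudo lspci -vv | aspm_states | grep -i ethernet
```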

See also.
viewtopic.php?p=322345#p322345
These users thanked the author odroid for the post:
sshd (Thu Mar 18, 2021 5:27 am)

_linux_
Posts: 4
Joined: Wed Mar 24, 2021 9:19 pm
languages_spoken: English, German
ODROIDs: H2+
Has thanked: 3 times
Been thanked: 0
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by _linux_ »

I didn't have those performance problems... I reached around 110MB/s by using the PPA of Hardkernel.

In the meantime I upgraded my network to 2.5GbE and reached 280MB/s with one interface.

Post by caramb »

Hi _linux_,

To experience the problem you need two RTL8125B talking to each other through a 1GbE switch and/or jumbo frames (MTU 9000).
Regarding pure network performance, assuming you have proper drivers and settings, you can achieve nearly 2.5Gb/s when two devices are directly connected (back-to-back), which is quite impressive.
Example below (2 H2+ with jumbo frames on) :

Code: Select all

iperf3 -c 192.168.100.202
Connecting to host 192.168.100.202, port 5201
[  5] local 192.168.100.203 port 59022 connected to 192.168.100.202 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   297 MBytes  2.49 Gbits/sec    0    839 KBytes
[  5]   1.00-2.00   sec   295 MBytes  2.47 Gbits/sec    0    926 KBytes
[  5]   2.00-3.00   sec   295 MBytes  2.47 Gbits/sec    0    970 KBytes
[  5]   3.00-4.00   sec   295 MBytes  2.47 Gbits/sec    0    970 KBytes
[  5]   4.00-5.00   sec   295 MBytes  2.47 Gbits/sec    0    970 KBytes
[  5]   5.00-6.00   sec   295 MBytes  2.47 Gbits/sec    0    970 KBytes
[  5]   6.00-7.00   sec   295 MBytes  2.47 Gbits/sec    0    970 KBytes
[  5]   7.00-8.00   sec   295 MBytes  2.47 Gbits/sec    0    970 KBytes
[  5]   8.00-9.00   sec   295 MBytes  2.47 Gbits/sec    0   2.08 MBytes
[  5]   9.00-10.00  sec   295 MBytes  2.47 Gbits/sec    0   2.08 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.88 GBytes  2.48 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  2.88 GBytes  2.47 Gbits/sec                  receiver
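For reference, the 2.47 Gbits/sec above is about as much as TCP can carry on this link. A rough calculation, assuming IPv4, TCP timestamps enabled (12 bytes of options) and no VLAN tag on the wire; tcp_goodput_gbps is a made-up helper name.

```shell
#!/bin/sh
# Theoretical TCP goodput in Gbit/s for a given link rate (Gbit/s) and MTU (bytes).
# Assumptions: IPv4, TCP timestamps enabled, no VLAN tag.
tcp_goodput_gbps() {
    awk -v rate="$1" -v mtu="$2" 'BEGIN {
        payload = mtu - 20 - 20 - 12      # minus IPv4 header, TCP header, TCP timestamp option
        wire    = mtu + 14 + 4 + 8 + 12   # plus Ethernet header, FCS, preamble, inter-frame gap
        printf "%.3f\n", rate * payload / wire
    }'
}

tcp_goodput_gbps 2.5 9000   # jumbo frames
tcp_goodput_gbps 2.5 1500   # standard MTU
```

Under these assumptions this gives roughly 2.475 Gbit/s with MTU 9000 and 2.354 Gbit/s with MTU 1500, so the iperf3 run is essentially at line rate.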
Regards.

Hostis
Posts: 2
Joined: Mon Mar 29, 2021 2:56 am
languages_spoken: english
Has thanked: 0
Been thanked: 0
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by Hostis »

Hello,

First of all, thank you for your amazing work!

I have 2 questions:
- If I understand correctly, your workaround also applies with a Net Card installed? Have you considered buying this card?
- Also, have you considered installing the new Proxmox kernel, 5.11, which is in a test repository https://forum.proxmox.com/threads/kernel-5-11.86225/ (if I remember correctly, the 5.10 kernel brings out-of-the-box support for this chipset)?

Thank you for your answers.

Post by caramb »

Hi Hostis,

Thank you !
Glad if my work is useful to others !

To answer your first question :
This backported driver should work without problems with the Net Card because it is RTL8125B based ; the four 2.5GbE ports will appear as individual PCIe devices (BTW this is the reason why you need to flash your H2 with the ESF (a.k.a "M.2 bifurcated") version of the BIOS).
The Net Card is a really cool and cheap piece of HW ; it lets you turn your H2 into a very capable network device.

However, installing the Net Card has one major, obvious drawback : it prevents you from using the M.2 port for NVMe storage... (adding an M.2 port on the Net Card would make no sense because of the massive oversubscription this would introduce on the x4 PCIe port... along with the extra cost, this is probably the reason why Odroid did not even consider it)

In my particular use case (hyperconverged Proxmox/Ceph cluster), I need 3 storage devices on the H2 : 1 boot device (NVMe disk) and 2 Ceph OSDs (data storage) (2 SATA ports).
Initially (before the Net Card came out) I chose NVMe storage not because of performance requirements but simply because such disks were already widely available and rather cheap (considering entry-level, low-power ones).

Upgrading with the Net Card would have required 2 things :
1) Buying an extra >=32GB eMMC module as the new boot device
2) Reinstalling Proxmox (not that straightforward and somewhat risky in this scenario ; 2 nodes out of the 3 are H2+).

For new setups, this is something one should really consider.
But in my case, the upgrade was way too complex.

So I didn't buy it, simply because I went another way ; let me explain...

I was looking for a cheap way to upgrade my network to 2.5GbE (mainly because, from a Ceph point of view, network bandwidth really does matter !)
However, the only reasonably priced (for a homelab) (<150€) fanless 2.5GbE copper (not SFP) switches were/are unmanaged ones ; just like the TPLink models.
As I'm using various vlans to segregate cluster traffic, a managed switch was mandatory ; thus, I gave up on the idea of a switch upgrade.

No Net Card, no 2.5GbE switch ; what else ?

In fact there was another route : Using the 2 RTL8125B ports on each of my 3 nodes, I built a "full-mesh", "loop-free", "stp-free" high performance triangle topology (high performance because each node has 2.5Gb/s full duplex bandwidth to each of the two others).
This just needs a proper openvswitch configuration to avoid using spanning tree (STP is hell ; STP is required when there are loops ; so just build loop-free...)
I had planned to write a specific post to share some details on how to set up such a topology ; I just lack the time...



Regarding the 2nd question : no, I haven't considered switching to 5.11 yet because it is at too early a stage.
My cluster is not for testing only and runs some "personal production" that I need to keep up and running.
You're right, vanilla 5.9+ Linux kernels have an r8169 module with built-in support for RTL8125B (I've just checked ; kernel 5.11.10 has this built-in support)
However, I'm not 100% confident that the Proxmox kernel does, as it is derived from the Debian/Ubuntu kernel, which has some differences from the mainline kernel.

I'm gonna git clone the source of pve-kernel 5.11, take a look, and let you know.


Regards.

Post by caramb »

Hi again Hostis,

I've just cloned the master branch of pve-kernel which is version 5.11.7.
It is based on Ubuntu 21.04 Hirsute kernel.

I do confirm this kernel HAS support for RTL8125B.

Code: Select all

root@pve:/usr/src/pve-kernel/build/ubuntu-hirsute/drivers/net/ethernet/realtek# grep -i 8125B *
r8169_main.c:#define FIRMWARE_8125B_2   "rtl_nic/rtl8125b-2.fw"
r8169_main.c:   [RTL_GIGA_MAC_VER_63] = {"RTL8125B",            FIRMWARE_8125B_2},
r8169_main.c:MODULE_FIRMWARE(FIRMWARE_8125B_2);
r8169_main.c:           /* 8125B family. */
r8169_main.c:static void rtl8125b_config_eee_mac(struct rtl8169_private *tp)
r8169_main.c:           rtl8125b_config_eee_mac(tp);
r8169_main.c:static void rtl_hw_start_8125b(struct rtl8169_private *tp)
r8169_main.c:   static const struct ephy_info e_info_8125b[] = {
r8169_main.c:   rtl_ephy_init(tp, e_info_8125b);
r8169_main.c:           [RTL_GIGA_MAC_VER_63] = rtl_hw_start_8125b,
r8169_phy_config.c:static void rtl8125b_config_eee_phy(struct phy_device *phydev)
r8169_phy_config.c:static void rtl8125b_hw_phy_config(struct rtl8169_private *tp,
r8169_phy_config.c:     rtl8125b_config_eee_phy(phydev);
r8169_phy_config.c:             [RTL_GIGA_MAC_VER_63] = rtl8125b_hw_phy_config,
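The same check can be done on an installed kernel without the source tree: modinfo prints a firmware: line for every MODULE_FIRMWARE declaration, so the rtl8125b firmware name from the grep above should show up there. A minimal sketch; has_rtl8125b_support is a made-up helper name.

```shell
#!/bin/sh
# Succeeds if the modinfo output on stdin lists an RTL8125B firmware file.
# The firmware path comes from the FIRMWARE_8125B_2 define in r8169_main.c.
# has_rtl8125b_support is a hypothetical helper name.
has_rtl8125b_support() {
    grep -q '^firmware:.*rtl_nic/rtl8125b-'
}

# Example:
#   modinfo r8169 | has_rtl8125b_support && echo "r8169 knows RTL8125B"
```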
As you can see here (https://discourse.ubuntu.com/t/hirsute- ... dule/18539), Hirsute is still in the testing stage and there won't be any final release before the end of April...
So we should not expect kernel 5.11 to move from the pvetest repository to the stable one before Ubuntu Hirsute goes final.

But I will probably give it a try in the meantime... as I have another Proxmox cluster I can play with and break...

Regards.
These users thanked the author caramb for the post:
odroid (Tue Mar 30, 2021 9:17 am)

mad_ady
Posts: 9374
Joined: Wed Jul 15, 2015 5:00 pm
languages_spoken: english
ODROIDs: XU4, C1+, C2, C4, N1, N2, H2, Go, Go Advance
Location: Bucharest, Romania
Has thanked: 599 times
Been thanked: 660 times
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by mad_ady »

@caramb you can use an unmanaged switch with vlans, but the hosts that connect to the switch need to use trunk interfaces. There will be a bit of "bleed over" of traffic where you will receive traffic for other vlans (tagged) on ports where you don't want it, but this will be broadcast/multicast and flooding (when the switch doesn't know on which port the destination mac is and floods traffic for that mac on all other ports until it learns it). So in typical cases you shouldn't see any traffic degradation.

Sure, this means that you trust your hosts in the lan, otherwise a rogue host could easily change vlans...
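On the host side, a trunk interface just means creating one tagged sub-interface per vlan on the physical port. A sketch with plain iproute2 that prints the commands for review instead of running them; the interface name and vlan IDs are borrowed from the ip l output earlier in this thread, vlan_cmds is a made-up helper name, and this is not the openvswitch setup caramb uses.

```shell
#!/bin/sh
# Print the iproute2 commands that create one tagged sub-interface per VLAN ID.
# Commands are printed rather than executed so they can be reviewed first.
# vlan_cmds is a hypothetical helper name.
vlan_cmds() {
    parent=$1
    shift
    for id in "$@"; do
        echo "ip link add link $parent name $parent.$id type vlan id $id"
        echo "ip link set dev $parent.$id up"
    done
}

# Example:
#   vlan_cmds enp3s0 22 23 24
```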

Post by caramb »

Thank you mad_ady.
You're absolutely right, unmanaged switches are transparent to vlans and jumbo frames.
In fact, the other reason I did not want to go this way is the fact that I'm about to upgrade to a SDN managed switch (more precisely, upgrade from my current TPLink TL-SG108E to a TL-SG2008P ; I already set up an Omada SDN that manages my Wifi APs).

Regards.

Post by mad_ady »

I wonder what their "SDN" implementation does for you... Since it's not SDN in the true sense (having a centralised controller with real-time view of network traffic and topology that does routing/switching table rewriting on the fly to optimize the data flow). It only looks like a centralized dashboard, so apart from poe, I doubt you'll get more functionality from the other switch...

Post by caramb »

@mad_ady
Once again you're right, this is a consumer/SMB SDN solution ; not an enterprise/ISP/carrier-grade one.

Post by mad_ady »

Don't you just love it when manufacturers throw in big words like SDN, Big Data, Machine Learning to try to make their products more appealing, but behind the scenes is the same old cron script that reboots at 4 am?
That's what I like about Hardkernel - no such nonsense...

Post by Hostis »

Hello again,

Thank you for your in-depth explanations @caramb, I don't know what to say :D. I appreciate the effort you put into these answers.

That is what I thought: you already have the M.2 port occupied, and it would be just too much work to change it. I was just wondering whether there was another reason (like performance/stability problems), since nobody had even mentioned the card in this thread. For me personally, it will be much better to use this Net Card than USB-to-Ethernet adapters, which are well known for stability problems, and those considered "good" cost around $70 each.

Thank you for the info about Ubuntu as well ; I didn't know that, and it wasn't mentioned in that thread (on the Proxmox forum) either.

Thank you so much again, and I hope you find some free time to write that post about setting up your topology, can't wait!
