WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

caramb
Posts: 27
Joined: Sun Oct 18, 2020 8:32 pm
languages_spoken: english french
ODROIDs: Odroid-H2+
Has thanked: 1 time
Been thanked: 5 times
Contact:


Post by caramb »

Note 20210429 : You can still read the whole story below, but a few things have evolved since I started this thread. You can check this post (viewtopic.php?p=322534#p322534) for a quick summary, and this one (viewtopic.php?p=327687#p327687), as kernel 5.11 of Proxmox 6.4 brings out-of-the-box R8125B support (still testing ; see "edit 10" at the bottom of this first post).

Hi there,

I'm sharing with you a workaround to the terrible network performance issue of two Realtek 8125B (running on Linux) talking to each other through a 1Gb/s switch.
Performance seen using the Realtek module (version 9.003.05-1 DKMS in my case) can be as low as 100Mb/s over a 1Gb/s link.

I spent a fair amount of time trying to understand what the problem was.
The test topology is quite simple : 2 H2+, 1 third-party server (HP Microserver Gen8), 1 Gigabit switch (an entry-level TP-Link TL-SG108E).
Performance between the third-party server and either H2+ is okay (>800/900Mb/s).
Performance between the two H2+ is terrible (~100Mb/s).
These results would puzzle any experienced network engineer...

Thanks to these posts
viewtopic.php?t=39323
viewtopic.php?t=39979
I understood it was a driver problem ; several users claimed the issue is fixed using the driver from beta Linux kernel 5.9.

As others already stated, please do not spend (waste) money buying different 1 Gigabit switches ; it surely won't help.

Surprisingly, nobody seems to have backported the fix to the stable kernel branches ; that is what I focus on in this post.
In fact, it is not a fix ; it is about adding support for the Realtek 8125 chipset HW revision B to the r8169 kernel module so we can get rid of the Realtek r8125 module, which seems to have a broken implementation.

Note : Please don't blame me if something goes wrong. I'm not a kernel developer ; not even a developer ; merely a dirty patcher. I have my 2 H2+ nodes running for more than 1 week without experiencing any issue. Please do not consider this production-ready without performing extensive stability tests beforehand.

I split my post into pieces :
- Part 1 : Load ready-to-use modules on a Proxmox system.
- Part 2 : Build the modules for your system from the patched sources.
- Part 3 : Directions to patch by yourself.
- Part 4 : Performance figures ; before/after

Edit 1 (20201024) : Realtek released a new version of their module : version 9.004.01 (thank you lhb035 for pointing this out). As of now, I have not tested it, so I cannot say whether the problem is still present.
Edit 2 (20201120) : Built the modules against Proxmox kernel 5.4.73-1-pve and attached them to the "Part 1" post.
Edit 3 [20201203] : Built the modules against Proxmox kernel 5.4.78-1-pve and attached them to the "Part 1" post.
Edit 4 [20201215] : Built the modules against Proxmox kernel 5.4.78-2-pve and attached them to the "Part 1" post.
Edit 5 [20210223] : WARNING WARNING : Proxmox kernel 5.4.98-1-pve switched from the ubuntu-focal to the ubuntu-hirsute kernel source tree. This kernel includes the "reworked" realtek module (kernel 5.9/5.10 style), thus my patch is no longer compatible. I'm investigating this ; I already found that it should be safe to directly import the r8169_main.c file from stable kernel 5.10.18. For the time being, the modules I built fail to load due to missing symver. So it is better to stick with 5.4.78-2-pve. I have not tried the latest DKMS driver from Realtek.
Edit 6 [20210224] : Ok, found the root cause. Proxmox started developing a 5.10 kernel ; thus cloning the master git branch was in fact cloning the wrong kernel source ; so I was building a 5.10 module (and pve kernel 5.4.98-1 is still based on ubuntu-focal sources). Fixed that by specifying the 5.4 branch in the git clone command (I updated the "Part 2" post to reflect that). Thus my regular patched source for rtl8125B properly compiles against the 5.4.98 pve kernel. Built the modules against Proxmox kernel 5.4.98-1-pve and attached them to the "Part 1" post.
Edit 7 [20210313] : Built the modules against Proxmox kernel 5.4.103-1-pve and attached them to the "Part 1" post.
Edit 8 [20210326] : Built the modules against Proxmox kernel 5.4.106-1-pve and attached them to the "Part 1" post.
Edit 9 [20210428] : Proxmox 6.4 with kernel 5.11 brings the first builtin support for R8125B. (viewtopic.php?p=327687#p327687)
Edit 10 [20210501] : Kernel 5.11 (with builtin r8169) provides similar (good) performance to kernel 5.4 + custom r8169 (viewtopic.php?p=327687#p327687). However, tests show that kernel 5.11 has an extra 10% CPU overhead. Please also have a look at post viewtopic.php?p=327991#p327991 which deals with efficiency concerns (CPU usage under heavy network load).
Edit 11 [20210512] : Built the modules against Proxmox kernel 5.4.114-1-pve and attached them to the "Part 1" post.
Edit 12 [20210612] : Built the modules against Proxmox kernel 5.4.119-1-pve and attached them to the "Part 1" post.
Last edited by caramb on Sun Jun 13, 2021 12:00 am, edited 42 times in total.


Part 1 : Easy way for Proxmox 5.4.65-1-pve kernel

Post by caramb »

For those who are running Proxmox kernel 5.4.65-1-pve on the H2+ board, find attached the ready-to-use modules.

The procedure should be straightforward.

1) First, confirm what driver you are running.

Code: Select all

lspci -k
should show you something similar to

Code: Select all

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8125 (rev 04)
        Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
        Kernel driver in use: r8125
        Kernel modules: r8169, r8125
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8125 (rev 04)
        Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
        Kernel driver in use: r8125
        Kernel modules: r8169, r8125
The driver in use should be r8125 if you installed the driver provided by Realtek following the Odroid application note.
If no "driver in use" appears, you are running the stock kernel with the stock r8169 module, which does not recognize revision B of the 8125 chipset.
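If you'd rather not read the whole lspci listing by eye, here is a minimal sketch that summarizes it. The function name realtek_drivers is my own (hypothetical), and it only assumes the "lspci -k" layout shown above :

```shell
#!/bin/sh
# realtek_drivers: read `lspci -k` output on stdin and print
# "<PCI address> <driver in use>" for each Realtek Ethernet controller.
# Prints "none" when a device has no "Kernel driver in use" line.
realtek_drivers() {
    awk '
        function emit() { if (addr != "") print addr, (drv == "" ? "none" : drv) }
        /^[0-9a-f:.]+ / {                  # an unindented line starts a new device
            emit(); addr = ""; drv = ""
            if ($0 ~ /Ethernet controller: Realtek/) addr = $1
        }
        /Kernel driver in use:/ && addr != "" { drv = $NF }
        END { emit() }
    '
}

# Usage on the H2+ :
#   lspci -k | realtek_drivers
```

A result of "none" corresponds to the stock-kernel case described above, where no driver claims the chip.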

2) Prerequisite : Ensure you have access to the console of the H2+. You'll lose remote access while the new modules are being tested (except of course if you access the box through a non-Realtek network adapter).

3) Please download the modules attached to this post (and upload them to the H2+).
There are 2 modules :
- realtek.ko : contains the drivers for the PHY part of the network drivers.
- r8169.ko : drivers for many realtek chips including 8125.
and a companion firmware.

4) Unzip the archive in the home directory of user root

Code: Select all

unzip ./modules.zip
You may need to install the unzip binary

Code: Select all

apt-get install unzip
5) Install the new drivers and backup the old ones :

Code: Select all

mv /lib/modules/5.4.65-1-pve/kernel/drivers/net/phy/realtek.ko /lib/modules/5.4.65-1-pve/kernel/drivers/net/phy/realtek.ko.orig
mv ~root/realtek.ko /lib/modules/5.4.65-1-pve/kernel/drivers/net/phy/
mv /lib/modules/5.4.65-1-pve/kernel/drivers/net/ethernet/realtek/r8169.ko /lib/modules/5.4.65-1-pve/kernel/drivers/net/ethernet/realtek/r8169.ko.orig
mv ~root/r8169.ko /lib/modules/5.4.65-1-pve/kernel/drivers/net/ethernet/realtek/
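The commands above hard-code 5.4.65-1-pve. As a hedged sketch for the other kernel builds attached to this post, the helper below (install_module is a name I made up) backs up the stock file only once, so running it again after a rebuild never overwrites the original backup. It copies instead of moving, and it assumes the modules you install were built for the kernel you are running :

```shell
#!/bin/sh
# install_module NEW DEST
# Copy NEW over DEST, first saving DEST as DEST.orig unless a backup
# already exists.
install_module() {
    new=$1 dest=$2
    [ -f "$new" ] || { echo "missing: $new" >&2; return 1; }
    if [ -f "$dest" ] && [ ! -e "$dest.orig" ]; then
        mv "$dest" "$dest.orig" || return 1
    fi
    cp "$new" "$dest"
}

# Usage (same paths as above, kernel version taken from the running kernel) :
#   kver=$(uname -r)
#   install_module ~root/realtek.ko "/lib/modules/$kver/kernel/drivers/net/phy/realtek.ko"
#   install_module ~root/r8169.ko "/lib/modules/$kver/kernel/drivers/net/ethernet/realtek/r8169.ko"
```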
5bis) You also need to install the firmware associated with the 8125B chip.
Required file : rtl8125b-2.fw (firmware version 2 for 8125 revision B).
You may use the copy provided in the zip file you downloaded or fetch it from the kernel firmware git :
Online method :

Code: Select all

cd /lib/firmware/rtl_nic/
wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/rtl_nic/rtl8125b-2.fw
Offline method :

Code: Select all

mv ~root/rtl8125b-2.fw /lib/firmware/rtl_nic/
6) Quick integrity check : You may want to check the sha256 hashes of your files against the ones below :

Code: Select all

sha256sum /lib/modules/5.4.65-1-pve/kernel/drivers/net/phy/realtek.ko
1b7e7aa833e3d4b0d2f0081591e638da82520593c9f0cca5175cabeb4282625c  /lib/modules/5.4.65-1-pve/kernel/drivers/net/phy/realtek.ko

sha256sum /lib/modules/5.4.65-1-pve/kernel/drivers/net/ethernet/realtek/r8169.ko
bc86d4a1fd477603ea14f6911b2d8d95b69b3dbdf10a3f409d9fe4f2794c1d0d  /lib/modules/5.4.65-1-pve/kernel/drivers/net/ethernet/realtek/r8169.ko

sha256sum /lib/firmware/rtl_nic/rtl8125b-2.fw
529bf1c25c97ff52b401090d00ff89cc22351012336e5a0c9662728a3ee909ef  /lib/firmware/rtl_nic/rtl8125b-2.fw
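Comparing 64-character hashes by eye is error-prone, so here is a small sketch that does the comparison for you. The name check_sha256 is hypothetical ; it only relies on the standard sha256sum tool. Note that the hashes listed above apply to the 5.4.65-1-pve build only :

```shell
#!/bin/sh
# check_sha256 FILE EXPECTED_HEX
# Succeeds (exit status 0) only if the sha256 of FILE equals EXPECTED_HEX.
check_sha256() {
    actual=$(sha256sum "$1" | awk '{ print $1 }') || return 1
    [ "$actual" = "$2" ]
}

# Usage with the firmware hash listed above :
#   check_sha256 /lib/firmware/rtl_nic/rtl8125b-2.fw \
#       529bf1c25c97ff52b401090d00ff89cc22351012336e5a0c9662728a3ee909ef \
#       && echo OK || echo MISMATCH
```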
6bis) Confirm module r8169 is not blacklisted (you may have done so when installing the r8125 module).

Code: Select all

grep r8169 /etc/modprobe.d/*
Should give no result ; if it does, remove the matching line from the corresponding file.
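The plain grep also matches commented-out lines and other module names that merely start with r8169. A slightly stricter sketch, assuming the usual "blacklist <module>" syntax in *.conf files (blacklist_entries is a name I made up) :

```shell
#!/bin/sh
# blacklist_entries MODULE [DIR]
# Print "file:line-number:line" for every active (uncommented) blacklist
# entry naming exactly MODULE. Exit status is non-zero when none is found.
blacklist_entries() {
    module=$1 dir=${2:-/etc/modprobe.d}
    grep -HnE "^[[:space:]]*blacklist[[:space:]]+${module}([[:space:]]|\$)" "$dir"/*.conf 2>/dev/null
}

# Usage :
#   blacklist_entries r8169 || echo "r8169 is not blacklisted"
```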

7) Let's give it a try : switch to the console of the H2+ board.

!!! WARNING WARNING !!! : This may hurt your Proxmox ; especially if you are running a cluster and/or a Ceph cluster. Please ensure you are in a position where you can safely break the network connectivity before proceeding. I strongly suggest you gracefully shut down any running VM on this host and confirm quorum will be maintained if running a cluster.

Let's unload the old modules first :

Code: Select all

rmmod r8169
rmmod r8125
rmmod realtek
Check :

Code: Select all

lsmod | grep r8
lsmod | grep ^realtek
should give no result.
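Note that rmmod complains about modules that are not loaded (for example r8125 if you never installed it). A hedged sketch that first works out which of the three modules are actually loaded ; loaded_realtek_modules is a hypothetical helper name and it only parses standard lsmod output :

```shell
#!/bin/sh
# loaded_realtek_modules: read `lsmod` output on stdin and print, one per
# line, those of r8169, r8125 and realtek that are currently loaded.
# The NIC drivers come first and the PHY module (realtek) last, which is
# the same unload order as above.
loaded_realtek_modules() {
    lsmod_out=$(cat)
    for m in r8169 r8125 realtek; do
        printf '%s\n' "$lsmod_out" | awk -v m="$m" '$1 == m { print m }'
    done
}

# Usage (on the console, as root) :
#   lsmod | loaded_realtek_modules | while read -r m; do rmmod "$m"; done
```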

Load the new modules :

Code: Select all

modprobe realtek
modprobe r8169
8) Confirm the new driver has properly recognized the chip :

Code: Select all

lsmod | grep r8
should show you something similar to :

Code: Select all

r8169                  90112  0

Code: Select all

lsmod | grep ^realtek
should show you something similar to :

Code: Select all

realtek                24576  2
And most importantly, the kernel should tell you it uses the new drivers :

Code: Select all

lspci -k

Code: Select all

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8125 (rev 04)
        Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
        Kernel driver in use: r8169
        Kernel modules: r8169, r8125
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8125 (rev 04)
        Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
        Kernel driver in use: r8169
        Kernel modules: r8169, r8125
If the kernel states "Kernel driver in use: r8169" then it's a win !

9) Confirm the system shows the Ethernet adapters :

Code: Select all

ip link list
You should see enp2s0 and enp3s0 back. If not, dmesg should give you additional information.

As we are playing manually with the network, they may come back in a disabled state ; enable them manually :

Code: Select all

ip link set enp2s0 up
ip link set enp3s0 up
10) You should now be able to perform basic ping tests and performance tests.
You should also note that Proxmox is smoothly recovering.

10bis) At this point, if something went wrong, a simple reboot of the system will switch back to the r8125 module (assuming you were running it before).

11) Assuming everything is okay, we just need to make this persistent across reboots.
This requires updating the init ramdisk.
If you previously installed the r8125 module, you don't have to worry : r8169 takes precedence ; there is no need to uninstall it.


Update the ramdisk :

Code: Select all

update-initramfs -u
12) Reboot and enjoy !
Attachments
modules-5.4.119-1-pve.zip
(528.07 KiB) Not downloaded yet
modules-5.4.114-1-pve.zip
(528.01 KiB) Downloaded 12 times
modules-5.4.106-1-pve.zip
(527.87 KiB) Downloaded 34 times
modules-5.4.103-1-pve.zip
(527.87 KiB) Downloaded 18 times
modules-5.4.98-1-pve.zip
(527.81 KiB) Downloaded 29 times
modules-5.4.78-2-pve.zip
(527.87 KiB) Downloaded 59 times
modules-5.4.78-1-pve.zip
(527.87 KiB) Downloaded 31 times
modules-5.4.73-1-pve.zip
(528.04 KiB) Downloaded 53 times
modules.zip
(529.86 KiB) Downloaded 77 times
Last edited by caramb on Sat Jun 12, 2021 11:59 pm, edited 32 times in total.


Part 2 : almost DIY section

Post by caramb »

If you are not running Proxmox, or as soon as Proxmox updates its kernel, the modules provided in the "Part 1" post will stop working.
(Building a DKMS-aware module should solve that, but I'm not familiar with DKMS and I lack the time to do this.)
Thus this post explains the process I went through to build the modules.

Huge thanks go to a lot of people who shared very valuable information through various posts. Picking from each led me to a successful build.
viewtopic.php?t=39323
viewtopic.php?t=39979
https://askubuntu.com/questions/1259947 ... g-on-20-04
https://forum.proxmox.com/threads/compi ... eck.36374/
https://stackoverflow.com/questions/284 ... nux-kernel
https://yoursunny.com/t/2018/one-kernel-module/

My assumption is that the sources I'm providing will build against any 5.4 sources and probably other kernel branches.
They won't build against branch 5.9 because the realtek driver code was reworked.

For steps 6 and 7, I cannot provide instructions that will fit every Linux distribution ; I can only give directions...

Here we go :

1) Access your Linux machine (Proxmox node or whatever other distro you play with).
To avoid any confusion later in the process, you may check the kernel version you are currently running :

Code: Select all

uname -a
2) Download the zip file attached to this post containing the modules sources.

3) Upload the zip file to your Linux box in the root homedir

4) Unzip the file.

5) Ensure your system is up-to-date
Warning : Please do not perform blind updates if the system is running even moderately critical services.
Debian/Proxmox

Code: Select all

apt-get update
CentOS/Fedora

Code: Select all

yum update    # or : dnf update
6) Make your system capable of building the kernel : the method will vary depending on the distro you use. The first command below works for Proxmox ; the alternative should work for Debian :
Proxmox :

Code: Select all

apt-get install git nano screen patch fakeroot build-essential devscripts libncurses5 libncurses5-dev libssl-dev bc flex bison libelf-dev libaudit-dev libgtk2.0-dev libperl-dev asciidoc xmlto gnupg gnupg2 rsync lintian debhelper libdw-dev libnuma-dev libslang2-dev sphinx-common asciidoc-base automake cpio dh-python file gcc kmod libiberty-dev libpve-common-perl libtool perl-modules python-minimal sed tar zlib1g-dev lz4
Debian

Code: Select all

apt install build-essential fakeroot dpkg-dev perl libssl-dev bc gnupg dirmngr libncurses5-dev libelf-dev flex bison lsb-release rsync

7) Retrieve your kernel source (the method depends on your Linux distribution) ; this is the trickiest part, as you must get the source of the exact kernel build you are running (because every module includes a vermagic string that records which kernel version it was built for) :
The following instructions work for Proxmox :

Code: Select all

cd /usr/src/
#git clone git://git.proxmox.com/git/pve-kernel.git
# As there are various kernel branches, I recommend specifying the branch. Not doing so, and thus cloning the master branch, may lead to cloning the wrong kernel.
git clone --branch 'pve-kernel-5.4' git://git.proxmox.com/git/pve-kernel.git
cd /usr/src/pve-kernel
make
This step requires a fair amount of time ; however, we don't need to wait for the entire compilation to finish ; we just need to be sure the kernel sources are there and up-to-date (patched).
You'll see the process start downloading the kernel (ubuntu-focal) sources, then some zfs sources... wait until it really starts building the kernel, then interrupt it (Ctrl-C).
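To avoid cloning the wrong branch (see "Edit 6" in the first post), the branch name can be derived from the running kernel. This is only a sketch : pve_kernel_branch is a hypothetical helper, and only the 'pve-kernel-5.4' name is confirmed in this thread ; the same naming scheme for other kernel series is an assumption that should be checked against the repository first :

```shell
#!/bin/sh
# pve_kernel_branch KERNEL_RELEASE
# "5.4.98-1-pve" -> "pve-kernel-5.4"
# Assumption: branches are named after the major.minor kernel series.
pve_kernel_branch() {
    series=$(printf '%s\n' "$1" | cut -d. -f1,2)
    printf 'pve-kernel-%s\n' "$series"
}

# Usage :
#   branch=$(pve_kernel_branch "$(uname -r)")
#   git ls-remote --heads git://git.proxmox.com/git/pve-kernel.git   # check the branch exists
#   git clone --branch "$branch" git://git.proxmox.com/git/pve-kernel.git
```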

For other distributions, please search the internet (this step also assumes you have retrieved the kernel config).

8) If you made it this far, the most difficult part is probably behind us.
In the export statement below, please replace "/usr/src/pve-kernel/build/ubuntu-focal" with your kernel source directory.

Code: Select all

export MYKERNELROOTDIR="/usr/src/pve-kernel/build/ubuntu-focal"
This sets the MYKERNELROOTDIR variable to your kernel directory to accommodate your specific situation.
Please be sure to run all the following commands in the same Linux terminal.

Before jumping to the next step, please check that :

Code: Select all

echo $MYKERNELROOTDIR
outputs the directory of your kernel source and that :

Code: Select all

ls $MYKERNELROOTDIR 
gives no error and lists the content of the kernel source root directory.
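For a slightly stronger check than a plain ls, the sketch below verifies that the directory contains the two source files replaced in the next step. The function name check_kernel_tree is my own ; the paths are the ones used throughout this post :

```shell
#!/bin/sh
# check_kernel_tree DIR
# Succeeds only if DIR looks like a kernel source tree that contains the
# two realtek source files we are about to replace.
check_kernel_tree() {
    root=$1
    for f in Makefile \
             drivers/net/phy/realtek.c \
             drivers/net/ethernet/realtek/r8169_main.c; do
        [ -f "$root/$f" ] || { echo "missing: $root/$f" >&2; return 1; }
    done
}

# Usage :
#   check_kernel_tree "$MYKERNELROOTDIR" && echo "kernel tree looks fine"
```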

8bis) So now, let's put the new source in place.

Code: Select all

mv $MYKERNELROOTDIR/drivers/net/phy/realtek.c $MYKERNELROOTDIR/drivers/net/phy/realtek.c.orig
mv ~root/realtek.c $MYKERNELROOTDIR/drivers/net/phy/
mv $MYKERNELROOTDIR/drivers/net/ethernet/realtek/r8169_main.c $MYKERNELROOTDIR/drivers/net/ethernet/realtek/r8169_main.c.orig
mv ~root/r8169_main.c $MYKERNELROOTDIR/drivers/net/ethernet/realtek/
9) Let's build the modules selectively (avoiding the time-consuming task of building the entire kernel).
We cannot ask for a single module compilation, but we can ask for a single directory compilation, which saves us hours :

Code: Select all

cd $MYKERNELROOTDIR/
make scripts prepare modules_prepare
make -C . M=drivers/net/phy
make -C . M=drivers/net/ethernet/realtek
10) If the build was successful, we can check if the vermagic is correct using modinfo :

Code: Select all

modinfo $MYKERNELROOTDIR/drivers/net/phy/realtek.ko
modinfo $MYKERNELROOTDIR/drivers/net/ethernet/realtek/r8169.ko
The "vermagic:" line should show the same version as the output of "uname -r".
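The comparison can be scripted. A minimal sketch, assuming the usual vermagic layout where the kernel release is the first word (for example "5.4.65-1-pve SMP mod_unload modversions") ; vermagic_matches is a hypothetical helper name, and "modinfo -F vermagic" prints only that field :

```shell
#!/bin/sh
# vermagic_matches VERMAGIC KERNEL_RELEASE
# Succeeds if the first word of VERMAGIC equals KERNEL_RELEASE.
vermagic_matches() {
    release=${1%% *}          # keep everything before the first space
    [ "$release" = "$2" ]
}

# Usage :
#   for ko in "$MYKERNELROOTDIR/drivers/net/phy/realtek.ko" \
#             "$MYKERNELROOTDIR/drivers/net/ethernet/realtek/r8169.ko"; do
#       vermagic_matches "$(modinfo -F vermagic "$ko")" "$(uname -r)" \
#           || echo "vermagic mismatch: $ko"
#   done
```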

11) As a final step, retrieve the modules and follow the instructions of post "Part 1" to install them.
The 2 files are :

Code: Select all

$MYKERNELROOTDIR/drivers/net/phy/realtek.ko
$MYKERNELROOTDIR/drivers/net/ethernet/realtek/r8169.ko
Attachments
sources.zip
(42.8 KiB) Downloaded 47 times
Last edited by caramb on Thu Feb 25, 2021 4:57 pm, edited 24 times in total.


Part 3 : DIY section (the hard way)

Post by caramb »

edit 20210429 : Fixed broken link to the kernel patch.

If the previous "Part 2 : almost DIY" method fails, or if you'd like to patch the sources yourself, the best I can do for you is to share what I did to modify the sources.

Reminder : only 3 files need modification :
- r8169_main.c : main driver.
- realtek.c : PHY subpart of the driver.
- rtl8125b-2.fw : specific firmware required by the driver.

rtl8125b-2.fw : easy, if file does not exist, just add it to your system and that's it (please refer to post "Part 1").

realtek.c : Should be easy. After manually comparing the file from my local Linux distro (/<kernel source>/drivers/net/phy/realtek.c) to the kernel 5.9 source git (https://git.kernel.org/pub/scm/linux/ke ... h=v5.9-rc8), I came to the conclusion that the local file can safely be replaced by the one from the 5.9 kernel source (the 5.9 file only adds support for additional devices).

r8169_main.c : This one needs more work.

Your (assumed not impossible) mission is to backport the following patch to your kernel : https://git.kernel.org/pub/scm/linux/ke ... 6d140ce2ef

If your source already has support for the 8125A chip, it will be "quite" straightforward. It is not a strict prerequisite, but not having 8125A support will force you to backport additional code and greatly increases the difficulty.

Here are some tips for patching the source :
- Most kernels do not have the r8169_phy_config.c file ; changes take place directly in r8169_main.c
- To make the code cleaner and avoid any confusion, you'll have to rename existing functions used by the 8125A code from <whatever>8125<whatever> to <whatever>8125a<whatever>
- Due to some code rework/refactoring, some functions in the 5.9 code do not exist in the 5.4 branch ; here are some mappings (5.9 => 5.4) :
  - rtl_wait_txrx_fifo_empty => rtl_hw_init_8125
  - rtl8169_cleanup => rtl_hw_initialize

You may also start from the source I've patched attached to previous post "Part 2".

Please follow the "Part 2" post instructions to build your modified modules.

Tips for debugging :
- Always check the dmesg output to get feedback from the kernel.
- If your test fails with a "realtek.ko not loaded, maybe it needs to be added to initramfs?" error message, then you are not running the proper version of the realtek.ko module (the PHY module). Do not even try to update the r8169.ko module without also updating the realtek.ko module ; they are tightly coupled.
- If modprobe fails with an "Exec format error" error message, your module may have been built for the wrong kernel version. Another possible cause (but I'm not 100% sure about this) is that you stripped the module.

Good luck !
Last edited by caramb on Fri Apr 30, 2021 2:14 am, edited 15 times in total.


Part 4 : Results / performance.

Post by caramb »

Here are some figures regarding the performance.

Test setup :
- pve node : Proxmox on HP Microserver Gen8 (Intel 1gig onboard adapter)
- pve2 node : Proxmox on Odroid-H2+
- pve3 node : Proxmox on Odroid-H2+
- 1GbE switch : TPLink TL-SG108E
- test software : regular iperf3
- All 3 nodes running Linux kernel 5.4.65-1-pve
- In any case, the nodes are not completely idle ; they are running a proxmox cluster, some VMs and a Ceph cluster.
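The exact iperf3 options used are not recorded here, so take the following as a sketch of a default run rather than the precise commands behind the figures below. The helper receiver_bitrate is a hypothetical name ; it extracts the bitrate from the iperf3 summary line marked "receiver". The host name pve2 is one of the nodes listed above and is assumed to resolve on your network :

```shell
#!/bin/sh
# receiver_bitrate: read `iperf3 -c HOST` output on stdin and print the
# bitrate of the summary line ending in "receiver", e.g. "129 Mbits/sec".
receiver_bitrate() {
    awk '$NF == "receiver" { print $(NF-2), $(NF-1) }'
}

# Usage : start `iperf3 -s` on the target node, then from the other node :
#   iperf3 -c pve2 | receiver_bitrate       # local node sends to pve2
#   iperf3 -c pve2 -R | receiver_bitrate    # -R : pve2 sends to the local node
```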


Results with module r8125 (realtek-r8125-9.003.05) (DKMS version) :
- pve to pve2 : 874Mb/s
- pve2 to pve : 725Mb/s
- pve to pve3 : 858Mb/s
- pve3 to pve : 735Mb/s
- pve2 to pve3 : 97Mb/s !!!!!! :twisted:
- pve3 to pve2 : 105Mb/s !!!!!! :twisted:

As I'm running a Ceph cluster (yes, the H2+ can do that assuming you don't put heavy load on it), this was killing the performance of the cluster.


Result with r8169 patched-by-myself module :
- pve to pve2 : 942Mb/s
- pve2 to pve : 836Mb/s
- pve to pve3 : 943Mb/s
- pve3 to pve : 846Mb/s
- pve2 to pve3 : 890Mb/s :D
- pve3 to pve2 : 899Mb/s :D

These patched modules also bring flow-control support to the r8125B as well as greater ethtool support.


You may think sub-900Mb/s over a 1Gig link is a suboptimal result. It may or may not be a problem ; I probably won't dig into it (it may be due to flow-control, which was enabled).
My concern was the 100Mb/s performance I was achieving with the vendor module ; on that point, the job is done.


As I don't own a 2.5GbE-capable switch, I was not able to test that topology.
However, I tested the patched module in a back-to-back 2.5GbE configuration (thus a direct pve2 to pve3 connection).
iperf3 gives something around 2.3Gb/s, which sounds reasonable to me.


Sidenote regarding the side effect of this network performance on the Ceph performance :
- I'm running a Win10pro VM on each of the proxmox nodes ; all storage on the Ceph cluster.
- Due to the asymmetrical performance, I was getting weird results when running CrystalDiskMark on the Windows VMs
- The VM on node pve was getting 150/50MB/s read/write performance.
- The 2 VMs on nodes pve2 and pve3 were getting only 35/15MB/s.
- After switching to the patched r8169 module, the VMs on pve2 and pve3 got similar performance to the pve node.
- Please do not troll : these figures are still fairly low but acceptable considering the low cost of the entire system, the very low power consumption, the almost no-noise advantage (the 2 H2+ are fanless, the HP fan runs <20%)... and it is mainly just a homelab. Hmm, the H2+ is a really great and very capable x86 SBC ; probably the only one supporting 32GB of RAM.
Last edited by caramb on Mon Oct 19, 2020 3:03 pm, edited 19 times in total.


Part 5 : Footnotes

Post by caramb »

As of now : nothing to say
Except that I miserably failed posting this in the N2/N2+ section instead of H2/H2+ ; asked the admin to fix that :lol: .... and the great admin did that :P
Last edited by caramb on Wed Mar 17, 2021 1:25 am, edited 3 times in total.

odroid
Site Admin
Posts: 37528
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English, Korean
ODROIDs: ODROID
Has thanked: 1835 times
Been thanked: 1155 times
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by odroid »

Moved from N2 sub-forum.
Thank you for sharing such nice instructions for Proxmox users.

lhb035
Posts: 5
Joined: Thu Aug 27, 2020 5:09 pm
languages_spoken: Chinese
ODROIDs: H2+
Has thanked: 0
Been thanked: 4 times
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by lhb035 »

Hello caramb,

I followed your tutorial, but my H2+ shows "Kernel modules: r8169", not "Kernel modules: r8169, r8125".

I used OpenWrt to test ; NAT speed is only 20MB/s.



Part 1 : Easy way for Proxmox 5.4.65-1-pve kernel------8) Confirm the new driver has properly recognized the chip :


# uname -a
Linux h2pve 5.4.65-1-pve #1 SMP PVE 5.4.65-1 (Mon, 21 Sep 2020 15:40:22 +0200) x86_64 GNU/Linux

# lspci -k

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8125 (rev 04)
Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
Kernel driver in use: r8169
Kernel modules: r8169
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8125 (rev 04)
Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
Kernel driver in use: r8169
Kernel modules: r8169




Also, Realtek updated the driver to 9.004.01 :
https://www.realtek.com/en/component/zo ... s-software
2.5G Ethernet LINUX driver r8125 for kernel up to 5.6 9.004.01 2020/10/19 75 KB


Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by caramb »

Hi lhb035,

First, thank you for pointing me to the new Realtek driver ; I may give it a try later ; as of now, I'm satisfied with the native kernel module I patched.

Regarding your "lspci -k" output, I would say it looks good.
You have "Kernel driver in use: r8169", which means your Linux kernel found the modified r8169 module and considered it able to drive the Realtek chip of the H2+.
The lack of r8125 is harmless ; it just suggests you did not install the Realtek module, but we don't need it.

To clarify, I would say r8169 module and r8125 module are competitors ; the first is the native Linux kernel module, the other is the vendor (realtek) proprietary module.
(Don't be confused by the module name, the r8169 module has support for numerous Realtek chips including 8125...)
You may have the two modules installed on your system (it was my case) but you'll use only one.

If you want to try the native kernel module (r8169) I patched, you are on the right track. As the module is loaded, "ip link list" should show you the interfaces. If not, check the output of "dmesg".


If you want to try the Realtek module (r8125), it's a different story : just ignore my "Part x" posts.
Hardkernel provided instructions for that here : https://wiki.odroid.com/odroid-h2/appli ... _on_h2plus

Below is what I did to initially install the DKMS version of the Realtek module :

Prepare your system :

Code: Select all

apt-get install dkms
apt install pve-headers-$(uname -r)
Download the debian package on your H2+ :

Code: Select all

wget https://github.com/awesometic/realtek-r8125-dkms/releases/download/9.004.01-1/realtek-r8125-dkms_9.004.01-1_amd64.deb
Install the dkms module :

Code: Select all

dpkg -i ./realtek-r8125-dkms_9.004.01-1_amd64.deb
Verify :

Code: Select all

lspci -k
Make the module available to all your kernel versions and persistent across reboots :

Code: Select all

update-initramfs -k all -u
When there is a kernel update on the proxmox side, one needs to retrigger the dkms build process :

Code: Select all

apt install pve-headers-$(uname -r)
ls /var/lib/initramfs-tools | sudo xargs -n1 /usr/lib/dkms/dkms_autoinstaller start
update-initramfs -k all -u


Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by lhb035 »

Hi caramb,
Thanks for your guidance and help,

After installing the 9.004.01-1 DKMS driver, I tested again : NAT speed is always slow...

environment:
H2+ 16GB ram*2 500G NVME SSD
AX88179 usb3.0 1000Mbps network adapter
PVE 6.2 5.4.65-1-pve

Openwrt 19.07.4 run in LXC

WAN port to linux bridge eth0 AX88179
LAN port to linux bridge eth1 RTL8125
NAT speed 70MB/s
ksoftirqd 60% cpu used


WAN port to linux bridge eth0 AX88179
LAN port to linux bridge eth1 AX88179
NAT speed 80MB/s

WAN port to linux bridge eth0 RTL8125
LAN port to linux bridge eth1 RTL8125
NAT speed 20MB/s
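To compare these numbers with the Mb/s figures earlier in the thread, and assuming MB/s here means megabytes per second, a quick conversion sketch (mbytes_to_mbits is a made-up helper name) :

```shell
#!/bin/sh
# mbytes_to_mbits N
# Convert N megabytes per second to megabits per second (1 byte = 8 bits).
mbytes_to_mbits() {
    echo $(( $1 * 8 ))
}

# Usage :
#   mbytes_to_mbits 20    # prints 160
#   mbytes_to_mbits 80    # prints 640
```

Under that assumption, the RTL8125-to-RTL8125 case corresponds to about 160Mb/s and the AX88179-only case to about 640Mb/s.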



The driver of the 8125 network card is really bad ; I can only wait for Realtek to update it...

Trilom
Posts: 1
Joined: Mon Nov 02, 2020 8:24 am
languages_spoken: english
ODROIDs: H2+
Has thanked: 0
Been thanked: 0
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by Trilom »

Thank you for taking your time to work on this. I have a dump of information that I hope helps you. The short version is that the r8169 improves performance on 1GbE networks without a doubt. When hooking up to at least my 10GbE network it is much better (actually works) than the r8125 driver however when it pushes it has issues with my configuration. I have not torn down the ovs_system off but can see the interfaces communicating as expected but the packet size on the vLANs in/out of the h2 seem funky and demonstrate weirdness as described below after r8169.

In the end this driver improves 1 GbE performance, but to use the 2.5 GbE pipe on my existing 10 GbE equipment it is not worthwhile. Do you have any indication as to what might be the issue? The only difference in configuration I notice is that the interface on ovs-vsctl is explicitly set to 100000 on the 10GbE interface on r1 but is left unconfigured on the 2.5GbE interface on h2.

## with r8169 on 10GbE switch

One thing that frustrated me with the r8125 drivers was that I could not use the ports on a 10GbE switch since there is a lack of affordable 2.5GbE switches. With that being said I have some mixed results. With r8125 no matter what it wouldn't work on 10GbE switch, with r8169 and switch port set to auto-negotiation then it can connect. Whenever you push between h2 devices there are issues however when the h2 pushes to another device(r1) everything is fine(full 2.5GbE on 1500 and 9000 networks). If another device(r1) pushes to h2 then there are issues.

- Switch is a Unifi US-16-XG.
- [SFP+ interfaces are these iopolex ones that are pretty universal.](https://www.amazon.com/gp/product/B01M5LIUK5/)
- When interface is in autonegotiate it works fine.
- When interface is explicitly set to 2.5GbE it does not work.
- Going from r1 (10GbE host) to h2a (2.5GbE host on 10GbE interface) or h2b (2.5GbE host on 10GbE interface).

What do packets look like?

- when going to/from vlan23 (h2 to h2), in/out doesn't seem to get larger than 16000 or so
- when going to/from vlan23 (r1 to h2), vlan23 (eno3 at 9k) sends at 9k from r1 and receives at 14/15k on h2a:vlan23 (enp3s0 at 9k)
- when going to/from vlan23 (h2 to r1), vlan23 (enp3s0 at 9k) sends at 62k from h2a and receives at 26k on r1:vlan23 (eno3 at 9k)
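
Average packet sizes above the 9000-byte MTU on vlan23 are not necessarily a fault. One possible explanation, which is an assumption and not something confirmed in this thread, is that segmentation and receive offloads (TSO/GSO/GRO) let the kernel hand oversized buffers to virtual interfaces, so a tool that divides bytes by packets reports more than the MTU. A minimal sketch for listing the relevant settings follows; `offload_settings` is a hypothetical helper name, and it only filters the output of `ethtool -k`:

```shell
# Hypothetical helper: keep only the top-level segmentation/receive
# offload lines from `ethtool -k <iface>` output read on stdin.
# Indented sub-feature lines (e.g. "tx-tcp-segmentation") are skipped.
offload_settings() {
    grep -E '^(tcp-segmentation|generic-segmentation|generic-receive|large-receive)-offload:'
}
```

For example, `ethtool -k enp3s0 | offload_settings` on h2a and `ethtool -k eno3 | offload_settings` on r1 would show whether the two ends differ.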

### h2a to/from h2b

Code: Select all

# iperf3 (other host pushing to h2) on 9000 net is same as return
root@h2b:~# iperf3 -c h2a.s.ceph.trilhome.lan
Connecting to host h2a.s.ceph.trilhome.lan, port 5201
[  5] local 172.16.12.5 port 39072 connected to 172.16.12.4 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  17.1 MBytes   144 Mbits/sec  105   61.2 KBytes
[  5]   1.00-2.00   sec  15.4 MBytes   129 Mbits/sec   91   69.9 KBytes
[  5]   2.00-3.00   sec  16.5 MBytes   138 Mbits/sec   91   43.7 KBytes
[  5]   3.00-4.00   sec  14.6 MBytes   123 Mbits/sec  108   43.7 KBytes
[  5]   4.00-5.00   sec  15.0 MBytes   126 Mbits/sec  116   52.4 KBytes
[  5]   5.00-6.00   sec  16.1 MBytes   135 Mbits/sec  110   43.7 KBytes
[  5]   6.00-7.00   sec  14.5 MBytes   121 Mbits/sec   93   35.0 KBytes
[  5]   7.00-8.00   sec  15.2 MBytes   127 Mbits/sec  108   43.7 KBytes
[  5]   8.00-9.00   sec  14.6 MBytes   123 Mbits/sec   83   35.0 KBytes
[  5]   9.00-10.00  sec  15.7 MBytes   132 Mbits/sec  104   61.2 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   155 MBytes   130 Mbits/sec  1009             sender
[  5]   0.00-10.00  sec   154 MBytes   129 Mbits/sec                  receiver

iperf Done.
# iperf3 in the return path (h2 is pushing to other host) on 9000 is same as send
root@h2b:~#  iperf3 -c h2a.s.ceph.trilhome.lan -R
Connecting to host h2a.s.ceph.trilhome.lan, port 5201
Reverse mode, remote host h2a.s.ceph.trilhome.lan is sending
[  5] local 172.16.12.5 port 39120 connected to 172.16.12.4 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  15.4 MBytes   129 Mbits/sec
[  5]   1.00-2.00   sec  14.1 MBytes   118 Mbits/sec
[  5]   2.00-3.00   sec  14.3 MBytes   120 Mbits/sec
[  5]   3.00-4.00   sec  15.4 MBytes   129 Mbits/sec
[  5]   4.00-5.00   sec  14.4 MBytes   121 Mbits/sec
[  5]   5.00-6.00   sec  16.3 MBytes   137 Mbits/sec
[  5]   6.00-7.00   sec  15.8 MBytes   133 Mbits/sec
[  5]   7.00-8.00   sec  14.3 MBytes   120 Mbits/sec
[  5]   8.00-9.00   sec  16.0 MBytes   134 Mbits/sec
[  5]   9.00-10.00  sec  15.2 MBytes   127 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   152 MBytes   127 Mbits/sec  972             sender
[  5]   0.00-10.00  sec   151 MBytes   127 Mbits/sec                  receiver

iperf Done.
# ping to 1500 net as 9000 fails
root@h2b:~# ping -f -M do -s 8972 h2a.ceph.trilhome.lan
PING h2a.ceph.trilhome.lan (192.168.11.4) 8972(9000) bytes of data.
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE^C
--- h2a.ceph.trilhome.lan ping statistics ---
74 packets transmitted, 0 received, +74 errors, 100% packet loss, time 189ms

# ping to 9000 net as 9000 works
root@h2b:~# ping -f -M do -s 8972 h2a.s.ceph.trilhome.lan
PING h2a.s.ceph.trilhome.lan (172.16.12.4) 8972(9000) bytes of data.
.^C
--- h2a.s.ceph.trilhome.lan ping statistics ---
1242 packets transmitted, 1241 received, 0.0805153% packet loss, time 710ms
rtt min/avg/max/mdev = 1.425/2.795/3.357/0.263 ms, ipg/ewma 2.987/2.876 ms

# ping to 9000 net as 1500 works
root@h2b:~# ping -f -M do -s 1472 h2a.s.ceph.trilhome.lan
PING h2a.s.ceph.trilhome.lan (172.16.12.4) 1472(1500) bytes of data.
.^C
--- h2a.s.ceph.trilhome.lan ping statistics ---
1578 packets transmitted, 1577 received, 0.0633714% packet loss, time 379ms
rtt min/avg/max/mdev = 1.191/2.601/3.198/0.309 ms, ipg/ewma 2.774/2.646 ms

# ping to 1500 net as 1500 works
root@h2b:~# ping -f -M do -s 1472 h2a.ceph.trilhome.lan
PING h2a.ceph.trilhome.lan (192.168.11.4) 1472(1500) bytes of data.
.^C
--- h2a.ceph.trilhome.lan ping statistics ---
2072 packets transmitted, 2071 received, 0.0482625% packet loss, time 564ms
rtt min/avg/max/mdev = 0.763/2.495/3.359/0.390 ms, ipg/ewma 2.684/2.523 ms

# here is relevant network config
# ip l on h2b
root@h2b:~# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
6: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
10: vlan23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
11: vlan24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
# ip l on h2a
root@h2a:~# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
6: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
10: vlan23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
11: vlan24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
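
The `-s` values in the ping tests above follow from the MTU: with `-M do` (don't fragment), the largest ICMP echo payload that fits is the MTU minus the 20-byte IPv4 header (no options) and the 8-byte ICMP header. That gives 8972 for MTU 9000 and 1472 for MTU 1500. A small sketch of that arithmetic follows; `ping_payload_for_mtu` is a hypothetical helper name:

```shell
# Hypothetical helper: largest ICMP echo payload that fits in one
# unfragmented IPv4 packet for a given MTU.
# 20 = IPv4 header without options, 8 = ICMP echo header.
ping_payload_for_mtu() {
    mtu="$1"
    echo $(( mtu - 20 - 8 ))
}
```

`ping -M do -s "$(ping_payload_for_mtu 9000)" h2a.s.ceph.trilhome.lan` is then the same 9000-byte test as above, minus the `-f` flood flag.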

### r1 to/from h2a

Code: Select all

# iperf3 (other host pushing to h2) on 9000 net is very slow
root@r1:~# iperf3 -c h2a.s.ceph.trilhome.lan
Connecting to host h2a.s.ceph.trilhome.lan, port 5201
[  5] local 172.16.12.2 port 45260 connected to 172.16.12.4 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  16.1 MBytes   135 Mbits/sec  181   26.2 KBytes
[  5]   1.00-2.00   sec  17.1 MBytes   143 Mbits/sec  162   17.5 KBytes
[  5]   2.00-3.00   sec  16.3 MBytes   137 Mbits/sec  164   17.5 KBytes
[  5]   3.00-4.00   sec  15.8 MBytes   132 Mbits/sec  178   35.0 KBytes
[  5]   4.00-5.00   sec  15.9 MBytes   134 Mbits/sec  155   17.5 KBytes
[  5]   5.00-6.00   sec  15.8 MBytes   132 Mbits/sec  164   17.5 KBytes
[  5]   6.00-7.00   sec  15.8 MBytes   132 Mbits/sec  161   17.5 KBytes
[  5]   7.00-8.00   sec  16.1 MBytes   135 Mbits/sec  166   26.2 KBytes
[  5]   8.00-9.00   sec  16.3 MBytes   137 Mbits/sec  174   17.5 KBytes
[  5]   9.00-10.00  sec  15.9 MBytes   134 Mbits/sec  157   17.5 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   161 MBytes   135 Mbits/sec  1662             sender
[  5]   0.00-10.00  sec   161 MBytes   135 Mbits/sec                  receiver

iperf Done.
# iperf3 in the return path (h2 is pushing to other host) on 9000 is perfect
root@r1:~# iperf3 -c h2a.s.ceph.trilhome.lan -R
Connecting to host h2a.s.ceph.trilhome.lan, port 5201
Reverse mode, remote host h2a.s.ceph.trilhome.lan is sending
[  5] local 172.16.12.2 port 45336 connected to 172.16.12.4 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   294 MBytes  2.47 Gbits/sec
[  5]   1.00-2.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   2.00-3.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   3.00-4.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   4.00-5.00   sec   294 MBytes  2.47 Gbits/sec
[  5]   5.00-6.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   6.00-7.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   7.00-8.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   8.00-9.00   sec   295 MBytes  2.47 Gbits/sec
[  5]   9.00-10.00  sec   295 MBytes  2.47 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.88 GBytes  2.47 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  2.88 GBytes  2.47 Gbits/sec                  receiver

iperf Done.
# ping to 1500 net as 9000 fails
root@r1:~# ping -f -M do -s 8972 h2a.ceph.trilhome.lan
PING h2a.ceph.trilhome.lan (192.168.11.4) 8972(9000) bytes of data.
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE^C
--- h2a.ceph.trilhome.lan ping statistics ---
51 packets transmitted, 0 received, +51 errors, 100% packet loss, time 802ms

# ping to 9000 net as 9000 works
root@r1:~# ping -f -M do -s 8972 h2a.s.ceph.trilhome.lan
PING h2a.s.ceph.trilhome.lan (172.16.12.4) 8972(9000) bytes of data.
.^C
--- h2a.s.ceph.trilhome.lan ping statistics ---
4834 packets transmitted, 4833 received, 0.0206868% packet loss, time 723ms
rtt min/avg/max/mdev = 0.208/1.115/3.012/0.337 ms, ipg/ewma 1.183/1.301 ms

# ping to 9000 net as 1500 works
root@r1:~# ping -f -M do -s 1472 h2a.s.ceph.trilhome.lan
PING h2a.s.ceph.trilhome.lan (172.16.12.4) 1472(1500) bytes of data.
.^C
--- h2a.s.ceph.trilhome.lan ping statistics ---
7169 packets transmitted, 7168 received, 0.0139489% packet loss, time 925ms
rtt min/avg/max/mdev = 0.112/0.910/1.985/0.319 ms, ipg/ewma 0.965/0.925 ms

# ping to 1500 net as 1500 works
root@r1:~# ping -f -M do -s 1472 h2a.ceph.trilhome.lan
PING h2a.ceph.trilhome.lan (192.168.11.4) 1472(1500) bytes of data.
.^
--- h2a.ceph.trilhome.lan ping statistics ---
8743 packets transmitted, 8743 received, 0% packet loss, time 352ms
rtt min/avg/max/mdev = 0.099/0.899/1.999/0.346 ms, ipg/ewma 0.954/0.864 ms

# here is relevant network config
# ip l on r1
root@r1:~# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
4: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
5: eno4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
8: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
15: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
16: vlan23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
17: vlan24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
20: bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
# ip l on h2a
root@h2a:~# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
6: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
10: vlan23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
11: vlan24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff

## with r8169 (after reboot)

Code: Select all

# nmon now looks relatively normal: in and out traffic is 9k
# iperf3 (other host pushing to h2) on 9000 net is still problematic, with retries
root@r2:~# iperf3 -c h2d.s.ceph.trilhome.lan
Connecting to host h2d.s.ceph.trilhome.lan, port 5201
[  5] local 172.16.12.3 port 60800 connected to 172.16.12.7 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   119 MBytes   997 Mbits/sec  566    192 KBytes
[  5]   1.00-2.00   sec   118 MBytes   988 Mbits/sec  564    253 KBytes
[  5]   2.00-3.00   sec   118 MBytes   989 Mbits/sec  604    184 KBytes
[  5]   3.00-4.00   sec   118 MBytes   989 Mbits/sec  559    210 KBytes
[  5]   4.00-5.00   sec   118 MBytes   987 Mbits/sec  557    184 KBytes
[  5]   5.00-6.00   sec   118 MBytes   991 Mbits/sec  571    201 KBytes
[  5]   6.00-7.00   sec   118 MBytes   990 Mbits/sec  625    201 KBytes
[  5]   7.00-8.00   sec   118 MBytes   987 Mbits/sec  561    271 KBytes
[  5]   8.00-9.00   sec   118 MBytes   990 Mbits/sec  583    166 KBytes
[  5]   9.00-10.00  sec   118 MBytes   994 Mbits/sec  561    262 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.15 GBytes   990 Mbits/sec  5751             sender
[  5]   0.00-10.21  sec  1.15 GBytes   969 Mbits/sec                  receiver

iperf Done.
# iperf3 in the return path (h2 is pushing to other host) on 9000 has slowed down
# the only thing I notice is that when a packet arrives at vlan23 on the other host
# it seems to be larger than 9k, around the 12-13k range based on nmon
# this issue is also present on another host identical to the "other host"
root@r2:~# iperf3 -c h2d.s.ceph.trilhome.lan -R
Connecting to host h2d.s.ceph.trilhome.lan, port 5201
Reverse mode, remote host h2d.s.ceph.trilhome.lan is sending
[  5] local 172.16.12.3 port 60332 connected to 172.16.12.7 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  77.2 MBytes   647 Mbits/sec
[  5]   1.00-2.00   sec  77.2 MBytes   647 Mbits/sec
[  5]   2.00-3.00   sec  76.4 MBytes   641 Mbits/sec
[  5]   3.00-4.00   sec  76.2 MBytes   639 Mbits/sec
[  5]   4.00-5.00   sec  72.6 MBytes   609 Mbits/sec
[  5]   5.00-6.00   sec  76.4 MBytes   641 Mbits/sec
[  5]   6.00-7.00   sec  77.8 MBytes   653 Mbits/sec
[  5]   7.00-8.00   sec  77.4 MBytes   649 Mbits/sec
[  5]   8.00-9.00   sec  76.8 MBytes   644 Mbits/sec
[  5]   9.00-10.00  sec  77.9 MBytes   654 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   767 MBytes   643 Mbits/sec    1             sender
[  5]   0.00-10.00  sec   766 MBytes   643 Mbits/sec                  receiver

iperf Done.
# ping to 1500 net as 9000 fails
root@r2:~# ping -f -M do -s 8972 h2d.ceph.trilhome.lan
PING h2d.ceph.trilhome.lan (192.168.11.7) 8972(9000) bytes of data.
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE^C
--- h2d.ceph.trilhome.lan ping statistics ---
119 packets transmitted, 0 received, +119 errors, 100% packet loss, time 924ms

# ping to 9000 net as 9000 works
root@r2:~# ping -f -M do -s 8972 h2d.s.ceph.trilhome.lan
PING h2d.s.ceph.trilhome.lan (172.16.12.7) 8972(9000) bytes of data.
^C
--- h2d.s.ceph.trilhome.lan ping statistics ---
4798 packets transmitted, 4798 received, 0% packet loss, time 258ms
rtt min/avg/max/mdev = 0.367/1.241/2.582/0.343 ms, ipg/ewma 1.303/1.235 ms

# ping to 9000 net as 1500 works
root@r2:~# ping -f -M do -s 1472 h2d.s.ceph.trilhome.lan
PING h2d.s.ceph.trilhome.lan (172.16.12.7) 1472(1500) bytes of data.
.^C
--- h2d.s.ceph.trilhome.lan ping statistics ---
7063 packets transmitted, 7062 received, 0.0141583% packet loss, time 286ms
rtt min/avg/max/mdev = 0.116/0.847/1.950/0.259 ms, ipg/ewma 0.889/0.896 ms

# ping to 1500 net as 1500 works
root@r2:~# ping -f -M do -s 1472 h2d.ceph.trilhome.lan
PING h2d.ceph.trilhome.lan (192.168.11.7) 1472(1500) bytes of data.
.^C
--- h2d.ceph.trilhome.lan ping statistics ---
7032 packets transmitted, 7031 received, 0.0142207% packet loss, time 994ms
rtt min/avg/max/mdev = 0.095/0.811/3.098/0.257 ms, ipg/ewma 0.851/0.826 ms

# here is relevant network config
# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
6: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
10: vlan23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
11: vlan24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
# one thing I also notice is that this issue might be isolated to this network
# here are some iperfs on the other networks

# this is a vlan on enp2s0 (1500)
root@r2:~# iperf3 -c h2d.ceph.trilhome.lan -R
Connecting to host h2d.ceph.trilhome.lan, port 5201
Reverse mode, remote host h2d.ceph.trilhome.lan is sending
[  5] local 192.168.11.3 port 52156 connected to 192.168.11.7 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   103 MBytes   867 Mbits/sec
[  5]   1.00-2.00   sec   103 MBytes   868 Mbits/sec
[  5]   2.00-3.00   sec   105 MBytes   884 Mbits/sec
[  5]   3.00-4.00   sec   112 MBytes   938 Mbits/sec
[  5]   4.00-5.00   sec   110 MBytes   919 Mbits/sec
[  5]   5.00-6.00   sec   105 MBytes   879 Mbits/sec
[  5]   6.00-7.00   sec   103 MBytes   864 Mbits/sec
[  5]   7.00-8.00   sec   104 MBytes   871 Mbits/sec
[  5]   8.00-9.00   sec   103 MBytes   864 Mbits/sec
[  5]   9.00-10.00  sec   109 MBytes   918 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.03 GBytes   889 Mbits/sec    5             sender
[  5]   0.00-10.00  sec  1.03 GBytes   887 Mbits/sec                  receiver

iperf Done.

# this is a vlan on enp2s0 (1500)
root@r2:~# iperf3 -c h2d.trilhome.lan -R
Connecting to host h2d.trilhome.lan, port 5201
Reverse mode, remote host h2d.trilhome.lan is sending
[  5] local 192.168.10.227 port 57664 connected to 192.168.10.231 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   104 MBytes   876 Mbits/sec
[  5]   1.00-2.00   sec   109 MBytes   911 Mbits/sec
[  5]   2.00-3.00   sec   105 MBytes   884 Mbits/sec
[  5]   3.00-4.00   sec   104 MBytes   870 Mbits/sec
[  5]   4.00-5.00   sec   111 MBytes   934 Mbits/sec
[  5]   5.00-6.00   sec   110 MBytes   920 Mbits/sec
[  5]   6.00-7.00   sec   110 MBytes   921 Mbits/sec
[  5]   7.00-8.00   sec   108 MBytes   910 Mbits/sec
[  5]   8.00-9.00   sec   111 MBytes   933 Mbits/sec
[  5]   9.00-10.00  sec   112 MBytes   937 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.06 GBytes   912 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.06 GBytes   909 Mbits/sec                  receiver

iperf Done.

# this is a vlan on enp3s0 (1500)
root@r2:~# iperf3 -c h2d.c2.trilhome.lan -R
Connecting to host h2d.c2.trilhome.lan, port 5201
Reverse mode, remote host h2d.c2.trilhome.lan is sending
[  5] local 172.16.12.131 port 35092 connected to 172.16.12.135 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  94.6 MBytes   793 Mbits/sec
[  5]   1.00-2.00   sec  97.5 MBytes   818 Mbits/sec
[  5]   2.00-3.00   sec   101 MBytes   844 Mbits/sec
[  5]   3.00-4.00   sec   103 MBytes   865 Mbits/sec
[  5]   4.00-5.00   sec  96.5 MBytes   809 Mbits/sec
[  5]   5.00-6.00   sec  97.7 MBytes   819 Mbits/sec
[  5]   6.00-7.00   sec  95.0 MBytes   797 Mbits/sec
[  5]   7.00-8.00   sec  98.3 MBytes   825 Mbits/sec
[  5]   8.00-9.00   sec  96.8 MBytes   812 Mbits/sec
[  5]   9.00-10.00  sec  98.5 MBytes   826 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   981 MBytes   822 Mbits/sec    0             sender
[  5]   0.00-10.00  sec   978 MBytes   821 Mbits/sec                  receiver

iperf Done.

## with r8169 (before reboot)

Code: Select all

# nmon now looks normal: in and out traffic is 9k
# iperf3 (other host pushing to h2) on 9000 net is still problematic, with retries
root@r2:~# iperf3 -c h2d.s.ceph.trilhome.lan
Connecting to host h2d.s.ceph.trilhome.lan, port 5201
[  5] local 172.16.12.3 port 55648 connected to 172.16.12.7 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   119 MBytes   999 Mbits/sec  531    201 KBytes
[  5]   1.00-2.00   sec   117 MBytes   985 Mbits/sec  586    192 KBytes
[  5]   2.00-3.00   sec   118 MBytes   991 Mbits/sec  590    288 KBytes
[  5]   3.00-4.00   sec   118 MBytes   992 Mbits/sec  569    184 KBytes
[  5]   4.00-5.00   sec   118 MBytes   988 Mbits/sec  572    131 KBytes
[  5]   5.00-6.00   sec   118 MBytes   989 Mbits/sec  598    175 KBytes
[  5]   6.00-7.00   sec   118 MBytes   990 Mbits/sec  594    192 KBytes
[  5]   7.00-8.00   sec   118 MBytes   986 Mbits/sec  565    201 KBytes
[  5]   8.00-9.00   sec   118 MBytes   993 Mbits/sec  606    192 KBytes
[  5]   9.00-10.00  sec   118 MBytes   986 Mbits/sec  560    210 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.15 GBytes   990 Mbits/sec  5771             sender
[  5]   0.00-10.00  sec  1.15 GBytes   989 Mbits/sec                  receiver

iperf Done.
# iperf3 in the return path (h2 is pushing to other host) on 9000 net works
root@r2:~# iperf3 -c h2d.s.ceph.trilhome.lan -R
Connecting to host h2d.s.ceph.trilhome.lan, port 5201
Reverse mode, remote host h2d.s.ceph.trilhome.lan is sending
[  5] local 172.16.12.3 port 56144 connected to 172.16.12.7 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   101 MBytes   844 Mbits/sec
[  5]   1.00-2.00   sec   101 MBytes   845 Mbits/sec
[  5]   2.00-3.00   sec   101 MBytes   846 Mbits/sec
[  5]   3.00-4.00   sec  99.3 MBytes   833 Mbits/sec
[  5]   4.00-5.00   sec   101 MBytes   844 Mbits/sec
[  5]   5.00-6.00   sec  99.6 MBytes   835 Mbits/sec
[  5]   6.00-7.00   sec  99.9 MBytes   838 Mbits/sec
[  5]   7.00-8.00   sec   100 MBytes   840 Mbits/sec
[  5]   8.00-9.00   sec  99.9 MBytes   838 Mbits/sec
[  5]   9.00-10.00  sec  99.8 MBytes   837 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1003 MBytes   841 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1001 MBytes   840 Mbits/sec                  receiver

iperf Done.
# ping to 1500 net as 9000 fails
root@r2:~# ping -f -M do -s 8972 h2d.ceph.trilhome.lan
PING h2d.ceph.trilhome.lan (192.168.11.7) 8972(9000) bytes of data.
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE^C
--- h2d.ceph.trilhome.lan ping statistics ---
90 packets transmitted, 0 received, +90 errors, 100% packet loss, time 468ms

# ping to 9000 net as 9000 works
root@r2:~# ping -f -M do -s 8972 h2d.s.ceph.trilhome.lan
PING h2d.s.ceph.trilhome.lan (172.16.12.7) 8972(9000) bytes of data.
.^C
--- h2d.s.ceph.trilhome.lan ping statistics ---
4154 packets transmitted, 4153 received, 0.0240732% packet loss, time 373ms
rtt min/avg/max/mdev = 0.357/1.231/1.978/0.338 ms, ipg/ewma 1.292/1.367 ms

# ping to 9000 net as 1500 works
root@r2:~# ping -f -M do -s 1472 h2d.s.ceph.trilhome.lan
PING h2d.s.ceph.trilhome.lan (172.16.12.7) 1472(1500) bytes of data.
.^C
--- h2d.s.ceph.trilhome.lan ping statistics ---
6950 packets transmitted, 6949 received, 0.0143885% packet loss, time 759ms
rtt min/avg/max/mdev = 0.100/0.786/4.617/0.293 ms, ipg/ewma 0.828/0.914 ms

# ping to 1500 net as 1500 works
root@r2:~# ping -f -M do -s 1472 h2d.ceph.trilhome.lan
PING h2d.ceph.trilhome.lan (192.168.11.7) 1472(1500) bytes of data.
.^C
--- h2d.ceph.trilhome.lan ping statistics ---
9190 packets transmitted, 9189 received, 0.0108814% packet loss, time 137ms
rtt min/avg/max/mdev = 0.109/0.841/2.751/0.245 ms, ipg/ewma 0.884/0.856 ms

# here is relevant network config
# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
6: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
10: vlan23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
11: vlan24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
13: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff

## with r8125

While messing around with the h2s I noticed that performance with the r8125 driver sucks.
**outbound (h2 to something) works because it can handle 4k**
**inbound (something to h2) doesn't work because it can't handle 9k**

Code: Select all

# the issue from nmon shows that when the traffic arrives on enp3s0
# from the remote host the packet size is 9k as expected
# however, when the traffic is being sent to the remote host it
# is sized as 4k and not 9k as expected
# iperf3 (other host pushing to h2) on 9000 net is problematic
root@r2:~# iperf3 -c h2d.s.ceph.trilhome.lan
Connecting to host h2d.s.ceph.trilhome.lan, port 5201
[  5] local 172.16.12.3 port 46780 connected to 172.16.12.7 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  20.1 MBytes   169 Mbits/sec  142   43.7 KBytes
[  5]   1.00-2.00   sec  22.9 MBytes   192 Mbits/sec  153   35.0 KBytes
[  5]   2.00-3.00   sec  19.4 MBytes   162 Mbits/sec  139   35.0 KBytes
[  5]   3.00-4.00   sec  18.6 MBytes   156 Mbits/sec  132   52.4 KBytes
[  5]   4.00-5.00   sec  18.5 MBytes   155 Mbits/sec  155   35.0 KBytes
[  5]   5.00-6.00   sec  19.4 MBytes   163 Mbits/sec  168   35.0 KBytes
[  5]   6.00-7.00   sec  52.6 MBytes   441 Mbits/sec  227    201 KBytes
[  5]   7.00-8.00   sec  24.4 MBytes   204 Mbits/sec  181   43.7 KBytes
[  5]   8.00-9.00   sec  19.5 MBytes   163 Mbits/sec  157   69.9 KBytes
[  5]   9.00-10.00  sec  18.9 MBytes   158 Mbits/sec  145   35.0 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   234 MBytes   197 Mbits/sec  1599             sender
[  5]   0.00-10.00  sec   233 MBytes   195 Mbits/sec                  receiver

iperf Done.
# iperf3 in the return path (h2 is pushing to other host) on 9000 net works
root@r2:~# iperf3 -c h2d.s.ceph.trilhome.lan -R
Connecting to host h2d.s.ceph.trilhome.lan, port 5201
Reverse mode, remote host h2d.s.ceph.trilhome.lan is sending
[  5] local 172.16.12.3 port 46314 connected to 172.16.12.7 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   101 MBytes   844 Mbits/sec
[  5]   1.00-2.00   sec   100 MBytes   839 Mbits/sec
[  5]   2.00-3.00   sec  99.4 MBytes   834 Mbits/sec
[  5]   3.00-4.00   sec   100 MBytes   839 Mbits/sec
[  5]   4.00-5.00   sec   100 MBytes   840 Mbits/sec
[  5]   5.00-6.00   sec   100 MBytes   842 Mbits/sec
[  5]   6.00-7.00   sec   100 MBytes   842 Mbits/sec
[  5]   7.00-8.00   sec   100 MBytes   841 Mbits/sec
[  5]   8.00-9.00   sec  98.4 MBytes   826 Mbits/sec
[  5]   9.00-10.00  sec  99.6 MBytes   836 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1001 MBytes   839 Mbits/sec    7             sender
[  5]   0.00-10.00  sec   999 MBytes   838 Mbits/sec                  receiver

iperf Done.
# ping to 1500 net as 9000 fails
root@r2:~# ping -f -M do -s 8972 h2d.ceph.trilhome.lan
PING h2d.ceph.trilhome.lan (192.168.11.7) 8972(9000) bytes of data.
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE^C
--- h2d.ceph.trilhome.lan ping statistics ---
214 packets transmitted, 0 received, +214 errors, 100% packet loss, time 476ms

# ping to 9000 net as 9000 works
root@r2:~# ping -f -M do -s 8972 h2d.s.ceph.trilhome.lan
PING h2d.s.ceph.trilhome.lan (172.16.12.7) 8972(9000) bytes of data.
.^C
--- h2d.s.ceph.trilhome.lan ping statistics ---
3656 packets transmitted, 3655 received, 0.0273523% packet loss, time 515ms
rtt min/avg/max/mdev = 0.475/1.708/4.443/0.232 ms, ipg/ewma 1.780/1.776 ms

# ping to 9000 net as 1500 works
root@r2:~# ping -f -M do -s 1472 h2d.s.ceph.trilhome.lan
PING h2d.s.ceph.trilhome.lan (172.16.12.7) 1472(1500) bytes of data.
.^C
--- h2d.s.ceph.trilhome.lan ping statistics ---
3541 packets transmitted, 3540 received, 0.0282406% packet loss, time 650ms
rtt min/avg/max/mdev = 0.482/1.536/4.723/0.391 ms, ipg/ewma 1.594/1.101 ms

# ping to 1500 net as 1500 works
root@r2:~# ping -f -M do -s 1472 h2d.ceph.trilhome.lan
PING h2d.ceph.trilhome.lan (192.168.11.7) 1472(1500) bytes of data.
.^C
--- h2d.ceph.trilhome.lan ping statistics ---
4190 packets transmitted, 4189 received, 0.0238663% packet loss, time 709ms
rtt min/avg/max/mdev = 0.162/1.537/8.674/0.406 ms, ipg/ewma 1.600/1.750 ms

# here is relevant network config
# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
5: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
6: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
9: vlan22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
10: vlan23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
11: vlan24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ab:cd:ef:12:34:56 brd ff:ff:ff:ff:ff:ff
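
A note on the -s values used in the pings above: with -M do (don't fragment), the largest payload that fits is the MTU minus the IP and ICMP headers. A minimal sketch of that arithmetic, assuming plain IPv4 without options (20-byte header) and the 8-byte ICMP echo header; the function name icmp_payload is made up for illustration.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet.
# Assumes a 20-byte IPv4 header (no options) and an 8-byte ICMP header.
icmp_payload() {
    mtu=$1
    echo $((mtu - 20 - 8))
}

for mtu in 1500 9000; do
    echo "MTU $mtu -> ping -M do -s $(icmp_payload "$mtu")"
done
```

This gives 1472 for MTU 1500 and 8972 for MTU 9000, which are the values used in the tests above.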

djsashaz
Posts: 2
Joined: Sat Nov 07, 2020 1:47 am
languages_spoken: english
Has thanked: 0
Been thanked: 0
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by djsashaz »

I'm using Ubuntu 18 and have similar issues with network performance: I get occasional drops in my network stream that disrupt video delivery over IP.

lhb035
Posts: 5
Joined: Thu Aug 27, 2020 5:09 pm
languages_spoken: Chinese
ODROIDs: H2+
Has thanked: 0
Been thanked: 4 times
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by lhb035 »

Hello,

Someone found the source of the issue: BIOS / Chipset / South cluster configuration / PCI Express configuration / PCI Express root port (1 & 2) / ASPM (change to Disable).

After disabling ASPM, iperf3 over the 1GbE link reaches 80MB/s.

Source (in Chinese):
https://www.right.com.cn/FORUM/thread-4053662-1-1.html

henrikno
Posts: 3
Joined: Mon Dec 07, 2020 10:43 am
languages_spoken: english
Has thanked: 0
Been thanked: 5 times
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by henrikno »

I also had similar performance issues in one direction but not the other (testing with iperf). I used Wireshark and saw a lot of retransmits. I also noticed, using ethtool -S enp3s0, that rx_missed was high and climbing during iperf.

Code: Select all

     rx_missed: 24938
Googling that led me to some threads about other Realtek chips (e.g. r8169) that require disabling ASPM:
https://bugzilla.redhat.com/show_bug.cgi?id=1679140
https://bugs.launchpad.net/ubuntu/+sour ... ug/1880076
https://www.spinics.net/lists/netdev/msg548397.html
Running

Code: Select all

echo "performance" > /sys/module/pcie_aspm/parameters/policy
Improved things a lot for me (added pcie_aspm=performance to kernel options to make it permanent)
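
To check which ASPM policy is actually active after such a change, the same sysfs file can be read back: the kernel marks the active entry with square brackets. A small sketch, assuming that file layout; the helper name active_aspm_policy is made up for illustration. Note that on the kernel command line this module parameter is normally spelled pcie_aspm.policy=performance (module name, dot, parameter name).

```shell
#!/bin/sh
# Print the active policy from the text of the ASPM policy file.
# The kernel marks the active entry with square brackets, e.g.
#   default performance [powersave] powersupersave
active_aspm_policy() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

policy_file=/sys/module/pcie_aspm/parameters/policy
if [ -r "$policy_file" ]; then
    echo "active ASPM policy: $(active_aspm_policy "$(cat "$policy_file")")"
else
    echo "no ASPM policy file found (kernel built without PCIe ASPM support?)"
fi
```

After a reboot, cat /proc/cmdline shows whether the boot option was really passed to the kernel.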

For reference I'm running 5.4.60-1-pve

Code: Select all

# ethtool -i enp3s0
driver: r8125
version: 9.003.05-NAPI
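
For anyone wanting to watch the rx_missed counter mentioned earlier in this post, here is a rough sketch that pulls it out of the ethtool -S output so two samples can be compared. The helper name rx_missed, the interface name in the comments and the 10 second interval are only assumptions for illustration.

```shell
#!/bin/sh
# Extract the rx_missed counter from `ethtool -S` output read on stdin.
rx_missed() {
    awk -F': *' '$1 ~ /rx_missed$/ { print $2; exit }'
}

# Usage sketch on a real machine (interface name and interval are assumptions):
#   before=$(ethtool -S enp3s0 | rx_missed)
#   sleep 10
#   after=$(ethtool -S enp3s0 | rx_missed)
#   echo "rx_missed grew by $((after - before))"

# Self-contained demonstration on the line quoted earlier in this post:
printf '     rx_missed: 24938\n' | rx_missed
```

If the difference keeps growing while iperf is running, the NIC is still dropping received frames.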

doughnut
Posts: 18
Joined: Mon Aug 31, 2015 5:10 am
languages_spoken: english
ODROIDs: Odroid C1+ C4 XU4 H2 (dead) H2+
Location: So. Fla. USA
Has thanked: 1 time
Been thanked: 4 times
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by doughnut »

Great post! Great subject. Thanks for posting this. I was just looking for posts regarding poor 2.5GbE performance and I came across this one. It prompted me to go to Realtek's site to make sure I had the latest (12302020) driver loaded in Windows. After updating the driver, a simple file copy went from sporadic, all-over-the-place 30-80Mbps to a much more respectable 130-160Mbps on my 1/2.5/5/10GbE network (Netgear MS510TX). I even hit a peak of over 2Gb/s in Performance Monitor.

domih
Posts: 410
Joined: Mon Feb 11, 2019 4:48 pm
languages_spoken: English, French
ODROIDs: UX4, HC2, N2, N2+, H2, H2+, C4, HC4 - 1GbE, 2.5GbE, 10GbE, 40+GbE
Location: San Francisco Bay Area
Has thanked: 157 times
Been thanked: 158 times
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by domih »

I confirm that disabling ASPM fixes the speed on 1GbE subnet. See more details there: viewtopic.php?p=320336#p320336

Post by caramb »

Hi there,

Greetings and many thanks to all the contributors on this thread who provided very valuable feedback & information ; I apologize for not replying to everyone.

Let me share the current status on my side :
1) The root issue still exists, as the kernel 5.4 branch still completely lacks RTL8125B support.
2) I have been running my patched driver for months now and have not experienced any stability issue.
3) Although this fixed my initial issue (bandwidth between 2 H2+ through a Gigabit switch), I found I was still experiencing performance problems as soon as I turned on jumbo frames (MTU set to 9000).
4) As shared by some people here (lhb035 & henrikno & domih), disabling PCIe power management (ASPM) was the key to fixing this remaining issue.
5) I did not test the latest DKMS driver from Realtek but, according to posts in this thread, (Realtek DKMS driver + ASPM off) is an alternative solution.

To permanently disable ASPM, edit your /etc/default/grub file

Code: Select all

GRUB_CMDLINE_LINUX_DEFAULT="pcie_aspm.policy=performance"
Alternatively, you can disable power management for the PCIe ports in the H2+ BIOS settings, as explained by 'Odroid' in the post below.
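
If your GRUB_CMDLINE_LINUX_DEFAULT line already contains other options (such as quiet), you probably want to append the parameter rather than overwrite the line. A minimal sketch of that edit, run here on a scratch file and not on the real /etc/default/grub; the function name add_kernel_param is made up for illustration.

```shell
#!/bin/sh
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in the given file,
# unless it is already present. Existing options such as "quiet" are kept.
add_kernel_param() {
    file=$1
    param=$2
    if grep -q -- "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$file"; then
        return 0
    fi
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $param\"|" "$file" > "$file.new" &&
        mv "$file.new" "$file"
}

# Demonstration on a scratch copy, not on the real /etc/default/grub.
demo=$(mktemp)
echo 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"' > "$demo"
add_kernel_param "$demo" "pcie_aspm.policy=performance"
cat "$demo"
rm -f "$demo"
```

On a Debian-based system booted via GRUB, the change only takes effect after running update-grub and rebooting.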
Last edited by caramb on Sun Mar 14, 2021 3:38 am, edited 8 times in total.

odroid
Site Admin
Posts: 37528
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English, Korean
ODROIDs: ODROID
Has thanked: 1835 times
Been thanked: 1155 times
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by odroid »

Or, you can disable the ASPM feature in the BIOS settings.
BIOS / Chipset / South cluster configuration / PCI Express configuration / PCI Express root port(1&2) / ASPM / Disable
Note: do it for both root port 1 and root port 2, which are the two root ports used by the Ethernet controllers.

See also.
viewtopic.php?p=322345#p322345
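
After changing the BIOS setting, the result can be checked from Linux by looking at the LnkCtl line that lspci -vv prints for each PCIe device. A rough sketch, assuming the usual "LnkCtl: ASPM ...;" line layout; the helper name aspm_status is made up, and the two sample lines are illustrative, not captured from a real H2+.

```shell
#!/bin/sh
# Print the ASPM part of every LnkCtl line found in `lspci -vv` output (stdin).
aspm_status() {
    sed -n 's/.*LnkCtl:[[:space:]]*ASPM \([^;]*\);.*/\1/p'
}

# Demonstration on two sample lines:
printf '\t\tLnkCtl:\tASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+\n' | aspm_status
printf '\t\tLnkCtl:\tASPM Disabled; RCB 64 bytes, Disabled- CommClk+\n' | aspm_status
```

On a real system this would be used as lspci -vv | aspm_status, run as root, since the link capabilities are normally hidden from regular users.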

_linux_
Posts: 4
Joined: Wed Mar 24, 2021 9:19 pm
languages_spoken: English, German
ODROIDs: H2+
Has thanked: 3 times
Been thanked: 0
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by _linux_ »

I didn't have those performance problems... I reached around 110MB/s by using the PPA of Hardkernel.

In the meantime I upgraded my network to 2.5GbE and reached 280MB/s with one interface.

Post by caramb »

Hi _linux_,

To experience the problem you need two RTL8125B talking to each other through a 1GbE switch and/or jumbo frames (MTU 9000).
Regarding pure network performance, assuming you have proper drivers and settings, you can achieve nearly 2.5Gb/s when two devices are directly connected (back-to-back), which is quite impressive.
Example below (2 H2+ with jumbo frames on) :

Code: Select all

iperf3 -c 192.168.100.202
Connecting to host 192.168.100.202, port 5201
[  5] local 192.168.100.203 port 59022 connected to 192.168.100.202 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   297 MBytes  2.49 Gbits/sec    0    839 KBytes
[  5]   1.00-2.00   sec   295 MBytes  2.47 Gbits/sec    0    926 KBytes
[  5]   2.00-3.00   sec   295 MBytes  2.47 Gbits/sec    0    970 KBytes
[  5]   3.00-4.00   sec   295 MBytes  2.47 Gbits/sec    0    970 KBytes
[  5]   4.00-5.00   sec   295 MBytes  2.47 Gbits/sec    0    970 KBytes
[  5]   5.00-6.00   sec   295 MBytes  2.47 Gbits/sec    0    970 KBytes
[  5]   6.00-7.00   sec   295 MBytes  2.47 Gbits/sec    0    970 KBytes
[  5]   7.00-8.00   sec   295 MBytes  2.47 Gbits/sec    0    970 KBytes
[  5]   8.00-9.00   sec   295 MBytes  2.47 Gbits/sec    0   2.08 MBytes
[  5]   9.00-10.00  sec   295 MBytes  2.47 Gbits/sec    0   2.08 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.88 GBytes  2.48 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  2.88 GBytes  2.47 Gbits/sec                  receiver
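
As a sanity check, the ~2.47 Gbits/sec figure is close to what the framing overhead allows on a 2.5Gb/s link. A rough estimate, assuming IPv4, a 20-byte TCP header plus the 12-byte timestamps option, untagged Ethernet framing, and counting preamble and inter-frame gap; the function name tcp_goodput_mbit is made up for illustration.

```shell
#!/bin/sh
# Rough upper bound on TCP goodput, in Mbit/s, for a given line rate and MTU.
# Assumptions: IPv4 (20 B) + TCP (20 B) + TCP timestamps option (12 B) inside
# the MTU; Ethernet header (14 B), FCS (4 B), preamble (8 B) and inter-frame
# gap (12 B) on the wire; no VLAN tag.
tcp_goodput_mbit() {
    rate=$1
    mtu=$2
    payload=$((mtu - 20 - 20 - 12))
    wire=$((mtu + 14 + 4 + 8 + 12))
    echo $((rate * payload / wire))
}

echo "MTU 9000: $(tcp_goodput_mbit 2500 9000) Mbit/s"
echo "MTU 1500: $(tcp_goodput_mbit 2500 1500) Mbit/s"
```

Under these assumptions the bound is about 2475 Mbit/s with jumbo frames and about 2353 Mbit/s with MTU 1500, so the run above is essentially at line rate.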
Regards.

Hostis
Posts: 2
Joined: Mon Mar 29, 2021 2:56 am
languages_spoken: english
Has thanked: 0
Been thanked: 0
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by Hostis »

Hello,

first of all, thank you for your amazing work !

I have 2 questions:
- If I understand correctly, your workaround also applies with a Net Card installed? Have you considered buying this card?
- Also, have you considered installing a newer Proxmox kernel, the 5.11 one which is in a test repository https://forum.proxmox.com/threads/kernel-5-11.86225/ (apparently, if I remember correctly, the 5.10 kernel brings out-of-the-box support for this chipset)?

Thank you for your answers.

Post by caramb »

Hi Hostis,

Thank you !
Glad if my work is useful to others !

To answer your first question :
This backported driver should work without problems with the Net Card because it is RTL8125B based ; the four 2.5GbE ports will appear as individual PCIe devices (BTW this is the reason why you need to flash your H2 with the ESF (a.k.a "M.2 bifurcated") version of the BIOS).
The Net Card is a really cool and cheap piece of HW ; it lets you turn your H2 into a very capable network device.

However, installing the Net Card has one major, obvious drawback : it prevents you from using the M.2 port for NVMe storage... (adding an M.2 port on the Net Card would make no sense because of the massive oversubscription this would introduce on the x4 PCIe port... along with the extra cost, this is probably the reason why Odroid did not even consider it)

In my particular use case (hyperconverged Proxmox/Ceph cluster), I need 3 storage devices on the H2 : 1 boot device (NVMe disk) and 2 Ceph OSDs (data storage) (2 SATA ports).
I initially (before the Net Card came out) chose NVMe storage not because of the performance requirements but just because such disks were already widely available and rather cheap (considering entry-level low-power ones).

Upgrading with the Net Card would have required 2 things :
1) Buy an extra >=32GB eMMC module as new boot device
2) A Proxmox reinstall (not that straightforward and somewhat risky in this scenario ; 2 nodes out of the 3 are H2+).

For new setups, this is something one should really consider.
But in my case, the upgrade was way too complex.

So I didn't buy simply because I went another way ; let me explain...

I was looking for a cheap way to upgrade my network to 2.5GbE (mainly because from a Ceph point of view, network bandwidth does really matter !)
However, the only reasonably priced (for a homelab) (<150€) 2.5GbE copper (not SFP) fanless switches were/are unmanaged ones ; just like the TPLink models.
As I'm using various VLANs to segregate cluster traffic, a managed switch was mandatory ; thus, I gave up the idea of a switch upgrade.

No Net Card, no 2.5GbE switch ; what else ?

In fact there was another route : using the 2 RTL8125B ports on each of my 3 nodes, I built a "full-mesh", "loop-free", "STP-free" high performance triangle topology (high performance because each node has 2.5Gb/s full duplex bandwidth to each of the two others).
This just needs proper configuration of Open vSwitch to avoid the use of spanning tree (STP is hell ; STP is required when there are loops ; so just build loop-free...)
I had planned to write a specific post sharing some details on how to set up such a topology ; I'm just missing the time...
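
To make the port arithmetic of such a triangle explicit, here is a tiny sketch that lists the point-to-point links of a full mesh and the number of ports each node needs. It only illustrates the cabling, not the Open vSwitch configuration; the function names are made up, and the node names are the ones used elsewhere in this thread.

```shell
#!/bin/sh
# List every point-to-point link of a full mesh between the given nodes.
mesh_pairs() {
    while [ $# -gt 1 ]; do
        a=$1
        shift
        for b in "$@"; do
            echo "$a <-> $b"
        done
    done
}

# Each node needs one port per other node.
ports_per_node() {
    echo $(($1 - 1))
}

mesh_pairs pve pve2 pve3
echo "ports needed per node: $(ports_per_node 3)"
```

With 3 nodes this gives 3 links and 2 ports per node, which is exactly what the two onboard RTL8125B ports provide ; a 4th node would already need 3 ports per node.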



Regarding the 2nd question, no, I haven't considered switching to 5.11 yet because it is at too early a stage.
My cluster is not for testing only and runs some "personal production" that I need to keep up and running.
You're right, vanilla 5.9+ Linux kernels have an r8169 module with built-in support for RTL8125B (I've just checked ; kernel 5.11.10 has this built-in support).
However, I'm not 100% confident the Proxmox kernel does, as it is derived from the Debian/Ubuntu kernel, which has some differences from the mainline kernel.

I'm gonna git clone the source of pve-kernel 5.11, will take a look and will let you know.


Regards.

Post by caramb »

Hi again Hostis,

I've just cloned the master branch of pve-kernel which is version 5.11.7.
It is based on Ubuntu 21.04 Hirsute kernel.

I do confirm this kernel HAS support for RTL8125B.

Code: Select all

root@pve:/usr/src/pve-kernel/build/ubuntu-hirsute/drivers/net/ethernet/realtek# grep -i 8125B *
r8169_main.c:#define FIRMWARE_8125B_2   "rtl_nic/rtl8125b-2.fw"
r8169_main.c:   [RTL_GIGA_MAC_VER_63] = {"RTL8125B",            FIRMWARE_8125B_2},
r8169_main.c:MODULE_FIRMWARE(FIRMWARE_8125B_2);
r8169_main.c:           /* 8125B family. */
r8169_main.c:static void rtl8125b_config_eee_mac(struct rtl8169_private *tp)
r8169_main.c:           rtl8125b_config_eee_mac(tp);
r8169_main.c:static void rtl_hw_start_8125b(struct rtl8169_private *tp)
r8169_main.c:   static const struct ephy_info e_info_8125b[] = {
r8169_main.c:   rtl_ephy_init(tp, e_info_8125b);
r8169_main.c:           [RTL_GIGA_MAC_VER_63] = rtl_hw_start_8125b,
r8169_phy_config.c:static void rtl8125b_config_eee_phy(struct phy_device *phydev)
r8169_phy_config.c:static void rtl8125b_hw_phy_config(struct rtl8169_private *tp,
r8169_phy_config.c:     rtl8125b_config_eee_phy(phydev);
r8169_phy_config.c:             [RTL_GIGA_MAC_VER_63] = rtl8125b_hw_phy_config,
As you can see here (https://discourse.ubuntu.com/t/hirsute- ... dule/18539), hirsute is still in the testing stage and there won't be any final release before the end of April...
So we should not expect kernel 5.11 to move from the pvetest repository to the stable one before Ubuntu hirsute goes final.

But I will probably give it a try in the meantime... as I have another Proxmox cluster I can play with and break...
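
The same kind of check can be done on an installed kernel without cloning the sources : the MODULE_FIRMWARE entries seen in the grep above show up as "firmware:" lines in the modinfo output. A small sketch, assuming modinfo is available ; the helper name has_rtl8125b_firmware is made up for illustration.

```shell
#!/bin/sh
# Read `modinfo r8169` output on stdin and report whether it lists any
# RTL8125B firmware file (e.g. rtl_nic/rtl8125b-2.fw).
has_rtl8125b_firmware() {
    grep -qi '^firmware:.*rtl8125b'
}

if command -v modinfo >/dev/null 2>&1 &&
   modinfo r8169 2>/dev/null | has_rtl8125b_firmware; then
    echo "r8169 module of the running kernel references RTL8125B firmware"
else
    echo "no RTL8125B firmware reference found (or modinfo/r8169 unavailable)"
fi
```

A match only shows that the module knows about the chip ; it says nothing about stability or performance.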

Regards.

mad_ady
Posts: 9469
Joined: Wed Jul 15, 2015 5:00 pm
languages_spoken: english
ODROIDs: XU4, C1+, C2, C4, N1, N2, H2, Go, Go Advance
Location: Bucharest, Romania
Has thanked: 604 times
Been thanked: 678 times
Contact:

Re: WORKAROUND : Horrible network performance of R8125B when connected to a 1GbE switch

Post by mad_ady »

@caramb you can use an unmanaged switch with VLANs, but the hosts that connect to the switch need to use trunk interfaces. There will be a bit of "bleed over" of traffic where you will receive traffic for other VLANs (tagged) on ports where you don't want it, but this will only be broadcast/multicast and flooded traffic (when the switch doesn't know on which port the destination MAC is, it floods traffic for that MAC on all other ports until it learns it). So in typical cases you shouldn't see any traffic degradation.

Sure, this means that you trust your hosts in the lan, otherwise a rogue host could easily change vlans...
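
On the host side, a "trunk interface" simply means creating one tagged 802.1Q sub-interface per VLAN on top of the physical port. A minimal sketch that only prints the iproute2 commands instead of running them ; the function name vlan_commands, the interface name and the VLAN IDs are just assumptions for illustration.

```shell
#!/bin/sh
# Print (not run) the iproute2 commands that create one tagged 802.1Q
# sub-interface per VLAN ID on top of a physical interface.
vlan_commands() {
    dev=$1
    shift
    for vid in "$@"; do
        echo "ip link add link $dev name $dev.$vid type vlan id $vid"
        echo "ip link set dev $dev.$vid up"
    done
}

vlan_commands enp3s0 22 23
```

Printing the commands first makes it easy to review them before pasting them into a root shell or a network configuration file.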

Post by caramb »

Thank you mad_ady.
You're absolutely right, unmanaged switches are transparent to vlans and jumbo frames.
In fact, the other reason I did not want to go this way is the fact that I'm about to upgrade to a SDN managed switch (more precisely, upgrade from my current TPLink TL-SG108E to a TL-SG2008P ; I already set up an Omada SDN that manages my Wifi APs).

Regards.

Post by mad_ady »

I wonder what their "SDN" implementation does for you... Since it's not SDN in the true sense (having a centralised controller with real-time view of network traffic and topology that does routing/switching table rewriting on the fly to optimize the data flow). It only looks like a centralized dashboard, so apart from poe, I doubt you'll get more functionality from the other switch...

Post by caramb »

@mad_ady
Once again you're right, this is a consumer/SMB SDN solution ; not an enterprise/ISP/carrier grade one.

Post by mad_ady »

Don't you just love it when manufacturers throw in big words like SDN, Big Data, Machine Learning to try to make their products more appealing, but behind the scenes is the same old cron script that reboots at 4 am?
That's what I like about Hardkernel - no such nonsense...

Post by Hostis »

Hello again,

Thank you for your in-depth explanations @caramb, I don't know what to say :D, I appreciate the effort you put into these answers.

That is what I thought: you already have the M.2 port occupied, and it's going to be just too much work to change it. I was just thinking that maybe there was another reason (like performance/stability problems), since nobody even mentioned it in this thread. For me personally, it's going to be way better to use this Net Card than USB -> Ethernet adapters, which are well known for having stability problems, and those which are considered "good" cost like $70 each.

Thank you for the info about Ubuntu as well, I didn't know that ; they didn't say anything about it in that thread either (on the Proxmox forum).

Thank you so much again, and I hope you will find some free time to write this post about setting up your topology, can't wait!

Post by caramb »

Hi there,

Proxmox 6.4 was released today (https://forum.proxmox.com/threads/proxm ... ble.88336/)
This version has official support for linux kernel 5.11 branch through simple opt-in process (https://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_6.4).

If you opt in (command below), Proxmox will boot kernel 5.11.7-1-pve, which has built-in support for Realtek 8125B !

Code: Select all

apt install pve-kernel-5.11

Code: Select all

root@pve3:~# uname -a
Linux pve3 5.11.7-1-pve #1 SMP PVE 5.11.7-1~bpo10 (Thu, 18 Mar 2021 16:17:24 +0100) x86_64 GNU/Linux
root@pve3:~# dmesg | grep r8169
[    2.423420] libphy: r8169: probed
[    2.423682] r8169 0000:02:00.0 eth0: RTL8125B, 00:1e:06:45:34:be, XID 641, IRQ 132
[    2.423692] r8169 0000:02:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]
[    2.443414] libphy: r8169: probed
[    2.443700] r8169 0000:03:00.0 eth1: RTL8125B, 00:1e:06:45:34:bf, XID 641, IRQ 133
[    2.443711] r8169 0000:03:00.0 eth1: jumbo features [frames: 9194 bytes, tx checksumming: ko]
[    2.477700] r8169 0000:02:00.0 enp2s0: renamed from eth0
[    2.496010] r8169 0000:03:00.0 enp3s0: renamed from eth1
[    7.703536] RTL8226B_RTL8221B 2.5Gbps PHY r8169-200:00: attached PHY driver (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
[    7.899645] r8169 0000:02:00.0 enp2s0: Link is Down
[    7.967528] RTL8226B_RTL8221B 2.5Gbps PHY r8169-300:00: attached PHY driver (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)
[    8.179656] r8169 0000:03:00.0 enp3s0: Link is Down
[   10.785741] r8169 0000:02:00.0 enp2s0: Link is Up - 2.5Gbps/Full - flow control rx/tx
[   11.293737] r8169 0000:03:00.0 enp3s0: Link is Up - 2.5Gbps/Full - flow control rx/tx
If you don't opt in, Proxmox will still boot kernel 5.4.106-1-pve, which requires either the DKMS driver or the one I built.

Of course, I cannot tell about kernel 5.11 stability yet ; I opted in on one out of the 3 nodes of my cluster ; wait & see.
I will perform some basic throughput tests and share the results.
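
To tell quickly which situation a given node is in, here is a rough sketch that compares the running kernel release with 5.9, the mainline version from which r8169 is reported earlier in this thread to support RTL8125B. This is only a hint : distribution kernels may backport or drop things, and the function name kernel_at_least_5_9 is made up for illustration.

```shell
#!/bin/sh
# Succeed if a kernel release string (e.g. 5.11.7-1-pve) is 5.9 or newer.
kernel_at_least_5_9() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%[!0-9]*}
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 9 ]; }
}

release=$(uname -r)
if kernel_at_least_5_9 "$release"; then
    echo "$release: in-kernel r8169 is expected to handle RTL8125B"
else
    echo "$release: a DKMS or backported driver is probably needed"
fi
```

The driver actually bound to a port can then be confirmed with ethtool -i on the interface, as shown earlier in the thread.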

Post by caramb »

Some performance figures :

Iperf between kernel 5.4 with custom r8169 driver (node pve2) and kernel 5.4 with custom r8169 driver (node pve)

Code: Select all

root@pve2:~# iperf3 -c 192.168.100.201
Connecting to host 192.168.100.201, port 5201
[  5] local 192.168.100.202 port 53560 connected to 192.168.100.201 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   297 MBytes  2.49 Gbits/sec    0    699 KBytes
[  5]   1.00-2.00   sec   295 MBytes  2.48 Gbits/sec    0    769 KBytes
[  5]   2.00-3.00   sec   295 MBytes  2.47 Gbits/sec    0    769 KBytes
[  5]   3.00-4.00   sec   294 MBytes  2.47 Gbits/sec    0    900 KBytes
[  5]   4.00-5.00   sec   296 MBytes  2.48 Gbits/sec    0    900 KBytes
[  5]   5.00-6.00   sec   293 MBytes  2.45 Gbits/sec    0    952 KBytes
[  5]   6.00-7.00   sec   296 MBytes  2.48 Gbits/sec    0    952 KBytes
[  5]   7.00-8.00   sec   295 MBytes  2.47 Gbits/sec    0    952 KBytes
[  5]   8.00-9.00   sec   295 MBytes  2.47 Gbits/sec    0    952 KBytes
[  5]   9.00-10.00  sec   295 MBytes  2.47 Gbits/sec    0   1.40 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.88 GBytes  2.47 Gbits/sec    0             sender
[  5]   0.00-10.01  sec  2.88 GBytes  2.47 Gbits/sec                  receiver
Iperf between kernel 5.11 with builtin r8169 driver (node pve3) and kernel 5.4 with custom r8169 driver (node pve)

Code: Select all

root@pve3:~# iperf3 -c 192.168.100.201
Connecting to host 192.168.100.201, port 5201
[  5] local 192.168.100.203 port 48058 connected to 192.168.100.201 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   297 MBytes  2.49 Gbits/sec    0    708 KBytes
[  5]   1.00-2.00   sec   294 MBytes  2.46 Gbits/sec    0    743 KBytes
[  5]   2.00-3.00   sec   295 MBytes  2.47 Gbits/sec    0    821 KBytes
[  5]   3.00-4.00   sec   294 MBytes  2.47 Gbits/sec    0    821 KBytes
[  5]   4.00-5.00   sec   295 MBytes  2.48 Gbits/sec    0    821 KBytes
[  5]   5.00-6.00   sec   295 MBytes  2.47 Gbits/sec    0    874 KBytes
[  5]   6.00-7.00   sec   294 MBytes  2.46 Gbits/sec    0    874 KBytes
[  5]   7.00-8.00   sec   295 MBytes  2.47 Gbits/sec    0    874 KBytes
[  5]   8.00-9.00   sec   295 MBytes  2.47 Gbits/sec    0    874 KBytes
[  5]   9.00-10.00  sec   294 MBytes  2.47 Gbits/sec    0    874 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.88 GBytes  2.47 Gbits/sec    0             sender
[  5]   0.00-10.01  sec  2.88 GBytes  2.47 Gbits/sec                  receiver
Performance seems to be identical (and is rather good).

Post by caramb »

@Hostis,

Sorry for not replying sooner ; I missed your last post...
You're absolutely right, go for the NetCard if you can ; having the Ethernet chip attached to the PCIe is far better than attached to USB.
I will write the post about the network topology in the next few days ; stay tuned !

Post by caramb »

As I was browsing the kernel netdev mailing-list to find a specific patch related to small frames and realtek firmware bug, I found this one which may be valuable to you :
https://lore.kernel.org/netdev/6e453d49 ... gmail.com/

By the end of January 2021, a hardware bug (oughhh) was found (and confirmed by Realtek) in the R8125 chip, causing network disruption under heavy UDP load.
As stated in the email, Realtek provided a software workaround (r8169 driver patch) which was merged into the kernel source tree.

The custom driver I'm providing does not include the fix.
Don't know if the DKMS driver does.
Don't know yet about Proxmox kernel 5.11
Don't know yet if R8125B is also affected.

Will dig into this.

Post by domih »

caramb wrote:
Thu Apr 29, 2021 4:16 am
As I was browsing the kernel netdev mailing-list to find a specific patch related to small frames and realtek firmware bug, I found this one which may be valuable to you :
https://lore.kernel.org/netdev/6e453d49 ... gmail.com/

By the end of January 2021, a hardware bug (oughhh) was found (and confirmed by Realtek) in the R8125 chip, causing network disruption under heavy UDP load.
As stated in the email, Realtek provided a software workaround (r8169 driver patch) which was merged into the kernel source tree.

The custom driver I'm providing does not include the fix.
Don't know if the DKMS driver does.
Don't know yet about Proxmox kernel 5.11
Don't know yet if R8125B is also affected.

Will dig into this.
Did you see this: viewtopic.php?p=327565#p327565 ?

Post by caramb »

@domih,
Thank you for pointing this out. It looks like they're getting rather bad results with the 5.9 & 5.11 stock modules.
This is quite surprising as I mainly backported a patch meant for kernel 5.9 to the 5.4 branch.
Will share my figures in the other thread.

I just ran some additional iperf3 tests and still find that the Proxmox kernel 5.11 gives consistent & solid results (~2.47Gb/s both ways)
Last edited by caramb on Fri Apr 30, 2021 4:05 am, edited 2 times in total.

Post by caramb »

@domih,

were you suggesting that the software workaround to the HW caveat may be the cause of the performance drop reported in the other thread ?

Post by caramb »

caramb wrote:
Thu Apr 29, 2021 4:16 am
As I was browsing the kernel netdev mailing-list to find a specific patch related to small frames and realtek firmware bug, I found this one which may be valuable to you :
https://lore.kernel.org/netdev/6e453d49 ... gmail.com/

By the end of January 2021, a hardware bug (oughhh) was found (and confirmed by Realtek) in the R8125 chip, causing network disruption under heavy UDP load.
As stated in the email, Realtek provided a software workaround (r8169 driver patch) which was merged into the kernel source tree.

The custom driver I'm providing does not include the fix.
Don't know if the DKMS driver does.
Don't know yet about Proxmox kernel 5.11
Don't know yet if R8125B is also affected.

Will dig into this.
Replying to myself.

According to (https://lore.kernel.org/netdev/20210202 ... ernel.org/) fix for this HW bug was merged in kernel 5.11-rc7 (thus pve kernel 5.11 should have it).
Current patch is there : https://git.kernel.org/pub/scm/linux/ke ... 520b4de3ed
Patch seems to apply to both 8125 & 8125B.

Post by domih »

caramb wrote:
Fri Apr 30, 2021 2:10 am
@domih,

were you suggesting that the software workaround to the HW caveat may be the cause of the performance drop reported in the other thread ?
No, what I'm saying is that:
1) The tests Joshua performed show that the driver from the kernel consumes a lot of CPU resources, to the point that it becomes crazy when you run 5 parallel iperf3 sessions with an H2 Net Card. I trust Joshua because I repeated these tests and found the same results.
2) Disabling ASPM (see viewtopic.php?p=320336#p320336) fixes the speed when connected to a 1GbE subnet. I'm not the one who originally found this out, I credit the users in the post.
3) The only "stable" driver version that does not consume crazy CPU resources is version 9.003.05 from the Realtek web site (the one HK put in its .deb DKMS package). All the subsequent versions from Realtek or from the kernel have either issues (high variance in speed or spurious disconnections), crazy CPU resource consumption, or both.

Question: could you run an iperf3 test of 10 minutes or more (with parallel streams if possible), in both directions, and check CPU consumption using htop?

Question: with iperf3, are you using TCP or UDP?

Post by caramb »

Domih,

Thank you for clarifying ; I guess I'm not stressing my homelab enough to face these situations.

I mainly use iperf3 with default options ; thus TCP transport.
Unless the "reverse mode" is set, iperf3 runs in "normal mode" : client sends ; server receives.
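For anyone wanting to script the same runs, here is a minimal sketch of how the iperf3 client command lines could be assembled. The helper name `iperf3_client_args` is made up for illustration; only the standard iperf3 client flags are used (`-c`, `-t`, `-u`, `-b`, `-R`), and the server address is just the one used in the tests in this thread.

```python
from typing import List, Optional


def iperf3_client_args(server: str, seconds: int = 10, udp: bool = False,
                       bandwidth: Optional[str] = None,
                       reverse: bool = False) -> List[str]:
    """Build an iperf3 client command line as an argv list (hypothetical helper)."""
    args = ["iperf3", "-c", server, "-t", str(seconds)]
    if udp:
        args.append("-u")            # UDP instead of the default TCP
    if bandwidth is not None:
        args += ["-b", bandwidth]    # target bitrate, e.g. "2500M"
    if reverse:
        args.append("-R")            # reverse mode: server sends, client receives
    return args


# Normal mode (client sends), then reverse mode (server sends).
print(" ".join(iperf3_client_args("192.168.100.201", seconds=300)))
print(" ".join(iperf3_client_args("192.168.100.201", seconds=300, reverse=True)))
```

The resulting list can be handed to `subprocess.run(...)` as-is; running the normal and the reverse variant back to back would cover both directions without swapping the client and server roles by hand.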

Here are the results of the first shots which already show something interesting.

Test 1 : Node pve (HP G8 5.4.106-1-pve + custom r8169) as server ; Node pve2 (H2+ 5.4.106-1-pve + custom r8169) as client ; 5min TCP test ; single one-way stream over back-to-back 2.5GbE link.

Code: Select all

root@pve2:~# uname -a
Linux pve2 5.4.106-1-pve #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) x86_64 GNU/Linux
root@pve2:~# iperf3 -t 300 -c 192.168.100.201
Connecting to host 192.168.100.201, port 5201
[  5] local 192.168.100.202 port 41512 connected to 192.168.100.201 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   297 MBytes  2.49 Gbits/sec    0    690 KBytes
[  5]   1.00-2.00   sec   295 MBytes  2.47 Gbits/sec    0    690 KBytes
[  5]   2.00-3.00   sec   296 MBytes  2.48 Gbits/sec    0    865 KBytes
[  5]   3.00-4.00   sec   292 MBytes  2.45 Gbits/sec    0    961 KBytes
[  5]   4.00-5.00   sec   295 MBytes  2.47 Gbits/sec    0    961 KBytes
[  5]   5.00-6.00   sec   292 MBytes  2.45 Gbits/sec    0    961 KBytes
[  5]   6.00-7.00   sec   292 MBytes  2.45 Gbits/sec    0    961 KBytes
[  5]   7.00-8.00   sec   295 MBytes  2.47 Gbits/sec    0    961 KBytes
...
[  5] 294.00-295.00 sec   295 MBytes  2.47 Gbits/sec    0   2.30 MBytes
[  5] 295.00-296.00 sec   295 MBytes  2.47 Gbits/sec    0   2.30 MBytes
[  5] 296.00-297.00 sec   295 MBytes  2.47 Gbits/sec    0   2.30 MBytes
[  5] 297.00-298.00 sec   295 MBytes  2.47 Gbits/sec    0   2.30 MBytes
[  5] 298.00-299.00 sec   295 MBytes  2.47 Gbits/sec    0   2.30 MBytes
[  5] 299.00-300.00 sec   294 MBytes  2.46 Gbits/sec    0   2.30 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-300.00 sec  86.3 GBytes  2.47 Gbits/sec    1             sender
[  5]   0.00-300.01 sec  86.3 GBytes  2.47 Gbits/sec                  receiver
Nothing special to say about the bandwidth result itself (got 1 retransmit, but the node is not idle => Proxmox cluster + Ceph).
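As a sanity check on the summary line, the total transfer can be converted back to a bitrate. This is only a sketch and it assumes that iperf3 prints Transfer in binary units (1 GByte = 2^30 bytes) and Bitrate in decimal units (1 Gbit = 10^9 bits); the per-second lines above are consistent with that assumption (295 MBytes in one second reported as 2.47 Gbits/sec). The function name is made up.

```python
def gbits_per_sec(transfer_gbytes: float, seconds: float) -> float:
    """Convert an iperf3 'Transfer' total (GBytes, assumed 2**30 bytes each)
    into Gbits/sec (10**9 bits per second)."""
    return transfer_gbytes * 2**30 * 8 / seconds / 1e9


# 86.3 GBytes in 300 seconds, as reported in the summary of Test 1.
print(round(gbits_per_sec(86.3, 300.0), 2))  # prints 2.47
```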

During the test, I used top to monitor the CPU usage.
The iperf3 process constantly consumes 10~14% of one core, which looks reasonable.
However, one of the cores also constantly spends ~50% of its time processing soft IRQs (see Cpu3, si column below).

Code: Select all

root@pve2:~# top
top - 20:26:40 up 2 days, 56 min,  2 users,  load average: 0.23, 0.31, 0.32
Tasks: 197 total,   1 running, 196 sleeping,   0 stopped,   0 zombie
%Cpu0  :  2.3 us,  3.9 sy,  0.0 ni, 90.0 id,  0.3 wa,  0.0 hi,  3.6 si,  0.0 st
%Cpu1  :  1.0 us,  3.0 sy,  0.0 ni, 93.0 id,  0.0 wa,  0.0 hi,  3.0 si,  0.0 st
%Cpu2  :  2.7 us,  4.7 sy,  0.0 ni, 91.4 id,  0.7 wa,  0.0 hi,  0.7 si,  0.0 st
%Cpu3  :  2.0 us,  4.3 sy,  0.0 ni, 43.9 id,  0.0 wa,  0.0 hi, 49.8 si,  0.0 st
MiB Mem :  32003.6 total,  24593.8 free,   6374.9 used,   1034.9 buff/cache
MiB Swap:   8192.0 total,   8192.0 free,      0.0 used.  25094.9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 845655 root      20   0    8008   4292   3824 S  12.6   0.0   0:15.27 iperf3
   1279 root      rt   0  575404 180012  51464 S   3.7   0.5 145:00.34 corosync
   2354 root      20   0 2554692   1.0g  25356 S   3.3   3.2 134:33.82 kvm
   1295 ceph      20   0 1762824 824036  39604 S   2.7   2.5  64:55.19 ceph-osd
   2639 root      20   0 2505504   1.1g  24808 S   2.3   3.4 101:45.81 kvm
   1276 ceph      20   0  568932 157464  30248 S   2.0   0.5  52:02.90 ceph-mon
   1289 ceph      20   0 1754184 798656  39444 S   2.0   2.4  52:05.05 ceph-osd
     30 root      20   0       0      0      0 S   1.7   0.0   2:19.98 ksoftirq+
It's no surprise that high network bandwidth causes a high number of IRQs ; but I don't have a baseline to compare to, so I cannot tell whether this should be considered normal or not.
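One possible way to build such a baseline is to look at the kernel's own soft IRQ counters in /proc/softirqs rather than at top. Below is a minimal sketch that parses two snapshots of that file and reports, per CPU, how many NET_RX / NET_TX soft IRQs were raised in between. The function names and the sample numbers are illustrative, not measurements from these nodes.

```python
from typing import Dict, List


def parse_softirqs(text: str) -> Dict[str, List[int]]:
    """Parse the contents of /proc/softirqs into {softirq name: per-CPU counts}."""
    counts: Dict[str, List[int]] = {}
    for line in text.splitlines()[1:]:      # the first line is the CPU header
        name, _, rest = line.partition(":")
        if rest.strip():
            counts[name.strip()] = [int(v) for v in rest.split()]
    return counts


def per_cpu_delta(before: Dict[str, List[int]], after: Dict[str, List[int]],
                  name: str) -> List[int]:
    """Per-CPU increase of one softirq counter between two snapshots."""
    return [b - a for a, b in zip(before[name], after[name])]


# Made-up snapshots in the /proc/softirqs layout (2 CPUs shown for brevity).
first = "        CPU0       CPU1\n  NET_TX:     10         20\n  NET_RX:    100        200\n"
second = "        CPU0       CPU1\n  NET_TX:     15         20\n  NET_RX:    100       9200\n"
for irq in ("NET_RX", "NET_TX"):
    print(irq, per_cpu_delta(parse_softirqs(first), parse_softirqs(second), irq))
```

On a real node the two snapshots would come from reading /proc/softirqs twice, e.g. one second apart, once while idle and once during an iperf3 run.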

Ran this test 3 times ; got the same results.



Test 2 : Node pve (HP G8 5.4.106-1-pve + custom r8169) as server ; Node pve3 (H2+ 5.11.7-1-pve with stock r8169) as client ; 5min TCP test ; single one-way stream over back-to-back 2.5GbE link.
(same test as previous except client is kernel 5.11 instead of 5.4)

Code: Select all

root@pve3:~# uname -a
Linux pve3 5.11.7-1-pve #1 SMP PVE 5.11.7-1~bpo10 (Thu, 18 Mar 2021 16:17:24 +0100) x86_64 GNU/Linux
root@pve3:~# iperf3 -t 300 -c 192.168.100.201
Connecting to host 192.168.100.201, port 5201
[  5] local 192.168.100.203 port 33710 connected to 192.168.100.201 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   297 MBytes  2.49 Gbits/sec    0    786 KBytes
[  5]   1.00-2.00   sec   295 MBytes  2.47 Gbits/sec    0    786 KBytes
[  5]   2.00-3.00   sec   294 MBytes  2.47 Gbits/sec    0    786 KBytes
[  5]   3.00-4.00   sec   296 MBytes  2.48 Gbits/sec    0    821 KBytes
[  5]   4.00-5.00   sec   294 MBytes  2.47 Gbits/sec    0    821 KBytes
[  5]   5.00-6.00   sec   295 MBytes  2.48 Gbits/sec    0    865 KBytes
[  5]   6.00-7.00   sec   295 MBytes  2.47 Gbits/sec    0    865 KBytes
...
[  5] 293.00-294.00 sec   294 MBytes  2.46 Gbits/sec    0   3.00 MBytes
[  5] 294.00-295.00 sec   295 MBytes  2.47 Gbits/sec    0   3.00 MBytes
[  5] 295.00-296.00 sec   295 MBytes  2.47 Gbits/sec    0   3.00 MBytes
[  5] 296.00-297.00 sec   295 MBytes  2.47 Gbits/sec    0   3.00 MBytes
[  5] 297.00-298.00 sec   294 MBytes  2.46 Gbits/sec    0   3.00 MBytes
[  5] 298.00-299.00 sec   295 MBytes  2.47 Gbits/sec    0   3.00 MBytes
[  5] 299.00-300.00 sec   295 MBytes  2.47 Gbits/sec    0   3.00 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-300.00 sec  86.3 GBytes  2.47 Gbits/sec    1             sender
[  5]   0.00-300.01 sec  86.3 GBytes  2.47 Gbits/sec                  receiver
The iperf3 process constantly consumes 12~16% of one core ; this looks reasonable but is slightly more than with kernel 5.4.
However, one of the cores also constantly spends ~60% of its time processing soft IRQs (see Cpu3, si column below).
This is about 10 percentage points higher for the same amount of bandwidth.

Code: Select all

top - 20:32:16 up 2 days, 52 min,  2 users,  load average: 0.27, 0.23, 0.28
Tasks: 197 total,   2 running, 195 sleeping,   0 stopped,   0 zombie
%Cpu0  :  1.7 us,  7.4 sy,  0.0 ni, 90.6 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  7.0 us,  6.0 sy,  0.0 ni, 86.7 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  3.7 us,  2.7 sy,  0.0 ni, 93.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  2.0 us,  5.9 sy,  0.0 ni, 31.5 id,  0.0 wa,  0.0 hi, 60.7 si,  0.0 st
MiB Mem :  31932.2 total,  25318.6 free,   5675.9 used,    937.7 buff/cache
MiB Swap:   8192.0 total,   8192.0 free,      0.0 used.  25769.4 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 789691 root      20   0    7884   4264   3788 R  14.6   0.0   0:06.94 iperf3
   1438 root      20   0  272436  90732   9540 S   5.3   0.3  22:25.49 pvestatd
   1331 root      rt   0  573288 177892  51468 S   4.3   0.5 167:06.41 corosync
     32 root      20   0       0      0      0 S   2.0   0.0   0:56.67 ksoftir+
   1437 root      20   0  273836  90348   7804 S   1.7   0.3  12:14.52 pve-fir+
   1892 root      20   0   10.4g   1.7g  24628 S   1.7   5.4  55:11.20 kvm
   2188 root      20   0 2505208   1.1g  24920 S   1.7   3.4  30:35.64 kvm
   1332 ceph      20   0  560832 153728  30448 S   1.3   0.5  73:01.54 ceph-mo
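To put the two soft IRQ readings side by side (49.8 si on Cpu3 with kernel 5.4 in Test 1, 60.7 si on Cpu3 with kernel 5.11 in Test 2), the difference can be expressed either in percentage points or as a relative change. A small sketch, with a made-up function name:

```python
from typing import Tuple


def compare_share(baseline_pct: float, new_pct: float) -> Tuple[float, float]:
    """Return (difference in percentage points, relative change in percent)."""
    points = new_pct - baseline_pct
    relative = points / baseline_pct * 100.0
    return points, relative


points, relative = compare_share(49.8, 60.7)
print(f"{points:.1f} points, {relative:.1f}% relative")  # 10.9 points, 21.9% relative
```

These are single top snapshots, so the figures are only indicative.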
I will run additional tests and post the results.
Last edited by caramb on Sat May 01, 2021 4:29 pm, edited 1 time in total.


Post by caramb »

Test 3 : Node pve (HP G8 5.4.106-1-pve + custom r8169) as server ; Node pve2 (H2+ 5.4.106-1-pve + custom r8169) as client ; 5min UDP test 2.5Gb/s bandwidth ; single one-way stream over back-to-back 2.5GbE link.

Code: Select all

root@pve2:~# iperf3 -u -b 2500M -t 300 -c 192.168.100.201
Connecting to host 192.168.100.201, port 5201
[  5] local 192.168.100.202 port 54417 connected to 192.168.100.201 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec   295 MBytes  2.48 Gbits/sec  34607
[  5]   1.00-2.00   sec   295 MBytes  2.48 Gbits/sec  34612
[  5]   2.00-3.00   sec   295 MBytes  2.48 Gbits/sec  34625
[  5]   3.00-4.00   sec   295 MBytes  2.48 Gbits/sec  34609
[  5]   4.00-5.00   sec   295 MBytes  2.48 Gbits/sec  34628
[  5]   5.00-6.00   sec   295 MBytes  2.48 Gbits/sec  34601
[  5]   6.00-7.00   sec   296 MBytes  2.48 Gbits/sec  34642
[  5]   7.00-8.00   sec   295 MBytes  2.48 Gbits/sec  34614
[  5]   8.00-9.00   sec   296 MBytes  2.48 Gbits/sec  34644
[  5]   9.00-10.00  sec   295 MBytes  2.48 Gbits/sec  34598
[  5]  10.00-11.00  sec   295 MBytes  2.48 Gbits/sec  34616
[  5]  11.00-12.00  sec   295 MBytes  2.48 Gbits/sec  34603
[  5]  12.00-13.00  sec   295 MBytes  2.48 Gbits/sec  34627
[  5]  13.00-14.00  sec   295 MBytes  2.48 Gbits/sec  34626
[  5]  14.00-15.00  sec   295 MBytes  2.48 Gbits/sec  34588
...
[  5] 295.00-296.00 sec   296 MBytes  2.48 Gbits/sec  34638
[  5] 296.00-297.00 sec   296 MBytes  2.48 Gbits/sec  34644
[  5] 297.00-298.00 sec   295 MBytes  2.47 Gbits/sec  34533
[  5] 298.00-299.00 sec   296 MBytes  2.48 Gbits/sec  34631
[  5] 299.00-300.00 sec   296 MBytes  2.48 Gbits/sec  34645
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-300.00 sec  86.5 GBytes  2.48 Gbits/sec  0.000 ms  0/10381576 (0%)  sender
[  5]   0.00-300.00 sec  86.5 GBytes  2.48 Gbits/sec  0.006 ms  2255/10381576 (0.022%)  receiver
The result is still solid : pushing UDP datagrams @ 2.48Gb/s with almost no loss on the receiving side.
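The loss figure on the receiver line is simply lost datagrams over total datagrams. A quick sketch to recompute it from the counts printed by iperf3 (the function name is made up):

```python
def loss_percent(lost: int, total: int) -> float:
    """Datagram loss as a percentage of all datagrams sent."""
    return 100.0 * lost / total


# Receiver line of Test 3: 2255 lost out of 10381576 datagrams.
print(f"{loss_percent(2255, 10381576):.3f}%")  # prints 0.022%
```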
CPU usage shows a different pattern.
The iperf3 process constantly consumed ~88% of one core.
However, the percentage of time spent in soft IRQs dropped to <20% (maybe due to the connectionless nature of UDP compared to TCP ; the logic is a lot simpler).

Code: Select all

top - 21:06:49 up 2 days,  1:36,  2 users,  load average: 0.98, 0.70, 0.56
Tasks: 199 total,   2 running, 197 sleeping,   0 stopped,   0 zombie
%Cpu0  :  1.8 us,  5.8 sy,  0.0 ni, 79.2 id,  0.3 wa,  0.0 hi, 12.9 si,  0.0 st
%Cpu1  :  2.2 us,  3.8 sy,  0.0 ni, 88.3 id,  0.3 wa,  0.0 hi,  5.4 si,  0.0 st
%Cpu2  :  4.8 us, 14.5 sy,  0.0 ni, 78.1 id,  0.0 wa,  0.0 hi,  2.6 si,  0.0 st
%Cpu3  :  6.2 us, 50.3 sy,  0.0 ni, 26.9 id,  0.0 wa,  0.0 hi, 16.6 si,  0.0 st
MiB Mem :  32003.6 total,  24536.5 free,   6426.0 used,   1041.1 buff/cache
MiB Swap:   8192.0 total,   8192.0 free,      0.0 used.  25043.2 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 857023 root      20   0    7892   4360   3900 R  87.4   0.0   2:17.29 iperf3
   1279 root      rt   0  575404 180012  51464 S   3.0   0.5 146:56.24 corosync
   2354 root      20   0 2554692   1.0g  25356 S   2.0   3.2 136:21.80 kvm
   1276 ceph      20   0  568932 161220  30248 S   1.3   0.5  52:46.88 ceph-mon
   2639 root      20   0 2505504   1.1g  24808 S   1.0   3.4 103:07.91 kvm
     30 root      20   0       0      0      0 S   0.7   0.0   2:24.20 ksoftirq+
   1280 ceph      20   0  995104 229596  33704 S   0.7   0.7  19:34.40 ceph-mgr
   1295 ceph      20   0 1762824 857504  39604 S   0.7   2.6  65:42.92 ceph-osd


Test 4 : Node pve (HP G8 5.4.106-1-pve + custom r8169) as server ; Node pve3 (H2+ 5.11.7-1-pve with stock r8169) as client ; 5min UDP test 2.5Gb/s bandwidth ; single one-way stream over back-to-back 2.5GbE link.

Code: Select all

root@pve3:~# iperf3 -u -b 2500M -t 300 -c 192.168.100.201
Connecting to host 192.168.100.201, port 5201
[  5] local 192.168.100.203 port 45496 connected to 192.168.100.201 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec   293 MBytes  2.46 Gbits/sec  34364
[  5]   1.00-2.00   sec   292 MBytes  2.45 Gbits/sec  34235
[  5]   2.00-3.00   sec   295 MBytes  2.48 Gbits/sec  34600
[  5]   3.00-4.00   sec   294 MBytes  2.46 Gbits/sec  34427
[  5]   4.00-5.00   sec   295 MBytes  2.47 Gbits/sec  34540
[  5]   5.00-6.00   sec   295 MBytes  2.47 Gbits/sec  34542
[  5]   6.00-7.00   sec   292 MBytes  2.45 Gbits/sec  34274
[  5]   7.00-8.00   sec   295 MBytes  2.47 Gbits/sec  34530
[  5]   8.00-9.00   sec   294 MBytes  2.47 Gbits/sec  34477
...
[  5] 294.00-295.00 sec   295 MBytes  2.47 Gbits/sec  34557
[  5] 295.00-296.00 sec   293 MBytes  2.46 Gbits/sec  34319
[  5] 296.00-297.00 sec   293 MBytes  2.46 Gbits/sec  34340
[  5] 297.00-298.00 sec   294 MBytes  2.47 Gbits/sec  34494
[  5] 298.00-299.00 sec   295 MBytes  2.47 Gbits/sec  34541
[  5] 299.00-300.00 sec   295 MBytes  2.47 Gbits/sec  34564
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-300.00 sec  86.2 GBytes  2.47 Gbits/sec  0.000 ms  0/10340591 (0%)  sender
[  5]   0.00-300.00 sec  86.2 GBytes  2.47 Gbits/sec  0.004 ms  1368/10340591 (0.013%)  receiver
Pushing UDP datagrams @ 2.47Gb/s with almost no loss on the receiving side.
The iperf3 process constantly consumed ~93% of one core ; about 5 percentage points higher than with the other kernel and driver.
The percentage of time spent in soft IRQs is ~30% ; about 10 percentage points higher than with the other kernel and driver.

Code: Select all

top - 21:17:34 up 2 days,  1:37,  2 users,  load average: 0.92, 0.48, 0.33
Tasks: 194 total,   2 running, 192 sleeping,   0 stopped,   0 zombie
%Cpu0  :  1.7 us,  7.0 sy,  0.0 ni, 89.9 id,  0.7 wa,  0.0 hi,  0.7 si,  0.0 st
%Cpu1  :  1.7 us,  3.4 sy,  0.0 ni, 94.6 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  1.3 us,  1.3 sy,  0.0 ni, 96.7 id,  0.7 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  5.3 us, 57.8 sy,  0.0 ni,  9.0 id,  0.0 wa,  0.0 hi, 27.9 si,  0.0 st
MiB Mem :  31932.2 total,  25273.3 free,   5723.8 used,    935.1 buff/cache
MiB Swap:   8192.0 total,   8192.0 free,      0.0 used.  25721.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 801626 root      20   0    7768   4412   3948 R  93.0   0.0   1:05.22 iperf3
   1331 root      rt   0  573288 177892  51468 S   2.3   0.5 169:34.41 corosync
   1351 ceph      20   0 1839164 905548  38944 S   1.7   2.8  82:33.48 ceph-osd
pve2 & pve3 are two different H2+ (from the same order) ; thus, unless pve3 features less efficient hardware, it looks like kernel 5.11 consumes something like 10% more CPU for the same network workload.
Both report the exact same bogomips ; so from a CPU point of view, both H2+ can do absolutely nothing at the exact same speed... :lol: ; which tends to show that the hardware is identical...

Code: Select all

root@pve2:~# cat /proc/cpuinfo |grep -i bogo
bogomips        : 3571.20
bogomips        : 3571.20
bogomips        : 3571.20
bogomips        : 3571.20
root@pve3:~# cat /proc/cpuinfo |grep -i bogo
bogomips        : 3571.20
bogomips        : 3571.20
bogomips        : 3571.20
bogomips        : 3571.20
Last edited by caramb on Sat May 01, 2021 4:33 pm, edited 1 time in total.


Post by caramb »

Test 5 : Swapping node roles : nodes pve2 & pve3 (H2+) as iperf3 servers ; node pve as client.
Quick TCP/UDP tests

Code: Select all

root@pve:~# iperf3 -c 192.168.100.202
Connecting to host 192.168.100.202, port 5201
[  5] local 192.168.100.201 port 55894 connected to 192.168.100.202 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   296 MBytes  2.49 Gbits/sec    0    987 KBytes
[  5]   1.00-2.00   sec   293 MBytes  2.46 Gbits/sec    0   1.08 MBytes
[  5]   2.00-3.00   sec   294 MBytes  2.46 Gbits/sec    0   1.13 MBytes
[  5]   3.00-4.00   sec   295 MBytes  2.47 Gbits/sec    0   1.13 MBytes
[  5]   4.00-5.00   sec   295 MBytes  2.47 Gbits/sec    0   1.13 MBytes
[  5]   5.00-6.00   sec   295 MBytes  2.47 Gbits/sec    0   1.13 MBytes
[  5]   6.00-7.00   sec   294 MBytes  2.46 Gbits/sec    0   1.13 MBytes
[  5]   7.00-8.00   sec   295 MBytes  2.47 Gbits/sec    0   1.13 MBytes
[  5]   8.00-9.00   sec   294 MBytes  2.46 Gbits/sec    0   1.13 MBytes
[  5]   9.00-10.00  sec   294 MBytes  2.46 Gbits/sec    0   1.13 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.88 GBytes  2.47 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  2.87 GBytes  2.47 Gbits/sec                  receiver

iperf Done.
root@pve:~#
root@pve:~#
root@pve:~# iperf3 -c 192.168.100.203
Connecting to host 192.168.100.203, port 5201
[  5] local 192.168.100.201 port 43218 connected to 192.168.100.203 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   296 MBytes  2.48 Gbits/sec    0    778 KBytes
[  5]   1.00-2.00   sec   295 MBytes  2.47 Gbits/sec    0    778 KBytes
[  5]   2.00-3.00   sec   295 MBytes  2.47 Gbits/sec    0    813 KBytes
[  5]   3.00-4.00   sec   294 MBytes  2.47 Gbits/sec    0    891 KBytes
[  5]   4.00-5.00   sec   295 MBytes  2.48 Gbits/sec    0    891 KBytes
[  5]   5.00-6.00   sec   295 MBytes  2.47 Gbits/sec    0    891 KBytes
[  5]   6.00-7.00   sec   296 MBytes  2.48 Gbits/sec    0    961 KBytes
[  5]   7.00-8.00   sec   294 MBytes  2.46 Gbits/sec    0   1.45 MBytes
[  5]   8.00-9.00   sec   295 MBytes  2.47 Gbits/sec    0   1.45 MBytes
[  5]   9.00-10.00  sec   294 MBytes  2.46 Gbits/sec    0   1.45 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.88 GBytes  2.47 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  2.88 GBytes  2.47 Gbits/sec                  receiver

iperf Done.
root@pve:~# iperf3 -u -b 2500M -c 192.168.100.202
Connecting to host 192.168.100.202, port 5201
[  5] local 192.168.100.201 port 56750 connected to 192.168.100.202 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec   295 MBytes  2.48 Gbits/sec  34605
[  5]   1.00-2.00   sec   296 MBytes  2.48 Gbits/sec  34640
[  5]   2.00-3.00   sec   295 MBytes  2.48 Gbits/sec  34618
[  5]   3.00-4.00   sec   296 MBytes  2.48 Gbits/sec  34647
[  5]   4.00-5.00   sec   296 MBytes  2.48 Gbits/sec  34641
[  5]   5.00-6.00   sec   296 MBytes  2.48 Gbits/sec  34644
[  5]   6.00-7.00   sec   296 MBytes  2.48 Gbits/sec  34649
[  5]   7.00-8.00   sec   295 MBytes  2.48 Gbits/sec  34623
[  5]   8.00-9.00   sec   296 MBytes  2.48 Gbits/sec  34641
[  5]   9.00-10.00  sec   296 MBytes  2.48 Gbits/sec  34640
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  2.89 GBytes  2.48 Gbits/sec  0.000 ms  0/346348 (0%)  sender
[  5]   0.00-10.00  sec  2.88 GBytes  2.47 Gbits/sec  0.009 ms  1124/346348 (0.32%)  receiver

iperf Done.
root@pve:~# iperf3 -u -b 2500M -c 192.168.100.203
Connecting to host 192.168.100.203, port 5201
[  5] local 192.168.100.201 port 51679 connected to 192.168.100.203 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec   295 MBytes  2.48 Gbits/sec  34615
[  5]   1.00-2.00   sec   296 MBytes  2.48 Gbits/sec  34650
[  5]   2.00-3.00   sec   296 MBytes  2.48 Gbits/sec  34648
[  5]   3.00-4.00   sec   294 MBytes  2.47 Gbits/sec  34475
[  5]   4.00-5.00   sec   296 MBytes  2.48 Gbits/sec  34649
[  5]   5.00-6.00   sec   296 MBytes  2.48 Gbits/sec  34649
[  5]   6.00-7.00   sec   296 MBytes  2.48 Gbits/sec  34652
[  5]   7.00-8.00   sec   296 MBytes  2.48 Gbits/sec  34649
[  5]   8.00-9.00   sec   296 MBytes  2.48 Gbits/sec  34633
[  5]   9.00-10.00  sec   295 MBytes  2.48 Gbits/sec  34621
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  2.89 GBytes  2.48 Gbits/sec  0.000 ms  0/346241 (0%)  sender
[  5]   0.00-10.00  sec  2.87 GBytes  2.47 Gbits/sec  0.013 ms  1339/346240 (0.39%)  receiver
Still getting 2.47/2.48Gb/s for both TCP & UDP.


Post by caramb »

Test 6 : H2+ "full throttle" TCP test

node pve (HP G8) as server
node pve3 (H2+ 5.11.7-1-pve) as server
node pve2 (H2+ 5.4.106-1-pve) as client.
pve2 to pve : dedicated 2.5GbE link (back-to-back)
pve2 to pve3 : dedicated 2.5GbE link (back-to-back)

Simultaneous 1min TCP test between (pve2 and pve) and (pve2 and pve3) :

Code: Select all

root@pve2:~# iperf3 -t 60 -c 192.168.100.201
Connecting to host 192.168.100.201, port 5201
[  5] local 192.168.100.202 port 52428 connected to 192.168.100.201 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   297 MBytes  2.49 Gbits/sec    0    629 KBytes
[  5]   1.00-2.00   sec   295 MBytes  2.47 Gbits/sec    0    664 KBytes
[  5]   2.00-3.00   sec   295 MBytes  2.47 Gbits/sec    0    699 KBytes
[  5]   3.00-4.00   sec   295 MBytes  2.48 Gbits/sec    0    699 KBytes
[  5]   4.00-5.00   sec   294 MBytes  2.47 Gbits/sec    0    699 KBytes
[  5]   5.00-6.00   sec   295 MBytes  2.47 Gbits/sec    0   1005 KBytes
[  5]   6.00-7.00   sec   292 MBytes  2.46 Gbits/sec    0   1005 KBytes
...
[  5]  55.00-56.00  sec   295 MBytes  2.47 Gbits/sec    0   1.46 MBytes
[  5]  56.00-57.00  sec   294 MBytes  2.46 Gbits/sec    0   1.46 MBytes
[  5]  57.00-58.00  sec   295 MBytes  2.47 Gbits/sec    0   1.46 MBytes
[  5]  58.00-59.00  sec   292 MBytes  2.45 Gbits/sec    0   1.46 MBytes
[  5]  59.00-60.00  sec   295 MBytes  2.47 Gbits/sec    0   1.46 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  17.3 GBytes  2.47 Gbits/sec    0             sender
[  5]   0.00-60.01  sec  17.3 GBytes  2.47 Gbits/sec                  receiver

root@pve2:~# iperf3 -t 60 -c 192.168.100.203
Connecting to host 192.168.100.203, port 5201
[  5] local 192.168.100.202 port 45380 connected to 192.168.100.203 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   296 MBytes  2.48 Gbits/sec    0    778 KBytes
[  5]   1.00-2.00   sec   296 MBytes  2.48 Gbits/sec    0    813 KBytes
[  5]   2.00-3.00   sec   294 MBytes  2.47 Gbits/sec    0    813 KBytes
[  5]   3.00-4.00   sec   295 MBytes  2.47 Gbits/sec    0    944 KBytes
[  5]   4.00-5.00   sec   295 MBytes  2.47 Gbits/sec    0    944 KBytes
[  5]   5.00-6.00   sec   295 MBytes  2.47 Gbits/sec    0    996 KBytes
[  5]   6.00-7.00   sec   295 MBytes  2.47 Gbits/sec    0    996 KBytes
[  5]   7.00-8.00   sec   295 MBytes  2.47 Gbits/sec    0    996 KBytes
[  5]   8.00-9.00   sec   295 MBytes  2.47 Gbits/sec    0   1.44 MBytes
...
[  5]  56.00-57.00  sec   295 MBytes  2.47 Gbits/sec    0   3.37 MBytes
[  5]  57.00-58.00  sec   295 MBytes  2.47 Gbits/sec    0   3.37 MBytes
[  5]  58.00-59.00  sec   295 MBytes  2.47 Gbits/sec    0   3.37 MBytes
[  5]  59.00-60.00  sec   294 MBytes  2.47 Gbits/sec    0   3.37 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  17.3 GBytes  2.47 Gbits/sec    0             sender
[  5]   0.00-60.00  sec  17.3 GBytes  2.47 Gbits/sec                  receiver
pve2 is pushing 2.47Gb/s to both nodes simultaneously (great!)

Each of the iperf3 processes consumes ~20% of one core.
Two cores spend ~50% of their time in soft IRQs ; which is consistent with previous results.

Code: Select all

top - 21:51:46 up 2 days,  2:21,  3 users,  load average: 0.46, 0.39, 0.37
Tasks: 200 total,   2 running, 198 sleeping,   0 stopped,   0 zombie
%Cpu0  :  1.3 us,  4.9 sy,  0.0 ni, 39.2 id,  0.3 wa,  0.0 hi, 54.2 si,  0.0 st
%Cpu1  :  2.1 us, 11.7 sy,  0.0 ni, 77.0 id,  0.3 wa,  0.0 hi,  8.9 si,  0.0 st
%Cpu2  :  2.9 us, 10.0 sy,  0.0 ni, 83.0 id,  0.0 wa,  0.0 hi,  4.2 si,  0.0 st
%Cpu3  :  2.0 us,  6.0 sy,  0.0 ni, 39.3 id,  0.0 wa,  0.0 hi, 52.7 si,  0.0 st
MiB Mem :  32003.6 total,  24464.8 free,   6491.7 used,   1047.1 buff/cache
MiB Swap:   8192.0 total,   8192.0 free,      0.0 used.  24976.9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 870545 root      20   0    7876   4320   3852 R  19.8   0.0   0:03.66 iperf3
 870544 root      20   0    7876   4304   3832 S  18.2   0.0   0:04.02 iperf3
   1279 root      rt   0  575404 180012  51464 S   5.0   0.5 149:09.22 corosync
   2354 root      20   0 2554692   1.0g  25356 S   4.3   3.2 138:24.67 kvm
   1133 root      20   0  676128  54340  42872 S   3.0   0.2  16:32.51 pmxcfs
     30 root      20   0       0      0      0 S   2.6   0.0   2:27.56 ksoftir+
   1276 ceph      20   0  573028 166008  30248 S   2.0   0.5  53:36.96 ceph-mon
     10 root      20   0       0      0      0 S   1.7   0.0   1:01.43 ksoftir+
Test 7 : H2+ "full throttle" UDP test

Code: Select all

root@pve2:~# iperf3 -u -b 2500M -t 60 -c 192.168.100.201
Connecting to host 192.168.100.201, port 5201
[  5] local 192.168.100.202 port 53100 connected to 192.168.100.201 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec   295 MBytes  2.48 Gbits/sec  34623
[  5]   1.00-2.00   sec   275 MBytes  2.31 Gbits/sec  32270
[  5]   2.00-3.00   sec   288 MBytes  2.42 Gbits/sec  33758
[  5]   3.00-4.00   sec   296 MBytes  2.48 Gbits/sec  34651
[  5]   4.00-5.00   sec   295 MBytes  2.47 Gbits/sec  34563
[  5]   5.00-6.00   sec   295 MBytes  2.48 Gbits/sec  34621
[  5]   6.00-7.00   sec   295 MBytes  2.48 Gbits/sec  34618
[  5]   7.00-8.00   sec   273 MBytes  2.29 Gbits/sec  32029
[  5]   8.00-9.00   sec   256 MBytes  2.15 Gbits/sec  30013
...
[  5]  56.00-57.00  sec   291 MBytes  2.44 Gbits/sec  34064
[  5]  57.00-58.00  sec   294 MBytes  2.47 Gbits/sec  34446
[  5]  58.00-59.00  sec   293 MBytes  2.46 Gbits/sec  34326
[  5]  59.00-60.00  sec   296 MBytes  2.48 Gbits/sec  34652
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-60.00  sec  16.8 GBytes  2.41 Gbits/sec  0.000 ms  0/2015981 (0%)  sender
[  5]   0.00-60.00  sec  16.8 GBytes  2.40 Gbits/sec  0.008 ms  339/2015981 (0.017%)  receiver


root@pve2:~# iperf3 -u -b 2500M -t 60 -c 192.168.100.203
Connecting to host 192.168.100.203, port 5201
[  5] local 192.168.100.202 port 41625 connected to 192.168.100.203 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec   291 MBytes  2.44 Gbits/sec  34080
[  5]   1.00-2.00   sec   264 MBytes  2.21 Gbits/sec  30915
[  5]   2.00-3.00   sec   294 MBytes  2.47 Gbits/sec  34471
[  5]   3.00-4.00   sec   295 MBytes  2.48 Gbits/sec  34619
[  5]   4.00-5.00   sec   295 MBytes  2.48 Gbits/sec  34582
[  5]   5.00-6.00   sec   295 MBytes  2.48 Gbits/sec  34575
[  5]   6.00-7.00   sec   296 MBytes  2.48 Gbits/sec  34632
...
[  5]  57.00-58.00  sec   294 MBytes  2.47 Gbits/sec  34492
[  5]  58.00-59.00  sec   294 MBytes  2.46 Gbits/sec  34398
[  5]  59.00-60.00  sec   295 MBytes  2.47 Gbits/sec  34573
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-60.00  sec  17.2 GBytes  2.46 Gbits/sec  0.000 ms  0/2063118 (0%)  sender
[  5]   0.00-60.00  sec  17.2 GBytes  2.46 Gbits/sec  0.015 ms  4336/2063113 (0.21%)  receiver
2.4Gb/s to pve & 2.46Gb/s to pve3 simultaneously.
Bandwidth to pve is lower ; then again, pve is not an H2+.
One should note that :
pve3 runs kernel 5.11, whose r8169 driver is supposed to have the fix for the UDP HW bug.
pve runs kernel 5.4 with a backported 5.9 r8169 driver, which does not have the fix.

Code: Select all

top - 22:03:22 up 2 days,  2:32,  3 users,  load average: 0.75, 0.34, 0.32
Tasks: 204 total,   3 running, 201 sleeping,   0 stopped,   0 zombie
%Cpu0  :  7.4 us, 68.0 sy,  0.0 ni,  3.0 id,  0.0 wa,  0.0 hi, 21.5 si,  0.0 st
%Cpu1  :  5.0 us, 17.6 sy,  0.0 ni, 62.8 id,  0.3 wa,  0.0 hi, 14.4 si,  0.0 st
%Cpu2  :  7.7 us,  6.5 sy,  0.0 ni, 73.3 id,  0.3 wa,  0.0 hi, 12.2 si,  0.0 st
%Cpu3  :  7.1 us, 54.6 sy,  0.0 ni, 22.7 id,  0.0 wa,  0.0 hi, 15.6 si,  0.0 st
MiB Mem :  32003.6 total,  24465.9 free,   6491.4 used,   1046.3 buff/cache
MiB Swap:   8192.0 total,   8192.0 free,      0.0 used.  24977.0 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 873893 root      20   0    7760   4368   3896 R  95.3   0.0   0:14.07 iperf3
 873892 root      20   0    7760   4324   3852 R  92.0   0.0   0:14.35 iperf3
   2354 root      20   0 2554692   1.0g  25356 S   3.3   3.2 138:56.47 kvm
   1279 root      rt   0  575404 180012  51464 S   2.7   0.5 149:45.08 corosync
   1379 root      20   0  272336  90736   9648 S   2.7   0.3  22:23.17 pvestatd
   2639 root      20   0 2505504   1.1g  24808 S   2.0   3.4 105:05.06 kvm
   1276 ceph      20   0  577124 158060  30248 S   1.3   0.5  53:50.52 ceph-mon
   1380 root      20   0  273800  90108   7592 S   1.3   0.3  10:30.25 pve-fir+

The H2+ is able to push 4.94Gb/s of TCP traffic.
The H2+ is able to push 4.86Gb/s of UDP traffic.
(The H2+ is probably able to push a lot more when a netcard is plugged in.)
Transmitting packets generates soft IRQs ; so it is no big surprise that the H2+ spends time there when pushing multiple gigabits of traffic.
I cannot tell if it spends too much... but it still looks to me like the H2+ is a great performer :D
There may be some sysctl tweaking to dig into (Proxmox comes with some tunings) (some info here : https://github.com/leandromoreira/linux ... /README.md)
As for stability, while performing these iperf3 tests, I did not face any hang, oops, slowdown, crash, link down or disruption of any sort.
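For reference, the two aggregate figures are just the sums of the per-link results of Tests 6 and 7 (2.47 + 2.47 for TCP, 2.40 + 2.46 on the receiver side for UDP). A small sketch that also relates them to the nominal 2 x 2.5Gb/s, assuming 2.5 Gbit/s as the raw line rate of each link; the names are made up:

```python
from typing import List, Tuple

LINK_GBPS = 2.5  # assumed nominal line rate of each 2.5GbE link


def aggregate(streams: List[float]) -> Tuple[float, float]:
    """Return (total Gbit/s, percent of the combined nominal line rate)."""
    total = sum(streams)
    utilisation = 100.0 * total / (LINK_GBPS * len(streams))
    return round(total, 2), round(utilisation, 1)


print("TCP:", aggregate([2.47, 2.47]))  # (4.94, 98.8)
print("UDP:", aggregate([2.40, 2.46]))  # (4.86, 97.2)
```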

Footnote : Using the "zerocopy mode" of iperf3 (-Z option) cuts CPU usage by half when running TCP tests.
I.e. iperf3 on the H2+ (5.4.106-1-pve) pushing 2.5Gb/s then only consumes ~6% of a single core.
But this is an application-side optimization ; neither a kernel/driver nor a system optimization.


Post by domih »

Could you alternatively measure the CPU consumption as a whole (including the time spent in IRQ) using:

awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
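
For readers who find the one-liner hard to parse, here is the same calculation spelled out. It is a sketch that should behave identically; the function name cpu_busy_pct is made up for illustration.

```shell
# Hypothetical helper: read two "cpu ..." lines from /proc/stat on stdin
# (first sample, then second sample) and print the busy percentage between them.
# Fields: $2 user, $3 nice, $4 system, $5 idle, $6 iowait, $7 irq,
#         $8 softirq, $9 steal, $10 guest.
cpu_busy_pct() {
    awk '{
        busy  = $2 + $3 + $4 + $6 + $7 + $8 + $9 + $10   # everything except idle
        total = busy + $5
        if (NR == 1) { busy0 = busy; total0 = total }
        else         { print (busy - busy0) * 100 / (total - total0) }
    }'
}

# Usage: one-second sample of the whole machine (all cores together).
#   { grep 'cpu ' /proc/stat; sleep 1; grep 'cpu ' /proc/stat; } | cpu_busy_pct
```

Two things to keep in mind when comparing numbers: iowait is counted as busy here, and on Linux the guest field is already included in user, so a host running VMs may read slightly higher than its real load.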

On my side with 9.003.05:

1 session from H2+ to H2+, measured on the iperf3 -c ... side (no desktop running, via ssh, multiple manual calls from another ssh session while the iperf3 session is running):

domih@h2d:~$ awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
11.4428
domih@h2d:~$ awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
11.6625
domih@h2d:~$ awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
9.22693
domih@h2d:~$ awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
11.8519
domih@h2d:~$ awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
9.65347
domih@h2d:~$ awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
9.65347

Idle (no desktop running, via ssh)
domih@h2d:~$ awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
0.501253
domih@h2d:~$ awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
0.250627
domih@h2d:~$ awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
0.74813

domih@h2d:~$ uname -a
Linux h2d 5.4.0-70-generic #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
domih@h2d:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal

domih@h2d:~$ ethtool -i enp2s0
driver: r8125
version: 9.003.05-NAPI
firmware-version:
expansion-rom-version:
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

Post by caramb »

Domih,

As of now, I cannot easily get my node completely idle ; I stopped a bunch of daemons/services, but since the node is a member of a cluster, if I stop cluster-sensitive things the node fences itself and reboots... I'll have to find a way to boot my kernel really idle to perform the definitive tests (there is no recovery mode/single user mode in the GRUB choices).

However, here are some results :

When "not so idle" :

root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
0.251889
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
0.75188
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
1.75
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
1.00503
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
4.7619
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
0.501253
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
1.5
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
9.25
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
0.757576
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
2.51256
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
1.75439
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
0.75188
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
1.0101
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
1
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
3.01508
Let's say, <3% most of the time (with a couple of spikes).



When iperf running (2.5Gb/s of course) :

root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
18.4915
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
15.7107
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
15.3846
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
14.5363
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
16.8766
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
20.8955
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
15.1741
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
16.2963
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
15.5941
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
16.0099
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
16.0494
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
13.198
root@pve2:~# awk '{u=$2+$3+$4+$6+$7+$8+$9+$10; t=$2+$3+$4+$5+$6+$7+$8+$9+$10; if (NR==1){u1=u; t1=t;} else print ($2+$3+$4+$6+$7+$8+$9+$10-u1) * 100 / (t-t1); }' <(grep 'cpu ' /proc/stat) <(sleep 1;grep 'cpu ' /proc/stat)
21.8673
~15 to 20%

This is slightly higher than your results.

root@pve2:~# uname -a
Linux pve2 5.4.106-1-pve #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) x86_64 GNU/Linux
root@pve2:~# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 10 (buster)
Release:        10
Codename:       buster
root@pve2:~# ethtool -i enp2s0
driver: r8169
version:
firmware-version: rtl8125b-2_0.0.2 07/13/20
expansion-rom-version:
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
One interesting thing from your ethtool output is the fact that the driver version explicitly states it is a NAPI driver ; thus providing IRQ coalescing, which should prevent the system from being overwhelmed by an "irq storm".
As NAPI is far from being new, I guess the driver from the vanilla kernel also implements it.

Post by domih »

Don't be concerned about idle.

<<...This is slightly higher than your results...>>

Mmm... it's 50+% higher.
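
The "50+%" figure can be checked from the samples posted above. A quick sketch (the helper name mean and the variable names are made up for illustration; the numbers are copied from the two posts):

```shell
# Hypothetical helper: print the arithmetic mean of its arguments.
mean() {
    printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.2f\n", s / NR }'
}

# Whole-CPU busy % while iperf3 is running, as posted in this thread.
r8125_dkms=$(mean 11.4428 11.6625 9.22693 11.8519 9.65347 9.65347)
r8169_pve=$(mean 18.4915 15.7107 15.3846 14.5363 16.8766 20.8955 15.1741 \
                 16.2963 15.5941 16.0099 16.0494 13.198 21.8673)

# Relative difference between the two means, in percent.
awk -v a="$r8125_dkms" -v b="$r8169_pve" \
    'BEGIN { printf "r8125 DKMS: %s%%  r8169: %s%%  difference: +%.0f%%\n", a, b, (b - a) * 100 / a }'
```

That works out to roughly 10.6% versus 16.6%, i.e. about 57% higher. Keep in mind that the two machines run different distributions and kernels, and the Proxmox node was not idle to begin with, so this is an indication and not a controlled comparison.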

[Attachment: comparison.png]

Not an absolute concern when you have only one session, but it becomes unbearable if you use an H2 Net Card and you have multiple sessions.

I remember when testing kernels 5.9, 5.10,... and Ubuntu 20.04, 20.10,... and RTL8125 v9.004.01, v9.005.01 with 5 sessions @ 2.35Gbits/sec, the desktop was unusable because CPU resources were exhausted. Going back to v9.003.05 fixed it. Disabling ASPM makes v9.003.05 work OK on a 1GbE subnet. So until Realtek cleans up its act, that's what I use.

So it depends on use cases. Most people won't see a difference if they use only one connection at a time.

You might want to try:
- Disabling ASPM with your current installation. One never knows. And see if it has an effect on the CPU consumption.
- Try with v9.003.05 (the DKMS Hardkernel has made). But you have to check ethtool -i ... to make sure the kernel switched to the DKMS version.
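
Both checks can be done from the shell. The sketch below is only an illustration: the helper name driver_in_use is made up, enp2s0 is the interface name used in this thread, and reading the ASPM policy from sysfs assumes the kernel was built with PCIe ASPM support.

```shell
# Hypothetical helper: print "driver version" from `ethtool -i` output on stdin.
# The in-kernel r8169 reports an empty version, so only the driver name is printed then.
driver_in_use() {
    awk -F': *' '
        $1 == "driver"  { d = $2 }
        $1 == "version" { v = $2 }
        END { print d (v == "" ? "" : " " v) }'
}

# Usage:
#   ethtool -i enp2s0 | driver_in_use
#   cat /sys/module/pcie_aspm/parameters/policy   # the active policy is shown in [brackets]
```

If the first command still prints r8169 after installing the DKMS package, the kernel has not switched to the r8125 module. How ASPM gets disabled depends on the setup (BIOS setting, pcie_aspm=off on the kernel command line, or a driver option), so check whichever method applies.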

<<...One interesting thing from your ethtool output is the fact that the driver version explicitly states it is a NAPI driver ; thus providing IRQ coalescing, which should prevent the system from being overwhelmed by an "irq storm".
As NAPI is far from being new, I guess the driver from the vanilla kernel also implements it...>>


This is way beyond my knowledge and skills :mrgreen:

Post by caramb »

<<...This is slightly higher than your results...>>
<<...Mmm... it's 50+% higher...>>
I would say that you're slightly right :mrgreen: ; even if this was not that obvious, as my "idle" usage was high & unstable compared to yours.

<<...You might want to try:
- Disabling ASPM with your current installation. One never knows. And see if it has an effect on the CPU consumption.
- Try with v9.003.05 (the DKMS Hardkernel has made). But you have to check ethtool -i ... to make sure the kernel switched to the DKMS version...>>
I'm already running with ASPM disabled.
I will give the old DKMS driver a try and let you know.

I do trust your statements and results.
I will update some posts in this thread to clarify for those who are really concerned by driver efficiency (I agree it does matter)

Thank you very much for sharing.
I also now have a better understanding of joshua.yang's statements in the thread you pointed to (viewtopic.php?p=327565#p327565)

The built-in driver of the mainline 5.11 kernel does work but seems to be the least efficient right now...
The first DKMS driver release is the most efficient.
So all this leads to an unanticipated conclusion : "it was better before" :roll:
