LAN performance limited to 30MB/Second

xenek
Posts: 27
Joined: Sun Apr 17, 2016 3:51 pm
languages_spoken: english
ODROIDs: XU4
Has thanked: 0
Been thanked: 0

Unread post by xenek » Sun Apr 17, 2016 4:03 pm

I am trying to set up an XU4 as a file server, using Ubuntu.

Distributor ID: Ubuntu
Description: Ubuntu 14.04.4 LTS
Release: 14.04
Codename: trusty

I get only 27 megabytes a second transfer speed over Samba. A dd check of the USB3 drive shows it can read and write at about 70 megabytes a second.

Optimizing Samba for performance gained only 3 MB/Second, so I now get 30 MB/Second.

I have tried a different USB 3 Gigabit Ethernet adaptor, and I have set static IPs and used a straight-through cable, meaning no Ethernet switch is in the path.

Governor is set to performance.

From dmesg

cdc_ether 6-1:2.0 eth0: register 'cdc_ether' at usb-xhci-hcd.5.auto-1, CDC Ethernet Device, 00:1e:06:30:28:8d

I have many, many questions. But just one will do. Why is my network performance so slow?

odroid
Site Admin
Posts: 30265
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English
ODROIDs: ODROID
Has thanked: 3 times
Been thanked: 24 times

Unread post by odroid » Sun Apr 17, 2016 4:42 pm

What is your kernel version?
Did you update the kernel via odroid-utility?

Unread post by xenek » Sun Apr 17, 2016 5:14 pm

This has been done. Now running 3.10.96

I thought software and updates would be enough to update, but it seems a kernel update via odroid utility was also needed.

After a reboot it now works on the external usb3-ethernet at about 50MB/Second. This is about as fast as I could get using OMV as well using the onboard ethernet...

Is this usual? I thought the network could do 100 MB/Second, and I know the filesystem does 70 MB/Second ?? (NTFS)
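
As a rough sanity check on the 100 MB/Second expectation, the sketch below estimates the best-case TCP payload rate of a gigabit link for a given MTU. It assumes standard Ethernet framing overhead (38 bytes per frame on the wire) and plain 20-byte IPv4 and TCP headers with no options; the function name is made up for illustration.

```python
# Rough upper bound on TCP payload throughput over Ethernet.
# Assumptions (not from the thread): IPv4 and TCP headers without options,
# no VLAN tag, and a link that is otherwise idle.

ETH_OVERHEAD = 8 + 14 + 4 + 12   # preamble/SFD + header + FCS + inter-frame gap
IP_TCP_HEADERS = 20 + 20         # IPv4 header + TCP header


def max_tcp_goodput_mbytes(link_mbit: float, mtu: int) -> float:
    """Best-case TCP payload rate in MB/s (1 MB = 10**6 bytes)."""
    payload = mtu - IP_TCP_HEADERS      # bytes of file data per frame
    on_wire = mtu + ETH_OVERHEAD        # bytes the frame occupies on the wire
    return link_mbit * payload / on_wire / 8


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, round(max_tcp_goodput_mbytes(1000, mtu), 1))
```

Under these assumptions a gigabit link tops out at roughly 118 MB/s with a 1500-byte MTU, so 100 MB/s over the network is plausible only if neither the disk nor the CPU is the bottleneck.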

Thank you for your prompt late-Sunday post, legendary service!

Unread post by odroid » Sun Apr 17, 2016 5:17 pm

EXT4 will probably give you 120MB/sec of access speed.
If you switch the file system to EXT4, you will have 70~80MB/sec of Samba throughput.

Try the iperf test first to find where the bottleneck is.

Unread post by xenek » Sun Apr 17, 2016 6:53 pm

I can now get 87 MB/s sustained read using Samba to Windows 10, with an EXT4 disk on the Odroid-XU4 and the onboard gigabit LAN, which I think is pushing the read speed of that particular 2.5" SATA disk. Certainly the kernel update fixed things. Also, sad to say, abandoning NTFS is necessary if you want performance.

Interestingly, performance with Ubuntu as a fileserver is better than OMV, even with EXT4.

I'll have to try some better disks I think. The faster a file server is, the more functional.

Thanks for your help!

Unread post by xenek » Sun Apr 17, 2016 9:04 pm

I should mention that further testing showed the governor makes a big difference.

ondemand, performance: <70 MB/s

whereas the default

interactive: >80 MB/s

DarkBahamut
Posts: 331
Joined: Tue Jan 19, 2016 10:19 am
languages_spoken: english
ODROIDs: XU4, N1
Has thanked: 0
Been thanked: 0

Unread post by DarkBahamut » Mon Apr 18, 2016 4:59 am

I see you manage 87MB/sec, what did you change to get that? On Ubuntu 15.10 my XU4 tops out at 81MB/sec over the network. This is even when using iperf with up to 10 threads so it's fully limited by the network interface (nothing to do with the drives or Samba).

That said, more recent kernel revisions have made some large changes to the USB and NIC setup at the kernel level, and personally I think these have actually reduced the performance of the XU4 over the network. I'd be interested to hear what hardkernel can actually achieve internally over the network using the most recent kernel release. The product page (which used a much older kernel) claims ~870Mbps, but using exactly the same iperf settings I can't get anywhere near it on 15.10 with the latest kernel built from source. Are there known performance issues with the later kernel builds?

Here's the performance using iperf on the latest XU4 builds (660Mbps) - http://i.imgur.com/8TuVi4z.png

For clarity, the two Windows machines on this network can do file transfers at 115MB/s, so the network itself is more than capable of full gigabit performance. It's a shame the XU4 is a bit lacking in this area.

Unread post by odroid » Mon Apr 18, 2016 10:50 am

Try the iperf test again with the performance governor.

Code: Select all

echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
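
The command above writes only to cpu0. On kernels where each CPU (or each cluster) exposes its own cpufreq directory, the other cores may keep a different governor. The sketch below writes the governor to every `scaling_governor` file it finds. It assumes the standard sysfs cpufreq layout and needs root on a real system; the function name and the `sys_root` parameter are hypothetical additions so the function can be tried against a scratch directory first.

```python
import glob
import os


def set_governor(governor: str, sys_root: str = "/sys") -> list:
    """Write `governor` to every cpufreq scaling_governor file under sys_root.

    Returns the list of files written. Requires root on a real system.
    """
    pattern = os.path.join(
        sys_root, "devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"
    )
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as f:
            f.write(governor)
        written.append(path)
    return written
```

Calling `set_governor("performance")` as root would then cover every core that has a cpufreq entry, not just cpu0.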

Marszczak
Posts: 44
Joined: Thu Oct 01, 2015 10:15 pm
Has thanked: 0
Been thanked: 0

Unread post by Marszczak » Tue Apr 19, 2016 5:16 am

Nobody mentioned it explicitly - so what is the file size you are transferring? Samba has performance problems especially with small files.

rooted
Posts: 6449
Joined: Fri Dec 19, 2014 9:12 am
languages_spoken: english
Location: Gulf of Mexico, US
Has thanked: 4 times
Been thanked: 3 times

Unread post by rooted » Tue Apr 19, 2016 7:56 am

SMB also has overhead; if you transfer using something like netcat on both ends you will see what I mean.

Unread post by DarkBahamut » Tue Apr 19, 2016 8:34 am

iperf using the performance governor: no change. NIC performance is still only 660Mbps.

http://i.imgur.com/I4EGTC7.png

Unread post by odroid » Tue Apr 19, 2016 11:59 am

Our result is better. Which router/switch do you use?

Code: Select all

odroid@odroid:~$ iperf -c 192.168.100.2 -P 10
------------------------------------------------------------
Client connecting to 192.168.100.2, TCP port 5001
TCP window size: 20.0 KByte (default)
------------------------------------------------------------
[ 12] local 192.168.100.14 port 56705 connected with 192.168.100.2 port 5001
[ 4] local 192.168.100.14 port 56696 connected with 192.168.100.2 port 5001
[ 3] local 192.168.100.14 port 56697 connected with 192.168.100.2 port 5001
[ 5] local 192.168.100.14 port 56698 connected with 192.168.100.2 port 5001
[ 6] local 192.168.100.14 port 56699 connected with 192.168.100.2 port 5001
[ 7] local 192.168.100.14 port 56700 connected with 192.168.100.2 port 5001
[ 8] local 192.168.100.14 port 56701 connected with 192.168.100.2 port 5001
[ 9] local 192.168.100.14 port 56702 connected with 192.168.100.2 port 5001
[ 10] local 192.168.100.14 port 56703 connected with 192.168.100.2 port 5001
[ 11] local 192.168.100.14 port 56704 connected with 192.168.100.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 102 MBytes 85.8 Mbits/sec
[ 3] 0.0-10.0 sec 103 MBytes 86.6 Mbits/sec
[ 8] 0.0-10.0 sec 103 MBytes 86.4 Mbits/sec
[ 11] 0.0-10.0 sec 103 MBytes 86.2 Mbits/sec
[ 5] 0.0-10.0 sec 102 MBytes 85.8 Mbits/sec
[ 6] 0.0-10.0 sec 103 MBytes 86.5 Mbits/sec
[ 7] 0.0-10.0 sec 103 MBytes 86.6 Mbits/sec
[ 9] 0.0-10.0 sec 102 MBytes 85.6 Mbits/sec
[ 10] 0.0-10.0 sec 103 MBytes 86.3 Mbits/sec
[ 12] 0.0-10.0 sec 103 MBytes 86.4 Mbits/sec
[SUM] 0.0-10.0 sec 1.01 GBytes 861 Mbits/sec


odroid@odroid:~$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.100.14 port 5001 connected with 192.168.100.2 port 37972
[ 5] local 192.168.100.14 port 5001 connected with 192.168.100.2 port 37974
[ 6] local 192.168.100.14 port 5001 connected with 192.168.100.2 port 37976
[ 7] local 192.168.100.14 port 5001 connected with 192.168.100.2 port 37978
[ 8] local 192.168.100.14 port 5001 connected with 192.168.100.2 port 37980
[ 9] local 192.168.100.14 port 5001 connected with 192.168.100.2 port 37982
[ 10] local 192.168.100.14 port 5001 connected with 192.168.100.2 port 37984
[ 11] local 192.168.100.14 port 5001 connected with 192.168.100.2 port 37986
[ 12] local 192.168.100.14 port 5001 connected with 192.168.100.2 port 37988
[ 13] local 192.168.100.14 port 5001 connected with 192.168.100.2 port 37990
[ ID] Interval Transfer Bandwidth
[ 7] 0.0-10.2 sec 84.9 MBytes 69.7 Mbits/sec
[ 5] 0.0-10.2 sec 75.8 MBytes 62.1 Mbits/sec
[ 8] 0.0-10.2 sec 109 MBytes 89.1 Mbits/sec
[ 4] 0.0-10.2 sec 116 MBytes 94.9 Mbits/sec
[ 10] 0.0-10.2 sec 108 MBytes 88.1 Mbits/sec
[ 13] 0.0-10.2 sec 72.4 MBytes 59.3 Mbits/sec
[ 6] 0.0-10.2 sec 77.0 MBytes 63.0 Mbits/sec
[ 9] 0.0-10.2 sec 90.0 MBytes 73.7 Mbits/sec
[ 11] 0.0-10.2 sec 78.4 MBytes 64.2 Mbits/sec
[ 12] 0.0-10.2 sec 71.4 MBytes 58.4 Mbits/sec
[SUM] 0.0-10.2 sec 882 MBytes 722 Mbits/sec
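
To compare runs like the two above without reading every line, a small parser can add up the per-stream rates and convert a total to MB/s. This is a sketch that assumes the iperf 2 text format shown in this post; the function names are made up for illustration.

```python
import re

# Matches per-stream result lines such as:
# [  4]  0.0-10.0 sec   102 MBytes  85.8 Mbits/sec
STREAM_LINE = re.compile(
    r"^\[\s*(\d+)\]\s+[\d.]+-\s*[\d.]+ sec\s+[\d.]+ \w+\s+([\d.]+) Mbits/sec"
)


def total_mbits(iperf_output: str) -> float:
    """Sum the Mbits/sec of every numbered stream (the [SUM] line is skipped)."""
    total = 0.0
    for line in iperf_output.splitlines():
        match = STREAM_LINE.match(line.strip())
        if match:
            total += float(match.group(2))
    return total


def mbits_to_mbytes(mbits: float) -> float:
    """Convert Mbit/s to MB/s (8 bits per byte, decimal megabytes)."""
    return mbits / 8
```

For the client run above, 861 Mbits/sec works out to about 107.6 MB/s, and the 722 Mbits/sec server run to about 90 MB/s.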

Unread post by DarkBahamut » Mon May 02, 2016 8:22 pm

@odroid

The router is an Asus RT-AC66U. Two Windows PCs connected to the router can manage >960Mbps in iperf (116MB/sec file transfers), so the router itself is capable of handling the network speed. I do think I've found the cause of the slow speed though. While running iperf, CPU 0 goes to 100% load and stays there for the duration of the test. The network performance appears to be limited by CPU 0 becoming overloaded by interrupts. Is there any way for the packet processing to be done by one of the A15s instead?

Code: Select all

top - 12:17:08 up 9 min,  2 users,  load average: 1.06, 0.53, 0.25
Tasks: 236 total,  12 running, 224 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,100.0 si,  0.0 st
%Cpu1  :  1.1 us, 20.8 sy,  0.0 ni, 77.8 id,  0.0 wa,  0.0 hi,  0.4 si,  0.0 st
%Cpu2  :  0.7 us, 17.9 sy,  0.0 ni, 81.1 id,  0.0 wa,  0.0 hi,  0.4 si,  0.0 st
%Cpu3  :  2.5 us, 16.2 sy,  0.0 ni, 81.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.3 us,  0.7 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.7 us,  2.6 sy,  0.0 ni, 96.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   2038584 total,   798056 used,  1240528 free,    38272 buffers
KiB Swap:        0 total,        0 used,        0 free.   273416 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
 2433 darkbah+  20   0  112856   2124    956 S  64.4  0.1   0:24.41 iperf       
    3 root      20   0       0      0      0 R  39.5  0.0   0:14.54 ksoftirqd/0 
   66 root      20   0       0      0      0 R  35.9  0.0   0:15.79 kworker/0:1 
  125 root      20   0       0      0      0 S  14.2  0.0   0:06.14 thread_hot+ 
    7 root      rt   0       0      0      0 R   8.4  0.0   0:03.79 migration/0 

Unread post by odroid » Mon May 02, 2016 8:40 pm

What is the result of "cat /proc/interrupts"?
We need to find the IRQ number of the USB host controller first.

Unread post by DarkBahamut » Mon May 02, 2016 8:50 pm

This is the result.

Code: Select all

darkbahamut@odroid:~$ cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
 65:          0          0          0          0          0          0          0          0       GIC  10800000.mdma
 66:          0          0          0          0          0          0          0          0       GIC  121a0000.pdma
 67:          0          0          0          0          0          0          0          0       GIC  121b0000.pdma
 74:          0          0          0          0          0          0          0          0       GIC  101d0000.watchdog
 77:          0          0          0          0          0          0          0          0       GIC  13400000.pinctrl
 78:          0          0          0          0          0          0          0          0       GIC  14000000.pinctrl
 82:          0          0          0          0          0          0          0          0       GIC  14010000.pinctrl
 85:        973          0          0          0          0          0          0          0       GIC  12c20000.serial
 88:          4          0          0          0          0          0          0          0       GIC  12c60000.i2c
 89:          0          0          0          0          0          0          0          0       GIC  12c70000.i2c
 92:          0          0          0          0          0          0          0          0       GIC  12ca0000.hsi2c
 93:          0          0          0          0          0          0          0          0       GIC  12cb0000.hsi2c
 97:          0          0          0          0          0          0          0          0       GIC  exynos_tmu
101:          0          0          0          0          0          0          0          0       GIC  spi-s3c64xx
103:          1          0          0          0          0          0          0          0       GIC  ehci_hcd:usb1, ohci_hcd:usb2
104:       2135          0          0          0          0          0          0          0       GIC  xhci-hcd:usb3
105:    2798012          0          0          0          0          0          0          0       GIC  xhci-hcd:usb5
106:          0          0          0          0          0          0          0          0       GIC  11800000.mali
107:      34377          0          0          0          0          0          0          0       GIC  dw-mci
109:          0          0          0          0          0          0          0          0       GIC  dw-mci
110:          0          0          0          0          0          0          0          0       GIC  13410000.pinctrl
117:          0          0          0          0          0          0          0          0       GIC  drm_gsc
118:          0          0          0          0          0          0          0          0       GIC  drm_gsc
126:          0          0          0          0          0          0          0          0       GIC  drm_mixer
128:          6          0          0          0          0          0          0          0       GIC  11000000.mfc
138:          0          0          0          0          0          0          0          0       GIC  12d10000.adc
142:         54          0          0          0          0          0          0          0       GIC  3880000.adma
146:          0          0          0          0          0          0          0          0       GIC  hdmi-cec
149:          1          0          0          0          0          0          0          0       GIC  11800000.mali
152:     109772          0          0          0          0          0          0          0       GIC  mct_tick0
153:          0      78381          0          0          0          0          0          0       GIC  mct_tick1
154:          0          0      75207          0          0          0          0          0       GIC  mct_tick2
155:          0          0          0      72243          0          0          0          0       GIC  mct_tick3
160:          0          0          0          0      60991          0          0          0       GIC  mct_tick4
161:          0          0          0          0          0      34726          0          0       GIC  mct_tick5
162:          0          0          0          0          0          0      30014          0       GIC  mct_tick6
163:          0          0          0          0          0          0          0      44512       GIC  mct_tick7
201:          0          0          0          0          0          0          0          0       GIC  11f20000.sysmmu
215:          0          0          0          0          0          0          0          0       GIC  exynos_tmu
216:          1          0          0          0          0          0          0          0       GIC  exynos_tmu
217:          0          0          0          0          0          0          0          0       GIC  exynos_tmu
218:          0          0          0          0          0          0          0          0       GIC  12890000.sysmmu
220:          0          0          0          0          0          0          0          0       GIC  128a0000.sysmmu
247:          0          0          0          0          0          0          0          0       GIC  exynos_tmu
251:          0          0          0          0          0          0          0          0       GIC  11800000.mali
272:          0          0          0          0          0          0          0          0  COMBINER  13e80000.sysmmu
274:          0          0          0          0          0          0          0          0  COMBINER  13e90000.sysmmu
280:          0          0          0          0          0          0          0          0  COMBINER  14680000.sysmmu
282:          0          0          0          0          0          0          0          0  COMBINER  14640000.sysmmu
290:          0          0          0          0          0          0          0          0  COMBINER  11f10000.sysmmu
306:          0          0          0          0          0          0          0          0  COMBINER  11200000.sysmmu
316:          0          0          0          0          0          0          0          0  COMBINER  14650000.sysmmu
325:          0          0          0          0          0          0          0          0  COMBINER  11210000.sysmmu
404:          0          0          0          0          0          0          0          0  COMBINER  drm_fimd
414:          0          0          0          0          0          0          0          0  COMBINER  128e0000.sysmmu
434:          0          0          0          0          0          0          0          0  COMBINER  10a70000.sysmmu
436:          0          0          0          0          0          0          0          0  COMBINER  12880000.sysmmu
438:          0          0          0          0          0          0          0          0  COMBINER  128d0000.sysmmu
443:         14          0          0          0          0          0          0          0  COMBINER  mct_comp_irq
453:          0          0          0          0          0          0          0          0  COMBINER  10a60000.sysmmu
474:          0          0          0          0          0          0          0          0  COMBINER  128c0000.sysmmu
512:          0          0          0          0          0          0          0          0  exynos_wkup_irq_chip  s2mps11
523:          0          0          0          0          0          0          0          0   s2mps11  rtc-alarm0
529:          0          0          0          0          0          0          0          0  exynos_wkup_irq_chip  hdmi
530:          1          0          0          0          0          0          0          0  exynos_wkup_irq_chip  dwc3_id
531:          0          0          0          0          0          0          0          0  exynos_wkup_irq_chip  dwc3_b_sess
533:          0          0          0          0          0          0          0          0  exynos_wkup_irq_chip  gpio-keys: KEY_POWER
IPI0:          0          0          0          0          0          0          0          0  CPU wakeup interrupts
IPI1:          0          0          0          0          0          0          0          0  Timer broadcast interrupts
IPI2:      72609     505801     498676     491180     325790     297727     153256     270506  Rescheduling interrupts
IPI3:         16         16         14         17         17         18         13         16  Function call interrupts
IPI4:          9          9         10          2         14         54         30         11  Single function call interrupts
IPI5:          0          0          0          0          0          0          0          0  CPU stop interrupts
IPI6:          0          0          0          0          0          0          0          0  CPU backtrace
Err:          0
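
Picking the busiest line out of a listing like this by eye is tedious. The sketch below parses the text of /proc/interrupts and returns the numbered IRQ with the highest total count. It assumes the column layout shown above (one count column per CPU, named in the header row); the function name is hypothetical.

```python
def busiest_irq(interrupts_text: str):
    """Return (irq, total_count, description) for the busiest numbered IRQ."""
    lines = interrupts_text.strip().splitlines()
    ncpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    best = None
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():            # skip IPI*, Err and similar rows
            continue
        fields = rest.split()
        counts = [int(x) for x in fields[:ncpus]]
        total = sum(counts)
        if best is None or total > best[1]:
            best = (int(label), total, " ".join(fields[ncpus:]))
    return best
```

Fed the listing above, this picks IRQ 105 (xhci-hcd:usb5), whose count on CPU0 dwarfs every other line.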

Unread post by odroid » Mon May 02, 2016 9:11 pm

IRQ #105 seems to be the Ethernet controller interrupt.

Try this command as superuser. I hope it will move the IRQ handler from CPU#0 to CPU#2.
echo 2 > /proc/irq/105/smp_affinity_list
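
The kernel exposes two equivalent files per IRQ: `smp_affinity_list` takes CPU numbers (as above) and `smp_affinity` takes a hexadecimal bitmask. The sketch below converts a CPU list to that mask and builds the path used above. The idea that CPUs 4-7 are the A15 cores on the XU4 is an assumption, not something stated in this thread, so check your own board before relying on it. Both function names are made up for illustration.

```python
def cpus_to_mask(cpus) -> str:
    """Return the hex bitmask (as written to smp_affinity) for a list of CPUs."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu                  # one bit per CPU number
    return format(mask, "x")


def affinity_path(irq: int, proc_root: str = "/proc") -> str:
    """Path of the smp_affinity_list file for an IRQ (standard procfs layout)."""
    return f"{proc_root}/irq/{irq}/smp_affinity_list"
```

`cpus_to_mask([2])` gives `4`, matching the command above; `cpus_to_mask([4, 5, 6, 7])` gives `f0`, which would cover CPUs 4 to 7.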

Unread post by DarkBahamut » Tue May 03, 2016 1:17 am

That worked for moving the IRQ, I can assign it to any CPU and it seems to move over correctly.

It brings up an interesting performance bottleneck, however, whose cause I'm uncertain of. The network interface seems bottlenecked when using jumbo frames. Does the R8152 have some issues here? In CPU-limited situations jumbo frames are faster (as they should be), but when more CPU time is available smaller frames are actually faster, which seems odd. See these benchmarks.

iperf (Client command: -c -w128k // Representative of a single thread file transfer like Samba):

Small frames (MTU = ~1500):
IRQ on Cortex A7 = 468Mbps
IRQ on Cortex A15 = 802Mbps

Jumbo frames (MTU = 9000)
IRQ on Cortex A7 = 645Mbps
IRQ on Cortex A15 = 655Mbps

Samba file transfer performance:

Small frames (MTU = ~1500)
IRQ on Cortex A7 = 45MB/s
IRQ on Cortex A15 = 77MB/s (Samba process at 100% CPU load)

Jumbo frames (MTU = 9000)
IRQ on Cortex A7 = 73MB/s
IRQ on Cortex A15 = 73MB/s

The jumbo frames seem to work as expected initially: when the IRQ is on the A7 with limited CPU power, the lower overhead of jumbo frames increases performance a lot. But once you move it to the A15, CPU usage with jumbo frames climbs rapidly (90% load at 2GHz) while the performance doesn't go up at all. The smaller frames seem to scale better and in iperf give a much better result than jumbo frames once you use an A15 for the IRQ, which doesn't seem right. I wonder if there is a buffering issue in the network interface causing a performance cap on large frames?

Also of note is that the IRQ seems to ignore normal HMP migrations. If you set its affinity to 0-7 it still just runs at 100% on CPU 0. It seems to pick the first CPU possible in its affinity mask and executes there no matter what.

Unread post by odroid » Tue May 03, 2016 9:34 am

Thank you for the detailed test reports.
But I have very limited knowledge of the Ethernet stuff. :(
So my guidance might not be helpful.

Anyway, according to the RTL8153 specification, the maximum (hardware-accelerated) jumbo frame size is 9K bytes.
http://www.realtek.com.tw/products/prod ... ProdID=326
Is there any way to benchmark the transfer rate with a limited (8~9KB) jumbo frame size?

I think the IRQ auto-balancing seems not to work correctly on the HMP system.

Do you use an HDD for the Samba test, or an SSD?

Unread post by DarkBahamut » Tue May 03, 2016 8:09 pm

It's a hard drive, though dd says the drive benchmarks at over 100MB/s so it shouldn't be too limiting for network use.

I spent an hour or two doing some MTU testing to see what the best value to use on the XU4 is. After maybe 200-300 iperf tests and some Samba file transfers I came back with some unexpected results, but ultimately far faster network performance. I never thought I'd see Samba hit ~100MB/s with a network transfer to an XU4 :D

Image

Code: Select all

D:\Users\DarkBahamut\Downloads\iperf-2.0.5-3-win32\iperf-2.0.5-3-win32>iperf -c 192.168.1.215 -w128k
------------------------------------------------------------
Client connecting to 192.168.1.215, TCP port 5001
TCP window size:  128 KByte
------------------------------------------------------------
[  3] local 192.168.1.63 port 54528 connected with 192.168.1.215 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.07 GBytes   922 Mbits/sec

So, the MTU size matters a lot, but the best value is odd. After all my tests I settled on an MTU size of 6975 (I was using 9000). It was the fastest and most stable of over 30 MTU sizes I tested. Why this value is the fastest I have no idea, but values between 6950-7050 are MUCH faster. For example, 7075 is over 20% slower than 7050. The performance just falls off a cliff once you go past a certain value and you can't recover it. Likewise, values too low (under 6050) suffer lower performance in iperf, with speed decreasing as the MTU size gets smaller.

An interesting note is that iperf appears CPU limited. It causes a very high interrupt rate (as I noticed before), which limits its performance. At the MTU of 6975, speeds go from 780Mbps to 920Mbps if I move the IRQ to an A15 core (actually, using RSS can fix this..). In Samba, however, the interrupt rate remains very low, and the IRQ can stay on an A7 with no loss of performance. Speaking of Samba, that screenshot is a bit cheap really. It does peak at 100MB/s, but the long-term average is more like 88MB/s. Unfortunately Samba is very CPU intensive, and even our custom-compiled version (with heavy compiler optimisation flags) still hits 99% CPU load at these transfer speeds. Looking ahead, SMB3 multi-channel support is coming to Samba soon; it's already in beta in the latest 4.4.0 release, and it will allow multiple TCP connections (and therefore multiple CPU cores) to be used, which should overcome the CPU usage issues Samba has on the XU4. It's very possible we'll be seeing >110MB/s on an XU4 within a few months' time :)

Unread post by odroid » Wed May 04, 2016 8:55 am

Thank you for letting us know the magic number 6975.
Really happy to see ~90MB/sec of transfer speed in the real world.
Did you set the MTU size on your Windows PC or on the XU4?

Anyway, once the new SMB3 is released, please post the news on this thread again.

crooked
Posts: 125
Joined: Sun Sep 28, 2014 3:55 am
languages_spoken: english
ODROIDs: u3
Has thanked: 0
Been thanked: 0

Unread post by crooked » Sun Nov 27, 2016 10:34 am

So, for someone who is a complete newbie,

how and where would I input this magic number 6975?

I don't think Samba's "max xmit" is the right setting, but correct me if I'm wrong.

I'm sitting at 12-16 MB/s on a full gigabit network (Cat6, AC3100 router, etc.).

PC-to-PC transfers with SSDs are easily in the gigabit range, 100MB/s+.

The poor ODROID XU4 is struggling...

Unread post by rooted » Sun Nov 27, 2016 12:38 pm

Code: Select all

sudo ifconfig eth0 mtu 6975 up
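
On systems where `ifconfig` is not installed, `ip link set dev eth0 mtu 6975` from iproute2 does the same job. A change made this way does not survive a reboot, and both ends of the link (plus any switch in between) need to accept frames this large, so it can help to check what the interface is actually using. A sketch, assuming the standard sysfs path /sys/class/net/<iface>/mtu; the function names and the `sys_root` parameter are hypothetical additions for illustration and testing.

```python
import os


def read_mtu(iface: str = "eth0", sys_root: str = "/sys") -> int:
    """Read the current MTU of a network interface from sysfs."""
    path = os.path.join(sys_root, "class", "net", iface, "mtu")
    with open(path) as f:
        return int(f.read().strip())


def mtu_command(iface: str, mtu: int) -> list:
    """Build the iproute2 command that sets the MTU (run it with sudo)."""
    return ["ip", "link", "set", "dev", iface, "mtu", str(mtu)]
```

`read_mtu("eth0")` after the command above should report 6975 if the driver accepted the value.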

Unread post by crooked » Tue Dec 06, 2016 9:38 am

Awesome, thank you

I am now sitting in the 70-80MB/s range from SSD to ODROID. I think I have hit the limit of the Western Digital USB 3.0 drive.
My dd benchmarks are sitting in that same range. Pretty fucking stoked.

Now how does one move the IRQ to an A15 core?

I'm going to swap in an SSD to see if I can get past 100MB/s.

mad_ady
Posts: 5660
Joined: Wed Jul 15, 2015 5:00 pm
languages_spoken: english
ODROIDs: XU4, C1+, C2, N1, H2, N2
Location: Bucharest, Romania
Has thanked: 8 times
Been thanked: 13 times

Unread post by mad_ady » Tue Dec 13, 2016 7:44 pm

@DarkBahamut: How do I find out which interrupt is for the network port? I'm running kernel 4.8.11 at the moment:

Code: Select all

root@aldebarano:~# lsusb -t
/:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=exynos-ohci/3p, 12M
/:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=exynos-ehci/3p, 480M
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/2p, 5000M
        |__ Port 1: Dev 3, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/2p, 480M
root@aldebarano:~# cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
 49:          0          0          0          0          0          0          0          0  COMBINER 187 Edge      mct_comp_irq
 50:    2851827          0          0          0          0          0          0          0     GIC-0 152 Edge      mct_tick0
 51:          0    3329531          0          0          0          0          0          0     GIC-0 153 Edge      mct_tick1
 52:          0          0    2975317          0          0          0          0          0     GIC-0 154 Edge      mct_tick2
 53:          0          0          0    2848658          0          0          0          0     GIC-0 155 Edge      mct_tick3
 54:          0          0          0          0    3048661          0          0          0     GIC-0 160 Edge      mct_tick4
 55:          0          0          0          0          0    3057311          0          0     GIC-0 161 Edge      mct_tick5
 56:          0          0          0          0          0          0    2741034          0     GIC-0 162 Edge      mct_tick6
 57:          0          0          0          0          0          0          0    2484400     GIC-0 163 Edge      mct_tick7
 58:          0          0          0          0          0          0          0          0  COMBINER 197 Edge      10a60000.sysmmu:
 59:          0          0          0          0          0          0          0          0  COMBINER 178 Edge      10a70000.sysmmu:
 60:          0          0          0          0          0          0          0          0  COMBINER  60 Edge      14650000.sysmmu:
 61:          0          0          0          0          0          0          0          0  COMBINER  16 Edge      13e80000.sysmmu:
 62:          0          0          0          0          0          0          0          0  COMBINER  18 Edge      13e90000.sysmmu:
 63:          0          0          0          0          0          0          0          0  COMBINER 180 Edge      12880000.sysmmu:
 64:          0          0          0          0          0          0          0          0     GIC-0 218 Edge      12890000.sysmmu:
 65:          0          0          0          0          0          0          0          0     GIC-0 220 Edge      128a0000.sysmmu:
 66:          0          0          0          0          0          0          0          0  COMBINER 218 Edge      128c0000.sysmmu:
 67:          0          0          0          0          0          0          0          0  COMBINER 182 Edge      128d0000.sysmmu:
 68:          0          0          0          0          0          0          0          0  COMBINER 158 Edge      128e0000.sysmmu:
 69:          0          0          0          0          0          0          0          0  COMBINER  32 Edge      11d40000.sysmmu:
 70:          0          0          0          0          0          0          0          0  COMBINER  34 Edge      11f10000.sysmmu:
 71:          0          0          0          0          0          0          0          0     GIC-0 201 Edge      11f20000.sysmmu:
 72:          0          0          0          0          0          0          0          0  COMBINER  50 Edge      11200000.sysmmu:
 73:          0          0          0          0          0          0          0          0  COMBINER  69 Edge      11210000.sysmmu:
 74:          0          0          0          0          0          0          0          0  COMBINER  26 Edge      14640000.sysmmu:
 75:          0          0          0          0          0          0          0          0  COMBINER  24 Edge      14680000.sysmmu:
 78:       1160          0          0          0          0          0          0          0     GIC-0  85 Edge      12c20000.serial:
 80:          0          0          0          0          0          0          0          0     GIC-0  90 Edge      12c80000.i2c:
 81:          0          0          0          0          0          0          0          0       PMU  43 Edge      s3c2410-rtc alarm
 82:          0          0          0          0          0          0          0          0       PMU  44 Edge      s3c2410-rtc tick
 87:          0          0          0          0          0          0          0          0     GIC-0 144 Edge      10830000.sss:
 88:    1615430          0          0          0          0          0          0          0     GIC-0  92 Edge      12ca0000.i2c:
 89:          1          0          0          0          0          0          0          0     GIC-0 103 Edge      ehci_hcd:usb5, ohci_hcd:usb6
 90:          4          0          0          0          0          0          0          0     GIC-0 128 Edge      11000000.codec:
 91:    2355303          0          0          0          0          0          0          0     GIC-0 107 Edge      dw-mci
 92:          0          0          0          0          0          0          0          0     GIC-0 109 Edge      dw-mci
 93:          0          0          0          0          0          0          0          0     GIC-0  77 Edge      13400000.pinctrl:
111:          1          0          0          0          0          0          0          0     GIC-0 110 Edge      13410000.pinctrl:
112:          0          0          0          0          0          0          0          0     GIC-0  78 Edge      14000000.pinctrl:
113:          0          0          0          0          0          0          0          0     GIC-0  82 Edge      14010000.pinctrl:
114:          0          0          0          0          0          0          0          0     GIC-0  79 Edge      3860000.pinctrl:
115:          0          0          0          0          0          0          0          0     GIC-0 142 Edge      3880000.adma
116:          0          0          0          0          0          0          0          0     GIC-0  66 Edge      121a0000.pdma
117:          0          0          0          0          0          0          0          0     GIC-0  67 Edge      121b0000.pdma
118:          0          0          0          0          0          0          0          0     GIC-0  65 Edge      10800000.mdma
121:          0          0          0          0          0          0          0          0     GIC-0 126 Edge      drm_mixer
123:          0          0          0          0          0          0          0          0     GIC-0 117 Edge      13e00000.video-scaler:
124:          0          0          0          0          0          0          0          0     GIC-0 118 Edge      13e10000.video-scaler:
127:     138133          0          0          0          0          0          0          0     GIC-0  97 Edge      10060000.tmu:
128:        141          0          0          0          0          0          0          0     GIC-0 215 Edge      10064000.tmu:
129:          0          0          0          0          0          0          0          0     GIC-0 216 Edge      10068000.tmu:
130:         21          0          0          0          0          0          0          0     GIC-0 217 Edge      1006c000.tmu:
131:          0          0          0          0          0          0          0          0     GIC-0 247 Edge      100a0000.tmu:
132:          0          0          0          0          0          0          0          0     GIC-0 251 Edge      11800000.mali:
133:          0          0          0          0          0          0          0          0     GIC-0 106 Edge      11800000.mali:
134:          1          0          0          0          0          0          0          0     GIC-0 149 Edge      11800000.mali:
135:          1          0          0          0          0          0          0          0  exynos4210_wkup_irq_chip   7 Edge      hdmi
136:         19          0          0          0          0          0          0          0  exynos4210_wkup_irq_chip   4 Edge      s2mps11
137:          0          0          0          0          9          8          1          0   s2mps11  10 Edge      rtc-alarm0
138:          1          0          0          0          0          0          0          0  exynos_gpio_irq_chip   2 Edge      12200000.mmc: cd
139:     316632          0          0          0          0          0          0          0     GIC-0 104 Edge      xhci-hcd:usb1
140:    3264631          0          0          0      28421          0          0          0     GIC-0 105 Edge      xhci-hcd:usb3
IPI0:          0          0          0          0          0          0          0          0  CPU wakeup interrupts
IPI1:          0          0          0          0          0          0          0          0  Timer broadcast interrupts
IPI2:    2361361    1999703    1647158    1494155    1913921    1781469    1700499    1652194  Rescheduling interrupts
IPI3:      11906      12502       9237       8477      13850      12851      11211      11331  Function call interrupts
IPI4:          0          0          0          0          0          0          0          0  CPU stop interrupts
IPI5:     134955     122246      87800      78882     162445     104245      82787      79460  IRQ work interrupts
IPI6:          0          0          0          0          0          0          0          0  completion interrupts
Err:          0

So the network adapter is on bus 4, but I don't see a matching usb4 entry in /proc/interrupts:

Code:

root@aldebarano:~# cat /proc/interrupts | grep -i usb
 89:          1          0          0          0          0          0          0          0     GIC-0 103 Edge      ehci_hcd:usb5, ohci_hcd:usb6
139:     317496          0          0          0          0          0          0          0     GIC-0 104 Edge      xhci-hcd:usb1
140:    3264631          0          0          0      42698          0          0          0     GIC-0 105 Edge      xhci-hcd:usb3



DarkBahamut
Posts: 331
Joined: Tue Jan 19, 2016 10:19 am
languages_spoken: english
ODROIDs: XU4, N1
Has thanked: 0
Been thanked: 0
Contact:

Re: LAN performance limited to 30MB/Second

Unread post by DarkBahamut » Mon Dec 19, 2016 10:06 am

Sorry, I hadn't checked back here!

I don't think the results of lsusb and /proc/interrupts match up. For example, for me the NIC is on bus 6, but 'usb5' (IRQ 105) is definitely doing the work. From the look of your interrupt numbers, I would say IRQ 140 is the one doing the packet work. Had you done anything regarding CPU affinity before posting that list? If not, it's showing signs of switching CPUs for some reason, since most of the work is on CPU 0 but a small amount has also been done on CPU 4.

mad_ady
Posts: 5660
Joined: Wed Jul 15, 2015 5:00 pm
languages_spoken: english
ODROIDs: XU4, C1+, C2, N1, H2, N2
Location: Bucharest, Romania
Has thanked: 8 times
Been thanked: 13 times
Contact:

Re: LAN performance limited to 30MB/Second

Unread post by mad_ady » Mon Dec 19, 2016 3:43 pm

Yeah, I had moved 140 to core 4. Is it correct to identify the network card by taking a /proc/interrupts reading before and after an iperf test, on the assumption that the difference in counters should (more or less) equal the number of packets generated (1 packet = 1 interrupt)?
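A minimal sketch of that before/after comparison, assuming the /proc/interrupts layout shown earlier in this thread (one header row of CPU columns, then one row per IRQ). The function names are made up for illustration. It only ranks IRQ lines by how much their counters grew; the delta need not equal the packet count if the driver handles several packets per interrupt.

```python
def parse_interrupts(text):
    """Return {irq_label: (total_count, description)} from /proc/interrupts text."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")  # split at the first colon only
        fields = rest.split()
        counts = fields[:ncpu]
        if len(counts) < ncpu or not all(c.isdigit() for c in counts):
            continue  # skip rows such as "Err: 0" that lack per-CPU counters
        table[label.strip()] = (sum(map(int, counts)), " ".join(fields[ncpu:]))
    return table


def irq_deltas(before, after):
    """List (delta, irq_label, description) for IRQs whose total grew, busiest first."""
    b, a = parse_interrupts(before), parse_interrupts(after)
    deltas = [(a[k][0] - b.get(k, (0, ""))[0], k, a[k][1]) for k in a]
    return sorted((d for d in deltas if d[0] > 0), reverse=True)


def snapshot(path="/proc/interrupts"):
    """Read the current interrupt counters as text."""
    with open(path) as f:
        return f.read()
```

To use it, call `snapshot()` once before the iperf run and once after, then pass both strings to `irq_deltas()`; the first entry is the IRQ line that fired most during the test.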

rooted
Posts: 6449
Joined: Fri Dec 19, 2014 9:12 am
languages_spoken: english
Location: Gulf of Mexico, US
Has thanked: 4 times
Been thanked: 3 times
Contact:

Re: LAN performance limited to 30MB/Second

Unread post by rooted » Mon Dec 19, 2016 3:59 pm

TCP traffic is two-way; would that matter?

mad_ady
Posts: 5660
Joined: Wed Jul 15, 2015 5:00 pm
languages_spoken: english
ODROIDs: XU4, C1+, C2, N1, H2, N2
Location: Bucharest, Romania
Has thanked: 8 times
Been thanked: 13 times
Contact:

Re: LAN performance limited to 30MB/Second

Unread post by mad_ady » Mon Dec 19, 2016 4:29 pm

I think interrupts are generated only for received packets. Packets get sent when the NIC driver gets time and it probably tries to empty its "send" queue (or applies some TCP queue management algorithm like Nagle). I'd expect interrupts to be generated by received packets (upload to the NAS).

mad_ady
Posts: 5660
Joined: Wed Jul 15, 2015 5:00 pm
languages_spoken: english
ODROIDs: XU4, C1+, C2, N1, H2, N2
Location: Bucharest, Romania
Has thanked: 8 times
Been thanked: 13 times
Contact:

Re: LAN performance limited to 30MB/Second

Unread post by mad_ady » Mon Dec 19, 2016 6:48 pm

Ok, I've managed to run my own tests and I can confirm DarkBahamut's findings.

With no IRQ affinity set, CPU0 handles all network interrupts. This does not hurt download performance (XU4 -> client), where I get about 920Mbps, but upload performance (client -> XU4) is lower, at ~650Mbps. I tested with the iperf process pinned to cores 4-7 (via a special cgroup) and saw ~24% usage for the iperf process and 100% usage on core 0 (caused by the interrupt handler). Once I moved the interrupt handler to cores 4-7 and retested, I got 90-100% CPU usage on one big core (from the interrupt handler) and ~930Mbps upload speed (iperf was running on a different big core, occupying about 25%).

I'm guessing all USB-related IRQ handlers should be spread out between CPUs 4-7 for better performance.
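For anyone wanting to repeat this, here is a rough sketch of how the affinity mask can be computed and written. It assumes the standard /proc/irq/<irq>/smp_affinity interface, that CPUs 4-7 are the big cores as on the XU4 above, and that IRQ 140 is the USB3 host controller as in my listing; check your own /proc/interrupts, since the number can differ between kernels. The helper names are mine.

```python
def cpu_mask(cpus):
    """Hex bitmask (no 0x prefix) with one bit set per CPU number."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def set_irq_affinity(irq, cpus, proc_root="/proc/irq"):
    """Write the CPU mask to <proc_root>/<irq>/smp_affinity (needs root)."""
    path = f"{proc_root}/{irq}/smp_affinity"
    with open(path, "w") as f:
        f.write(cpu_mask(cpus))
    return path
```

For example, `set_irq_affinity(140, range(4, 8))` would write the mask `f0`, which allows the handler to run on CPUs 4-7. The setting does not survive a reboot, so it would have to be reapplied at startup.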

DarkBahamut
Posts: 331
Joined: Tue Jan 19, 2016 10:19 am
languages_spoken: english
ODROIDs: XU4, N1
Has thanked: 0
Been thanked: 0
Contact:

Re: LAN performance limited to 30MB/Second

Unread post by DarkBahamut » Mon Dec 19, 2016 9:57 pm

It's true that iperf performance gets better, as we've seen, but for real-world tasks I've found that moving the IRQ to an A15 actually hurts performance. You get issues with CPU usage and end up with lower performance. For example, SMB transfers are definitely slower after moving the IRQ, as opposed to keeping it on an A7 and using RPS instead. I've been using it for a while with good results, and there's been some recent discussion on it in this thread. The meveric-assisted settings I put in my last post should hopefully give the best results. I didn't bother with iperf, but the SMB transfers are pretty much maxing out the link, and it keeps the A15s free to deal with any heavy workloads. I'd be interested to see whether it improves others' results by as much.
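As a rough illustration of the RPS approach (not the exact settings from my earlier post), the sketch below writes a CPU mask to the rps_cpus file of each receive queue. The interface name eth0 and the choice of CPUs 1-3 are assumptions made for the example only, and the helper names are hypothetical; the sysfs path /sys/class/net/<iface>/queues/rx-*/rps_cpus is the standard kernel interface.

```python
import glob
import os


def cpu_mask(cpus):
    """Hex bitmask (no 0x prefix) with one bit set per CPU number."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def rps_files(iface, sys_root="/sys/class/net"):
    """Find the rps_cpus file of every receive queue on the interface."""
    pattern = os.path.join(sys_root, iface, "queues", "rx-*", "rps_cpus")
    return sorted(glob.glob(pattern))


def set_rps(iface, cpus, sys_root="/sys/class/net"):
    """Write the same CPU mask to each receive queue (needs root)."""
    mask = cpu_mask(cpus)
    paths = rps_files(iface, sys_root)
    for path in paths:
        with open(path, "w") as f:
            f.write(mask)
    return mask, paths
```

Here `set_rps("eth0", [1, 2, 3])` would write the mask `e`, so the hardware IRQ stays where it is while packet processing is steered to CPUs 1-3. Which cores work best is something to measure with your own SMB transfers.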
