HC2 performance regression on kernel 5.*

Test and fix the Kernel 5.4 features
zerodroid
Posts: 6
Joined: Thu Aug 29, 2019 4:53 am
languages_spoken: english
ODROIDs: ODROID-HC2
Has thanked: 0
Been thanked: 1 time
Contact:

Post by zerodroid » Sun May 10, 2020 7:10 am

I use my HC2 as a server for borg backups over SSH.

For several months it ran OMV4 on Armbian stretch (kernel 4.*), though I don't really use the OMV features. After upgrading to Armbian buster (kernel 5.4.y) and OMV5, the backup process, specifically "borg check", takes over twice as long.

I don't think it's a memory issue since the max "used" memory is reported by rrdcached at 400MB of the total 2GB. Min "free" was around 40MB, but that's because max "page cache" went to 1.6GB. I don't think that should be a problem, right? I also don't think it's a network issue since this is on my local network which hasn't changed and the abnormally long time is spent on checking the backup archives, according to backup logs on the client machine being backed up.

I'm thinking the issue might be IO, or, more likely, CPU related. IO seems ok for the 90% empty 10TB HDD, based on "hdparm -t /dev/sda" resulting in ~140MB/s. CPU seems like it could be suspect due to borg being a single threaded application, but I don't notice anything wrong with performance benchmarks done with "armbian-config": http://ix.io/2k57

Any ideas on how to track down the cause of this performance regression?
Last edited by zerodroid on Mon Jun 29, 2020 12:25 pm, edited 1 time in total.

mad_ady
Posts: 8159
Joined: Wed Jul 15, 2015 5:00 pm
languages_spoken: english
ODROIDs: XU4, C1+, C2, C4, N1, N2, H2, Go, Go Advance
Location: Bucharest, Romania
Has thanked: 568 times
Been thanked: 406 times
Contact:

Re: HC2 performance regression

Post by mad_ady » Mon May 11, 2020 1:18 am

Check governor and maybe force the backup process to use the big cores.

Post by zerodroid » Mon May 11, 2020 2:31 am

The governor is ondemand, and according to htop the borg process is always on cores 4-7 (big cores).
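
For anyone who wants to repeat these two checks without htop, here is a minimal sketch. It assumes the usual cpufreq sysfs layout under /sys/devices/system/cpu and that cores 4-7 are the big cores, as stated above. The function names are made up for illustration. Note that seeing a process on cores 4-7 in htop only shows where the scheduler put it, while the affinity check below tells you whether it is actually restricted to those cores.

```python
import glob
import os


def read_cpu_governors(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq."""
    governors = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        cpu = path.split(os.sep)[-3]  # .../cpuN/cpufreq/scaling_governor
        with open(path) as f:
            governors[cpu] = f.read().strip()
    return governors


def pinned_to(pid, allowed_cpus):
    """True if the process is only allowed to run on CPUs in allowed_cpus."""
    return os.sched_getaffinity(pid) <= set(allowed_cpus)


def pin_to(pid, cpus):
    """Restrict a process to the given CPUs (Linux only)."""
    os.sched_setaffinity(pid, set(cpus))


if __name__ == "__main__":
    print(read_cpu_governors())
    # pid 0 means "the calling process"; 4-7 are assumed to be the big cores.
    print("restricted to big cores:", pinned_to(0, range(4, 8)))
```

To pin a running borg process you would call pin_to(<its pid>, range(4, 8)) with sufficient privileges.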

Post by zerodroid » Mon Jun 29, 2020 12:24 pm

Benchmark results from sbc-bench show a significant regression in kernel 5.* for memory bandwidth and latency.

My HC2 with kernel 5.4.28-odroidxu4: http://ix.io/2k57

Code:

 Memory bandwidth tests on a little core:
 standard memcpy                :     84.8 MB/s
 standard memset                :    278.5 MB/s (0.3%)
 
 Memory latency test:
 block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    4.2 ns          /     7.5 ns 
    131072 :    6.5 ns          /    10.8 ns 
    262144 :    7.8 ns          /    12.3 ns 
    524288 :   13.2 ns          /    19.9 ns 
   1048576 :  262.9 ns          /   419.0 ns 
   2097152 :  394.6 ns          /   544.5 ns 
   4194304 :  461.8 ns          /   586.3 ns 
   8388608 :  498.4 ns          /   605.4 ns 
  16777216 :  524.2 ns          /   626.7 ns 
  33554432 :  547.8 ns          /   661.2 ns 
  67108864 :  588.1 ns          /   737.3 ns 

An XU4 with kernel 4.14.55-146 (odroidxu4): http://ix.io/1iLy

Code:

 Memory bandwidth tests on a little core:
 standard memcpy                :    391.7 MB/s
 standard memset                :    800.5 MB/s
 
 Memory latency test:
 block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    3.9 ns          /     7.0 ns 
    131072 :    6.0 ns          /    10.1 ns 
    262144 :    7.1 ns          /    11.5 ns 
    524288 :    9.7 ns          /    15.2 ns 
   1048576 :   76.7 ns          /   119.2 ns 
   2097152 :  116.0 ns          /   156.9 ns 
   4194304 :  136.5 ns          /   170.7 ns 
   8388608 :  147.9 ns          /   178.5 ns 
  16777216 :  156.4 ns          /   186.2 ns 
  33554432 :  164.9 ns          /   197.5 ns 
  67108864 :  176.2 ns          /   217.2 ns 

I don't know whether this explains the drop in borg check performance, since borg runs on a big core and the bandwidth figures above are for a little core, but it does show that something is really wrong on kernel 5.*. If memory bandwidth and latency are several times worse, that could well account for the performance regression I've noticed.
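
To put numbers on "several times worse", here is a small worked calculation using only the figures quoted in the two excerpts above (memcpy, memset, and the single random read latency at the largest block size, 67108864 bytes). The dictionary keys and the helper name are just for illustration. Keep in mind the two results come from different boards (an HC2 and an XU4), so this is an indication rather than a controlled comparison.

```python
# Figures copied from the two sbc-bench excerpts above.
kernel_5_4 = {"memcpy_MBps": 84.8, "memset_MBps": 278.5, "latency_ns": 588.1}
kernel_4_14 = {"memcpy_MBps": 391.7, "memset_MBps": 800.5, "latency_ns": 176.2}


def slowdown(old, new, higher_is_better):
    """How many times worse `new` is than `old`."""
    return old / new if higher_is_better else new / old


ratios = {
    "memcpy": slowdown(kernel_4_14["memcpy_MBps"], kernel_5_4["memcpy_MBps"], True),
    "memset": slowdown(kernel_4_14["memset_MBps"], kernel_5_4["memset_MBps"], True),
    "latency": slowdown(kernel_4_14["latency_ns"], kernel_5_4["latency_ns"], False),
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x worse on 5.4")
```

This works out to roughly 4.6x for memcpy, 2.9x for memset and 3.3x for latency.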

It would be great for comparison if others would benchmark their HC2/HC1/XU4 with sbc-bench and post results here, especially those on kernel 4.*. Several others on kernel 5.* have confirmed my results above, so this is not an anomaly.

odroid
Site Admin
Posts: 34642
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English, Korean, Japanese
ODROIDs: ODROID
Has thanked: 824 times
Been thanked: 712 times
Contact:

Re: HC2 performance regression on kernel 5.*

Post by odroid » Mon Jun 29, 2020 12:28 pm

Nice information.
I think the DRAM controller configuration in Kernel 5.x could have some issues.

lanefu
Posts: 4
Joined: Tue Jun 30, 2020 9:35 pm
languages_spoken: english
ODROIDs: N2, MC1-solo
Has thanked: 2 times
Been thanked: 1 time
Contact:

Re: HC2 performance regression on kernel 5.*

Post by lanefu » Tue Jun 30, 2020 9:42 pm

Issue seems to be related to https://lwn.net/Articles/787647/

I disabled kernel feature CONFIG_EXYNOS5422_DMC and performance was restored.

See https://armbian.atlassian.net/browse/AR-337
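
If you want to check whether your running kernel was built with this option, a rough sketch follows. It assumes the config is available either at /proc/config.gz (only present when the kernel exports it) or at /boot/config-<release>; neither is guaranteed on every image. The function names are hypothetical.

```python
import gzip
import os
import platform


def config_option_state(config_text, option):
    """Return the option's value ('y', 'm', ...) or None if unset or absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
        if line == f"# {option} is not set":
            return None
    return None


def load_running_kernel_config():
    """Try two common locations; return None if neither exists."""
    candidates = ["/proc/config.gz", f"/boot/config-{platform.release()}"]
    for path in candidates:
        if os.path.exists(path):
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rt") as f:
                return f.read()
    return None


if __name__ == "__main__":
    text = load_running_kernel_config()
    if text is None:
        print("kernel config not found in the assumed locations")
    else:
        state = config_option_state(text, "CONFIG_EXYNOS5422_DMC")
        print("CONFIG_EXYNOS5422_DMC =", state)
```

A result of None means the option is disabled or not present in that kernel's config.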

Post by lanefu » Tue Jun 30, 2020 10:17 pm

I'd consider disabling DMC a workaround. According to https://cateee.net/lkddb/web-lkddb/EXYNOS5422_DMC.html, timings are set based on memory information provided in the device tree, which is a little over my head.

Post by lanefu » Tue Jun 30, 2020 11:57 pm

Opted to disable DMC for now

https://github.com/armbian/build/pull/2073

should be available in nightly kernel tomorrow-ish

Post by zerodroid » Wed Jul 01, 2020 8:35 am

I can confirm that using the nightly kernel 5.4.49-odroidxu4, "borg check" is back to kernel 4.x completion times. That's a 2-3X performance difference!

Many thanks to @lanefu for this workaround!

@odroid Any tips or efforts towards a proper fix to the DMC kernel feature?

Post by odroid » Wed Jul 01, 2020 11:58 am

We will look into that in 4-5 weeks, when we start making Ubuntu 20.04 images for XU4/XU3/HC1/HC2.
We are too busy these days building Ubuntu 20.04 for C1/C2/N2.

Post by lanefu » Fri Jul 03, 2020 12:36 pm

An alternate workaround is to change the DMC governor (the memory governor, not the CPU one) to performance or userspace. The default is simple_ondemand.

https://github.com/armbian/build/pull/2 ... -653203633
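
A minimal sketch of switching a devfreq governor from a script, assuming the memory controller shows up under the generic /sys/class/devfreq interface. Which entry there is the DMC is not stated in this thread, so the sketch only lists the devices and leaves the choice to you. Function names are made up, writing to real sysfs needs root, and the setting does not survive a reboot.

```python
import glob
import os


def list_devfreq(sysfs_root="/sys/class/devfreq"):
    """Return {device_dir: current_governor} for each devfreq device found."""
    result = {}
    for dev in sorted(glob.glob(os.path.join(sysfs_root, "*"))):
        gov_path = os.path.join(dev, "governor")
        if os.path.isfile(gov_path):
            with open(gov_path) as f:
                result[dev] = f.read().strip()
    return result


def set_devfreq_governor(dev, governor):
    """Set the governor, refusing names the kernel does not offer for this device."""
    with open(os.path.join(dev, "available_governors")) as f:
        available = f.read().split()
    if governor not in available:
        raise ValueError(f"{governor!r} not in {available}")
    with open(os.path.join(dev, "governor"), "w") as f:
        f.write(governor)


if __name__ == "__main__":
    # Only lists devices; nothing is changed unless you call set_devfreq_governor.
    for dev, gov in list_devfreq().items():
        print(dev, "->", gov)
```

Once you have identified the right device directory, something like set_devfreq_governor(dev, "performance") applies the workaround described above.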

AreaScout
Posts: 1361
Joined: Sun Jul 07, 2013 3:05 am
languages_spoken: german, english
ODROIDs: X2, U3, XU3, C2, HiFi Shield, XU4, XU4Q,
N1, Go, VU5A, Show2, CloudShell2,
H2, N2, VU7A, VuShell, Go2, C4
Has thanked: 62 times
Been thanked: 195 times
Contact:

Re: HC2 performance regression on kernel 5.*

Post by AreaScout » Sat Jul 04, 2020 1:48 am

@lanefu

Yes, both workarounds are OK. The simple_ondemand governor doesn't seem to do anything; at least it isn't scaling on demand, and it stays fixed at 165 MHz.

RG
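
One way to check for the "stuck at one frequency" behaviour yourself is to sample the device's cur_freq file while a memory-heavy job is running. This is only a sketch: it assumes the devfreq device directory exposes cur_freq (typically in Hz, so 165 MHz would read as 165000000), and the path you pass in is up to you since the DMC's device name is not given in this thread.

```python
import os
import sys
import time


def sample_cur_freq(dev, samples=5, interval=1.0):
    """Read <dev>/cur_freq `samples` times, `interval` seconds apart."""
    freqs = []
    for i in range(samples):
        with open(os.path.join(dev, "cur_freq")) as f:
            freqs.append(int(f.read().strip()))
        if i + 1 < samples:
            time.sleep(interval)
    return freqs


def looks_stuck(freqs):
    """True if every sample shows the same frequency."""
    return len(set(freqs)) == 1


if __name__ == "__main__":
    if len(sys.argv) == 2:
        readings = sample_cur_freq(sys.argv[1])
        print(readings, "stuck" if looks_stuck(readings) else "scaling")
    else:
        print("usage: pass a devfreq device directory as the only argument")
```

A constant reading under load with simple_ondemand, compared with a higher reading under the performance governor, would match what is described above.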
