HC2 performance regression on kernel 5.*

Test and fix the Kernel 5.4 features
zerodroid
Posts: 9
Joined: Thu Aug 29, 2019 4:53 am
languages_spoken: english
ODROIDs: ODROID-HC2
Has thanked: 1 time
Been thanked: 1 time

Post by zerodroid »

I use my HC2 as a server for borg backups over SSH.

For several months it ran OMV4 (Armbian stretch, kernel 4.*), though I don't really use the OMV features. After upgrading to Armbian buster with kernel 5.4.y and OMV5, the backup process, specifically "borg check", takes over twice as long.

I don't think it's a memory issue since the max "used" memory is reported by rrdcached at 400MB of the total 2GB. Min "free" was around 40MB, but that's because max "page cache" went to 1.6GB. I don't think that should be a problem, right? I also don't think it's a network issue since this is on my local network which hasn't changed and the abnormally long time is spent on checking the backup archives, according to backup logs on the client machine being backed up.
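
As a side note on the memory question: a low "free" value alone is normally harmless, since the kernel drops page cache when applications need the memory, and MemAvailable in /proc/meminfo is the kernel's own estimate of what can still be handed out. A minimal sketch for reading those fields (mem_summary is a made-up helper name, and the optional path argument only exists so it can be tried on a saved copy of the file):

```shell
# Print MemFree, MemAvailable and Cached in MiB from a meminfo-style file.
# mem_summary is a hypothetical helper name, not part of any tool.
mem_summary() {
    awk '$1 == "MemFree:" || $1 == "Cached:" || $1 == "MemAvailable:" {
             printf "%s %d MiB\n", $1, $2 / 1024
         }' "${1:-/proc/meminfo}"
}
```

Running mem_summary with no argument reads the live /proc/meminfo. If MemAvailable stays large during "borg check", memory pressure is an unlikely cause.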

I'm thinking the issue might be IO, or, more likely, CPU related. IO seems ok for the 90% empty 10TB HDD, based on "hdparm -t /dev/sda" resulting in ~140MB/s. CPU seems like it could be suspect due to borg being a single threaded application, but I don't notice anything wrong with performance benchmarks done with "armbian-config": http://ix.io/2k57

Any ideas on how to track down the cause of this performance regression?
Last edited by zerodroid on Mon Jun 29, 2020 12:25 pm, edited 1 time in total.

mad_ady
Posts: 8338
Joined: Wed Jul 15, 2015 5:00 pm
languages_spoken: english
ODROIDs: XU4, C1+, C2, C4, N1, N2, H2, Go, Go Advance
Location: Bucharest, Romania
Has thanked: 574 times
Been thanked: 439 times

Post by mad_ady »

Check the CPU governor, and maybe force the backup process to use the big cores.
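
A minimal sketch of how both checks could be done from a shell. show_governors is a hypothetical helper name; the cpufreq path is the standard sysfs location, and the 4-7 range for the big cores is the one reported later in this thread.

```shell
# List the cpufreq governor of every policy under a cpufreq directory.
# show_governors is an illustrative name; the default path is the usual
# cpufreq sysfs location.
show_governors() {
    base="${1:-/sys/devices/system/cpu/cpufreq}"
    for f in "$base"/policy*/scaling_governor; do
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}

# Pinning to the big cores (CPUs 4-7 on this SoC) is done with taskset:
#   taskset -cp 4-7 <PID>        # move an already running process
#   taskset -c 4-7 <command>     # start a new command on the big cores
```

Calling show_governors with no argument reads the live sysfs tree and prints one line per policy.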

Post by zerodroid »

The governor is ondemand, and according to htop the borg process is always on cores 4-7 (big cores).

Post by zerodroid »

Benchmark results from sbc-bench show a significant regression in kernel 5.* for memory bandwidth and latency.

My HC2 with kernel 5.4.28-odroidxu4: http://ix.io/2k57

Code: Select all

 Memory bandwidth tests on a little core:
 standard memcpy                :     84.8 MB/s
 standard memset                :    278.5 MB/s (0.3%)
 
 Memory latency test:
 block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    4.2 ns          /     7.5 ns 
    131072 :    6.5 ns          /    10.8 ns 
    262144 :    7.8 ns          /    12.3 ns 
    524288 :   13.2 ns          /    19.9 ns 
   1048576 :  262.9 ns          /   419.0 ns 
   2097152 :  394.6 ns          /   544.5 ns 
   4194304 :  461.8 ns          /   586.3 ns 
   8388608 :  498.4 ns          /   605.4 ns 
  16777216 :  524.2 ns          /   626.7 ns 
  33554432 :  547.8 ns          /   661.2 ns 
  67108864 :  588.1 ns          /   737.3 ns 

An XU4 with kernel 4.14.55-146 (odroidxu4): http://ix.io/1iLy

Code: Select all

 Memory bandwidth tests on a little core:
 standard memcpy                :    391.7 MB/s
 standard memset                :    800.5 MB/s
 
 Memory latency test:
 block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    3.9 ns          /     7.0 ns 
    131072 :    6.0 ns          /    10.1 ns 
    262144 :    7.1 ns          /    11.5 ns 
    524288 :    9.7 ns          /    15.2 ns 
   1048576 :   76.7 ns          /   119.2 ns 
   2097152 :  116.0 ns          /   156.9 ns 
   4194304 :  136.5 ns          /   170.7 ns 
   8388608 :  147.9 ns          /   178.5 ns 
  16777216 :  156.4 ns          /   186.2 ns 
  33554432 :  164.9 ns          /   197.5 ns 
  67108864 :  176.2 ns          /   217.2 ns 
I don't know if this explains the drop in borg check performance, since borg runs on a big core, but it does show that something is really wrong on kernel 5.*. If memory bandwidth and latency are several times worse, that could well account for the performance regression I've noticed.

It would be great for comparison if others would benchmark their HC2/HC1/XU4 with sbc-bench and post results here, especially those on kernel 4.*. Several others on kernel 5.* have confirmed my results above, so this is not an anomaly.
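
To put numbers on "multiple times slower", the ratios can be computed directly from the two listings above. This is only arithmetic on the posted figures (ratio is a throwaway helper name), and the two runs come from different boards, so treat it as a rough indication:

```shell
# ratio A B: print A/B to one decimal place (hypothetical helper).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

ratio 391.7 84.8     # little-core memcpy, 4.14 vs 5.4       -> prints 4.6
ratio 800.5 278.5    # little-core memset, 4.14 vs 5.4       -> prints 2.9
ratio 588.1 176.2    # 64 MiB random-read latency, 5.4 vs 4.14 -> prints 3.3
```

So memcpy bandwidth on a little core is roughly 4.6 times lower, and the largest-block latency roughly 3.3 times higher, on the 5.4 run.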

odroid
Site Admin
Posts: 34947
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English, Korean, Japanese
ODROIDs: ODROID
Has thanked: 960 times
Been thanked: 775 times

Post by odroid »

Nice information.
I think the DRAM controller configuration in Kernel 5.x could have some issues.

lanefu
Posts: 5
Joined: Tue Jun 30, 2020 9:35 pm
languages_spoken: english
ODROIDs: N2, MC1-solo
Has thanked: 2 times
Been thanked: 2 times

Post by lanefu »

Issue seems to be related to https://lwn.net/Articles/787647/

I disabled kernel feature CONFIG_EXYNOS5422_DMC and performance was restored.

See https://armbian.atlassian.net/browse/AR-337
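
To see whether a given kernel was built with the driver, the kernel config can be inspected. A small sketch, assuming the distribution installs the config as /boot/config-<release> (that default path is an assumption; some kernels expose /proc/config.gz instead, which would need zcat first). dmc_config_state is a hypothetical helper name.

```shell
# dmc_config_state [FILE]: report how CONFIG_EXYNOS5422_DMC is set in a
# kernel config file. The function name is illustrative only.
dmc_config_state() {
    cfg="${1:-/boot/config-$(uname -r)}"
    if grep -q '^CONFIG_EXYNOS5422_DMC=y' "$cfg"; then
        echo enabled
    elif grep -q '^CONFIG_EXYNOS5422_DMC=m' "$cfg"; then
        echo module
    else
        echo disabled
    fi
}
```

On a kernel with the workaround applied, the expected answer is "disabled".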

Post by lanefu »

I'd consider disabling DMC a workaround. According to https://cateee.net/lkddb/web-lkddb/EXYNOS5422_DMC.html, timings are set based on memory information provided in the device tree, which is a little over my head.

Post by lanefu »

Opted to disable DMC for now

https://github.com/armbian/build/pull/2073

should be available in nightly kernel tomorrow-ish

Post by zerodroid »

I can confirm that using the nightly kernel 5.4.49-odroidxu4, "borg check" is back to kernel 4.x completion times. That's a 2-3X performance difference!

Many thanks to @lanefu for this workaround!

@odroid Any tips or efforts towards a proper fix to the DMC kernel feature?

Post by odroid »

We will look into that in 4~5 weeks, when we start making Ubuntu 20.04 images for XU4/XU3/HC1/HC2.
We are too busy these days building the Ubuntu 20.04 images for C1/C2/N2.

Post by lanefu »

An alternate workaround is to change the DMC devfreq governor (memory, not CPU) to performance or userspace. The default is simple_ondemand.

https://github.com/armbian/build/pull/2 ... -653203633
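
A minimal sketch of that governor switch. set_dmc_governor is a hypothetical helper name, and the default directory /sys/class/devfreq/devfreq0 is an assumption taken from later posts in this thread; the node may be named differently on other kernels. The setting is a plain sysfs write, so it needs root and does not survive a reboot.

```shell
# set_dmc_governor GOVERNOR [DEVFREQ_DIR]
# Switch the devfreq governor of the memory controller and print the result.
# Refuses governors that the node does not list in available_governors.
set_dmc_governor() {
    gov="$1"
    dir="${2:-/sys/class/devfreq/devfreq0}"
    if [ -r "$dir/available_governors" ] &&
       ! grep -qw "$gov" "$dir/available_governors"; then
        echo "governor '$gov' not offered by $dir" >&2
        return 1
    fi
    echo "$gov" > "$dir/governor" && cat "$dir/governor"
}
```

For example, `set_dmc_governor performance` should print "performance" if the write was accepted.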

AreaScout
Posts: 1384
Joined: Sun Jul 07, 2013 3:05 am
languages_spoken: german, english
ODROIDs: X2, U3, XU3, C2, HiFi Shield, XU4, XU4Q, N1, Go, VU5A, Show2, CloudShell2, H2, N2, VU7A, VuShell, Go2, C4
Has thanked: 62 times
Been thanked: 202 times

Post by AreaScout »

    @lanefu

    Yes, both workarounds are OK. The simple_ondemand governor doesn't seem to do anything: it does not scale on demand and stays fixed at 165MHz.

    RG

    MastaG
    Posts: 333
    Joined: Mon Aug 26, 2013 6:05 pm
    languages_spoken: english
    Has thanked: 24 times
    Been thanked: 14 times

    Post by MastaG »

    Ahhh thank you guys!
    I was already wondering why my Fedora image was so slow on 5.4 compared to the legacy kernel.

    This really adds the cherry on top.
    The XU4 is still a beast with such good kernel/driver support.

    joy
    Posts: 1336
    Joined: Fri Oct 02, 2015 1:44 pm
    languages_spoken: english
    ODROIDs: ODROID-C1+, XU4, X
    Has thanked: 147 times
    Been thanked: 163 times

    Post by joy »

    I was wondering how the exynos5422 DMC driver is implemented in kernel 5.x, so I looked briefly into the devfreq driver and ran a memory benchmark.
    Based on the results, the basic operation of the driver looks normal. :roll:

    I also have older data that I collected to check the DMC driver on kernel 3.10.y, and on kernel 4.x with the u-boot workaround (no DMC driver),
    so I compared the mbw benchmark output and the essential registers of the DMC and BPLL components.

    1. Test Environment
    I followed @joshua.yang's guide and used the XU4 Ubuntu MATE image (20190929) and memeka's GitHub branch odroidxu4.5.4.y.
    viewtopic.php?p=273697#p273697
    https://github.com/mihailescu2m/linux/t ... dxu4-5.4.y

    Here is a patch to enable the exynos5422 DMC feature.

    Code: Select all

    diff --git a/arch/arm/configs/odroidxu4_defconfig b/arch/arm/configs/odroidxu4_defconfig
    index 661a849..33aea5d 100644
    --- a/arch/arm/configs/odroidxu4_defconfig
    +++ b/arch/arm/configs/odroidxu4_defconfig
    @@ -5413,10 +5413,11 @@ CONFIG_EXTCON=y
     CONFIG_EXTCON_USB_GPIO=m
     # CONFIG_EXTCON_USBC_CROS_EC is not set
     CONFIG_MEMORY=y
    +CONFIG_DDR=y
     # CONFIG_ARM_PL172_MPMC is not set
     CONFIG_PL353_SMC=y
     CONFIG_SAMSUNG_MC=y
    -# CONFIG_EXYNOS5422_DMC is not set
    +CONFIG_EXYNOS5422_DMC=y
     CONFIG_EXYNOS_SROM=y
     CONFIG_IIO=y
     CONFIG_IIO_BUFFER=y
    @@ -6599,8 +6600,6 @@ CONFIG_KASAN_STACK=1
     # end of Memory Debugging
     
     CONFIG_ARCH_HAS_KCOV=y
    -CONFIG_CC_HAS_SANCOV_TRACE_PC=y
    -# CONFIG_KCOV is not set
     
     #
     # Debug Lockups and Hangs
    

    Code: Select all

    # uname -a
    Linux odroid 5.4.3+ #2 SMP PREEMPT Fri Jul 31 22:11:19 KST 2020 armv7l armv7l armv7l GNU/Linux
    

    2. memory benchmark, 'mbw'
    I used the 'mbw' benchmark utility with the 'performance' governor for both the CPU and the DMC.

    Code: Select all

    # apt install mbw
    

    Code: Select all

    # echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
    # echo performance > /sys/devices/system/cpu/cpufreq/policy4/scaling_governor
    
    # echo performance > /sys/class/devfreq/devfreq0/governor
    
    # echo ${ddr_freq} > /sys/class/devfreq/devfreq0/max_freq
    
    # cat /sys/class/devfreq/devfreq0/cur_freq
    ${ddr_freq}
    

    Code: Select all

    # mbw 100 | grep AVG
    
    [Attachment: xu4_mbw_comparison.png]

    3. essential registers value
    To change the DDR timing and PLL output, the following registers have to be adjusted for each case.
    Using 'devmem2', I checked the register values.

    Code: Select all

    # devmem2 0x10C20034 word
    /dev/mem opened.
    Memory mapped at address 0xb6fc8000.
    Value at address 0x10C20034 (0xb6fc8034): 0x365A9713
    
    [Attachment: xu4_dmc_registers.png]
    The BPLL_LOCK values differ, but that can be adjusted, because the register is only related to the PLL lock time value.


    4. With governor, simple_ondemand
    One more thing.
    The default governor after booting is simple_ondemand.
    In this case I got mbw results similar to the 825MHz case, regardless of min_freq/cur_freq.
    I still need to check whether this result is correct, because the related register values change when I change cur_freq, yet the mbw result stays the same.
    It is not a very low value like in the 165MHz case.
    But this result shows that DMC devfreq with simple_ondemand doesn't work correctly.

    Code: Select all

    root@odroid:~# cat /sys/class/devfreq/devfreq0/governor 
    simple_ondemand
    root@odroid:~# cat /sys/class/devfreq/devfreq0/cur_freq 
    165000000
    root@odroid:~# mbw 100 | grep AVG
    AVG     Method: MEMCPY  Elapsed: 0.05463        MiB: 100.00000  Copy: 1830.389 MiB/s
    AVG     Method: DUMB    Elapsed: 0.06001        MiB: 100.00000  Copy: 1666.444 MiB/s
    AVG     Method: MCBLOCK Elapsed: 0.02231        MiB: 100.00000  Copy: 4482.757 MiB/s
    
    As @AreaScout said, there should be no issue with performance, powersave or userspace.
    But I'm confused by the simple_ondemand case.
    I wonder whether this slow performance is an issue in the DMC driver itself. :roll:
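
One way to see whether simple_ondemand ever leaves the lowest frequency is to sample cur_freq while a benchmark runs in another terminal. A minimal sketch, assuming the devfreq0 node used in the commands above; sample_cur_freq is a hypothetical helper name.

```shell
# sample_cur_freq [DEVFREQ_DIR] [COUNT] [INTERVAL_SECONDS]
# Print cur_freq in MHz COUNT times, INTERVAL_SECONDS apart.
sample_cur_freq() {
    dir="${1:-/sys/class/devfreq/devfreq0}"
    count="${2:-5}"
    interval="${3:-1}"
    i=0
    while [ "$i" -lt "$count" ]; do
        awk '{ printf "%d MHz\n", $1 / 1000000 }' "$dir/cur_freq"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

If every sample reads 165 MHz while `mbw 100` is running, the governor is not scaling up under load.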

    @zerodroid, could you share your dtb file with me?
    Last edited by joy on Sun Aug 02, 2020 1:25 pm, edited 1 time in total.

    Post by zerodroid »

    @joy which file would you like to see?

    Thanks for investigating.

    Post by joy »

    @zerodroid,
    Do you use the exynos5422-odroidxu4.dtb that is generated from arch/arm/boot/dts/?
    Or another DTB file?
    That is the file I mean. :)

    Post by zerodroid »

    @joy Sorry, I'm not familiar with these details. Could you let me know how to check which .dtb file is being used?
    A quick search shows there are many .dtb files in the "/boot/dtb-5.4.50-odroidxu4/" and "/usr/lib/linux-image-current-odroidxu4/" directories.
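
One way to approach this question is to ask the running kernel which device tree it booted with: the live tree is exposed under /proc/device-tree, and its model property names the board. A minimal sketch (board_model is a hypothetical helper name); it identifies the board description, not the file name on disk.

```shell
# board_model [FILE]: print the model string of the device tree the kernel
# booted with. Device-tree strings are NUL-terminated, hence the tr.
board_model() {
    tr -d '\0' < "${1:-/proc/device-tree/model}"
    echo
}
```

The same approach works for /proc/device-tree/compatible, which lists the compatible strings the DTB declares.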

    Post by joy »

    Hi @zerodroid,
    That's OK. No problem.
    I found exynos5422-odroidhc1.dtb, and there is no difference in the LPDDR timings.
    Sorry for bothering you. I'm also not familiar with the Armbian configs.

    And based on your updated test result here and @lanefu's comment, I'm sure it's related to the default simple_ondemand condition.
    zerodroid wrote:
    Wed Jul 01, 2020 8:35 am
    I can confirm that using the nightly kernel 5.4.49-odroidxu4, "borg check" is back to kernel 4.x completion times. That's a 2-3X performance difference!
    I'm going to check how the simple_ondemand condition works on the 3.10.y and 5.4.x kernels, and run sbc-bench and other memory benchmarks.
    On kernel 4.x the default memory clock is 825MHz, and supporting other memory clocks on the XU4 depends on the u-boot 'dmc' command (there is no DMC devfreq driver on kernel 4.x).
    Last edited by joy on Sat Aug 01, 2020 10:05 am, edited 1 time in total.

    Post by zerodroid »

    @joy Ok, good.
    I don't know what the .dtb is for, but you may want to take a look at the one for exynos5422-odroidxu4 too because the HC1/HC2 are identified as an XU4.

    The optimized board configurations in Armbian for HC1/HC2 currently don't work so it is using the XU4 profile. As for the significance of this, I don't know...
    Not sure if this is relevant at all, but more information is better than less.

    Just let me know if you'd like me to check anything.

    Post by joy »

    @zerodroid,
    Thanks for the information. OK I will.

    It looks like this is a known issue, and there are some discussions about it.
    https://lore.kernel.org/linux-pm/202006 ... ini.local/

    Post by joy »

    I have an update on this issue.
    I reproduced the low memory performance on an XU4 board and found patches that fix it.

    1. Test summary using sbc-bench and tinymembench
    When I tested on the XU4 with a 5.4.x kernel and the default simple_ondemand governor, there was no low memory performance issue.
    Kernel 5.4.28 + XU4 DTB
    http://ix.io/2sVv

    Code: Select all

    Memory bandwidth tests on little core
     standard memcpy                                      :    339.0 MB/s
     standard memset                                      :    796.3 MB/s (24.3%)
    
    Memory bandwidth tests on big core
     standard memcpy                                      :   2235.5 MB/s
     standard memset                                      :   4904.9 MB/s (24.5%)
    
    During testing I found a difference between (1) the Armbian XU4 DTB and (2) the Armbian HC1 DTB,
    and I could reproduce the low memory performance pattern on the LITTLE cores.
    Kernel 5.4.28 + HC1 DTB
    http://ix.io/2sVP

    Code: Select all

    Memory bandwidth tests on little core
     standard memcpy                                      :     85.2 MB/s
     standard memset                                      :    278.7 MB/s (0.1%)
    
    Memory bandwidth tests on big core
     standard memcpy                                      :   2426.4 MB/s
     standard memset                                      :   4892.2 MB/s (0.9%)
    
    I got the same pair of results with the XU4 DTB on HK Ubuntu, kernel 5.4.3, by comparing (1) HDMI connected and (2) HDMI disconnected.
    For this test I used 'tinymembench' on an XU4 board with the XU4 DTB.
    https://github.com/ssvb/tinymembench
    Kernel 5.4.3 + XU4 DTB + HDMI connected

    Code: Select all

    Memory bandwidth tests on little core
     standard memcpy                                      :    322.6 MB/s
     standard memset                                      :    793.9 MB/s (25.2%)
    
    Memory bandwidth tests on big core
     standard memcpy                                      :   2241.8 MB/s
     standard memset                                      :   4897.8 MB/s (24.7%)
    
    Kernel 5.4.3 + XU4 DTB + HDMI removed

    Code: Select all

    # taskset -c 0 /home/odroid/tinymembench/tinymembench
    
     standard memcpy                                      :     81.1 MB/s
     standard memset                                      :    275.5 MB/s
    
    # taskset -c 4 /home/odroid/tinymembench/tinymembench
    
    standard memcpy                                      :   2394.7 MB/s
    standard memset                                      :   4845.6 MB/s (0.5%)
    
    Using 'mbw' with the simple_ondemand governor, I got the expected output and confirmed again that the basic devfreq logic of the exynos5422 DMC has no issue.
    Please note that the conditions differ: the 'mbw' test runs after the frequency has been set up, while 'tinymembench' runs right after booting.

    Also note that the devfreq sysfs node is different from linux-stable:
    /sys/class/devfreq/10c20000.memory-controller/

    Code: Select all

    # cat /sys/class/devfreq/devfreq0/governor 
    simple_ondemand
    
    root@odroid:~# echo 825000000 > /sys/class/devfreq/devfreq0/max_freq 
    root@odroid:~# echo 825000000 > /sys/class/devfreq/devfreq0/min_freq                                                                           
    root@odroid:~# cat /sys/class/devfreq/devfreq0/cur_freq 
    825000000
    root@odroid:~# mbw 100 | grep AVG                                                                          
    AVG     Method: MEMCPY  Elapsed: 0.05163        MiB: 100.00000  Copy: 1936.952 MiB/s
    AVG     Method: DUMB    Elapsed: 0.05595        MiB: 100.00000  Copy: 1787.406 MiB/s
    AVG     Method: MCBLOCK Elapsed: 0.02208        MiB: 100.00000  Copy: 4529.765 MiB/s
    
    root@odroid:~# echo 165000000 > /sys/class/devfreq/devfreq0/min_freq                                                                           
    root@odroid:~# echo 165000000 > /sys/class/devfreq/devfreq0/max_freq                                                                           
    root@odroid:~# cat /sys/class/devfreq/devfreq0/cur_freq 
    165000000
    root@odroid:~# mbw 100 | grep AVG                                                                          
    AVG     Method: MEMCPY  Elapsed: 0.35779        MiB: 100.00000  Copy: 279.493 MiB/s
    AVG     Method: DUMB    Elapsed: 0.35230        MiB: 100.00000  Copy: 283.851 MiB/s
    AVG     Method: MCBLOCK Elapsed: 0.20465        MiB: 100.00000  Copy: 488.636 MiB/s
    
    I think this issue depends on the workload conditions during boot,
    and I found some discussions and patches.
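
The min_freq/max_freq sequence used in the mbw test above can be wrapped in a small helper. A sketch under assumptions: pin_dmc_freq is a hypothetical name, the devfreq0 path is the one from this post, and the write order mirrors the commands above (max first when raising, min first when lowering), which matters if the kernel rejects a min_freq above max_freq.

```shell
# pin_dmc_freq HZ [DEVFREQ_DIR]
# Lock the memory controller to one frequency by setting min_freq == max_freq.
# The write order keeps min_freq <= max_freq at every step.
pin_dmc_freq() {
    hz="$1"
    dir="${2:-/sys/class/devfreq/devfreq0}"
    cur_max=$(cat "$dir/max_freq")
    if [ "$hz" -gt "$cur_max" ]; then
        echo "$hz" > "$dir/max_freq"   # raising: lift the ceiling first
        echo "$hz" > "$dir/min_freq"
    else
        echo "$hz" > "$dir/min_freq"   # lowering: drop the floor first
        echo "$hz" > "$dir/max_freq"
    fi
    cat "$dir/cur_freq"
}
```

For example, `pin_dmc_freq 825000000` followed by `mbw 100 | grep AVG` repeats the 825MHz measurement, and `pin_dmc_freq 165000000` the slow one.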

    2. Related threads and Patches
    Please refer to these links.
    (updated)

    Code: Select all

    memory: samsung: exynos5422-dmc: Adjust polling interval and uptreshold
    memory: samsung: exynos5422-dmc: Add module param to control IRQ mode
    
    https://lore.kernel.org/linux-pm/202006 ... ini.local/
    https://lore.kernel.org/linux-pm/82080e ... arm.com/T/
    https://lore.kernel.org/linux-pm/202007 ... a@arm.com/
    https://kernel.googlesource.com/pub/scm ... ding-edge/

    And here are my test results with the patches applied.
    I got similar results regardless of the conditions.

    Code: Select all

    # taskset -c 0 /home/odroid/tinymembench/tinymembench
    ---
     standard memcpy                                      :    326.5 MB/s
     standard memset                                      :    793.3 MB/s
     ---
    
    # taskset -c 4 /home/odroid/tinymembench/tinymembench
    ---
     standard memcpy                                      :   2309.7 MB/s
     standard memset                                      :   4843.3 MB/s (0.5%)
     ---
    
    This patch was generated against kernel 5.4.3 just for my test,
    and I think it should be similar (not exactly the same) for the kernel 5.4.28 tag of linux-stable that Armbian targets.
    git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git

    Code: Select all

    diff --git a/arch/arm/configs/odroidxu4_defconfig b/arch/arm/configs/odroidxu4_defconfig
    index 661a849..33aea5d 100644
    --- a/arch/arm/configs/odroidxu4_defconfig
    +++ b/arch/arm/configs/odroidxu4_defconfig
    @@ -5413,10 +5413,11 @@ CONFIG_EXTCON=y
     CONFIG_EXTCON_USB_GPIO=m
     # CONFIG_EXTCON_USBC_CROS_EC is not set
     CONFIG_MEMORY=y
    +CONFIG_DDR=y
     # CONFIG_ARM_PL172_MPMC is not set
     CONFIG_PL353_SMC=y
     CONFIG_SAMSUNG_MC=y
    -# CONFIG_EXYNOS5422_DMC is not set
    +CONFIG_EXYNOS5422_DMC=y
     CONFIG_EXYNOS_SROM=y
     CONFIG_IIO=y
     CONFIG_IIO_BUFFER=y
    @@ -6599,8 +6600,6 @@ CONFIG_KASAN_STACK=1
     # end of Memory Debugging
     
     CONFIG_ARCH_HAS_KCOV=y
    -CONFIG_CC_HAS_SANCOV_TRACE_PC=y
    -# CONFIG_KCOV is not set
     
     #
     # Debug Lockups and Hangs
    diff --git a/drivers/memory/samsung/exynos5422-dmc.c b/drivers/memory/samsung/exynos5422-dmc.c
    index bdb264b..9fc3134 100644
    --- a/drivers/memory/samsung/exynos5422-dmc.c
    +++ b/drivers/memory/samsung/exynos5422-dmc.c
    @@ -12,6 +12,7 @@
     #include <linux/io.h>
     #include <linux/mfd/syscon.h>
     #include <linux/module.h>
    +#include <linux/moduleparam.h>
     #include <linux/of_device.h>
     #include <linux/pm_opp.h>
     #include <linux/platform_device.h>
    @@ -21,6 +22,10 @@
     #include "../jedec_ddr.h"
     #include "../of_memory.h"
     
    +static int irqmode;
    +module_param(irqmode, int, 0644);
    +MODULE_PARM_DESC(irqmode, "Enable IRQ mode (0=off [default], 1=on)");
    +
     #define EXYNOS5_DREXI_TIMINGAREF		(0x0030)
     #define EXYNOS5_DREXI_TIMINGROW0		(0x0034)
     #define EXYNOS5_DREXI_TIMINGDATA0		(0x0038)
    @@ -945,6 +950,7 @@ static int exynos5_dmc_get_cur_freq(struct device *dev, unsigned long *freq)
      * It provides to the devfreq framework needed functions and polling period.
      */
     static struct devfreq_dev_profile exynos5_dmc_df_profile = {
    +	.timer = DEVFREQ_TIMER_DELAYED,
     	.target = exynos5_dmc_target,
     	.get_dev_status = exynos5_dmc_get_status,
     	.get_cur_freq = exynos5_dmc_get_cur_freq,
    @@ -1432,7 +1438,7 @@ static int exynos5_dmc_probe(struct platform_device *pdev)
     	/* There is two modes in which the driver works: polling or IRQ */
     	irq[0] = platform_get_irq_byname(pdev, "drex_0");
     	irq[1] = platform_get_irq_byname(pdev, "drex_1");
    -	if (irq[0] > 0 && irq[1] > 0) {
    +	if (irq[0] > 0 && irq[1] > 0 && irqmode) {
     		ret = devm_request_threaded_irq(dev, irq[0], NULL,
     						dmc_irq_thread, IRQF_ONESHOT,
     						dev_name(dev), dmc);
    @@ -1470,10 +1476,10 @@ static int exynos5_dmc_probe(struct platform_device *pdev)
     		 * Setup default thresholds for the devfreq governor.
     		 * The values are chosen based on experiments.
     		 */
    -		dmc->gov_data.upthreshold = 30;
    +		dmc->gov_data.upthreshold = 10;
     		dmc->gov_data.downdifferential = 5;
     
    -		exynos5_dmc_df_profile.polling_ms = 500;
    +		exynos5_dmc_df_profile.polling_ms = 100;
     	}
     
     
    @@ -1489,7 +1495,7 @@ static int exynos5_dmc_probe(struct platform_device *pdev)
     	if (dmc->in_irq_mode)
     		exynos5_dmc_start_perf_events(dmc, PERF_COUNTER_START_VALUE);
     
    -	dev_info(dev, "DMC initialized\n");
    +	dev_info(dev, "DMC initialized, in irq mode: %d\n", dmc->in_irq_mode);
     
     	return 0;
     
    diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
    index 2bae9ed..faf4148 100644
    --- a/include/linux/devfreq.h
    +++ b/include/linux/devfreq.h
    @@ -30,6 +30,13 @@
     #define	DEVFREQ_PRECHANGE		(0)
     #define DEVFREQ_POSTCHANGE		(1)
     
    +/* DEVFREQ work timers */
    +enum devfreq_timer {
    +	DEVFREQ_TIMER_DEFERRABLE = 0,
    +	DEVFREQ_TIMER_DELAYED,
    +	DEVFREQ_TIMER_NUM,
    +};
    +
     struct devfreq;
     struct devfreq_governor;
     
    @@ -69,6 +76,7 @@ struct devfreq_dev_status {
      * @initial_freq:	The operating frequency when devfreq_add_device() is
      *			called.
      * @polling_ms:		The polling interval in ms. 0 disables polling.
    + * @timer:		Timer type is either deferrable or delayed timer.
      * @target:		The device should set its operating frequency at
      *			freq or lowest-upper-than-freq value. If freq is
      *			higher than any operable frequency, set maximum.
    @@ -95,6 +103,7 @@ struct devfreq_dev_status {
     struct devfreq_dev_profile {
     	unsigned long initial_freq;
     	unsigned int polling_ms;
    +	enum devfreq_timer timer;
     
     	int (*target)(struct device *dev, unsigned long *freq, u32 flags);
     	int (*get_dev_status)(struct device *dev,
    
    I haven't checked it on Armbian with linux-stable or on an HC1 board yet.
    So I hope somebody can confirm it on Armbian, with the EXYNOS5422_DMC config enabled.
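
When confirming, it helps to check which mode the patched driver actually came up in. The patch above changes the probe message to "DMC initialized, in irq mode: N", so the kernel log can be filtered for it. A minimal sketch (dmc_irq_mode is a hypothetical helper name; it reads log text on stdin and prints nothing on an unpatched kernel):

```shell
# dmc_irq_mode: read kernel log text on stdin and print the value from the
# last "DMC initialized, in irq mode: N" line, if any.
dmc_irq_mode() {
    sed -n 's/.*DMC initialized, in irq mode: \([01]\).*/\1/p' | tail -n 1
}
```

Typical use would be `dmesg | dmc_irq_mode`. Since the patch leaves irqmode at 0 by default, a value of 0 (polling mode) is what to expect unless the parameter was set.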
    Last edited by joy on Mon Aug 10, 2020 6:34 pm, edited 1 time in total.

    Post by lanefu »

    Hello, thanks for identifying and testing those patches.

    :D

    I applied the following patches to the Armbian-dev kernel 5.7.13 on an MC1-solo:

    https://patchwork.kernel.org/patch/11657269/
    https://patchwork.kernel.org/patch/11657275/

    Code: Select all

     taskset -c 0 /usr/local/src/tinymembench/tinymembench
     
     ---
     standard memcpy                                      :    360.4 MB/s
     standard memset                                      :    591.6 MB/s
     ---
     taskset -c 4 /usr/local/src/tinymembench/tinymembench
     ---
     standard memcpy                                      :   2312.2 MB/s
     standard memset                                      :   4921.0 MB/s (1.0%)
     ---
    
    I'll merge this fix into dev immediately, but I'm going to wait on -current (5.4) until after the Armbian v20.08 release.

    Post by joy »

    Hi @lanefu,
    Thank you for confirming it on Armbian and sharing the status here! :D

    Yes. The two patches you picked are the correct ones.
    lanefu wrote:
    Thu Aug 06, 2020 7:58 am
    I applied the following patches on the Armbian-dev kernel 5.7.13 on a mc1-solo

    https://patchwork.kernel.org/patch/11657269/
    https://patchwork.kernel.org/patch/11657275/
