What is the sysbench test result?

crossover
Posts: 113
Joined: Wed Jul 22, 2015 2:23 pm
languages_spoken: english
ODROIDs: XU4, C1+, C2, USB-IO, HC2, Tinkering kits
Has thanked: 0
Been thanked: 0
Contact:

What is the sysbench test result?

Unread post by crossover » Tue Mar 01, 2016 2:01 pm

I'd like to know the sysbench CPU results with the prime limit set to 10,000 and 20,000 on 64-bit Ubuntu 16.04.
Can you guys share your test results?

odroid
Site Admin
Posts: 32677
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English
ODROIDs: ODROID
Has thanked: 209 times
Been thanked: 364 times
Contact:

Re: What is the sysbench test result?

Unread post by odroid » Tue Mar 01, 2016 2:26 pm

I just installed sysbench with apt-get and ran it.

Prime limit of 20,000:

Code:

odroid@odroid64:~$ sysbench --test=cpu run --num-threads=4 --cpu-max-prime=20000

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          6.0053s
    total number of events:              10000
    total time taken by event execution: 23.9961
    per-request statistics:
         min:                                  2.38ms
         avg:                                  2.40ms
         max:                                 12.73ms
         approx.  95 percentile:               2.44ms
                                                                                
Threads fairness:                                                               
    events (avg/stddev):           2500.0000/21.97                              
    execution time (avg/stddev):   5.9990/0.00

Prime limit of 10,000:

Code:

odroid@odroid64:~$ sysbench --test=cpu run --num-threads=4 --cpu-max-prime=10000

sysbench 0.4.12:  multi-threaded system evaluation benchmark                    
                                                                                
Running the test with following options:                                        
Number of threads: 4                                                            
                                                                                
Doing CPU performance benchmark                                                 
                                                                                
Threads started!                                                                
Done.                                                                           
                                                                                
Maximum prime number checked in CPU test: 10000                                 
                                                                                
                                                                                
Test execution summary:                                                         
    total time:                          2.4317s                                
    total number of events:              10000                                  
    total time taken by event execution: 9.7157                                 
    per-request statistics:                                                     
         min:                                  0.96ms                           
         avg:                                  0.97ms                           
         max:                                  1.23ms                           
         approx.  95 percentile:               0.99ms                           
                                                                                
Threads fairness:                                                               
    events (avg/stddev):           2500.0000/25.43                              
    execution time (avg/stddev):   2.4289/0.00

If we had spare time, we could probably get a better result with more optimized build options.
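
To make the summary fields easier to read, here is a minimal Python sketch that cross-checks the figures from the 20,000 run above. The relationships it checks (summed event time is roughly events times average latency, and per-thread time is that sum divided by the thread count) are inferred from the numbers in this output, not taken from the sysbench documentation.

```python
# Figures copied from the 20,000 prime-limit run above (sysbench 0.4.12, 4 threads).
threads = 4
events = 10000
avg_latency_ms = 2.40
total_event_time_s = 23.9961   # "total time taken by event execution"
wall_time_s = 6.0053           # "total time"

# Events times average latency should roughly match the summed event time.
estimated_event_time_s = events * avg_latency_ms / 1000.0

# Spread over the threads, the summed event time should land close to wall-clock time.
per_thread_time_s = total_event_time_s / threads

print(f"estimated event time: {estimated_event_time_s:.2f} s")
print(f"per-thread time:      {per_thread_time_s:.4f} s (wall clock {wall_time_s} s)")
```

The per-thread figure lines up with the "execution time (avg/stddev)" value reported under "Threads fairness".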

mlinuxguy
Posts: 840
Joined: Thu Feb 28, 2013 10:28 am
languages_spoken: english
ODROIDs: X, X2, XU, XU3, XU4, C1, C1+, C2, N1, USB-IO
Has thanked: 0
Been thanked: 0
Contact:

Unread post by mlinuxguy » Tue Mar 01, 2016 2:28 pm

I can re-run it with different options.
This is built from the 0.4.12 source with the following options:
AM_CFLAGS = -march=armv8-a -mtune=cortex-a53 ...the rest of their opts...
Per the GCC docs, -march can be: ‘armv8-a’, ‘armv8-a+crc’, ‘armv8.1-a’, ‘armv8.1-a+crc’.

10,000 primes -- optimized build

Code:

root@odroid64:~/0.4/sysbench# ./sysbench --test=cpu --cpu-max-prime=10000 --num-threads=4 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 4
Random number generator seed is 0 and will be ignored
Doing CPU performance benchmark

Primer numbers limit: 10000
Threads started!
Done.
General statistics:
    total time:                          2.2994s
    total number of events:              10000
    total time taken by event execution: 9.1822
    response time:
         min:                                  0.91ms
         avg:                                  0.92ms
         max:                                  1.89ms
         approx.  95 percentile:               0.96ms
Threads fairness:
    events (avg/stddev):           2500.0000/52.05
    execution time (avg/stddev):   2.2956/0.00

20,000 primes -- optimized build

Code:

root@odroid64:~/0.4/sysbench# ./sysbench --test=cpu --cpu-max-prime=20000 --num-threads=4 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 4
Random number generator seed is 0 and will be ignored
Doing CPU performance benchmark

Primer numbers limit: 20000
Threads started!
Done.
General statistics:
    total time:                          5.6492s
    total number of events:              10000
    total time taken by event execution: 22.5734
    response time:
         min:                                  2.23ms
         avg:                                  2.26ms
         max:                                 12.66ms
         approx.  95 percentile:               2.35ms
Threads fairness:
    events (avg/stddev):           2500.0000/51.40
    execution time (avg/stddev):   5.6434/0.00

Last edited by mlinuxguy on Tue Mar 01, 2016 3:09 pm, edited 1 time in total.

Unread post by crossover » Tue Mar 01, 2016 2:47 pm

Thank you, guys!
I wanted to compare the C2 with the RPi 3, but the results are quite weird. How can the C2 be 10 to 20 times faster than the Pi 3?
https://www.raspberrypi.org/magpi/raspb ... enchmarks/

I thought the C2 should be 1.5 to 2 times faster than the Pi 3. Something is wrong.

Unread post by odroid » Tue Mar 01, 2016 2:53 pm

I believe they haven't enabled the ARMv8 capability yet.
They still use the old ARMv7- or ARMv6-based OS images on the ARMv8 architecture.
As a short-term solution, I think that could be a good direction: better compatibility with fewer issues.

Let's wait a few months to have a fair comparison.

memeka
Posts: 4395
Joined: Mon May 20, 2013 10:22 am
languages_spoken: english
ODROIDs: XU rev2 + eMMC + UART
U3 + eMMC + IO Shield + UART
Has thanked: 1 time
Been thanked: 42 times
Contact:

Unread post by memeka » Tue Mar 01, 2016 2:54 pm

The RPi 3 is using ARMv6 libs; the C2 is using ARMv8.

Unread post by mlinuxguy » Tue Mar 01, 2016 3:10 pm

I updated my reply above with the results of an optimized build

Unread post by crossover » Tue Mar 01, 2016 3:26 pm

mlinuxguy wrote: I updated my reply above with the results of an optimized build

The Pine64 (1.2 GHz) gave an execution time of 3.2487 s, while the C2 gave 2.2956 s!
So can we say the C2 is 30~40% faster than the Pine64, as expected? Is that correct?
http://forum.armbian.com/index.php/topi ... 64/?p=5731
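
For reference, the arithmetic behind a "percent faster" claim can be sketched in Python as below. It uses only the two execution times quoted in this post. The helper name `compare_runtimes` is made up for illustration, and the sketch says nothing about whether sysbench is a meaningful basis for the comparison.

```python
def compare_runtimes(slower_s: float, faster_s: float) -> tuple:
    """Return (percent more throughput, percent less runtime) of the faster run."""
    throughput_gain = (slower_s / faster_s - 1.0) * 100.0
    time_saved = (1.0 - faster_s / slower_s) * 100.0
    return throughput_gain, time_saved

pine64_s = 3.2487  # Pine64 execution time quoted above
c2_s = 2.2956      # C2 optimized-build execution time quoted above

gain, saved = compare_runtimes(pine64_s, c2_s)
print(f"C2 throughput gain: {gain:.1f}%")
print(f"C2 runtime saved:   {saved:.1f}%")
```

The two percentages differ (roughly 41% more throughput versus roughly 29% less runtime), so "30~40% faster" depends on which of the two is meant.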

Unread post by mlinuxguy » Tue Mar 01, 2016 3:29 pm

Yes, we have a higher default clock than they do
SoC – Allwinner R18 (based on Allwinner A64) quad core ARM Cortex A53 processor @ 1.2 GHz with Mali-400MP2 GPU

Unread post by crossover » Tue Mar 01, 2016 3:46 pm

Thanks for your confirmation.

tkaiser
Posts: 672
Joined: Mon Nov 09, 2015 12:30 am
languages_spoken: english
ODROIDs: C1+, C2, XU4, HC1
Has thanked: 0
Been thanked: 1 time
Contact:

Unread post by tkaiser » Tue Mar 01, 2016 11:24 pm

crossover wrote:
mlinuxguy wrote:I updated my reply above with the results of an optimized build
The Pine64(1.2Ghz) gave 3.2487sec execution time while the C2 gave 2.2956sec !
So we can say C2 is 30~40% faster than Pine64 as expected. Is it correct?
http://forum.armbian.com/index.php/topi ... 64/?p=5731

Nope, this isn't 'correct' at all. It's a perfect example of how people love comparing numbers but refuse to understand what they mean. Without any doubt the S905 is faster overall than the A64. But to think that calculating prime numbers is a good way to evaluate overall system performance is a bit weird (unless calculating prime numbers all day long is your main use case, of course ;) ).

Then: all these new SoCs suffer from thermal throttling under high load. 'Benchmarks' that finish within seconds can therefore only tell you something about peak performance, not how the whole system in question behaves under constant high load ("whole" means: taking cooling strategies into account). I would suspect the ODROID C2 wears a heatsink for a reason? So let's have a look at how sysbench behaves (to stay with rather irrelevant comparisons) when the prime limit is raised to 200,000. On my Pine64+, thermal throttling already occurs then (from the max 1152 MHz down to 1104 MHz) and I get 185.1680s as execution time (using the sysbench deb that Ubuntu 16.04 Xenial provides -- if you start compiling benchmarks from scratch you only show how moronic this whole benchmarking is :D ).

Image

Way more interesting would be to compare 7-zip performance (as a measure of integer/memory performance):

Code:

tk@pine64plus:/var/lib# 7zr b

7-Zip (A) 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=de_DE.UTF-8,Utf16=on,HugeFiles=on,4 CPUs)

RAM size:     979 MB,  # CPU hardware threads:   4
RAM usage:    850 MB,  # Benchmark threads:      4

Dict        Compressing          |        Decompressing
      Speed Usage    R/U Rating  |    Speed Usage    R/U Rating
       KB/s     %   MIPS   MIPS  |     KB/s     %   MIPS   MIPS

22:    1452   279    506   1413  |    41588   389    965   3752
23:    1447   287    514   1474  |    40172   389    944   3676
24:    1382   292    509   1486  |    39419   387    943   3657
25:    1373   302    518   1567  |    37506   389    907   3527
----------------------------------------------------------------
Avr:          290    512   1485               388    940   3653
Tot:          339    726   2569

I would suspect the S905 performs way better here. Can anyone please give it a try? (And please stop being childish: don't tune benchmarks until they lose any meaning at all, and use Ubuntu's default p7zip package!)

BTW: http://www.brendangregg.com/activebenchmarking.html :)

Unread post by mlinuxguy » Tue Mar 01, 2016 11:53 pm

Attacking us and calling us childish is a great approach; I'm sure we'll be great buddies now.
Most of the Ubuntu repo packages are poorly optimized for ARMv8, so to see what the hardware is capable of it makes sense to do an optimized build.
That said, here are your benchmarks done from the Ubuntu repo:

Code:

root@odroid64:~# 7zr b
The program '7zr' is currently not installed. You can install it by typing:
apt install p7zip
root@odroid64:~# apt install p7zip
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  p7zip
0 upgraded, 1 newly installed, 0 to remove and 175 not upgraded.
Need to get 0 B/317 kB of archives.
After this operation, 976 kB of additional disk space will be used.
Selecting previously unselected package p7zip.
(Reading database ... 182535 files and directories currently installed.)
Preparing to unpack .../p7zip_9.20.1~dfsg.1-5_arm64.deb ...
Unpacking p7zip (9.20.1~dfsg.1-5) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up p7zip (9.20.1~dfsg.1-5) ...

And of course we mustn't forget to actually run it:

Code:

root@odroid64:~# 7zr b

7-Zip (A) 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=C,Utf16=off,HugeFiles=on,4 CPUs)

RAM size:    1718 MB,  # CPU hardware threads:   4
RAM usage:    850 MB,  # Benchmark threads:      4

Dict        Compressing          |        Decompressing
      Speed Usage    R/U Rating  |    Speed Usage    R/U Rating
       KB/s     %   MIPS   MIPS  |     KB/s     %   MIPS   MIPS

22:    2246   309    707   2184  |    55021   378   1311   4964
23:    2179   314    706   2220  |    54046   379   1305   4945
24:    2135   320    717   2295  |    52030   372   1297   4827
25:    2111   329    732   2410  |    51884   379   1288   4879
----------------------------------------------------------------
Avr:          318    716   2278               377   1300   4904
Tot:          347   1008   3591
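
As a rough aid for reading the two `7zr b` outputs side by side, the following Python sketch compares the rating columns. The numbers are copied from the "Avr" and "Tot" lines of the Pine64+ run posted earlier and the C2 run above. The dictionary layout and names are only illustrative.

```python
# "Avr" and "Tot" rating columns (MIPS) copied from the two 7zr b runs above.
results = {
    "Pine64+":   {"compress": 1485, "decompress": 3653, "total": 2569},
    "ODROID-C2": {"compress": 2278, "decompress": 4904, "total": 3591},
}

for board, r in results.items():
    # In both runs the printed total equals the mean of the two average ratings.
    mean_rating = (r["compress"] + r["decompress"]) / 2
    print(f"{board}: mean of ratings = {mean_rating:.0f}, reported total = {r['total']}")

ratio = results["ODROID-C2"]["total"] / results["Pine64+"]["total"]
print(f"C2 / Pine64+ total rating: {ratio:.2f}x")
```

Neither run records the CPU clock, so the ratio cannot show whether throttling affected either board.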

Unread post by tkaiser » Wed Mar 02, 2016 12:03 am

Running the meaningless sysbench with a prime limit of 200,000 and an annoying fan (on the left), it finishes in 178 seconds (clocked at 1152 MHz the whole time). When testing the 2nd Pine64+ without a heatsink, execution time increases to 197 seconds since thermal throttling already reduces the CPU clock speed to 1008 MHz:

Image

Again: calculating prime numbers is a good benchmark for people who calculate prime numbers for a living. Everyone else should start to think and be careful in interpreting such results -- if you really believe the C2 is 20%-30% faster than the Pine64 based on sysbench, then unfortunately you also have to believe that the S905 and A64 are more than 15 times faster than the BCM2837 used on the RPi 3 (since its cores are slow Cortex-A53 like on the S905/A64, the sysbench results for the RPi will automagically improve dramatically as soon as they run code built for ARMv8 on it).

Now the fun part: the Android kernel I used for the tests hasn't been running Linux on the A64 for long, so it's more or less at 'factory settings'. Same with the dvfs/thermal stuff. I haven't had the time to tune Vcore voltages (the lower the better, since thermal throttling kicks in later and the SoC gets automagically faster -- on the other hand you get reliability problems when voltages are too low). So once this part is resolved, and if we find that we're able to reduce Vcore voltages for the A64 (and can then also use slightly higher clock speeds, maybe even exceeding 1.2 GHz), performance might improve. And while I still doubt that the A64 will be faster than the S905 in any area, I think performance gains are possible.

BTW: The most interesting tunable stuff lies somewhere other than in irrelevant synthetic benchmarks that run for just a few seconds :)

Final note: I would assume the S905 is a throttling candidate, so keeping an eye on cpufreq while doing synthetic benchmarking is necessary. Using RPi-Monitor on Ubuntu 16.04/arm64 is pretty straightforward (just follow the steps in http://kaiser-edv.de/tmp/4U4tkD/install ... a83t_h8.sh, but you have to adjust the template for the S905 afterwards, of course, since the sysfs paths to temperatures/voltages differ).
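
Independent of RPi-Monitor, a minimal Python sketch for keeping an eye on cpufreq during a benchmark might look like this. It assumes the kernel exposes the standard Linux cpufreq file `scaling_cur_freq` (value in kHz) for cpu0; whether a given S905 or A64 kernel does so is an assumption. The function names are illustrative.

```python
import time
from pathlib import Path

# Standard Linux cpufreq sysfs file; reports the current frequency in kHz.
# Assumption: the running kernel exposes it for cpu0.
CPUFREQ_FILE = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")

def read_mhz(path: Path = CPUFREQ_FILE) -> float:
    """Read a cpufreq value in kHz from sysfs and return it in MHz."""
    return int(path.read_text().strip()) / 1000.0

def watch(samples: int = 3, interval_s: float = 1.0, path: Path = CPUFREQ_FILE) -> list:
    """Collect a few frequency samples; a drop under load hints at throttling."""
    readings = []
    for _ in range(samples):
        readings.append(read_mhz(path))
        time.sleep(interval_s)
    return readings

if __name__ == "__main__":
    if CPUFREQ_FILE.exists():
        print(watch())
    else:
        print("cpufreq sysfs file not found on this system")
```

Run it in a second terminal while the benchmark is active and compare the readings against the board's nominal maximum clock.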

Unread post by tkaiser » Wed Mar 02, 2016 12:16 am

mlinuxguy wrote: Most of the ubuntu repo packages are poorly optimized for armv8, to see what the HW is capable of it makes sense to do an optimized build.

Of course it makes sense. But not for comparing different systems (there, identical compiler settings are important, since otherwise you aren't comparing hardware but what most benchmarks end up comparing anyway: compiler optimisations). It makes sense in a different area: to get a clue how much better optimised software performs. Anyway: it should be obvious that sysbench's cpu test is crap for comparing different systems, now that the C2 seems to be more than 15 times faster than the RPi 3? :)

Thx for the 7zip result. Also not that great (especially compared to XU3/XU4). And unfortunately without any meaning unless the cpufreq settings are known (did throttling occur or not?).

Unread post by mlinuxguy » Wed Mar 02, 2016 1:44 am

I did the sysbench test in response to a forum request, not because I thought it was a great benchmark.
I did an extensive post on the C2 CPU clocks in the hardware forum. I didn't look into thermal throttling there, but based on the current Hardkernel thermal triggers it would probably take a much longer test to hit thermal limits without the GPU also being active.

I also did thermal tests on the C2 in another post in the hardware forum.

My goal with the two tests was to understand the thermal and clock limits in order to build an overclocked C2, perhaps even using the Turbo boost code in our kernel to activate it. Sadly, overclocking will require Amlogic to change the FREQ/Volt tables in their binary blob in the boot section.
I've asked for that, but who knows if they will effectively "unlock" our tables for OC.

Unread post by tkaiser » Wed Mar 02, 2016 2:25 am

mlinuxguy wrote: I did the sysbench test in a response to a forum request, not because I thought it was a great benchmark

I know. My response was directed at @crossover's citation of my sysbench measurements over at http://forum.armbian.com/index.php/topi ... 64/?p=5731 (BTW: I don't want to attack anyone here or play 'we vs. them' games -- I'm also an ODROID user and will get the C2 as soon as it is available at pollin.de).

The only reason I published the '3.2562 seconds' sysbench result from the Pine64+ was as a response to the benchmark results published by the RPi Foundation, to show two things:

1. It might make a huge difference whether you optimise code for the hardware or not (something Eben Upton still doesn't seem to accept)
2. sysbench is obviously the wrong tool to compare different architectures

And then a forum user here takes these silly results as proof that the C2 is faster than the Pine64 (which definitely is the case -- but not based on calculating prime numbers! ;) ).

BTW: Pine64 results for prime limits of 10,000/20,000/200,000 were 3.25/8/185 seconds. And the 7zip score was 2569.

Now I did some optimisation and I get either 2.7902/6.8141/153.3979 (7zip: 2848) or 2.7739/7.1842/174.2047 (7zip: 2621). Why are the 10,000 results nearly identical while the results for 200,000 and 7zip differ that much? Which optimisation has been done? Pretty simple: I unlocked thermal throttling (not recommended!) by modifying the .dts and used a fan on the 1st run and just a heatsink on the 2nd. Now I'm able to clock at up to 1344 MHz, and the throttling strategy also changes and will keep the SoC at higher temperatures:

Image

The 7zip run especially is heavy, and without a fan throttling occurred almost immediately. And if we compare sysbench results and try to ignore their (non-existent) relevance, then the Pine64+ is nearly as fast as the C2: 2.4317s vs. 2.7902s and 6.0053s vs. 6.8141s (which is something that I don't understand, since the S905 should clock way higher and be able to perform better?). And all of this is plain bullshit -- the only lesson to learn from this comparison is to NOT blindly trust benchmark numbers, and to take the thermal stuff into account if you want to draw any conclusion from synthetic benchmarks regarding 'real world high load' situations.
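
To put numbers on how run length changes the picture, here is a small Python sketch that compares the fan and heatsink-only sysbench times quoted in this post. The helper name `slowdown_percent` is made up for illustration, and the sketch only restates the posted figures.

```python
# Execution times (s) from the two Pine64+ runs quoted in this post:
# first run with a fan, second run with only a heatsink.
with_fan = {10000: 2.7902, 20000: 6.8141, 200000: 153.3979}
heatsink_only = {10000: 2.7739, 20000: 7.1842, 200000: 174.2047}

def slowdown_percent(cooled_s: float, uncooled_s: float) -> float:
    """Percent increase in runtime of the less-cooled run over the cooled one."""
    return (uncooled_s / cooled_s - 1.0) * 100.0

for limit in with_fan:
    pct = slowdown_percent(with_fan[limit], heatsink_only[limit])
    print(f"prime limit {limit:>6}: {pct:+.1f}% runtime without fan")
```

The shortest run shows essentially no difference, while the longest run is clearly slower without the fan. That matches the point that tests lasting a few seconds say little about sustained load.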

BTW: I'm still curious why the S905 7zip score is that low. Did really no throttling happen?

PS: results, first run with fan, 2nd run without:

Code:

root@pine64plus:/sys/devices/system/cpu/cpu0/cpufreq# sysbench --test=cpu --cpu-max-prime=10000 run --num-threads=4 && sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=4 && sysbench --test=cpu --cpu-max-prime=200000 run --num-threads=4
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          2.7902s
    total number of events:              10000
    total time taken by event execution: 11.1134
    per-request statistics:
         min:                                  1.10ms
         avg:                                  1.11ms
         max:                                 21.12ms
         approx.  95 percentile:               1.11ms

Threads fairness:
    events (avg/stddev):           2500.0000/18.01
    execution time (avg/stddev):   2.7783/0.01

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          6.8141s
    total number of events:              10000
    total time taken by event execution: 27.2437
    per-request statistics:
         min:                                  2.72ms
         avg:                                  2.72ms
         max:                                  4.07ms
         approx.  95 percentile:               2.74ms

Threads fairness:
    events (avg/stddev):           2500.0000/1.73
    execution time (avg/stddev):   6.8109/0.00

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 200000


Test execution summary:
    total time:                          153.4298s
    total number of events:              10000
    total time taken by event execution: 613.5915
    per-request statistics:
         min:                                 60.91ms
         avg:                                 61.36ms
         max:                                 99.02ms
         approx.  95 percentile:              65.38ms

Threads fairness:
    events (avg/stddev):           2500.0000/5.43
    execution time (avg/stddev):   153.3979/0.02

root@pine64plus:/sys/devices/system/cpu/cpu0/cpufreq# 7zr b

7-Zip (A) 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=de_DE.UTF-8,Utf16=on,HugeFiles=on,4 CPUs)

RAM size:     979 MB,  # CPU hardware threads:   4
RAM usage:    850 MB,  # Benchmark threads:      4

Dict        Compressing          |        Decompressing
      Speed Usage    R/U Rating  |    Speed Usage    R/U Rating
       KB/s     %   MIPS   MIPS  |     KB/s     %   MIPS   MIPS

22:    1645   286    560   1600  |    45328   390   1048   4089
23:    1611   294    558   1641  |    44830   388   1057   4102
24:    1515   288    565   1629  |    43262   388   1034   4013
25:    1503   304    565   1716  |    42496   390   1024   3996
----------------------------------------------------------------
Avr:          293    562   1646               389   1041   4050
Tot:          341    801   2848

Code:

root@pine64plus:/sys/devices/system/cpu/cpu0/cpufreq# sysbench --test=cpu --cpu-max-prime=10000 run --num-threads=4 && sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=4 && sysbench --test=cpu --cpu-max-prime=200000 run --num-threads=4
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          2.7739s
    total number of events:              10000
    total time taken by event execution: 11.0834
    per-request statistics:
         min:                                  1.10ms
         avg:                                  1.11ms
         max:                                 11.22ms
         approx.  95 percentile:               1.11ms

Threads fairness:
    events (avg/stddev):           2500.0000/12.51
    execution time (avg/stddev):   2.7708/0.00

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 20000


Test execution summary:
    total time:                          7.1842s
    total number of events:              10000
    total time taken by event execution: 28.7091
    per-request statistics:
         min:                                  2.70ms
         avg:                                  2.87ms
         max:                                 10.98ms
         approx.  95 percentile:               3.05ms

Threads fairness:
    events (avg/stddev):           2500.0000/2.55
    execution time (avg/stddev):   7.1773/0.00

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 200000


Test execution summary:
    total time:                          174.2047s
    total number of events:              10000
    total time taken by event execution: 696.7132
    per-request statistics:
         min:                                 68.22ms
         avg:                                 69.67ms
         max:                                118.27ms
         approx.  95 percentile:              71.22ms

Threads fairness:
    events (avg/stddev):           2500.0000/5.61
    execution time (avg/stddev):   174.1783/0.03

root@pine64plus:/sys/devices/system/cpu/cpu0/cpufreq# 7zr b

7-Zip (A) 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=de_DE.UTF-8,Utf16=on,HugeFiles=on,4 CPUs)

RAM size:     979 MB,  # CPU hardware threads:   4
RAM usage:    850 MB,  # Benchmark threads:      4

Dict        Compressing          |        Decompressing
      Speed Usage    R/U Rating  |    Speed Usage    R/U Rating
       KB/s     %   MIPS   MIPS  |     KB/s     %   MIPS   MIPS

22:    1575   286    536   1533  |    42377   388    984   3823
23:    1439   284    515   1466  |    41216   388    972   3771
24:    1430   294    523   1537  |    39435   390    937   3658
25:    1408   302    531   1607  |    37968   390    916   3570
----------------------------------------------------------------
Avr:          292    526   1536               389    952   3706
Tot:          340    739   2621

Unread post by mlinuxguy » Wed Mar 02, 2016 2:48 am

Watching temperatures during: # sysbench --test=cpu --cpu-max-prime=200000 run --num-threads=4
Sampled every 5 seconds. Starting temps:

Code:

40000
39000

At end of run:

Code:

47000
48000
48000
49000

The max was 49000, so 49 °C.

From the device tree, their trip points are:

Code:

        soc_thermal {
            polling-delay = <1000>;
            polling-delay-passive = <100>;
            sustainable-power = <3600>;

            thermal-sensors = <&aml_sensor0 3>;

            trips {
                switch_on: trip-point@0 {
                    temperature = <70000>;
                    hysteresis = <1000>;
                    type = "passive";
                };
                control: trip-point@1 {
                    temperature = <80000>;
                    hysteresis = <1000>;
                    type = "passive";
                };
                hot: trip-point@2 {
                    temperature = <90000>;
                    hysteresis = <5000>;
                    type = "hot";
                };
                critical: trip-point@3 {
                    temperature = <110000>;
                    hysteresis = <1000>;
                    type = "critical";
                };
            };
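
To relate the logged temperatures to these trip points, a minimal Python sketch follows. It treats both the samples and the device tree `temperature` values as millidegrees Celsius, which is consistent with "49000 so 49C" above. The helper name `headroom_c` is illustrative.

```python
# Trip points (millidegrees Celsius) copied from the device tree excerpt above.
TRIP_POINTS = {
    "switch_on": 70000,
    "control": 80000,
    "hot": 90000,
    "critical": 110000,
}

# Temperature samples (millidegrees Celsius) logged during the 200000 run above.
samples = [40000, 39000, 47000, 48000, 48000, 49000]

def headroom_c(peak_milli_c: int, trip_milli_c: int) -> float:
    """Degrees Celsius between the observed peak and a trip point."""
    return (trip_milli_c - peak_milli_c) / 1000.0

peak = max(samples)
for name, trip in TRIP_POINTS.items():
    print(f"{name:>9}: {headroom_c(peak, trip):.0f} C above the {peak / 1000:.0f} C peak")
```

With these figures the peak stays well below the first passive trip point, so this run should not have triggered throttling.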

Unread post by tkaiser » Wed Mar 02, 2016 3:25 am

mlinuxguy wrote: From Devicetree their trip points are

Code:

        soc_thermal {
            polling-delay = <1000>;
            polling-delay-passive = <100>;
            sustainable-power = <3600>;

            thermal-sensors = <&aml_sensor0 3>;

            trips {
                switch_on: trip-point@0 {
                    temperature = <70000>;
                    hysteresis = <1000>;
                    type = "passive";
                };
                control: trip-point@1 {
                    temperature = <80000>;
                    hysteresis = <1000>;
                    type = "passive";
                };
                hot: trip-point@2 {
                    temperature = <90000>;
                    hysteresis = <5000>;
                    type = "hot";
                };
                critical: trip-point@3 {
                    temperature = <110000>;
                    hysteresis = <1000>;
                    type = "critical";
                };
            };
Thx! Ok, calculating prime numbers seems to be not that challenging given the thermal read outs and the trip points defined. And as already written: time to give up on sysbench to compare anything when we're talking about ARMv8. And the huge heatsink does apparently a pretty good job.

Some time ago the linux-sunxi devs developed a set of programs to heat up a SoC to the maximum, just to check whether it still runs stable: http://linux-sunxi.org/Hardware_Reliabi ... y_settings

Siarhei has now added cpuburn-a53 to it (https://github.com/ssvb/cpuburn-arm). It works so efficiently that my Pine64+ immediately deadlocks when I start it. I suspect undervoltage, since the Pine64 uses Micro USB for DC-IN (this is the other thing I really like about Hardkernel apart from the huge heatsink: a real power barrel/plug!). Maybe you want to give it a try? It would be really interesting to see how the S905 behaves (the A64, when limited to 816MHz at 1.1V, already draws a whopping 1100mA more according to the tool's author).

mlinuxguy
Posts: 840
Joined: Thu Feb 28, 2013 10:28 am
languages_spoken: english
ODROIDs: X, X2, XU, XU3, XU4, C1, C1+, C2, N1, USB-IO
Has thanked: 0
Been thanked: 0
Contact:

Re: What is the sysbench test result?

Unread post by mlinuxguy » Wed Mar 02, 2016 5:44 am

I had to rebuild the kernel and enable CPU Freq gov: userspace

Code: Select all

CPU stress test, which is doing JPEG decoding by libjpeg-turbo
at different cpufreq operating points.

Testing CPU 0
 2016 MHz ............................................................ OK
 1752 MHz ............................................................ OK
 1536 MHz ............................................................ OK
 1296 MHz ............................................................ OK
 1000 MHz ............................................................ OK
  500 MHz ............................................................ OK
  250 MHz ............................................................ OK
  100 MHz .......
It's still going. During the entire period the temperature was being logged:

Code: Select all

37000 <--- starting temp
42000
43000
42000
43000  <--- peak temp
43000
42000
42000
41000
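
For anyone who wants to collect a similar log, a small Python sketch follows. It assumes the sensor is exposed through the standard Linux thermal sysfs interface as thermal_zone0 (the zone index is an assumption and may differ per kernel); the helper names are made up. The summary is computed from the values logged above.

```python
from pathlib import Path

# Assumption: the SoC sensor shows up as thermal_zone0; adjust if needed.
TEMP_PATH = Path("/sys/class/thermal/thermal_zone0/temp")


def read_temp_mc(path=TEMP_PATH):
    """Read one temperature sample in millidegrees Celsius from sysfs."""
    return int(path.read_text().strip())


def summarize(samples_mc):
    """Return (start, peak, rise) in degrees Celsius for a list of samples."""
    start = samples_mc[0] / 1000
    peak = max(samples_mc) / 1000
    return start, peak, peak - start


# Values taken from the log above.
LOGGED = [37000, 42000, 43000, 42000, 43000, 43000, 42000, 42000, 41000]
print(summarize(LOGGED))
```

Calling read_temp_mc() in a loop with a sleep in between would produce a list that can be passed to summarize() in the same way.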

tkaiser

Re: What is the sysbench test result?

Unread post by tkaiser » Wed Mar 02, 2016 6:06 am

mlinuxguy wrote:I had to rebuild the kernel and enable CPU Freq gov: userspace
Sorry, the most important part for modern SBCs is unfortunately this:
There are border cases in which extended tests show a device might not be stable at certain settings even though they pass the tests in this script. Especially on a multi-core system you may want to run CPU-intensive tasks in the background while running cpufreq-ljt-stress-test in order to keep all cores busy. The cpuburn scripts (see below) or compiling a kernel might be suitable tasks for this end.
So without running heavy/demanding tasks in parallel, cpufreq-ljt-stress-test isn't of any use on multi-core SoCs (note to myself: talk with Siarhei about both wiki and script). But it really seems that thermal throttling on the C2 does not kick in due to CPU activity alone (maybe it does when using cpuburn-a53). That's definitely an advantage over A64/Pine64.

BTW: I just figured out that the A64 can be overvolted/overclocked just like its smaller 32-bit sibling H3 (the A64 is more or less an H3 with Cortex-A53 cores and fewer USB ports). Clocked at 1536 MHz, the Pine64 performs identically to the C2 as long as we're talking about pretty irrelevant sysbench scores :)

odroid

Re: What is the sysbench test result?

Unread post by odroid » Wed Mar 02, 2016 9:46 am

I fully agree that sysbench does not seem to be a good benchmark tool for comparing different platforms.

We already performed CPU/GPU/VPU combined intensive tests to check the thermal throttling issue on the C2 Android platform.
It was very hard to reach 65C at an ambient temperature of 25C. No thermal throttling happened during a test lasting several hours.
So the cooling effect of the stock heat sink should be acceptable and reliable. If we removed the heat sink, we could observe a bunch of thermal throttling events.

BTW, we will try to add a few more over-clock options in the Trustzone firmware with Amlogic's help. It was inspired by mlinuxguy.
But we are focusing on the x11 Mali 64bit/32bit driver development and the XEN/KVM implementation at this moment.
Once Kodi runs on Ubuntu 64bit, we may assign our available resources to developing Kernel 4.4 or to mainlining, starting around May.

tkaiser

Re: What is the sysbench test result?

Unread post by tkaiser » Wed Mar 02, 2016 4:16 pm

odroid wrote:It was very hard to reach 65C at an ambient temperature of 25C. No thermal throttling happened during a test lasting several hours.
So the cooling effect of the stock heat sink should be acceptable and reliable. If we removed the heat sink, we could observe a bunch of thermal throttling events.
Thanks for the confirmation. So it seems your board design using this huge heatsink is something others can learn from. The A64 used on the Pine64 definitely has thermal/throttling problems when used without a heatsink. Fortunately for Pine users, the SBC layout allows attaching huge heatsinks that also cover the DRAM, but due to missing mounting holes on the PCB this will be a bit challenging.

And I would also focus on really important areas like KVM and Mali issues (the Mali450 doesn't support OpenCL? That started with T600 right?) before thinking about overclocking (pretty useless unless you want to shine in benchmarks or do number crunching on the wrong device)

nobe
Posts: 129
Joined: Sun Feb 07, 2016 9:52 pm
languages_spoken: english, french
ODROIDs: Odroid-C2
Has thanked: 0
Been thanked: 0
Contact:

Re: What is the sysbench test result?

Unread post by nobe » Wed Mar 02, 2016 5:58 pm

tkaiser wrote:it seems your board design using this huge heatsink is something others can learn from
IMHO, they can also learn from the cases HK sells: putting air vents right above the heatsink should be mandatory, but unfortunately most TV box and SBC makers don't think about that.

odroid

Re: What is the sysbench test result?

Unread post by odroid » Wed Mar 02, 2016 6:00 pm

Yes, OpenGL ES 3.1, OpenCL 1.2 and Vulkan are available only on the Mali T6xx or higher like ODROID-XU4.

BTW, everybody could enjoy 10~20% of bonus performance by the overclocking on the ODROID-C1+ with mlinuxguy's nice patch.
So we expected it again on the C2. But the DVFS table seems to be controlled by some hidden firmware inside like Intel. :(

tkaiser

Re: What is the sysbench test result?

Unread post by tkaiser » Wed Mar 02, 2016 7:09 pm

odroid wrote:BTW, everybody could enjoy 10~20% of bonus performance by the overclocking on the ODROID-C1+ with mlinuxguy's nice patch.
So we expected it again on the C2. But the DVFS table seems to be controlled by some hidden firmware inside like Intel. :(
A locked dvfs table is not a bug but a feature! ;)

Yes, C1+ runs absolutely stable with 1.7GHz (thanks to the right decision to use that huge heatsink by default). That's the reason I chose to let it run at 1.7GHz when reviewing LeMaker's Guitar that has been advertised as being capable of running at 'up to 1.3GHz' but in reality Actions Semi's S500 SoC had such thermal troubles that this clockspeed can not be considered reliable on the S500 [1].

But to be honest: 1.5GHz vs. 1.7GHz (or now 2.0 vs. 2.2 with C2) might let benchmark scores look better but doesn't change anything in reality. Distributing IRQs across different CPU cores (at least assigning mmc0, eth0 and USB to cpu1/cpu2/cpu3), tweaking process scheduler settings and optimising software/settings is more important when it's about performance gains that really matter. Synthetic benchmarks that only test one specific irrelevant operation are not able to show this.
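
A rough Python sketch of what distributing IRQs across cores could look like. It relies on the Linux interface /proc/irq/<N>/smp_affinity, which takes a hexadecimal CPU bitmask. The IRQ numbers below are hypothetical placeholders (the real ones have to be looked up in /proc/interrupts), and the sketch only prints the commands instead of writing anything.

```python
def cpu_mask(cpus):
    """Build the hex bitmask that smp_affinity expects for a list of CPU indices."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def affinity_commands(irq_numbers, plan):
    """Return shell commands pinning each named IRQ to the CPUs in plan."""
    commands = []
    for name, cpus in plan.items():
        irq = irq_numbers[name]
        commands.append(f"echo {cpu_mask(cpus)} > /proc/irq/{irq}/smp_affinity")
    return commands


# Hypothetical IRQ numbers, for illustration only; check /proc/interrupts.
IRQ_NUMBERS = {"mmc0": 40, "eth0": 41, "usb": 42}
# One device per core, leaving cpu0 for everything else.
PLAN = {"mmc0": [1], "eth0": [2], "usb": [3]}

for command in affinity_commands(IRQ_NUMBERS, PLAN):
    print(command)
```

The masks for cpu1, cpu2 and cpu3 come out as 2, 4 and 8. The commands would have to be run as root, and a running irqbalance daemon may override them.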

In other words: I'm against overclocking since it's the wrong approach to solving 'real world' performance problems, especially when reliability is a concern (which was definitely not the case on the C1+). I'm already trying to convince other Pine64 devs to revert changes in this area ;) https://github.com/longsleep/build-pine64-image/pull/3


[1] Results should be taken with a grain of salt as usual:

Image

crossover

Re: What is the sysbench test result?

Unread post by crossover » Wed Mar 02, 2016 7:51 pm

OMG! I was just curious about the sysbench results.
But I feel that I was very stupid.
I've learned a lot from this long discussion. Thank you guys.

mlinuxguy

Re: What is the sysbench test result?

Unread post by mlinuxguy » Thu Mar 03, 2016 3:54 am

but to be honest: 1.5GHz vs. 1.7GHz (or now 2.0 vs. 2.2 with C2) might let benchmark scores look better but doesn't change anything in reality.
In most cases yes, as the entire system needs to be equal to the higher clocks. If not, the gains in real life are minuscule.

I typically overclock a system immediately so I can get a handle on how the various components that make up the processor react and how well it scales to real life applications. I've been overclocking since my first computer, a TRS-80 Model 3 (ran it at double the shipping clock speed with a switch on the side).
In many cases there is headroom for overclocking, especially at a new process node before an optimization pass.
So much of the gain you might see from an OC depends on L1 and L2 cache sizes, whether the RAM was also OC'd, and whether the app fits in the L1/L2 cache, among others.
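
As a rough illustration of why the gain depends on more than the core clock, here is an Amdahl-style estimate in Python. It assumes that only a fraction of the runtime scales with the CPU clock while the rest (memory, I/O) stays constant; that fraction is a made-up parameter, not something measured in this thread.

```python
def overall_speedup(cpu_bound_fraction, base_ghz, oc_ghz):
    """Estimate total speedup when only part of the runtime scales with the clock."""
    ratio = oc_ghz / base_ghz
    return 1.0 / ((1.0 - cpu_bound_fraction) + cpu_bound_fraction / ratio)


# Clock pairs mentioned in the thread; the fractions are assumptions.
for base, oc in ((1.5, 1.7), (2.0, 2.2)):
    for fraction in (1.0, 0.5):
        gain = (overall_speedup(fraction, base, oc) - 1.0) * 100
        print(f"{base} -> {oc} GHz, {fraction:.0%} clock-bound: +{gain:.1f}%")
```

A fully clock-bound workload gains the full clock ratio, and anything that waits on RAM or I/O gains less.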

I don't intend to get into an argument over the merits of overclocking or lack thereof... that has been argued extensively over decades.
Note: even Turbo Boost from Intel is essentially a thermally limited overclock

I started the OC exploration to answer my own questions about how this particular silicon scales and whether it would be possible to adapt the turbo boost code to support this Amlogic chip. Its application to real world workloads is unknown until we try... and getting locked out of trying by Trustzone is just like an Intel locked core (where they charge you more for the privilege of OC). Only here we don't even have the option of paying more to experiment.

tkaiser

Re: What is the sysbench test result?

Unread post by tkaiser » Thu Mar 03, 2016 10:51 pm

mlinuxguy wrote:
but to be honest: 1.5GHz vs. 1.7GHz (or now 2.0 vs. 2.2 with C2) might let benchmark scores look better but doesn't change anything in reality.
In most cases yes, as the entire system needs to be equal to the higher clocks. If not, the gains in real life are minuscule.
I fear we can already agree to disagree regarding overclocking :)

IMO it depends on the use case: If I want the C2 as some sort of a desktop machine then being able to unlock HW acceleration (GPU/VPU -- a matter of drivers and software optimisation) is way more important than CPU/memory clockspeed adjustments... and the other area where the C2 will really shine: high I/O throughput with low latency. Being able to use the rather expensive eMMC modules Hardkernel sells makes a huge or maybe the difference especially when comparing with Pine64 (A64 could use eMMC as well but the necessary pins are already used for other stuff on the Pine boards) and RPi. At least the Pine64 is limited to slow SD card transfers -- no idea whether that has changed with the RPI 3 in Raspberry land.

Other use case: High I/O demands. If it's about that, I would choose another SoC that is optimised in this area even if it's just dual core with a lower clockspeed, since being able to saturate SATA, GbE and a few USB ports is more important than CPU horsepower.

Other use case: Need for encryption (file system or for networking, eg. VPN server). Being able to use integrated crypto engines (AES-NI on Intel, CESA on Marvell SoCs, CAAM with Freescale's i.MX series and so on) is way more important than increasing clocks a little. And again: This is driver/software stuff.

Another use case: Number crunching. Sorry, you chose the wrong device, since you still don't get enough bang for the buck even when clocking the S905 at 2.5GHz.

IMO the only area where OC really matters is producing numbers without meaning (also called 'benchmarks') ;)

mlinuxguy

Re: What is the sysbench test result?

Unread post by mlinuxguy » Fri Mar 04, 2016 1:33 am

Interestingly enough I'm a pine64 kickstarter supporter (max'ed out board version)
When it finally comes in I'll have another 64-bit board to "peek" and "poke" values into strange memory locations

tkaiser

Re: What is the sysbench test result?

Unread post by tkaiser » Fri Mar 04, 2016 1:59 am

mlinuxguy wrote:Interestingly enough I'm a pine64 kickstarter supporter (max'ed out board version)
When it finally comes in I'll have another 64-bit board to "peek" and "poke" values into strange memory locations
Be prepared that you can tweak there a lot and think about ordering this piece of metal soon so that it arrives in time: http://www.ebay.com/itm/Aluminum-Heatsi ... 1518718878 (30x40mm would be better)

And when you start overvolting the SoC maybe a heatsink for the AXP803 PMIC might be necessary too (9x9mm in size, you might use 14x14mm max due to placement of other PCB components)

But first you'll have to unsolder the Micro USB receptacle and replace it with something more sane to power the board since it will simply deadlock when you try the overclocking game with this inappropriate DC-IN solution :)

tkaiser

Re: What is the sysbench test result?

Unread post by tkaiser » Tue Mar 08, 2016 7:38 pm

BTW: http://openbenchmarking.org/result/1603 ... 603082GA36

So I played the OC game (unlocking 1344 MHz on Pine64, defining less aggressive throttling strategies and using a fan since currently I've only a pretty small and crappy heatsink applied to the A64 so a fan has to help too). Necessary comments here: http://kaiser-edv.de/tmp/mc6CyL/PTS_Settings.txt

I managed to stay almost the whole test duration at the upper clockspeed (no wonder, the Phoronix tests are rather lightweight):

Image

IMO three conclusions can be drawn from the results:
  - ODROID-C2 is the fastest of them, and especially the superior heat dissipation and the option to use eMMC 5.0 might speed things up
  - RPi 3 users can hope that the RPi foundation will someday get the idea to move Raspbian from ARMv6 to ARMv8 (then benchmark scores will improve a lot)
  - Pine64 is a bit in a loser's position even if the numbers might look different. It's already challenging to let it run stable at this clockspeed (both throttling and the insufficient DC-IN solution being responsible)
The good news (for Pine64 users and especially the overclocker's camp): It's possible to feed DC-IN through the so-called Euler connector, and the Pine64 guys are evaluating a heatsink set (also containing thermal pads). And also funny: since the A64 is nearly the same as Allwinner's H3, just with fewer USB ports and Cortex-A53 cores instead, stuff like GPU and VPU acceleration might be ported easily to the A64 (I've been surprised how well the H3 plays H.264/H.265 hardware accelerated. Details as usual in the Armbian forums)
