Micro-benchmarking the N2

blu
Posts: 78
Joined: Wed Mar 08, 2017 11:30 pm
languages_spoken: english
ODROIDs: XU4, N2
Has thanked: 3 times
Been thanked: 20 times
Contact:

Micro-benchmarking the N2

Unread post by blu » Sun May 12, 2019 4:03 am

I finally got my hands on the elusive N2, and unleashed CPU microbenchmarks on the little machine without mercy.

(^f Amlogic)
Mandelbrot in BF interpreter
prime factorization
GEMM

For branchy integer code CA73 is a clear winner over CA72. For neon/asimd2 code CA73 is reasonably close to CA72 (~17% difference).
All in all, rarely have I seen such results from an $80 passively-cooled machine. Good job, HK/Amlogic!

odroid
Site Admin
Posts: 32734
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English
ODROIDs: ODROID
Has thanked: 220 times
Been thanked: 369 times
Contact:

Re: Micro-benchmarking the N2

Unread post by odroid » Mon May 13, 2019 9:13 am

Thank you for sharing the detailed benchmark results.
I hope the N2 proves to be a quiet and cool device. :)

Unread post by blu » Mon May 13, 2019 1:19 pm

odroid wrote:
Mon May 13, 2019 9:13 am
Thank you for sharing the detail benchmark test results.
I hope the N2 must be a quiet and cool device. :)
It's as quiet as it gets and, in contrast to other fanless boards, the lack of a fan is entirely justified. Advanced chip fab nodes FTW : )

Unread post by odroid » Mon May 13, 2019 1:56 pm

Glad to know that we didn't waste money on making a bulky and heavy heatsink.
But I have to agree that the modern 12nm fab silicon must be the main reason why we don't need a noisy fan. :D

Unread post by blu » Fri May 24, 2019 7:10 am

Finally got to benchmark the Mali-G52 on rudimentary raytracing in OCL, and that little bugger actually outperforms a 2010 macbook (geforce 320M):

geforce 320M: 47fps
mali-G52 mp2: 50fps
Last edited by blu on Mon Jul 22, 2019 7:55 pm, edited 2 times in total.

Unread post by odroid » Fri May 24, 2019 8:51 am

Thank you for sharing another nice GPU benchmark result.

Unread post by blu » Sat Jun 15, 2019 7:07 am

And here's how S922X performs at a very BW-demanding CPU task -- variations of a single-threaded binary search -- in comparison to other machines:

[Image: binary-search benchmark results for the S922X and other machines]

(note: all quoted RAM BWs are theoretical)

Unread post by blu » Wed Jun 19, 2019 5:28 am

Here's the code for the above binary-search benchmark (repo wasn't public at the time of the post).

Unread post by blu » Wed Jul 17, 2019 8:30 pm

BTW, something I've noticed in a couple of microbenches: gcc/g++-8.x tends to produce better-scheduled code when tuned for CA57 (-mtune/-mcpu=cortex-a57) over native tuning (-mtune/-mcpu=cortex-a73). That seems to be a recurring issue with gcc (and occasionally clang) across several cortex big uarchs. So as a word of advice, check your performance-critical tuning with both schedulers.
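
One way to act on that advice is to build the same source once per scheduling model and time the resulting binaries. A minimal sketch, assuming gcc is on the PATH; `bench.c`, the output names and the helper `build_commands` are made up for illustration and are not part of the benchmarks in this thread:

```python
import shlex

# Scheduling models to compare; both are valid AArch64 -mtune values in GCC.
TUNINGS = ["cortex-a57", "cortex-a73"]


def build_commands(source, tunings=TUNINGS, cc="gcc", opt="-O3"):
    """Return one (output_name, argv) pair per tuning target."""
    commands = []
    for cpu in tunings:
        out = "bench_" + cpu.replace("-", "_")
        argv = [cc, opt, "-mtune=" + cpu, source, "-o", out]
        commands.append((out, argv))
    return commands


if __name__ == "__main__":
    # bench.c is a hypothetical file name; substitute the real benchmark.
    for out, argv in build_commands("bench.c"):
        print(shlex.join(argv))
```

Each printed command can be run as-is, and the two binaries timed back to back on the same core.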

Unread post by blu » Sun Jul 21, 2019 12:05 am

Mali-G52's performance has been the hot topic of conversations recently, so I decided to compile some ancient GLES2/3 unit tests of mine for the N2, and publish the binaries so that everybody can do GPU cycle burning to their heart's desire : )

On a more serious note, I wanted to test the 4K (3840x2160) performance of G52 as found in the N2. And I was impressed!

All four unit tests run at 4K with 16x MSAA (multi-sampled anti-aliasing) at a nearly rock-steady 60 fps -- very few frames are dropped (for an avg fps of ~59.9).

Running the tests for ~3.3 minutes in a row (3000 frames each) brings the SoC temp to ~50C at an ambient temp of 25-26C. Running the tests for ~4.4 minutes (4000 frames each) bumps the temp to ~53C. Running the tests for ~5.5 minutes does not raise the temp any further. All tests were pinned to the big cores.
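
The quoted durations line up with four tests of 3000 and 4000 frames each at a nominal 60 fps. That reading of the frame counts is an assumption; the sketch below only checks the arithmetic:

```python
def run_minutes(tests, frames_per_test, fps=60.0):
    """Wall-clock minutes needed to render tests * frames_per_test frames."""
    return tests * frames_per_test / fps / 60.0


if __name__ == "__main__":
    # Four unit tests run back to back, as described in the post.
    for frames in (3000, 4000):
        print(frames, "frames each:", round(run_minutes(4, frames), 2), "min")
```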

Binaries were built on a 4.9.162-22 kernel with the Bifrost Mali fbdev blob ver 'git.c8adbf9'.

[ed] To give some frame of reference -- the performance of these tests on other GPUs:
  • first test, Mali T720mp2 : 57.0fps @ 1920x1200, no MSAA (G52mp2 does 3840x2160x16 / 1920 / 1200 = 57.6x samples, @ 59.9fps)
  • first test, Rogue GX6250 : 30.0fps @ 2160x2160, no MSAA (G52mp2 does 16x the amount of samples per frame @ 59.9fps) (note 1)
  • first test, Rogue GX6250 : 59.4fps @ 1920x1080, no MSAA (G52mp2 does 64x the amount of samples per frame @ 59.9fps) (note 1)
  • second test, Rogue GX6250 : 29.2fps @ 2160x2160, no MSAA (G52mp2 does 16x the amount of samples per frame @ 59.9fps) (note 1)
  • second test, Rogue GX6250 : 59.5fps @ 1920x1080, no MSAA (G52mp2 does 64x the amount of samples per frame @ 59.9fps) (note 1)
  • fourth test, Mali T720mp2 : 56.9fps @ 1920x1200, no MSAA (G52mp2 does 57.6x the amount of samples per frame @ 59.9fps)
  • fourth test, Rogue GX6250 : 26.2fps @ 2160x2160, no MSAA (G52mp2 does 16x the amount of samples per frame @ 59.9fps) (note 1)
  • fourth test, Rogue GX6250 : 54.0fps @ 1920x1080, no MSAA (G52mp2 does 64x the amount of samples per frame @ 59.9fps) (note 1) -- SoC reached 53C within 4 minutes at 24C ambient.
note 1: Vsync capped at 30hz @ 4K (HDMI port limitation). Test goes over DRI drivers w/ alpha compositing; I've tried shrinking the 4K viewport horizontal margins -- does not affect the rasterized triangle area, but reduces the area composited with the clear background color.
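
The sample ratios in the list can be reproduced with a few lines. A sketch, where the function names are mine; the 16x figures for the 2160x2160 runs appear to count only the MSAA factor, since per note 1 the rasterized triangle area is unchanged:

```python
def samples_per_frame(width, height, msaa=1):
    """Samples written per frame: pixels times MSAA factor (1 = no MSAA)."""
    return width * height * max(msaa, 1)


def sample_ratio(ref, other):
    """How many times more samples per frame `ref` renders than `other`."""
    return samples_per_frame(*ref) / samples_per_frame(*other)


G52_RUN = (3840, 2160, 16)  # N2: 4K with 16x MSAA

if __name__ == "__main__":
    print(sample_ratio(G52_RUN, (1920, 1200, 1)))  # vs T720mp2 run: 57.6
    print(sample_ratio(G52_RUN, (1920, 1080, 1)))  # vs GX6250 1080p run: 64.0
```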
Last edited by blu on Wed Jul 24, 2019 4:42 am, edited 7 times in total.

Unread post by odroid » Mon Jul 22, 2019 7:42 pm

Glad to hear the SoC temperature was not higher than 55°C with the very heavy 4K GPU burning tests. :)

BTW, can I ask you which CPU has the Mali-T720MP2?

Unread post by blu » Mon Jul 22, 2019 8:05 pm

T720mp2 is a 4x CA53 @ 1.51GHz machine. BTW, just so you get an idea how demanding such GPU tests can be to the CPU -- if I don't pin the first test to the big cores on N2, the CPU delay in the submission of command buffers (due to switching to less performant little cores) causes the 4k 16x MSAA fps to immediately drop to 30 fps.
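
For anyone wanting to reproduce the pinning, a minimal Linux-only sketch follows. It assumes the Cortex-A73 cores of the S922X are CPUs 2-5 (with 0-1 being the Cortex-A53 pair); check that on the target with lscpu before relying on it. The helper name `pin_to` is made up for illustration:

```python
import os

# Assumption: big (Cortex-A73) cores are CPUs 2-5 on the N2.
BIG_CORES = {2, 3, 4, 5}


def pin_to(cpus):
    """Restrict the calling process to `cpus`, skipping CPUs it may not use.

    Returns the resulting affinity set. If none of the requested CPUs is
    available, the affinity is left unchanged.
    """
    allowed = os.sched_getaffinity(0)
    target = set(cpus) & allowed
    if not target:
        return allowed
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print(sorted(pin_to(BIG_CORES)))
```

From a shell, `taskset -c 2-5 <binary>` achieves the same thing for an existing executable.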

Unread post by odroid » Mon Jul 22, 2019 8:24 pm

Ok. It seems to be a MediaTek device.

I have no idea why big-little CPU core switching makes such a huge difference in GPU performance.
Is the GPU IRQ handler assigned to a big core?

memeka
Posts: 4399
Joined: Mon May 20, 2013 10:22 am
languages_spoken: english
ODROIDs: XU rev2 + eMMC + UART
U3 + eMMC + IO Shield + UART
Has thanked: 2 times
Been thanked: 48 times
Contact:

Re: Micro-benchmarking the N2

Unread post by memeka » Mon Jul 22, 2019 9:45 pm

This also means that the benchmark is pretty useless, since you are comparing A73-based performance on G52 with A53-based performance on T720.
So more realistically you should compare same resolution on same CPU cores :P

Unread post by blu » Mon Jul 22, 2019 9:47 pm

odroid wrote:
Mon Jul 22, 2019 8:24 pm
Ok. It seems to be a MediaTek device.

I have no idea why big-little CPU core switching makes the huge difference of the GPU performance.
Is the GPU IRQ handler assigned to a big core?
That's right, the T720mp2 device is a MT8163A.

The halving of GPU performance comes from the vsync (the test is vsync'd on all devices). I should stress that 16x MSAA has a huge impact on fillrate -- that's 16x the depth samples of normal rendering. If I run the same tests sans MSAA, the first test has no problem running 4K @ 60fps pinned to the little cores of the N2. So generally (and without profiling the setup in detail), my guess is that the GPU is really stressed at 4K 16x MSAA @ 60 fps, and delaying the command buffer submission by the CPU (big -> little "degradation") causes the GPU to consistently miss its ~16.7ms frame window when vsync'd.
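
The drop from 60 straight to 30 fps follows from how vsync quantizes frame times. A simplified model, assuming strict double-buffered vsync with no triple buffering; the frame times used are illustrative, not measured:

```python
import math


def vsynced_fps(frame_ms, refresh_hz=60.0):
    """Displayed fps when every frame must wait for the next vblank."""
    interval_ms = 1000.0 / refresh_hz
    # A frame that misses its window is shown at the following refresh,
    # so the cost is always a whole number of refresh intervals.
    intervals = max(1, math.ceil(frame_ms / interval_ms))
    return refresh_hz / intervals


if __name__ == "__main__":
    for ms in (16.0, 17.0, 34.0):
        print(ms, "ms per frame ->", vsynced_fps(ms), "fps")
```

In this model a frame that takes only slightly longer than one refresh interval already costs two intervals, which is why a small extra CPU delay can halve the frame rate.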
memeka wrote:
Mon Jul 22, 2019 9:45 pm
This also means that the benchmark is pretty useless, since you are comparing A73-based performance on G52 with A53-based performance on T720.
So more realistically you should compare same resolution on same CPU cores :P
Well, I don't have a CA73-based T720 device, so it is what it is ; )

Seriously, though, the test was done to push the thermals of the N2, and the N2 took it like a champ. I'll post results from the Jetson Nano later this week (away from home ATM).

back2future
Posts: 265
Joined: Sun Jul 23, 2017 3:19 pm
languages_spoken: english
Has thanked: 11 times
Been thanked: 5 times
Contact:

Re: Micro-benchmarking the N2

Unread post by back2future » Thu Jul 25, 2019 6:17 am

[
Binaries were build on an 4.9.162-22 kernel
That kernel shows repeating blocking behaviour (up to once per minute for ~1 second, not regularly) for mouse and keyboard?
Benchmarking works. ]

Unread post by blu » Thu Jul 25, 2019 5:35 pm

back2future wrote:
Thu Jul 25, 2019 6:17 am
That kernel shows repeating blocking behaviour (up to once per minute for ~1 second, not regularly) for mouse and keyboard?
I had no idea of that quirk, as it has not manifested for me. Could it be because I have no mouse attached, just a kbd?
Benchmarking works.
Good to know it runs outside of my N2 too : ) In theory it should not depend on the kernel version (i.e. as long as the kernel has the relevant Mali IOCTL code), but it *might* depend on the Mali fbdev stack version, as the test relies on certain EGL config IDs hardcoded in the script, and those IDs could change across libEGL (i.e. libMali) versions.

Unread post by back2future » Fri Jul 26, 2019 6:59 am

I had no idea of that quirk, as it has not manifested to me. Could it be since I have no mouse attached, just a kbd?
While checking journalctl I noticed that a secondary USB hub, which provides the keyboard and mouse connection, gets disconnected often with 4.9.162-22, but not with 4.9.156-14. So it is a very local situation and problem. The only difficulty is that the benchmark can't be run on the 4.9.156-14 kernel.
Good to know it runs outside of my N2 too : ) In theory it should not depend on the kernel version (i.e. as long as kernel has the relevant Mali IOCTL code), but it *might* depend on the Mali fbdev stack version, as test relies on certain EGL config IDs, hardcoded in the script, which IDs could change across libEGL (i.e. libMali) versions.
Thx for the explanation. Might help with: Do g-truc/ogl-samples compile and run on Your N2 (ACK for x86_64)?
[ https://github.com/dv1/eglinfo ]

Unread post by blu » Fri Aug 02, 2019 6:13 am

So now that I have an aarch64 OCL stack on my nanoPi M4 (thanks to the good folks at Armbian and Rockchip's github), I can finally compare Midgard and Bifrost side by side (literally, on the same desk : ) in octree ray-traversal:

NanoPi-M4: Rockchip RK3399:
	2x Cortex-A72 @ 1.8GHz, 4x Cortex-A53 @ 1.42GHz, Mali-T860 MP4 @ 800MHz, 4GB LPDDR3 x64 @ 1600MT/s (12.8GB/s)

$ ./problem_4 -frames 100
kernel preferred workgroup size multiple: 4
device max work-item sizes: 256 256 256
total frames rendered: 100
elapsed time: 2.384069 s
average FPS: 41.945092

Odroid-N2: Amlogic S922X:
	4x Cortex-A73 @ 1.8GHz, 2x Cortex-A53 @ 1.896GHz, Mali-G52 MP2 (2x3) @ 750MHz, 4GB LPDDR4 x32 @ 2640MT/s (10.56GB/s)

$ ./problem_4 -frames 100                        
kernel preferred workgroup size multiple: 8
device max work-item sizes: 384 384 384
total frames rendered: 100
elapsed time: 1.995514 s
average FPS: 50.112401
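
The figures in the two listings can be cross-checked from the raw numbers. A small sketch; the function names are mine, and the bandwidth formula is the usual theoretical peak (bus width in bytes times transfer rate):

```python
def peak_bandwidth_gbs(bus_bits, mega_transfers):
    """Theoretical peak RAM bandwidth in GB/s for a bus width and MT/s rate."""
    return bus_bits / 8 * mega_transfers / 1000.0


def average_fps(frames, seconds):
    """Average frames per second over a timed run."""
    return frames / seconds


m4 = average_fps(100, 2.384069)  # NanoPi M4, Mali-T860 MP4
n2 = average_fps(100, 1.995514)  # Odroid-N2, Mali-G52 MP2

if __name__ == "__main__":
    print(f"M4 {m4:.2f} fps, N2 {n2:.2f} fps, ratio {n2 / m4:.3f}")
    print(peak_bandwidth_gbs(64, 1600), peak_bandwidth_gbs(32, 2640))
```

The ratio comes out just under 1.2, i.e. the N2 is roughly 20% faster in this particular test despite having the lower theoretical RAM bandwidth.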
back2future wrote:
Fri Jul 26, 2019 6:59 am
Thx for the explanation. Might help with: Do g-truc/ogl-samples compile and run on Your N2 (ACK for x86_64)?
[ https://github.com/dv1/eglinfo ]
Not yet, though by the looks of it eglinfo *should* compile for target fbdev on the N2. I'll report later.

Unread post by odroid » Fri Aug 02, 2019 8:40 am

Thank you for the OCL comparison test.
It seems that N2 has 20% faster OpenCL GPU performance than N1.

Unread post by blu » Fri Aug 02, 2019 2:33 pm

odroid wrote:
Fri Aug 02, 2019 8:40 am
Thank you for the OCL comparison test.
It seems that N2 has 20% faster OpenCL GPU performance than N1.
Keep in mind that this test is very ALU-friendly to the VLIW arch of Midgard; other workloads could exhibit much larger differences. Bifrost is a good GPGPU arch (tm).

Unread post by odroid » Fri Aug 02, 2019 2:42 pm

Yes, the difference must vary depending on the type of workload, as in this example:
viewtopic.php?f=176&t=34020

Unread post by back2future » Fri Aug 02, 2019 3:01 pm

[ A question about "GPU & VRAM profiling" fits this context: is there top-like monitoring software (as exists e.g. for Intel) for Mali (multi) GPUs on *nix OSes?

While programming for the 64-bit CPU pipeline width is not much of an advantage to userland right now (compared to the memory usage efficiency of 32-bit @ <2-4GB RAM), does the comparison between RK3399 and S922X(-B) show an advantage of a 64-bit memory bus width for LPDDR3, or even more so for LPDDR4?

eglinfo compiles on N2:

EGL information:
    API version:    1.4
    vendor string:  ARM
    version string: 1.4 Bifrost-"git"
    client APIs:    OpenGL_ES
    extensions:
      EGL_KHR_partial_update
      EGL_EXT_image_dma_buf_import
      EGL_KHR_config_attribs
      EGL_KHR_image
      EGL_KHR_image_base
      EGL_KHR_fence_sync
      EGL_KHR_wait_sync
      EGL_KHR_gl_colorspace
      EGL_KHR_get_all_proc_addresses
      EGL_IMG_context_priority
      EGL_KHR_no_config_context
      EGL_ARM_pixmap_multisample_discard
      EGL_ARM_implicit_external_sync
      EGL_KHR_gl_texture_2D_image
      EGL_KHR_gl_renderbuffer_image
      EGL_KHR_create_context
      EGL_KHR_surfaceless_context
      EGL_KHR_gl_texture_cubemap_image
      EGL_EXT_create_context_robustness
   number of configurations: 25

win = window      (c) = conformant             slow      = slow config               gl      = Desktop OpenGL                                                         
pb  = pbuffer     (n) = non-conformant         nonconfmt = non-conformant config     es1,es2 = OpenGL ES 1.x/2.x                                                      
pix = pixmap                                                                         vg      = OpenVG                                                                 

      #      ID  LEVEL  COLORBUFFER..........  DEPTH  STENCIL  MULTISAMPLE....  VISUAL.......  SURFACES..  RENDERABLES......................  TRANSPARENT..  CAVEAT...
                        type size  r  g  b  a  size   size     samples buffers  type   id                  apis                       native  type  r  g  b           
      0       1      0  rgb    32  8  8  8  8     0      0           0       0  0x0000 0x10bba0a  win,pb,pix  es1(c),es2(c)              no      none  0  0  0  none     
      1       2      0  rgb    32  8  8  8  8    24      0           0       0  0x0000 0x10bba0a  win,pb,pix  es1(c),es2(c)              no      none  0  0  0  none     
      2       3      0  rgb    32  8  8  8  8    24      8           0       0  0x0000 0x10bba0a  win,pb,pix  es1(c),es2(c)              no      none  0  0  0  none     
      3       4      0  rgb    32  8  8  8  8    24      8           4       1  0x0000 0x10bba0a  win,pb,pix  es1(c),es2(c)              no      none  0  0  0  none     
      4       5      0  rgb    16  5  6  5  0     0      0           0       0  0x3038 0x0000  pb,pix      es1(c),es2(c)              no      none  0  0  0  none     
      5       6      0  rgb    16  5  6  5  0    24      0           0       0  0x3038 0x0000  pb,pix      es1(c),es2(c)              no      none  0  0  0  none     
      6       7      0  rgb    16  5  6  5  0    24      8           0       0  0x3038 0x0000  pb,pix      es1(c),es2(c)              no      none  0  0  0  none     
      7       8      0  rgb    16  5  6  5  0    24      8           4       1  0x3038 0x0000  pb,pix      es1(c),es2(c)              no      none  0  0  0  none     
      8       9      0  rgb    24  8  8  8  0     0      0           0       0  0x0000 0x10bba0a  win,pb,pix  es1(c),es2(c)              no      none  0  0  0  none     
      9      10      0  rgb    24  8  8  8  0    24      8           0       0  0x0000 0x10bba0a  win,pb,pix  es1(c),es2(c)              no      none  0  0  0  none     
     10      11      0  rgb    24  8  8  8  0     0      0           4       1  0x0000 0x10bba0a  win,pb,pix  es1(c),es2(c)              no      none  0  0  0  none     
     11      12      0  rgb    24  8  8  8  0    24      8           4       1  0x0000 0x10bba0a  win,pb,pix  es1(c),es2(c)              no      none  0  0  0  none     
     12      13      0  rgb    16  5  5  5  1    24      8           0       0  0x3038 0x0000  pb,pix      es1(c),es2(c)              no      none  0  0  0  none     
     13      14      0  rgb    16  5  5  5  1    24      8           4       1  0x3038 0x0000  pb,pix      es1(c),es2(c)              no      none  0  0  0  none     
     14      15      0  rgb    16  4  4  4  4    24      8           0       0  0x3038 0x0000  pb,pix      es1(c),es2(c)              no      none  0  0  0  none     
     15      16      0  rgb    16  4  4  4  4    24      8           4       1  0x3038 0x0000  pb,pix      es1(c),es2(c)              no      none  0  0  0  none     
     16      17      0  rgb    32  8  8  8  8    24      8           8       1  0x0000 0x10bba0a  win,pb,pix  es1(c),es2(c)              no      none  0  0  0  none     
     17      18      0  rgb    16  5  6  5  0    24      8           8       1  0x3038 0x0000  pb,pix      es1(c),es2(c)              no      none  0  0  0  none     
     18      19      0  rgb    24  8  8  8  0    24      8           8       1  0x0000 0x10bba0a  win,pb,pix  es1(c),es2(c)              no      none  0  0  0  none     
     19      20      0  rgb    32  8  8  8  8    24      8          16       1  0x0000 0x10bba0a  win,pb,pix  es1(c),es2(c)              no      none  0  0  0  none     
     20      21      0  rgb    16  5  6  5  0    24      8          16       1  0x3038 0x0000  pb,pix      es1(c),es2(c)              no      none  0  0  0  none     
     21      22      0  rgb    24  8  8  8  0    24      8          16       1  0x0000 0x10bba0a  win,pb,pix  es1(c),es2(c)              no      none  0  0  0  none     
     22      23      0  rgb    24  8  8  8  0     0      0           0       0  0x3038 0x0000  pb          es1(c),es2(c)              no      none  0  0  0  none     
     23      24      0  rgb    64 16 16 16 16    24      8           0       0  0x3038 0x0000  pb          es1(n),es2(c)              no      none  0  0  0  none     
     24      25      0  rgb    32 10 10 10  2    24      8           0       0  0x3038 0x0000  pb          es1(n),es2(c)              no      none  0  0  0  none     

Could not find config for OpenGL (perhaps this API is unsupported?)

OpenGL ES 1 information:
    version string:  OpenGL ES-CM 1.1 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
    renderer string: Mali-G52
    extensions:
      GL_OES_byte_coordinates
      GL_OES_fixed_point
      GL_OES_single_precision
      GL_OES_matrix_get
      GL_OES_compressed_paletted_texture
      GL_OES_point_size_array
      GL_OES_point_sprite
      GL_OES_read_format
      GL_OES_compressed_ETC1_RGB8_texture
      GL_OES_depth24
      GL_OES_stencil8
      GL_OES_framebuffer_object
      GL_OES_packed_depth_stencil
      GL_OES_rgb8_rgba8
      GL_EXT_read_format_bgra
      GL_OES_matrix_palette
      GL_OES_extended_matrix_palette
      GL_OES_draw_texture
      GL_OES_blend_equation_separate
      GL_OES_blend_func_separate
      GL_OES_blend_subtract
      GL_OES_stencil_wrap
      GL_OES_texture_mirrored_repeat
      GL_EXT_texture_format_BGRA8888
      GL_OES_query_matrix
      GL_OES_EGL_image
      GL_OES_EGL_image_external
      GL_OES_EGL_sync
      GL_OES_texture_npot
      GL_OES_vertex_half_float
      GL_OES_required_internalformat
      GL_OES_vertex_array_object
      GL_OES_mapbuffer
      GL_OES_fbo_render_mipmap
      GL_OES_element_index_uint
      GL_ARM_rgba8
      GL_EXT_blend_minmax
      GL_EXT_discard_framebuffer
      GL_EXT_texture_storage
      GL_OES_texture_compression_astc
      GL_KHR_texture_compression_astc_ldr
      GL_KHR_texture_compression_astc_hdr
      GL_KHR_texture_compression_astc_sliced_3d
      GL_EXT_texture_compression_astc_decode_mode
      GL_EXT_texture_compression_astc_decode_mode_rgb9e5
      GL_OES_surfaceless_context
      GL_EXT_multisampled_render_to_texture
      GL_OES_texture_cube_map
      GL_KHR_debug
      GL_EXT_sRGB
      GL_EXT_robustness
      GL_EXT_texture_filter_anisotropic
  main stats:
    max texture size:                 8192
    max cubemap texture size:         4096
    max texture image units:          85
    max renderbuffer size:            8192
    max combined texture image units: 127
    num compressed texture formats:   69
    aliased line width range:         1 - 4096
    aliased point size range:         1 - 1024
    implementation color read format: RGB
    implementation color read type:   unsigned byte
    max viewport dimensions:          8192 x 8192
    subpixel bits:                    8
    supported compressed texture formats:
      PALETTE4_RGB8_OES
      PALETTE4_RGBA8_OES
      PALETTE4_R5_G6_B5_OES
      PALETTE4_RGBA4_OES
      PALETTE4_RGB5_A1_OES
      PALETTE8_RGB8_OES
      PALETTE8_RGBA8_OES
      PALETTE8_R5_G6_B5_OES
      PALETTE8_RGBA4_OES
      PALETTE8_RGB5_A1_OES
      ETC1_RGB8
      0x9274
      0x9275
      0x9278
      0x9279
      0x9276
      0x9277
      0x9270
      0x9272
      0x9271
      0x9273
      COMPRESSED_RGBA_ASTC_4x4_KHR
      COMPRESSED_RGBA_ASTC_5x4_KHR
      COMPRESSED_RGBA_ASTC_5x5_KHR
      COMPRESSED_RGBA_ASTC_6x5_KHR
      COMPRESSED_RGBA_ASTC_6x6_KHR
      COMPRESSED_RGBA_ASTC_8x5_KHR
      COMPRESSED_RGBA_ASTC_8x6_KHR
      COMPRESSED_RGBA_ASTC_8x8_KHR
      COMPRESSED_RGBA_ASTC_10x5_KHR
      COMPRESSED_RGBA_ASTC_10x6_KHR
      COMPRESSED_RGBA_ASTC_10x8_KHR
      COMPRESSED_RGBA_ASTC_10x10_KHR
      COMPRESSED_RGBA_ASTC_12x10_KHR
      COMPRESSED_RGBA_ASTC_12x12_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_5x4_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_5x5_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_6x5_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_6x6_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_8x5_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_8x6_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_8x8_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_10x5_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_10x6_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_10x8_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_10x10_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_12x10_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_12x12_KHR
      0x93c0
      0x93c1
      0x93c2
      0x93c3
      0x93c4
      0x93c5
      0x93c6
      0x93c7
      0x93c8
      0x93c9
      0x93e0
      0x93e1
      0x93e2
      0x93e3
      0x93e4
      0x93e5
      0x93e6
      0x93e7
      0x93e8
      0x93e9

OpenGL ES 2 information:
    version string:  OpenGL ES 3.2 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
    renderer string: Mali-G52
    extensions:
      GL_ARM_rgba8
      GL_ARM_mali_shader_binary
      GL_OES_depth24
      GL_OES_depth_texture
      GL_OES_depth_texture_cube_map
      GL_OES_packed_depth_stencil
      GL_OES_rgb8_rgba8
      GL_EXT_read_format_bgra
      GL_OES_compressed_paletted_texture
      GL_OES_compressed_ETC1_RGB8_texture
      GL_OES_standard_derivatives
      GL_OES_EGL_image
      GL_OES_EGL_image_external
      GL_OES_EGL_image_external_essl3
      GL_OES_EGL_sync
      GL_OES_texture_npot
      GL_OES_vertex_half_float
      GL_OES_required_internalformat
      GL_OES_vertex_array_object
      GL_OES_mapbuffer
      GL_EXT_texture_format_BGRA8888
      GL_EXT_texture_rg
      GL_EXT_texture_type_2_10_10_10_REV
      GL_OES_fbo_render_mipmap
      GL_OES_element_index_uint
      GL_EXT_shadow_samplers
      GL_OES_texture_compression_astc
      GL_KHR_texture_compression_astc_ldr
      GL_KHR_texture_compression_astc_hdr
      GL_KHR_texture_compression_astc_sliced_3d
      GL_EXT_texture_compression_astc_decode_mode
      GL_EXT_texture_compression_astc_decode_mode_rgb9e5
      GL_KHR_debug
      GL_EXT_occlusion_query_boolean
      GL_EXT_disjoint_timer_query
      GL_EXT_blend_minmax
      GL_EXT_discard_framebuffer
      GL_OES_get_program_binary
      GL_OES_texture_3D
      GL_EXT_texture_storage
      GL_EXT_multisampled_render_to_texture
      GL_OES_surfaceless_context
      GL_OES_texture_stencil8
      GL_EXT_shader_pixel_local_storage
      GL_ARM_shader_framebuffer_fetch
      GL_ARM_shader_framebuffer_fetch_depth_stencil
      GL_ARM_mali_program_binary
      GL_EXT_sRGB
      GL_EXT_sRGB_write_control
      GL_EXT_texture_sRGB_decode
      GL_EXT_texture_sRGB_R8
      GL_EXT_texture_sRGB_RG8
      GL_KHR_blend_equation_advanced
      GL_KHR_blend_equation_advanced_coherent
      GL_OES_texture_storage_multisample_2d_array
      GL_OES_shader_image_atomic
      GL_EXT_robustness
      GL_EXT_draw_buffers_indexed
      GL_OES_draw_buffers_indexed
      GL_EXT_texture_border_clamp
      GL_OES_texture_border_clamp
      GL_EXT_texture_cube_map_array
      GL_OES_texture_cube_map_array
      GL_OES_sample_variables
      GL_OES_sample_shading
      GL_OES_shader_multisample_interpolation
      GL_EXT_shader_io_blocks
      GL_OES_shader_io_blocks
      GL_EXT_tessellation_shader
      GL_OES_tessellation_shader
      GL_EXT_primitive_bounding_box
      GL_OES_primitive_bounding_box
      GL_EXT_geometry_shader
      GL_OES_geometry_shader
      GL_ANDROID_extension_pack_es31a
      GL_EXT_gpu_shader5
      GL_OES_gpu_shader5
      GL_EXT_texture_buffer
      GL_OES_texture_buffer
      GL_EXT_copy_image
      GL_OES_copy_image
      GL_EXT_shader_non_constant_global_initializers
      GL_EXT_color_buffer_half_float
      GL_EXT_color_buffer_float
      GL_EXT_YUV_target
      GL_OVR_multiview
      GL_OVR_multiview2
      GL_OVR_multiview_multisampled_render_to_texture
      GL_KHR_robustness
      GL_KHR_robust_buffer_access_behavior
      GL_EXT_draw_elements_base_vertex
      GL_OES_draw_elements_base_vertex
      GL_EXT_buffer_storage
      GL_EXT_texture_filter_anisotropic
  main stats:
    max texture size:                 8192
    max cubemap texture size:         4096
    max texture image units:          16
    max renderbuffer size:            8192
    max combined texture image units: 96
    num compressed texture formats:   69
    aliased line width range:         1 - 4096
    aliased point size range:         1 - 1024
    implementation color read format: RGB
    implementation color read type:   unsigned byte
    max viewport dimensions:          8192 x 8192
    subpixel bits:                    8
    supported compressed texture formats:
      PALETTE4_RGB8_OES
      PALETTE4_RGBA8_OES
      PALETTE4_R5_G6_B5_OES
      PALETTE4_RGBA4_OES
      PALETTE4_RGB5_A1_OES
      PALETTE8_RGB8_OES
      PALETTE8_RGBA8_OES
      PALETTE8_R5_G6_B5_OES
      PALETTE8_RGBA4_OES
      PALETTE8_RGB5_A1_OES
      ETC1_RGB8
      0x9274
      0x9275
      0x9278
      0x9279
      0x9276
      0x9277
      0x9270
      0x9272
      0x9271
      0x9273
      COMPRESSED_RGBA_ASTC_4x4_KHR
      COMPRESSED_RGBA_ASTC_5x4_KHR
      COMPRESSED_RGBA_ASTC_5x5_KHR
      COMPRESSED_RGBA_ASTC_6x5_KHR
      COMPRESSED_RGBA_ASTC_6x6_KHR
      COMPRESSED_RGBA_ASTC_8x5_KHR
      COMPRESSED_RGBA_ASTC_8x6_KHR
      COMPRESSED_RGBA_ASTC_8x8_KHR
      COMPRESSED_RGBA_ASTC_10x5_KHR
      COMPRESSED_RGBA_ASTC_10x6_KHR
      COMPRESSED_RGBA_ASTC_10x8_KHR
      COMPRESSED_RGBA_ASTC_10x10_KHR
      COMPRESSED_RGBA_ASTC_12x10_KHR
      COMPRESSED_RGBA_ASTC_12x12_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_5x4_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_5x5_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_6x5_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_6x6_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_8x5_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_8x6_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_8x8_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_10x5_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_10x6_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_10x8_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_10x10_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_12x10_KHR
      COMPRESSED_SRGB8_ALPHA8_ASTC_12x12_KHR
      0x93c0
      0x93c1
      0x93c2
      0x93c3
      0x93c4
      0x93c5
      0x93c6
      0x93c7
      0x93c8
      0x93c9
      0x93e0
      0x93e1
      0x93e2
      0x93e3
      0x93e4
      0x93e5
      0x93e6
      0x93e7
      0x93e8
      0x93e9
  shader specific stats:
    max vertex attribs:             16
    max vertex texture image units: 16
    num program binary formats:     1
    num shader binary formats:      1
    max varying vectors:            15
    max vertex uniform vectors:     1024
    max fragment uniform vectors:   1024
    shader compiler:                yes
    supported program binary formats:
      MALI_PROGRAM_BINARY_ARM
    supported shader binary formats:
      MALI_SHADER_BINARY

Kirin 970 (4x16bit being quad ram (ic) = 27.82 GiB/s, 10nm) uses the G72 (12 cores). The G76 (Q2 2018) possibly adds 46% performance and 178% consumption savings over the G72 (2nd gen Bifrost, Q2 2017, like the G52, Q1 2018). (GFXBench Manhattan (5.0?)).
The G77 (Valhall, 1st gen, Q2 2019) enhances that by ~30%/30%. ]

Unread post by blu » Sun Aug 04, 2019 8:35 am

I've finally been able to run the unit tests posted earlier on the Jetson Nano, and here are the results relative to the N2.
  • Nano has no problem keeping 60fps at 3840x2160 without MSAA.
  • Max available MSAA on the Nano is 8 (vs 16 on the N2), and there performance takes a hit in the form of jittery fps, averaging 58-59fps (basically some dropped frames, which the observer sees as jittery, non-fluid animation).
  • All of that ran via EGL over x11/compiz. Despite my efforts to get EGL over wayland going, that path seems quite rough on the Nano right now (think teen fps for the same unit tests), so I'll wait for NV to fix it. Ideally, I'd love to try a 'raw' fbdev path, but I have no clue how to get that on the Nano (hints welcome).
  • While compiz compositing may seem like an unfair comparison to the N2's fbdev double-buffering, keep in mind the Nano has 2.5x the RAM BW of the N2, so a hw compositor/blitter would have headroom to act.
In summary, though this Nano-vs-N2 test was not as apples-to-apples as I wanted it to be, it still indicates that the G52mp2 might be the more potent performer on these unit tests at the target resolution and MSAA setup. Perhaps I should go offscreen for the next round : )

ps: Turns out the original 59.9fps on the N2 was a red herring -- fps was very stable, but the 4K TV I tested on takes a 59.94Hz signal (grrr, NTSC anachronisms).

[ed] After eliminating the dreaded compositor on the Nano the latter consistently hits 59.9 fps @ 3840x2160 @ MSAA8.

Unread post by blu » Tue Aug 27, 2019 6:47 am

Got a favor to ask of you, fellow N2'ers: as I'm away from my N2 through the end of the month, I need somebody with a stable linux to:

Code: Select all

git clone https://github.com/blu/hello-gcclessness
cd hello-gcclessness
as hello.s -o hello.o
ld hello.o -o hello
time taskset 0x3c ./hello # a few times at an idle system
and post back the times. Thanks a bunch!
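
A note on the `taskset 0x3c` in the commands above: the argument is a CPU affinity bitmask, where bit N set means the process may run on CPU N. A minimal sketch for decoding such masks follows; `cpus_in_mask` is a made-up helper name, and the mapping of CPU numbers to core types is an assumption based on the "4x CA73" comments attached to the same mask later in this thread.

```python
# Sketch: decode a taskset-style affinity mask into a list of CPU indices.
# Assumption: on the N2, CPUs 2-5 are the Cortex-A73 cores, which is what
# the "4x CA73" comments next to the same mask elsewhere in the thread suggest.


def cpus_in_mask(mask):
    """Return the CPU indices whose bits are set in an affinity mask."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


for mask in (0x3c, 0x30):
    print(f"{mask:#04x} -> CPUs {cpus_in_mask(mask)}")
```

So 0x3c pins the run to CPUs 2 through 5, and the 0x30 used for the RK3399 run further down selects CPUs 4 and 5.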

Unread post by odroid » Tue Aug 27, 2019 9:57 am

I stopped the lightdm (x11 desktop) before running the commands.
Anyway, it's been a long time since I saw ARM assembly code implementing an "itoa()" function. :)

Code: Select all

root@odroid:~/hello-gcclessness# time taskset 0x3c ./hello
string_xNN(0x123456789abcdef)
0123456789abcdef

real    0m1.795s
user    0m1.792s
sys     0m0.000s

root@odroid:~/hello-gcclessness# time taskset 0x3c ./hello
string_xNN(0x123456789abcdef)
0123456789abcdef

real    0m1.794s
user    0m1.792s
sys     0m0.000s

root@odroid:~/hello-gcclessness# time taskset 0x3c ./hello
string_xNN(0x123456789abcdef)
0123456789abcdef

real    0m1.794s
user    0m1.792s
sys     0m0.000s

Unread post by blu » Tue Aug 27, 2019 2:32 pm

Beautiful, thanks, odroid!

CA73 shows a significant improvement over CA72 on this code (measured under ideal conditions -- data in L1 caches, optimal alignment):

Code: Select all

              total time for   funcall/s   chars/s     clocks/char
              2^28 funcalls

CA72 2.1GHz   0m2.462s         109M        1745M       1.20
CA73 1.8GHz   0m1.794s         150M        2394M       0.75
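
The derived columns in the table follow from the wall-clock times alone. Below is a minimal sketch of that arithmetic, assuming 16 output characters per call (which is what the chars/s column implies) and the nominal clock rates listed; the `throughput` helper is a made-up name, not part of the benchmark.

```python
# Sketch: reproduce the derived columns from the measured run times.
# Assumptions: 2**28 calls per run, 16 characters produced per call,
# and the nominal core clocks from the table.

CALLS = 2 ** 28
CHARS_PER_CALL = 16


def throughput(seconds, clock_hz):
    """Return (calls/s, chars/s, clocks/char) for one run."""
    calls_per_s = CALLS / seconds
    chars_per_s = calls_per_s * CHARS_PER_CALL
    clocks_per_char = clock_hz / chars_per_s
    return calls_per_s, chars_per_s, clocks_per_char


RUNS = {
    "CA72 2.1GHz": (2.462, 2.1e9),
    "CA73 1.8GHz": (1.794, 1.8e9),
}

for name, (seconds, clock_hz) in RUNS.items():
    calls, chars, cpc = throughput(seconds, clock_hz)
    print(f"{name}: {calls / 1e6:.0f}M funcall/s, "
          f"{chars / 1e6:.0f}M chars/s, {cpc:.2f} clocks/char")
```

Dividing by the clock rate is what makes the comparison fair: the CA73 runs at a lower frequency yet still needs fewer clocks per character.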
odroid wrote:
Tue Aug 27, 2019 9:57 am
Anyway, It's been a long time since I saw ARM assembly code to implement "itoa()" function. :)
My arm64-gcc-over-armhf sandbox died while I was 'stranded' on a beach with low internet connectivity, so I had to keep myself busy somehow ; )

Cheers!
Last edited by blu on Tue Aug 27, 2019 10:51 pm, edited 3 times in total.

mad_ady
Posts: 6891
Joined: Wed Jul 15, 2015 5:00 pm
languages_spoken: english
ODROIDs: XU4, C1+, C2, N1, H2, N2
Location: Bucharest, Romania
Has thanked: 248 times
Been thanked: 181 times
Contact:

Re: Micro-benchmarking the N2

Unread post by mad_ady » Tue Aug 27, 2019 10:06 pm

There might be an N2 on the odroid bench, to keep you busy between cocktails :)

Unread post by blu » Tue Aug 27, 2019 10:55 pm

mad_ady wrote:
Tue Aug 27, 2019 10:06 pm
There might be a N2 on the odroid bench, to keep you busy between cocktails :)
Ahh, thanks a bunch, I might just as well, if my mobile data plan does not run dry (any day now).

glenswada
Posts: 10
Joined: Sun Sep 01, 2019 3:19 pm
languages_spoken: english
ODROIDs: Odroid N2
Has thanked: 2 times
Been thanked: 1 time
Contact:

Re: Micro-benchmarking the N2

Unread post by glenswada » Thu Sep 05, 2019 5:37 am

I have recently been comparing my new N2 against the Udoo x86 which I use as my nginx web server.

My US$174 Udoo x86 does 995 req/s whilst consuming 8.3 watts
My $70 Odroid N2 does 965 req/s whilst consuming 2.9 watts

So I have decided to move my webserver to the Odroid N2, given that anything over 200 req/s is going to flood my available bandwidth anyway.
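
Put as requests per second per watt, the gap is larger than the raw req/s figures suggest. A small sketch of that arithmetic, assuming the quoted wattages were measured under the same load as the req/s figures (the post does not say); `req_per_watt` is a made-up helper name.

```python
# Sketch: efficiency comparison using only the figures quoted in the post.
# Assumption: power draw was measured while serving the quoted request rate.

BOARDS = {
    "Udoo x86": {"req_per_s": 995, "watts": 8.3},
    "Odroid N2": {"req_per_s": 965, "watts": 2.9},
}


def req_per_watt(board):
    """Requests per second delivered per watt consumed."""
    return board["req_per_s"] / board["watts"]


for name, board in BOARDS.items():
    print(f"{name}: {req_per_watt(board):.0f} req/s per watt")

ratio = req_per_watt(BOARDS["Odroid N2"]) / req_per_watt(BOARDS["Udoo x86"])
print(f"N2 advantage: {ratio:.1f}x")
```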

Does anyone want to swap me 2x N2 for a udoo x86? No!, what about one?

Unread post by blu » Thu Sep 05, 2019 7:37 pm

glenswada wrote:
Thu Sep 05, 2019 5:37 am
Does anyone want to swap me 2x N2 for a udoo x86? No!, what about one?
I have a counter-offer to make -- I'm trading a dual-socket P3 tualatin supermicro board for an N2 ; )

powerful owl
Posts: 110
Joined: Thu Mar 28, 2019 8:57 pm
languages_spoken: english
ODROIDs: 6 x HC1, 3 x H2
Has thanked: 20 times
Been thanked: 9 times
Contact:

Re: Micro-benchmarking the N2

Unread post by powerful owl » Thu Sep 05, 2019 11:51 pm

glenswada wrote:
Thu Sep 05, 2019 5:37 am
Does anyone want to swap me 2x N2 for a udoo x86? No!, what about one?
What country are you in?

Unread post by glenswada » Fri Sep 06, 2019 11:01 am

powerful owl wrote:
Thu Sep 05, 2019 11:51 pm
glenswada wrote:
Thu Sep 05, 2019 5:37 am
Does anyone want to swap me 2x N2 for a udoo x86? No!, what about one?
What country are you in?
Melbourne, Australia.

Unread post by powerful owl » Fri Sep 06, 2019 12:12 pm

Hey let's trade. Will send a PM.

Unread post by glenswada » Fri Sep 06, 2019 12:40 pm

powerful owl wrote:
Fri Sep 06, 2019 12:12 pm
Hey let's trade. Will send a PM.
Cannot PM, I'm too new. I am after a 4GB N2, so no go. I will put it on eBay this weekend or next.

Sorry I brought this up, as it's not what the thread is about.

Unread post by glenswada » Sun Sep 08, 2019 8:56 am

powerful owl wrote:
Fri Sep 06, 2019 12:12 pm
Hey let's trade. Will send a PM.
On eBay now. You never know, a $1 bid might do it!

Unread post by blu » Sat Sep 21, 2019 8:44 am

Mali-T860mp4 vs Mali-G52mp6 vs 4x CA73 vs 4x CA73 + 2x CA53 vs 4x CA72 -- OCL1.2 raycasting showdown:

Code: Select all

NanoPi-M4: Rockchip RK3399:
	2x Cortex-A72 @ 1.8GHz, 4x Cortex-A53 @ 1.42GHz, Mali-T860 MP4 @ 800MHz, 4GB LPDDR3 x64 @ 1600MT/s (12.8GB/s)

$ taskset 0x30 ./problem_4 -frames 100 # Mali-T860
kernel preferred workgroup size multiple: 4
device max work-item sizes: 256 256 256
total frames rendered: 100
elapsed time: 2.236905 s
average FPS: 44.704617

Code: Select all

Odroid-N2: Amlogic S922X:
	4x Cortex-A73 @ 1.8GHz, 2x Cortex-A53 @ 1.896GHz, Mali-G52 MP2 (2x3) @ 750MHz, 4GB LPDDR4 x32 @ 2640MT/s (10.56GB/s)

$ taskset 0x3c ./problem_4 -frames 100 # Mali-G52
kernel preferred workgroup size multiple: 8
device max work-item sizes: 384 384 384
total frames rendered: 100
elapsed time: 1.798982 s
average FPS: 55.586994

$ LD_LIBRARY_PATH=/usr/local/lib/ taskset 0x3c ./problem_4 -device 0 -frames 100 # 4x CA73, pocl1.3
kernel preferred workgroup size multiple: 8
device max work-item sizes: 4096 4096 4096
total frames rendered: 100
elapsed time: 3.310997 s
average FPS: 30.202380

$ LD_LIBRARY_PATH=/usr/local/lib/ ./problem_4 -device 0 -frames 100 # 4x CA73 + 2x CA53, pocl1.3
kernel preferred workgroup size multiple: 8
device max work-item sizes: 4096 4096 4096
total frames rendered: 100
elapsed time: 2.424427 s
average FPS: 41.246854

Code: Select all

MacchiatoBin: Marvell ARMADA 8040:
	4x Cortex-A72 @ 2.0GHz, 16GB DDR4 x64 @ 2400MT/s (19.2GB/s)

$ LD_LIBRARY_PATH=/usr/local/lib ./problem_4 -device 0 -frames 100 # pocl1.3
kernel preferred workgroup size multiple: 8
device max work-item sizes: 4096 4096 4096
total frames rendered: 100
elapsed time: 2.843099 s
average FPS: 35.172892
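
For anyone cross-checking the spec lines and results above, here is a minimal sketch of the two calculations involved: peak memory bandwidth from bus width and transfer rate, and average fps from frame count and elapsed time. The helper names and run labels are made up for illustration; the inputs are copied from the listings, and the bandwidth values are theoretical peaks, not measurements.

```python
# Sketch: the arithmetic behind the spec lines and the "average FPS" output.
# Peak bandwidth = bytes per transfer * transfers per second.

FRAMES = 100


def peak_bandwidth_gb_s(bus_bits, mega_transfers_per_s):
    """Theoretical peak bandwidth in GB/s for a given bus width and MT/s."""
    return bus_bits / 8 * mega_transfers_per_s / 1000


def average_fps(frames, elapsed_s):
    """Frames rendered divided by wall-clock seconds."""
    return frames / elapsed_s


# (label, elapsed seconds) taken from the runs in this post.
RUNS = [
    ("RK3399, Mali-T860 MP4", 2.236905),
    ("S922X, Mali-G52", 1.798982),
    ("S922X, 4x CA73 (pocl)", 3.310997),
    ("S922X, 4x CA73 + 2x CA53 (pocl)", 2.424427),
    ("ARMADA 8040, 4x CA72 (pocl)", 2.843099),
]

baseline = average_fps(FRAMES, RUNS[1][1])  # Mali-G52 as the reference
for label, elapsed in RUNS:
    fps = average_fps(FRAMES, elapsed)
    print(f"{label}: {fps:.2f} fps ({fps / baseline:.2f}x of G52)")

print(peak_bandwidth_gb_s(64, 1600))  # RK3399, LPDDR3 x64
print(peak_bandwidth_gb_s(32, 2640))  # S922X, LPDDR4 x32
print(peak_bandwidth_gb_s(64, 2400))  # ARMADA 8040, DDR4 x64
```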

Unread post by blu » Mon Sep 30, 2019 7:40 am

Fun fact: on the same test, a discrete 30W (GPU + RAM), 366 GFLOPS NVIDIA GK208B from 2014 (28nm, with dedicated GDDR5 good for 30GB/s) does only 3x the fps of the Mali-G52 found in the N2 (rated at 72 GFLOPS).
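
To put that in proportion, here is a rough sketch comparing the rated-compute and bandwidth ratios with the fps ratio. All inputs come from this thread; the 3x fps figure is the approximate value stated above, and the N2's 10.56GB/s is the theoretical peak from the earlier spec line, shared with the CPU cores.

```python
# Sketch: how the GK208B-vs-G52 spec ratios compare with the observed fps ratio.
# Assumptions: rated GFLOPS and bandwidth as quoted in the thread;
# fps ratio taken as the approximate "3x" from the post.

GK208B = {"gflops": 366, "bandwidth_gb_s": 30.0}
MALI_G52 = {"gflops": 72, "bandwidth_gb_s": 10.56}
FPS_RATIO = 3.0

compute_ratio = GK208B["gflops"] / MALI_G52["gflops"]
bandwidth_ratio = GK208B["bandwidth_gb_s"] / MALI_G52["bandwidth_gb_s"]

print(f"rated compute:  {compute_ratio:.2f}x")
print(f"peak bandwidth: {bandwidth_ratio:.2f}x")
print(f"observed fps:   {FPS_RATIO:.2f}x")
print(f"fps per rated GFLOPS, GK208B relative to G52: {FPS_RATIO / compute_ratio:.2f}")
```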
