ARM fun stuff

ab1jx
Posts: 67
Joined: Wed Jul 10, 2019 8:25 am
languages_spoken: english
Has thanked: 7 times
Been thanked: 2 times
Contact:

ARM fun stuff

Unread post by ab1jx » Mon Nov 04, 2019 10:51 pm

If you're interested in programming anyway

https://git.mlplatform.org/

The Compute Library is there, and so is NN (Neural Networks). Both look recently updated too.
Attachment: armml.png (39.51 KiB)

rooted
Posts: 6769
Joined: Fri Dec 19, 2014 9:12 am
languages_spoken: english
Location: Gulf of Mexico, US
Has thanked: 216 times
Been thanked: 41 times
Contact:

Re: ARM fun stuff

Unread post by rooted » Tue Nov 05, 2019 1:24 am

If you use it for anything please share the results.

Re: ARM fun stuff

Unread post by ab1jx » Tue Nov 05, 2019 2:00 am

I also found https://github.com/biotrump/Mali_OpenCL_SDK which says it's for the Mali-T600 series. All the samples compile on my N2 once I point g++ and ar in the platform.mk file (just inside the tarball) at tools that actually exist. I'm not sure whether it's actually running OpenCL. It made this Mandelbrot image as a 2-color 4096x3280 BMP with these timings:

Code: Select all

Profiling information:
Queued time: 	0.082458ms
Wait time: 	0.887401ms
Run time: 	274.649ms
Attachment: output_4c_qsize.gif (13.74 KiB)
I had to resize it smaller to post here.
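
For anyone reading those three profiling numbers, here is a minimal sketch of how they can be derived, assuming the SDK computes them from the four standard OpenCL event profiling timestamps (queued, submit, start, end, all in nanoseconds). The timestamps below are hypothetical, chosen only so the result reproduces the figures posted above, and the helper names are made up for illustration. It also turns the run time into a rough pixel rate for the 4096x3280 image.

```python
def profiling_ms(queued_ns, submit_ns, start_ns, end_ns):
    """Split one OpenCL event's lifetime into queued, wait and run time (ms)."""
    ns_per_ms = 1_000_000.0
    return {
        "queued": (submit_ns - queued_ns) / ns_per_ms,  # host queue -> submitted to device
        "wait": (start_ns - submit_ns) / ns_per_ms,     # submitted -> kernel starts
        "run": (end_ns - start_ns) / ns_per_ms,         # kernel start -> kernel end
    }


def pixels_per_second(width, height, run_ms):
    """Rough throughput: total pixels divided by kernel run time in seconds."""
    return width * height / (run_ms / 1000.0)


# Hypothetical timestamps (ns), picked to match the posted profiling output.
times = profiling_ms(0, 82_458, 969_859, 275_618_859)
rate = pixels_per_second(4096, 3280, times["run"])
print(times, round(rate / 1e6, 1), "Mpixel/s")
```

This says nothing about whether the kernel ran on the Mali or somewhere else; it only shows what the three timings mean.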

A couple of the examples run but make all-black bmp output files.

I do have installed:

Code: Select all

dpkg-query -l | grep -i opencl
ii  clinfo                                2.2.18.04.06-1                      arm64        Query OpenCL system information
ii  malig52-fbdev-opencl-odroid           20190330-r12p0-2                    arm64        Mali binary blob and development headers (ODROID build)
I installed these because I was interested in mining cryptocurrency with the GPU. I'm not sure whether the SDK uses them. clinfo says OpenCL wasn't found, but I think it looks for a specific OpenCL setup. I do have a libMali.so, though I'm not sure where it came from:
-rw-r--r-- 1 root root 45800368 Mar 27 2019 libMali.so
And

Code: Select all

ii  glmark2-es2-fbdev-odroid              20190710-1+deb10                    arm64        OpenGL ES 2.0 fbdev benchmark (ODROID build)
which works.

glmark2-es2-fbdev outputs

Code: Select all

=======================================================
    glmark2 2012.12
=======================================================
    OpenGL Information
    GL_VENDOR:     ARM
    GL_RENDERER:   Mali-G52
    GL_VERSION:    OpenGL ES 3.2 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
=======================================================
[build] use-vbo=false: FPS: 59 FrameTime: 16.949 ms
[build] use-vbo=true: FPS: 59 FrameTime: 16.949 ms
[texture] texture-filter=nearest: FPS: 59 FrameTime: 16.949 ms
[texture] texture-filter=linear: FPS: 59 FrameTime: 16.949 ms
[texture] texture-filter=mipmap: FPS: 59 FrameTime: 16.949 ms
[shading] shading=gouraud: FPS: 59 FrameTime: 16.949 ms
[shading] shading=blinn-phong-inf: FPS: 58 FrameTime: 17.241 ms
[shading] shading=phong: FPS: 59 FrameTime: 16.949 ms
[bump] bump-render=high-poly: FPS: 59 FrameTime: 16.949 ms
[bump] bump-render=normals: FPS: 59 FrameTime: 16.949 ms
[bump] bump-render=height: FPS: 59 FrameTime: 16.949 ms
libpng warning: iCCP: known incorrect sRGB profile
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 59 FrameTime: 16.949 ms
libpng warning: iCCP: known incorrect sRGB profile
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 57 FrameTime: 17.544 ms
[pulsar] light=false:quads=5:texture=false: FPS: 59 FrameTime: 16.949 ms
libpng warning: iCCP: known incorrect sRGB profile
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 58 FrameTime: 17.241 ms
libpng warning: iCCP: known incorrect sRGB profile
[desktop] effect=shadow:windows=4: FPS: 59 FrameTime: 16.949 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 57 FrameTime: 17.544 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 57 FrameTime: 17.544 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 58 FrameTime: 17.241 ms
[ideas] speed=duration: FPS: 59 FrameTime: 16.949 ms
[jellyfish] <default>: FPS: 59 FrameTime: 16.949 ms
[terrain] <default>: FPS: 26 FrameTime: 38.462 ms
[shadow] <default>: FPS: 58 FrameTime: 17.241 ms
[refract] <default>: FPS: 46 FrameTime: 21.739 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 59 FrameTime: 16.949 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 59 FrameTime: 16.949 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 59 FrameTime: 16.949 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 59 FrameTime: 16.949 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 59 FrameTime: 16.949 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 59 FrameTime: 16.949 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 58 FrameTime: 17.241 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 59 FrameTime: 16.949 ms
=======================================================
                                  glmark2 Score: 57 
=======================================================
What I wonder about is that the compute library docs say:

Code: Select all

├── opencl-1.2-stubs
│   └── opencl_stubs.c --> OpenCL stubs implementation
├── opengles-3.1-stubs
│   ├── EGL.c --> EGL stubs implementation
│   └── GLESv2.c --> GLESv2 stubs implementation
Why are they just stubs? Does having this libMali work around that?

memeka
Posts: 4366
Joined: Mon May 20, 2013 10:22 am
languages_spoken: english
ODROIDs: XU rev2 + eMMC + UART
U3 + eMMC + IO Shield + UART
Has thanked: 1 time
Been thanked: 39 times
Contact:

Re: ARM fun stuff

Unread post by memeka » Tue Nov 05, 2019 9:08 am

They are stubs because the actual implementation is not in libGLESv2 but in libMali.so.

Re: ARM fun stuff

Unread post by ab1jx » Tue Nov 05, 2019 9:50 am

Hmm, OK, I had read somewhere that for the Malis the kernel routines were free and open source, but the userland stuff was proprietary and only available if you spend thousands of dollars on a license, which the average user wouldn't do. Or something like that.

I have a libMali.so, not sure how I got it, it's 45,800,368 bytes. I read somewhere that there's one libMali.so for Wayland but X11 needs a different one. What if you're doing bare metal? I'm not sure that makes sense.

Anyway, I ran nm on my libMali.so, saved the output to a text file, then ran grep -r over several sets of include files to look up the function names. Not much matches; some Rockchip Mali headers match about as well as anything. The function names nm reports look interesting: some gl*, some egl*, gles*, and cl*. It was mostly wasted time, but I thought that if I had the library and matching headers I could do something.
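
A minimal sketch of that kind of symbol survey, assuming the text file came from something like `nm -D --defined-only libMali.so`. It counts exported function names by API prefix (cl*, egl*, gl*). The sample lines, including the addresses and `close_helper`, are made up for illustration and are not taken from a real libMali.so.

```python
import re
from collections import Counter

# An uppercase letter after the prefix avoids counting names like "close" as OpenCL.
API_PATTERNS = {
    "OpenCL": re.compile(r"^cl[A-Z]"),
    "EGL": re.compile(r"^egl[A-Z]"),
    "OpenGL ES": re.compile(r"^gl[A-Z]"),
}


def count_api_symbols(nm_text):
    """Count defined (T or W) symbols in nm output, grouped by API prefix."""
    counts = Counter()
    for line in nm_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        symbol_type = fields[-2]
        name = fields[-1].split("@")[0]  # drop any @VERSION suffix
        if symbol_type not in ("T", "W"):
            continue  # skip undefined (U) and data symbols
        for api, pattern in API_PATTERNS.items():
            if pattern.match(name):
                counts[api] += 1
                break
        else:
            counts["other"] += 1
    return counts


# Made-up sample in nm's "address type name" layout.
SAMPLE = """\
0000000000a1b2c0 T clGetPlatformIDs
0000000000a1b300 T clCreateContext
0000000000a1c000 T eglGetDisplay
0000000000a1d000 T glDrawArrays
0000000000a1e000 T close_helper
                 U malloc
"""
counts = count_api_symbols(SAMPLE)
print(dict(counts))
```

On a real system you would read the saved nm output from the text file and pass its contents to `count_api_symbols`.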

Re: ARM fun stuff

Unread post by ab1jx » Tue Nov 05, 2019 1:54 pm

I finally installed the Compute Library and got OpenCL working well enough to run sgminer. It's not very worthwhile: it's about as fast as a couple of threads of cpuminer. But it seems to work, and it's a point to experiment from. Or maybe it's not using the Mali at all; I also installed some OpenCL packages from Synaptic.

I thought OpenCL was supposed to have an /etc/OpenCL/vendors directory, but I don't have one. I'll try taking the OpenCL debs back out to see if that breaks it. I see:

Code: Select all

dpkg-query -l | grep -i opencl
ii  clinfo                                2.2.18.04.06-1                      arm64        Query OpenCL system information
ii  malig52-fbdev-opencl-odroid           20190330-r12p0-2                    arm64        Mali binary blob and development headers (ODROID build)
ii  opencl-c-headers                      2.2~2019.01.17-g49f07d3-1           all          OpenCL (Open Computing Language) C header files
ii  opencl-clhpp-headers                  2.0.10+git26-g806646c-1             all          C++ headers for OpenCL development
ii  opencl-headers                        2.2~2019.01.17-g49f07d3-1           all          OpenCL (Open Computing Language) header files
And litecoinpool isn't accepting any of the work I do. I'm using https://github.com/hominoids/sgminer-arm to try to mine.
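
On the /etc/OpenCL/vendors question: with an ICD loader, that path is a directory of small .icd text files, each naming a vendor's OpenCL library. A minimal sketch for listing what is registered, assuming that layout. The function name and the `mali.icd` file name in the demo are hypothetical, and the demo runs against a throwaway directory rather than the real one.

```python
import os
import tempfile


def list_icd_libraries(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd_file_name: library_named_inside} for each .icd file found."""
    if not os.path.isdir(vendors_dir):
        return {}  # no directory means no registered ICDs
    found = {}
    for name in sorted(os.listdir(vendors_dir)):
        if name.endswith(".icd"):
            with open(os.path.join(vendors_dir, name)) as handle:
                found[name] = handle.read().strip()
    return found


# Demo in a temporary directory; "mali.icd" is a hypothetical file name.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "mali.icd"), "w") as handle:
        handle.write("libMali.so\n")
    demo = list_icd_libraries(demo_dir)
    missing = list_icd_libraries(os.path.join(demo_dir, "no-such-dir"))
print(demo, missing)
```

An empty result from `list_icd_libraries()` on the real path would mean there is nothing for an ICD loader to find, which is separate from whether an application linked directly against libMali.so can use OpenCL.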

odroid
Site Admin
Posts: 32508
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English
ODROIDs: ODROID
Has thanked: 181 times
Been thanked: 349 times
Contact:

Re: ARM fun stuff

Unread post by odroid » Tue Nov 05, 2019 6:49 pm

The following two threads might be helpful to learn the Mali G52 GPU stuff.
viewtopic.php?f=176&t=34020
viewtopic.php?f=176&t=34949

hominoid
Posts: 316
Joined: Tue Feb 28, 2017 3:55 am
languages_spoken: english
ODROIDs: C2, XU4, MC1, N1, N2
Location: Lake Superior Basin, USA
Has thanked: 11 times
Been thanked: 23 times
Contact:

Re: ARM fun stuff

Unread post by hominoid » Wed Nov 06, 2019 12:24 am

sgminer-arm is out of date for any coin whose blockchain algorithm has changed since April 2018. FYI, there is also a thread that covers mining Gridcoin with BOINC toward its end, if that interests you.

Re: ARM fun stuff

Unread post by ab1jx » Wed Nov 06, 2019 3:11 am

Well, it was partly an OpenCL test. I don't think I've ever had sgminer working with a GPU before. cgminer with small/old ASICs yes, I have 3 of them going now. I still use Pooler's cpuminer at times mostly to keep my network cards awake.

I just dumped the clinfo from the debs (which seems to expect OpenCL 1.0) and installed this one: https://github.com/simleb/clinfo There seem to be a few different ones. This says:

Code: Select all

./clinfo
Platform #0
  Name:                                  ARM Platform
  Version:                               OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58

  Device #0
    Name:                                Mali-G52
    Type:                                GPU
    Version:                             OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
    Global memory size:                  3 GB 641 MB 44 kB 
    Local memory size:                   32 kB 
    Max work group size:                 384
    Max work item sizes:                 (384, 384, 384)
I haven't tried to tune sgminer at all, and I'm on an Odroid N2, which isn't what it expects. Even so, mining Litecoin it's slower than 2 threads of cpuminer on a Raspberry Pi. I let this run for 1/2 hour or so, the pool drops the difficulty to 256 and it sits there. The pool's stats page shows I have a worker 2 but doesn't give any hash rate.
Attachment: sgminer_running.gif (21.02 KiB)
I had no idea what to expect for performance. But I just installed ARM's Compute Library so I'm playing with it.

Re: ARM fun stuff

Unread post by hominoid » Wed Nov 06, 2019 3:59 am

ab1jx wrote:
Wed Nov 06, 2019 3:11 am
I just dumped the clinfo from the debs (which seems to expect OpenCL 1.0) and installed this one: https://github.com/simleb/clinfo There seem to be a few different ones.
FYI, I have not had any problem with clinfo reporting correctly using the current version from the repositories on HK images.
ab1jx wrote:
Wed Nov 06, 2019 3:11 am
I let this run for 1/2 hour or so, the pool drops the difficulty to 256 and it sits there. The pool's stats page shows I have a worker 2 but doesn't give any hash rate.
On these small devices, that has also been my experience when the GPU hardware is mismatched with high-hash-rate blockchains. It can take 15-60 minutes just to generate one valid share to submit, and generally the pool won't show a hash rate unless valid shares are being submitted. When using small devices I usually focus on new or low-difficulty coins where the device is better matched to the blockchain's hash rate.
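
A rough way to put numbers on that wait, as a sketch only. The average time per share is the share difficulty times the number of hashes one unit of difficulty represents, divided by the device's hash rate. How many hashes "difficulty 1" means depends on the algorithm and the pool's convention, so it is an argument here rather than a constant. The 2**16 and 10 kH/s values in the demo are purely illustrative assumptions, not measurements from this thread. The second function treats share finding as a memoryless random process, which is also an assumption.

```python
import math


def expected_seconds_per_share(share_difficulty, hashes_per_second,
                               hashes_per_difficulty_1):
    """Average time between valid shares at a given hash rate."""
    return share_difficulty * hashes_per_difficulty_1 / hashes_per_second


def probability_no_share(elapsed_seconds, mean_seconds):
    """Chance of finding zero shares in a window, assuming a memoryless process."""
    return math.exp(-elapsed_seconds / mean_seconds)


# Illustrative inputs only: difficulty 256 (from the thread), plus an assumed
# 2**16 hashes per unit of difficulty and an assumed 10,000 hashes per second.
mean = expected_seconds_per_share(256, 10_000, 2**16)
p_empty_half_hour = probability_no_share(30 * 60, mean)
print(round(mean / 60, 1), "min per share on average;",
      round(p_empty_half_hour, 2), "chance of none in 30 min")
```

With inputs like these, a half-hour run producing no accepted shares would not be surprising, so a blank hash rate on the pool's stats page does not by itself prove the setup is broken.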

Re: ARM fun stuff

Unread post by ab1jx » Wed Nov 06, 2019 6:41 am

The version of clinfo in the debs tells me:

Code: Select all

clinfo
clinfo: /usr/lib/aarch64-linux-gnu/libOpenCL.so.1: version `OPENCL_1.0' not found (required by clinfo)

even:
clinfo -v
clinfo: /usr/lib/aarch64-linux-gnu/libOpenCL.so.1: version `OPENCL_1.0' not found (required by clinfo)

and
ls -la | grep libOpenCL
lrwxrwxrwx  1 root root       10 Nov  5 09:21 libOpenCL.so -> libMali.so
lrwxrwxrwx  1 root root       10 Nov  5 09:25 libOpenCL.so.1 -> libMali.so
If I look at my libMali.so AKA libOpenCL.so.1 with the mc file viewer I can't find the exact string "OPENCL_1.0" in there anywhere, only later versions.
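
Some background on that error, as far as I understand it: "version `OPENCL_1.0' not found" is the dynamic linker's symbol-versioning complaint. The clinfo binary was linked against a libOpenCL.so.1 whose symbols carry an OPENCL_1.0 version tag, and the library now sitting at that path does not define that tag. A minimal sketch for listing the OPENCL_x.y tags present in a binary, similar in spirit to searching it in a file viewer. The function name is made up, and the sample bytes are invented stand-ins, not real libMali.so content.

```python
import re


def find_version_strings(blob, prefix=b"OPENCL_"):
    """Return the sorted, de-duplicated PREFIXx.y strings found in raw bytes."""
    pattern = re.compile(re.escape(prefix) + rb"[0-9]+(?:\.[0-9]+)*")
    return sorted({m.group().decode("ascii") for m in pattern.finditer(blob)})


# Invented bytes standing in for a shared library's contents.
sample = (b"\x00junk\x00OPENCL_1.2\x00more\x00OPENCL_2.0"
          b"\x00__OPENCL_VERSION__\x00OPENCL_1.2\x00")
found = find_version_strings(sample)
print(found)
```

To try it on a real file, read the library in binary mode, for example the /usr/lib/aarch64-linux-gnu/libOpenCL.so.1 path from the error message, and pass the bytes to `find_version_strings`.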

OK, I can try letting sgminer run overnight.

Re: ARM fun stuff

Unread post by hominoid » Wed Nov 06, 2019 7:43 am

I think something is wrong with your OpenCL setup.

Code: Select all

hominoid@odroid-n2:~$ uname -a
Linux odroid-n2 4.9.196-63 #1 SMP PREEMPT Thu Oct 17 00:44:03 -03 2019 aarch64 aarch64 aarch64 GNU/Linux

Code: Select all

hominoid@odroid-n2:~$ cd /usr/lib/aarch64-linux-gnu/
hominoid@odroid-n2:/usr/lib/aarch64-linux-gnu$ ls -la | grep libOpenCL
lrwxrwxrwx   1 root     root           10 Sep 18 18:20 libOpenCL.so -> libMali.so
lrwxrwxrwx   1 root     root           18 Apr  5  2017 libOpenCL.so.1 -> libOpenCL.so.1.0.0
-rw-r--r--   1 root     root        34808 Apr  5  2017 libOpenCL.so.1.0.0

Code: Select all

hominoid@odroid-n2:~$ clinfo
Number of platforms                               3
  Platform Name                                   ARM Platform
  Platform Vendor                                 ARM
  Platform Version                                OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_create_command_queue cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_shared_virtual_memory
  Platform Extensions function suffix             ARM

  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 1.2 pocl 1.3 Debug+Asserts, LLVM 6.0.0, SLEEF, POCL_DEBUG, FP16
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             POCL

  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 19.0.8
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   ARM Platform
Number of devices                                 1
  Device Name                                     Mali-G52
  Device Vendor                                   ARM
  Device Vendor ID                                0x72120000
  Device Version                                  OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Driver Version                                  2.0
  Device OpenCL C Version                         OpenCL C 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               2
  Max clock frequency                             750MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             384x384x384
  Max work group size                             384
  Preferred work group size multiple              8
  Preferred / native vector sizes                 
    char                                                16 / 4       
    short                                                8 / 2       
    int                                                  4 / 1       
    long                                                 2 / 1       
    half                                                 8 / 2        (cl_khr_fp16)
    float                                                4 / 1       
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  Global memory size                              3887570944 (3.621GiB)
  Error Correction support                        No
  Max memory allocation                           971892736 (926.9MiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Shared Virtual Memory (SVM) capabilities (ARM)  
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        131072 (128KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   32 bytes
    Pitch alignment for 2D image buffers          64 pixels
    Max 2D image size                             65536x65536 pixels
    Max 3D image size                             65536x65536x65536 pixels
    Max number of read image args                 128
    Max number of write image args                64
    Max number of read/write image args           64
  Max number of pipe args                         16
  Max active pipe reservations                    1
  Max pipe packet size                            1024
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     1024
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                2097152 (2MiB)
    Max size                                      16777216 (16MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_create_command_queue cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_shared_virtual_memory

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     pthread-cortex-a53
  Device Vendor                                   ARM
  Device Vendor ID                                0x13b5
  Device Version                                  OpenCL 1.2 pocl HSTR: pthread-aarch64-unknown-linux-gnu-cortex-a73
  Driver Version                                  1.3
  Device OpenCL C Version                         OpenCL C 1.2 pocl
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               6
  Max clock frequency                             1896MHz
  Device Partition                                (core)
    Max number of sub-devices                     6
    Supported partition types                     equally, by counts
  Max work item dimensions                        3
  Max work item sizes                             4096x4096x4096
  Max work group size                             4096
  Preferred work group size multiple              8
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              2919413760 (2.719GiB)
  Error Correction support                        No
  Max memory allocation                           1073741824 (1024MiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        None
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            67108864 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
  Local memory type                               Global
  Local memory size                               16777216 (16MiB)
  Max number of constant args                     8
  Max constant buffer size                        16777216 (16MiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
  printf() buffer size                            16777216 (16MiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_fp16 cl_khr_fp64

  Platform Name                                   Clover
Number of devices                                 0

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  ARM Platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [ARM]
  clCreateContext(NULL, ...) [default]            Success [ARM]
  clCreateContext(NULL, ...) [other]              Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G52
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G52
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G52

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.11
  ICD loader Profile                              OpenCL 2.1
NOTE: The clinfo results show OpenCL 2.0 for the Mali-G52 and also OpenCL 1.2 for pocl 1.3 (OpenCL for the CPU cores). The last dozen or so entries in the first thread @odroid linked are where you can find information on pocl installation.

Be aware there is a release note for the latest HK Ubuntu image regarding OpenCL setup due to a dependency issue.

I think clinfo needs to report correctly before trying to get any valid OpenCL work through sgminer-arm or any other application.

Re: ARM fun stuff

Unread post by ab1jx » Wed Nov 06, 2019 10:41 am

hominoid wrote:
Wed Nov 06, 2019 7:43 am
I think something is wrong with your OpenCL setup.
Quite likely; as I said, this was partly an OpenCL test. I know that for OpenGL, and probably OpenCL, there's stuff in the debs you DON'T want around because you don't want any software emulation. So I've tried to keep that minimal and later uninstalled most of it. I think malig52-fbdev-opencl-odroid is the source of my libMali.so file. Most things seem to link to it as a library; it's a big 45 MB file that has a bit of almost everything. Then yesterday I installed ARM's Compute Library, which so far is sacred and infallible. I haven't done any tuning of it like https://arm-software.github.io/ComputeL ... 9_cl_tuner describes. I just ran scons again with different settings, but I still don't have arm_compute_benchmark anywhere (according to locate, at least).
hominoid wrote:
Wed Nov 06, 2019 7:43 am

Code: Select all

hominoid@odroid-n2:~$ uname -a
Linux odroid-n2 4.9.196-63 #1 SMP PREEMPT Thu Oct 17 00:44:03 -03 2019 aarch64 aarch64 aarch64 GNU/Linux
Mine's a few weeks older; I thought updates and upgrades should have replaced it. Are you using Ubuntu? That could be the difference.

Code: Select all

Linux odroid 4.9.190+ #1 SMP PREEMPT Wed Sep 4 08:20:28 CEST 2019 aarch64 GNU/Linux
hominoid wrote:
Wed Nov 06, 2019 7:43 am

Code: Select all

hominoid@odroid-n2:~$ cd /usr/lib/aarch64-linux-gnu/
hominoid@odroid-n2:/usr/lib/aarch64-linux-gnu$ ls -la | grep libOpenCL
lrwxrwxrwx   1 root     root           10 Sep 18 18:20 libOpenCL.so -> libMali.so
lrwxrwxrwx   1 root     root           18 Apr  5  2017 libOpenCL.so.1 -> libOpenCL.so.1.0.0
-rw-r--r--   1 root     root        34808 Apr  5  2017 libOpenCL.so.1.0.0
That's interesting, your libOpenCL.so.1.0.0 is a real file, not just a symlink to libMali.so. On mine it's a symlink to libMali.so; the only link to libMali.so you have is libOpenCL.so.
hominoid wrote:
Wed Nov 06, 2019 7:43 am

NOTE: The clinfo results show OpenCL 2.0 for the G-52 and also OpenCL 1.2 for pocl 1.3 (OpenCL for the CPU cores). The last dozen or so thread entries in the first link @odroid gave are where you can find information on pocl installation.
Nope, I get:

Code: Select all

clinfo
clinfo: /usr/lib/aarch64-linux-gnu/libOpenCL.so.1: version `OPENCL_1.0' not found (required by clinfo)
I did:
grep --binary-files=text "OPENCL_" libMali.so > /tmp/junk4.txt
and got

Code: Select all

vector::reserve
Failed to open directory '
bifrost32
raw
__builtin_inff
__builtin_nanf
__builtin_uadd_overflow
__builtin_usub_overflow
__builtin_umul_overflow
pointer-arith
__builtin_va_start
__builtin_va_copy
__builtin_va_end
__builtin_va_arg
enqueue_kernel
get_kernel_work_group_size
get_kernel_preferred_work_group_size_multiple
get_kernel_max_sub_group_size_for_ndrange
get_kernel_sub_group_count_for_ndrange
to_local
to_private
to_global
commit_read_pipe
reserve_read_pipe
work_group_commit_read_pipe
work_group_reserve_read_pipe
sub_group_commit_read_pipe
sub_group_reserve_read_pipe
commit_write_pipe
reserve_write_pipe
work_group_commit_write_pipe
work_group_reserve_write_pipe
sub_group_commit_write_pipe
sub_group_reserve_write_pipe
get_pipe_max_packets
get_pipe_num_packets
Failed to handle define build options
Failed to handle include build options
#define __OPENCL_VERSION__ CL_VERSION_2_0
CL_VERSION_1_1
CL_VERSION_1_2
CL_VERSION_2_0
#define __OPENCL_C_VERSION__
typedef unsigned long size_t;
typedef signed long intptr_t;
typedef unsigned long uintptr_t;
#ifdef __OPENCL_C_VERSION__
#ifdef __OPENCL_C_VERSION__
#ifndef __OPENCL_C_VERSION__
#if !defined(__OPENCL_C_VERSION__) || (__OPENCL_C_VERSION__ >= CL_VERSION_2_0)
#if defined (__OPENCL_C_VERSION__) &&  __OPENCL_C_VERSION__ >= CL_VERSION_2_0
#if defined(__OPENCL_C_VERSION__) && __OPENCL_C_VERSION__ >= CL_VERSION_2_0
#endif /* !defined(__OPENCL_C_VERSION__) || (__OPENCL_C_VERSION__ >= CL_VERSION_2_0) */
#if __OPENCL_C_VERSION__ >= CL_VERSION_1_2
#if __OPENCL_C_VERSION__ >= CL_VERSION_1_2
#if __OPENCL_C_VERSION__ >= CL_VERSION_2_0
#if __OPENCL_C_VERSION__ >= CL_VERSION_2_0
#if (__OPENCL_C_VERSION__ >= CL_VERSION_2_0) || defined(__CLCC_ENABLE_CL20_ATOMICS__)
#if __OPENCL_C_VERSION__ >= CL_VERSION_2_0
#if __OPENCL_C_VERSION__ >= CL_VERSION_2_0
#if __OPENCL_C_VERSION__ >= CL_VERSION_2_0
So there's no "OPENCL_1.0" string in there.
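The clinfo message looks like the dynamic linker's symbol-version check, so a library with no "OPENCL_1.0" version string in it would be consistent with that error. A rough Python equivalent of running strings and then grep, which avoids the binary junk in the output, might look like this (the function name and the sample bytes are made up for illustration):

```python
import re

def find_ascii_strings(data, needle, min_len=4):
    """Return printable ASCII runs in `data` that contain `needle`."""
    # A run is min_len or more bytes in the printable range 0x20-0x7e
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    wanted = needle.encode("ascii")
    return [m.group().decode("ascii")
            for m in pattern.finditer(data)
            if wanted in m.group()]

# Made-up bytes standing in for a library; for a real file use
# data = open("libMali.so", "rb").read()
blob = b"\x00\x01OPENCL_1.0\x00junk\xff#define __OPENCL_VERSION__ CL_VERSION_2_0\n\x00"
for s in find_ascii_strings(blob, "OPENCL_"):
    print(s)
```

Searching for "OPENCL_" this way lists every matching string on its own line, so a version tag such as OPENCL_1.0 would stand out if it were present.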
hominoid wrote:
Wed Nov 06, 2019 7:43 am
Be aware there is a release note for the latest HK Ubuntu image regarding OpenCL setup due to a dependency issue.
That's interesting but I use Debian, not Ubuntu so I didn't see it. I don't seem to have mali-fbdev (by locate again) except as sources in connection with glmark2. Apt-get install doesn't find it. In Debian there's no acceleration yet but I heard Ubuntu had it. I wonder how much of a mess I'd end up with if I tried to use the Ubuntu deb. I've done that in one or two other cases I think.
hominoid wrote:
Wed Nov 06, 2019 7:43 am
I think clinfo needs to report correctly before trying to get any valid OpenCL work through sgminer-arm or any other application.
I'm probably more interested in OpenGL ES than OpenCL, there are parts of all of them in the ARM Compute Library. I was hoping that was all I needed.

Debian has some pocl stuff, I'll try that.

Is your libOpenCL.so a stub maybe? The one in the compute library is under 32k bytes too but it comes in the "opencl-1.2-stubs" dir.

hominoid

Re: ARM fun stuff

Unread post by hominoid » Wed Nov 06, 2019 12:59 pm

ab1jx wrote:
Wed Nov 06, 2019 10:41 am
Is your libOpenCL.so a stub maybe?
No, it's a symbolic link to libMali.so

meveric
Posts: 10527
Joined: Mon Feb 25, 2013 2:41 pm
languages_spoken: german, english
ODROIDs: X2, U2, U3, XU-Lite, XU3, XU3-Lite, C1, XU4, C2, C1+, XU4Q, HC1, N1, Go, H2 (N4100), N2, H2 (J4105)
Has thanked: 17 times
Been thanked: 149 times
Contact:

Re: ARM fun stuff

Unread post by meveric » Wed Nov 06, 2019 4:42 pm

It seems there is a better way to activate OpenCL, rather than linking it directly to libMali.so
I'll check if I can fix my Mali packages to reflect this, although I'm not sure this will help in your current situation.
I can probably tell you how to fix it manually though.
Donate to support my work on the ODROID GameStation Turbo Image for U2/U3 XU3/XU4 X2 X C1 as well as many other releases.
Check out the Games and Emulators section to find some of my work or check the files in my repository to find the software i build for ODROIDs.
If you want to add my repository to your image read my HOWTO integrate my repo into your image.

ab1jx

Re: ARM fun stuff

Unread post by ab1jx » Thu Nov 07, 2019 1:41 am

Really, I don't know either OpenCL or OpenGL ES. I might just as well do it in Mali assembly language, which seems like it shouldn't need any driver. I've read some of Herman Hermitage's QPU assembly language stuff for the RPI, never pursued it because it wasn't portable: a couple Pi iterations later it already didn't work right anymore.

I've been exploring Software Defined Radio from the nuts and bolts up. I use librtlsdr to fetch buffers full of data from a common $10 RTL2832 dongle, apply windowing, put it through FFTW, then display it. I've gotten to that point a few times using a variety of graphics methods: doing framebuffer graphics by writing to memory addresses, doing the same thing with RFB/VNC, and using Xlib graphics. It needs to be fast and efficient. Up to 20-30 times per second I need to take the FFTed data, scale it to fit the height of the top window and display it as a simple x-y plot. Most of the window, below that plot, is called the waterfall. Each line in it is a row of pixels where the color of each pixel depends on the amplitude of the corresponding point in the top window. Then for every batch of data plotted in the top window I need to scroll the lower window down 1 pixel and draw a new top line. Doing this with the CPU turns into a bottleneck and the CPU usage creeps up as you write more code. It can be done with OpenGL ES by drawing the top line onto a texture that's rotating downward, which accomplishes the scrolling without CPU involvement.

I'm not concerned with lighting, angles, vectors, phong, any normal graphics stuff. If OpenCL can write to specific memory addresses in the framebuffer that's good. The scrolling can be done with memmoves or the equivalent. The object is to get as much as possible of the work done by the GPU and free up the CPU for other things. As far as I know I might just as well work out how to do it in assembly instead of relying on somebody's driver. I'm retired, I've been tinkering with it a few years on and off.
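To make the waterfall step concrete, here is a minimal CPU-side sketch in plain Python. It only shows the data flow (amplitude to color, scroll down one row, new row on top) and is exactly the kind of copying a GPU version would avoid. The function names and the grey-level mapping are assumptions for illustration, not anything from an existing SDR program.

```python
def amplitude_to_color(value, lo, hi):
    """Map an amplitude to a 0-255 grey level, clamped to [lo, hi]."""
    if hi <= lo:
        return 0
    frac = (min(max(value, lo), hi) - lo) / (hi - lo)
    return int(frac * 255)

def push_row(waterfall, spectrum, lo, hi):
    """Scroll the waterfall down one row and draw the new spectrum on top."""
    new_row = [amplitude_to_color(v, lo, hi) for v in spectrum]
    # Drop the oldest (bottom) row; everything else moves down by one
    return [new_row] + waterfall[:-1]

# Tiny 4-row by 3-pixel waterfall fed with two made-up spectra
height, width = 4, 3
waterfall = [[0] * width for _ in range(height)]
waterfall = push_row(waterfall, [0.0, 0.5, 1.0], 0.0, 1.0)
waterfall = push_row(waterfall, [1.0, 1.0, 0.0], 0.0, 1.0)
for row in waterfall:
    print(row)
```

A texture with a moving offset, or a circular buffer, gets the same visual result without moving every row on each update.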

ab1jx

Re: ARM fun stuff

Unread post by ab1jx » Thu Nov 07, 2019 7:29 am

That sgminer did finally come to life, it's just looking really slow. I don't often get the accepted messages from litecoinpool running cpuminer either. But I get a steady stream of them on my old 80-chip Gridseed ASIC rig once they start. I think the way it works is that your work is accepted if it comes in soon enough to be useful, slow methods don't get a lot of accepteds.
sgminer_working.gif

ab1jx

Re: ARM fun stuff

Unread post by ab1jx » Fri Nov 08, 2019 3:01 am

I think because OpenCL can also utilize the CPU it defaults to that if the GPU stuff isn't working, which of course is much slower.

And libOpenCL actually has a man page, but it's mostly about ways to do what /etc/OpenCL/vendors/somename.icd does in specifying where to load the library from. I didn't have it in my LD_LIBRARY_PATH.
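For what it's worth, the ICD loader finds vendor drivers through small text files in /etc/OpenCL/vendors, each ending in .icd and holding the name or path of the library to load. A minimal sketch for listing them (the function name is just for illustration, and the directory may not exist on every image):

```python
import os

def list_icd_entries(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd_file: library named inside it} for a vendors directory."""
    entries = {}
    if not os.path.isdir(vendors_dir):
        return entries
    for name in sorted(os.listdir(vendors_dir)):
        # Only files ending in .icd are treated as vendor entries
        if not name.endswith(".icd"):
            continue
        with open(os.path.join(vendors_dir, name)) as fh:
            entries[name] = fh.read().strip()
    return entries

for icd, lib in list_icd_entries().items():
    print(icd, "->", lib)
```

An empty result would mean the loader has no vendor library to hand off to, whatever libMali.so itself contains.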

I have another Mali OpenCL SDK from https://github.com/biotrump/Mali_OpenCL_SDK but it's 3 years old for a Mali T600 series. I'm keeping it in its own directory and not mixing it up with other stuff. It does have a libOpenCL.so that's 19520 bytes. But that's a stub, I have the source. It's "Mali OpenCL SDK vv1.1.0.0a36a7 for Linux".

Running clinfo (from the debs) gets me:

Code: Select all

clinfo
clinfo: /usr/lib/aarch64-linux-gnu/libOpenCL.so.1: version `OPENCL_1.0' not found (required by clinfo)
And /usr/lib/aarch64-linux-gnu/libOpenCL.so.1 is a symlink to the 45 MB libMali.so. hominoid has one that isn't, but I think he's using Ubuntu.

I wasn't sure which symlinks I'd made and which were part of the malig52 deb, so I uninstalled that and then checked for libOpenCL* and libMali* in /usr/lib/aarch64-linux-gnu; there was nothing left. I put the malig52 deb back (not the Wayland one) and got:

Code: Select all

 ls -la | grep libOpenCL
lrwxrwxrwx  1 root root       14 Mar 30  2019 libOpenCL.so -> libOpenCL.so.1
lrwxrwxrwx  1 root root       18 Mar 30  2019 libOpenCL.so.1 -> libOpenCL.so.1.0.0
lrwxrwxrwx  1 root root       10 Mar 30  2019 libOpenCL.so.1.0.0 -> libMali.so
ls -la | grep libMali
lrwxrwxrwx  1 root root       10 Jul 10 02:37 libEGL.so.1.1.0 -> libMali.so
lrwxrwxrwx  1 root root       10 Mar 30  2019 libGLESv1_CM.so.1.1.0 -> libMali.so
lrwxrwxrwx  1 root root       10 Jul 10 02:37 libGLESv2.so.2.1.0 -> libMali.so
-rw-r--r--  1 root root 45800368 Mar 27  2019 libMali.so
lrwxrwxrwx  1 root root       10 Mar 30  2019 libOpenCL.so.1.0.0 -> libMali.so
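
The chain of links in that listing can also be walked one hop at a time from a script. A minimal sketch, assuming the library directory shown above (the function name is made up for illustration):

```python
import os

def symlink_chain(path, limit=16):
    """Follow symlinks one hop at a time and return every path visited."""
    chain = [path]
    while os.path.islink(path) and len(chain) <= limit:
        target = os.readlink(path)
        # Relative targets are resolved against the link's own directory
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        chain.append(path)
    return chain

print(" -> ".join(symlink_chain("/usr/lib/aarch64-linux-gnu/libOpenCL.so")))
```

If the last entry is libMali.so, every libOpenCL name ends up at the Mali blob with no separate loader library in between.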

odroid
Site Admin

Re: ARM fun stuff

Unread post by odroid » Fri Nov 08, 2019 11:29 am

Our tested libMali.so size is around 37MB.

Try this Mali fbdev deb package for N2 Ubuntu.
http://deb.odroid.in/n2/pool/main/m/mal ... rd-driver/

ab1jx

Re: ARM fun stuff

Unread post by ab1jx » Fri Nov 08, 2019 12:19 pm

Hmm, should I just try the libMali.so from it?

dpkg -i says

Code: Select all

 dpkg -i mali-fbdev_0.1-1_arm64.deb
Selecting previously unselected package mali-fbdev.
dpkg: considering removing libegl1:arm64 in favour of mali-fbdev ...
dpkg: yes, will remove libegl1:arm64 in favour of mali-fbdev
dpkg: considering removing libgles2:arm64 in favour of mali-fbdev ...
dpkg: yes, will remove libgles2:arm64 in favour of mali-fbdev
dpkg: regarding mali-fbdev_0.1-1_arm64.deb containing mali-fbdev:
 mali-fbdev conflicts with libopencl1
  ocl-icd-libopencl1:arm64 provides libopencl1 and is present and installed.

dpkg: error processing archive mali-fbdev_0.1-1_arm64.deb (--install):
 conflicting packages - not installing mali-fbdev
Errors were encountered while processing:
 mali-fbdev_0.1-1_arm64.deb
So I started to remove libegl1 and libgles2, and somehow ended up at
mali2remove.gif
A fair bit of that stuff I actually use. Inflexibility is one aspect of package managers I don't like, unless there's some override like a --force option I've never heard of.

odroid
Site Admin

Re: ARM fun stuff

Unread post by odroid » Fri Nov 08, 2019 12:28 pm

Since we've used Ubuntu only I have no idea what's wrong with your OS image.
Just try to extract and copy "libMali.so" from the package.

ab1jx

Re: ARM fun stuff

Unread post by ab1jx » Fri Nov 08, 2019 1:32 pm

Here's one difference, the Ubuntu libMali.so has been stripped of debugging symbols:

Code: Select all

root@odroid:.../c/ascfilt# file u_libMali.so
u_libMali.so: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, BuildID[sha1]=6a4c4cd19e6c93f9cec84b0b329aacadacc5281e, stripped
root@odroid:.../c/ascfilt# file d_libMali.so
d_libMali.so: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, BuildID[sha1]=6a4c4cd19e6c93f9cec84b0b329aacadacc5281e, with debug_info, not stripped
So running nm on it doesn't work. I doubt that alone explains the 8 MB difference in size. Interesting that they both have the same BuildID, whatever that is.

But clinfo still gives me:

Code: Select all

clinfo
clinfo: /usr/lib/aarch64-linux-gnu/libOpenCL.so.1: version `OPENCL_1.0' not found (required by clinfo)
glmark2-es2-fbdev looks about the same with a score of 56. But in both cases the shaders are working. esgears says

Code: Select all

EGLUT: failed to initialize EGL display
which I think is the same as with the Debian libMali.so. glxgears is probably irrelevant but I see about 430 FPS in the default size, 70 FPS maximized to 1920x1080. About the same.

odroid
Site Admin

Re: ARM fun stuff

Unread post by odroid » Fri Nov 08, 2019 3:19 pm

Do you use meveric's Debian image?

On our Ubuntu 18.04.3 minimal image, there was no such issue as @hominoid mentioned.

Code: Select all

Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 4.9.196-63 aarch64)                    
                                                                                
 * Documentation:  https://help.ubuntu.com                                      
 * Management:     https://landscape.canonical.com                              
 * Support:        https://ubuntu.com/advantage                                 
                                                                                
 * Kata Containers are now fully integrated in Charmed Kubernetes 1.16!         
   Yes, charms take the Krazy out of K8s Kata Kluster Konstruction.             
                                                                                
     https://ubuntu.com/kubernetes/docs/release-notes                           
root@odroid:~# glmark2-es2-fbdev                                                
=======================================================                         
    glmark2 2012.12                                                             
=======================================================                         
    OpenGL Information                                                          
    GL_VENDOR:     ARM                                                          
    GL_RENDERER:   Mali-G52                                                     
    GL_VERSION:    OpenGL ES 3.2 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58   
=======================================================                         
[build] use-vbo=false: FPS: 8 FrameTime: 125.000 ms                             
[build] use-vbo=true: FPS: 59 FrameTime: 16.949 ms                              
[texture] texture-filter=nearest: FPS: 59 FrameTime: 16.949 ms                  
[texture] texture-filter=linear: FPS: 59 FrameTime: 16.949 ms                   
[texture] texture-filter=mipmap: FPS: 59 FrameTime: 16.949 ms                   
[shading] shading=gouraud: FPS: 59 FrameTime: 16.949 ms                         
[shading] shading=blinn-phong-inf: FPS: 59 FrameTime: 16.949 ms                 
[shading] shading=phong: FPS: 59 FrameTime: 16.949 ms                           
[bump] bump-render=high-poly: FPS: 59 FrameTime: 16.949 ms                      
[bump] bump-render=normals: FPS: 59 FrameTime: 16.949 ms                        
[bump] bump-render=height: FPS: 59 FrameTime: 16.949 ms                         
libpng warning: iCCP: known incorrect sRGB profile                              
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 59 FrameTime: 16.949 ms             
libpng warning: iCCP: known incorrect sRGB profile                              
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 59 FrameTime: 16.949 ms  
[pulsar] light=false:quads=5:texture=false: FPS: 59 FrameTime: 16.949 ms        
libpng warning: iCCP: known incorrect sRGB profile                              
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 58 FrameTime: 17.241 ms
libpng warning: iCCP: known incorrect sRGB profile                              
[desktop] effect=shadow:windows=4: FPS: 59 FrameTime: 16.949 ms                 
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 58 FrameTime: 17.241 ms                                 
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 58 FrameTime: 17.241 ms                             
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 58 FrameTime: 17.241 ms                                  
[ideas] speed=duration: FPS: 59 FrameTime: 16.949 ms                            
[jellyfish] <default>: FPS: 59 FrameTime: 16.949 ms                             
[terrain] <default>: FPS: 27 FrameTime: 37.037 ms                               
[shadow] <default>: FPS: 59 FrameTime: 16.949 ms                                
[refract] <default>: FPS: 49 FrameTime: 20.408 ms                               
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 59 FrameTime: 16.949 ms    
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 59 FrameTime: 16.949 ms    
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 59 FrameTime: 16.949 ms    
[function] fragment-complexity=low:fragment-steps=5: FPS: 59 FrameTime: 16.949 ms                                                                               
[function] fragment-complexity=medium:fragment-steps=5: FPS: 59 FrameTime: 16.949 ms                                                                            
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 59 FrameTime: 16.949 ms                                                                        
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 59 FrameTime: 16.949 ms                                                                     
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 59 FrameTime: 16.949 ms                                                                      
=======================================================                         
                                  glmark2 Score: 55                             
=======================================================     
I got the following result just after following the instructions in the release note. https://wiki.odroid.com/odroid-n2/os_im ... figuration

Code: Select all

root@odroid:~# clinfo 
Number of platforms                               1
  Platform Name                                   ARM Platform
  Platform Vendor                                 ARM
  Platform Version                                OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_create_command_queue cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_shared_virtual_memory
  Platform Extensions function suffix             ARM

  Platform Name                                   ARM Platform
Number of devices                                 1
  Device Name                                     Mali-G52
  Device Vendor                                   ARM
  Device Vendor ID                                0x72120000
  Device Version                                  OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Driver Version                                  2.0
  Device OpenCL C Version                         OpenCL C 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               2
  Max clock frequency                             750MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             384x384x384
  Max work group size                             384
  Preferred work group size multiple              8
  Preferred / native vector sizes                 
    char                                                16 / 4       
    short                                                8 / 2       
    int                                                  4 / 1       
    long                                                 2 / 1       
    half                                                 8 / 2        (cl_khr_fp16)
    float                                                4 / 1       
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  Global memory size                              3887570944 (3.621GiB)
  Error Correction support                        No
  Max memory allocation                           971892736 (926.9MiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Shared Virtual Memory (SVM) capabilities (ARM)  
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        131072 (128KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   32 bytes
    Pitch alignment for 2D image buffers          64 pixels
    Max 2D image size                             65536x65536 pixels
    Max 3D image size                             65536x65536x65536 pixels
    Max number of read image args                 128
    Max number of write image args                64
    Max number of read/write image args           64
  Max number of pipe args                         16
  Max active pipe reservations                    1
  Max pipe packet size                            1024
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     1024
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                2097152 (2MiB)
    Max size                                      16777216 (16MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_create_command_queue cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_shared_virtual_memory

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  ARM Platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [ARM]
  clCreateContext(NULL, ...) [default]            Success [ARM]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G52
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G52
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G52

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.11
  ICD loader Profile                              OpenCL 2.1

meveric

Re: ARM fun stuff

Unread post by meveric » Fri Nov 08, 2019 3:47 pm

viewtopic.php?p=272264#p272264 this should fix OpenCL on my Debian image.

ab1jx

Re: ARM fun stuff

Unread post by ab1jx » Fri Nov 08, 2019 8:35 pm

Looks good so far:

Code: Select all

Number of platforms                               1
  Platform Name                                   ARM Platform
  Platform Vendor                                 ARM
  Platform Version                                OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_create_command_queue cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_shared_virtual_memory
  Platform Extensions function suffix             ARM

  Platform Name                                   ARM Platform
Number of devices                                 1
  Device Name                                     Mali-G52
  Device Vendor                                   ARM
  Device Vendor ID                                0x72120000
  Device Version                                  OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Driver Version                                  2.0
  Device OpenCL C Version                         OpenCL C 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               2
  Max clock frequency                             750MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             384x384x384
  Max work group size                             384
  Preferred work group size multiple              8
  Preferred / native vector sizes                 
    char                                                16 / 4       
    short                                                8 / 2       
    int                                                  4 / 1       
    long                                                 2 / 1       
    half                                                 8 / 2        (cl_khr_fp16)
    float                                                4 / 1       
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  Global memory size                              3893407744 (3.626GiB)
  Error Correction support                        No
  Max memory allocation                           973351936 (928.3MiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Shared Virtual Memory (SVM) capabilities (ARM)  
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        131072 (128KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   32 bytes
    Pitch alignment for 2D image buffers          64 pixels
    Max 2D image size                             65536x65536 pixels
    Max 3D image size                             65536x65536x65536 pixels
    Max number of read image args                 128
    Max number of write image args                64
    Max number of read/write image args           64
  Max number of pipe args                         16
  Max active pipe reservations                    1
  Max pipe packet size                            1024
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     1024
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                2097152 (2MiB)
    Max size                                      16777216 (16MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_create_command_queue cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_shared_virtual_memory

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  ARM Platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [ARM]
  clCreateContext(NULL, ...) [default]            Success [ARM]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G52
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G52
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G52

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.12
  ICD loader Profile                              OpenCL 2.2
I'm new to OpenCL, but it's about my 8th programming language, so I'm starting with the tutorials at https://developer.arm.com/solutions/gra ... -tutorials

Thank you. If I didn't have too many computers already, I'd buy another N2. I've been wanting a Pinebook Pro for a long time; they almost have ANSI keyboards, and you can at least order one now. That's a Mali T860 MP4 in a laptop: https://www.pine64.org/pinebook-pro/ Odroid should do a laptop.

Re: ARM fun stuff

Unread post by ab1jx » Fri Nov 08, 2019 9:14 pm

Didn't change this sgminer much though. I wonder if somebody could write an autotune program that used the API to optimize values. The intriguing thing about GPU mining is the ability to change algorithms for new coins.
[Image: sgminer_ok-cl.gif]
I've got a Gekko Science NewPac USB plug miner (a Bitcoin ASIC) just breaking 100 GH/s with a big heat sink bolted on. It uses a pair of BM1387 chips, like the ones in the Antminer S9. https://bitcointalk.org/index.php?topic=5053833.0

Re: ARM fun stuff

Unread post by ab1jx » Sun Nov 10, 2019 9:05 am

Not really OpenCL stuff, but I discovered that when you're trying to tune sgminer it pays to delete the compiled kernel (*.bin file) every time you make a change. Some of the parameters you enter on the command line or in your sgminer.conf file get compiled into the kernel, and deleting the file clears them so the next run rebuilds it with the new values.
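A minimal sketch of that cleanup step, assuming the cached kernels are the *.bin files sitting in the directory sgminer is started from. The function name and the directory argument are hypothetical and are not part of sgminer itself.

```python
from pathlib import Path


def clear_kernel_cache(directory="."):
    """Delete cached kernel binaries (*.bin) in `directory`.

    Assumption: every *.bin file in this directory is a compiled kernel
    that is safe to remove. Returns the names of the files removed.
    """
    removed = []
    for path in sorted(Path(directory).glob("*.bin")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

Calling `clear_kernel_cache()` before each tuning run would force the kernel to be rebuilt with whatever settings are current.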

Mali G52 page https://developer.arm.com/ip-products/g ... li-g52-gpu

I've read through a few tutorials now and got a "hello world" program, copied from an ATI PDF, to compile and run. OpenCL mostly has to do with the mechanics of splitting a job up to run on multiple cores. Instead of looping from 0 to N, you take what's inside the loop and launch it as N work-items, which the runtime spreads across the available cores, most of which are in the GPU. You have to consider the cost of accessing memory: local memory is quick, global isn't. The kernels are written in OpenCL C, a dialect based on C99, and the host side is ordinary C or C++.
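To make the loop-versus-kernel idea concrete, here is a rough sketch in plain Python rather than OpenCL C. The `gid` argument stands in for the value `get_global_id(0)` returns inside a real kernel. The `launch` helper is made up for illustration: it runs the work-items one after another, whereas a GPU would run many of them at once.

```python
def add_loop(a, b):
    """Ordinary serial version: one loop walks every index."""
    out = [0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]
    return out


def add_kernel(gid, a, b, out):
    """Kernel-style version: only the loop body, for a single index."""
    out[gid] = a[gid] + b[gid]


def launch(kernel, global_size, *args):
    """Hypothetical stand-in for enqueueing a kernel over global_size work-items."""
    for gid in range(global_size):  # a GPU would spread these across its cores
        kernel(gid, *args)


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
out = [0] * len(a)
launch(add_kernel, len(a), a, b, out)
```

Both versions compute the same result. The difference is that the kernel has no loop of its own, so the order in which the indices are processed is left to whatever launches it.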

https://www.khronos.org/files/opencl20- ... e-card.pdf

Re: ARM fun stuff

Unread post by ab1jx » Tue Nov 12, 2019 1:40 pm

OK, I finally found arm_compute_benchmark and ran it. All of the examples (most are benchmarks) get built when you build the complete library and are put in the bin directory; it took me a while to figure that out. arm_compute_benchmark cycles through all of them, and the output looks like this. I understand very little of it. This is on my stock N2. When I tried to post the full output I got "Your message contains 1270572 characters. The maximum number of allowed characters is 250000." So I chopped a chunk out of the middle to keep the summary at the end.

Code: Select all

Version = arm_compute_version=v19.08 Build options: {'arch': 'arm64-v8a', 'opencl': '1', 'neon': '1', 'benchmark_tests': '1', 'build': 'native', 'debug': '1', 'os': 'linux', 'Werror': '1'} Git hash=unknown
CommandLine = ./arm_compute_benchmark 
Seed = 4281109114
CL_DEVICE_VERSION = OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
cpu_has_fp16 = false
cpu_has_dotprod = false
CPU0 = A53
CPU1 = A53
CPU2 = GENERIC
CPU3 = GENERIC
CPU4 = GENERIC
CPU5 = GENERIC
Iterations = 1
Threads = 1
Dataset mode = PRECOMMIT
Running [0] 'CL/AlexNetActivationLayer@Shape=55x55x96:Info=RELU:DataType=F16:Batches=1'
  Wall clock/Wall clock time:    AVG=1841.0000 us
/S16/RunSmall@Shape=27x13x2x4:DataType=S16:BorderMode=UNDEFINED:FilterSize=5:FilterSize=9'
  Wall clock/Wall clock time:    AVG=1697.0000 us
Running [2444] 'CL/CustomConvolution/Rectangle/S16/RunSmall@Shape=27x13x2x4:DataType=S16:BorderMode=UNDEFINED:FilterSize=7:FilterSize=3'
  Wall clock/Wall clock time:    AVG=1417.0000 us

Executed 6450 test(s) (6196 passed, 0 expected failures, 0 failed, 0 crashed, 0 disabled) in 1051 second(s)
That last line just sank in. It executed 6450 tests and 6196 passed, a gap of 254, yet none are listed as failed, crashed, or disabled. I bet that number being close to 2^8 (256) has something to do with it.
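The arithmetic can be checked straight from the summary line above. This is only a sketch: the function name is made up, and the regular expressions assume the line keeps exactly the wording shown in this run. It counts the gap and does not explain it.

```python
import re

# Summary line copied from the benchmark output above.
summary = ("Executed 6450 test(s) (6196 passed, 0 expected failures, "
           "0 failed, 0 crashed, 0 disabled) in 1051 second(s)")


def unaccounted_tests(line):
    """Return the executed count minus the sum of all listed outcome counts."""
    executed = int(re.search(r"Executed (\d+) test", line).group(1))
    # Grab the parenthesised outcome list, skipping the "(s)" in "test(s)".
    outcomes = re.search(r"\((\d+ passed.*?disabled)\)", line).group(1)
    listed = sum(int(n) for n in re.findall(r"(\d+) ", outcomes))
    return executed - listed


print(unaccounted_tests(summary))  # 254
```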
Last edited by ab1jx on Fri Nov 15, 2019 3:08 am, edited 1 time in total.

Re: ARM fun stuff

Unread post by rooted » Fri Nov 15, 2019 12:48 am

When you post long logs like that, please use pastebin or a similar service. They really slow loading on mobile clients, and this one even crashed Tapatalk.

Re: ARM fun stuff

Unread post by ab1jx » Fri Nov 15, 2019 3:21 am

rooted wrote:
Fri Nov 15, 2019 12:48 am
When you post long logs like that, please use pastebin or a similar service. They really slow loading on mobile clients, and this one even crashed Tapatalk.
OK, I chopped out the bulk of it; it isn't that meaningful anyway. I didn't know Tapatalk carried this forum. https://forum.xda-developers.com/ is the only one I visit much that I knew was on Tapatalk.

Why is it that when you get emailed that someone replied to your topic, you have to log in to see what somebody browsing the forum as a guest (without logging in) could see anyway? And this site requires PITA passwords, so I don't try to remember mine. It's a quirk of phpBB; I've seen it elsewhere too.
