Mali G52 OpenCL Comparison

hominoid
Posts: 854
Joined: Tue Feb 28, 2017 3:55 am
languages_spoken: english
ODROIDs: C2, C4, XU4, MC1, N1, N2, N2L, N2+, HC4, M1, H2, H3+
Location: Lake Superior Basin, USA
Has thanked: 121 times
Been thanked: 379 times
Contact:

Post by hominoid »

Wanting to get a better feel for the OpenCL performance of the Odroid N2’s Mali G52, I set up the N2 alongside an Odroid N1 and an XU4 to run Lyra2rev2 on sgminer-arm 5.5.6a.

To get OpenCL running on the N2 I had to add the following link.

cd /usr/lib/aarch64-linux-gnu
sudo ln -s libMali.so libOpenCL.so
For clinfo to work, the following directories and file had to be created:

sudo apt install clinfo
sudo mkdir -p /etc/OpenCL/vendors
sudo vi /etc/OpenCL/vendors/mali.icd
# add the following single line to mali.icd:
libOpenCL.so
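
The two setup steps above can also be done non-interactively. The following is a minimal sketch, not the author's method: the function name `setup_mali_opencl` and its optional prefix argument are hypothetical, added so the steps can be tried in a scratch directory before touching the real filesystem. It uses `ln -sfn` instead of the plain `ln -s` from the post so that it can be re-run safely.

```shell
#!/usr/bin/env bash
# Sketch only. setup_mali_opencl and its prefix argument are illustrative names.
# Pass an empty prefix to act on the real filesystem (requires root).
setup_mali_opencl() {
    local root="${1:-}"
    local libdir="$root/usr/lib/aarch64-linux-gnu"
    local icddir="$root/etc/OpenCL/vendors"

    # Point the generic OpenCL library name at the Mali library (relative link, as in the post).
    ln -sfn libMali.so "$libdir/libOpenCL.so" || return 1

    # Register the library with the ICD loader that clinfo relies on.
    mkdir -p "$icddir" || return 1
    printf 'libOpenCL.so\n' > "$icddir/mali.icd"
}

# Dry run in a scratch tree:
#   mkdir -p /tmp/n2root/usr/lib/aarch64-linux-gnu && setup_mali_opencl /tmp/n2root
# Real run, as root:
#   setup_mali_opencl ""
```

The function only defines the steps; nothing is changed until it is called.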
clinfo should now report:

Number of platforms                               1
  Platform Name                                   ARM Platform
  Platform Vendor                                 ARM
  Platform Version                                OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_create_command_queue cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_shared_virtual_memory
  Platform Extensions function suffix             ARM

  Platform Name                                   ARM Platform
Number of devices                                 1
  Device Name                                     Mali-G52
  Device Vendor                                   ARM
  Device Vendor ID                                0x72120000
  Device Version                                  OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Driver Version                                  2.0
  Device OpenCL C Version                         OpenCL C 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               2
  Max clock frequency                             750MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             384x384x384
  Max work group size                             384
  Preferred work group size multiple              8
  Preferred / native vector sizes                 
    char                                                16 / 4       
    short                                                8 / 2       
    int                                                  4 / 1       
    long                                                 2 / 1       
    half                                                 8 / 2        (cl_khr_fp16)
    float                                                4 / 1       
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  Global memory size                              3888140288 (3.621GiB)
  Error Correction support                        No
  Max memory allocation                           972035072 (927MiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Shared Virtual Memory (SVM) capabilities (ARM)  
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        131072 (128KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   32 bytes
    Pitch alignment for 2D image buffers          64 pixels
    Max 2D image size                             65536x65536 pixels
    Max 3D image size                             65536x65536x65536 pixels
    Max number of read image args                 128
    Max number of write image args                64
    Max number of read/write image args           64
  Max number of pipe args                         16
  Max active pipe reservations                    1
  Max pipe packet size                            1024
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     1024
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                2097152 (2MiB)
    Max size                                      16777216 (16MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_create_command_queue cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_shared_virtual_memory

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  ARM Platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [ARM]
  clCreateContext(NULL, ...) [default]            Success [ARM]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G52
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G52
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G52

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.11
  ICD loader Profile                              OpenCL 2.1
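
The full clinfo listing is long, so when comparing boards it can help to cut it down to a few fields. This is a small sketch that assumes the two-space indentation seen in the output above; the function name `clinfo_summary` and the choice of fields are mine, not part of clinfo.

```shell
#!/usr/bin/env bash
# Sketch: keep only a handful of top-level device fields from clinfo-style output.
# Assumes device fields are indented by exactly two spaces, as in the listing above,
# so the more deeply indented "NULL platform behavior" entries are skipped.
clinfo_summary() {
    grep -E '^  (Device Name|Max compute units|Max clock frequency|Max work group size|Global memory size) '
}

# Usage: clinfo | clinfo_summary
```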
For the test I tried to set things up as evenly as possible. All three SBCs ran Lyra2rev2 simultaneously on a shared network. Each platform’s sgminer was compiled with the latest ARM Compute Library, v18.11.

XU4 – Active air cooled, Split AirFlow Case with 80mm ducted fan
Linux odroid-xu4 4.14.102-156 #1 SMP PREEMPT Thu Feb 21 14:15:08 -03 2019 armv7l armv7l armv7l GNU/Linux
GCC 7.3
CFLAGS="-Ofast -Wall -std=gnu99 -march=armv7-a -mtune=cortex-a15.cortex-a7 -fexpensive-optimizations -fprefetch-loop-arrays -mfpu=neon -I/opt/arm_compute-v18.11-bin-linux/include/CL" LDFLAGS="-L/opt/arm_compute-v18.11-bin-linux/lib/linux-armv7a-neon-cl" ./configure --disable-adl --disable-adl-checks

N1 – Active air cooled, stock
Linux odroid-n1 4.4.114 #1 SMP Thu Sep 6 21:26:15 EDT 2018 aarch64 GNU/Linux
GCC 6.3
CFLAGS="-Ofast -Wall -march=armv8-a+crypto -mtune=cortex-a72.cortex-a53 -fexpensive-optimizations -fprefetch-loop-arrays -std=gnu99 -I/opt/arm_compute-v18.11-bin-linux/include/CL" LDFLAGS="-L/opt/arm_compute-v18.11-bin-linux/lib/linux-arm64-v8a-neon-cl" ./configure --disable-git-version --disable-adl --disable-adl-checks

N2 - Passive heatsink, stock
Linux odroid-n2 4.9.156-14 #1 SMP PREEMPT Sat Feb 16 02:15:44 -02 2019 aarch64 aarch64 aarch64 GNU/Linux
GCC 7.3
CFLAGS="-Ofast -Wall -march=armv8-a+crypto -mtune=cortex-a73.cortex-a53 -fexpensive-optimizations -fprefetch-loop-arrays -std=gnu99 -I/opt/arm_compute-v18.11-bin-linux/include/CL" LDFLAGS="-L/opt/arm_compute-v18.11-bin-linux/lib/linux-arm64-v8a-neon-cl" ./configure --disable-git-version --disable-adl --disable-adl-checks
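
The three configurations differ only in the architecture and tuning flags. As a rough sketch, the per-board CFLAGS listed above could be collected in one helper; the function name `sgminer_cflags` and the board keys `xu4`, `n1` and `n2` are hypothetical, while the flags themselves are copied from the three builds (in a slightly different order).

```shell
#!/usr/bin/env bash
# Sketch: print the CFLAGS used for each board in this test.
# The helper name and board keys are illustrative; the flags come from the post.
sgminer_cflags() {
    local common="-Ofast -Wall -std=gnu99 -fexpensive-optimizations -fprefetch-loop-arrays"
    local acl="-I/opt/arm_compute-v18.11-bin-linux/include/CL"
    case "$1" in
        xu4) echo "$common -march=armv7-a -mtune=cortex-a15.cortex-a7 -mfpu=neon $acl" ;;
        n1)  echo "$common -march=armv8-a+crypto -mtune=cortex-a72.cortex-a53 $acl" ;;
        n2)  echo "$common -march=armv8-a+crypto -mtune=cortex-a73.cortex-a53 $acl" ;;
        *)   echo "unknown board: $1" >&2; return 1 ;;
    esac
}

# Usage on the N2:
#   CFLAGS="$(sgminer_cflags n2)" \
#   LDFLAGS="-L/opt/arm_compute-v18.11-bin-linux/lib/linux-arm64-v8a-neon-cl" \
#   ./configure --disable-git-version --disable-adl --disable-adl-checks
```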

The SBCs ran for a little over an hour, with the following average Kilohash/s results:
Comparison.jpg
I also recorded the system and GPU temperatures for the Odroid N2 during the test.
N2 Temps.jpg
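
The post does not say how the temperatures were recorded. One possible way, sketched here under the assumption that the kernel exposes sensors through the standard Linux sysfs thermal interface (`/sys/class/thermal/thermal_zone*/temp`, in millidegrees Celsius), is a small polling logger. The function names are hypothetical, and which zone corresponds to the SoC or the GPU on the N2 is not something this sketch knows.

```shell
#!/usr/bin/env bash
# Sketch: log thermal zone readings. Names are illustrative, not from the post.

# Convert a sysfs millidegree reading (e.g. 48500) to degrees C with one decimal.
millideg_to_c() {
    awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 1000 }'
}

# Print one line per readable thermal zone: time, zone name, temperature in C.
log_temps() {
    local base="${1:-/sys/class/thermal}" zone
    for zone in "$base"/thermal_zone*/temp; do
        [ -r "$zone" ] || continue
        printf '%s %s %s\n' "$(date +%T)" \
            "$(basename "$(dirname "$zone")")" \
            "$(millideg_to_c "$(cat "$zone")")"
    done
}

# Usage: while true; do log_temps; sleep 10; done >> n2-temps.log
```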
I spent a fair amount of time tuning GPU parameters. I have the most experience with the XU4's Mali T-628, so most of the time was spent on the N2 and the N1. As a general observation, the N2’s Mali G52 was a pleasure to work with, and it didn’t take long to find the sweet spot where it was stable and sustainable. The bad boy of the group is definitely the N1’s Mali T-860. I struggled to get it to perform near the XU4’s level. This isn’t new either. From the first time I started working with it a year ago, getting it to operate at or near the XU4, regardless of the algorithm, has been a challenge. Even considering it’s running an older kernel, compiler and drivers, I feel the N1’s Mali T-860 should do better. For this test I could get it to run in the 70-73 Kilohash/s range, but that is not sustainable over the long run.

During tuning, the N2’s Mali G52 could punch up to 103-110 Kilohash/s, but that could not be sustained. It runs relatively cool and comfortable at 98-99 Kilohash/s, considering its passive heatsink, and affords a nice increase in performance over its closest competitor. Overall I’m very pleased with its performance and happy, from a hardware perspective, that HK waited for the S922X with its Mali G52. In between testing I’ve been running the N2 as a thin desktop with good experiences, and I'm looking forward to the new drivers that are in the works.
Last edited by hominoid on Thu Feb 28, 2019 10:26 pm, edited 1 time in total.

odroid
Site Admin
Posts: 41543
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English, Korean
ODROIDs: ODROID
Has thanked: 3321 times
Been thanked: 1837 times
Contact:

Re: Mali G52 OpenCL Comparison

Post by odroid »

Thank you for the detailed analysis.
We've fixed the disk quota issue in the PHPBB forum settings.
Please upload the images.

Post by hominoid »

Thank you, images uploaded.

meister ivar
Posts: 20
Joined: Fri Jan 05, 2018 7:27 pm
languages_spoken: english german
ODROIDs: xu4q c2 n2+
Has thanked: 2 times
Been thanked: 2 times
Contact:

Re: Mali G52 OpenCL Comparison

Post by meister ivar »

@hominoid

Do you know mandelbulber (https://github.com/buddhi1980/mandelbulber2)?
It is a 3D fractal renderer that can use OpenCL for rendering.

Could you try testing it on your N2 with OpenCL?

My XU4 isn't running well on that. Resulting images are kind of twisted...

Post by hominoid »

I'm not familiar with it, but I have been digging around for some other OpenCL benchmarks today. I'm about to run out of space on the 16GB eMMC, so I'm moving to a larger eMMC right now. I'll take a look as soon as I'm done and let you know how it goes.

Post by hominoid »

I finished staging Mandelbulber2 on the N2 last night. I was not able to use OpenCL on the desktop because the current fbdev driver does not support acceleration. I was able to run Mandelbulber2 without OpenCL, and I was able to operate the OpenCL version of the application using the command line interface. None of this helped me see the issue you were experiencing, so I decided to stage Mandelbulber2 on an XU4 to try to replicate the OpenCL issue.

When I selected the predefined fractals, about half of them did not appear as I would have expected. Upon further investigation I found that the default values for the viewport were not appropriate. For instance, the Mandelbox, when selected, was a dark image with some lines. Once the camera distance was adjusted from the default 7 to approximately 25, the image was correct. I tried several problematic fractals and they all displayed correctly once the viewport parameters were set appropriately. I did not find any issues with OpenCL acceleration for this application on the XU4. OpenCL operated as I would have expected. Since we are now off topic for this thread, if you have any follow-up questions please open a new thread in the XU4 forum and send me a PM so I’m aware. By the way, nice app!
Mandelbox.jpg

Post by odroid »

hominoid wrote:
Thu Feb 28, 2019 12:36 pm
The SBCs ran for a little over an hour with the following average Kilohash/s results
Image
OpenCL 2.0 accelerated Hash computing performance is very nice. :D

xabolcs
Posts: 76
Joined: Fri Jun 22, 2018 6:37 pm
languages_spoken: english
ODROIDs: N2
Has thanked: 139 times
Been thanked: 2 times
Contact:

Re: Mali G52 OpenCL Comparison

Post by xabolcs »

Hi!

Sorry for not opening a new topic, but my problem is also OpenCL related.

I want to compare my N2's OpenCL performance using hashcat. It builds and installs flawlessly, but then fails with "error: Failed to open directory 'OpenCL'":

root@odroid-stretch64:~# hashcat -b -m 1800                                                                             
hashcat (v5.1.0-1397-g7f4df9eb) starting in benchmark mode...                                                           
                                                                                                                        
Benchmarking uses hand-optimized kernel code by default.                                                                
You can use it in your cracking session by setting the -O option.                                                       
Note: Using optimized kernel code limits the maximum supported password length.                                         
To disable the optimized kernel code in benchmark mode, use the -w option.                                              
                                                                                                                        
OpenCL API (OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58) - Platform #1 [ARM]                                
========================================================================================                                
* Device #1: Mali-G52, 928/3713 MB allocatable, 2MCU                                                                    
                                                                                                                        
Benchmark relevant options:                                                                                             
===========================                                                                                             
* --optimized-kernel-enable                                                                                             
                                                                                                                        
Hashmode: 1800 - sha512crypt $6$, SHA512 (Unix) (Iterations: 5000)                                                      
                                                                                                                        
clBuildProgram(): CL_INVALID_BUILD_OPTIONS                                                                              
                                                                                                                        
error: Failed to open directory 'OpenCL'                                                                                
error: Failed to handle include build options                                                                           
error: encountered invalid build options                                                                                
                                                                                                                        
* Device #1: Kernel /usr/local/share/hashcat/OpenCL/m01800-optimized.cl build failed - proceeding without this device.  
                                                                                                                        
                                                                                                                        
Started: Tue Sep 17 14:39:59 2019                                                                                       
Stopped: Tue Sep 17 14:40:01 2019                                                                                       
I'm on meveric's Debian Stretch with latest updates and the fbdev package installed.
I got the same error with Ubuntu Minimal.

On Ubuntu if I install clinfo then it uninstalls the fbdev package, and libMali.so disappears.

mad_ady
Posts: 11590
Joined: Wed Jul 15, 2015 5:00 pm
languages_spoken: english
ODROIDs: XU4 (HC1, HC2), C1+, C2, C4 (HC4), N1, N2, N2L, H2, H3+, Go, Go Advance, M1
Location: Bucharest, Romania
Has thanked: 649 times
Been thanked: 1154 times
Contact:

Re: Mali G52 OpenCL Comparison

Post by mad_ady »

Didn't know hashcat had a linux release...

Post by hominoid »

I wondered the same thing and went to the git; it does appear to support linux. My first thought is that you may not have done a "make install". In my experience that is how the OpenCL kernels get put in the right location for the application to find. If you did do a "make install" does the OpenCL kernel "/usr/local/share/hashcat/OpenCL/m01800-optimized.cl" exist? Did you use the default install location or specify a custom location?
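
The check described above can be scripted. This is a minimal sketch: the function name `check_hashcat_kernel` is hypothetical, and the default directory and file name are taken from the error message earlier in the thread.

```shell
#!/usr/bin/env bash
# Sketch: confirm that an installed hashcat OpenCL kernel source is present and readable.
# Defaults come from the path in the error output; the function name is illustrative.
check_hashcat_kernel() {
    local dir="${1:-/usr/local/share/hashcat/OpenCL}"
    local kernel="${2:-m01800-optimized.cl}"
    if [ -r "$dir/$kernel" ]; then
        echo "found: $dir/$kernel"
    else
        echo "missing: $dir/$kernel" >&2
        return 1
    fi
}

# Usage: check_hashcat_kernel
#        check_hashcat_kernel /some/custom/prefix/share/hashcat/OpenCL
```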

Post by xabolcs »

Nothing special: yes, I did "make install"; yes, the "m01800-optimized.cl" file exists at the specified location and looks like source code; and I used the default install location.

Creating the "/etc/OpenCL/vendors/mali.icd" file with the specified content doesn't help.

Post by odroid »


Post by xabolcs »

No. It doesn't work.

On Ubuntu the clinfo and fbdev packages uninstall each other.
On Debian, libMali.so is missing a version string, so it refuses to start.

(I'm away from my board for a few hours.)

Post by odroid »

We tried the instructions on the wiki page with the Ubuntu minimal image and they worked.

Post by hominoid »

I just built hashcat on Ubuntu and was able to run it without a problem. I was going to try a fresh minimal Ubuntu build, but I see HK just confirmed they could get clinfo to report correctly. Hmmmm... no ideas come to mind right now. I'll have to think about it...

hominoid@odroid-n2:~/hashcat$ hashcat -b -m 1800
hashcat (v5.1.0-1397-g7f4df9eb) starting in benchmark mode...

Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.

OpenCL API (OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58) - Platform #1 [ARM]
========================================================================================
* Device #1: Mali-G52, 926/3707 MB allocatable, 2MCU

Benchmark relevant options:
===========================
* --optimized-kernel-enable

Hashmode: 1800 - sha512crypt $6$, SHA512 (Unix) (Iterations: 5000)

Speed.#1.........:      305 H/s (63.14ms) @ Accel:4 Loops:32 Thr:384 Vec:1

Started: Tue Sep 17 20:15:37 2019
Stopped: Tue Sep 17 20:16:02 2019
hominoid@odroid-n2:~/hashcat$ 
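
To compare runs without reading the whole log, the speed figure can be pulled out of the benchmark output. A small sketch, assuming the `Speed.#N...: <value> H/s` line format shown above; the function name `hashcat_speed` is hypothetical, and lines reported in other units such as kH/s are deliberately not matched.

```shell
#!/usr/bin/env bash
# Sketch: print the numeric value that precedes "H/s" on each "Speed.#N" line.
# Assumes the benchmark line format shown in the output above.
hashcat_speed() {
    awk '/^Speed\.#[0-9]+/ {
        for (i = 2; i <= NF; i++)
            if ($i == "H/s") print $(i - 1)
    }'
}

# Usage: hashcat -b -m 1800 | hashcat_speed
```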

Post by xabolcs »

Great!

Thanks for the confirmations! Looks like it's a user error on my side. :)

Will try it again soon.

Post by xabolcs »

Nailed it!


The problem was that I tried the "hashcat" command from root's home directory.

If I try from the git directory, it works! :shock:
It also works in the following cases:
  • root@odroid:~# hashcat/hashcat -b -m 1800
  • root@odroid:~/hashcat# ./hashcat -b -m 1800
Looks like it's a hashcat bug!
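
Since the failure seemed to depend on the directory the command was started from, one workaround is to always launch it from a known directory. This is only a sketch of that idea: `run_from_dir` is a hypothetical helper, and the thread does not establish the actual root cause.

```shell
#!/usr/bin/env bash
# Sketch: run a command with a chosen working directory without changing
# the caller's own directory (the cd happens inside a subshell).
run_from_dir() {
    local dir="$1"
    shift
    ( cd "$dir" && "$@" )
}

# Usage: run_from_dir "$HOME/hashcat" ./hashcat -b -m 1800
```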

root@odroid:~/hashcat# hashcat -b -m 1800                                                                               
hashcat (v5.1.0-1397-g7f4df9eb) starting in benchmark mode...                                                           
                                                                                                                        
Benchmarking uses hand-optimized kernel code by default.                                                                
You can use it in your cracking session by setting the -O option.                                                       
Note: Using optimized kernel code limits the maximum supported password length.                                         
To disable the optimized kernel code in benchmark mode, use the -w option.                                              
                                                                                                                        
OpenCL API (OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58) - Platform #1 [ARM]                                
========================================================================================                                
* Device #1: Mali-G52, 926/3707 MB allocatable, 2MCU                                                                    
                                                                                                                        
Benchmark relevant options:                                                                                             
===========================                                                                                             
* --optimized-kernel-enable                                                                                             
                                                                                                                        
Hashmode: 1800 - sha512crypt $6$, SHA512 (Unix) (Iterations: 5000)                                                      
                                                                                                                        
Speed.#1.........:      314 H/s (61.33ms) @ Accel:4 Loops:32 Thr:384 Vec:1                                              
                                                                                                                        
Started: Wed Sep 18 08:46:07 2019                                                                                       
Stopped: Wed Sep 18 08:46:24 2019                                                                                       
root@odroid:~/hashcat# 

Thanks for the help!

Sav
Posts: 259
Joined: Mon Sep 02, 2019 2:33 am
languages_spoken: english
ODROIDs: odroid-n2
Has thanked: 81 times
Been thanked: 27 times
Contact:

Re: Mali G52 OpenCL Comparison

Post by Sav »

Does Mandelbulber2 work with OpenCL in the minimal environment of Ubuntu minimal + mali-fbdev described on the wiki page ( https://wiki.odroid.com/odroid-n2/os_im ... figuration )?

Post by hominoid »

xabolcs wrote:
Wed Sep 18, 2019 5:47 pm
...Looks like it's a hashcat bug!...
It's not a bug. The sample data files it needs are located there.

Post by hominoid »

Sav wrote:
Wed Sep 18, 2019 6:05 pm
hominoid wrote:
Fri Mar 01, 2019 11:50 pm
...I was able to run MandelBulber2 without OpenCL and I was able to operate the OpenCL version of the application using the command line interface.
Does MandelBulber2 work with OpenCL in the minimal environment of ubuntu minimal + mali-fbdev described in the wiki page ( https://wiki.odroid.com/odroid-n2/os_im ... figuration )
It should. OpenCL works fine on the N2; you just cannot get acceleration in X11. FYI, the CPU cores are considerably faster for MandelBulber2 in my experience.

xabolcs
Posts: 76
Joined: Fri Jun 22, 2018 6:37 pm
languages_spoken: english
ODROIDs: N2
Has thanked: 139 times
Been thanked: 2 times
Contact:

Re: Mali G52 OpenCL Comparison

Post by xabolcs »

hominoid wrote:
Wed Sep 18, 2019 11:08 pm
It's not a bug. The sample data files it needs are located there.
It's not a bug. It's user error! 8-)

I don't know what happened, but now it works flawlessly.

Code: Select all

root@odroid:~# rm -rf hashcat/
root@odroid:~# hashcat -b -m 1800
hashcat (v5.1.0-1397-g7f4df9eb) starting in benchmark mode...

Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.

OpenCL API (OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58) - Platform #1 [ARM]
========================================================================================
* Device #1: Mali-G52, 926/3707 MB allocatable, 2MCU

Benchmark relevant options:
===========================
* --optimized-kernel-enable

Hashmode: 1800 - sha512crypt $6$, SHA512 (Unix) (Iterations: 5000)

Speed.#1.........:      298 H/s (64.67ms) @ Accel:4 Loops:32 Thr:384 Vec:1

Started: Wed Sep 18 16:18:05 2019
Stopped: Wed Sep 18 16:18:22 2019

mad_ady
Posts: 11590
Joined: Wed Jul 15, 2015 5:00 pm
languages_spoken: english
ODROIDs: XU4 (HC1, HC2), C1+, C2, C4 (HC4), N1, N2, N2L, H2, H3+, Go, Go Advance, M1
Location: Bucharest, Romania
Has thanked: 649 times
Been thanked: 1154 times
Contact:

Re: Mali G52 OpenCL Comparison

Post by mad_ady »

So, what hash rate do you get for, let's say wpa2? I'm asking for a friend ;)

Post by xabolcs »

Hahhaha! My N2 board has 0 hash/s rate for WPA2! :lol:

Code: Select all

root@odroid:/usr/local/share/hashcat# hashcat -b -m 2500
hashcat (v5.1.0-1397-g7f4df9eb) starting in benchmark mode...

Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.

/usr/local/share/hashcat/OpenCL/m02500-optimized.cl: Optimized kernel requested but not needed - falling back to pure kl
OpenCL API (OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58) - Platform #1 [ARM]
========================================================================================
* Device #1: Mali-G52, 926/3707 MB allocatable, 2MCU

Benchmark relevant options:
===========================
* --optimized-kernel-enable

Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4095)

[  469.380578@2] Out of memory: Kill process 2801 (hashcat) score 923 or sacrifice child
[  469.382864@2] Killed process 2801 (hashcat) total-vm:4030220kB, anon-rss:3540628kB, file-rss:71380kB, shmem-rss:0kB
Killed                                                                                                                  
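
The kernel log above shows hashcat being killed with about 3.5 GB resident (anon-rss:3540628kB) on a board that reports 3707 MB. As a rough sketch, the memory still available plus free swap can be checked before starting a run. This assumes a Linux /proc/meminfo that has the MemAvailable and SwapFree fields; the function name mem_headroom_kb is made up for illustration and is not part of hashcat.

```shell
#!/bin/sh
# Hypothetical helper: print MemAvailable + SwapFree in kB.
# Reads /proc/meminfo by default; another file can be passed as $1.
mem_headroom_kb() {
    awk '/^(MemAvailable|SwapFree):/ { total += $2 }
         END { printf "%d\n", total }' "${1:-/proc/meminfo}"
}

# Only print the live value when /proc/meminfo is readable.
if [ -r /proc/meminfo ]; then
    mem_headroom_kb
fi
```

If the printed value is well below the roughly 3.5 million kB seen in the log, the WPA2 benchmark will probably hit the OOM killer again unless swap is added.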

Post by hominoid »

It seems to be about 2000 H/s, plus or minus 100 H/s, over 6 runs. This is a memory-hungry algorithm; I had to set up zram with a 16GB swap for it to complete.

Code: Select all

hominoid@odroid-n2:~/hashcat$ ./hashcat -b -m 2500
hashcat (v5.1.0-1397-g7f4df9eb) starting in benchmark mode...

Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.

/home/hominoid/hashcat/OpenCL/m02500-optimized.cl: Optimized kernel requested but not needed - falling back to pure kernel
OpenCL API (OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58) - Platform #1 [ARM]
========================================================================================
* Device #1: Mali-G52, 926/3707 MB allocatable, 2MCU

Benchmark relevant options:
===========================
* --optimized-kernel-enable

Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4095)

Speed.#1.........:     2107 H/s (90.01ms) @ Accel:4 Loops:256 Thr:384 Vec:1

Started: Wed Sep 18 21:23:41 2019
Stopped: Wed Sep 18 21:28:24 2019
hominoid@odroid-n2:~/hashcat$
@xabolcs, here is the setup I used if you're interested. It is not persistent across reboots:

Code: Select all

sudo apt install zram-config
# zramctl prints the device it set up; it is not always /dev/zram0
ZRAM_DEV=$(sudo zramctl --find --size 16000M)
sudo mkswap "$ZRAM_DEV"
sudo swapon "$ZRAM_DEV"
You should see the swap space with the command 'free'.
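
Besides 'free', the kernel lists every active swap device in /proc/swaps. Here is a small sketch that adds up the size of any zram-backed swap. It assumes the usual /proc/swaps column order (Filename, Type, Size in kB, Used, Priority); zram_swap_kb is just an illustrative name.

```shell
#!/bin/sh
# Hypothetical helper: print the total size in kB of all active
# zram swap devices. Reads /proc/swaps by default; another file
# can be passed as $1.
zram_swap_kb() {
    awk '$1 ~ /^\/dev\/zram/ { total += $3 }
         END { printf "%d\n", total }' "${1:-/proc/swaps}"
}

# Only print the live value when /proc/swaps is readable.
if [ -r /proc/swaps ]; then
    zram_swap_kb
fi
```

A result of 0 means no zram swap is active, for example after a reboot, since the setup is not persistent.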

Post by hominoid »

Here is the complete benchmark for anyone interested: N2 with 4GB memory and a 16GB zram swap.
Two hashmodes (3000 - LM and 1500 - descrypt) failed because the GPU work size hashcat chose didn't suit the Mali G52. That's not surprising, since hashcat was really written for much larger GPUs.

Code: Select all

hominoid@odroid-n2:~/hashcat$ ./hashcat -b
hashcat (v5.1.0-1397-g7f4df9eb) starting in benchmark mode...

Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.

OpenCL API (OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58) - Platform #1 [ARM]
========================================================================================
* Device #1: Mali-G52, 926/3707 MB allocatable, 2MCU

Benchmark relevant options:
===========================
* --optimized-kernel-enable

Hashmode: 0 - MD5

Speed.#1.........:   183.8 MH/s (67.14ms) @ Accel:16 Loops:1024 Thr:384 Vec:1

Hashmode: 100 - SHA1

Speed.#1.........: 28538.9 kH/s (53.94ms) @ Accel:4 Loops:512 Thr:384 Vec:1

Hashmode: 1400 - SHA2-256

Speed.#1.........: 22252.5 kH/s (69.28ms) @ Accel:2 Loops:1024 Thr:384 Vec:1

Hashmode: 1700 - SHA2-512

Speed.#1.........:  5554.6 kH/s (69.42ms) @ Accel:2 Loops:256 Thr:384 Vec:1

Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4095)

Speed.#1.........:     2041 H/s (93.06ms) @ Accel:4 Loops:256 Thr:384 Vec:1

Hashmode: 1000 - NTLM

Speed.#1.........:   235.0 MH/s (52.31ms) @ Accel:64 Loops:256 Thr:384 Vec:1

Hashmode: 3000 - LM

clEnqueueNDRangeKernel(): CL_INVALID_WORK_GROUP_SIZE

Speed.#1.........:        0 H/s (0.00ms) @ Accel:64 Loops:256 Thr:64 Vec:1

Hashmode: 5500 - NetNTLMv1 / NetNTLMv1+ESS

Speed.#1.........:   159.8 MH/s (77.35ms) @ Accel:16 Loops:1024 Thr:384 Vec:1

Hashmode: 5600 - NetNTLMv2

Speed.#1.........: 10967.4 kH/s (70.16ms) @ Accel:2 Loops:512 Thr:384 Vec:1

Hashmode: 1500 - descrypt, DES (Unix), Traditional DES

Kernel minimum runtime larger than default TDR

clEnqueueNDRangeKernel(): CL_INVALID_GLOBAL_WORK_SIZE

Speed.#1.........:        0 H/s (2565.96ms) @ Accel:2 Loops:512 Thr:384 Vec:1

Hashmode: 500 - md5crypt, MD5 (Unix), Cisco-IOS $1$ (MD5) (Iterations: 1000)

Speed.#1.........:    60926 H/s (94.47ms) @ Accel:16 Loops:500 Thr:384 Vec:1

Hashmode: 3200 - bcrypt $2*$, Blowfish (Unix) (Iterations: 32)

Speed.#1.........:       50 H/s (77.05ms) @ Accel:4 Loops:2 Thr:8 Vec:1

Hashmode: 1800 - sha512crypt $6$, SHA512 (Unix) (Iterations: 5000)

Speed.#1.........:      303 H/s (63.61ms) @ Accel:4 Loops:32 Thr:384 Vec:1

Hashmode: 7500 - Kerberos 5, etype 23, AS-REQ Pre-Auth

Speed.#1.........:   248.5 kH/s (130.76ms) @ Accel:2 Loops:128 Thr:64 Vec:1

Hashmode: 13100 - Kerberos 5, etype 23, TGS-REP

Speed.#1.........:   320.7 kH/s (50.03ms) @ Accel:1 Loops:128 Thr:64 Vec:1

Hashmode: 15300 - DPAPI masterkey file v1 (Iterations: 23999)

Speed.#1.........:      353 H/s (91.83ms) @ Accel:4 Loops:256 Thr:384 Vec:1

Hashmode: 15900 - DPAPI masterkey file v2 (Iterations: 12899)

Speed.#1.........:      139 H/s (54.31ms) @ Accel:1 Loops:128 Thr:384 Vec:1

Hashmode: 7100 - macOS v10.8+ (PBKDF2-SHA512) (Iterations: 1023)

Speed.#1.........:     1180 H/s (70.21ms) @ Accel:16 Loops:7 Thr:384 Vec:1

Hashmode: 11600 - 7-Zip (Iterations: 16384)

Speed.#1.........:      478 H/s (49.34ms) @ Accel:2 Loops:256 Thr:384 Vec:1

Hashmode: 12500 - RAR3-hp (Iterations: 262144)

Speed.#1.........:      224 H/s (213.79ms) @ Accel:1 Loops:16384 Thr:384 Vec:1

Hashmode: 13000 - RAR5 (Iterations: 32799)

Speed.#1.........:      209 H/s (56.79ms) @ Accel:2 Loops:256 Thr:384 Vec:1

Hashmode: 6211 - TrueCrypt RIPEMD160 + XTS 512 bit (Iterations: 1999)

Speed.#1.........:     2113 H/s (89.09ms) @ Accel:4 Loops:128 Thr:384 Vec:1

Hashmode: 13400 - KeePass 1 (AES/Twofish) and KeePass 2 (AES) (Iterations: 24569)

Speed.#1.........:      124 H/s (128.38ms) @ Accel:2 Loops:256 Thr:384 Vec:1

Hashmode: 6800 - LastPass + LastPass sniffed (Iterations: 499)

Speed.#1.........:    14413 H/s (32.87ms) @ Accel:2 Loops:249 Thr:384 Vec:1

Hashmode: 11300 - Bitcoin/Litecoin wallet.dat (Iterations: 200459)

Speed.#1.........:       16 H/s (61.53ms) @ Accel:1 Loops:256 Thr:384 Vec:1

Started: Wed Sep 18 22:10:28 2019
Stopped: Wed Sep 18 22:31:44 2019
hominoid@odroid-n2:~/hashcat$ 
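
The benchmark output above mixes H/s, kH/s and MH/s, which makes the modes awkward to compare at a glance. The following is a minimal sketch that converts each Speed line to plain H/s and pairs it with its hashmode number. It assumes the output layout shown above (a "Hashmode:" line followed by a "Speed.#N" line); hashcat_speeds is a made-up helper name, not a hashcat feature.

```shell
#!/bin/sh
# Hypothetical helper: read `hashcat -b` output on stdin and print
# "<hashmode><TAB><speed in H/s>" for every Speed line.
hashcat_speeds() {
    awk '
        /^Hashmode:/ { mode = $2 }
        /^Speed\.#/ {
            # Whitespace-split fields: "Speed.#1.........:" <value> <unit> ...
            value = $2
            unit = $3
            mult = 1
            if (unit == "kH/s") mult = 1000
            else if (unit == "MH/s") mult = 1000000
            else if (unit == "GH/s") mult = 1000000000  # not seen in this run
            printf "%s\t%.0f\n", mode, value * mult
        }
    '
}

# Example use (needs hashcat installed):
#   ./hashcat -b | hashcat_speeds | sort -k2,2n
```

Modes that failed, such as 3000 and 1500 in this run, show up with a speed of 0, so they are easy to spot at the top of the sorted list.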
A better description of the hashmodes, taken from 'hashcat -h':

Code: Select all

      # | Name                                             | Category
  ======+==================================================+======================================
    900 | MD4                                              | Raw Hash
      0 | MD5                                              | Raw Hash
    100 | SHA1                                             | Raw Hash
   1300 | SHA2-224                                         | Raw Hash
   1400 | SHA2-256                                         | Raw Hash
  10800 | SHA2-384                                         | Raw Hash
   1700 | SHA2-512                                         | Raw Hash
  17300 | SHA3-224                                         | Raw Hash
  17400 | SHA3-256                                         | Raw Hash
  17500 | SHA3-384                                         | Raw Hash
  17600 | SHA3-512                                         | Raw Hash
   6000 | RIPEMD-160                                       | Raw Hash
    600 | BLAKE2b-512                                      | Raw Hash
  11700 | GOST R 34.11-2012 (Streebog) 256-bit, big-endian | Raw Hash
  11800 | GOST R 34.11-2012 (Streebog) 512-bit, big-endian | Raw Hash
   6900 | GOST R 34.11-94                                  | Raw Hash
   5100 | Half MD5                                         | Raw Hash
  18700 | Java Object hashCode()                           | Raw Hash
  17700 | Keccak-224                                       | Raw Hash
  17800 | Keccak-256                                       | Raw Hash
  17900 | Keccak-384                                       | Raw Hash
  18000 | Keccak-512                                       | Raw Hash
  21400 | sha256(sha256_bin(pass))                         | Raw Hash
   6100 | Whirlpool                                        | Raw Hash
  10100 | SipHash                                          | Raw Hash
  21000 | BitShares v0.x - sha512(sha512_bin(pass))        | Raw Hash
     10 | md5($pass.$salt)                                 | Raw Hash, Salted and/or Iterated
     20 | md5($salt.$pass)                                 | Raw Hash, Salted and/or Iterated
   3800 | md5($salt.$pass.$salt)                           | Raw Hash, Salted and/or Iterated
   3710 | md5($salt.md5($pass))                            | Raw Hash, Salted and/or Iterated
   4110 | md5($salt.md5($pass.$salt))                      | Raw Hash, Salted and/or Iterated
   4010 | md5($salt.md5($salt.$pass))                      | Raw Hash, Salted and/or Iterated
  21300 | md5($salt.sha1($salt.$pass))                     | Raw Hash, Salted and/or Iterated
     40 | md5($salt.utf16le($pass))                        | Raw Hash, Salted and/or Iterated
   2600 | md5(md5($pass))                                  | Raw Hash, Salted and/or Iterated
   3910 | md5(md5($pass).md5($salt))                       | Raw Hash, Salted and/or Iterated
   4400 | md5(sha1($pass))                                 | Raw Hash, Salted and/or Iterated
  20900 | md5(sha1($pass).md5($pass).sha1($pass))          | Raw Hash, Salted and/or Iterated
  21200 | md5(sha1($salt).md5($pass))                      | Raw Hash, Salted and/or Iterated
   4300 | md5(strtoupper(md5($pass)))                      | Raw Hash, Salted and/or Iterated
     30 | md5(utf16le($pass).$salt)                        | Raw Hash, Salted and/or Iterated
    110 | sha1($pass.$salt)                                | Raw Hash, Salted and/or Iterated
    120 | sha1($salt.$pass)                                | Raw Hash, Salted and/or Iterated
   4900 | sha1($salt.$pass.$salt)                          | Raw Hash, Salted and/or Iterated
   4520 | sha1($salt.sha1($pass))                          | Raw Hash, Salted and/or Iterated
    140 | sha1($salt.utf16le($pass))                       | Raw Hash, Salted and/or Iterated
  19300 | sha1($salt1.$pass.$salt2)                        | Raw Hash, Salted and/or Iterated
  14400 | sha1(CX)                                         | Raw Hash, Salted and/or Iterated
   4700 | sha1(md5($pass))                                 | Raw Hash, Salted and/or Iterated
   4710 | sha1(md5($pass).$salt)                           | Raw Hash, Salted and/or Iterated
  21100 | sha1(md5($pass.$salt))                           | Raw Hash, Salted and/or Iterated
  18500 | sha1(md5(md5($pass)))                            | Raw Hash, Salted and/or Iterated
   4500 | sha1(sha1($pass))                                | Raw Hash, Salted and/or Iterated
    130 | sha1(utf16le($pass).$salt)                       | Raw Hash, Salted and/or Iterated
   1410 | sha256($pass.$salt)                              | Raw Hash, Salted and/or Iterated
   1420 | sha256($salt.$pass)                              | Raw Hash, Salted and/or Iterated
   1440 | sha256($salt.utf16le($pass))                     | Raw Hash, Salted and/or Iterated
  20800 | sha256(md5($pass))                               | Raw Hash, Salted and/or Iterated
  20710 | sha256(sha256($pass).$salt)                      | Raw Hash, Salted and/or Iterated
   1430 | sha256(utf16le($pass).$salt)                     | Raw Hash, Salted and/or Iterated
   1710 | sha512($pass.$salt)                              | Raw Hash, Salted and/or Iterated
   1720 | sha512($salt.$pass)                              | Raw Hash, Salted and/or Iterated
   1740 | sha512($salt.utf16le($pass))                     | Raw Hash, Salted and/or Iterated
   1730 | sha512(utf16le($pass).$salt)                     | Raw Hash, Salted and/or Iterated
  19500 | Ruby on Rails Restful-Authentication             | Raw Hash, Salted and/or Iterated
     50 | HMAC-MD5 (key = $pass)                           | Raw Hash, Authenticated
     60 | HMAC-MD5 (key = $salt)                           | Raw Hash, Authenticated
    150 | HMAC-SHA1 (key = $pass)                          | Raw Hash, Authenticated
    160 | HMAC-SHA1 (key = $salt)                          | Raw Hash, Authenticated
   1450 | HMAC-SHA256 (key = $pass)                        | Raw Hash, Authenticated
   1460 | HMAC-SHA256 (key = $salt)                        | Raw Hash, Authenticated
   1750 | HMAC-SHA512 (key = $pass)                        | Raw Hash, Authenticated
   1760 | HMAC-SHA512 (key = $salt)                        | Raw Hash, Authenticated
  11750 | HMAC-Streebog-256 (key = $pass), big-endian      | Raw Hash, Authenticated
  11760 | HMAC-Streebog-256 (key = $salt), big-endian      | Raw Hash, Authenticated
  11850 | HMAC-Streebog-512 (key = $pass), big-endian      | Raw Hash, Authenticated
  11860 | HMAC-Streebog-512 (key = $salt), big-endian      | Raw Hash, Authenticated
  11500 | CRC32                                            | Raw Checksum
  14100 | 3DES (PT = $salt, key = $pass)                   | Raw Cipher, Known-Plaintext attack
  14000 | DES (PT = $salt, key = $pass)                    | Raw Cipher, Known-Plaintext attack
  15400 | ChaCha20                                         | Raw Cipher, Known-Plaintext attack
  14900 | Skip32 (PT = $salt, key = $pass)                 | Raw Cipher, Known-Plaintext attack
  11900 | PBKDF2-HMAC-MD5                                  | Generic KDF
  12000 | PBKDF2-HMAC-SHA1                                 | Generic KDF
  10900 | PBKDF2-HMAC-SHA256                               | Generic KDF
  12100 | PBKDF2-HMAC-SHA512                               | Generic KDF
   8900 | scrypt                                           | Generic KDF
    400 | phpass                                           | Generic KDF
  16900 | Ansible Vault                                    | Generic KDF
  12001 | Atlassian (PBKDF2-HMAC-SHA1)                     | Generic KDF
  20200 | Python passlib pbkdf2-sha512                     | Generic KDF
  20300 | Python passlib pbkdf2-sha256                     | Generic KDF
  20400 | Python passlib pbkdf2-sha1                       | Generic KDF
  16100 | TACACS+                                          | Network Protocols
  11400 | SIP digest authentication (MD5)                  | Network Protocols
   5300 | IKE-PSK MD5                                      | Network Protocols
   5400 | IKE-PSK SHA1                                     | Network Protocols
   2500 | WPA-EAPOL-PBKDF2                                 | Network Protocols
   2501 | WPA-EAPOL-PMK                                    | Network Protocols
  16800 | WPA-PMKID-PBKDF2                                 | Network Protocols
  16801 | WPA-PMKID-PMK                                    | Network Protocols
   7300 | IPMI2 RAKP HMAC-SHA1                             | Network Protocols
  10200 | CRAM-MD5                                         | Network Protocols
   4800 | iSCSI CHAP authentication, MD5(CHAP)             | Network Protocols
  16500 | JWT (JSON Web Token)                             | Network Protocols
   7500 | Kerberos 5, etype 23, AS-REQ Pre-Auth            | Network Protocols
  13100 | Kerberos 5, etype 23, TGS-REP                    | Network Protocols
  18200 | Kerberos 5, etype 23, AS-REP                     | Network Protocols
  19600 | Kerberos 5, etype 17, TGS-REP                    | Network Protocols
  19700 | Kerberos 5, etype 18, TGS-REP                    | Network Protocols
  19800 | Kerberos 5, etype 17, Pre-Auth                   | Network Protocols
  19900 | Kerberos 5, etype 18, Pre-Auth                   | Network Protocols
   5500 | NetNTLMv1 / NetNTLMv1+ESS                        | Network Protocols
   5600 | NetNTLMv2                                        | Network Protocols
     23 | Skype                                            | Network Protocols
  11100 | PostgreSQL CRAM (MD5)                            | Network Protocols
  11200 | MySQL CRAM (SHA1)                                | Network Protocols
   8500 | RACF                                             | Operating System
   6300 | AIX {smd5}                                       | Operating System
   6700 | AIX {ssha1}                                      | Operating System
   6400 | AIX {ssha256}                                    | Operating System
   6500 | AIX {ssha512}                                    | Operating System
   3000 | LM                                               | Operating System
  19000 | QNX /etc/shadow (MD5)                            | Operating System
  19100 | QNX /etc/shadow (SHA256)                         | Operating System
  19200 | QNX /etc/shadow (SHA512)                         | Operating System
  15300 | DPAPI masterkey file v1                          | Operating System
  15900 | DPAPI masterkey file v2                          | Operating System
   7200 | GRUB 2                                           | Operating System
  12800 | MS-AzureSync PBKDF2-HMAC-SHA256                  | Operating System
  12400 | BSDi Crypt, Extended DES                         | Operating System
   1000 | NTLM                                             | Operating System
    122 | macOS v10.4, macOS v10.5, MacOS v10.6            | Operating System
   1722 | macOS v10.7                                      | Operating System
   7100 | macOS v10.8+ (PBKDF2-SHA512)                     | Operating System
   9900 | Radmin2                                          | Operating System
   5800 | Samsung Android Password/PIN                     | Operating System
   3200 | bcrypt $2*$, Blowfish (Unix)                     | Operating System
    500 | md5crypt, MD5 (Unix), Cisco-IOS $1$ (MD5)        | Operating System
   1500 | descrypt, DES (Unix), Traditional DES            | Operating System
   7400 | sha256crypt $5$, SHA256 (Unix)                   | Operating System
   1800 | sha512crypt $6$, SHA512 (Unix)                   | Operating System
  13800 | Windows Phone 8+ PIN/password                    | Operating System
   2410 | Cisco-ASA MD5                                    | Operating System
   9200 | Cisco-IOS $8$ (PBKDF2-SHA256)                    | Operating System
   9300 | Cisco-IOS $9$ (scrypt)                           | Operating System
   5700 | Cisco-IOS type 4 (SHA256)                        | Operating System
   2400 | Cisco-PIX MD5                                    | Operating System
   8100 | Citrix NetScaler                                 | Operating System
   1100 | Domain Cached Credentials (DCC), MS Cache        | Operating System
   2100 | Domain Cached Credentials 2 (DCC2), MS Cache 2   | Operating System
   7000 | FortiGate (FortiOS)                              | Operating System
    125 | ArubaOS                                          | Operating System
    501 | Juniper IVE                                      | Operating System
     22 | Juniper NetScreen/SSG (ScreenOS)                 | Operating System
  15100 | Juniper/NetBSD sha1crypt                         | Operating System
    131 | MSSQL (2000)                                     | Database Server
    132 | MSSQL (2005)                                     | Database Server
   1731 | MSSQL (2012, 2014)                               | Database Server
     12 | PostgreSQL                                       | Database Server
   3100 | Oracle H: Type (Oracle 7+)                       | Database Server
    112 | Oracle S: Type (Oracle 11+)                      | Database Server
  12300 | Oracle T: Type (Oracle 12+)                      | Database Server
    200 | MySQL323                                         | Database Server
    300 | MySQL4.1/MySQL5                                  | Database Server
   8000 | Sybase ASE                                       | Database Server
   1421 | hMailServer                                      | FTP, HTTP, SMTP, LDAP Server
   8300 | DNSSEC (NSEC3)                                   | FTP, HTTP, SMTP, LDAP Server
  16400 | CRAM-MD5 Dovecot                                 | FTP, HTTP, SMTP, LDAP Server
   1411 | SSHA-256(Base64), LDAP {SSHA256}                 | FTP, HTTP, SMTP, LDAP Server
   1711 | SSHA-512(Base64), LDAP {SSHA512}                 | FTP, HTTP, SMTP, LDAP Server
  15000 | FileZilla Server >= 0.9.55                       | FTP, HTTP, SMTP, LDAP Server
  12600 | ColdFusion 10+                                   | FTP, HTTP, SMTP, LDAP Server
   1600 | Apache $apr1$ MD5, md5apr1, MD5 (APR)            | FTP, HTTP, SMTP, LDAP Server
    141 | Episerver 6.x < .NET 4                           | FTP, HTTP, SMTP, LDAP Server
   1441 | Episerver 6.x >= .NET 4                          | FTP, HTTP, SMTP, LDAP Server
    101 | nsldap, SHA-1(Base64), Netscape LDAP SHA         | FTP, HTTP, SMTP, LDAP Server
    111 | nsldaps, SSHA-1(Base64), Netscape LDAP SSHA      | FTP, HTTP, SMTP, LDAP Server
   7700 | SAP CODVN B (BCODE)                              | Enterprise Application Software (EAS)
   7701 | SAP CODVN B (BCODE) from RFC_READ_TABLE          | Enterprise Application Software (EAS)
   7800 | SAP CODVN F/G (PASSCODE)                         | Enterprise Application Software (EAS)
   7801 | SAP CODVN F/G (PASSCODE) from RFC_READ_TABLE     | Enterprise Application Software (EAS)
  10300 | SAP CODVN H (PWDSALTEDHASH) iSSHA-1              | Enterprise Application Software (EAS)
    133 | PeopleSoft                                       | Enterprise Application Software (EAS)
  13500 | PeopleSoft PS_TOKEN                              | Enterprise Application Software (EAS)
  21500 | SolarWinds Orion                                 | Enterprise Application Software (EAS)
   8600 | Lotus Notes/Domino 5                             | Enterprise Application Software (EAS)
   8700 | Lotus Notes/Domino 6                             | Enterprise Application Software (EAS)
   9100 | Lotus Notes/Domino 8                             | Enterprise Application Software (EAS)
  20600 | Oracle Transportation Management (SHA256)        | Enterprise Application Software (EAS)
   4711 | Huawei sha1(md5($pass).$salt)                    | Enterprise Application Software (EAS)
  20711 | AuthMe sha256                                    | Enterprise Application Software (EAS)
  12200 | eCryptfs                                         | Full-Disk Encryption (FDE)
  14600 | LUKS                                             | Full-Disk Encryption (FDE)
  13711 | VeraCrypt RIPEMD160 + XTS 512 bit                | Full-Disk Encryption (FDE)
  13712 | VeraCrypt RIPEMD160 + XTS 1024 bit               | Full-Disk Encryption (FDE)
  13713 | VeraCrypt RIPEMD160 + XTS 1536 bit               | Full-Disk Encryption (FDE)
  13741 | VeraCrypt RIPEMD160 + XTS 512 bit + boot-mode    | Full-Disk Encryption (FDE)
  13742 | VeraCrypt RIPEMD160 + XTS 1024 bit + boot-mode   | Full-Disk Encryption (FDE)
  13743 | VeraCrypt RIPEMD160 + XTS 1536 bit + boot-mode   | Full-Disk Encryption (FDE)
  13751 | VeraCrypt SHA256 + XTS 512 bit                   | Full-Disk Encryption (FDE)
  13752 | VeraCrypt SHA256 + XTS 1024 bit                  | Full-Disk Encryption (FDE)
  13753 | VeraCrypt SHA256 + XTS 1536 bit                  | Full-Disk Encryption (FDE)
  13761 | VeraCrypt SHA256 + XTS 512 bit + boot-mode       | Full-Disk Encryption (FDE)
  13762 | VeraCrypt SHA256 + XTS 1024 bit + boot-mode      | Full-Disk Encryption (FDE)
  13763 | VeraCrypt SHA256 + XTS 1536 bit + boot-mode      | Full-Disk Encryption (FDE)
  13721 | VeraCrypt SHA512 + XTS 512 bit                   | Full-Disk Encryption (FDE)
  13722 | VeraCrypt SHA512 + XTS 1024 bit                  | Full-Disk Encryption (FDE)
  13723 | VeraCrypt SHA512 + XTS 1536 bit                  | Full-Disk Encryption (FDE)
  13771 | VeraCrypt Streebog-512 + XTS 512 bit             | Full-Disk Encryption (FDE)
  13772 | VeraCrypt Streebog-512 + XTS 1024 bit            | Full-Disk Encryption (FDE)
  13773 | VeraCrypt Streebog-512 + XTS 1536 bit            | Full-Disk Encryption (FDE)
  13731 | VeraCrypt Whirlpool + XTS 512 bit                | Full-Disk Encryption (FDE)
  13732 | VeraCrypt Whirlpool + XTS 1024 bit               | Full-Disk Encryption (FDE)
  13733 | VeraCrypt Whirlpool + XTS 1536 bit               | Full-Disk Encryption (FDE)
  16700 | FileVault 2                                      | Full-Disk Encryption (FDE)
  20011 | DiskCryptor SHA512 + XTS 512 bit                 | Full-Disk Encryption (FDE)
  20012 | DiskCryptor SHA512 + XTS 1024 bit                | Full-Disk Encryption (FDE)
  20013 | DiskCryptor SHA512 + XTS 1536 bit                | Full-Disk Encryption (FDE)
  12900 | Android FDE (Samsung DEK)                        | Full-Disk Encryption (FDE)
   8800 | Android FDE <= 4.3                               | Full-Disk Encryption (FDE)
  18300 | Apple File System (APFS)                         | Full-Disk Encryption (FDE)
   6211 | TrueCrypt RIPEMD160 + XTS 512 bit                | Full-Disk Encryption (FDE)
   6212 | TrueCrypt RIPEMD160 + XTS 1024 bit               | Full-Disk Encryption (FDE)
   6213 | TrueCrypt RIPEMD160 + XTS 1536 bit               | Full-Disk Encryption (FDE)
   6241 | TrueCrypt RIPEMD160 + XTS 512 bit + boot-mode    | Full-Disk Encryption (FDE)
   6242 | TrueCrypt RIPEMD160 + XTS 1024 bit + boot-mode   | Full-Disk Encryption (FDE)
   6243 | TrueCrypt RIPEMD160 + XTS 1536 bit + boot-mode   | Full-Disk Encryption (FDE)
   6221 | TrueCrypt SHA512 + XTS 512 bit                   | Full-Disk Encryption (FDE)
   6222 | TrueCrypt SHA512 + XTS 1024 bit                  | Full-Disk Encryption (FDE)
   6223 | TrueCrypt SHA512 + XTS 1536 bit                  | Full-Disk Encryption (FDE)
   6231 | TrueCrypt Whirlpool + XTS 512 bit                | Full-Disk Encryption (FDE)
   6232 | TrueCrypt Whirlpool + XTS 1024 bit               | Full-Disk Encryption (FDE)
   6233 | TrueCrypt Whirlpool + XTS 1536 bit               | Full-Disk Encryption (FDE)
  10400 | PDF 1.1 - 1.3 (Acrobat 2 - 4)                    | Documents
  10410 | PDF 1.1 - 1.3 (Acrobat 2 - 4), collider #1       | Documents
  10420 | PDF 1.1 - 1.3 (Acrobat 2 - 4), collider #2       | Documents
  10500 | PDF 1.4 - 1.6 (Acrobat 5 - 8)                    | Documents
  10600 | PDF 1.7 Level 3 (Acrobat 9)                      | Documents
  10700 | PDF 1.7 Level 8 (Acrobat 10 - 11)                | Documents
   9400 | MS Office 2007                                   | Documents
   9500 | MS Office 2010                                   | Documents
   9600 | MS Office 2013                                   | Documents
   9700 | MS Office <= 2003 $0/$1, MD5 + RC4               | Documents
   9710 | MS Office <= 2003 $0/$1, MD5 + RC4, collider #1  | Documents
   9720 | MS Office <= 2003 $0/$1, MD5 + RC4, collider #2  | Documents
   9800 | MS Office <= 2003 $3/$4, SHA1 + RC4              | Documents
   9810 | MS Office <= 2003 $3, SHA1 + RC4, collider #1    | Documents
   9820 | MS Office <= 2003 $3, SHA1 + RC4, collider #2    | Documents
  18400 | Open Document Format (ODF) 1.2 (SHA-256, AES)    | Documents
  18600 | Open Document Format (ODF) 1.1 (SHA-1, Blowfish) | Documents
  16200 | Apple Secure Notes                               | Documents
  15500 | JKS Java Key Store Private Keys (SHA1)           | Password Managers
   6600 | 1Password, agilekeychain                         | Password Managers
   8200 | 1Password, cloudkeychain                         | Password Managers
   9000 | Password Safe v2                                 | Password Managers
   5200 | Password Safe v3                                 | Password Managers
   6800 | LastPass + LastPass sniffed                      | Password Managers
  13400 | KeePass 1 (AES/Twofish) and KeePass 2 (AES)      | Password Managers
  11300 | Bitcoin/Litecoin wallet.dat                      | Password Managers
  16600 | Electrum Wallet (Salt-Type 1-3)                  | Password Managers
  12700 | Blockchain, My Wallet                            | Password Managers
  15200 | Blockchain, My Wallet, V2                        | Password Managers
  18800 | Blockchain, My Wallet, Second Password (SHA256)  | Password Managers
  16300 | Ethereum Pre-Sale Wallet, PBKDF2-HMAC-SHA256     | Password Managers
  15600 | Ethereum Wallet, PBKDF2-HMAC-SHA256              | Password Managers
  15700 | Ethereum Wallet, SCRYPT                          | Password Managers
  11600 | 7-Zip                                            | Archives
  12500 | RAR3-hp                                          | Archives
  13000 | RAR5                                             | Archives
  17200 | PKZIP (Compressed)                               | Archives
  17220 | PKZIP (Compressed Multi-File)                    | Archives
  17225 | PKZIP (Mixed Multi-File)                         | Archives
  17230 | PKZIP (Mixed Multi-File Checksum-Only)           | Archives
  17210 | PKZIP (Uncompressed)                             | Archives
  20500 | PKZIP Master Key                                 | Archives
  20510 | PKZIP Master Key (6 byte optimization)           | Archives
  14700 | iTunes backup < 10.0                             | Archives
  14800 | iTunes backup >= 10.0                            | Archives
  13600 | WinZip                                           | Archives
  18900 | Android Backup                                   | Archives
  13200 | AxCrypt                                          | Archives
  13300 | AxCrypt in-memory SHA1                           | Archives
   8400 | WBB3 (Woltlab Burning Board)                     | Forums, CMS, E-Commerce, Frameworks
   2611 | vBulletin < v3.8.5                               | Forums, CMS, E-Commerce, Frameworks
   2711 | vBulletin >= v3.8.5                              | Forums, CMS, E-Commerce, Frameworks
   2612 | PHPS                                             | Forums, CMS, E-Commerce, Frameworks
    121 | SMF (Simple Machines Forum) > v1.1               | Forums, CMS, E-Commerce, Frameworks
   3711 | MediaWiki B type                                 | Forums, CMS, E-Commerce, Frameworks
   4521 | Redmine                                          | Forums, CMS, E-Commerce, Frameworks
  10000 | Django (PBKDF2-SHA256)                           | Forums, CMS, E-Commerce, Frameworks
    124 | Django (SHA-1)                                   | Forums, CMS, E-Commerce, Frameworks
     11 | Joomla < 2.5.18                                  | Forums, CMS, E-Commerce, Frameworks
  13900 | OpenCart                                         | Forums, CMS, E-Commerce, Frameworks
  11000 | PrestaShop                                       | Forums, CMS, E-Commerce, Frameworks
  16000 | Tripcode                                         | Forums, CMS, E-Commerce, Frameworks
   7900 | Drupal7                                          | Forums, CMS, E-Commerce, Frameworks
     21 | osCommerce, xt:Commerce                          | Forums, CMS, E-Commerce, Frameworks
   4522 | PunBB                                            | Forums, CMS, E-Commerce, Frameworks
   2811 | MyBB 1.2+, IPB2+ (Invision Power Board)          | Forums, CMS, E-Commerce, Frameworks
  18100 | TOTP (HMAC-SHA1)                                 | One-Time Passwords
   2000 | STDOUT                                           | Plaintext
  99999 | Plaintext                                        | Plaintext
These users thanked the author hominoid for the post:
xabolcs (Thu Sep 19, 2019 11:51 am)

odroid
Site Admin
Posts: 41543
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English, Korean
ODROIDs: ODROID
Has thanked: 3321 times
Been thanked: 1837 times
Contact:

Re: Mali G52 OpenCL Comparison

Post by odroid »

hominoid wrote:
Thu Sep 19, 2019 11:44 am
zram-16GB swap.
Huge swap size. :o My laptop has only 8GB RAM. :oops:

mad_ady
Posts: 11590
Joined: Wed Jul 15, 2015 5:00 pm
languages_spoken: english
ODROIDs: XU4 (HC1, HC2), C1+, C2, C4 (HC4), N1, N2, N2L, H2, H3+, Go, Go Advance, M1
Location: Bucharest, Romania
Has thanked: 649 times
Been thanked: 1154 times
Contact:

Re: Mali G52 OpenCL Comparison

Post by mad_ady »

Interesting results, thanks! Are these 2kH/s GPU only, or CPU+GPU?
For reference:
C1: 300H/s (CPU)
C2: 550H/s (CPU)
XU4: 1300H/s (CPU+GPU)

hominoid
Posts: 854
Joined: Tue Feb 28, 2017 3:55 am
languages_spoken: english
ODROIDs: C2, C4, XU4, MC1, N1, N2, N2L, N2+, HC4, M1, H2, H3+
Location: Lake Superior Basin, USA
Has thanked: 121 times
Been thanked: 379 times
Contact:

Re: Mali G52 OpenCL Comparison

Post by hominoid »

I believe it is GPU only. So far I have been unable to get OpenCL running on the CPU cores. When I load pocl I only get a single Cortex-A53 device in clinfo, and it does not show up in hashcat. From some brief research I'm led to believe that pocl may not support the A73s yet. Are there any other ways to run OpenCL on the CPU cores that you're aware of? Was pocl used for the CPU hash scores you posted? One other note: the hash score I posted is conservative because I was monitoring memory usage during the first couple of runs. I noticed a slight increase in performance in successive runs when I wasn't monitoring, and those scores were all over 2000 H/s. I suspect that with the CPU cores running as well the score would be considerably higher.

Code: Select all

hominoid@odroid-n2:~$ clinfo
Number of platforms                               3
  Platform Name                                   ARM Platform
  Platform Vendor                                 ARM
  Platform Version                                OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_create_command_queue cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_shared_virtual_memory
  Platform Extensions function suffix             ARM

  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 1.2 pocl 1.1 None+Asserts, LLVM 6.0.0, SLEEF, POCL_DEBUG, FP16
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             POCL

  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 19.0.8
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   ARM Platform
Number of devices                                 1
  Device Name                                     Mali-G52
  Device Vendor                                   ARM
  Device Vendor ID                                0x72120000
  Device Version                                  OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Driver Version                                  2.0
  Device OpenCL C Version                         OpenCL C 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               2
  Max clock frequency                             750MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             384x384x384
  Max work group size                             384
  Preferred work group size multiple              8
  Preferred / native vector sizes                 
    char                                                16 / 4       
    short                                                8 / 2       
    int                                                  4 / 1       
    long                                                 2 / 1       
    half                                                 8 / 2        (cl_khr_fp16)
    float                                                4 / 1       
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  Global memory size                              3887546368 (3.621GiB)
  Error Correction support                        No
  Max memory allocation                           971886592 (926.9MiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Shared Virtual Memory (SVM) capabilities (ARM)  
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        131072 (128KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   32 bytes
    Pitch alignment for 2D image buffers          64 pixels
    Max 2D image size                             65536x65536 pixels
    Max 3D image size                             65536x65536x65536 pixels
    Max number of read image args                 128
    Max number of write image args                64
    Max number of read/write image args           64
  Max number of pipe args                         16
  Max active pipe reservations                    1
  Max pipe packet size                            1024
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     1024
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                2097152 (2MiB)
    Max size                                      16777216 (16MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_create_command_queue cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_shared_virtual_memory

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     pthread-cortex-a53
  Device Vendor                                   ARM
  Device Vendor ID                                0x13b5
  Device Version                                  OpenCL 1.2 pocl HSTR: pthread-aarch64-unknown-linux-gnu-GENERIC
  Driver Version                                  1.1
  Device OpenCL C Version                         OpenCL C 1.2 pocl
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               6
  Max clock frequency                             1896MHz
  Device Partition                                (core)
    Max number of sub-devices                     6
    Supported partition types                     equally, by counts
  Max work item dimensions                        3
  Max work item sizes                             4096x4096x4096
  Max work group size                             4096
  Preferred work group size multiple              8
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              2919395328 (2.719GiB)
  Error Correction support                        No
  Max memory allocation                           1073741824 (1024MiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        None
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            67108864 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
  Local memory type                               Global
  Local memory size                               16777216 (16MiB)
  Max number of constant args                     8
  Max constant buffer size                        16777216 (16MiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_fp16 cl_khr_fp64

  Platform Name                                   Clover
Number of devices                                 0

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  ARM Platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [ARM]
  clCreateContext(NULL, ...) [default]            Success [ARM]
  clCreateContext(NULL, ...) [other]              Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G52
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G52
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G52

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.11
  ICD loader Profile                              OpenCL 2.1
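
For anyone wondering where the three platforms in the listing above come from: the ICD loader normally picks them up from the .icd files under /etc/OpenCL/vendors, the same directory where mali.icd was created at the start of this thread. A minimal sketch for listing them follows; `list_icds` is a made-up helper name, and the default path is an assumption that may differ on other distributions.

```shell
# list_icds is a hypothetical helper, not part of clinfo or any ICD loader.
# It prints each .icd file in a directory together with the library named on
# its first line. The default directory is the one used earlier in the thread.
list_icds() {
    dir="${1:-/etc/OpenCL/vendors}"
    for icd in "$dir"/*.icd; do
        [ -e "$icd" ] || continue   # the glob matched nothing
        printf '%s -> %s\n' "$(basename "$icd")" "$(head -n 1 "$icd")"
    done
}

list_icds
```

Temporarily moving an .icd file out of that directory should hide the corresponding platform from clinfo, which can help when checking which platform an application actually picks.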

mad_ady
Posts: 11590
Joined: Wed Jul 15, 2015 5:00 pm
languages_spoken: english
ODROIDs: XU4 (HC1, HC2), C1+, C2, C4 (HC4), N1, N2, N2L, H2, H3+, Go, Go Advance, M1
Location: Bucharest, Romania
Has thanked: 649 times
Been thanked: 1154 times
Contact:

Re: Mali G52 OpenCL Comparison

Post by mad_ady »

I was using pyrit back then, but I don't remember where I got the numbers from... viewtopic.php?t=22488
I remember on the XU4 when using the GPU, it would replace two little cores with the 2 GPU units in the XU4...

https://magazine.odroid.com/wp-content/ ... df#page=10

pyrit list_cores:
https://magazine.odroid.com/wp-content/ ... df#page=16
These users thanked the author mad_ady for the post:
hominoid (Fri Sep 20, 2019 11:21 pm)

blu
Posts: 84
Joined: Wed Mar 08, 2017 11:30 pm
languages_spoken: english
ODROIDs: XU4, N2
Has thanked: 3 times
Been thanked: 22 times
Contact:

Re: Mali G52 OpenCL Comparison

Post by blu »

hominoid wrote:
Thu Sep 19, 2019 11:13 pm
I believe it is GPU only. So far I have been unable to get OpenCL running on the CPU cores. When I load pocl I only get a single Cortex-A53 device in clinfo, and it does not show up in hashcat. From some brief research I'm led to believe that pocl may not support the A73s yet. Are there any other ways to run OpenCL on the CPU cores that you're aware of?
From my experience, it's been pocl-or-bust on aarch64 for a long time now. But pocl are quite competent. Re CA73 support -- they *might* be using an older llvm, though that's unlikely, as llvm has had CA73 support since late 2016/early 2017. I need to give CA73 a try in pocl -- last time I tried CA72 it worked very well, with a few patches.

hominoid
Posts: 854
Joined: Tue Feb 28, 2017 3:55 am
languages_spoken: english
ODROIDs: C2, C4, XU4, MC1, N1, N2, N2L, N2+, HC4, M1, H2, H3+
Location: Lake Superior Basin, USA
Has thanked: 121 times
Been thanked: 379 times
Contact:

Re: Mali G52 OpenCL Comparison

Post by hominoid »

Thanks for the info @blu. I kind of figured that pocl was the only game in town but thought I would ask. If you do find some time to try the A73s with pocl, please share your experience.

blu
Posts: 84
Joined: Wed Mar 08, 2017 11:30 pm
languages_spoken: english
ODROIDs: XU4, N2
Has thanked: 3 times
Been thanked: 22 times
Contact:

Re: Mali G52 OpenCL Comparison

Post by blu »

Building pocl 1.3 as a library (i.e. self-contained libOpenCL.so that you link against, not an ICD!) for use with the CA73 cores on ODROID-N2.

Prerequisites (I hope I haven't forgotten anything):

Code: Select all

$ sudo apt-get install build-essential llvm-8-dev llvm-8 clang-8 libclang-8-dev cmake
Getting the pocl tree and applying N2 patches:

Code: Select all

$ mkdir pocl
$ cd pocl
$ git clone -b release_1_3 --single-branch https://github.com/pocl/pocl.git
$ cd pocl
$ git apply quick-n-dirty-hack-for-building-on-N2-as-a-DSO.patch # attachment to this post
$ cd ..
Note: the above patch removes the dependency on hwloc; trying to install that package breaks too many other packages on the N2, including mali-fbdev. Instead we hardcode the CPU setup, which is otherwise queried through hwloc.

Building:

Code: Select all

$ mkdir build
$ cd build
$ cmake ../pocl -DLLC_HOST_CPU=cortex-a73 -DDEFAULT_ENABLE_ICD=0
$ make -j2
$ sudo make install
Running an app against so-installed pocl-1.3:

Code: Select all

$ LD_LIBRARY_PATH=/usr/local/lib/ taskset 0x3c ./test_cl # runs against pocl libOpenCL.so, big cores only
$ ./test_cl # same app run against mali libOpenCL.so
ps:
1) CL_DEVICE_NAME is reported as pthread-cortex-a53 because pocl picks the 1st cpu from cpuinfo and assumes that as a name.
2) Likewise with CL_DEVICE_MAX_CLOCK_FREQUENCY, which, again, is taken from the 1st cpu.
3) pocl OCL is 1.2, vs 2.0 on mali.
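
The 0x3c passed to taskset above is a bitmask with one bit per CPU index, so bits 2 to 5 select the big cores. A small sketch for deriving such masks is shown here; `cpu_mask` is a hypothetical helper written for illustration, not something shipped with taskset.

```shell
# cpu_mask is a hypothetical helper: it turns a list of CPU indices into the
# hexadecimal affinity mask that taskset accepts.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))   # set the bit for this CPU index
    done
    printf '0x%x\n' "$mask"
}

cpu_mask 2 3 4 5      # CPUs 2-5, the big cores in the run above: 0x3c
cpu_mask 0 1 2 3 4 5  # all six cores: 0x3f
```

taskset also accepts a CPU list directly via `-c`, e.g. `taskset -c 2-5 ./test_cl`, which avoids working out the mask by hand.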
Attachments
quick-n-dirty-hack-for-building-on-N2-as-a-DSO.patch.txt
(6.85 KiB) Downloaded 302 times
These users thanked the author blu for the post:
hominoid (Sat Sep 21, 2019 11:13 pm)

blu
Posts: 84
Joined: Wed Mar 08, 2017 11:30 pm
languages_spoken: english
ODROIDs: XU4, N2
Has thanked: 3 times
Been thanked: 22 times
Contact:

Re: Mali G52 OpenCL Comparison

Post by blu »

After some investigation of why package hwloc nukes package mali-fbdev, it turns out the issue is in an *optional* dependency of hwloc -- libhwloc-plugins. To ignore that dependency one can:

Code: Select all

$ sudo apt-get install --no-install-recommends libhwloc5 libhwloc-dev
This makes the patch from the above post redundant, as long as one also specifies the proper llvm-config if multiple llvms are installed (always the case with me) as pocl needs llvm-8:

Code: Select all

$ cmake ../pocl -DLLC_HOST_CPU=cortex-a73 -DDEFAULT_ENABLE_ICD=0 -DWITH_LLVM_CONFIG=/usr/bin/llvm-config-8
Naturally, the original, unpatched pocl 1.3 will use all 6 cores, so CA73-exclusive runs are not possible.
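
With several llvm versions installed, it can help to confirm the versioned llvm-config binary is actually on the PATH before handing it to cmake. A minimal sketch, assuming the Debian/Ubuntu naming scheme `llvm-config-<version>`; `find_llvm_config` is a made-up helper name, not part of pocl or llvm.

```shell
# find_llvm_config is a hypothetical helper. It prints the full path of the
# versioned llvm-config binary, or returns non-zero if it is not on the PATH.
find_llvm_config() {
    ver="$1"
    command -v "llvm-config-${ver}" || {
        echo "llvm-config-${ver} not found in PATH" >&2
        return 1
    }
}

# Possible use with the cmake line above (not executed here):
#   cmake ../pocl -DLLC_HOST_CPU=cortex-a73 -DDEFAULT_ENABLE_ICD=0 \
#         -DWITH_LLVM_CONFIG="$(find_llvm_config 8)"
```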
These users thanked the author blu for the post:
hominoid (Sat Sep 21, 2019 11:14 pm)

hominoid
Posts: 854
Joined: Tue Feb 28, 2017 3:55 am
languages_spoken: english
ODROIDs: C2, C4, XU4, MC1, N1, N2, N2L, N2+, HC4, M1, H2, H3+
Location: Lake Superior Basin, USA
Has thanked: 121 times
Been thanked: 379 times
Contact:

Re: Mali G52 OpenCL Comparison

Post by hominoid »

Works great, I can see the CPU cores now in hashcat.

Code: Select all

hominoid@odroid-n2:~/hashcat$ LD_LIBRARY_PATH=/usr/local/lib/ ./hashcat -I
hashcat (v5.1.0-1397-g7f4df9eb) starting...

OpenCL Info:
============

OpenCL Platform ID #1
  Vendor..: The pocl project
  Name....: Portable Computing Language
  Version.: OpenCL 1.2 pocl 1.3 Debug+Asserts, LLVM 8.0.0, SLEEF, POCL_DEBUG, FP16

  Backend Device ID #1
    Type...........: CPU
    Vendor.ID......: 2147483648
    Vendor.........: ARM
    Name...........: pthread-cortex-a53
    Version........: OpenCL 1.2 pocl HSTR: pthread-aarch64-unknown-linux-gnu-cortex-a73
    Processor(s)...: 6
    Clock..........: 1896
    Memory.........: 1024/2784 MB allocatable
    OpenCL.Version.: OpenCL C 1.2 pocl
    Driver.Version.: 1.3
Looks like an average of 1430 H/s on the CPU cores for WPA2.

Code: Select all

hominoid@odroid-n2:~/hashcat$ LD_LIBRARY_PATH=/usr/local/lib/ ./hashcat -b -m 2500
hashcat (v5.1.0-1397-g7f4df9eb) starting in benchmark mode...

Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.

/home/hominoid/hashcat/OpenCL/m02500-optimized.cl: Optimized kernel requested but not needed - falling back to pure kernel
OpenCL API (OpenCL 1.2 pocl 1.3 Debug+Asserts, LLVM 8.0.0, SLEEF, POCL_DEBUG, FP16) - Platform #1 [The pocl project]
====================================================================================================================
* Device #1: pthread-cortex-a53, 1024/2784 MB allocatable, 6MCU

Benchmark relevant options:
===========================
* --optimized-kernel-enable

Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4095)

Speed.#1.........:     1435 H/s (66.72ms) @ Accel:512 Loops:128 Thr:1 Vec:4

Started: Sat Sep 21 16:40:11 2019
Stopped: Sat Sep 21 16:40:17 2019
These users thanked the author hominoid for the post (total 3):
blu (Sun Sep 22, 2019 6:23 am) • xabolcs (Sun Sep 22, 2019 5:36 pm) • odroid (Mon Sep 23, 2019 8:18 am)

ASword
Posts: 245
Joined: Fri Aug 04, 2017 12:48 pm
languages_spoken: english
ODROIDs: XU4, HC1, 2x N2, 2x N2+
Has thanked: 24 times
Been thanked: 7 times
Contact:

Re: Mali G52 OpenCL Comparison

Post by ASword »

blu wrote:
Sat Sep 21, 2019 6:30 pm
so CA73-exclusive runs are not possible.
Are cpusets supported? Perhaps you can just narrow the process's CPU usage to the big cores?

hominoid

Re: Mali G52 OpenCL Comparison

Post by hominoid »

I did not patch pocl 1.3; I configured it with:

Code: Select all

cmake ../pocl -DLLC_HOST_CPU=cortex-a73 -DDEFAULT_ENABLE_ICD=0 -DWITH_LLVM_CONFIG=/usr/bin/llvm-config-8
CA73 and CA53 Cores

Code: Select all

hominoid@odroid-n2:~/hashcat$ LD_LIBRARY_PATH=/usr/local/lib/ taskset 0x3F ./hashcat -b -m 2500
hashcat (v5.1.0-1397-g7f4df9eb) starting in benchmark mode...

Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.

/home/hominoid/hashcat/OpenCL/m02500-optimized.cl: Optimized kernel requested but not needed - falling back to pure kernel
OpenCL API (OpenCL 1.2 pocl 1.3 Debug+Asserts, LLVM 8.0.0, SLEEF, POCL_DEBUG, FP16) - Platform #1 [The pocl project]
====================================================================================================================
* Device #1: pthread-cortex-a53, 1024/2784 MB allocatable, 6MCU

Benchmark relevant options:
===========================
* --optimized-kernel-enable

Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4095)

Speed.#1.........:     1435 H/s (66.77ms) @ Accel:512 Loops:128 Thr:1 Vec:4

Started: Sat Sep 21 19:35:28 2019
Stopped: Sat Sep 21 19:35:34 2019
CA73 Cores

Code: Select all

hominoid@odroid-n2:~/hashcat$ LD_LIBRARY_PATH=/usr/local/lib/ taskset 0x3c ./hashcat -b -m 2500
hashcat (v5.1.0-1397-g7f4df9eb) starting in benchmark mode...

Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.

/home/hominoid/hashcat/OpenCL/m02500-optimized.cl: Optimized kernel requested but not needed - falling back to pure kernel
OpenCL API (OpenCL 1.2 pocl 1.3 Debug+Asserts, LLVM 8.0.0, SLEEF, POCL_DEBUG, FP16) - Platform #1 [The pocl project]
====================================================================================================================
* Device #1: pthread-cortex-a53, 1024/2784 MB allocatable, 6MCU

Benchmark relevant options:
===========================
* --optimized-kernel-enable

Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4095)

Speed.#1.........:      942 H/s (50.74ms) @ Accel:64 Loops:512 Thr:1 Vec:4

Started: Sat Sep 21 19:33:58 2019
Stopped: Sat Sep 21 19:34:03 2019
CA53 Cores

Code: Select all

hominoid@odroid-n2:~/hashcat$ LD_LIBRARY_PATH=/usr/local/lib/ taskset 0x03 ./hashcat -b -m 2500
hashcat (v5.1.0-1397-g7f4df9eb) starting in benchmark mode...

Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.

/home/hominoid/hashcat/OpenCL/m02500-optimized.cl: Optimized kernel requested but not needed - falling back to pure kernel
OpenCL API (OpenCL 1.2 pocl 1.3 Debug+Asserts, LLVM 8.0.0, SLEEF, POCL_DEBUG, FP16) - Platform #1 [The pocl project]
====================================================================================================================
* Device #1: pthread-cortex-a53, 1024/2784 MB allocatable, 6MCU

Benchmark relevant options:
===========================
* --optimized-kernel-enable

Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4095)

Speed.#1.........:      414 H/s (57.54ms) @ Accel:32 Loops:512 Thr:1 Vec:4

Started: Sat Sep 21 19:34:22 2019
Stopped: Sat Sep 21 19:34:30 2019
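
For anyone reading the masks in the three runs above: taskset takes a bitmask in which bit n selects CPU n. The sketch below decodes the masks used here. It assumes, as the CA73/CA53 labels on these runs imply, that CPUs 0-1 are the Cortex-A53 pair and CPUs 2-5 are the Cortex-A73 cluster; `cpus_in_mask` is only an illustrative helper name.

```python
def cpus_in_mask(mask):
    """Return the zero-based CPU numbers whose bit is set in a taskset-style mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Masks taken from the three benchmark runs above.
RUNS = [("CA73 and CA53", 0x3F), ("CA73", 0x3C), ("CA53", 0x03)]

for label, mask in RUNS:
    print(f"{label:14s} taskset {mask:#04x} -> CPUs {cpus_in_mask(mask)}")
```

So 0x3c selects CPUs 2, 3, 4 and 5, and 0x03 selects CPUs 0 and 1.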

hominoid

Re: Mali G52 OpenCL Comparison

Post by hominoid »

It looks like hashcat's --cpu-affinity option also works for controlling which cores are used.

Code: Select all

hominoid@odroid-n2:~/hashcat$ LD_LIBRARY_PATH=/usr/local/lib/ ./hashcat -b -m 2500 --cpu-affinity=3,4,5,6
hashcat (v5.1.0-1397-g7f4df9eb) starting in benchmark mode...

Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.

/home/hominoid/hashcat/OpenCL/m02500-optimized.cl: Optimized kernel requested but not needed - falling back to pure kernel
OpenCL API (OpenCL 1.2 pocl 1.3 Debug+Asserts, LLVM 8.0.0, SLEEF, POCL_DEBUG, FP16) - Platform #1 [The pocl project]
====================================================================================================================
* Device #1: pthread-cortex-a53, 1024/2784 MB allocatable, 6MCU

Benchmark relevant options:
===========================
* --optimized-kernel-enable

Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4095)

Speed.#1.........:      958 H/s (99.98ms) @ Accel:128 Loops:512 Thr:1 Vec:4

Started: Sat Sep 21 20:09:23 2019
Stopped: Sat Sep 21 20:09:28 2019
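
The 958 H/s here is close to the 942 H/s from the `taskset 0x3c` run, which would fit if `--cpu-affinity` numbers CPUs from 1 rather than 0, so that 3,4,5,6 means CPUs 2-5. That numbering is an assumption here, not something this output confirms. A minimal sketch for converting such a list into a taskset mask for comparison (`affinity_list_to_mask` and its `one_based` switch are illustrative names):

```python
def affinity_list_to_mask(spec, one_based=True):
    """Convert a comma-separated CPU list such as "3,4,5,6" into a bitmask.

    one_based=True treats the first CPU as number 1 (the assumed hashcat
    convention); one_based=False treats it as number 0 (taskset -c style).
    """
    mask = 0
    for field in spec.split(","):
        cpu = int(field) - (1 if one_based else 0)
        if cpu < 0:
            raise ValueError(f"CPU number out of range: {field!r}")
        mask |= 1 << cpu
    return mask


print(hex(affinity_list_to_mask("3,4,5,6")))         # 1-based reading
print(hex(affinity_list_to_mask("3,4,5,6", False)))  # 0-based reading
```

Under the 1-based reading the list maps to the same mask as the CA73 taskset run; under the 0-based reading it would name a CPU 6, which a six-core board does not have.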

blu

Re: Mali G52 OpenCL Comparison

Post by blu »

ASword wrote:
Sun Sep 22, 2019 7:08 am
blu wrote:
Sat Sep 21, 2019 6:30 pm
so CA73-exclusive runs are not possible.
Are cpusets supported? Perhaps you can just narrow the process's CPU usage to the big cores?
Yes, the affinity mask works, as demonstrated by @hominoid, but pocl is not aware of it and still reports 6 compute units. Consequently, it schedules more threads than there are cores available to the process. Whether that affects performance depends on the workload at hand.
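
The mismatch is easy to see from any process on Linux: the number of CPUs in the system and the number the affinity mask allows are two different queries. A minimal sketch, assuming a Linux host (`os.sched_getaffinity` is not available elsewhere); `cpu_counts` is an illustrative name, and this says nothing about which query pocl itself uses internally:

```python
import os


def cpu_counts():
    """Return (logical CPUs in the system, CPUs this process is allowed to run on)."""
    allowed = len(os.sched_getaffinity(0))  # honours taskset / cpusets
    total = os.cpu_count() or allowed       # ignores the affinity mask
    return total, allowed


total, allowed = cpu_counts()
print(f"system CPUs: {total}, allowed by affinity mask: {allowed}")
if allowed < total:
    print(f"a runtime that starts {total} worker threads would "
          f"oversubscribe the {allowed} allowed cores")
```

Saved as a script and run under `taskset 0x3c` on the N2, this should report 6 system CPUs but only 4 allowed ones. pocl's documentation also lists a POCL_MAX_PTHREAD_COUNT environment variable for capping the pthread driver's thread count; whether it behaves as hoped with pocl 1.3 on the N2 is untested here.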

blu

Re: Mali G52 OpenCL Comparison

Post by blu »

Just a heads-up for the fellow N2 compute aficionados: latest pocl (pending 1.5) got a notable boost in performance of convert_T() vector functions, so if your OCL code is rich in those -- try it out ; )
These users thanked the author blu for the post (total 2):
odroid (Sat Dec 21, 2019 9:09 am) • hominoid (Sat Dec 21, 2019 9:50 am)

alprakas
Posts: 11
Joined: Thu Jul 31, 2014 12:24 pm
languages_spoken: english
ODROIDs: ODROID-XU-E, ODROID-XU3
Location: Singapore
Has thanked: 3 times
Been thanked: 1 time
Contact:

Re: Mali G52 OpenCL Comparison

Post by alprakas »

Hey guys, thanks to this detailed thread I managed to get both pocl and Mali OpenCL support to show up in clinfo.

Now I am trying to get them to execute simultaneously on the CPU and GPU, and I need ICD loader support for that. I thought it was already built in, but it does not seem to be working for me: I can only see one platform when I query from a C application using clGetPlatformIDs(2, platform_id, &num_platforms);. I did compile pocl with ICD support by NOT passing -DDEFAULT_ENABLE_ICD=0 to cmake, and I confirmed ICD support was enabled during pocl's installation.
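
One possible thing to check, offered as a guess rather than a diagnosis: earlier in this thread libOpenCL.so in /usr/lib/aarch64-linux-gnu was symlinked straight to libMali.so, so a program linked with -lOpenCL may be talking to the Mali driver directly instead of to an ICD loader, in which case only the ARM platform would be visible. The sketch below uses Python's ctypes to ask several candidate libraries how many platforms each reports. The library names in the list are assumptions about a typical install (Debian-style ocl-icd packages ship the loader as libOpenCL.so.1), and `count_platforms` is an illustrative helper.

```python
import ctypes


def count_platforms(library):
    """Return the platform count reported by clGetPlatformIDs in `library`.

    Returns None if the library cannot be loaded or lacks the symbol,
    and 0 if the call itself reports an error.
    """
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return None
    fn = getattr(lib, "clGetPlatformIDs", None)
    if fn is None:
        return None
    # cl_int clGetPlatformIDs(cl_uint num_entries,
    #                         cl_platform_id *platforms,
    #                         cl_uint *num_platforms)
    fn.restype = ctypes.c_int32
    fn.argtypes = [ctypes.c_uint32, ctypes.c_void_p,
                   ctypes.POINTER(ctypes.c_uint32)]
    count = ctypes.c_uint32(0)
    status = fn(0, None, ctypes.byref(count))  # count-only query
    return count.value if status == 0 else 0


# Candidate names are assumptions; adjust them to the system at hand.
for name in ("libOpenCL.so.1", "libOpenCL.so", "libMali.so"):
    print(f"{name:16s} -> {count_platforms(name)}")
```

If the loader reports two platforms while the plain libOpenCL.so reports one, the C application is probably linking against the wrong library; `ldd` on the binary would show which one it resolves.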

Any thoughts? I remember the XU3 days and what a pain in the behind pocl used to be. So I used FreeOCL, and although its performance was worse than pocl's, at least it worked flawlessly with ICD loaders. I even wrote a paper using that setup (https://www.researchgate.net/publicatio ... _Platforms). :-) But that was long ago, and I seem to have forgotten a lot since then. FreeOCL is not being actively developed these days, so I would prefer to use pocl, especially since it can support the CA73 cores of the N2.

Thank you all for your time.

hominoid

Re: Mali G52 OpenCL Comparison

Post by hominoid »

I had spent a little time trying to get the ICD interface to work with the CPU and GPU simultaneously on pocl v1.3. Since pocl 1.5 has a lot of bug fixes and improvements, I started with it and LLVM 9.0. It does appear to be working, but we'll see if your code runs. Here is what I did:

Code: Select all

sudo apt-get install build-essential llvm-9-dev llvm-9 clang-9 libclang-9-dev cmake opencl-headers ocl-icd-opencl-dev
sudo apt-get install --no-install-recommends libhwloc5 libhwloc-dev
git clone -b release_1_5 --single-branch https://github.com/pocl/pocl.git
cd pocl
mkdir build
cd build
cmake ../ -DLLC_HOST_CPU=cortex-a73 -DWITH_LLVM_CONFIG=/usr/bin/llvm-config-9
make -j4
sudo make install
sudo cp /usr/local/etc/OpenCL/vendors/pocl.icd /etc/OpenCL/vendors
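
The copy in the last step matters because the ICD loader finds vendor drivers by scanning /etc/OpenCL/vendors for .icd files, each of which names the driver library to load. A minimal sketch for listing what the loader will see, assuming that default directory; `list_icds` is an illustrative helper, not part of any OpenCL tool:

```python
from pathlib import Path


def list_icds(vendor_dir="/etc/OpenCL/vendors"):
    """Map each .icd file name to the library named on its first non-empty line."""
    found = {}
    directory = Path(vendor_dir)
    if not directory.is_dir():
        return found
    for icd in sorted(directory.glob("*.icd")):
        lines = [ln.strip() for ln in icd.read_text().splitlines() if ln.strip()]
        found[icd.name] = lines[0] if lines else ""
    return found


for name, library in list_icds().items():
    print(f"{name}: {library}")
```

With the setup from this thread there should be one entry for mali.icd and one for pocl.icd.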
clinfo should now report both devices

Code: Select all

hominoid@odroid-n2:~/pocl/pocl_1.5/build$ clinfo
Number of platforms                               2
  Platform Name                                   ARM Platform
  Platform Vendor                                 ARM
  Platform Version                                OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_create_command_queue cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_shared_virtual_memory
  Platform Extensions function suffix             ARM

  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 1.2 pocl 1.5, Debug+Asserts, LLVM 9.0.0, RELOC, SLEEF, FP16, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             POCL

  Platform Name                                   ARM Platform
Number of devices                                 1
  Device Name                                     Mali-G52
  Device Vendor                                   ARM
  Device Vendor ID                                0x72120000
  Device Version                                  OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Driver Version                                  2.0
  Device OpenCL C Version                         OpenCL C 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               2
  Max clock frequency                             750MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             384x384x384
  Max work group size                             384
  Preferred work group size multiple              8
  Preferred / native vector sizes                 
    char                                                16 / 4       
    short                                                8 / 2       
    int                                                  4 / 1       
    long                                                 2 / 1       
    half                                                 8 / 2        (cl_khr_fp16)
    float                                                4 / 1       
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  Global memory size                              3887480832 (3.62GiB)
  Error Correction support                        No
  Max memory allocation                           971870208 (926.8MiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Shared Virtual Memory (SVM) capabilities (ARM)  
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        131072 (128KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   32 bytes
    Pitch alignment for 2D image buffers          64 pixels
    Max 2D image size                             65536x65536 pixels
    Max 3D image size                             65536x65536x65536 pixels
    Max number of read image args                 128
    Max number of write image args                64
    Max number of read/write image args           64
  Max number of pipe args                         16
  Max active pipe reservations                    1
  Max pipe packet size                            1024
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     1024
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                2097152 (2MiB)
    Max size                                      16777216 (16MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_create_command_queue cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_shared_virtual_memory

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     pthread-cortex-a53
  Device Vendor                                   ARM
  Device Vendor ID                                0x13b5
  Device Version                                  OpenCL 1.2 pocl HSTR: pthread-aarch64-unknown-linux-gnu-cortex-a73
  Driver Version                                  1.5
  Device OpenCL C Version                         OpenCL C 1.2 pocl
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               6
  Max clock frequency                             1896MHz
  Device Partition                                (core)
    Max number of sub-devices                     6
    Supported partition types                     equally, by counts
  Max work item dimensions                        3
  Max work item sizes                             4096x4096x4096
  Max work group size                             4096
  Preferred work group size multiple              8
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              2919346176 (2.719GiB)
  Error Correction support                        No
  Max memory allocation                           1073741824 (1024MiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        None
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            67108864 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
  Local memory type                               Global
  Local memory size                               16777216 (16MiB)
  Max number of constant args                     8
  Max constant buffer size                        16777216 (16MiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
  printf() buffer size                            16777216 (16MiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_fp16 cl_khr_fp64

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  ARM Platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [ARM]
  clCreateContext(NULL, ...) [default]            Success [ARM]
  clCreateContext(NULL, ...) [other]              Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G52
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G52
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G52

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.11
  ICD loader Profile                              OpenCL 2.1
I can now run both OpenCL devices simultaneously in hashcat. The benchmark below shows the expected gains over the one in my earlier post, which ran OpenCL on the CPU cores only.

Code: Select all

hominoid@odroid-n2:~/hashcat$ ./hashcat -D 1,2 -b --force
hashcat (v5.1.0-1778-gc5d2d539) starting in benchmark mode...

Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.

You have enabled --force to bypass dangerous warnings and errors!
This can hide serious problems and should only be done when debugging.
Do not report hashcat issues encountered when using --force.
OpenCL API (OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58) - Platform #1 [ARM]
========================================================================================
* Device #1: Mali-G52, 3643/3707 MB (926 MB allocatable), 2MCU

OpenCL API (OpenCL 1.2 pocl 1.5, Debug+Asserts, LLVM 9.0.0, RELOC, SLEEF, FP16, POCL_DEBUG) - Platform #2 [The pocl project]
============================================================================================================================
* Device #2: pthread-cortex-a53, 2720/2784 MB (1024 MB allocatable), 6MCU

Benchmark relevant options:
===========================
* --force
* --opencl-device-types=1,2
* --optimized-kernel-enable

Hashmode: 0 - MD5

Speed.#1.........:   185.7 MH/s (67.15ms) @ Accel:16 Loops:1024 Thr:384 Vec:1
Speed.#2.........: 58561.6 kH/s (53.20ms) @ Accel:512 Loops:1024 Thr:1 Vec:4
Speed.#*.........:   244.3 MH/s

Hashmode: 100 - SHA1

Speed.#1.........: 25727.2 kH/s (60.64ms) @ Accel:4 Loops:512 Thr:384 Vec:1
Speed.#2.........: 34088.8 kH/s (91.72ms) @ Accel:1024 Loops:512 Thr:1 Vec:4
Speed.#*.........: 59816.0 kH/s

Hashmode: 1400 - SHA2-256

Speed.#1.........: 22532.8 kH/s (69.27ms) @ Accel:2 Loops:1024 Thr:384 Vec:1
Speed.#2.........: 15625.8 kH/s (49.84ms) @ Accel:128 Loops:1024 Thr:1 Vec:4
Speed.#*.........: 38158.6 kH/s

Hashmode: 1700 - SHA2-512

Speed.#1.........:  5211.2 kH/s (74.84ms) @ Accel:4 Loops:128 Thr:384 Vec:1
Speed.#2.........:  5196.1 kH/s (75.21ms) @ Accel:128 Loops:512 Thr:1 Vec:2
Speed.#*.........: 10407.4 kH/s

Hashmode: 22000 - WPA-PBKDF2-PMKID+EAPOL (Iterations: 4095)

Speed.#1.........:     1119 H/s (85.13ms) @ Accel:4 Loops:128 Thr:384 Vec:1
Speed.#2.........:      906 H/s (64.93ms) @ Accel:256 Loops:256 Thr:1 Vec:4
Speed.#*.........:     2025 H/s

Hashmode: 1000 - NTLM

Speed.#1.........:   238.3 MH/s (52.25ms) @ Accel:32 Loops:512 Thr:384 Vec:1
Speed.#2.........:   102.6 MH/s (60.73ms) @ Accel:1024 Loops:1024 Thr:1 Vec:4
Speed.#*.........:   340.9 MH/s

Hashmode: 3000 - LM

clEnqueueNDRangeKernel(): CL_INVALID_WORK_GROUP_SIZE

Speed.#1.........:        0 H/s (0.00ms) @ Accel:32 Loops:512 Thr:64 Vec:1
Speed.#2.........: 44464.7 kH/s (67.25ms) @ Accel:512 Loops:1024 Thr:1 Vec:4
Speed.#*.........: 44464.7 kH/s

Hashmode: 5500 - NetNTLMv1 / NetNTLMv1+ESS

Speed.#1.........:   163.7 MH/s (76.24ms) @ Accel:16 Loops:1024 Thr:384 Vec:1
Speed.#2.........: 63377.7 kH/s (49.22ms) @ Accel:1024 Loops:512 Thr:1 Vec:4
Speed.#*.........:   227.1 MH/s

Hashmode: 5600 - NetNTLMv2

Speed.#1.........: 11128.5 kH/s (70.14ms) @ Accel:2 Loops:512 Thr:384 Vec:1
Speed.#2.........:  3425.6 kH/s (56.89ms) @ Accel:32 Loops:1024 Thr:1 Vec:4
Speed.#*.........: 14554.2 kH/s

Hashmode: 1500 - descrypt, DES (Unix), Traditional DES

clEnqueueNDRangeKernel(): CL_INVALID_WORK_GROUP_SIZE

Speed.#1.........:        0 H/s (0.00ms) @ Accel:2 Loops:512 Thr:64 Vec:1
Speed.#2.........:  1761.3 kH/s (49.58ms) @ Accel:16 Loops:1024 Thr:1 Vec:4
Speed.#*.........:  1761.3 kH/s

Hashmode: 500 - md5crypt, MD5 (Unix), Cisco-IOS $1$ (MD5) (Iterations: 1000)

Speed.#1.........:    60941 H/s (94.63ms) @ Accel:16 Loops:500 Thr:384 Vec:1
Speed.#2.........:    10896 H/s (71.87ms) @ Accel:512 Loops:500 Thr:1 Vec:4
Speed.#*.........:    71837 H/s

Hashmode: 3200 - bcrypt $2*$, Blowfish (Unix) (Iterations: 32)

Speed.#1.........:       53 H/s (72.07ms) @ Accel:2 Loops:4 Thr:8 Vec:1
Speed.#2.........:      390 H/s (55.49ms) @ Accel:32 Loops:16 Thr:1 Vec:4
Speed.#*.........:      443 H/s

Hashmode: 1800 - sha512crypt $6$, SHA512 (Unix) (Iterations: 5000)

Speed.#1.........:      265 H/s (72.85ms) @ Accel:4 Loops:32 Thr:384 Vec:1
Speed.#2.........:     1256 H/s (60.40ms) @ Accel:128 Loops:512 Thr:1 Vec:2
Speed.#*.........:     1521 H/s

Hashmode: 7500 - Kerberos 5, etype 23, AS-REQ Pre-Auth

Speed.#1.........:   250.5 kH/s (130.08ms) @ Accel:2 Loops:128 Thr:64 Vec:1
Speed.#2.........:  1560.9 kH/s (62.61ms) @ Accel:2 Loops:128 Thr:64 Vec:4
Speed.#*.........:  1811.5 kH/s

Hashmode: 13100 - Kerberos 5, etype 23, TGS-REP

Speed.#1.........:   307.5 kH/s (52.78ms) @ Accel:1 Loops:128 Thr:64 Vec:1
Speed.#2.........:  1556.5 kH/s (62.84ms) @ Accel:2 Loops:128 Thr:64 Vec:4
Speed.#*.........:  1864.0 kH/s

Hashmode: 15300 - DPAPI masterkey file v1 (Iterations: 23999)

Speed.#1.........:      211 H/s (77.33ms) @ Accel:2 Loops:256 Thr:384 Vec:1
Speed.#2.........:      252 H/s (64.87ms) @ Accel:512 Loops:128 Thr:1 Vec:4
Speed.#*.........:      463 H/s

Hashmode: 15900 - DPAPI masterkey file v2 (Iterations: 12899)

Speed.#1.........:      135 H/s (55.58ms) @ Accel:1 Loops:128 Thr:384 Vec:1
Speed.#2.........:      179 H/s (80.56ms) @ Accel:32 Loops:1024 Thr:1 Vec:2
Speed.#*.........:      314 H/s

Hashmode: 7100 - macOS v10.8+ (PBKDF2-SHA512) (Iterations: 1023)

Speed.#1.........:     1170 H/s (70.97ms) @ Accel:16 Loops:7 Thr:384 Vec:1
Speed.#2.........:     2221 H/s (83.56ms) @ Accel:32 Loops:1023 Thr:1 Vec:2
Speed.#*.........:     3391 H/s

Hashmode: 11600 - 7-Zip (Iterations: 16384)

Speed.#1.........:     2615 H/s (68.78ms) @ Accel:1 Loops:4096 Thr:384 Vec:1
Speed.#2.........:     1607 H/s (58.17ms) @ Accel:64 Loops:4096 Thr:1 Vec:4
Speed.#*.........:     4223 H/s

Hashmode: 12500 - RAR3-hp (Iterations: 262144)

Speed.#1.........:      178 H/s (269.16ms) @ Accel:1 Loops:16384 Thr:384 Vec:1
Speed.#2.........:      213 H/s (56.18ms) @ Accel:32 Loops:16384 Thr:1 Vec:4
Speed.#*.........:      391 H/s

Hashmode: 13000 - RAR5 (Iterations: 32799)

Speed.#1.........:      202 H/s (58.90ms) @ Accel:2 Loops:256 Thr:384 Vec:1
Speed.#2.........:      191 H/s (62.71ms) @ Accel:256 Loops:256 Thr:1 Vec:4
Speed.#*.........:      393 H/s

Hashmode: 6211 - TrueCrypt RIPEMD160 + XTS 512 bit (Iterations: 1999)

Speed.#1.........:     2102 H/s (89.11ms) @ Accel:4 Loops:128 Thr:384 Vec:1
Speed.#2.........:     1193 H/s (79.75ms) @ Accel:128 Loops:256 Thr:1 Vec:4
Speed.#*.........:     3295 H/s

Hashmode: 13400 - KeePass 1 (AES/Twofish) and KeePass 2 (AES) (Iterations: 24569)

Speed.#1.........:      125 H/s (127.68ms) @ Accel:2 Loops:256 Thr:384 Vec:1
Speed.#2.........:      416 H/s (76.73ms) @ Accel:128 Loops:1024 Thr:1 Vec:4
Speed.#*.........:      541 H/s

Hashmode: 6800 - LastPass + LastPass sniffed (Iterations: 499)

Speed.#1.........:    14023 H/s (34.56ms) @ Accel:2 Loops:249 Thr:384 Vec:1
Speed.#2.........:    12370 H/s (40.43ms) @ Accel:256 Loops:249 Thr:1 Vec:4
Speed.#*.........:    26393 H/s

Hashmode: 11300 - Bitcoin/Litecoin wallet.dat (Iterations: 200459)

Speed.#1.........:       15 H/s (64.51ms) @ Accel:1 Loops:256 Thr:384 Vec:1
Speed.#2.........:       25 H/s (79.17ms) @ Accel:64 Loops:1024 Thr:1 Vec:2
Speed.#*.........:       40 H/s

Started: Wed Apr 29 22:16:06 2020
Stopped: Wed Apr 29 22:58:36 2020
It's really cool that this works, and it should be awesome for the right uses. I have another application or two that I want to try as well moving forward. If for some reason this configuration doesn't fix your problem, posting a small piece of code that demonstrates the problem you're seeing might help isolate or replicate it. On another subject, I read your paper. It would be interesting to see the same study on a more modern design and manufacturing process like the N2's.
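For anyone who wants to compare the two devices across hash modes without eyeballing the output, here is a rough Python sketch that parses the `Hashmode:` and `Speed.#N` lines of a hashcat benchmark log and prints the ratio of device #1 to device #2. The function name `parse_benchmark` and the `SAMPLE` text (two entries copied from the run above) are only for illustration, and it assumes speeds are printed in plain H/s as they are here, with no handling for kH/s or MH/s.

```python
import re

# Matches e.g. "Hashmode: 11600 - 7-Zip (Iterations: 16384)"
HASHMODE_RE = re.compile(
    r"^Hashmode:\s+(\d+)\s+-\s+(.*?)(?:\s+\(Iterations: \d+\))?$"
)
# Matches e.g. "Speed.#1.........:     2615 H/s (68.78ms) @ ..."
SPEED_RE = re.compile(r"^Speed\.#(\d+|\*)\.*:\s+(\d+) H/s")


def parse_benchmark(text):
    """Return {hashmode: {"name": str, "speeds": {device: H/s}}}."""
    results = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        m = HASHMODE_RE.match(line)
        if m:
            current = int(m.group(1))
            results[current] = {"name": m.group(2), "speeds": {}}
            continue
        m = SPEED_RE.match(line)
        if m and current is not None:
            # Device key is "1", "2", ... or "*" for the combined total.
            results[current]["speeds"][m.group(1)] = int(m.group(2))
    return results


SAMPLE = """\
Hashmode: 11600 - 7-Zip (Iterations: 16384)

Speed.#1.........:     2615 H/s (68.78ms) @ Accel:1 Loops:4096 Thr:384 Vec:1
Speed.#2.........:     1607 H/s (58.17ms) @ Accel:64 Loops:4096 Thr:1 Vec:4
Speed.#*.........:     4223 H/s

Hashmode: 13400 - KeePass 1 (AES/Twofish) and KeePass 2 (AES) (Iterations: 24569)

Speed.#1.........:      125 H/s (127.68ms) @ Accel:2 Loops:256 Thr:384 Vec:1
Speed.#2.........:      416 H/s (76.73ms) @ Accel:128 Loops:1024 Thr:1 Vec:4
Speed.#*.........:      541 H/s
"""

if __name__ == "__main__":
    for mode, entry in parse_benchmark(SAMPLE).items():
        speeds = entry["speeds"]
        ratio = speeds["1"] / speeds["2"]
        print(f"{mode:>6} {entry['name']}: #1/#2 = {ratio:.2f}")
```

To run it on a full log, read the saved benchmark output from a file and pass its contents to `parse_benchmark` in place of `SAMPLE`.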
These users thanked the author hominoid for the post (total 2):
mad_ady (Thu Apr 30, 2020 2:18 pm) • alprakas (Tue May 05, 2020 1:07 pm)

alprakas
Posts: 11
Joined: Thu Jul 31, 2014 12:24 pm
languages_spoken: english
ODROIDs: ODROID-XU-E, ODROID-XU3
Location: Singapore
Has thanked: 3 times
Been thanked: 1 time
Contact:

Re: Mali G52 OpenCL Comparison

Post by alprakas »

Thank you so much @hominoid for the clear instructions and confirmation with the hashcat application. :D
I had started with pocl 1.5 as well, based on @blu's last post, except that I built pocl with llvm 8 instead of version 9. My clinfo utility shows two platforms and lists them perfectly with the correct descriptions.

I will first try with llvm 9 to see if that fixes the problem, unlikely as it is. If not, I will paste a stub from my code, which is extremely simple by the way, just using the clGetPlatformIDs command.

Indeed, I am now trying to replicate my old study on this new platform, provided I don't get distracted by other things. Five years ago I had all the freedom to research whatever I wanted, but now I have too many other admin responsibilities. Anyway, I am working on something else this week, but hopefully I will come back to this soon and report.

Edit: By the way, any ideas on how to get some kind of GPU utilization metric on the N2? It looks like the new kernel does not have such information readily available, or maybe I have not found it yet.

Re: Mali G52 OpenCL Comparison

Post by hominoid »

The only GPU profiling tool I'm aware of is ARM's Streamline. There is a community edition available in exchange for user registration information and there is also an Odroid wiki page on setting it up for the N2.
These users thanked the author hominoid for the post:
alprakas (Thu May 07, 2020 1:15 pm)

Re: Mali G52 OpenCL Comparison

Post by alprakas »

Bingo! Finally got the two platforms to show up in my application. I have been trying to get the icd loader to work properly all this while since that is the one responsible for looking at the icd files and listing the platforms by using the clGetPlatformIDs command, but without any success...
However, when I saw the first line of the commands you shared the other day, I noticed the ocl-icd-opencl-dev package. So I installed that and opencl-headers first and tried again. This time, with the mali.icd in the /etc/OpenCL/vendors directory pointing to /usr/lib/aarch64-linux-gnu/libMali.so, the clGetPlatformIDs command picked up both platforms perfectly! So I will stick with llvm 8 for now.
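As a quick sanity check of what the ICD loader should be finding, the following Python sketch lists every `.icd` file in a vendors directory and checks whether the library it names exists. It only mimics the loader's directory scan; it is not the loader itself and does not call clGetPlatformIDs. The function name `list_icd_entries` is made up for this example, and the sketch assumes each `.icd` file holds a single library path or name on its first non-empty line. Bare library names (no directory part) are left to the dynamic linker, so they are reported as "unresolved" rather than missing.

```python
import os


def list_icd_entries(vendors_dir="/etc/OpenCL/vendors"):
    """Return a sorted list of (icd_file, library, status) tuples.

    status is "found" or "missing" for absolute library paths, and
    "unresolved" for bare names that the dynamic linker would look up.
    """
    entries = []
    if not os.path.isdir(vendors_dir):
        return entries
    for name in sorted(os.listdir(vendors_dir)):
        if not name.endswith(".icd"):
            continue
        with open(os.path.join(vendors_dir, name)) as f:
            # Use the first non-empty line as the library reference.
            library = next((ln.strip() for ln in f if ln.strip()), "")
        if os.path.isabs(library):
            status = "found" if os.path.exists(library) else "missing"
        else:
            status = "unresolved"
        entries.append((name, library, status))
    return entries


if __name__ == "__main__":
    for icd, lib, status in list_icd_entries():
        print(f"{icd}: {lib} [{status}]")
```

If an entry with an absolute path shows up as "missing", the `.icd` file points at a library that is not there, which would explain a platform not appearing in the application.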

I must thank you immensely for this hominoid. Now, I can get on with my work. :D

Regarding GPU utilization, I have used ARM Streamline before, but it used to be messy to set up. I do not need GPU utilization that desperately just yet. If I do, I will go ahead with the Streamline installation. Thank you very much for the suggestion nonetheless.
These users thanked the author alprakas for the post:
hominoid (Thu May 07, 2020 10:35 pm)

galleta38
Posts: 3
Joined: Sun May 24, 2020 3:16 pm
languages_spoken: english,spanish
Has thanked: 0
Been thanked: 0
Contact:

Re: Mali G52 OpenCL Comparison

Post by galleta38 »

hominoid wrote:
Wed May 06, 2020 4:36 am
Could you please post some OpenCL performance benchmarks (e.g. hashcat) comparing the Odroid C4 and N2? At least a basic, fair comparison. I am really interested in the Mali G31 vs the G52, but CPU+GPU results for both boards would also be interesting.

I have been searching for almost a day and there are no comparisons between the two devices. And I don't want to treat the Hardkernel marketing pages as proof.

Another possibility is that OpenCL on the Odroid C4 is still not mature.

Re: Mali G52 OpenCL Comparison

Post by hominoid »

I tried to get OpenCL running on the C4 recently to do just that and ran into some issues. As soon as things mature enough and HK releases their image I will give it another go.

meveric
Posts: 12102
Joined: Mon Feb 25, 2013 2:41 pm
languages_spoken: german, english
ODROIDs: X2, U2, U3, XU-Lite, XU3, XU3-Lite, C1, XU4, C2, C1+, XU4Q, HC1, N1, Go, H2 (N4100), N2, H2 (J4105), GoA, C4, GoA v1.1, H2+, HC4, GoS
Has thanked: 89 times
Been thanked: 667 times
Contact:

Re: Mali G52 OpenCL Comparison

Post by meveric »

my Debian image uses the Mali G31 image with OpenCL support, it should work when you install the GPU drivers.
Donate to support my work on the ODROID GameStation Turbo Image for U2/U3 XU3/XU4 X2 X C1 as well as many other releases.
Check out the Games and Emulators section to find some of my work or check the files in my repository to find the software i build for ODROIDs.
If you want to add my repository to your image read my HOWTO integrate my repo into your image.

Re: Mali G52 OpenCL Comparison

Post by hominoid »

meveric wrote:
Mon May 25, 2020 5:58 pm
my Debian image uses the Mali G31 image with OpenCL support, it should work when you install the GPU drivers.
Thanks @meveric. My thermal stress tests left me a little concerned about the behavior of kernel 4.9.y under load on the C4. I thought it might be best to give HK a little more time to complete their kernel work before doing performance tests.

Re: Mali G52 OpenCL Comparison

Post by meveric »

In the worst case, install a fan on the heatsink; it shouldn't be too hard. There are even USB fans that could be used:
https://www.hardkernel.com/shop/40x40x1 ... oling-fan/
(you can probably find some on Amazon)

Re: Mali G52 OpenCL Comparison

Post by hominoid »

I have fans. It's not the thermals I'm concerned with; it's the performance inconsistency and how representative the results would be of the C4's actual potential. I don't know, and I would rather not have to do the work twice. :) I'm sure it will get done, but is right now the best time? Besides, I've been really busy with another project.
