Good hardware but with good software ? Need highlights from Odroid

Kuennek
Posts: 14
Joined: Tue Jul 02, 2019 5:45 pm
languages_spoken: english
Has thanked: 0
Been thanked: 1 time
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by Kuennek » Sun Jul 07, 2019 3:48 pm

Can we overclock the GPU on our Odroid N2 too?

easybob95
Posts: 61
Joined: Mon Apr 08, 2019 4:02 pm
languages_spoken: english
Has thanked: 1 time
Been thanked: 5 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by easybob95 » Sun Jul 07, 2019 10:30 pm

memeka wrote:
Sun Jul 07, 2019 10:27 am
easybob95 wrote:
Sat Jul 06, 2019 10:57 pm
Anyway, Jetson Nano works really good.

I made a speed test, using an opencv program to compare N2 and Nano CPU :
- Odroid N2 : 1.5 seconds
- Jetson Nano : 2 seconds

I am a bit disappointed with the N2; I thought it would be much faster than the Nano. It is faster, but not by much.

Alain
Disappointed in the cheaper product that is 25% faster?
Lol
Hello memeka,

you should try to read more carefully.

I thought the A73 at a higher frequency would bring much more performance compared to the A57 at a lower frequency. 25% is quite good, but I expected better performance.

Lol. :D

Alain

easybob95
Posts: 61
Joined: Mon Apr 08, 2019 4:02 pm
languages_spoken: english
Has thanked: 1 time
Been thanked: 5 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by easybob95 » Sun Jul 07, 2019 10:38 pm

igorpec wrote:
Sun Jul 07, 2019 8:07 am
easybob95 wrote:
Sat Jul 06, 2019 10:57 pm
Nvidia has nothing to do with Armbian (that's quite normal) so Igor is angry. For sure and for now, Armbian is useless with N2 and Nano. Maybe in 1 year, Armbian will be useful but at this time, Ubuntu will be as good as Armbian. So Armbian is mainly useless.
I am angry at how terrible the low-level and Linux support is on the Nano. Since you have no idea what that means, it's pointless to discuss. You clearly have absolutely no idea what Armbian is either. Let me give you a very short overview. Armbian is a build tool, an engine to create a Debian-based distribution. Like Yocto, like Buildroot. Linux distributions or images are a side product, a demonstration that it works. And if a vendor provides a shit SDK, it's (almost) impossible / hard / time-consuming to build a Linux from source. Any Linux, not just Armbian. That's the main issue here. Since you only see the desktop and you don't care if you can't change or improve anything, just enjoy and remain ...

There is an (unofficial/community-made) Armbian for the Nano ... it's a slightly improved version of Nvidia's Ubuntu (an official Ubuntu does not exist), but it's not built, it's glued together. That is good enough for demonstration purposes ...
Hello Igor,

I understand. But I am not the kind of geek who wants to build his own personal Linux system. I can imagine some people want to do that, but as for me, I just want to write software and make progress on my project, so Ubuntu is enough for me and I don't really care about Armbian. If this makes me an ... idiot?, well, I am an idiot. No problem.

To get back to the Odroid N2, my problem is that Odroid does not give much of an answer to my questions. The Odroid N2 ends up being an SBC dedicated to game emulation or things like that. Not really serious stuff.

Too bad.

And during this time, Odroid remains silent. Still silent. Always silent. What a pity.

Alain


easybob95
Posts: 61
Joined: Mon Apr 08, 2019 4:02 pm
languages_spoken: english
Has thanked: 1 time
Been thanked: 5 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by easybob95 » Mon Jul 08, 2019 3:52 am

Something really interesting :

I told you I ran a test with a gradient removal algorithm on the Odroid N2 and the Nvidia Jetson Nano (Python 3 software).

The results were :
- N2 : 1.5s
- Jetson Nano : 2s

I have compiled OpenCV 4.1.0 with CUDA for the Jetson Nano and ran a new test:
- N2 (still the same OpenCV library): 1.5s (no change)
- Jetson Nano with the new OpenCV 4.1.0: 0.5s

Well, that's really different. The Jetson Nano drops from 2s to 0.5s with a good OpenCV build. This time, the Jetson Nano is 3 times faster than the Odroid N2!

I will try to compile openCV 4.1.0 for Odroid N2 to see if there is some improvement with the N2.

Alain

igorpec
Posts: 396
Joined: Sat Dec 12, 2015 4:34 pm
languages_spoken: english,german,slovene
ODROIDs: XU4, HC1, C2, C1+
Has thanked: 8 times
Been thanked: 29 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by igorpec » Mon Jul 08, 2019 4:14 am

Ubuntu is enough for me
Which Ubuntu? Can't find anything https://www.google.com/search?q=site%3A ... etson+nano
ARMBIAN - follow on Twitter
linux for ARM development boards with user friendly development tools

easybob95
Posts: 61
Joined: Mon Apr 08, 2019 4:02 pm
languages_spoken: english
Has thanked: 1 time
Been thanked: 5 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by easybob95 » Mon Jul 08, 2019 4:19 am

It's Ubuntu 18.04 LTS

Alain

blu
Posts: 76
Joined: Wed Mar 08, 2017 11:30 pm
languages_spoken: english
ODROIDs: XU4, N2
Has thanked: 3 times
Been thanked: 20 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by blu » Mon Jul 08, 2019 4:53 am

easybob95 wrote:
Mon Jul 08, 2019 3:52 am
I will try to compile openCV 4.1.0 for Odroid N2 to see if there is some improvement with the N2.
Let us know how the build for OCL2.0 goes.
easybob95 wrote:
Mon Jul 08, 2019 4:19 am
It's Ubuntu 18.04 LTS
It's not really, it's L4T, aka Linux for Tegra -- the userspace is largely based off Ubuntu, but there end the similarities.

easybob95
Posts: 61
Joined: Mon Apr 08, 2019 4:02 pm
languages_spoken: english
Has thanked: 1 time
Been thanked: 5 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by easybob95 » Mon Jul 08, 2019 3:00 pm

Hello blu,

you are right, it's L4T based on Ubuntu 18.04.

But that is not really important, at least for me, as long as I can do my work with it.

Concerning OpenCV 4.1.0 for the N2, I will give it a try this week. I will tell you if it brings some extra performance to the N2 (if I manage to compile it and use it with Python, of course).

Alain

memeka
Posts: 4339
Joined: Mon May 20, 2013 10:22 am
languages_spoken: english
ODROIDs: XU rev2 + eMMC + UART
U3 + eMMC + IO Shield + UART
Has thanked: 1 time
Been thanked: 24 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by memeka » Mon Jul 08, 2019 3:27 pm

easybob95 wrote:
Sun Jul 07, 2019 10:30 pm
Hello memeka,

you should try to read more carefully.

I thought the A73 at a higher frequency would bring much more performance compared to the A57 at a lower frequency. 25% is quite good, but I expected better performance.

Lol. :D

Alain
A57 vs A73 (or A72 vs A73) is not really that big a difference in terms of real-life performance. The improvements in the A73 are more in terms of power consumption. In certain workloads, the A72 is even faster than the A73 (and probably the A57 too).
So the difference is clock speed. 25% for 1.8 GHz vs. 1.43 GHz is in line with what one would expect.
The real advantage of the A73 is lower power consumption and better technology, which allows higher clock rates and thus better performance. The 1.8 GHz frequency of the A73 in the Odroid N2 is actually low because of some issues with the initial chip design (from Amlogic, not Hardkernel), and I expect Hardkernel to soon release a new revision of the N2 with these issues fixed and new frequencies of at least 2.2 GHz unlocked. This would mean 50-60% better than the Nano (e.g. just under 1s in your test) with that.... let's call it N2+ :).
On the other hand, this would be results for CPU only.
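The clock argument can be checked with a few lines of Python. This is only a sketch under the assumption that CPU performance scales linearly with frequency, which real workloads rarely do exactly; the function name `clock_gain_percent` is made up for the illustration, and the 2.2 GHz figure is the hoped-for frequency mentioned in this post, not a measured one.

```python
def clock_gain_percent(fast_ghz, slow_ghz):
    """Expected gain in percent if performance scaled linearly with clock."""
    return (fast_ghz / slow_ghz - 1.0) * 100.0


NANO_GHZ = 1.43          # Jetson Nano A57 clock quoted in the thread
N2_GHZ = 1.8             # Odroid N2 A73 clock quoted in the thread
N2_UNLOCKED_GHZ = 2.2    # hoped-for frequency, an assumption

print(f"N2 @ 1.8 GHz vs Nano: +{clock_gain_percent(N2_GHZ, NANO_GHZ):.1f}%")
print(f"N2 @ 2.2 GHz vs Nano: +{clock_gain_percent(N2_UNLOCKED_GHZ, NANO_GHZ):.1f}%")
```

Under that linear assumption the two ratios land close to the 25% and 50-60% figures quoted in this post.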

Using Cuda, you can get better results (like you got 0.5s) with the Nano. Cuda does not work on N2. So compiling OpenCV 4.1.0 for N2, I can tell you now there is little chance you'll get better results on the N2.

Whether OpenCL brings benefits on the N2 is still up for debate. AFAIR, OpenCV has some OpenCL optimizations (see https://opencv.org/wp-content/uploads/2 ... 901763.jpg), but nothing that I found useful for my own applications. So if you want to test the "raw" performance of the N2 GPU, you can write an OpenCV benchmark that uses Sobel, filter2D and whatever other functions are OpenCL-accelerated, but unless you make use of those in your own code, it's useless (whereas CUDA, I think, has a much tighter integration than OpenCL). You can also use the ARM Compute Library (https://github.com/ARM-software/ComputeLibrary) if you want to test the N2 GPU performance; it has more things implemented than OpenCV.

memeka
Posts: 4339
Joined: Mon May 20, 2013 10:22 am
languages_spoken: english
ODROIDs: XU rev2 + eMMC + UART
U3 + eMMC + IO Shield + UART
Has thanked: 1 time
Been thanked: 24 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by memeka » Mon Jul 08, 2019 3:36 pm

McSpud3rd wrote:
Sat Jul 06, 2019 7:14 pm
Would you say the mainline Linux build is ready for someone like me to start toying with? I'd be happy to help out with user testing if that's something that this project would benefit from at this point?
I think you should wait for the next LTS (5.4?), then use mainline + check some out of tree patches from amlogic mainline developers (as they get queued for linux-next).
I think the armbian kernel people track the amlogic developers quite well, so you might want to look at their repository for a compilation of out-of-tree patches, or use their kernel/distro when they upgrade to next LTS.

McSpud3rd
Posts: 2
Joined: Sat Jul 06, 2019 6:58 pm
languages_spoken: english
ODROIDs: XU3 N2
Has thanked: 1 time
Been thanked: 0
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by McSpud3rd » Mon Jul 08, 2019 7:25 pm

memeka wrote:
Mon Jul 08, 2019 3:36 pm
I think you should wait for the next LTS (5.4?), then use mainline + check some out of tree patches from amlogic mainline developers (as they get queued for linux-next).
I think the armbian kernel people track the amlogic developers quite well, so you might want to look at their repository for a compilation of out-of-tree patches, or use their kernel/distro when they upgrade to next LTS.
Thanks memeka, I'll have a play once the next LTS has been released.

easybob95
Posts: 61
Joined: Mon Apr 08, 2019 4:02 pm
languages_spoken: english
Has thanked: 1 time
Been thanked: 5 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by easybob95 » Wed Jul 10, 2019 4:42 am

Hello,

I have succeeded in compiling OpenCV 4.1.0 with opencl=ON for the Odroid N2. So cv2 for Python is now version 4.1.0.

I made a new test with my gradient removal algorithm using Odroid N2 :
- Odroid N2 opencv 4.1.0 : 0.5 second
- Jetson Nano opencv 4.1.0 : 0.5 second

That is to say equality now.

Many thanks Odroid for your precious help ! ;)

Alain
These users thanked the author easybob95 for the post (total 2):
blu (Wed Jul 10, 2019 3:30 pm) • odroid (Wed Jul 10, 2019 3:34 pm)

blu
Posts: 76
Joined: Wed Mar 08, 2017 11:30 pm
languages_spoken: english
ODROIDs: XU4, N2
Has thanked: 3 times
Been thanked: 20 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by blu » Wed Jul 10, 2019 3:26 pm

easybob95 wrote:
Wed Jul 10, 2019 4:42 am
I have succeeded in compiling OpenCV 4.1.0 with opencl=ON for the Odroid N2. So cv2 for Python is now version 4.1.0.

I made a new test with my gradient removal algorithm using Odroid N2 :
- Odroid N2 opencv 4.1.0 : 0.5 second
- Jetson Nano opencv 4.1.0 : 0.5 second

That is to say equality now.
Good! Is OCV 4.1 relying on OCL2 or can it run with a lower-version OCL, like 1.2? If the latter, it might be interesting to try and run the same test on XU4's Midgard, so that we get a general idea how much better Bifrost is for this kind of task.

memeka
Posts: 4339
Joined: Mon May 20, 2013 10:22 am
languages_spoken: english
ODROIDs: XU rev2 + eMMC + UART
U3 + eMMC + IO Shield + UART
Has thanked: 1 time
Been thanked: 24 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by memeka » Wed Jul 10, 2019 4:42 pm

easybob95 wrote:
Wed Jul 10, 2019 4:42 am
Hello,

I have succeeded in compiling OpenCV 4.1.0 with opencl=ON for the Odroid N2. So cv2 for Python is now version 4.1.0.

I made a new test with my gradient removal algorithm using Odroid N2 :
- Odroid N2 opencv 4.1.0 : 0.5 second
- Jetson Nano opencv 4.1.0 : 0.5 second

That is to say equality now.

Many thanks Odroid for your precious help ! ;)

Alain
This is better than I expected, are you using OCL implemented functions in opencv?

easybob95
Posts: 61
Joined: Mon Apr 08, 2019 4:02 pm
languages_spoken: english
Has thanked: 1 time
Been thanked: 5 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by easybob95 » Wed Jul 10, 2019 9:11 pm

From what I know, I think the OpenCL version is 1.2.

Concerning the OpenCV functions I use, they are mainly image filters like the Gaussian filter.

I am trying to find out which functions I must use to be sure UMat will bring OpenCL acceleration.

This is not really clear for Python. It is clearer for C.
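For reference, here is a minimal sketch of how UMat is usually reached from Python. `cv2.UMat`, `cv2.ocl.haveOpenCL`, `cv2.ocl.setUseOpenCL` and `cv2.ocl.useOpenCL` are OpenCV calls; the helper names `gaussian_blur_umat` and `demo` and the 64x64 test image are made up for the example. It only shows the calling pattern: whether a given filter really runs through OpenCL depends on the OpenCV build and the driver, so it proves nothing about acceleration on the N2.

```python
try:
    import numpy as np
    import cv2
except ImportError:  # OpenCV (or NumPy) is not installed on this machine
    cv2 = None


def gaussian_blur_umat(image, ksize=(5, 5), sigma=0):
    """Blur `image` through cv2.UMat so OpenCV may dispatch to OpenCL."""
    umat = cv2.UMat(image)                        # wrap the numpy array as a UMat
    blurred = cv2.GaussianBlur(umat, ksize, sigma)
    return blurred.get()                          # copy back to a numpy array


def demo():
    """Return (OpenCL available, OpenCL in use, output shape), or None without cv2."""
    if cv2 is None:
        return None
    cv2.ocl.setUseOpenCL(True)   # a request only; it stays off if OpenCL is missing
    image = np.full((64, 64), 128, dtype=np.uint8)
    result = gaussian_blur_umat(image)
    return cv2.ocl.haveOpenCL(), cv2.ocl.useOpenCL(), result.shape


if __name__ == "__main__":
    print(demo())
```

Timing the same filter with a plain numpy array and with a UMat is probably the most direct way to see whether the OpenCL path helps for a given routine.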

But I will look at this later because, for now, I must install pycuda on the Jetson Nano and I have small issues installing pycuda.

Alain

tony.hong
Posts: 39
Joined: Tue Jun 04, 2019 1:49 pm
languages_spoken: korean
ODROIDs: All
Location: korea
Has thanked: 9 times
Been thanked: 4 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by tony.hong » Mon Jul 15, 2019 10:49 am


easybob95
Posts: 61
Joined: Mon Apr 08, 2019 4:02 pm
languages_spoken: english
Has thanked: 1 time
Been thanked: 5 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by easybob95 » Wed Jul 17, 2019 3:46 pm

I confirm i get OpenCL 1.2 version with OpenCV 4.1.0.

The speed tests for my purpose are good, but not better than the Jetson Nano.

Concerning the Jetson Nano, I have compiled the CUDA examples (CUDA toolkit) to see what the Maxwell GPU can do. It is really amazing. I focused on image filtering, and the difference between CUDA and OpenCV is huge. Really impressive.

That makes me say the Odroid N2 is only half an SBC without G52 support.

Alain

blu
Posts: 76
Joined: Wed Mar 08, 2017 11:30 pm
languages_spoken: english
ODROIDs: XU4, N2
Has thanked: 3 times
Been thanked: 20 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by blu » Wed Jul 17, 2019 6:17 pm

easybob95 wrote:
Wed Jul 17, 2019 3:46 pm
I confirm i get OpenCL 1.2 version with OpenCV 4.1.0.

The speed tests for my purpose are good, but not better than the Jetson Nano.
Most GPU and GPGPU tests so far demonstrate similar performance levels between G52/mp2 (*) and Maxwell2.0/128. Actually, I'd expect Nano to have a general edge due to 2.5x the system BW of Nano over N2, but I guess your workload fits well with G52's caches.
Concerning Jetson Nano, i have compiled Cuda examples (Cuda toolkit) to see what Maxwell GPU can do. It is really amazing. I made focus on image filtering and the difference is huge between Cuda and OpenCV. Really impressive.

That makes me say Odroid N2 is a half SBC without G52 support.
Mali will never have CUDA support. But you can learn OCL and write your own kernels doing the same things in OCL2, across a myriad of non-NV GPUs : )

* It's 2 devices of 3 pipelines each, so the setup is often referred to as mp6.

easybob95
Posts: 61
Joined: Mon Apr 08, 2019 4:02 pm
languages_spoken: english
Has thanked: 1 time
Been thanked: 5 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by easybob95 » Wed Jul 17, 2019 7:59 pm

For now, i will focus my work on Nano and Cuda/PyCuda. Nvidia support is really useful and i will be able to use my work with something more powerful like a laptop for specific needs.

But i don't give up with N2. I will probably try to use OCL for other projects but Cuda will go first.

Many thanks for your comment blu.

Alain

memeka
Posts: 4339
Joined: Mon May 20, 2013 10:22 am
languages_spoken: english
ODROIDs: XU rev2 + eMMC + UART
U3 + eMMC + IO Shield + UART
Has thanked: 1 time
Been thanked: 24 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by memeka » Wed Jul 17, 2019 8:11 pm

What do you mean no G52 support? How else are you running OCL?

easybob95
Posts: 61
Joined: Mon Apr 08, 2019 4:02 pm
languages_spoken: english
Has thanked: 1 time
Been thanked: 5 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by easybob95 » Thu Jul 18, 2019 6:21 pm

Where is the G52 software support ?

I can manage OpenCV, but I need more OCL support to learn it, and specific information about the G52 to write optimized programs.

I am not talking about OGL, of course, or native GPU support in Ubuntu.

Anyway, I will come back to the N2 when we get some serious information and libraries. For now, as I said, I am focusing my work on pycuda/CUDA. Very interesting and well documented.

memeka
Posts: 4339
Joined: Mon May 20, 2013 10:22 am
languages_spoken: english
ODROIDs: XU rev2 + eMMC + UART
U3 + eMMC + IO Shield + UART
Has thanked: 1 time
Been thanked: 24 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by memeka » Thu Jul 18, 2019 8:30 pm

Not clear what is it you want?
OpenCL information?
G52 arch information?

easybob95
Posts: 61
Joined: Mon Apr 08, 2019 4:02 pm
languages_spoken: english
Has thanked: 1 time
Been thanked: 5 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by easybob95 » Fri Jul 19, 2019 1:08 am

Both. But now, i don't care much anymore as i said i focus on Cuda.

easybob95
Posts: 61
Joined: Mon Apr 08, 2019 4:02 pm
languages_spoken: english
Has thanked: 1 time
Been thanked: 5 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by easybob95 » Sat Jul 20, 2019 5:21 am

I have made my first test program using Python and PyCuda. It works great.

The test program is quite simple and its only purpose is to compare pycuda and classic Python performance.
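A rough sketch of what such a timing harness can look like in plain Python. This is not the original test program: `trig_workload` and `best_time` are hypothetical names, the loop is only a stand-in for "trigonometric calculus with iterations", and the problem size is kept small so it finishes quickly.

```python
import math
import time


def trig_workload(n, iterations=10):
    """Pure-Python trigonometric loop (a stand-in workload, not the original test)."""
    total = 0.0
    for i in range(n):
        x = i * 1e-3
        for _ in range(iterations):
            x = math.sin(x) + math.cos(x)
        total += x
    return total


def best_time(func, *args, repeats=3):
    """Return the fastest wall-clock time in seconds over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    print(f"trig_workload: {best_time(trig_workload, 20_000):.4f} s")
```

Taking the best of several runs reduces the influence of other processes and of CPU frequency scaling on the reported time.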

For trigonometric calculus with iterations :
Jetson Nano :
Python + Numpy : 4.39s
Python + pycuda 16 blocks 32 threads per block : 0.0042s

Odroid N2 :
Python + Numpy : 2.42s

PyCuda is quite impressive.

Just considering the CPU, the Odroid N2 outperforms the Jetson Nano.

I ran another test using only a simple numpy routine (multiplying 2 big arrays of 20 million elements each):
Jetson Nano :
Python + Numpy : 1.6s
Python + pycuda 16 blocks 32 threads per block : 0.249s

Odroid N2 :
Python + Numpy : 0.98s

The difference is much smaller.

Just considering the CPU, the Odroid N2 is still ahead of the Jetson Nano.

In fact, the second test used many more Python lines than the first. So we can see numpy is a very good Python library (very fast routines), but as soon as you have some Python lines in the algorithm, you can easily see that Python is an interpreted language.

Well, the N2 performs great, but for specific routines the GPU is far ahead of the CPU, and in that case pycuda and the Nano are really interesting. Everyone knows that, but it is interesting to run the test.

blu
Posts: 76
Joined: Wed Mar 08, 2017 11:30 pm
languages_spoken: english
ODROIDs: XU4, N2
Has thanked: 3 times
Been thanked: 20 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by blu » Sat Jul 20, 2019 5:44 am

easybob95 wrote:
Sat Jul 20, 2019 5:21 am
Well, N2 performs great but for specific routines, GPU is far ahead CPU and in that case, pycuda and Nano are really interesting. Everyone knows that but it is interesting to make the test.
Well, I guess GPGPU is pushing out xeons et al from HPC for a good reason ; )

It'd be curious if you got confident with writing your own kernels and one day did similar things with OCL -- just as Mali will never run CUDA, there are some very interesting devices running OCL that will never run CUDA.

easybob95
Posts: 61
Joined: Mon Apr 08, 2019 4:02 pm
languages_spoken: english
Has thanked: 1 time
Been thanked: 5 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by easybob95 » Mon Jul 22, 2019 11:09 pm

I will take a look at OCL, but later. For now, I am just working on pycuda.

It's simpler than I previously thought (thanks, pycuda).

GPU programming is really interesting. I am running simple tests to compare OpenCV + numpy vs CUDA. From what I can see, CUDA (I should say pycuda) is not always better than OpenCV + numpy.

For appropriate routines, pycuda is about 10 times faster than OpenCV + numpy (sometimes much more, like 100x or 500x, but it really depends on the routine). But I must say OpenCV + numpy is not bad at all.

It depends a lot on the routine, the number of blocks and the number of threads per block, memory management, etc. I need to run a lot of tests to find which routines are worth porting to pycuda and which are not.
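To make the blocks/threads point concrete, here is a CPU-only emulation in Python of how a launch of 16 blocks x 32 threads could cover a 20-million-element array. It assumes a grid-stride loop inside the kernel, which is one common pattern and not necessarily what the tests in this thread used; `grid_stride_indices` is a made-up helper and nothing here touches CUDA.

```python
def grid_stride_indices(n_elements, blocks, threads_per_block):
    """Map each emulated GPU thread to the element indices it would process."""
    total_threads = blocks * threads_per_block
    work = {}
    for block in range(blocks):
        for thread in range(threads_per_block):
            # Same formula as blockIdx.x * blockDim.x + threadIdx.x in CUDA.
            global_id = block * threads_per_block + thread
            # Grid-stride loop: start at global_id, step by the whole grid size.
            work[global_id] = range(global_id, n_elements, total_threads)
    return work


work = grid_stride_indices(20_000_000, blocks=16, threads_per_block=32)
per_thread = [len(indices) for indices in work.values()]
print(len(work), "threads,", min(per_thread), "to", max(per_thread), "elements each")
```

With only 512 threads, each one loops over tens of thousands of elements, which is one reason why the choice of block and thread counts can change the timings so much.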

Alain

easybob95
Posts: 61
Joined: Mon Apr 08, 2019 4:02 pm
languages_spoken: english
Has thanked: 1 time
Been thanked: 5 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by easybob95 » Thu Jul 25, 2019 1:07 am

Hello,

to wrap up my comparison between the Odroid N2 and the Jetson Nano.

I use about 10 filters in my sky survey software. So far, I have managed to convert 8 of them to PyCuda for the Jetson Nano.

Numpy routines are quite easy to convert with PyCuda.

Filters like blur, Gaussian blur or sharpen need a convolution filter with different kernels, so I had to write a convolution filter using PyCuda. Not a big deal.
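As an illustration of "one convolution routine, different kernels", here is a naive CPU reference in pure Python. It is not the PyCuda code from this post: `convolve2d`, `BOX_BLUR` and `SHARPEN` are hypothetical names, the border handling (replicating edge pixels) is an assumption, and the kernel is applied without flipping, which gives the same result for symmetric kernels like these.

```python
def convolve2d(image, kernel):
    """Naive 2D filter over a list-of-lists image, replicating edge pixels."""
    height, width = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    off_y, off_x = k_h // 2, k_w // 2
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky in range(k_h):
                for kx in range(k_w):
                    # Clamp coordinates so borders reuse the nearest pixel.
                    sy = min(max(y + ky - off_y, 0), height - 1)
                    sx = min(max(x + kx - off_x, 0), width - 1)
                    acc += image[sy][sx] * kernel[ky][kx]
            output[y][x] = acc
    return output


# Two example kernels; only the weights change, not the routine.
BOX_BLUR = [[1 / 9] * 3 for _ in range(3)]
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]
```

On a GPU, each output pixel would typically be computed by its own thread, with the two inner loops over the kernel staying inside that thread.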

I made tests with classical method (numpy + opencv) and PyCuda method.

For a 3096*2080 pixels picture (the resolution of my camera) using 8 filters (maximum load) :
Odroid N2 + numpy + opencv 4.1.0 : 5.9 seconds
Jetson Nano + Numpy + opencv 4.1.0 : 7.4 seconds
Jetson Nano + PyCuda : 1.04 seconds

For a 1544*1040 pixels picture (BIN 2 camera) using 4 filters (typical load) :
Odroid N2 + numpy + opencv 4.1.0 : 1.38 seconds
Jetson Nano + Numpy + opencv 4.1.0 : 1.2 seconds
Jetson Nano + PyCuda : 0.21 seconds

PyCuda outperforms Numpy + OpenCV 4.1.0 (about 5.7x faster).
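The ratios can be recomputed directly from the timings listed above. The dictionary layout and the function name `speedups` are just a convenient way to write it down; the numbers are the ones from this post.

```python
# Timings in seconds, copied from the two test cases above.
TIMINGS = {
    "3096x2080, 8 filters": {
        "N2 numpy+opencv": 5.9,
        "Nano numpy+opencv": 7.4,
        "Nano PyCuda": 1.04,
    },
    "1544x1040, 4 filters": {
        "N2 numpy+opencv": 1.38,
        "Nano numpy+opencv": 1.2,
        "Nano PyCuda": 0.21,
    },
}


def speedups(results, baseline="Nano PyCuda"):
    """Return how many times slower each setup is than the baseline."""
    return {
        name: seconds / results[baseline]
        for name, seconds in results.items()
        if name != baseline
    }


for case, results in TIMINGS.items():
    for name, factor in speedups(results).items():
        print(f"{case}: PyCuda is {factor:.1f}x faster than {name}")
```

Depending on which pair is compared, the factor ranges from roughly 5.7x to about 7x.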

I did not test OpenCV 3.3, but earlier tests with OpenCV 4.1.0 (compiled from the OpenCV sources on the Jetson Nano and the Odroid N2) make me think that PyCuda is 10 to 15 times faster than OpenCV 3.3.

Alain

blu
Posts: 76
Joined: Wed Mar 08, 2017 11:30 pm
languages_spoken: english
ODROIDs: XU4, N2
Has thanked: 3 times
Been thanked: 20 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by blu » Thu Jul 25, 2019 1:50 am

Alain, your results show that numpy is crucial enough for your workload that its GPGPU-fication (via pycuda) significantly affects your tests. Here's a reddit thread on the exact same subject, you might want to check it out: https://www.reddit.com/r/Python/comment ... or_opencl/

easybob95
Posts: 61
Joined: Mon Apr 08, 2019 4:02 pm
languages_spoken: english
Has thanked: 1 time
Been thanked: 5 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by easybob95 » Thu Jul 25, 2019 5:38 pm

In fact, for now, numpy is not my major problem. I have also read some things about cupy, but I did not manage to install it on the Jetson Nano. Anyway, numpy is used to transfer data from the CPU to the GPU. In order to reduce transfer time, I have to merge several routines into the same CUDA routine.

From what I have understood, the major problem is memory management in CUDA routines. For now, I have written "naive" algorithms. I will have to work on better routines with better memory management to get more interesting results. But for that, I think C coding with real CUDA coding will be necessary.

I will look at this later. For now, I want to get some results with pycuda (it is hard to stay motivated if you have to work for months before getting the very first results!).

Concerning OpenCL, I think Python is not the best way at all to get OpenCL improvements. When I try to work seriously with C (later, later), I will take a look at OpenCL.

But for quick results without too much hard work, I have to say that Python + Numpy + OpenCV + pycuda (for Nvidia only) can bring very interesting results.

Alain

tony.hong
Posts: 39
Joined: Tue Jun 04, 2019 1:49 pm
languages_spoken: korean
ODROIDs: All
Location: korea
Has thanked: 9 times
Been thanked: 4 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by tony.hong » Fri Jul 26, 2019 1:19 pm

easybob95 wrote:
Thu Jul 25, 2019 5:38 pm
But for quick results without too much hard work, I have to say that Python + Numpy + OpenCV + pycuda (for Nvidia only) can bring very interesting results.
Over the last 1-2 weeks, I tested OpenCV and OpenCL.

As a result, I agree with easybob95.

Before testing, I thought GPGPU (OpenCL, CUDA...) was very fast.

But it is fast only under certain conditions (not only on Odroid, but also on desktops and other SBCs).

If anyone has spare time, please read https://wiki.odroid.com/etc/opencv/opencl#cpu_vs_gpu and post your thoughts here.
(The link may change over time.)

blu
Posts: 76
Joined: Wed Mar 08, 2017 11:30 pm
languages_spoken: english
ODROIDs: XU4, N2
Has thanked: 3 times
Been thanked: 20 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by blu » Fri Jul 26, 2019 2:18 pm

tony.hong wrote:
Fri Jul 26, 2019 1:19 pm
Last 1 ~ 2 weeks, I tested OpenCV and OpenCL.

As a result, I agree with easybob95.

Before testing, I thought GPGPU(OpenCL, CUDA...) is very fast.

But it is fast only under certain conditions. (not only Odroid but also desktop or other SBC)

If anyone has spare time, please read https://wiki.odroid.com/etc/opencv/opencl#cpu_vs_gpu and post your thought here.
(The link may change over time.)
GPUs are throughput beasts that come at a latency price (compared to CPUs), and as such they prefer huge amounts of work. Naturally there's a workload threshold above which things make sense to be sent over to the GPU. Conversely, below that things should stay on the CPU (unless you can somehow piggyback that work onto other GPU work). Also, GPUs are sensitive to certain types of control flow, but that's orthogonal to the point.
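The workload threshold can be pictured with a toy cost model. Everything numeric below is hypothetical (a fixed 1 ms launch-and-transfer overhead, 10 ns per item on the CPU, 1 ns per item on the GPU), and the function names are made up; it is a sketch of the reasoning, not a measurement of any board.

```python
def cpu_time(n_items, cpu_per_item):
    """CPU cost: no fixed overhead, just per-item work."""
    return n_items * cpu_per_item


def gpu_time(n_items, gpu_per_item, overhead):
    """GPU cost: fixed launch/transfer overhead plus cheaper per-item work."""
    return overhead + n_items * gpu_per_item


def break_even_items(cpu_per_item, gpu_per_item, overhead):
    """Workload size above which the GPU wins in this simple linear model."""
    return overhead / (cpu_per_item - gpu_per_item)


# Hypothetical figures, chosen only to illustrate the shape of the trade-off.
CPU_PER_ITEM = 10e-9   # seconds per item
GPU_PER_ITEM = 1e-9    # seconds per item
OVERHEAD = 1e-3        # seconds per GPU dispatch

threshold = break_even_items(CPU_PER_ITEM, GPU_PER_ITEM, OVERHEAD)
print(f"GPU pays off above roughly {threshold:,.0f} items")
```

Below the threshold the fixed overhead dominates and the work is better left on the CPU, which matches the "fast only under certain conditions" observation.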

Last but not least, when squeezing performance out of a GPU, one has to constantly keep in mind the 'occupancy factor' -- a function of register-file (RF) utilization by the kernel, vs number of kernel instances that can be kept in-flight at a time. That is maintained via workgroup size tuning.
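A simplified way to see the occupancy trade-off in numbers. The register-file size and the hardware limit below are illustrative values only, not figures for the G52 or for Maxwell, and real GPUs add further limits (local memory, workgroup granularity) that this sketch ignores; the function names are hypothetical.

```python
def resident_work_items(register_file_size, registers_per_item, hardware_limit):
    """Work-items that fit in flight, limited by registers and by the hardware cap."""
    by_registers = register_file_size // registers_per_item
    return min(by_registers, hardware_limit)


def occupancy(register_file_size, registers_per_item, hardware_limit):
    """Fraction of the hardware's maximum in-flight work-items actually resident."""
    resident = resident_work_items(register_file_size, registers_per_item, hardware_limit)
    return resident / hardware_limit


RF_SIZE = 65536    # registers in the register file (illustrative)
HW_LIMIT = 2048    # maximum in-flight work-items (illustrative)

for regs in (32, 64, 128):
    print(f"{regs} registers per work-item -> occupancy {occupancy(RF_SIZE, regs, HW_LIMIT):.2f}")
```

A kernel that uses twice as many registers can keep only half as many instances in flight, which leaves fewer of them available to hide memory latency.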

Now, the Mali line of mobile GPUs is interesting in its own regard, as it
  • does not have its own device memory (same as other mobile and APU GPUs)
  • instead has a very elaborate caching hierarchy
  • changed architecture significantly between Utgard and Midgard (split shaders -> unified shaders; no GPGPU on Utgard) and between Midgard and Bifrost (VLIW threads -> scalar threads; improved ALU efficiency due to TLP being easier to exploit than ILP).
Re your conclusions:
If you need more optimization, use OpenCL directly without OpenCV.

Yep.
These users thanked the author blu for the post:
tony.hong (Sat Jul 27, 2019 1:44 am)

tony.hong
Posts: 39
Joined: Tue Jun 04, 2019 1:49 pm
languages_spoken: korean
ODROIDs: All
Location: korea
Has thanked: 9 times
Been thanked: 4 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by tony.hong » Sat Jul 27, 2019 1:44 am

blu wrote:
Fri Jul 26, 2019 2:18 pm
GPUs are throughput beasts that come at a latency price (compared to CPUs), and as such they prefer huge amounts of work. Naturally there's a workload threshold above which things make sense to be sent over to the GPU. Conversely, below that things should stay on the CPU (unless you can somehow piggyback that work onto other GPU work). Also, GPUs are sensitive to certain types of control flow, but that's orthogonal to the point.

Last but not least, when squeezing performance out of a GPU, one has to constantly keep in mind the 'occupancy factor' -- a function of register-file (RF) utilization by the kernel, vs number of kernel instances that can be kept in-flight at a time. That is maintained via workgroup size tuning.

Now, the Mali line of mobile GPUs is interesting in its own regard, as it
  • does not have its own device memory (same as other mobile and APU GPUs)
  • instead has a very elaborate caching hierarchy
  • changed architecture significantly between Utgard and Midgard (split shaders -> unified shaders; no GPGPU on Utgard) and between Midgard and Bifrost (VLIW threads -> scalar threads; improved ALU efficiency due to TLP being easier to exploit than ILP).
I didn't know about VLIW, scalar, TLP, ILP, etc. I searched for these abbreviations on Google and read some articles for a few hours.
I now understand some parts that I did not understand before or that felt abstract.

Thanx blu ;)

blu
Posts: 76
Joined: Wed Mar 08, 2017 11:30 pm
languages_spoken: english
ODROIDs: XU4, N2
Has thanked: 3 times
Been thanked: 20 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by blu » Sat Jul 27, 2019 4:37 am

tony.hong wrote:
Sat Jul 27, 2019 1:44 am
I didn't know VLIW, scalar, TLP, ILP, etc. I've searched for these abbreviations on google and read some articles for a few hours.
I understood some parts that I did not understand or felt abstract.

Thanx blu ;)
Absolutely np. If you feel confident with c++, here's a very compact hello-world in OCL* I used to examine the nuances of OCL 1.1-1.2 back in the day (when OCL 1.2 was just released). It uses the profiling capabilities of the OCL stack to print out not just the time a kernel appears to take from the POV of your process (which may include actions from the entire command queue that the kernel depends on, directly or indirectly), but the true time the kernel takes to execute on the respective device (via CL_PROFILING_COMMAND_START, CL_PROFILING_COMMAND_END). Currently, there's a basic small dense 4x4 matrix multiplying kernel with various tune-ups one can specify at the CLI. Despite all the CLI tunings, the global worksize is hardcoded in the source -- image_w, image_h in main.cpp (I got lazy). The code can also be used to examine the caps of all OCL platforms in the system -- that comes in handy when studying OCL and the different devices. Last but not least, the code base does not depend on any external code -- you just need to have libOpenCL and clang (changeable to your favorite c++ compiler in the build script).

* Repo uses mercurial, not git.

easybob95
Posts: 61
Joined: Mon Apr 08, 2019 4:02 pm
languages_spoken: english
Has thanked: 1 time
Been thanked: 5 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by easybob95 » Fri Aug 02, 2019 4:23 am

Hello,

I wrote some denoise filters using CUDA (an NLM filter equivalent to cv2.fastNlMeansDenoising, and a KNN filter) on the Jetson Nano.

I ran some speed test comparisons between the Odroid N2 (OpenCV 4.1.0) and the Jetson Nano (OpenCV 4.1.0 and CUDA) with Python (and pycuda).

1544*1040 pixels image :

Odroid N2 :
- OpenCV FastNLMeanDenoise : 0.72 seconds
Jetson Nano :
- OpenCV FastNLMeanDenoise : 0.92 seconds
- fast NLM PyCuda : 0.17 seconds
- KNN PyCuda : 0.068 seconds


3096*2080 pixels image :

Odroid N2 :
- OpenCV FastNLMeanDenoise : 1.92 seconds
Jetson Nano :
- OpenCV FastNLMeanDenoise : 2.6 seconds
- fast NLM PyCuda : 0.67 seconds
- KNN PyCuda : 0.25 seconds

The Odroid N2 CPU is faster than the Jetson Nano CPU, no doubt about that. The Maxwell GPU with CUDA brings real improvements.

Unfortunately, as I don't know much about OpenCL, I can't run tests with the N2 and OCL. One day, maybe.

Alain

blu
Posts: 76
Joined: Wed Mar 08, 2017 11:30 pm
languages_spoken: english
ODROIDs: XU4, N2
Has thanked: 3 times
Been thanked: 20 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by blu » Fri Aug 02, 2019 6:01 am

easybob95 wrote:
Fri Aug 02, 2019 4:23 am
The Odroid N2 CPU is faster than the Jetson Nano CPU, no doubt about that. The Maxwell GPU with CUDA brings real improvements.

Unfortunately, as I don't know much about OpenCL, I can't run tests with the N2 and OCL. One day, maybe.
Nobody is born with knowledge of OCL -- we all have to learn it the good old-fashioned way ; )

Good results with CUDA you get there! And CPU-wise, CA73 is a really nice little performer -- 'little' as in something that was meant to balance power and performance, rather than go outright for performance -- it outperforms clock-for-clock both CA57 and CA72 on quite a few workloads.

easybob95
Posts: 61
Joined: Mon Apr 08, 2019 4:02 pm
languages_spoken: english
Has thanked: 1 time
Been thanked: 5 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by easybob95 » Fri Aug 02, 2019 7:36 pm

I am not sure the A73 outperforms the A57 clock for clock.

The Jetson Nano runs at about 1.5 GHz and the N2 at about 1.9 GHz. If you do the math, the Jetson Nano is about the same as the N2 clock for clock.
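The clock-for-clock comparison can be written out using the OpenCV FastNLMeanDenoise timings from the previous post and the approximate clocks given here (about 1.5 GHz and 1.9 GHz). It is a rough sketch: it treats run time multiplied by clock as "cycles spent" and ignores cache sizes, memory bandwidth and how many cores the routine uses; the names `giga_cycles` and `n2_to_nano_cycle_ratio` are made up.

```python
def giga_cycles(seconds, clock_ghz):
    """Run time multiplied by clock: billions of cycles spent per core."""
    return seconds * clock_ghz


# (seconds, approximate clock in GHz) for the OpenCV denoise test.
RUNS = {
    "1544x1040": {"N2": (0.72, 1.9), "Nano": (0.92, 1.5)},
    "3096x2080": {"N2": (1.92, 1.9), "Nano": (2.6, 1.5)},
}


def n2_to_nano_cycle_ratio(run):
    """Below 1.0 means the N2 needs fewer cycles than the Nano for the same work."""
    return giga_cycles(*run["N2"]) / giga_cycles(*run["Nano"])


for size, run in RUNS.items():
    print(f"{size}: N2/Nano cycle ratio = {n2_to_nano_cycle_ratio(run):.3f}")
```

Both ratios come out slightly below 1.0, so on this particular routine the two CPUs are close clock for clock, with a small edge for the N2.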

Anyway, for my work, a 10 or 30% improvement is not a must-have. Only CUDA (and maybe OpenCL) can bring a significant improvement.

Alain

blu
Posts: 76
Joined: Wed Mar 08, 2017 11:30 pm
languages_spoken: english
ODROIDs: XU4, N2
Has thanked: 3 times
Been thanked: 20 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by blu » Fri Aug 02, 2019 9:15 pm

easybob95 wrote:
Fri Aug 02, 2019 7:36 pm
I am not sure A73 outperforms A57 clock for clock.
I didn't say that based on your numbers -- I've spent some time poking at the two (well, three) uarchs.

Nano's 2x L2 size and 2.5x RAM BW (vs N2) likely play a much larger role for your workloads.

easybob95
Posts: 61
Joined: Mon Apr 08, 2019 4:02 pm
languages_spoken: english
Has thanked: 1 time
Been thanked: 5 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by easybob95 » Sat Aug 03, 2019 12:14 am

blu wrote:
Fri Aug 02, 2019 9:15 pm
easybob95 wrote:
Fri Aug 02, 2019 7:36 pm
I am not sure A73 outperforms A57 clock for clock.
I didn't say that based on your numbers -- I've spent some time poking at the two (well, three) uarchs.

Nano's 2x L2 size and 2.5x RAM BW (vs N2) likely play a much larger role for your workloads.
OK, OK. I misunderstood!

Have a nice day.

Alain

blu
Posts: 76
Joined: Wed Mar 08, 2017 11:30 pm
languages_spoken: english
ODROIDs: XU4, N2
Has thanked: 3 times
Been thanked: 20 times
Contact:

Re: Good hardware but with good software ? Need highlights from Odroid

Unread post by blu » Sat Aug 03, 2019 1:03 am

easybob95 wrote:
Sat Aug 03, 2019 12:14 am
Have a nice day.
You too, have a good weekend! ; )
