USB3 ports in compare to XU4

Digimaster
Posts: 247
Joined: Tue Feb 26, 2013 4:16 pm
languages_spoken: english, russian
ODROIDs: U2, X2, U3, C1, C2, XU4(500+), N2PLUS
Location: Moscow, Russia
Has thanked: 5 times
Been thanked: 2 times

Post by Digimaster »

Hi, I'm just curious: how does the USB 3.0 schematic of the N2plus differ from that of the XU4? It seems to me that the N2plus can capture video 5 to 10 times faster than the XU4.

mctom
Posts: 1959
Joined: Wed Nov 11, 2020 4:44 am
languages_spoken: english, polish
ODROIDs: OGA, XU4, C2, M1
Location: Gdansk, Poland
Has thanked: 224 times
Been thanked: 279 times

Post by mctom »

USB3 is probably not the bottleneck in video capture. I'd rather blame video encoding performance.
These boards have vastly different SoCs, so it's hard to compare implementation details.
Punk ain't no religious cult, punk means thinking for yourself!

Maintainer of PiStackMon


Post by Digimaster »

I used the same USB 3.0 camera for the comparison.
No encoding, just plain YUV buffer capture.
At 1920x1080 the XU4 manages only 18-20 fps, with one big core at 100% load.
The same C++ code on the N2plus captures at up to 60 fps
while using only about 30% of one little core. This is what I mean.
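
To put those frame rates in perspective, here is a rough back-of-the-envelope sketch of the raw data rate they imply. It assumes a packed YUYV 4:2:2 format at 2 bytes per pixel (the post only says "YUV", so the pixel format is a guess) and compares against the USB 3.0 line rate of 5 Gbit/s after 8b/10b encoding, ignoring all protocol overhead. The function names are made up for illustration.

```python
# Rough data-rate estimate for uncompressed video over USB 3.0.
# Assumption: YUYV 4:2:2, i.e. 2 bytes per pixel. The thread does not
# state the actual pixel format.

USB3_LINE_RATE_BITS_PER_S = 5_000_000_000  # USB 3.0 SuperSpeed line rate


def raw_video_rate(width, height, fps, bytes_per_pixel=2):
    """Bytes per second needed for uncompressed frames."""
    return width * height * bytes_per_pixel * fps


def usb3_payload_ceiling():
    """Upper bound in bytes per second after 8b/10b line encoding.

    Protocol overhead is ignored, so real throughput is lower.
    """
    return USB3_LINE_RATE_BITS_PER_S * 8 // 10 // 8


if __name__ == "__main__":
    ceiling = usb3_payload_ceiling()
    for fps in (20, 60):
        rate = raw_video_rate(1920, 1080, fps)
        print(f"{fps} fps: {rate / 1e6:.1f} MB/s "
              f"({100 * rate / ceiling:.0f}% of the 8b/10b ceiling)")
```

Under these assumptions, both the 20 fps and the 60 fps case stay well below the theoretical link ceiling. That does not prove where the bottleneck is, it only shows the link rate by itself does not rule out 60 fps on either board.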


Post by mctom »

So, there you have your answer. Your C++ code is probably single-threaded and only goes as fast as a single big core in the XU4 allows.
Since you say it's the same C++ code, I assume you built it yourself. I hope you didn't apply the same optimizations for both cases!

Keep in mind that the XU4's big core is an A15, a 32-bit legacy thing. The little cores in the N2+ (A53?) scored about half of the XU4's big core performance on my benchmark, but that didn't account for the "bitness" of operations, I think.

Maybe something else is going on, though. Perhaps USB 3.0 consumes a lot of CPU power on the XU4 for some reason, but then again, that would be handled by a separate thread, I imagine.

rooted
Posts: 9635
Joined: Fri Dec 19, 2014 9:12 am
languages_spoken: english
Location: Gulf of Mexico, US
Has thanked: 761 times
Been thanked: 505 times

Post by rooted »

Something I've noticed after using the XU4 for many years: performance is highly dependent on the task and the kernel.

Some tasks perform much better on old kernels and vice versa.


Post by Digimaster »

mctom wrote:
Mon Feb 21, 2022 7:44 am
So, here you have your answer. Your C++ code is probably single threaded and only goes as much as a single big core in XU4 will allow.
If you mention it's the same C++ code, I assume you have built it yourself. I hope you didn't apply the same optimizations for both cases!

Keep in mind XU4's big core is A15, a 32-bit legacy thing. Little cores in N2+ (A53?) scored about half of XU4's big core performance on my benchmark, but that didn't account for "bitness" of operations, I think.

Maybe something else is going on, though. Perhaps USB3.0 consumes a lot of CPU power on XU4 for any reason, but then again, that would be handled by a separate thread, I imagine.
V4L2 driver capture is the only way to do this task. At least the only legitimate way.
And it is inherently single-threaded. You read the driver's buffers one by one from the same select loop. There is no other way to do it. The single-core performance of the XU4 and the N2p does not differ that much, certainly not by a factor of 5-10. So I concluded that it is a limitation of the USB 3.0 chip/bus.


Post by Digimaster »

rooted wrote:
Mon Feb 21, 2022 12:11 pm
Something I've noticed after using the XU4 for many years, performance is highly dependent on task and kernel.

Some tasks perform much better on old kernels and vice versa.
Yeah, this is why I tried kernels 4.x and 5.x on both machines.
The kernel makes no difference to capture performance.
It's a pity. I like the XU4, but now I'm looking to switch my production to the N2p.


Post by mctom »

So, you do see XU4 hog CPU to 100% and you know the application is single threaded, so you decided to blame USB port instead? I see...
Digimaster wrote:
Mon Feb 21, 2022 5:29 pm
The single core performance of XU4 and N2p is not different so much.
According to my benchmark (Dhrystone), the N2+'s big core is faster than the XU4's big core by a factor of 4.6. I wonder what your opinion is based on.

But anyway, to stay productive on the topic: are you able to observe significant CPU utilization on the XU4 when doing a big file transfer from USB storage?
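
One way to check this is to sample the per-CPU counters in /proc/stat before and after the transfer and look at how the elapsed time splits between user, system, irq and softirq. The sketch below only does the arithmetic on two such lines. The helper names and the sample numbers are invented for illustration and are not measurements from either board.

```python
# Compare two samples of a "cpuN ..." line from /proc/stat.
# The first eight counters are: user nice system idle iowait irq softirq steal
# (all in clock ticks since boot).

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Split one /proc/stat cpu line into its label and named counters."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return parts[0], dict(zip(FIELDS, values))


def time_shares(before, after):
    """Percentage of elapsed ticks spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * ticks / total for name, ticks in deltas.items()}


if __name__ == "__main__":
    # Invented sample lines, only to show the calculation.
    _, before = parse_cpu_line("cpu4 100 0 50 800 10 5 35 0")
    _, after = parse_cpu_line("cpu4 150 0 150 900 20 10 170 0")
    for name, share in time_shares(before, after).items():
        print(f"{name:8s} {share:6.2f}%")
```

If a large share lands in system, irq or softirq on one core during the transfer, that would suggest the cost sits in kernel-side USB handling rather than in the user program. A small share there would point elsewhere.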


Post by rooted »

It's good that you have tried old and new, seems it may indeed be time to upgrade.

I still hang on to hope we will see an XU5 someday, until then the N2+ is as good as it gets.


Post by Digimaster »

mctom wrote:
Mon Feb 21, 2022 6:12 pm
So, you do see XU4 hog CPU to 100% and you know the application is single threaded, so you decided to blame USB port instead? I see...
...
According to my benchmark (Dhrystone), N2+'s big core is faster than XU4 big core by a factor of 4.6. I wonder what's your opinion based on.

But anyway, to stay productive on a topic, are you able to observe significant CPU utilization on XU4 when doing a big file transfer from USB storage?
First of all, my capture application always runs on a BIG core of the XU4 because of the lack of performance.
On the N2p, however, it runs on a LITTLE core, because full-HD 60 fps capture uses only about 30% of one little core. So I am comparing a BIG core of the XU4 with a little core of the N2p.
Anyway, I don't think it matters much. Capturing is mostly not integer (Dhrystone) work. It depends more on bandwidth, which is highly architecture-related.
I suppose the N2p has a much more efficient peripheral bridge than the XU4. That was my question.


Post by mctom »

I think comparing two USB roots is futile. Both must conform to the same USB standards, and there's no way one uses more CPU in user applications than the other. A kernel process serving a driver could have more work to do in a separate thread, but not your program.
The reason is simple - your program does not contain any code that deals with hardware directly. Kernel does that, and your code kindly asks Kernel to deliver it. Kernel cannot assign more work to your code than what's compiled into it.
I take this for granted that you do not speak to hardware directly, because you use the same code on two vastly different computers.

Thus, this is not a reasonable explanation why your program hogs CPU. I bet there is something different going on.

When you started this discussion, I had the impression that you'd like to get to the bottom of this and find the actual cause of the problem. If this is still the case, I think running a profiler on both machines could help determine which part of the program it spends the most time in, and whether it's roughly the same on both systems.

If you just want it working with no extra effort then I guess you got your answer already.
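
Short of a full profiler, a first step could be to split the process's CPU time into user time and kernel (system) time around the suspect code. Below is a minimal sketch using getrusage. Reading a temporary file stands in for the capture loop, which is an assumption made purely so the example runs anywhere, and the helper names are hypothetical.

```python
# Split CPU time into user and kernel (system) time around a piece of work.
# Assumption: a plain file read stands in for the real V4L2 capture loop.
import os
import resource
import tempfile


def cpu_split(work):
    """Run work() and return (result, user_seconds, system_seconds)."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    result = work()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (result,
            after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


def read_all(path, chunk_size=1 << 20):
    """Read a file to the end with raw read() calls, return the byte count."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total


def demo(size=4 * 1024 * 1024):
    """Write a scratch file of `size` bytes and measure reading it back."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"\0" * size)
        tmp.flush()
        return cpu_split(lambda: read_all(tmp.name))


if __name__ == "__main__":
    nbytes, user_s, sys_s = demo()
    print(f"read {nbytes} bytes: user {user_s:.4f} s, system {sys_s:.4f} s")
```

In this stand-in the data most likely comes from the page cache, so the numbers say nothing about USB. The point is the method: if the same split around the capture loop shows mostly system time on the XU4, the load is inside the kernel calls rather than in the application code. Note that getrusage only covers this process, not kernel threads or interrupt handling accounted elsewhere.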


Post by Digimaster »

Let me describe the conditions: I used the official C++ V4L2 capture example. Anyone can download it, build it and use it. In my case I only used "capture YUV buffer to memory", without any extra calculations. Moreover, absolutely all capture-related projects use this capture example as a base. The code itself is very thin and efficient and consists of just a few ioctl kernel calls.
When we start capturing, we open the driver descriptor and wait in the kernel call "select" for data to be ready. It's simple and absolutely clear.
During select the CPU does nothing and is free for other tasks. So select is not the source of the load. When the kernel signals the descriptor, the READ code is called, which is also a well-optimised basic C++ library call. The point of high CPU load is the READ function, which is ordinary I/O. The I/O slowness is the result of slow peripheral operations.
OK, the real READ is not as simple as I described (there are at least two ioctl calls: dequeue buffer, then enqueue buffer), but for simplicity that's fine. It doesn't matter how many ioctl calls are inside.
The main thing is that the code is straightforward, which makes it easy to pin down the problem.


Post by mctom »

Thanks for the detailed explanation! :)

Could be that the c++ library that the example code has been compiled against does merciless polling without any sleep() in its loop, that works fine in most cases, but not this one.

You may be actually better off switching to n2+ anyway, as I suspect its availability will surely outlive XU4.


Post by Digimaster »

mctom wrote:
Mon Feb 21, 2022 8:33 pm
Thanks for the detailed explanation! :)

Could be that the c++ library that the example code has been compiled against does merciless polling without any sleep() in its loop, that works fine in most cases, but not this one.

You may be actually better off switching to n2+ anyway, as I suspect its availability will surely outlive XU4.
There is no polling at all. When select is called, it immediately puts the current thread to sleep. This is absolutely basic behaviour for that kernel call.
The thread wakes only when the kernel signals the descriptor, or on a timeout if you set one. And even the actual data copying (dequeue buffer) happens in the kernel. User code only allocates a buffer for the data and hands it to the kernel. So user code cannot contribute any load to capturing.
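
The sleeping behaviour of select is easy to check in isolation. In the small sketch below a pipe stands in for the V4L2 device descriptor (an assumption, no camera is involved) and the helper names are made up. It compares wall-clock time with the CPU time the process actually consumed while waiting.

```python
# Show that a thread blocked in select() consumes almost no CPU time.
# Assumption: a pipe stands in for the capture device descriptor.
import os
import select
import time


def wait_for_data(fd, timeout_s):
    """Block in select() and return (ready, wall_seconds, cpu_seconds)."""
    wall_start = time.monotonic()
    cpu_start = time.process_time()
    readable, _, _ = select.select([fd], [], [], timeout_s)
    wall = time.monotonic() - wall_start
    cpu = time.process_time() - cpu_start
    return bool(readable), wall, cpu


def demo(timeout_s=0.3):
    """Wait once on an empty pipe, then once after data has been written."""
    read_fd, write_fd = os.pipe()
    try:
        idle = wait_for_data(read_fd, timeout_s)   # nothing to read: times out
        os.write(write_fd, b"frame")
        ready = wait_for_data(read_fd, timeout_s)  # data pending: returns at once
        return idle, ready
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    for name, (is_ready, wall, cpu) in zip(("idle", "ready"), demo()):
        print(f"{name}: ready={is_ready} wall={wall:.3f}s cpu={cpu:.3f}s")
```

This only confirms that waiting is free for the user process. It says nothing about the cost of the dequeue and enqueue ioctl calls that follow, which is where the two boards would have to differ.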


Post by mctom »

All that sounds like the user process has almost nothing to do, and yet it does a lot, and apparently even more so in XU4.