Unexpected graphics performance differences in application

wallyz21
Posts: 104
Joined: Thu Apr 04, 2019 11:00 am
languages_spoken: english
ODROIDs: N2
Has thanked: 9 times
Been thanked: 12 times
Contact:

Unexpected graphics performance differences in application

Unread post by wallyz21 » Thu Apr 25, 2019 2:22 pm

I am developing an SDL2-based emulator for the Tandy Color Computer.

I was analyzing a poor frame rate issue when I stumbled across some strange behavior.

Usually when I start the emulator it hovers around 35 fps.

Every now and again, though, it starts up at around 57-60 fps and stays there until I start some other desktop application, which seems to upset the frame rate, and it drops back down to 35 fps.

The only thing I have noticed is the amount of available memory. Strangely the less memory available the more likely it runs at the full 60 fps speed!

The same behaviour occurs regardless of whether I start the emulator under the desktop environment or under the minimal SDL environment!

This kind of makes no sense to me at all!!!

wallyz21
Posts: 104
Joined: Thu Apr 04, 2019 11:00 am
languages_spoken: english
ODROIDs: N2
Has thanked: 9 times
Been thanked: 12 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Thu Apr 25, 2019 10:15 pm

I think I'm getting closer to the issue.

My vertical sync wait routine, which uses usleep to simulate vsync when no real one is available, is randomly sleeping too long.

I have used this routine on many systems instead of the real vsync interrupt.

On the N2 I am getting extremely loose timing compared to other Linux environments.

I am wondering if this has got something to do with process context switching.

Is it possible to assign an application to run on just certain CPUs?

User avatar
mad_ady
Posts: 6791
Joined: Wed Jul 15, 2015 5:00 pm
languages_spoken: english
ODROIDs: XU4, C1+, C2, N1, H2, N2
Location: Bucharest, Romania
Has thanked: 216 times
Been thanked: 166 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by mad_ady » Fri Apr 26, 2019 1:08 am

Yes - via tasksel or taskset - I don't remember...

User avatar
meveric
Posts: 10528
Joined: Mon Feb 25, 2013 2:41 pm
languages_spoken: german, english
ODROIDs: X2, U2, U3, XU-Lite, XU3, XU3-Lite, C1, XU4, C2, C1+, XU4Q, HC1, N1, Go, H2 (N4100), N2, H2 (J4105)
Has thanked: 17 times
Been thanked: 149 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by meveric » Fri Apr 26, 2019 1:08 am

Is this application supposed to use any accelerated graphics? If so, the desktop environment won't work, as the N2 does not have accelerated GPU drivers for the desktop, and you are most likely running everything on the very slow Mesa software OpenGL renderer, which is a major slowdown for the emulator.
Donate to support my work on the ODROID GameStation Turbo Image for U2/U3 XU3/XU4 X2 X C1 as well as many other releases.
Check out the Games and Emulators section to find some of my work or check the files in my repository to find the software i build for ODROIDs.
If you want to add my repository to your image read my HOWTO integrate my repo into your image.

wallyz21
Posts: 104
Joined: Thu Apr 04, 2019 11:00 am
languages_spoken: english
ODROIDs: N2
Has thanked: 9 times
Been thanked: 12 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Fri Apr 26, 2019 1:25 am

mad_ady wrote:
Fri Apr 26, 2019 1:08 am
Yes - via tasksel or taskset - I don't remember...
Thanks, found it.

Tried tying the application to cpus 0,1 and cpus 2,3,4,5 but no real difference.
Worth a shot!

wallyz21
Posts: 104
Joined: Thu Apr 04, 2019 11:00 am
languages_spoken: english
ODROIDs: N2
Has thanked: 9 times
Been thanked: 12 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Fri Apr 26, 2019 1:29 am

meveric wrote:
Fri Apr 26, 2019 1:08 am
Is this application supposed to use any accelerated graphics? If so, the desktop environment won't work, as the N2 does not have accelerated GPU drivers for the desktop, and you are most likely running everything on the very slow Mesa software OpenGL renderer, which is a major slowdown for the emulator.
It doesn't need accelerated graphics. Htop shows at 35 frames per second the average CPU usage is 25%.

I have also tried this on the accelerated mali-fbdev driver using SDL2 but I can't check the CPU usage as all other consoles are locked out once I start the emulator. Not sure why this happens!

But I can ssh in and run htop from another system. Using mali-fbdev, the application shows 14% CPU utilization at 40 fps. So there is plenty of headroom left to reach 60 fps. So why doesn't it?
Last edited by wallyz21 on Fri Apr 26, 2019 11:16 am, edited 1 time in total.

User avatar
meveric
Posts: 10528
Joined: Mon Feb 25, 2013 2:41 pm
languages_spoken: german, english
ODROIDs: X2, U2, U3, XU-Lite, XU3, XU3-Lite, C1, XU4, C2, C1+, XU4Q, HC1, N1, Go, H2 (N4100), N2, H2 (J4105)
Has thanked: 17 times
Been thanked: 149 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by meveric » Fri Apr 26, 2019 1:40 am

Have you tried the code on any other device?

Is the code available for testing?
I can try and compare on different ODROIDs both 32bit and 64bit, using X11 GPU drivers to see if it makes any difference.

wallyz21
Posts: 104
Joined: Thu Apr 04, 2019 11:00 am
languages_spoken: english
ODROIDs: N2
Has thanked: 9 times
Been thanked: 12 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Fri Apr 26, 2019 1:44 am

Some more testing with usleep and nanosleep shows the actual sleep time is very inaccurate and can be up to 10 milliseconds off. When a frame has to complete in 16.7 milliseconds, that is not good enough.

On an i5 (2 cores, 4 threads, 1.7 GHz) the average nanosleep inaccuracy is in the 100s of nsecs and the max was 85000 nsecs. On the N2 the average is 285000 nsecs and the max is 9600000 nsecs. Both systems are running Ubuntu 18.04. The N2's worst case is about two orders of magnitude larger!

So no vsync or accurate timing under X!

ASword
Posts: 194
Joined: Fri Aug 04, 2017 12:48 pm
languages_spoken: english
ODROIDs: XU4, HC1, 2x N2
Has thanked: 5 times
Been thanked: 3 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by ASword » Fri Apr 26, 2019 4:08 am

You should never rely on sleep or usleep for timing purposes, especially for anything needing sub-second precision.

wallyz21
Posts: 104
Joined: Thu Apr 04, 2019 11:00 am
languages_spoken: english
ODROIDs: N2
Has thanked: 9 times
Been thanked: 12 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Fri Apr 26, 2019 10:51 am

ASword wrote:
Fri Apr 26, 2019 4:08 am
You should never rely on sleep or usleep for timing purposes, especially for anything needing sub-second precision.
I agree!

Unfortunately there is not a lot remaining to choose from!!!

wallyz21
Posts: 104
Joined: Thu Apr 04, 2019 11:00 am
languages_spoken: english
ODROIDs: N2
Has thanked: 9 times
Been thanked: 12 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Fri Apr 26, 2019 10:54 am

meveric wrote:
Fri Apr 26, 2019 1:40 am
Have you tried the code on any other device?

Is the code available for testing?
I can try and compare on different ODROIDs both 32bit and 64bit, using X11 GPU drivers to see if it makes any difference.
I have tested this on various Intel platforms, under Ubuntu and Windows.

The code is available but there are some tricky dependencies that require compiling beforehand. And the code has never been tested on 32 bit!

Or I could provide a link to prebuilt 64bit libraries and binaries to try on other 64bit Linux systems!

User avatar
meveric
Posts: 10528
Joined: Mon Feb 25, 2013 2:41 pm
languages_spoken: german, english
ODROIDs: X2, U2, U3, XU-Lite, XU3, XU3-Lite, C1, XU4, C2, C1+, XU4Q, HC1, N1, Go, H2 (N4100), N2, H2 (J4105)
Has thanked: 17 times
Been thanked: 149 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by meveric » Fri Apr 26, 2019 2:23 pm

A prebuilt binary doesn't help to get it running on ODROIDs.
So only the source code can tell.

User avatar
mad_ady
Posts: 6791
Joined: Wed Jul 15, 2015 5:00 pm
languages_spoken: english
ODROIDs: XU4, C1+, C2, N1, H2, N2
Location: Bucharest, Romania
Has thanked: 216 times
Been thanked: 166 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by mad_ady » Fri Apr 26, 2019 2:39 pm

Meveric is not afraid of compiling or tricky dependencies :)

wallyz21
Posts: 104
Joined: Thu Apr 04, 2019 11:00 am
languages_spoken: english
ODROIDs: N2
Has thanked: 9 times
Been thanked: 12 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Fri Apr 26, 2019 3:54 pm

meveric wrote:
Fri Apr 26, 2019 2:23 pm
A prebuilt binary doesn't help to get it running on ODROIDs.
So only the source code can tell.
Here are the links and some instructions.

https://drive.google.com/drive/folders/ ... sp=sharing

In this google share you will find 3 archive (.gz) files:

1. Linux_Arm64_OVCC.tar.gz :

This is a prebuilt snapshot of binaries and libraries on Ubuntu 18.04 Arm64. It contains the ROMs and virtual drive used by the emulator. You shouldn't need it, but it demonstrates the directory structure of the installed app.

2. agar-1.5.0.tar.gz :

This is the major dependency required by the emulator. It is a snapshot of my current agar build on the N2 under Ubuntu.
It too has its dependencies, which you can read about here: (http://libagar.org/docs/inst/linux.html)

# apt-get install libfreetype6-dev # Debian, Linux Mint, Ubuntu
# apt-get install libxinerama-dev # Debian, Linux Mint, Ubuntu
# apt-get install libfontconfig-dev # Debian, Linux Mint, Ubuntu
# apt-get install libsdl2-dev # Debian, Linux Mint, Ubuntu

Be aware this IS NOT the standard AGAR 1.5.0 provided on their web site but has special support for SDL2 (which is a requirement).
Ensure you don't have SDL1 installed (just SDL2) or the build will not work!

To clean the build in my snapshot you will need to remove all .o and .lo files and any built libs.
Then from the top level directory:

# ./configure
and respond to any errors. If you get a successful config (make sure the SDL2 check succeeds) then:

# make depend all
again respond to any errors.

If all goes well :

# sudo make install

That completes all dependencies:

Now build the emulator.

3. OVCC.tar.gz :

Go to top level directory and do:

# make clean
(manually remove any .so libraries as well)
# make
copy all .so libraries into a directory called CoCo/libs (after the make there should be several)
cd into CoCo directory

If you build it this way the included Vcc.ini setting file will not need to be modified!

Run the emulator:

# ./ovcc

The first screen will flash green (MS Basic) and then pre-boot to a blue menu screen. It doesn't matter what you do; the FPS is all that matters.

Now on the bottom of the main window there is a status bar showing the FPS. This should remain a steady 60 FPS. If it flickers between 0 and some other value there is a problem. If it flickers between two close numbers like 58-59 that is ok!

OK assuming you get it to the point where it runs on several platforms and the FPS is 60 on any of them (and it should as I have had this running on low end Intel with no issues) you will probably want to know where in the code the frame rate is calculated and controlled. If you need to know just let me know!

Also I have never built this on 32 bit so if you try that good luck!

Again Good Luck!

User avatar
meveric
Posts: 10528
Joined: Mon Feb 25, 2013 2:41 pm
languages_spoken: german, english
ODROIDs: X2, U2, U3, XU-Lite, XU3, XU3-Lite, C1, XU4, C2, C1+, XU4Q, HC1, N1, Go, H2 (N4100), N2, H2 (J4105)
Has thanked: 17 times
Been thanked: 149 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by meveric » Fri Apr 26, 2019 4:09 pm

I'll try and see what I can find out.
I'm probably going to rebuild everything, as I don't use Ubuntu and don't even want to try whether any of the prebuilt binaries work.
Thanks for the build guide, I'll see how far I get :)

crashoverride
Posts: 4549
Joined: Tue Dec 30, 2014 8:42 pm
languages_spoken: english
ODROIDs: C1
Has thanked: 0
Been thanked: 77 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by crashoverride » Fri Apr 26, 2019 5:22 pm

wallyz21 wrote:
Fri Apr 26, 2019 1:44 am
So no vsync or accurate timing under X!
VSYNC is available using FBIO_WAITFORVSYNC:
https://github.com/hardkernel/linux/blo ... #L845-L852

User avatar
AreaScout
Posts: 1094
Joined: Sun Jul 07, 2013 3:05 am
languages_spoken: german, english
ODROIDs: X2, U3, XU3, C2, HiFi Shield, XU4, XU4Q,
N1, Go, VU5A, Show2, CloudShell2,
H2, N2, VU7A, VuShell
Has thanked: 20 times
Been thanked: 60 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by AreaScout » Fri Apr 26, 2019 7:03 pm

Thank you for the AGAR GUI SDL2 version ! :) Can't wait to test it !

RG

wallyz21
Posts: 104
Joined: Thu Apr 04, 2019 11:00 am
languages_spoken: english
ODROIDs: N2
Has thanked: 9 times
Been thanked: 12 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Sat Apr 27, 2019 12:24 am

AreaScout wrote:
Fri Apr 26, 2019 7:03 pm
Thank you for the AGAR GUI SDL2 version ! :) Can't wait to test it !

RG
You can find AGAR SDL2 prebuilt libraries for Linux X86 64, Linux Arm64, Windows 64, OSX here:

https://drive.google.com/drive/folders/ ... p=sharing

Look in the appropriately named sub directory.

I can also provide instructions on building it from scratch if required!

wallyz21
Posts: 104
Joined: Thu Apr 04, 2019 11:00 am
languages_spoken: english
ODROIDs: N2
Has thanked: 9 times
Been thanked: 12 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Sat Apr 27, 2019 1:05 am

crashoverride wrote:
Fri Apr 26, 2019 5:22 pm
wallyz21 wrote:
Fri Apr 26, 2019 1:44 am
So no vsync or accurate timing under X!
VSYNC is available using FBIO_WAITFORVSYNC:
https://github.com/hardkernel/linux/blo ... #L845-L852
Thanks.

That almost works! But it suffers from the same pre-emption problems as nanosleep.

I ran the vsync ioctl in a loop 600 times (10 seconds) and output the time for each frame. Here is a sample of the output:

...(0.017)(0.017)(0.017)(0.017)(0.017)(0.026)(0.008)(0.017)(0.017)(0.017)...

As you would expect, every frame is 0.0167 (approx 0.017) seconds. But if you look closely, even in this small sample you will see two odd times, 0.026 and 0.008. These two add up to 0.034, or two times 0.017, which is correct overall but incorrect individually.


 {
    int loop;
    float period;

    for (loop = 0 ; loop < 600 ; loop++)
    {
        period = timems();
        if (ioctl(fb, FBIO_WAITFORVSYNC, &zero) == -1)
        {
            printf("fb ioctl failed: %s\n", strerror(errno));
        }
        period = timems();
        printf("(%2.3f)", period);
    }
  }
As you can see in the loop there isn't much code but the application can be pre-empted at any time and it appears it is getting pre-empted during the WAITFORVSYNC ioctl call. So you can never be sure when the OS will return. It is most unusual for a process to be pre-empted when it has already suspended itself voluntarily. This should not occur in the nanosleep or any other wait call, but it is!

I have never experienced these delays in any other Linux distro!

These 9ms delays are occurring very often and they result in about a 40% loss of frames which is coincidentally a similar amount that Chromium and Vivaldi are dropping frames in youtube videos regardless of resolution.

If I can't rely on either the vsync or a high resolution timer then there is nothing left!!!

Full code follows:


#include <sys/ioctl.h>
#include <sys/time.h>
#include <fcntl.h>
#include <linux/types.h>
#include <linux/ioctl.h>
#include <linux/fb.h>
#include <errno.h>
#include <string.h>
#include <stdio.h>
#include <assert.h>

float timems();

int main(int argc, char *argv[])
{
  int fb = open("/dev/fb0", O_RDWR);
  assert(fb != -1);
  int zero = 0;
  if (ioctl(fb, FBIO_WAITFORVSYNC, &zero) == -1)
  {
    printf("fb ioctl failed: %s\n", strerror(errno));
  }
  {
    int loop;
    float period;

    for (loop = 0 ; loop < 600 ; loop++)
    {
        period = timems();
        if (ioctl(fb, FBIO_WAITFORVSYNC, &zero) == -1)
        {
            printf("fb ioctl failed: %s\n", strerror(errno));
        }
        period = timems();
        printf("(%2.3f)", period);
    }
  }
  return 0;
}

float timems()
{
	static struct timeval tval_before, tval_after, tval_result;
	static int firsttime = 1;
	float secs, fsecs;

	if (firsttime)
	{
		gettimeofday(&tval_before, NULL);
		firsttime = 0;
		return 0.0;
	}

	gettimeofday(&tval_after, NULL);
	timersub(&tval_after, &tval_before, &tval_result);
	memcpy(&tval_before, &tval_after, sizeof(tval_after));

	secs = tval_result.tv_sec;
	fsecs = (float)tval_result.tv_usec / 1000000.0;

	return secs + fsecs;
}

User avatar
memeka
Posts: 4369
Joined: Mon May 20, 2013 10:22 am
languages_spoken: english
ODROIDs: XU rev2 + eMMC + UART
U3 + eMMC + IO Shield + UART
Has thanked: 1 time
Been thanked: 39 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by memeka » Sat Apr 27, 2019 1:51 am

Can you try with sdl2 on gbm or sdl2 on wayland?

crashoverride
Posts: 4549
Joined: Tue Dec 30, 2014 8:42 pm
languages_spoken: english
ODROIDs: C1
Has thanked: 0
Been thanked: 77 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by crashoverride » Sat Apr 27, 2019 1:55 am

wallyz21 wrote:
Sat Apr 27, 2019 1:05 am
That almost works! But it suffers from the same pre-emption problems as nanosleep.
Try setting your system governor to 'performance' for all cores. If code spends most of its time 'sleeping', the [interactive] governor will clock down the system.

wallyz21
Posts: 104
Joined: Thu Apr 04, 2019 11:00 am
languages_spoken: english
ODROIDs: N2
Has thanked: 9 times
Been thanked: 12 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Sat Apr 27, 2019 2:46 am

crashoverride wrote:
Sat Apr 27, 2019 1:55 am
wallyz21 wrote:
Sat Apr 27, 2019 1:05 am
That almost works! But it suffers from the same pre-emption problems as nanosleep.
Try setting your system governor to 'performance' for all cores. If code spends most of its time 'sleeping', the [interactive] governor will clock down the system.
Thanks. Learn something new everyday!

Actually made a difference. My emulator went from 35fps to 40fps. So getting closer!

The vsync tester only dropped one frame in 3600 so that is very good.
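For anyone wanting to check or set the governor from code, here is a hedged sketch built on the standard Linux cpufreq sysfs files. The helper names are hypothetical, and it assumes the N2 kernel exposes `scaling_governor` per CPU at the usual path, which this thread does not confirm. Writing normally requires root.

```c
#include <stdio.h>
#include <string.h>

/* Build the cpufreq sysfs path for one CPU. Returns 0 on success, -1 if
 * the buffer is too small. */
int governor_path(int cpu, char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor",
                     cpu);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Read the active governor name for `cpu` into `out` (newline stripped).
 * Returns 0 on success, -1 if the file is missing or unreadable. */
int read_governor(int cpu, char *out, size_t len)
{
    char path[128];

    if (len == 0 || governor_path(cpu, path, sizeof path) != 0)
        return -1;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(out, (int)len, f);
    fclose(f);
    if (!ok)
        return -1;
    out[strcspn(out, "\n")] = '\0';
    return 0;
}

/* Write a governor name such as "performance"; normally requires root. */
int write_governor(int cpu, const char *name)
{
    char path[128];

    if (governor_path(cpu, path, sizeof path) != 0)
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int rc = (fputs(name, f) >= 0) ? 0 : -1;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

Looping `write_governor(cpu, "performance")` over CPUs 0 to 5 would cover all six N2 cores mentioned earlier in the thread.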

wallyz21
Posts: 104
Joined: Thu Apr 04, 2019 11:00 am
languages_spoken: english
ODROIDs: N2
Has thanked: 9 times
Been thanked: 12 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Sat Apr 27, 2019 10:57 am

I just checked the cpu governor settings on some Intel Ubuntu systems and they are set to powersave.

So I tried powersave mode on the N2 and that stopped the system almost completely!!! I was unable to type in the terminal window anymore or do anything else. I was forced to power cycle the N2.

So now I think we are getting closer to the real problem, and that is the way in which task pre-empting has been implemented in the kernel!

There seems to be a large difference in behavior for the cpu governor between Intel and Arm!

Why?

I don't believe that powersave mode should completely halt all execution!

(Apparently, according to the Arch Linux Wiki, Intel does not use the same mechanism for CPU governance and uses pstate instead, which is built into the kernel as opposed to user space!)

So me looking at the scaling_governor on Intel probably means nothing at all as Intel and Arm just do things differently!

crashoverride
Posts: 4549
Joined: Tue Dec 30, 2014 8:42 pm
languages_spoken: english
ODROIDs: C1
Has thanked: 0
Been thanked: 77 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by crashoverride » Sat Apr 27, 2019 6:16 pm

wallyz21 wrote:
Sat Apr 27, 2019 10:57 am
So now I think we are getting closer to the real problem, and that is the way in which task pre-empting has been implemented in the kernel!
I have not seen evidence of this in any of the tests I have performed. If the premise is well founded, we should have seen evidence of it in the numerous benchmarks that have been done on N2. I suggest creating a minimal test case that demonstrates the theory.
wallyz21 wrote:
Sat Apr 27, 2019 10:57 am
There seems to be a large difference in behavior for the cpu governor between Intel and Arm!
Intel does not (at the time of this writing) have any non-symmetric processor designs. The S922 design is based on ARM big.LITTLE with each complex having potential differences (e.g. cache size). The intended scheduler for this design (Energy Aware Scheduler) differs and, therefore, so will behavior.
wallyz21 wrote:
Sat Apr 27, 2019 10:57 am
I don't believe that powersave mode should completely halt all execution!
This sounds like it's a bug and should be reported.

ASword
Posts: 194
Joined: Fri Aug 04, 2017 12:48 pm
languages_spoken: english
ODROIDs: XU4, HC1, 2x N2
Has thanked: 5 times
Been thanked: 3 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by ASword » Sun Apr 28, 2019 12:18 am

wallyz21 wrote:
Fri Apr 26, 2019 10:51 am
ASword wrote:
Fri Apr 26, 2019 4:08 am
You should never rely on sleep or usleep for timing purposes, especially for anything needing sub-second precision.
I agree!

Unfortunately there is not a lot remaining to choose from!!!
Have you tried these options: http://www.2net.co.uk/tutorial/periodic_threads

ASword
Posts: 194
Joined: Fri Aug 04, 2017 12:48 pm
languages_spoken: english
ODROIDs: XU4, HC1, 2x N2
Has thanked: 5 times
Been thanked: 3 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by ASword » Sun Apr 28, 2019 12:21 am

crashoverride wrote:
Sat Apr 27, 2019 6:16 pm
wallyz21 wrote:
Sat Apr 27, 2019 10:57 am
There seems to be a large difference in behavior for the cpu governor between Intel and Arm!
Intel does not (at the time of this writing) have any non-symmetric processor designs. The S922 design is based on ARM big.LITTLE with each complex having potential differences (e.g. cache size). The intended scheduler for this design (Energy Aware Scheduler) differs and, therefore, so will behavior.
Intel can exhibit similar issues as well, depending on circumstances. This sort of behaviour can manifest in thermally throttled conditions, or depending on workloads elsewhere on the machine. Different Intel processors can have quite different characteristics (e.g. I spent a lot of time battling the Xeon Phi with heaps of wimpy 4-way threaded cores running extremely performance sensitive SIMD code).

wallyz21
Posts: 104
Joined: Thu Apr 04, 2019 11:00 am
languages_spoken: english
ODROIDs: N2
Has thanked: 9 times
Been thanked: 12 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Sun Apr 28, 2019 4:27 pm

crashoverride wrote:
Sat Apr 27, 2019 6:16 pm
wallyz21 wrote:
Sat Apr 27, 2019 10:57 am
So now I think we are getting closer to the real problem, and that is the way in which task pre-empting has been implemented in the kernel!
I have not seen evidence of this in any of the tests I have performed. If the premise is well founded, we should have seen evidence of it in the numerous benchmarks that have been done on N2. I suggest creating a minimal test case that demonstrates the theory.
I think you are correct and I don't believe I can draw any conclusions here. All I can say is Intel and Arm do it differently.

As I edited my own post with :
(Apparently, according to the Arch Linux Wiki, Intel does not use the same mechanism for CPU governance and uses pstate instead, which is built into the kernel as opposed to user space!)

So me looking at the scaling_governor on Intel probably means nothing at all as Intel and Arm just do things differently!

wallyz21
Posts: 104
Joined: Thu Apr 04, 2019 11:00 am
languages_spoken: english
ODROIDs: N2
Has thanked: 9 times
Been thanked: 12 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Sun Apr 28, 2019 4:33 pm

ASword wrote:
Sun Apr 28, 2019 12:18 am
wallyz21 wrote:
Fri Apr 26, 2019 10:51 am
ASword wrote:
Fri Apr 26, 2019 4:08 am
You should never rely on sleep or usleep for timing purposes, especially for anything needing sub-second precision.
I agree!

Unfortunately there is not a lot remaining to choose from!!!
Have you tried these options: http://www.2net.co.uk/tutorial/periodic_threads
Thanks, I am having a look at this now. Will let you know in the next few days.

wallyz21
Posts: 104
Joined: Thu Apr 04, 2019 11:00 am
languages_spoken: english
ODROIDs: N2
Has thanked: 9 times
Been thanked: 12 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Sun Apr 28, 2019 4:49 pm

Update.

I have managed to get the emulator running at 60fps (solid). This was with the CPU governor set to interactive (the default).

In order to do this I had to remove all wait on timer (nanosleep, SDL_Delay, usleep) logic and now I just do a raw poll of the high resolution timer until the target time elapses. The down side to this is the CPU that runs the poll runs at 100% all the time. So the emulator runs better when hogging the CPU rather than relinquishing it!

I was able to get a timer to work after a fashion, but only with a working resolution of 12 milliseconds, which made it more or less irrelevant. Since I'm working with 16.7 millisecond frames I may as well poll till the end of frame.

However this does prove the emulator can run at 60fps without any problem and that the problem is not in the X11 or SDL2.

I am looking at ASword's timerfd suggestion at the moment and will report back.

wallyz21
Posts: 104
Joined: Thu Apr 04, 2019 11:00 am
languages_spoken: english
ODROIDs: N2
Has thanked: 9 times
Been thanked: 12 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Sun Apr 28, 2019 10:32 pm

Update : timerfd API tested.

The timerfd API also suffers from the same timing inaccuracies and 40% of the time returns a missed indicator!

ret = read (info->timer_fd, &missed, sizeof (missed)); // this is the wait for timer to expire.

If for some reason a timer expiry is missed, the number of times missed is passed in 'missed'. 40% of the time this contains a value of 1.

It is no more or less accurate than the other timer-based wait/delay methods I tried (SDL_Delay, nanosleep, usleep).

I will state this again: all the above methods were tested on Ubuntu Linux 18.04 on both x86-64 and Arm64 (unfortunately I don't have another Arm-based system I can test this on).

They all work on Intel as expected, and none have worked on the N2 under Ubuntu. I will see if I can borrow a Raspberry Pi and compile the code!

The timerfd was a good choice to test the issue because the consistent missed value now indicates something needs further consideration.

Here is the code snippet from my FrameWait routine:


	if (CurrentTime > TargetTime)
	{
		return;
	}

	// Try timerfd API

	itval.it_interval.tv_sec = 0;
	itval.it_interval.tv_nsec = 0;
	itval.it_value.tv_sec = 0;
	itval.it_value.tv_nsec = TargetTime - CurrentTime;
	ret = timerfd_settime (fdtimer, 0, &itval, NULL);

	unsigned long long missed;

	/* Wait for the next timer event. If we have missed any the
	   number is written to "missed" */
	ret = read (fdtimer, &missed, sizeof (missed));

	if (missed)
	{
		fprintf(stderr, "!");
	}


hsu95066
Posts: 51
Joined: Wed Aug 10, 2016 5:01 pm
languages_spoken: Chinese, English
ODROIDs: C1, C1+, C2, XU4
Has thanked: 3 times
Been thanked: 6 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by hsu95066 » Mon Apr 29, 2019 5:26 pm

Hi,

I modified the test code provided by wallyz21 and changed the time resolution to microseconds.
The code is attached here:


#include <sys/ioctl.h>
#include <sys/time.h>
#include <fcntl.h>
#include <linux/types.h>
#include <linux/ioctl.h>
#include <linux/fb.h>
#include <errno.h>
#include <string.h>
#include <stdio.h>
#include <assert.h>

// gcc -o vsync_02 vsync_02.c

// Current wall-clock time in microseconds since the epoch.
long long current_timestamp()
{
	struct timeval te;

	gettimeofday(&te, NULL); // get current time
	long long us = te.tv_sec*1000000LL + te.tv_usec; // convert to microseconds
	return us;
}


int main(int argc, char *argv[])
{
	int fb = open("/dev/fb0", O_RDWR);
	assert(fb != -1);
	int zero = 0;
	if (ioctl(fb, FBIO_WAITFORVSYNC, &zero) == -1)
	{
	    printf("fb ioctl failed: %s\n", strerror(errno));
	}
	{
		int loop;
		long long timestamp[1000];

		for (loop = 0; loop <= 600 ; loop++)
		{
		    if (ioctl(fb, FBIO_WAITFORVSYNC, &zero) == -1)
		    {
		        printf("fb ioctl failed: %s\n", strerror(errno));
		    }
		    timestamp[loop] = current_timestamp();
		}

		// print loop
		printf("(%18lld)\n", timestamp[0]);
		for (loop = 1; loop <= 600 ; loop++)
		{
			// the difference of two long long values needs %lld, not %ld
			printf("(%18lld) - %lld us\n", timestamp[loop], timestamp[loop] - timestamp[loop-1]);
		}
		printf("(%18lld)~(%18lld) - %lld us\n", timestamp[0], timestamp[600], timestamp[600] - timestamp[0]);
	} 

	return 0;
}
First, I ran the test code five times on ubuntu-18.04.2-4.9-minimal-odroid-n2-20190329.img. Some of the resulting output is shown below:
( 1555072712671181)
( 1555072712688101) - 16920 us
( 1555072712705037) - 16936 us
( 1555072712721999) - 16962 us
( 1555072712738999) - 17000 us
( 1555072712755975) - 16976 us
:
:
( 1555072722822682) - 16981 us
( 1555072722839675) - 16993 us
( 1555072722856752) - 17077 us
( 1555072712671181)~( 1555072722856752) - 10185571 us

The last row shows the total time for 600 ioctl calls of FBIO_WAITFORVSYNC.

Then I ran the same code on the tty1 terminal of the Ubuntu 16.04 MATE desktop image for the C2. The results are below:
( 1455208368326936)
( 1455208368343556) - 16620 us
( 1455208368360213) - 16657 us
( 1455208368376860) - 16647 us
( 1455208368393513) - 16653 us
( 1455208368410202) - 16689 us
:
:
( 1455208378293808) - 16667 us
( 1455208378310476) - 16668 us
( 1455208378327142) - 16666 us
( 1455208368326936)~( 1455208378327142) - 10000206 us

On the C2, the vsync intervals are almost all between 16.6 and 16.7 ms; only two of the 3000 calls (5 tests) took longer than 17.0 ms.

A graph of the time intervals on the N2 and C2 is shown below:
V001.jpg

Strangely, 600 calls take about 10.185 sec in total on the N2, but about 10.000 sec on the C2.
Something may be wrong with the N2's clock.
Last edited by hsu95066 on Sun May 05, 2019 2:34 pm, edited 4 times in total.

odroid
Site Admin
Posts: 32519
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English
ODROIDs: ODROID
Has thanked: 184 times
Been thanked: 349 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by odroid » Mon Apr 29, 2019 6:20 pm

@hsu95066,
Thank you for the test.
Did you measure the vsync intervals on Kernel 4.9.170?

wallyz21

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Tue Apr 30, 2019 11:41 am

@hsu95066

Nice job. The other thing that strikes me is each machine's deviation from its average. The C2 produces a very flat, consistent line with almost no spikes. The N2 is all over the place! It's like a drunk who can't walk in a straight line.
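To put a number on that deviation instead of judging it from the graph, something like the following could be run over the captured timestamps. This is only a sketch: the struct and function names are made up for illustration, and it reports the largest distance from the mean rather than a standard deviation so that it does not need libm.

```c
#include <stddef.h>

/* Summary of the gaps between consecutive timestamps, all in microseconds. */
struct interval_stats {
    double mean_us;        /* average gap */
    double max_abs_dev_us; /* largest distance of any single gap from the mean */
    long long min_us;      /* shortest gap */
    long long max_us;      /* longest gap */
};

/* ts holds n timestamps in microseconds, in the order they were captured.
 * Returns 0 on success, -1 if the arguments are unusable (n must be >= 2). */
int compute_interval_stats(const long long *ts, size_t n, struct interval_stats *out)
{
    if (ts == NULL || out == NULL || n < 2)
        return -1;

    size_t gaps = n - 1;
    double mean = (double)(ts[n - 1] - ts[0]) / (double)gaps;
    long long min = ts[1] - ts[0];
    long long max = min;
    double worst = 0.0;

    for (size_t i = 1; i < n; i++) {
        long long gap = ts[i] - ts[i - 1];
        double dev = (double)gap - mean;
        if (dev < 0.0)
            dev = -dev;
        if (dev > worst)
            worst = dev;
        if (gap < min)
            min = gap;
        if (gap > max)
            max = gap;
    }

    out->mean_us = mean;
    out->max_abs_dev_us = worst;
    out->min_us = min;
    out->max_us = max;
    return 0;
}
```

Feeding it the timestamp[] array from the test program (601 entries for 600 waits) would give one summary per run, which makes the C2 and N2 runs easier to compare side by side.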

Can you do me a favor and post your uname -a output for both the C2 and the N2, please?

hsu95066
Posts: 51
Joined: Wed Aug 10, 2016 5:01 pm
languages_spoken: Chinese, English
ODROIDs: C1, C1+, C2, XU4
Has thanked: 3 times
Been thanked: 6 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by hsu95066 » Tue Apr 30, 2019 12:15 pm

@odroid

I upgraded the N2 to kernel 4.9.170 today, but the vsync intervals are no different from those on kernel 4.9.162.
Some results are below:

( 1556588434723585)
( 1556588434740480) - 16895 us
( 1556588434757457) - 16977 us
( 1556588434774422) - 16965 us
( 1556588434791377) - 16955 us
( 1556588434808374) - 16997 us
( 1556588434825334) - 16960 us
( 1556588434842320) - 16986 us
:
:
( 1556588444790754) - 16704 us
( 1556588444807676) - 16922 us
( 1556588444824661) - 16985 us
( 1556588444842033) - 17372 us
( 1556588444858617) - 16584 us
( 1556588444875587) - 16970 us
( 1556588444892576) - 16989 us
( 1556588444909591) - 17015 us
( 1556588434723585)~( 1556588444909591) - 10186006 us

I also ran another test.
Timed against a real-time clock, 60000 calls take 16 min 59 sec in total, which matches the 10.19 sec average per 600 calls.
So the time returned by gettimeofday() is not the problem.
The vsync interval on the N2 really is longer than the theoretical 16.7 ms.

wallyz21

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Tue Apr 30, 2019 12:49 pm

Can you please provide the output of "uname -a" for both systems?

hsu95066

Re: Unexpected graphics performance differences in application

Unread post by hsu95066 » Tue Apr 30, 2019 1:34 pm

@ wallyz21

On N2, ubuntu-18.04.2-4.9-minimal-odroid-n2-20190329.img
Linux odroid 4.9.170-27 #1 SMP PREEMPT Mon Apr 29 12:45:24 -03 2019 aarch64 aarch64 aarch64 GNU/Linux

On C2,
odroid@odroid64:~/Documents$ uname -a
Linux odroid64 3.14.79-117 #1 SMP PREEMPT Tue Jan 2 23:46:30 BRST 2018 aarch64 aarch64 aarch64 GNU/Linux

odroid@odroid64:~/Documents$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.6 LTS
Release: 16.04
Codename: xenial
:)

wallyz21

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Tue Apr 30, 2019 2:45 pm

Thanks. I just wanted to make sure both were built with the same kernel preemption option. It appears they are!

I had never noticed the PREEMPT option on Ubuntu before. It certainly isn't set on the mainline Intel Ubuntu ISO downloads, so I was a little suspicious.

However, if the C2 has the same option, there is little point in building a kernel without it to see whether it is causing the problem.
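For anyone who wants to make the same check from code rather than by eye, here is a small sketch that looks for the PREEMPT tag in the kernel version string (the field that uname -a prints after the release). The function names are invented for this example. Note that it only does a substring match, so it cannot tell plain PREEMPT apart from a PREEMPT_RT kernel.

```c
#include <string.h>
#include <sys/utsname.h>

/* Returns 1 if a kernel version string such as
 * "#1 SMP PREEMPT Mon Apr 29 12:45:24 -03 2019" carries the PREEMPT tag,
 * 0 otherwise (including when version is NULL). */
int version_has_preempt(const char *version)
{
    return version != NULL && strstr(version, "PREEMPT") != NULL;
}

/* Same check for the running kernel. Returns 1 or 0, or -1 if uname() fails. */
int running_kernel_has_preempt(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return version_has_preempt(u.version);
}
```

Both version strings posted above (4.9.170-27 on the N2 and 3.14.79-117 on the C2) contain the tag, so version_has_preempt() would return 1 for each.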

wallyz21

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Tue Apr 30, 2019 2:58 pm

OK, so what we have determined so far is:

- The C2 has a very consistent 16.7 ms vsync with a small deviation.
- The N2 has an inconsistent 17.0 ms vsync with a larger deviation.

Both kernels are built with PREEMPT (kernel preemption; this is not the same as the PREEMPT_RT real-time patch set).

The 17ms vsync explains why PPSSPPSDL on the N2 gets a steady 58 fps instead of 60 fps!
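As a quick sanity check on that figure, the refresh rate implied by the measured totals can be worked out directly. A minimal sketch (the two helper names are made up for this post):

```c
/* Mean interval in microseconds, given the first and last timestamps (in
 * microseconds) and the number of vsync waits that happened between them. */
double mean_interval_us(long long first_us, long long last_us, long intervals)
{
    return (double)(last_us - first_us) / (double)intervals;
}

/* Refresh rate in Hz implied by a mean interval given in microseconds. */
double refresh_hz(double interval_us)
{
    return 1000000.0 / interval_us;
}
```

With hsu95066's totals (10185571 us on the N2 and 10000206 us on the C2, 600 intervals each) this works out to roughly 58.9 Hz on the N2 and 60.0 Hz on the C2.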

wallyz21

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Tue Apr 30, 2019 8:57 pm

@hsu95066

I have modified your code so the timer mechanism can be selected from the command line:

Timer Test:

tt {timerfb/nanosleep/usleep/poll/vsync} #loopcnt

Example:

./tt vsync 600

This runs the vsync wait 600 times. All output is a single column suitable for CSV, so:

./tt nanosleep 600 >nanosleep.csv

will create a CSV file.

Here is the full code:

Code: Select all

#include <sys/ioctl.h>
#include <sys/timerfd.h>
#include <sys/time.h>
#include <fcntl.h>
#include <linux/types.h>
#include <linux/ioctl.h>
#include <linux/fb.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>   // malloc, atoi
#include <ctype.h>    // tolower
#include <time.h>     // clock_gettime, clock_getres, nanosleep
#include <assert.h>
#include <sched.h>

// gcc -o tt tt.c

#define TARGETFRAMERATE 60
#define TYPE_TIMERFD 1
#define TYPE_NANOSLEEP 2
#define TYPE_USLEEP 3
#define TYPE_POLL 4
#define TYPE_VSYNC 5

static unsigned long long StartTime, EndTime, OneFrame, CurrentTime, SleepRes, TargetTime, OneMs;
static struct timespec MonoClockResolution;
int timerfd, fb;

char *strlwr(char *str)
{
  unsigned char *p = (unsigned char *)str;

  while (*p) {
     *p = tolower((unsigned char)*p);
      p++;
  }

  return str;
}

long long current_timestamp()
{
	struct timeval te;

	gettimeofday(&te, NULL); // get current time
	long long      us = te.tv_sec*1000000LL + te.tv_usec; // convert to microseconds
	// printf("microseconds: %lld\n", us);
	return us;
}

long long GetPerformanceCounter()
{
    int callStat;

    callStat = clock_gettime(CLOCK_MONOTONIC_RAW, &MonoClockResolution);

    return ((long long)MonoClockResolution.tv_sec*1000000000) + MonoClockResolution.tv_nsec;
}

void CalibrateThrottle(int type)
{
    int callStat;
    struct sched_param schedparm;

    callStat = clock_getres(CLOCK_MONOTONIC_RAW, &MonoClockResolution);

    if (callStat)
    {
        fprintf(stderr, "Can't get CLOCK resolution %d %s\n", callStat, strerror(errno));
    }

	OneFrame = MonoClockResolution.tv_nsec * 1000000000 / (TARGETFRAMERATE);
	OneMs = MonoClockResolution.tv_nsec * 1000000;

    switch(type)
    {
        case TYPE_TIMERFD:
            // Setup the timerfd and set scheduling in case real time kernel extensions are active
            timerfd = timerfd_create(CLOCK_MONOTONIC, 0);
            assert(timerfd != -1);
            memset(&schedparm, 0, sizeof(schedparm));
            schedparm.sched_priority = 1; // lowest rt priority
            // pid 0 means the calling process; this fails without root or CAP_SYS_NICE
            if (sched_setscheduler(0, SCHED_FIFO, &schedparm) == -1)
            {
                fprintf(stderr, "sched_setscheduler failed: %s\n", strerror(errno));
            }
            break;
        
        case TYPE_VSYNC:
            fb = open("/dev/fb0", O_RDWR);
            assert(fb != -1);
            int zero = 0;

           if (ioctl(fb, FBIO_WAITFORVSYNC, &zero) == -1)
            {
                fprintf(stderr, "fb ioctl failed: %s\n", strerror(errno));
                assert(0); // abort: vsync wait is not available (assert(1) never fires)
            }
            break;

        default:
        break;
    }
}

void FrameWait(int type)
{
    CurrentTime = GetPerformanceCounter();
    TargetTime = CurrentTime + OneFrame;

    switch(type)
    {
        case TYPE_TIMERFD:
            // Use timerfd API to delay (this is slower than using nanosleep)
            {
                struct itimerspec waitspec;
                unsigned long expiries;

                waitspec.it_interval.tv_sec = 0;
                waitspec.it_interval.tv_nsec = 0;
                waitspec.it_value.tv_sec = 0;
                waitspec.it_value.tv_nsec = TargetTime - CurrentTime;
                timerfd_settime(timerfd, 0, &waitspec, 0);
                read(timerfd, &expiries, sizeof(expiries));
            }
            break;

        case TYPE_NANOSLEEP:
            // Use nanosleep to delay
            {
                struct timespec duration, dummy;

                duration.tv_sec = 0;
                duration.tv_nsec = TargetTime - CurrentTime;
                nanosleep(&duration, &dummy);
            }
            break ;

        case TYPE_USLEEP:
            // Use usleep to the nearest millisecond then poll the rest
            {
                unsigned long Tt_minus_1ms = TargetTime - OneMs;
                int msDelays = (Tt_minus_1ms - CurrentTime) / OneMs;

                if (msDelays > 1)
                {
                    usleep(msDelays*1000);
                }

                while (CurrentTime < TargetTime)	//Poll Until frame end.
                {
                    CurrentTime = GetPerformanceCounter();
                }
            }
            break;


        case TYPE_POLL:
            // Use busy CPU with high resolution counter to delay

            while (CurrentTime < TargetTime)	//Poll Until frame end.
            {
                CurrentTime = GetPerformanceCounter();
            }
            break;

        case TYPE_VSYNC:
            {
                int zero=0;
                if (ioctl(fb, FBIO_WAITFORVSYNC, &zero) == -1)
                {
                    fprintf(stderr, "fb ioctl failed: %s\n", strerror(errno));
                }
            }
            break;

        default:
        break;
    }

	return;
}

int main(int argc, char *argv[])
{
    int type;
    long loopcnt=0;

    if (argc != 3)
    {
        fprintf(stderr, "usage: tt {timerfb/nanosleep/usleep/poll/vsync} #loopcnt\n");
        return 0;
    }

    if (strcmp(strlwr(argv[1]), "timerfb") == 0)
    {
        type = TYPE_TIMERFD;
    }
    else if (strcmp(strlwr(argv[1]), "nanosleep") == 0)
    {
        type = TYPE_NANOSLEEP;
    }
    else if (strcmp(strlwr(argv[1]), "usleep") == 0)
    {
        type = TYPE_USLEEP;
    }
    else if (strcmp(strlwr(argv[1]), "poll") == 0)
    {
        type = TYPE_POLL;
    }
    else if (strcmp(strlwr(argv[1]), "vsync") == 0)
    {
        type = TYPE_VSYNC;
    }
    else
    {
        fprintf(stderr, "usage: tt {timerfb/nanosleep/usleep/poll/vsync} #loopcnt\n");
        return 0;
    }

    loopcnt = atoi(argv[2]);
    if (!loopcnt)
    {
        fprintf(stderr, "usage: tt {timerfb/nanosleep/usleep/poll/vsync} #loopcnt\n");
        return 0;
    }
    
    CalibrateThrottle(type);

	{
		int loop;
		long long *timestamp = malloc(loopcnt * sizeof(long long));

		for (loop = 0; loop < loopcnt ; loop++)
		{
           	    FrameWait(type);
		    timestamp[loop] = current_timestamp();
		}

		// print loop
		printf("(%18lld)\n", timestamp[0]);
		for (loop = 1; loop < loopcnt ; loop++)
		{
			printf("%lld\n", timestamp[loop] - timestamp[loop-1]);
		}
	} 

	return 0;
}

Timer Test:

I have already charted all five timer options using 6000 samples.

nanosleep N2
https://drive.google.com/file/d/1u5PvJi ... sp=sharing
Image

Poll N2
https://drive.google.com/file/d/1ofaUHa ... sp=sharing
Image

timerfb N2
https://drive.google.com/file/d/1FOT6oO ... sp=sharing
Image

usleep N2
https://drive.google.com/file/d/1Pos9rs ... sp=sharing
Image

Vsync N2
https://drive.google.com/file/d/12LkQ5s ... sp=sharing
Image

I would like to compare the timings of the C2 with the N2. Are you able to run the code on the C2 and provide some graphs?

With the exception of the poll graph the N2 timing graphs look more like noise graphs!

hsu95066

Re: Unexpected graphics performance differences in application

Unread post by hsu95066 » Tue Apr 30, 2019 11:47 pm

May 1st is the Labor Day holiday, so I will test it on Thursday. :)

eval-
Posts: 7
Joined: Tue Apr 30, 2019 6:47 am
languages_spoken: english
ODROIDs: N2
Has thanked: 0
Been thanked: 1 time
Contact:

Re: Unexpected graphics performance differences in application

Unread post by eval- » Wed May 01, 2019 4:47 am

wallyz21 wrote:
Tue Apr 30, 2019 8:57 pm
With the exception of the poll graph the N2 timing graphs look more like noise graphs!
I think the problem is specific to HK's Ubuntu images? I had to switch to Arch Linux (never used it before now but it's actually nice, uses less RAM and feels faster) because the HDMI flickering under Ubuntu made me crazy. Crashoverride's comment on the HDMI thread brought me here. Anyway, I ran your tt.c under Arch Linux (Linux alarm 4.9.170-1-ARCH #1 SMP PREEMPT Sat Apr 27 10:31:29 MDT 2019 aarch64 GNU/Linux)

tt_odroid_n2_archlinux.png

I clipped y-axis because timerfb, usleep, vsync have extreme outliers. Not sure why nanosleep takes 0.06ms longer than usleep (which has 50x higher standard deviation.)

If you want I can send the .csv files or /proc/config.gz, dmesg, whatever else might help.

wallyz21

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Wed May 01, 2019 9:20 am

@ eval

That looks really good and is what I would expect to see. It is a good sign, as it means the problem is not in the hardware but must be isolated to the Ubuntu OS.

I might flip over to Arch Linux and have a look. What is the default GUI desktop with Arch Linux?

The usleep method looks better because I cheated. If you look at the code, I use usleep to delay to the nearest millisecond and then poll to finish off the fine timing.
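The same sleep-then-poll idea can also be written against an absolute deadline, so that a late wake-up in one frame does not push every later frame back. The following is only a sketch under that assumption and is not what tt.c does: wait_until, timespec_add_ns and timespec_before are names invented here, while clock_nanosleep with TIMER_ABSTIME and clock_gettime are the standard POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Add ns nanoseconds (ns >= 0) to *t, keeping tv_nsec inside [0, 1e9). */
void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_sec += ns / NSEC_PER_SEC;
    t->tv_nsec += ns % NSEC_PER_SEC;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_sec += 1;
        t->tv_nsec -= NSEC_PER_SEC;
    }
}

/* Returns 1 if time a is strictly earlier than time b. */
int timespec_before(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec;
    return a->tv_nsec < b->tv_nsec;
}

/* Wait until the absolute CLOCK_MONOTONIC time *deadline. Sleeps until
 * spin_ns before the deadline (0 <= spin_ns < 1e9), then busy-polls the
 * clock for the remainder. */
void wait_until(const struct timespec *deadline, long spin_ns)
{
    struct timespec wake = *deadline;
    struct timespec now;

    /* Wake-up time for the coarse sleep: deadline minus the spin margin. */
    wake.tv_nsec -= spin_ns;
    if (wake.tv_nsec < 0) {
        wake.tv_sec -= 1;
        wake.tv_nsec += NSEC_PER_SEC;
    }

    /* Absolute sleep; retry if a signal interrupts it. It returns at once
     * if the wake-up time has already passed. */
    while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &wake, NULL) == EINTR)
        ;

    /* Busy-poll for the remaining margin. */
    do {
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while (timespec_before(&now, deadline));
}
```

In a frame loop the deadline would be read once with clock_gettime(CLOCK_MONOTONIC, ...) and then advanced by one frame period (about 16666667 ns at 60 Hz) with timespec_add_ns before each wait_until call. A spin margin of around 1 ms mirrors the usleep variant; a larger margin trades CPU time for tighter timing.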

Which HDMI thread?

wallyz21

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Wed May 01, 2019 11:54 am

@eval

Out of curiosity could you provide the output from "uname -s" from your Arch Linux setup!

eval-

Re: Unexpected graphics performance differences in application

Unread post by eval- » Wed May 01, 2019 3:44 pm

wallyz21 wrote:
Wed May 01, 2019 11:54 am
@eval

Out of curiosity could you provide the output from "uname -s" from your Arch Linux setup!
Sorry... just missed this, I'm leaving for vacation, but I already posted uname -a above. The kernel version is the same, which is why I offered the .config (I can send it Monday).

As for the default UI, I'm not sure there is one... the base rootfs was sparse and I had to install everything with pacman, even bash-completion (!). You could always just back up and then swap your sdcard's /boot for the one from Arch (and copy over /lib/modules/... if you want them)?

hsu95066

Re: Unexpected graphics performance differences in application

Unread post by hsu95066 » Thu May 02, 2019 5:42 pm

@wallyz21

The graph for the C2 is shown below:
0502.png

Measured by the clock, 60000 vsync calls take 16 min 40 sec in total on the C2.

wallyz21

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Fri May 03, 2019 9:33 am

@hsu95066

Nice work. It appears the timings on the C2 are very tight (low deviation) as well.

wallyz21

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Fri May 03, 2019 9:44 am

A tiny bit of light at the end of the tunnel.

If I do all of the following:

Run in minimal mode using mali_fbdev
Set the CPU governor to performance
taskset -c 2,3,4,5 ./myemulator

Then I get 59fps!!!

I think that is as close to 60fps as I will get with the timer inaccuracies.
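For anyone who would rather do the pinning inside the program than through taskset, here is a rough sketch using sched_setaffinity. It is Linux-specific and needs _GNU_SOURCE defined before any header is included; the helper names cpu_range and pin_to_cpus are made up for this example.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Fill *set with the CPUs first..last (inclusive). */
void cpu_range(cpu_set_t *set, int first, int last)
{
    CPU_ZERO(set);
    for (int cpu = first; cpu <= last; cpu++)
        CPU_SET(cpu, set);
}

/* Restrict the calling thread to CPUs first..last.
 * Returns 0 on success, or -1 with errno set (for example EINVAL when none
 * of the requested CPUs exist or are allowed on this machine). */
int pin_to_cpus(int first, int last)
{
    cpu_set_t set;

    cpu_range(&set, first, last);
    return sched_setaffinity(0, sizeof(set), &set); /* pid 0 = caller */
}
```

Calling pin_to_cpus(2, 5) early in main() should have the same effect as the taskset -c 2,3,4,5 line above, since threads created afterwards inherit the mask.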

Still I would like anyone to whom this may concern to continue to investigate the timer/vsync inaccuracies with N2 Ubuntu.

I have seen some negative reviews of the N2 lately and I think it is important we all stay on top of this. Basically the reviewer is not holding out that the current situation will improve at all!

As much as I don't like this review, I have provided the link here:

https://www.youtube.com/watch?v=Y8N6qZzWqic

I would also like to offer my help with anything. However I have no kernel debugging experience.

I really would like to get a counter positive youtube video organised showing the same benchmarks as in the negative video.

crashoverride
Posts: 4549
Joined: Tue Dec 30, 2014 8:42 pm
languages_spoken: english
ODROIDs: C1
Has thanked: 0
Been thanked: 77 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by crashoverride » Fri May 03, 2019 11:39 am

wallyz21 wrote:
Fri May 03, 2019 9:44 am
taskset -c 2,3,4,5 ./myemulator
Pro Tip: To set both clusters to "performance", it is necessary to issue two (2) commands. For reasons unknown, a single command can not be used.

Code: Select all

sudo cpufreq-set -c 0-1 -g performance
sudo cpufreq-set -c 2-5 -g performance
Although your app may use a single cluster, there are many other tasks and IRQs that may end up elsewhere.
wallyz21 wrote:
Fri May 03, 2019 9:44 am
I really would like to get a counter positive youtube video organised showing the same benchmarks as in the negative video.
Image

The video shows pure CPU rendering. Ironically, I was very impressed that N2 could emulate both a Wii and a desktop class GPU as fast as it does!

wallyz21

Re: Unexpected graphics performance differences in application

Unread post by wallyz21 » Fri May 03, 2019 12:40 pm

@crashoverride

Thanks yes I was aware:

sudo cpufreq-set -c 0 -g performance
sudo cpufreq-set -c 2 -g performance

You can save a whole 4 characters by using 0 and 2 instead of 0-1 and 2-5 as all dependent CPUs in a set are changed.

Don't waste those characters. One day they'll run out then you'll be sorry :)
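To confirm that both clusters really ended up on the same governor, the setting can be read back per CPU. The sketch below assumes the kernel exposes the usual Linux cpufreq sysfs files (/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor); I have not checked that on the N2 image, and the function names are invented for this example.

```c
#include <stdio.h>
#include <string.h>

/* Build the sysfs path of the scaling governor file for one CPU.
 * Returns 0 on success, -1 if the buffer is too small. */
int governor_path(int cpu, char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor",
                     cpu);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Read the governor name for one CPU into out (newline stripped).
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_governor(int cpu, char *out, size_t len)
{
    char path[128];
    FILE *f;

    if (len == 0 || governor_path(cpu, path, sizeof(path)) != 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(out, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    out[strcspn(out, "\n")] = '\0';
    return 0;
}
```

Looping read_governor() over CPUs 0 to 5 and printing the result would show whether a cluster was missed.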

meveric
Posts: 10528
Joined: Mon Feb 25, 2013 2:41 pm
languages_spoken: german, english
ODROIDs: X2, U2, U3, XU-Lite, XU3, XU3-Lite, C1, XU4, C2, C1+, XU4Q, HC1, N1, Go, H2 (N4100), N2, H2 (J4105)
Has thanked: 17 times
Been thanked: 149 times
Contact:

Re: Unexpected graphics performance differences in application

Unread post by meveric » Fri May 03, 2019 6:12 pm

crashoverride wrote:
Fri May 03, 2019 11:39 am
The video shows pure CPU rendering. Ironically, I was very impressed that N2 could emulate both a Wii and a desktop class GPU as fast as it does!
Yeah, I was shocked by this as well. He's talking about "driver optimization" and how it will never run and bullshit like that, while he runs it entirely in Mesa software OpenGL :D
