futex() working properly? ...

Test and fix the Kernel 4.14 features

Post by brianb644 » Wed Jan 10, 2018 8:45 am

I have a couple HC1 units and am running the HPCC Systems software on them.

When I benchmark the software, I expect to see long runs where the system is 100% busy, mostly in "user time". Instead, I see the system 100% busy, but with only about 40% "user time" and about 60% "system time". I've traced the excess "system time" to the futex() system call (commonly used by libpthread), which, for lack of a better term, appears to be doing a "busy wait".

I also noticed that when running "gmake -j" (another heavily parallel workload), top also displayed a high proportion of system time, more than I would expect, which makes me think it's not just a fault in the software package.
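The user/system split described above can also be measured without top. The following is only a minimal sketch: it assumes the standard /proc/stat field order, and the function name cpu_split is made up for illustration.

```shell
#!/bin/sh
# cpu_split BEFORE AFTER   (hypothetical helper name)
#   BEFORE and AFTER are two samples of the aggregate "cpu" line of /proc/stat.
#   Prints the share of elapsed CPU time spent in user and in system mode.
cpu_split() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        # Fields 2..9 are: user nice system idle iowait irq softirq steal
        NR == 1 { for (i = 2; i <= 9; i++) before[i] = $i }
        NR == 2 {
            total = 0
            for (i = 2; i <= 9; i++) { delta[i] = $i - before[i]; total += delta[i] }
            if (total <= 0) exit 1
            printf "user=%d%% system=%d%%\n", 100 * delta[2] / total, 100 * delta[4] / total
        }'
}

# Example (commented out so that the block only defines the function):
#   before=$(grep '^cpu ' /proc/stat); sleep 5; after=$(grep '^cpu ' /proc/stat)
#   cpu_split "$before" "$after"
```

To see which system calls account for the system time, attaching "strace -f -c -p <pid>" to the running process for a while and then interrupting it prints a per-syscall summary. Note that strace itself slows the traced process down noticeably.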

I am hoping that someone will have some insight into the futex() implementation within the kernel and can suggest what I should do going forward. It seems possible that, for some applications (real-time perhaps), a busy wait is the most efficient implementation ... so maybe the implementation of the futex() call can be configured? ... at runtime (good)? ... or compile time (less good)?

I think implementing this kernel call requires custom assembly code on both the user and kernel side of the system call, so it also seems possible that the implementation is broken or incomplete.
brianb644
 
Posts: 7
Joined: Sat Jan 06, 2018 12:59 am
languages_spoken: english
ODROIDs: HC1

Re: futex() working properly? ...

Post by brianb644 » Wed Jan 24, 2018 5:55 pm

A quick additional data point: I installed the Arch Linux distribution on my HC1 and built the HPCC Systems software there. I see the same apparent busy-wait symptom as on Ubuntu. I've noticed there are real-time options available for the kernel at compile time, so my plan is to explore the options available when building the kernel. If I find that the real-time options are enabled, or that other options might explain my symptoms, I'll test with a custom kernel. I'm also going to try building a minimal test program that exhibits the symptoms.
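Before building a custom kernel, the preemption-related options of the running kernel can be listed. This is a rough sketch that assumes the config is available either as /proc/config.gz (which requires CONFIG_IKCONFIG_PROC) or as a file under /boot. The helper name preempt_options is made up for illustration.

```shell
#!/bin/sh
# preempt_options   (hypothetical helper name)
#   Reads a kernel config on stdin and prints only the options whose name
#   starts with CONFIG_PREEMPT and that are actually set.
preempt_options() {
    grep -E '^CONFIG_PREEMPT[A-Z0-9_]*='
}

# Examples (commented out so that the block only defines the function):
#   zcat /proc/config.gz | preempt_options
#   preempt_options < "/boot/config-$(uname -r)"
```

Lines of the form "# CONFIG_... is not set" are filtered out on purpose, so an empty result means no matching option is enabled.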

Re: futex() working properly? ...

Post by odroid » Wed Jan 24, 2018 6:08 pm

I have no idea about the futex issue, since we don't know the kernel internals that deeply.

Real-time kernel functionality is also very limited on the ARM big.LITTLE architecture. Only the basic RT options worked.
viewtopic.php?f=146&t=29271
odroid
Site Admin
 
Posts: 28292
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English
ODROIDs: ODROID

Re: futex() working properly? ...

Post by brianb644 » Tue Mar 06, 2018 6:22 am

I'm still following up on this issue and would like to try an experiment. Can someone tell me how to boot the HC1 ... to use only the 4 BIG CPUs ... or point me to a post/blog/documentation to help me figure it out.

Thanks!

Re: futex() working properly? ...

Post by odroid » Tue Mar 06, 2018 10:00 am

Try the commands below to turn the LITTLE cores off.
"echo 0 > /sys/devices/system/cpu/cpu0/online"
"echo 0 > /sys/devices/system/cpu/cpu1/online"
"echo 0 > /sys/devices/system/cpu/cpu2/online"
"echo 0 > /sys/devices/system/cpu/cpu3/online"

Re: futex() working properly? ...

Post by rooted » Tue Mar 06, 2018 12:58 pm

odroid wrote:
Try the commands below to turn the LITTLE cores off.
"echo 0 > /sys/devices/system/cpu/cpu0/online"
"echo 0 > /sys/devices/system/cpu/cpu1/online"
"echo 0 > /sys/devices/system/cpu/cpu2/online"
"echo 0 > /sys/devices/system/cpu/cpu3/online"

Can you disable them from boot, before the OS is up?

By editing the DTB I'm guessing?
rooted
 
Posts: 5177
Joined: Fri Dec 19, 2014 9:12 am
Location: Gulf of Mexico, US
languages_spoken: english
ODROIDs: C1, C1+, C2
XU3 Lite, XU4
N1
VU7+
HiFi Shield 2
Smart Power (original)

Re: futex() working properly? ...

Post by memeka » Tue Mar 06, 2018 6:01 pm

rooted wrote:
Can you disable them from boot, before the OS is up?
By editing the DTB I'm guessing?

I think not. I think the XU4 boots off the LITTLE cores and then enables the big ones.
memeka
 
Posts: 3963
Joined: Mon May 20, 2013 10:22 am
languages_spoken: english
ODROIDs: XU rev2 + eMMC + UART
U3 + eMMC + IO Shield + UART

Re: futex() working properly? ...

Post by brianb644 » Wed Mar 07, 2018 9:28 am

Turning them off seems to be a bit tricky ... I found the following bash script to work reliably ... from experience, the sleeps and the taskset seem to be necessary.

Code:
#!/bin/bash
#
# Ensure that this process isn't running on a CPU we are about to turn off
# (mask 0xf0 = cpu4-7, the big cores) and wait a moment so the move takes effect.
#
taskset -p 0xf0 $$
sleep 1

# 0 = take the LITTLE cores offline, 1 = bring them back online
LITTLE_ONLINE_STATUS=0

for cpuid in {0..3}
do
    online=/sys/devices/system/cpu/cpu${cpuid}/online

    if test -f "${online}"
    then
        curr_online=$(cat "${online}")
        if test "${curr_online}" -ne "${LITTLE_ONLINE_STATUS}"
        then
            echo "${LITTLE_ONLINE_STATUS}" > "${online}"
            # The following sleep seems necessary to avoid hanging
            sleep 1
        fi
    else
        echo "Error: online file for cpu${cpuid} not found (${online})" 1>&2
        exit 1
    fi
done


I wrote the script to facilitate turning the little cores back on ... but setting the cpuX/online status to 1 always hangs for me (but this isn't important for my application).

Concerning my futex() issue ... turning off the little cores fixes the problem I was trying to fix. With the little cores turned off, my application can run the BIG cores at near 100% user time for long stretches.

I'll probably not do the research to verify my guesses, but my application has lots of threads and does lots of task switching. The futex()-based pthread library uses variables in memory shared by all the cores to minimise system calls. However, my guess is that the big.LITTLE architecture has to do quite a bit of work to reconcile possible writes across the big/LITTLE cluster boundary. Turning off one half of the big.LITTLE pair lets the futex() calls run at full speed. For most applications this isn't a big deal, but my application runs best with the LITTLE cores turned off.

EDIT 29-Mar-2018: The script above IS NOT as reliable as I thought. I've been running it as part of rc.local, but sometimes find that one of the little cores is still enabled ... this defeats the purpose I'm running the script for. Also, after a software upgrade, after the system rebooted ... attempts to ssh into the system hang. If I disable this script in rc.local, all is OK. If I enable the script, access via ssh will hang. If I reboot the system and run ssh before rc.local runs, I can get in ... but my session is likely to hang after rc.local runs. I'll post a new item at the bottom to solicit alternative ideas!
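Since a LITTLE core is sometimes found still enabled, one way to at least detect that case is to re-read the sysfs state after the script has run. The sketch below is not a fix for the hangs described above. The helper name online_cpus is made up for illustration, and treating a CPU without an "online" file as always online is an assumption.

```shell
#!/bin/sh
# online_cpus FIRST LAST [SYSFS_CPU_DIR]   (hypothetical helper name)
#   Prints the ids in FIRST..LAST that are currently online, separated by
#   spaces. A cpuN directory without an "online" file is assumed to be a CPU
#   that cannot be hot-unplugged, and is therefore reported as online.
online_cpus() (
    first=$1
    last=$2
    dir=${3:-/sys/devices/system/cpu}
    result=""
    id=$first
    while [ "$id" -le "$last" ]; do
        if [ -d "$dir/cpu$id" ]; then
            if [ ! -f "$dir/cpu$id/online" ] || [ "$(cat "$dir/cpu$id/online")" = "1" ]; then
                result="$result $id"
            fi
        fi
        id=$((id + 1))
    done
    printf '%s\n' "${result# }"
)

# Example (commented out so that the block only defines the function):
#   still_on=$(online_cpus 0 3)
#   [ -z "$still_on" ] || echo "LITTLE cores still online: $still_on" 1>&2
```

The optional third argument exists only so the function can be tried against a scratch directory instead of the real /sys tree.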
Last edited by brianb644 on Thu Mar 29, 2018 6:45 pm, edited 3 times in total.

Re: futex() working properly? ...

Post by rooted » Wed Mar 07, 2018 11:05 pm

I think disabling big.LITTLE scheduler and using HMP (or whatever it's called as I forget) in the kernel would allow you to use all cores if the issue is due to overhead.

*edit*
What I meant is disable HMP and use BL switcher.
Last edited by rooted on Thu Mar 08, 2018 12:59 am, edited 1 time in total.

Re: futex() working properly? ...

Post by mad_ady » Thu Mar 08, 2018 12:41 am

You could also move all userspace apps to the big cores by setting CPUAffinity=4-7 in /etc/systemd/system.conf, but you'd still have some interrupts (and the kernel) running on cpu0.
mad_ady
 
Posts: 4034
Joined: Wed Jul 15, 2015 5:00 pm
Location: Bucharest, Romania
languages_spoken: english
ODROIDs: XU4, C1+, C2, N1

Re: futex() working properly? ...

Post by brianb644 » Mon Mar 12, 2018 1:36 pm

Thanks for the suggestions! Over time I think I'll still try some additional configurations ... with different combinations of Application and System settings. Setting CPUAffinity will definitely be on my list.

What I currently suspect is happening underneath is this: because the futex() call uses memory shared within the process to help negotiate entry into critical sections, it can cause extra work (and disruption to the CPU memory caches?) to guarantee that the memory address hasn't been written by the other set of CPUs. Because the app I'm using calls futex() so often, it is really affected. Turning off the LITTLE cores seems to skip any "extra work", whereas forcing all application processes onto the big cores via taskset did not help at all.

Re: futex() working properly? ...

Post by brianb644 » Thu Mar 29, 2018 7:03 pm

Is it possible to turn off HMP under Ubuntu and only run on the A15 cores? Booting on the little cores would be fine, but I would like to run my application on only the big cores ... and because of the heavy futex() call usage ... any of the little cores being active slows things down.

FYI: HPCC Systems on 2 x HC1 can complete a 50GB version of the TeraSort Challenge (Gray sort) in 16-ish minutes, and for quite a while performance would scale linearly, e.g. 8 x HC1 should take about 4-ish minutes. This is fast enough to do real work with really large data.
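The linear-scaling estimate is just the measured run time multiplied by the ratio of node counts. As a small sketch (the function name scaled_minutes is made up, and perfectly linear scaling is the assumption stated above, not a measurement):

```shell
#!/bin/sh
# scaled_minutes BASE_NODES BASE_MINUTES NODES   (hypothetical helper name)
#   Estimated run time on NODES units, assuming perfectly linear scaling from
#   a measured run of BASE_MINUTES on BASE_NODES units. Integer arithmetic,
#   so the result is truncated to whole minutes.
scaled_minutes() {
    echo $(( $1 * $2 / $3 ))
}

# Example (commented out so that the block only defines the function):
#   scaled_minutes 2 16 8
```

With the figures above (2 units, 16 minutes), 8 units come out at 4 minutes.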

Re: futex() working properly? ...

Post by mad_ady » Thu Mar 29, 2018 7:32 pm

You can turn off HMP, but you need to recompile your kernel, and you then need to use cgroups to assign tasks to cores. Look up Odroid Magazine, the NAS article from February 2017, for examples.
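For reference, assigning a process to the big cores with a cpuset cgroup could look roughly like the sketch below. It assumes a cgroup v1 cpuset hierarchy mounted at /sys/fs/cgroup/cpuset and is not taken from the magazine article. The function name cpuset_pin and the group name "big" are made up for illustration.

```shell
#!/bin/sh
# cpuset_pin ROOT GROUP CPUS PID   (hypothetical helper name)
#   ROOT  mount point of the cgroup v1 cpuset hierarchy
#   GROUP name of the cpuset to create (or reuse)
#   CPUS  CPU list in cpuset syntax, e.g. 4-7 for the big cores
#   PID   process to move; writing to cgroup.procs moves all of its threads
cpuset_pin() (
    root=$1
    group=$2
    cpus=$3
    pid=$4
    mkdir -p "$root/$group" || exit 1
    echo "$cpus" > "$root/$group/cpuset.cpus" || exit 1
    # A cpuset needs a memory node before tasks can join; node 0 is assumed.
    echo 0 > "$root/$group/cpuset.mems" || exit 1
    echo "$pid" > "$root/$group/cgroup.procs"
)

# Example, as root (commented out so that the block only defines the function):
#   cpuset_pin /sys/fs/cgroup/cpuset big 4-7 "$(pidof my_app)"
```

Here my_app is a placeholder for the real process name.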

Re: futex() working properly? ...

Post by memeka » Sat Jun 23, 2018 12:20 pm

You can keep using HMP and assign applications to the big cores only.
Look up taskset.
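taskset accepts either a hexadecimal bit mask or, with -c, a CPU list. The sketch below only shows how a mask such as the 0xf0 used earlier in the thread maps to cpu4-7. The helper name cpu_mask is made up for illustration. Keep in mind that the original poster reports above that taskset alone did not help in his case.

```shell
#!/bin/sh
# cpu_mask FIRST LAST   (hypothetical helper name)
#   Prints the hexadecimal affinity mask with one bit set for every CPU id
#   in FIRST..LAST, in the form taskset expects.
cpu_mask() (
    first=$1
    last=$2
    mask=0
    id=$first
    while [ "$id" -le "$last" ]; do
        mask=$((mask | (1 << id)))
        id=$((id + 1))
    done
    printf '0x%x\n' "$mask"
)

# Examples (commented out so that the block only defines the function):
#   taskset "$(cpu_mask 4 7)" ./my_app     # start my_app on the big cores
#   taskset -c 4-7 ./my_app                # same thing using a CPU list
```

Here my_app is a placeholder for the real program.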

Re: futex() working properly? ...

Post by mad_ady » Sat Jun 23, 2018 7:23 pm

:) long time no see, memeka! Welcome back.
