Crashes with ondemand governor

evrflx
Posts: 20
Joined: Fri Apr 05, 2019 4:16 am
languages_spoken: english
ODROIDs: 4 C2, 2 XU4, 5 N2
Has thanked: 1 time
Been thanked: 3 times
Contact:

Crashes with ondemand governor

Unread post by evrflx » Sat Feb 15, 2020 5:57 pm

Hi,
I am running a cluster of five ODROID-N2s with Arch Linux ARM and the ODROID kernel.
Ever since I set it up, one of the machines has been rebooting or crashing sporadically.

I am fairly confident that I have pinpointed the cause: I am running the ondemand CPU governor, and the kernel logs messages about exceptions.
When I start the Prometheus node exporter, the machine crashes instantly. I assume this is because the exporter reads governor details from the kernel (at least, the exception is the last message I see).

Is this a bug in the kernel or the odroid hardware? Any chance to get this stable?

Code:

kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 1 PID: 3199 at cpufreq_cpu_get+0xb4/0xd0
kernel: Modules linked in: dummy vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch xt_NFLOG nf_conntrack_netlink xt_addrtype xt_statistic xt_nat ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_physdev xt_conntrack ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nf_tables ip_set_hash_ip xt_set ip_set iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_comment iptable_filter ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6_tables xt_mark iptable_nat nf_conntrack_ipv4 nf_nat nf_conntrack overlay zram rtc_pcf8563 i2c_meson_master ir_lirc_codec lirc_dev meson_ir sch_fq_codel br_netfilter bridge stp llc ip_tables x_tables ipv6
kernel: 
kernel: CPU: 1 PID: 3199 Comm: kworker/1:2 Not tainted 4.9.210-1-ARCH #1
kernel: Hardware name: Hardkernel ODROID-N2 (DT)
kernel: Workqueue: events dbs_work_handler
kernel: task: ffffffc0c7a1e040 task.stack: ffffffc0c7d1c000
kernel: PC is at cpufreq_cpu_get+0xb4/0xd0
kernel: LR is at cpufreq_times_record_transition+0x34/0x68
kernel: R1  : ffffff800aaba890, PFN: 2aba
kernel: pc : [<ffffff80096fa044>] lr : [<ffffff80096ff81c>] pstate: 80000045
kernel: sp : ffffffc0c7d1fb20
kernel: x29: ffffffc0c7d1fb20 x28: 00000000000b2778 
kernel: x27: ffffffc0ca1f741c x26: 00000000000a2d78 
kernel: x25: 0000000000000000 x24: 0000000000000001 
kernel: x23: ffffff800a82b27c x22: ffffff800aaba758 
kernel: x21: 00000016cd2ccd2c x20: ffffffc0ca1f7400 
kernel: x19: 0000000000000008 x18: 0000000000000000 
kernel: x17: 0000007f85292a40 x16: ffffff8009138fe0 
kernel: x15: 00002aabf746aeb0 x14: 0000000000000000 
kernel: x13: 0000000000000005 x12: 00000000e5b906e6 
kernel: x11: 0000000000000006 x10: 0101010101010101 
kernel: x9 : ffffffc0c7d1fd10 x8 : 7f7f7f7f7f7f7f7f 
kernel: x7 : 6b6b6f5e30727872 x6 : 031a1a5400000000 
kernel: x5 : 0000000000000003 x4 : 0000000000000005 
kernel: x3 : ffffffc0ca391e40 x2 : 000000000000000d 
kernel: x1 : ffffff800aaba890 x0 : 0000000000000006 
         SP: 0xffffffc0c7d1faa0:
kernel: faa0  0aaba758 ffffff80 0a82b27c ffffff80 00000001 00000000 00000000 00000000
kernel: fac0  000a2d78 00000000 ca1f741c ffffffc0 000b2778 00000000 c7d1fb20 ffffffc0
kernel: fae0  096ff81c ffffff80 c7d1fb20 ffffffc0 096fa044 ffffff80 80000045 00000000
kernel: fb00  c7d1fb20 ffffffc0 096fe87c ffffff80 ffffffff ffffffff 00000000 00000000
kernel: fb20  c7d1fb50 ffffffc0 096ff81c ffffff80 0aabd5f0 ffffff80 ca1f7400 ffffffc0
kernel: fb40  c873e400 ffffffc0 0aabd590 ffffff80 c7d1fb80 ffffffc0 096fa268 ffffff80
kernel: fb60  0aabd5f0 ffffff80 ca1f7400 ffffffc0 0a829094 ffffff80 096fa1f4 ffffff80
kernel: fb80  c7d1fbe0 ffffffc0 096fa644 ffffff80 ca1f7400 ffffffc0 0aabd5f0 ffffff80
         X3: 0xffffffc0ca391dc0:
kernel: 1dc0  00000005 00124f80 00000000 00000006 001554f0 00000000 00000007 00171240
kernel: 1de0  00000000 00000008 00188940 00000000 00000009 001a0040 00000000 0000000a
kernel: 1e00  001cee40 00000000 0000000b fffffffe 00000000 0000000c fffffffe 00000000
kernel: 1e20  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
kernel: 1e40  00000000 00000000 00000000 00000000 00000000 00000000 00005c0a 00000000
kernel: 1e60  000003c9 00000000 00000592 00000000 0000041c 00000000 000002a7 00000000
kernel: 1e80  000002c4 00000000 00000454 00000000 00000d4d 00000000 000186a0 0003d090
kernel: 1ea0  0007a120 000a2d78 000f4240 00124f80 001554f0 00171240 00188940 001a0040
         X9: 0xffffffc0c7d1fc90:
kernel: fc90  c7d1fcf0 ffffffc0 09700520 ffffff80 ca1f7400 ffffffc0 c9104b40 ffffffc0
kernel: fcb0  c9104c00 ffffffc0 c9104b40 ffffffc0 c9104c00 ffffffc0 c9564bc0 ffffffc0
kernel: fcd0  00000000 00000000 cf337040 ffffffc0 00000000 00124f80 00000000 00000000
kernel: fcf0  c7d1fd30 ffffffc0 09704444 ffffff80 c9104c60 ffffffc0 c9104c00 ffffffc0
kernel: fd10  c9104c08 ffffffc0 ca1f7400 ffffffc0 0a8e0368 ffffff80 c7a1e040 ffffffc0
kernel: fd30  c7d1fd70 ffffffc0 090c2b88 ffffff80 c9104c60 ffffffc0 c7dcd600 ffffffc0
kernel: fd50  cf337040 ffffffc0 cf33b800 ffffffc0 00000000 00000000 c7dcd600 ffffffc0
kernel: fd70  c7d1fdc0 ffffffc0 090c2ee4 ffffff80 c7dcd600 ffffffc0 cf337040 ffffffc0
         X20: 0xffffffc0ca1f7380:
kernel: 7380  00000000 00000000 00000001 00000001 00000000 00000000 ca390000 ffffffc0
kernel: 73a0  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
kernel: 73c0  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
kernel: 73e0  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
kernel: 7400  00000003 00000000 00000003 00000000 00000003 00000000 00000000 00000000
kernel: 7420  c873e1c0 ffffffc0 001cee40 000186a0 0000c350 000a2d78 001cee40 00124f80
kernel: 7440  00124f80 001cee40 00000000 00000000 0a8e0368 ffffff80 c9104c00 ffffffc0
kernel: 7460  00000000 00000000 00000000 00000000 ffffffe0 0000000f ca1f7478 ffffffc0
         X27: 0xffffffc0ca1f739c:
kernel: 739c  ffffffc0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
kernel: 73bc  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
kernel: 73dc  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
kernel: 73fc  00000000 00000003 00000000 00000003 00000000 00000003 00000000 00000000
kernel: 741c  00000000 c873e1c0 ffffffc0 001cee40 000186a0 0000c350 000a2d78 001cee40
kernel: 743c  00124f80 00124f80 001cee40 00000000 00000000 0a8e0368 ffffff80 c9104c00
kernel: 745c  ffffffc0 00000000 00000000 00000000 00000000 ffffffe0 0000000f ca1f7478
kernel: 747c  ffffffc0 ca1f7478 ffffffc0 096fd608 ffffff80 000a2d78 001cee40 ca391d80
         X29: 0xffffffc0c7d1faa0:
kernel: faa0  0aaba758 ffffff80 0a82b27c ffffff80 00000001 00000000 00000000 00000000
kernel: fac0  000a2d78 00000000 ca1f741c ffffffc0 000b2778 00000000 c7d1fb20 ffffffc0
kernel: fae0  096ff81c ffffff80 c7d1fb20 ffffffc0 096fa044 ffffff80 80000045 00000000
kernel: fb00  c7d1fb20 ffffffc0 096fe87c ffffff80 ffffffff ffffffff 00000000 00000000
kernel: fb20  c7d1fb50 ffffffc0 096ff81c ffffff80 0aabd5f0 ffffff80 ca1f7400 ffffffc0
kernel: fb40  c873e400 ffffffc0 0aabd590 ffffff80 c7d1fb80 ffffffc0 096fa268 ffffff80
kernel: fb60  0aabd5f0 ffffff80 ca1f7400 ffffffc0 0a829094 ffffff80 096fa1f4 ffffff80
kernel: fb80  c7d1fbe0 ffffffc0 096fa644 ffffff80 ca1f7400 ffffffc0 0aabd5f0 ffffff80
kernel: 
kernel: ---[ end trace ac40fa7506c2b2d1 ]---
kernel: Call trace:
kernel: Exception stack(0xffffffc0c7d1f940 to 0xffffffc0c7d1fa70)
kernel: f940: 0000000000000008 0000007fffffffff ffffffc0c7d1fb20 ffffff80096fa044
kernel: f960: 0000000080000045 ffffff80090d4dc8 ffffffc0c7d1f980 ffffff8009436db8
kernel: f980: ffffffc0c7d1f9b0 ffffff80090ef938 ffffff800a82a1a0 ffffffc0c7d1fc30
kernel: f9a0: 0000000000000005 ffffffc0c7d1fc30 ffffffc0c7d1fb80 ffffff80090f0db8
kernel: f9c0: ffffffc0cf3377c0 ffffffc0cab4a000 0000000000000001 0000000000000588
kernel: f9e0: ffffff800a827f40 0000000000000001 0000000000000006 ffffff800aaba890
kernel: fa00: 000000000000000d ffffffc0ca391e40 0000000000000005 0000000000000003
kernel: fa20: 031a1a5400000000 6b6b6f5e30727872 7f7f7f7f7f7f7f7f ffffffc0c7d1fd10
kernel: fa40: 0101010101010101 0000000000000006 00000000e5b906e6 0000000000000005
kernel: fa60: 0000000000000000 00002aabf746aeb0
kernel: [<ffffff80096fa044>] cpufreq_cpu_get+0xb4/0xd0
kernel: [<ffffff80096ff81c>] cpufreq_times_record_transition+0x34/0x68
kernel: [<ffffff80096fa268>] cpufreq_notify_transition+0xd8/0x208
kernel: [<ffffff80096fa644>] cpufreq_freq_transition_end+0x3c/0xb0
kernel: [<ffffff800975ba00>] meson_cpufreq_set_target+0x168/0x318
kernel: [<ffffff80096fabc0>] __cpufreq_driver_target+0x218/0x548
kernel: [<ffffff8009700520>] od_dbs_timer+0xc0/0x1a0
kernel: [<ffffff8009704444>] dbs_work_handler+0x44/0x80
kernel: [<ffffff80090c2b88>] process_one_work+0x1d0/0x4d8
kernel: [<ffffff80090c2ee4>] worker_thread+0x54/0x4a8
kernel: [<ffffff80090c99a0>] kthread+0xf8/0x100
kernel: [<ffffff8009083950>] ret_from_fork+0x10/0x40
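
To confirm which governor each cpufreq policy is actually using (for example before and after switching kernels), the standard cpufreq sysfs files can be read directly. This is only a minimal sketch, assuming the usual /sys/devices/system/cpu/cpufreq/policy*/ layout; the function name show_governors and its optional root-directory argument are made up for illustration and are not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# Print the active governor and current frequency of every cpufreq policy.
# The optional first argument overrides the sysfs root (hypothetical helper
# parameter, useful for trying the function against a scratch directory).
show_governors() {
    root="${1:-/sys/devices/system/cpu/cpufreq}"
    for policy in "$root"/policy*; do
        # Skip the unexpanded glob when no cpufreq policy exists.
        [ -r "$policy/scaling_governor" ] || continue
        printf '%s: governor=%s cur_khz=%s\n' \
            "$(basename "$policy")" \
            "$(cat "$policy/scaling_governor")" \
            "$(cat "$policy/scaling_cur_freq" 2>/dev/null || echo unknown)"
    done
}
```

After sourcing the file, calling show_governors with no argument reads the live system. It only reads sysfs and changes nothing.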


elatllat
Posts: 1667
Joined: Tue Sep 01, 2015 8:54 am
languages_spoken: english
ODROIDs: XU4, N1, N2
Has thanked: 29 times
Been thanked: 93 times
Contact:

Re: Crashes with ondemand governor

Unread post by elatllat » Sat Feb 15, 2020 11:21 pm

You could try my build of Mainline Linux until somebody fixes that for you, and to determine where the problem is.

Re: Crashes with ondemand governor

Unread post by evrflx » Sun Feb 16, 2020 5:27 am

Would love to do that, but even after reading your kind reply I have no clue how to combine your documentation with the Arch Linux build process to create installable kernel images O;-)
I tried the upstream mainline 5.5 and it simply did not boot.

btw: sorry for posting this in the wrong forum, it slipped my mind to use the 'issues' category.

Re: Crashes with ondemand governor

Unread post by elatllat » Sun Feb 16, 2020 6:45 am

If you have a 4 GB version, get the 5.4 build and follow the 2nd script here:
viewtopic.php?f=176&t=33993&p=261833#p261833

Re: Crashes with ondemand governor

Unread post by evrflx » Mon Feb 17, 2020 12:54 am

Since I would build on an Arch Linux system, the Debian/Ubuntu commands would not work.

Is my assumption correct that I just need to apply your patch to upstream and use the usual package-build infrastructure to build a custom kernel image, then install it?
The hardware definition from the dtb must be included as well, of course, but from reading your build script it looks like the patches to upstream are minimal and most of the script automates the build and installation.

Re: Crashes with ondemand governor

Unread post by elatllat » Mon Feb 17, 2020 3:28 am

Correct.
(Converting the examples from apt to pacman/dnf/etc. is trivial.)
The patch is just to fix a USB bug and double the speed.

jgmdev
Posts: 16
Joined: Tue Jan 28, 2020 2:28 pm
languages_spoken: english, spanish
ODROIDs: U2, N2
Has thanked: 6 times
Been thanked: 3 times
Contact:

Re: Crashes with ondemand governor

Unread post by jgmdev » Wed Feb 19, 2020 2:03 am

evrflx wrote:
Mon Feb 17, 2020 12:54 am
Since I would build on an Arch Linux system the debian/ubuntu commands would not work.
...
On Arch Linux you can install the linux-aarch64 package. I was able to boot it with a boot.ini like this:

Code:

ODROIDN2-UBOOT-CONFIG

setenv bootlabel "ArchLinux EMMC"

# Default Console Device Setting
setenv condev "console=ttyAML0,115200n8 console=tty1"   # on both

# Boot Args
setenv bootargs "root=/dev/mmcblk${devno}p2 rootwait rw ${condev} ${amlogic} no_console_suspend fsck.repair=yes net.ifnames=0 hdmimode=1080p60hz clk_ignore_unused"

# Set load addresses
setenv dtb_loadaddr "0x20000000"
setenv loadaddr "0x1080000"
setenv initrd_loadaddr "0x3080000"

# Load kernel, dtb and initrd
load mmc ${devno}:1 ${loadaddr} /Image
load mmc ${devno}:1 ${dtb_loadaddr} /dtbs/amlogic/meson-g12b-odroid-n2.dtb
load mmc ${devno}:1 ${initrd_loadaddr} /initramfs-linux.uimg
#fdt addr ${dtb_loadaddr}

# boot
booti ${loadaddr} ${initrd_loadaddr} ${dtb_loadaddr}

This worked for me. It also seems that linux-aarch64-rc was updated to 5.6 :) so maybe it is time for testing.
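
The boot.ini above loads three files from the first partition of the boot device, so a quick check that all of them exist can save a failed boot. A minimal sketch, assuming that partition is mounted at /boot (an assumption, adjust as needed); the function name check_boot_files is hypothetical, while the file paths are taken from the load lines above.

```shell
#!/bin/sh
# Verify that every file referenced by the boot.ini load commands exists.
# The optional first argument is the mount point of the boot partition;
# /boot is an assumed default, not something stated in the boot.ini.
check_boot_files() {
    boot_dir="${1:-/boot}"
    missing=0
    for f in Image dtbs/amlogic/meson-g12b-odroid-n2.dtb initramfs-linux.uimg; do
        if [ -f "$boot_dir/$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            missing=1
        fi
    done
    # Non-zero exit status when at least one file is absent.
    return "$missing"
}
```

After sourcing the file, check_boot_files prints one line per file and returns a non-zero status if anything is missing, so it can be chained in front of a reboot command.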

Re: Crashes with ondemand governor

Unread post by evrflx » Sat Feb 22, 2020 8:15 pm

Excellent! That works like a charm, thank you very much!
Now I will test how stable it is. First observation: instead of 3710 MB of memory, I now have only 3628 MB. But if it is stable, so be it.

Re: Crashes with ondemand governor

Unread post by evrflx » Sun Feb 23, 2020 8:12 pm

So far it looks fine, but one machine crashed (or lost its network connection, I am not quite sure). That prompted me to set up the watchdog to get automatic reboots in case of crashes.
It seems neither of these modules works: meson_gxbb_wdt, meson_wdt. Any idea what might work with mainline?
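
One way to narrow this down is to check whether the running kernel has registered any watchdog device at all, regardless of which module provides it. A minimal sketch, assuming the standard /sys/class/watchdog class directory; the identity attribute is only present when the kernel exposes watchdog sysfs attributes, so the code falls back to "unknown". The function name list_watchdogs is hypothetical, and this does not answer which driver mainline needs on the N2.

```shell
#!/bin/sh
# List registered watchdog devices and, where available, their identity.
# The optional first argument overrides the sysfs class directory
# (hypothetical helper parameter for trying this on a scratch directory).
list_watchdogs() {
    root="${1:-/sys/class/watchdog}"
    found=0
    for wd in "$root"/watchdog*; do
        # Skip the unexpanded glob when no device is registered.
        [ -d "$wd" ] || continue
        found=1
        ident="$(cat "$wd/identity" 2>/dev/null || echo unknown)"
        echo "$(basename "$wd"): $ident"
    done
    [ "$found" -eq 1 ] || echo "no watchdog device registered"
}
```

If this reports no device after loading a module, that module most likely did not bind to the hardware under the running kernel.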
