Sometimes the watchdog dies.

Sebas_Ledesma
Posts: 159
Joined: Thu Jun 08, 2017 2:49 am
languages_spoken: english
ODROIDs: c2
Has thanked: 16 times
Been thanked: 12 times
Contact:

Post by Sebas_Ledesma »

Hi:

I'm tracing another problem.
I have configured the watchdog according to the wiki, but sometimes the process exits.

systemctl status watchdog shows:

Code:

● watchdog.service - watchdog daemon
   Loaded: loaded (/lib/systemd/system/watchdog.service; enabled; vendor preset:
   Active: inactive (dead)

And in the logs I get:

Code:

Jan 11 17:24:17 UX24-LABO-ORIGINAL watchdog[1072]: was able to ping process 521 (/var/run/rsyslogd.pid)
Jan 11 17:24:18 UX24-LABO-ORIGINAL watchdog[1072]: still alive after 37 interval(s)
Jan 11 17:24:18 UX24-LABO-ORIGINAL watchdog[1072]: was able to ping process 521 (/var/run/rsyslogd.pid)
Jan 11 17:24:19 UX24-LABO-ORIGINAL watchdog[1072]: still alive after 38 interval(s)
Jan 11 17:24:19 UX24-LABO-ORIGINAL watchdog[1072]: was able to ping process 521 (/var/run/rsyslogd.pid)
Jan 11 17:24:20 UX24-LABO-ORIGINAL watchdog[1072]: still alive after 39 interval(s)
Jan 11 17:24:20 UX24-LABO-ORIGINAL watchdog[1072]: was able to ping process 521 (/var/run/rsyslogd.pid)
Jan 11 17:24:21 UX24-LABO-ORIGINAL watchdog[1072]: still alive after 40 interval(s)
Jan 11 17:24:21 UX24-LABO-ORIGINAL watchdog[1072]: was able to ping process 521 (/var/run/rsyslogd.pid)
Jan 11 17:24:22 UX24-LABO-ORIGINAL watchdog[1072]: still alive after 41 interval(s)
Jan 11 17:24:22 UX24-LABO-ORIGINAL watchdog[1072]: was able to ping process 521 (/var/run/rsyslogd.pid)
Jan 11 17:24:23 UX24-LABO-ORIGINAL systemd[1]: Stopping watchdog daemon...
Jan 11 17:24:23 UX24-LABO-ORIGINAL watchdog[1072]: still alive after 42 interval(s)
Jan 11 17:24:23 UX24-LABO-ORIGINAL watchdog[1072]: stopping daemon (5.15)
Jan 11 17:24:23 UX24-LABO-ORIGINAL systemd[1]: watchdog.service: Control process exited, code=exited status=1
Jan 11 17:24:23 UX24-LABO-ORIGINAL systemd[1]: watchdog.service: Failed with result 'exit-code'.
Jan 11 17:24:23 UX24-LABO-ORIGINAL systemd[1]: Stopped watchdog daemon.
Jan 11 17:24:23 UX24-LABO-ORIGINAL systemd[1]: watchdog.service: Triggering OnFailure= dependencies.
Jan 11 17:24:23 UX24-LABO-ORIGINAL systemd[1]: watchdog.service: Failed to enqueue OnFailure= job, ignoring: Transaction is destructive.

Any suggestions?

Thanks in advance
Sebas

mad_ady
Posts: 9049
Joined: Wed Jul 15, 2015 5:00 pm
languages_spoken: english
ODROIDs: XU4, C1+, C2, C4, N1, N2, H2, Go, Go Advance
Location: Bucharest, Romania
Has thanked: 595 times
Been thanked: 573 times
Contact:

Post by mad_ady »

I don't know why it dies for you, but you should try editing watchdog.service and adding

Code:

Restart=always
RestartSec=5
to restart watchdog automatically in case it crashes.
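
A minimal sketch of how the two lines above could be applied without editing the packaged unit file, assuming a systemd drop-in is acceptable on this system. The function name `write_restart_dropin` is made up for illustration; the default directory is the usual drop-in location for a unit named watchdog.service, and the settings go under `[Service]`. Note that the log in the first post shows systemd itself stopping the daemon, and systemd does not apply `Restart=` to a service stopped by an explicit stop request, so this may not cover that case.

```shell
# Write a drop-in that asks systemd to restart watchdog.service after it exits.
# The optional argument overrides the target directory (handy for a dry run).
write_restart_dropin() {
    dropin_dir="${1:-/etc/systemd/system/watchdog.service.d}"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
Restart=always
RestartSec=5
EOF
}

# Typical use, as root:
#   write_restart_dropin
#   systemctl daemon-reload
#   systemctl restart watchdog.service
```

`sudo systemctl edit watchdog.service` creates the same kind of override file interactively.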

Post by Sebas_Ledesma »

My system is:
Linux UX24-LABO-ORIGINAL 3.16.81-49 #1 SMP PREEMPT Wed Jan 15 21:38:53 -02 2020 aarch64 aarch64 aarch64 GNU/Linux


My configuration files are pretty much the same as those referenced in https://wiki.odroid.com/odroid-c2/appli ... hdog_timer

/etc/default/watchdog

Code:

# Start watchdog at boot time? 0 or 1
run_watchdog=1
# Start wd_keepalive after stopping watchdog? 0 or 1
run_wd_keepalive=1
# Load module before starting watchdog
watchdog_module=gxbb_wdt
# Specify additional watchdog options here (see manpage).
watchdog_options="-s -v -c /etc/watchdog.conf"

/etc/watchdog.conf

Code:

#ping                   = 172.31.14.1
#ping                   = 172.26.1.255
#interface              = eth0
#file                   = /var/log/messages
#change                 = 1407

# Uncomment to enable test. Setting one of these values to '0' disables it.
# These values will hopefully never reboot your machine during normal use
# (if your machine is really hung, the loadavg will go much higher than 25)
#max-load-1             = 24
#max-load-5             = 18
#max-load-15            = 12

# Note that this is the number of pages!
# To get the real size, check how large the pagesize is on your machine.
#min-memory             = 1
#allocatable-memory     = 1

#repair-binary          = /usr/sbin/repair
#repair-timeout         = 60
#test-binary            =
#test-timeout           = 60

# The retry-timeout and repair limit are used to handle errors in a more robust
# manner. Errors must persist for longer than retry-timeout to action a repair
# or reboot, and if repair-maximum attempts are made without the test passing a
# reboot is initiated anyway.
#retry-timeout          = 60
#repair-maximum         = 1

watchdog-device=/dev/watchdog

# Defaults compiled into the binary
#temperature-sensor     =
#max-temperature        = 90

# Defaults compiled into the binary
admin                   = root
interval                = 1
logtick                = 1
log-dir         = /var/log/watchdog

# This greatly decreases the chance that watchdog won't be scheduled before
# your machine is really loaded
realtime                = yes
priority                = 1

# Check if rsyslogd is still running by enabling the following line
pidfile         = /var/run/rsyslogd.pid

watchdog-timeout=30

I will take a look at adding the restart option that you suggested.
Thanks!
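
One thing that can be ruled out quickly from the /etc/watchdog.conf above is the timing: the daemon's `interval` should be well below `watchdog-timeout`. The sketch below reads both values from a watchdog.conf-style file and compares them. The helper names `conf_value` and `check_watchdog_timing` are made up for illustration, and the check assumes both settings are plain integers in seconds.

```shell
# Print the value of an active "key = value" line; commented-out lines are
# skipped and the last occurrence wins.
conf_value() {
    key="$1"; file="$2"
    sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*\$/\1/p" "$file" | tail -n 1
}

# Compare interval against watchdog-timeout (both assumed to be whole seconds).
check_watchdog_timing() {
    file="${1:-/etc/watchdog.conf}"
    interval=$(conf_value interval "$file")
    timeout=$(conf_value watchdog-timeout "$file")
    if [ -z "$interval" ] || [ -z "$timeout" ]; then
        echo "interval or watchdog-timeout not set in $file" >&2
        return 2
    fi
    if [ "$interval" -lt "$timeout" ]; then
        echo "ok: interval=${interval}s < watchdog-timeout=${timeout}s"
    else
        echo "warning: interval=${interval}s >= watchdog-timeout=${timeout}s"
        return 1
    fi
}
```

With the values posted above (interval 1, watchdog-timeout 30) the check passes, which suggests the timing itself is not what stops the daemon.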

Post by Sebas_Ledesma »

Additional info (in case the original post was not clear enough):
With the same configuration, I reboot the ODROID and sometimes the watchdog is running: I can run systemctl status watchdog or pgrep watchdog and see it working.
Other times I reboot and nothing is running.
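
The two checks mentioned above can be combined into a single line of output that is easy to log after each boot. This is only a sketch: `watchdog_state` is a made-up helper name, and `pgrep -x` is used so that only a process named exactly `watchdog` counts (a plain `pgrep watchdog` may also match kernel threads with similar names on some kernels).

```shell
# Print the systemd unit state and whether a process named exactly
# "watchdog" exists.
watchdog_state() {
    # is-active exits non-zero for inactive or failed units, so ignore the status
    unit=$(systemctl is-active watchdog.service 2>/dev/null) || true
    [ -n "$unit" ] || unit=unknown
    if pgrep -x watchdog >/dev/null 2>&1; then
        proc=yes
    else
        proc=no
    fi
    echo "unit=$unit process=$proc"
}

watchdog_state
```

Running it from a boot-time job and appending the output to a file would show how often the daemon is missing after a reboot.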

odroid
Site Admin
Posts: 36385
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English, Korean
ODROIDs: ODROID
Has thanked: 1431 times
Been thanked: 980 times
Contact:

Post by odroid »

What happens if you force a kernel crash with the echo c > /proc/sysrq-trigger command?

BTW, we have not tested the WDT function much on Ubuntu 18.04 or 20.04. The wiki pages were based on 14.04 and 16.04.
Which Ubuntu version do you run? Is it based on our Ubuntu 18.04.3 image?

Post by Sebas_Ledesma »

When the watchdog is not running, executing echo c > /proc/sysrq-trigger freezes the ODROID. It has to be unplugged to restart the system.
Also, when no watchdog is running, executing:

Code:

sudo systemctl start watchdog.service

does not finish. It can be cancelled by pressing CTRL+C.
While waiting for sudo systemctl start watchdog.service to finish, we can open another terminal, and pgrep watchdog shows nothing there.

We are using Ubuntu 18.04.3 with kernel 3.16.81-49; watchdog is version 5.15-2.

Sebas

Post by odroid »

We will try to reproduce the issue.

Post by Sebas_Ledesma »

When pgrep watchdog shows nothing, I was able to launch watchdog with:

Code:

sudo service watchdog start

It can be called without sudo, but in that case it asks for the user's password again.

Post by odroid »

We could reproduce the issue. Something has changed as of Ubuntu 18.04.
We will try to understand how to configure the watchdog services properly.

cap00k
Posts: 104
Joined: Tue May 21, 2013 10:46 am
languages_spoken: english
ODROIDs: ODROID
Has thanked: 0
Been thanked: 11 times
Contact:

Post by cap00k »

Just create a link to the default unit file and enable watchdog.service:

Code:

# sudo ln -s  /lib/systemd/system/watchdog.service /etc/systemd/system/multi-user.target.wants/watchdog.service
# sudo systemctl enable watchdog.service

Then reboot the system:

Code:

# sudo reboot

And check that the service starts automatically:

Code:

# sudo systemctl status watchdog.service
● watchdog.service - watchdog daemon
   Loaded: loaded (/lib/systemd/system/watchdog.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2018-01-28 15:58:19 UTC; 13s ago
  Process: 419 ExecStart=/bin/sh -c [ $run_watchdog != 1 ] || exec /usr/sbin/watchdog $watchdog_options (code=exited, status=0/SUCCESS)
  Process: 415 ExecStartPre=/bin/sh -c [ -z "${watchdog_module}" ] || [ "${watchdog_module}" = "none" ] || /sbin/modprobe $watchdog_module (code=exite
 Main PID: 421 (watchdog)
   CGroup: /system.slice/watchdog.service
           └─421 /usr/sbin/watchdog -s -v -c /etc/watchdog.conf

Jan 28 15:58:23 odroid watchdog[421]: still alive after 4 interval(s)
Jan 28 15:58:24 odroid watchdog[421]: still alive after 5 interval(s)
Jan 28 15:58:25 odroid watchdog[421]: still alive after 6 interval(s)
Jan 28 15:58:26 odroid watchdog[421]: still alive after 7 interval(s)
Jan 28 15:58:27 odroid watchdog[421]: still alive after 8 interval(s)
Jan 28 15:58:28 odroid watchdog[421]: still alive after 9 interval(s)
Jan 28 15:58:29 odroid watchdog[421]: still alive after 10 interval(s)
Jan 28 15:58:30 odroid watchdog[421]: still alive after 11 interval(s)
Jan 28 15:58:31 odroid watchdog[421]: still alive after 12 interval(s)
Jan 28 15:58:32 odroid watchdog[421]: still alive after 13 interval(s)


Post by Sebas_Ledesma »

The symbolic link already exists:
lrwxrwxrwx 1 root root 36 Dec 2 2019 /etc/systemd/system/multi-user.target.wants/watchdog.service -> /lib/systemd/system/watchdog.service

And, as we know, the daemon always seems to start, but in some cases it exits (after around 40 intervals) with this message in the syslog:
Failed to enqueue OnFailure= job, ignoring: Transaction is destructive

If there is no problem at boot, the daemon is stable and works seamlessly.
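
To check whether the stop really happens at a similar interval count on every bad boot, the count can be pulled out of the log. The sketch below reads syslog or journal lines on stdin and prints the last "still alive after N interval(s)" value seen before each "stopping daemon" line. The function name is made up for illustration, and it relies on the verbose logging (`-v`, `logtick = 1`) from the configuration posted earlier in the thread.

```shell
# Print the last logged interval count before each watchdog "stopping daemon"
# message, one number per stop.
last_interval_before_stop() {
    awk '
        /watchdog\[[0-9]+\]: still alive after [0-9]+ interval/ {
            # the count is the field right after the word "after"
            for (i = 1; i <= NF; i++) if ($i == "after") last = $(i + 1)
        }
        /watchdog\[[0-9]+\]: stopping daemon/ { print last; last = "" }
    '
}

# Typical use on the affected system:
#   journalctl -u watchdog.service -b | last_interval_before_stop
```

For the log excerpt in the first post this gives 42.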
