Odroid Recovery Images Brainstorm

Post by mad_ady » Fri Oct 21, 2016 10:50 pm

It seems that sometimes a working Odroid installation can break and not boot again (viewtopic.php?f=95&t=23163). If you don't have a backup (http://magazine.odroid.com/wp-content/u ... df#page=22), you are usually advised to flash a new image. This usually works, but let's face it: this is Linux, not Windows, so re-imaging shouldn't be necessary most of the time.

Odroid asked me to start a discussion thread where we can brainstorm what the best approach is to fixing a broken image with as little user intervention as possible. Below is what I had in mind (and I'll add relevant ideas to the todo list):

Most problems I ran into relate either to an fsck which can't fix the partition without intervention, or to corrupted files in /media/boot (e.g. Image is 0 bytes). These can be fixed by removing the eMMC/SD, connecting it to a different system and running some commands on it. The idea would be to simplify all this: the user flashes a recovery image on a spare SD card, removes the eMMC/SD (C1/C2), boots from the recovery SD card (flip the boot switch on the XU4) and, once the system has booted, inserts the broken eMMC/SD through an adapter in one of the USB ports. Once the card is detected, or the user selects OK on screen, the recovery process should do the following:
  • Gather diagnostic information (there was a discussion when the C2 was launched about a test procedure to validate the hardware works correctly - I can't find the thread, but some of the ideas might be implemented here as well)
  • Identify the eMMC/SD disk (e.g. /dev/sda)
  • Run fsck on all partitions
    Code:
    fsck /dev/sda1
    fsck /dev/sda2
  • Mount the disks
    Code:
    mkdir /media/recovery
    mount /dev/sda2 /media/recovery
    mount /dev/sda1 /media/recovery/media/boot
  • If mounting is fine, chroot into the system and reinstall kernel, uboot, bootini
    Code:
    mount -o bind /dev/ /media/recovery/dev
    mount -o bind /proc /media/recovery/proc
    mount -o bind /sys /media/recovery/sys
    chroot /media/recovery /bin/bash
    apt-get install --reinstall u-boot linux-kernel-c2 bootini
    # note - the u-boot script writes the bootloader on disk, but I don't know how it identifies the disk - it's very likely it will write it to the recovery disk. So, we might run the script manually instead of reinstalling the package
    exit
    sync
    umount /media/recovery/sys
    umount /media/recovery/proc
    umount /media/recovery/dev
    umount /media/recovery/media/boot
    umount /media/recovery
  • After the attempted recovery is done, blink the blue LED in a different pattern (3s off/3s on) to let the user know it's done. This is because some users might not have an HDMI cable/keyboard attached to the Odroid to interact in other ways. At this stage it's safe to pull power and try booting normally again.
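
A minimal sketch of the "recovery finished" LED pattern from the last step, assuming the board LED is exposed under /sys/class/leds and the kernel offers the "timer" LED trigger. The LED name used as the default below is an assumption and may differ per board; signal_done is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: signal "recovery finished" with a slow 3 s on / 3 s off blink.
# Assumption: the LED directory below exists on the board in question.
LED_DIR="${LED_DIR:-/sys/class/leds/blue:heartbeat}"

signal_done() {
    # $1 = LED sysfs directory, $2 = on time in ms, $3 = off time in ms
    led="$1"
    on_ms="${2:-3000}"
    off_ms="${3:-3000}"
    # Selecting the timer trigger makes delay_on/delay_off available.
    echo timer > "$led/trigger" || return 1
    echo "$on_ms" > "$led/delay_on" || return 1
    echo "$off_ms" > "$led/delay_off" || return 1
}

# Usage (as root, as the last action of the recovery script):
#   signal_done "$LED_DIR"
```

Using the kernel trigger rather than a sleep loop means the pattern keeps running even if the recovery script itself has exited.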

Implementation
  • Based on minimal ubuntu image for the relevant platform
  • Vfat partition and rootfs need to have different UUIDs than the standard images to prevent conflicts
  • Auto-update recovery script on startup if network is available
  • Collect logs into a single file in the VFAT partition of the recovery disk. Logs contain things like dmesg, lsusb, lsusb -t, fdisk -l, ifconfig -a, ethtool ethX, df -h, cat /media/recovery/media/boot/boot.ini, ls -l /media/recovery/media/boot/, ls -l /media/recovery/, file /media/recovery/media/boot/*Image. The logs can be posted to pastebin and analysed if the recovery didn't work.
  • Run automatic recovery scripts that output to the same log and hdmi console
  • Leave system running with ssh on by default in case the user needs to do extra debugging/extract extra information
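
The log collection could be sketched roughly as below. The helper names (run_logged, collect_logs) and the example log location are hypothetical; the command list is the one from the bullet above, minus ethtool because the interface name is not known in advance.

```shell
#!/bin/sh
# Sketch: append the output of several diagnostic commands to one log file.

run_logged() {
    # $1 = log file, remaining arguments = command to run
    log="$1"; shift
    {
        echo "===== $* ====="
        if command -v "$1" >/dev/null 2>&1; then
            "$@" 2>&1
            echo "(exit status: $?)"
        else
            # A minimal image may lack some tools; record that and carry on.
            echo "(command not found: $1)"
        fi
        echo
    } >> "$log"
}

collect_logs() {
    # $1 = log file, e.g. a file on the VFAT partition of the recovery disk
    log="$1"
    run_logged "$log" dmesg
    run_logged "$log" lsusb
    run_logged "$log" lsusb -t
    run_logged "$log" fdisk -l
    run_logged "$log" ifconfig -a
    run_logged "$log" df -h
    run_logged "$log" cat /media/recovery/media/boot/boot.ini
    run_logged "$log" ls -l /media/recovery/media/boot/
    run_logged "$log" ls -l /media/recovery/
}

# Usage (the path is only an example):
#   collect_logs /media/boot/recovery.log
```

Every command gets its own header and exit status, so a failing or missing tool does not stop the rest of the collection.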

Now, additional ideas are welcome. For instance, I'm not sure whether the recovery procedure should run unattended by default or require user confirmation. Also, Android systems will not be supported (except for fsck), but as far as I know the Android rootfs is usually mounted read-only and updates are infrequent, so it shouldn't break as easily. I'm also not sure whether, in case of error, we should copy the kernel/dtb/boot.ini/modules/initrd from the rescue disk to the target disk so that it boots next time (we already have them).

We can display progress on-screen with dialog, but we need to select the lowest commonly supported resolution in boot.ini. (Do all TVs/monitors support 640x480?)
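
As a rough illustration of the dialog idea: `dialog --gauge` reads percentages from standard input, so the recovery steps only need to print a number after each stage. The sketch assumes each step is a shell function; progress_steps and the step names in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch: run the given step functions and print a percentage after each.
RECOVERY_LOG="${RECOVERY_LOG:-/dev/null}"

progress_steps() {
    total=$#
    finished=0
    for step in "$@"; do
        # Step output goes to the log so that only numbers reach the gauge.
        "$step" >> "$RECOVERY_LOG" 2>&1
        finished=$((finished + 1))
        echo $((finished * 100 / total))
    done
}

# Usage:
#   progress_steps step_fsck step_mount step_reinstall |
#       dialog --gauge "Recovering..." 7 60 0
```

This sketch does not stop on a failed step; whether it should is part of the attended/unattended question above.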

As I said, ideas welcome.
mad_ady
 
Posts: 1557
Joined: Wed Jul 15, 2015 5:00 pm
Location: Bucharest, Romania
languages_spoken: english
ODROIDs: XU3, C1+, C2

Re: Odroid Recovery Images Brainstorm

Post by crashoverride » Fri Oct 21, 2016 11:35 pm

mad_ady wrote:there was a discussion when the C2 was launched about a test procedure to validate the hardware works correctly - I can't find the thread, but some of the ideas might be implemented here as well

This one?
http://forum.odroid.com/viewtopic.php?f=140&t=18759

USB via OTG to a PC is the ultimate solution. It can boot without any SD card, eMMC, or network. With 2GB of RAM on the device, you can load the most impressive recovery program mankind has ever seen! It would also be possible to use the PC as the display/console with a full featured GUI/Wizard. However, for this to work we need to reverse engineer the boot rom USB boot protocol.

Of interest is the following:
http://www.cnx-software.com/2016/10/06/hacking-arm-trustzone-secure-boot-on-amlogic-s905-soc/
they managed to dump the BootROM from Amlogic S905 SoC.


[edit]
In the absence of USB Host boot (not booting from a USB hdd/flash), an alternative may be a "recovery sd card". Insert the SD card to load the recovery program which runs entirely from RAM and then remove it and insert the sd card to be flashed or repaired.

An alternative to the alternative is to keep 2 copies of everything in the vfat partition, then provide a uboot mechanism such as reading a GPIO pin or detecting a USB key press that will force load the known-good kernel and initrd. The initrd could have a simple busybox shell for cases where Ubuntu can not boot (x86 Ubuntu does this).
crashoverride
 
Posts: 2575
Joined: Tue Dec 30, 2014 8:42 pm
languages_spoken: english
ODROIDs: C1

Re: Odroid Recovery Images Brainstorm

Post by mad_ady » Sat Oct 22, 2016 1:20 am

crashoverride wrote:This one?
http://forum.odroid.com/viewtopic.php?f=140&t=18759

USB via OTG to a PC is the ultimate solution. It can boot without any SD card, eMMC, or network. With 2GB of RAM on the device, you can load the most impressive recovery program mankind has ever seen! It would also be possible to use the PC as the display/console with a full featured GUI/Wizard. However, for this to work we need to reverse engineer the boot rom USB boot protocol.

Of interest is the following:
http://www.cnx-software.com/2016/10/06/hacking-arm-trustzone-secure-boot-on-amlogic-s905-soc/
they managed to dump the BootROM from Amlogic S905 SoC.

Yup, that was the thread, thanks.

First some clarifications: these recovery images should work on all platforms (C1/C2/XU4) and need to be extensible to future Odroid products. Your proposal would work only on the C2 and would have the disadvantage of requiring a PC to inject the boot code. However, it can boot the system even with a broken eMMC and SD slot, which is nice. By any chance do you know if otg boot happens before emmc? If yes, you could fix a broken eMMC without taking the case apart.


crashoverride wrote:In the absence of USB Host boot (not booting from a USB hdd/flash), an alternative may be a "recovery sd card". Insert the SD card to load the recovery program which runs entirely from RAM and then remove it and insert the sd card to be flashed or repaired.

This is what I was thinking of, with the exception of "load everything in RAM". I admit it would save you from needing a USB micro-SD reader, but there's nowhere safe to write the diagnostic log (e.g. think of repairing an SD card with IO errors or with no partition table). I'll take it into consideration, but either I do everything from a big initrd that gets loaded to RAM by uboot, or the rootfs's init loads itself into RAM and chroots into the RAM image... Either way we need to do something about the logs/output.

crashoverride wrote:An alternative to the alternative is to keep 2 copies of everything in the vfat partition, then provide a uboot mechanism such as reading a GPIO pin or detecting a USB key press that will force load the known-good kernel and initrd. The initrd could have a simple busybox shell for cases where Ubuntu can not boot (x86 Ubuntu does this).

This doesn't cover all problems, specifically partitions in need of fsck, which is caused by the lack of a journal. Dropping to busybox on HDMI would be nice for those that have HDMI.

@odroid: Do you have a list with the most common faults (for all odroid products) for devices which were RMA'd to you? Maybe we could concentrate on those issues for faults

Re: Odroid Recovery Images Brainstorm

Post by crashoverride » Sat Oct 22, 2016 10:17 am

mad_ady wrote:Either way we need to do something about the logs/output.

Traditionally, PCs would beep an error code out on the speaker. We could instead "blink" an error code out on the "heartbeat" LED since all boards have it. This covers the early boot stages. After the OS loads, you have many more options:
1) OTG "gadget" serial port console to host PC
2) network logging
3) HDMI reporting
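
Once a shell is available (so not in the earliest boot stages mentioned above), blinking an error code could look roughly like the sketch below. It assumes the LED is exposed through the LED class in sysfs; blink_code and the idea of using the pulse count as the error number are illustrative only.

```shell
#!/bin/sh
# Sketch: blink an LED N times to report error code N, then pause.

blink_code() {
    # $1 = LED sysfs directory, $2 = error code (number of pulses),
    # $3 = pulse length in seconds, $4 = pause after the code in seconds
    led="$1"
    count="$2"
    pulse="${3:-1}"
    gap="${4:-3}"
    # Detach any kernel trigger (e.g. heartbeat) to control the LED by hand.
    echo none > "$led/trigger" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        echo 1 > "$led/brightness"
        sleep "$pulse"
        echo 0 > "$led/brightness"
        sleep "$pulse"
        i=$((i + 1))
    done
    sleep "$gap"
}

# Usage (as root; repeats code 3 until interrupted):
#   while :; do blink_code /sys/class/leds/blue:heartbeat 3; done
```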

mad_ady wrote:By any chance do you know if otg boot happens before emmc?

It is the very first boot attempt. It allows "new born" devices with blank eMMCs to be flashed as they roll off the assembly line using the Amlogic flash tool.

mad_ady wrote:Maybe we could concentrate on those issues for faults

I would hope that SD/eMMC card issues are not statistically significant for returns. "Reflashing" is often the first thing users are told to do.

--

From a diagnostic point of view, once there are filesystem errors you introduce too many "unknowns" to proceed any further. If you are going to fsck then you need a list of each file and its checksum that should be present in order to guarantee its consistency. This is currently not possible for files like boot.ini and some things in /etc (ssh keys should be different on every sd card).
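
For the files that are expected to be identical on every card, such a checksum list could be sketched as follows. This assumes GNU coreutils sha256sum; make_manifest and verify_manifest are hypothetical names, and per-card files such as boot.ini or ssh keys would have to stay out of the manifest for the reason given above.

```shell
#!/bin/sh
# Sketch: record and later verify checksums of files below a root directory.

make_manifest() {
    # $1 = root of the tree, $2 = manifest file (absolute path),
    # remaining arguments = file paths relative to the root
    root="$1"
    manifest="$2"
    shift 2
    ( cd "$root" && sha256sum -- "$@" ) > "$manifest"
}

verify_manifest() {
    # $1 = root of the tree, $2 = manifest file (absolute path)
    # Reports files that are missing or changed; non-zero status if any.
    root="$1"
    manifest="$2"
    ( cd "$root" && sha256sum -c --quiet "$manifest" )
}

# Usage (paths are examples only):
#   make_manifest /media/recovery /tmp/manifest media/boot/Image
#   verify_manifest /media/recovery /tmp/manifest
```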

We also need to ensure we are not treating symptoms. I mentioned in other threads that the current "boot.ini" is too fragile and prone to breakage. It should be static with user options stored externally using check-summed binary elements. System updates should never wipe out user settings. An analogy is that it's like "sudo apt update" clearing the CMOS NVRAM settings on your PC.

Re: Odroid Recovery Images Brainstorm

Post by mad_ady » Sat Oct 22, 2016 3:53 pm

Well, if the filesystem is "mildly" corrupt and only a few files have been truncated, running a blind fsck would make the filesystem consistent at the price of corrupting a few files. If the files are important for boot (kernel, boot.ini, modules), they would be restored by running apt-get install --reinstall and should be consistent.
If other files are corrupted, more manual intervention would be necessary.
The recovery won't be able to fix everything, but it should address the most common problems.
Also, if the data is critical to the user, there will be a warning that the procedure will modify some files and in extreme cases might make the problem worse. :)
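
One way to limit the reinstall to packages that were actually damaged is to ask dpkg which installed files no longer match their recorded checksums. The sketch below is meant to run inside the chroot and assumes a dpkg that supports --verify; it depends on the checksum files under /var/lib/dpkg being intact themselves, and not every package ships them. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: list packages owning files that fail `dpkg --verify`.

damaged_paths() {
    # Reads `dpkg --verify` output on stdin; the path is the last field.
    # Paths containing spaces are not handled in this sketch.
    awk 'NF { print $NF }'
}

damaged_packages() {
    dpkg --verify 2>/dev/null | damaged_paths | while read -r path; do
        # `dpkg -S` prints "package: /path"; keep only the package name.
        dpkg -S "$path" 2>/dev/null | cut -d: -f1
    done | sort -u
}

# Usage (inside the chroot):
#   pkgs=$(damaged_packages)
#   [ -n "$pkgs" ] && apt-get install --reinstall $pkgs
```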

Re: Odroid Recovery Images Brainstorm

Post by stmicro » Sat Oct 22, 2016 5:39 pm

It would be worth researching how RPI implemented the "Recovery mode" (SHIFT keypress) and "Safe mode".
https://github.com/raspberrypi/noobs
stmicro
 
Posts: 195
Joined: Tue Apr 28, 2015 4:23 pm
languages_spoken: english
ODROIDs: 2 x C1+, 2 x C2
1 x XU-L, 2 x XU3-L, 3 x U3, 2 x U2, 9 x XU4

Re: Odroid Recovery Images Brainstorm

Post by mad_ady » Sat Oct 22, 2016 6:37 pm

Thanks stmicro. This:
What to do if your SHIFT keypress isn't detected

Try pressing shift only when the grey splashscreen is displayed rather than holding it from boot up.

How to boot into "Safe Mode"

To boot into a basic busybox shell rather than launching the NOOBS GUI, you can either:

Append rescueshell to the argument list in the recovery.cmdline file which is found in the root NOOBS directory.

Insert a physical jumper between pins 5 & 6 of GPIO header P1. If you have external hardware or an addon board connected to the GPIO header, you may find that pin 5 is being pulled low and accidentally triggering "Safe Mode". To prevent this you can append disablesafemode to the argument list in the recovery.cmdline file which is found in the root NOOBS directory.


... seems doable in the initrd on the official images. If the initrd can already display on screen and take input from a keyboard (and I know that for tripleboot images it does), it would be helpful to have a recovery console with fsck and chroot so that experienced users can fix their devices without pulling the emmc/sd. For bonus points start a dropbear ssh server for remote troubleshooting.

Re: Odroid Recovery Images Brainstorm

Post by rooted » Sat Oct 22, 2016 10:57 pm

stmicro wrote:It would be worth researching how RPI implemented the "Recovery mode" (SHIFT keypress) and "Safe mode".
https://github.com/raspberrypi/noobs

AFAIK XBian was where this feature came from, either way it indeed works.
rooted
 
Posts: 3526
Joined: Fri Dec 19, 2014 9:12 am
Location: Gulf of Mexico, US
languages_spoken: english
ODROIDs: C1
C1+
C2
XU3 Lite
XU4
VU7+
HiFi Shield 2
Smart Power (original)

Re: Odroid Recovery Images Brainstorm

Post by odroid » Sat Nov 05, 2016 10:24 am

We tried to enable the USB keyboard function in u-boot to detect "Shift" or "Alt" key input.
But the current u-boot USB stack supports only high-speed USB 2.0, while most USB keyboards support only the USB 1.0 or 1.1 Low/Full speed protocol. :(
We need to find another way to enter the recovery (safe) mode manually, or improve u-boot to support Low/Full speed USB devices.
odroid
Site Admin
 
Posts: 22329
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English
ODROIDs: ODROID

Re: Odroid Recovery Images Brainstorm

Post by rooted » Sat Nov 05, 2016 2:14 pm

Can gpio be used from uboot?

Re: Odroid Recovery Images Brainstorm

Post by mad_ady » Sat Nov 05, 2016 3:22 pm

@odroid: if usb disks and vfat are supported in uboot then you could have uboot look for a directory called odroid-recovery on all attached disks. If it is found load boot.ini from it.
If the user wants to run recovery they plug in the stick and boot. When they are done they unplug and reboot.

Re: Odroid Recovery Images Brainstorm

Post by crashoverride » Tue Apr 25, 2017 9:20 pm

IIRC, NOOBS actually boots a lightweight kernel. This enumerates the USB bus looking for the key press. If it's not seen then it kexecs the normal kernel.
https://en.wikipedia.org/wiki/Kexec
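
A hedged sketch of the hand-off from a small recovery kernel to the normal one with kexec-tools. It assumes the running kernel was built with kexec support and that the tool is installed; the key-press detection and any device tree handling are left out, the paths in the usage line are placeholders, and boot_normal_kernel is a hypothetical helper. Whether this works on the current Odroid kernels is untested here.

```shell
#!/bin/sh
# Sketch: load another kernel with kexec and jump to it.
# Set DRY_RUN=1 to print the commands instead of executing them.

boot_normal_kernel() {
    # $1 = kernel image, $2 = initrd, $3 = kernel command line
    kernel="$1"
    initrd="$2"
    cmdline="$3"
    run=${DRY_RUN:+echo}
    # Stage the new kernel in memory first...
    $run kexec -l "$kernel" --initrd="$initrd" --append="$cmdline" || return 1
    # ...then execute it without going back through the bootloader.
    $run kexec -e
}

# Usage (paths and arguments are placeholders):
#   boot_normal_kernel /media/boot/Image /media/boot/initrd.img "root=/dev/mmcblk0p2 rootwait"
```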

Re: Odroid Recovery Images Brainstorm

Post by rooted » Tue Apr 25, 2017 9:37 pm

I haven't had good luck with kexec since about 2004; of course, I quit using it when it started randomly failing to initialize some piece of hardware I had.

It works okay now it seems.

