Odroid asked me to start a discussion thread where we can brainstorm the best approach to fixing a broken image with as little user intervention as possible. Below is what I had in mind (and I'll add relevant ideas to the todo list):
Most problems I ran into relate to an fsck that can't fix the partition without intervention, or to corrupted files in /media/boot (e.g. Image is 0 bytes). These can be fixed by removing the eMMC/SD, connecting it to a different system and running some commands on it. The idea is to simplify all this: the user flashes a recovery image onto a spare SD card, removes the broken eMMC/SD (C1/C2), boots from the recovery SD card (flipping the switch on the XU4) and, once the system has booted, inserts the broken eMMC/SD through an adapter into one of the USB ports. Once the card is detected, or the user selects OK on screen, the recovery process should do the following:
- Gather diagnostic information (there was a discussion when the C2 was launched about a test procedure to validate the hardware works correctly - I can't find the thread, but some of the ideas might be implemented here as well)
- Identify the eMMC/SD disk (e.g. /dev/sda)
- Run fsck on all partitions
- Mount the disks
mount /dev/sda2 /media/recovery
mount /dev/sda1 /media/recovery/media/boot
- If mounting works, chroot into the system and reinstall the kernel, u-boot and bootini
mount -o bind /dev/ /media/recovery/dev
mount -o bind /proc /media/recovery/proc
mount -o bind /sys /media/recovery/sys
chroot /media/recovery /bin/bash
apt-get install --reinstall u-boot linux-kernel-c2 bootini
# Note: the u-boot script writes the bootloader to disk, but I don't know how it identifies the disk, so it will very likely write it to the recovery disk. We might have to run the script manually instead of reinstalling the package.
- After the attempted recovery is done, flash the blue LED in a different pattern (3s off/3s on) to let the user know it's finished. Some users might not have an HDMI cable or keyboard attached to the Odroid, so they have no other way to interact with it. At this stage it's safe to pull power and try booting normally again.
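For the LED signal, here is a minimal sketch. It assumes the kernel's "timer" LED trigger is available and that the blue LED shows up as /sys/class/leds/blue:heartbeat. That path is a guess and may differ between boards and kernels, and LED_DIR and signal_done are made-up names:

```shell
#!/bin/sh
# Assumption: sysfs directory of the blue LED; adjust per board/kernel.
LED_DIR="${LED_DIR:-/sys/class/leds/blue:heartbeat}"

# Blink the LED using the kernel's "timer" trigger.
# $1 = on time in ms (default 3000), $2 = off time in ms (default 3000)
signal_done() {
    on_ms="${1:-3000}"
    off_ms="${2:-3000}"
    # Selecting the timer trigger makes delay_on/delay_off appear in sysfs.
    echo timer > "$LED_DIR/trigger" || return 1
    echo "$on_ms" > "$LED_DIR/delay_on"
    echo "$off_ms" > "$LED_DIR/delay_off"
}
```

Calling signal_done with no arguments at the end of the recovery script gives the 3s on/3s off pattern, which should be easy to tell apart from the normal heartbeat.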
- Based on the minimal Ubuntu image for the relevant platform
- The VFAT partition and rootfs need different UUIDs from the standard images to prevent conflicts
- Auto-update recovery script on startup if network is available
- Collect logs into a single file in the VFAT partition of the recovery disk. Logs contain things like dmesg, lsusb, lsusb -t, fdisk -l, ifconfig -a, ethtool ethX, df -h, cat /media/recovery/media/boot/boot.ini, ls -l /media/recovery/media/boot/, ls -l /media/recovery/, file /media/recovery/media/boot/*Image. The logs can be posted to pastebin and analysed if the recovery didn't work.
- Run automatic recovery scripts that output to the same log and to the HDMI console
- Leave system running with ssh on by default in case the user needs to do extra debugging/extract extra information
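The log collection could look roughly like the sketch below. LOG_FILE, log_cmd and collect_logs are made-up names, and the default log path assumes the recovery disk's VFAT partition is mounted at /media/boot. Only some of the commands from the list are included:

```shell
#!/bin/sh
# Assumption: the recovery disk's VFAT partition is mounted at /media/boot.
LOG_FILE="${LOG_FILE:-/media/boot/recovery.log}"

# Run one command and append its output (stdout and stderr) to the log,
# with a header line and the exit status so failures are easy to spot.
log_cmd() {
    {
        echo "===== $* ====="
        "$@"
        echo "(exit status: $?)"
        echo
    } >> "$LOG_FILE" 2>&1
}

# Gather the diagnostics into the single log file.
collect_logs() {
    log_cmd dmesg
    log_cmd lsusb
    log_cmd lsusb -t
    log_cmd fdisk -l
    log_cmd ifconfig -a
    log_cmd df -h
    log_cmd cat /media/recovery/media/boot/boot.ini
    log_cmd ls -l /media/recovery/media/boot/
    log_cmd ls -l /media/recovery/
}
```

A missing tool or an unmounted target just shows up as an error message and a non-zero exit status in the log, so the rest of the collection still runs.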
Now, additional ideas are welcome. For instance, I'm not sure whether the recovery procedure should run unattended by default or require user confirmation. Android systems will not be supported (except for fsck), but as far as I know the Android rootfs is usually mounted read-only and updates are infrequent, so it shouldn't break as easily. I'm also not sure whether, in case of error, we should copy the kernel/dtb/boot.ini/modules/initrd from the rescue disk to the target disk so that it boots next time (we already have them).
We can display progress on screen with dialog, but we need to select the lowest commonly supported resolution in boot.ini. (Do all TVs/monitors support 640x480?)
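On the unattended vs. confirmation question, the fsck step could support both with a switch. This is only a sketch: UNATTENDED, fsck_options, fsck_verdict and check_partition are made-up names. The verdicts follow the exit codes documented in fsck(8): 0 is clean, 1 to 3 mean errors were corrected, 4 to 7 mean errors were left uncorrected, and 8 and above mean fsck itself had a problem.

```shell
#!/bin/sh
# Hypothetical switch: 1 = repair without asking, 0 = let fsck prompt on the console.
UNATTENDED="${UNATTENDED:-1}"

# Print the extra fsck option for the chosen mode (-y answers yes to all questions).
fsck_options() {
    if [ "$UNATTENDED" = "1" ]; then
        echo "-y"
    fi
}

# Turn fsck's exit status into a short verdict for the log.
fsck_verdict() {
    if [ "$1" -eq 0 ]; then
        echo "clean"
    elif [ "$1" -ge 8 ]; then
        echo "fsck could not run properly"
    elif [ "$1" -ge 4 ]; then
        echo "errors left uncorrected"
    else
        echo "errors corrected"
    fi
}

# Check one partition, e.g. check_partition /dev/sda2
check_partition() {
    # Unquoted on purpose: an empty option list must not become an empty argument.
    fsck $(fsck_options) "$1"
    status=$?
    echo "$1: $(fsck_verdict "$status")"
}
```

With UNATTENDED=0 the user needs a keyboard and screen to answer fsck's questions, so unattended is probably the safer default for people running headless.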
As I said, ideas welcome.