As of the time of this edit, Hardkernel are working on this issue. It turns out the problem has something to do with reading data from multiple USB devices at once, and it happens more quickly the faster you read data from them. The issue causes Ubuntu to stop recognising the N2's USB ports (including the micro-USB port) until you power the device off and on again, but ethernet still works so it's still possible to log in over SSH.
If you just came here looking for a solution, you can safely skip to odroid's latest update as of the time of writing.
How to check if you have this issue
To check if you have this issue, do the following:
- if your N2 crashes too quickly to test, optionally detach all your hard disks before powering it on
- open one terminal for each hard drive attached to your N2. SSH sessions over the built-in ethernet socket are best, but you can press alt+F1, alt+F2 etc. on the console to get multiple sessions
  - this issue causes all USB ports to fail, so you won't be able to connect through a USB keyboard or wireless network dongle once you've triggered the bug
- in one terminal, run the following command and make a note of how many lines it displays:
lsusb -t
- in every open terminal, run:
dmesg -w &
  - this will print a huge amount of debugging information, which you can ignore for now
- once dmesg is running in all the terminals, it's time to trigger the bug:
  - if your N2 crashes on its own just by plugging the disks in, plug them in now and wait
  - if your N2 is stable until you actually use the disks, run the following in each terminal, changing <letter> to a in the first terminal, b in the second, and so on:
dd if=/dev/sd<letter> of=/dev/null status=progress
- watch any open terminal for a little while. You should see a few cryptic error messages from dmesg, then after a few minutes you'll get a message that's several screen-lengths long. This means you've replicated the bug
- if you have an ethernet cable connected to your N2, SSH should still work. Run the following command again, and you should see that all your USB ports are now missing:
lsusb -t
- shut your device down:
  - if you can connect with SSH, shut down in the normal way:
shutdown -h now
  - otherwise, you'll have to unplug the N2
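The manual test above can also be sketched as a shell script that compares the output of lsusb -t before and after reading every disk in parallel. This is a minimal sketch, assuming the disks appear as /dev/sd?, that you run it as root from an SSH session (the USB ports may disappear mid-test), and that GNU coreutils timeout is installed; the function names are hypothetical. The 30-minute default mirrors the rule of thumb below.

```shell
#!/bin/sh
# Hypothetical sketch of the manual test: compare `lsusb -t` before and
# after reading every /dev/sd? disk at the same time.

# Print how many lines `lsusb -t` shows right now (0 if lsusb fails).
usb_tree_lines() {
    lsusb -t 2>/dev/null | wc -l
}

# Succeed if the USB tree now has fewer lines ($2) than before the test ($1).
usb_ports_lost() {
    [ "$2" -lt "$1" ]
}

# Read all disks in parallel for at most $1 seconds (default: 30 minutes).
stress_usb_disks() {
    limit=${1:-1800}
    before=$(usb_tree_lines)
    for disk in /dev/sd?; do
        [ -b "$disk" ] || continue  # pattern stays literal if no disk matched
        timeout "$limit" dd if="$disk" of=/dev/null status=progress &
    done
    wait  # returns once every dd has finished, failed or timed out
    after=$(usb_tree_lines)
    if usb_ports_lost "$before" "$after"; then
        echo "lsusb -t dropped from $before to $after lines: probably this bug"
    else
        echo "no USB ports lost: your issue may be different"
    fi
}
```

For example, save it as usb-test.sh (a hypothetical name), then run ". ./usb-test.sh; stress_usb_disks" as root, keeping dmesg -w running in another terminal so you can still see the kernel messages.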
If the dd processes all fail within about a minute and lsusb -t shows nothing attached afterwards, you probably have the bug described in this thread. If you see the same behaviour but it takes longer to appear (though still less than 30 minutes), you probably have the same issue but your disk might be partially immune for some reason. If the test runs happily for over half an hour, your issue is probably different to this one.
Possible workaround
The following workaround is based on a pair of suggestions from mad_ady. Most people report it fixes the bug, but it also slows down data transfer by 20-50%.
- paste the following commands into a root shell on your N2:
cat > /root/odroid-workaround.sh <<EOF
#!/bin/sh
echo 32 | tee /sys/class/block/?d?/queue/max_sectors_kb > /dev/null
EOF
chmod 755 /root/odroid-workaround.sh
echo 'ACTION=="add", ENV{DEVNAME}=="/dev/?d?", SUBSYSTEM=="block", RUN+="/root/odroid-workaround.sh"' > /etc/udev/rules.d/99-odroid-workaround.rules
- reboot your N2 to ensure the changes have been applied
- optionally rerun the test to make sure this fix works for you
- optionally check the workaround was applied correctly by running:
cat /sys/class/block/?d?/queue/max_sectors_kb
  - you should see one line for each attached disk drive, and each line should read 32 if the fix was applied correctly
- optionally replace 32 with a better value for your device and use case (see "Tuning the workaround", below)
- when odroid come out with a fix, remove this workaround:
rm -f /root/odroid-workaround.sh /etc/udev/rules.d/99-odroid-workaround.rules
  - older versions of this post recommended slightly different workarounds. If you used one of those, you should also run:
rm -f /etc/cron.d/odroid-workaround /etc/modprobe.d/odroid-workaround.conf
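The optional check in the steps above can be turned into a pass/fail script. This is a minimal sketch, assuming the disks appear as /sys/class/block/?d? (the same pattern the workaround itself uses). check_max_sectors is a hypothetical name, and its optional second argument exists only so it can be tried against a fake directory tree.

```shell
#!/bin/sh
# Hypothetical helper: check_max_sectors [EXPECTED] [SYSFS_BLOCK_DIR]
# Prints one line per disk and returns 1 if any disk differs from EXPECTED
# (default 32) or if no disks are found at all.
check_max_sectors() {
    expected=${1:-32}
    root=${2:-/sys/class/block}
    rc=0
    found=0
    for f in "$root"/?d?/queue/max_sectors_kb; do
        [ -r "$f" ] || continue  # pattern stays literal if nothing matched
        found=1
        value=$(cat "$f")
        if [ "$value" = "$expected" ]; then
            echo "ok:   $f = $value"
        else
            echo "FAIL: $f = $value (expected $expected)"
            rc=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no disks found under $root"
        return 1
    fi
    return "$rc"
}
```

For example, check_max_sectors 32 prints an ok or FAIL line for each disk and exits with a non-zero status if any disk has a different value.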
Tuning the workaround
The workaround above sets max_sectors_kb to 32. This setting is the largest request, in kilobytes, that the kernel's block layer will send to a disk, so lowering it splits large transfers into smaller pieces, which is presumably why the workaround slows things down. Although 32 seems to work for most people, some people need a number as low as 10, while others have been able to increase it into the low hundreds before their system became unstable.
We're not really sure why different values work for different people. For example, it might be that faster hard disks need to be slowed down more in order to avoid the bug, or it might be that heavier workloads are more likely to make some internal buffer overflow. We can only suggest that you increase the number if your system seems stable, and reduce it if your system is unstable.
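Because the udev rule reruns /root/odroid-workaround.sh whenever a disk is added, that script is the only place the value needs to change. The following is a minimal sketch of a helper that rewrites the number and reruns the script straight away. It assumes GNU sed (for sed -i) and a script in exactly the format created above, with a single "echo <number> | tee ..." line; set_workaround_value is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: set_workaround_value VALUE [SCRIPT]
# Rewrites the number in the "echo <number> | tee ..." line of the workaround
# script, then reruns the script so the new value applies immediately.
# Assumes GNU sed (for -i) and a script in the format created above.
set_workaround_value() {
    value=$1
    script=${2:-/root/odroid-workaround.sh}
    case $value in
        ''|*[!0-9]*)
            echo "usage: set_workaround_value <positive integer> [script]" >&2
            return 2 ;;
    esac
    if [ "$value" -eq 0 ]; then
        echo "value must be a positive integer" >&2
        return 2
    fi
    sed -i "s/^echo [0-9][0-9]* |/echo $value |/" "$script" || return 1
    sh "$script"
}
```

For example, running set_workaround_value 24 as root applies 24 to the attached disks immediately. The value also persists for disks plugged in later and after a reboot, because the udev rule runs the same script.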
To try different values, edit /root/odroid-workaround.sh and change the number 32 to your preferred value (e.g. 24 or 128). Rerun the script to apply the changes, then check whether the issue still occurs.
Original post (historical interest only - please ignore)
I'm running Ubuntu on an Odroid N2, which crashes in under a minute when syncing a RAID array containing two USB drives. I think I'm supposed to report to you guys to rule out errors in Odroid, Ubuntu or my own reasoning. But I suspect this is a hard drive quirk that should be patched into the kernel.
Steps to reproduce the error
- get an Odroid N2
- install the official Ubuntu 20190329 image on a micro SD card
- attach a Western Digital Gaming drive (USB ID 1058:261e)
- attach a Toshiba External USB 3.0 drive (USB ID 0480:0900)
- create an MD RAID1 array with both devices
- wait a few moments while the devices resync
- expected: the device slowly resyncs
- observed: the system hangs and prints the attached messages to kern.log
Note 1: I haven't been able to trigger the issue by running dd on either device on its own. Doing so would allow me to rule out one device, so I'd appreciate any suggestions.
Note 2: I can provide exact instructions for creating a RAID array, but they will be fairly long and boring, so I'd rather confirm there's no easier way to replicate the problem first.
Workaround and suggested next steps
I've been able to work around the issue like so:
echo 'options usb-storage quirks=0480:0900:g' > /etc/modprobe.d/quirk.conf # either of these will
echo 'options usb-storage quirks=1058:261e:g' > /etc/modprobe.d/quirk.conf # work around the issue
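The kernel's documentation for the usb-storage.quirks parameter lists the g flag as MAX_SECTORS_240 ("don't transfer more than 240 sectors at a time", uas only). That works out to 240 × 512 bytes = 120 KiB per transfer, so it limits request size in much the same way as the max_sectors_kb workaround. Quirk entries are comma-separated VendorID:ProductID:Flags triples. Running both echo commands above would keep only the second one, because each uses >, so applying the quirk to both drives needs a single options line instead. A minimal sketch that builds such a line; build_quirks is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: build_quirks FLAGS VID:PID [VID:PID ...]
# Prints one modprobe "options" line applying FLAGS to every device, using
# the comma-separated VendorID:ProductID:Flags format of usb-storage.quirks.
build_quirks() {
    flags=$1
    shift
    list=
    for id in "$@"; do
        list=${list:+$list,}$id:$flags
    done
    printf 'options usb-storage quirks=%s\n' "$list"
}

build_quirks g 0480:0900 1058:261e
# prints: options usb-storage quirks=0480:0900:g,1058:261e:g
```

modprobe reads options from files in /etc/modprobe.d/. After a reboot, /sys/module/usb_storage/parameters/quirks should show the active list. If usb-storage is built into the kernel rather than loaded as a module, modprobe options are ignored, and the same list would have to go on the kernel command line as usb-storage.quirks=0480:0900:g,1058:261e:g.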
I'm still left with the following possibilities:
- could the Odroid itself be the problem? Has anyone else successfully configured a pair of UAS devices in a RAID array?
- is there a way to trigger this issue on a single device, so we can confirm which one has the issue?
- if we can confirm which device has the issue, would I be right that we're supposed to submit a patch to unusual_uas.h?