one is not like the others

fvolk
Posts: 456
Joined: Sun Jun 05, 2016 11:04 pm
languages_spoken: english
ODROIDs: C2, C4, H2
Has thanked: 0
Been thanked: 42 times

Post by fvolk »

With diskless netboot finally working, I set up my cluster of C4s.
Each has a microSD card with just a single, minimal boot script that loads the kernel and dtb via TFTP. The relevant section looks like this:
setenv autoload no
dhcp
setenv serverip 192.168.1.xx
tftp ${kernel_addr} ${bootfile}
tftp ${fdt_addr} ${fdtfile}
On the server side, I see this in the log:
dnsmasq-dhcp[2818]: 105392093 DHCPDISCOVER(eth0) 00:1e:06:xx:xx:xx
dnsmasq-dhcp[2818]: 105392093 vendor class: U-Boot.armv8
dnsmasq-tftp[2818]: sent /diskless/c2/Image to 192.168.1.yyy
dnsmasq-tftp[2818]: sent /diskless/c2/meson-sm1-odroid-c4.dtb to 192.168.1.yyy
This works on all my C4s... except one. :-/
That one does not want to boot diskless; it gets stuck here:
22:10:29 dnsmasq-dhcp[2818]: 105392573 DHCPDISCOVER(eth0) 00:1e:06:xx:xx:xx
22:10:29 dnsmasq-dhcp[2818]: 105392573 vendor class: U-Boot.armv8
22:11:28 dnsmasq-tftp[2818]: failed sending /diskless/c2/Image to 192.168.1.yyy
22:11:55 dnsmasq-tftp[2818]: failed sending /diskless/c2/meson-sm1-odroid-c4.dtb to 192.168.1.yyy
U-Boot runs DHCP and requests the correct files, but the transfers fail.
After about a minute, TFTP gives up with an error.
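To see at a glance which node and which file fail, without reading the whole log, the dnsmasq-tftp lines can be summarized per client. A minimal Python sketch, assuming the log format shown above (other dnsmasq versions may word these messages differently); the function name summarize_tftp is hypothetical:

```python
import re
from collections import defaultdict

# Matches dnsmasq-tftp lines in the format shown above, e.g.
#   "22:11:28 dnsmasq-tftp[2818]: failed sending /diskless/c2/Image to 192.168.1.yyy"
# Assumption: the wording may differ in other dnsmasq versions.
TFTP_LINE = re.compile(
    r"dnsmasq-tftp\[\d+\]: (?P<status>sent|failed sending) (?P<path>\S+) to (?P<client>\S+)"
)


def summarize_tftp(lines):
    """Group TFTP results per client: {client: {"sent": [...], "failed": [...]}}."""
    summary = defaultdict(lambda: {"sent": [], "failed": []})
    for line in lines:
        match = TFTP_LINE.search(line)
        if match is None:
            continue  # DHCP lines and unrelated messages are skipped
        outcome = "sent" if match["status"] == "sent" else "failed"
        summary[match["client"]][outcome].append(match["path"])
    return dict(summary)
```

Fed the failing log above, it lists both Image and the .dtb under "failed" for 192.168.1.yyy and nothing under "sent".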

I swapped microSD cards between C4s.
I tried different microSD card brands.
I erased and wrote a new microSD card from scratch.
I swapped network cables.
I moved it to a different port on the network switch.
I swapped the power supply (cable).
I rebooted the network switch/modem/router.
...but this one C4 does NOT boot from the network.
However, this one DOES boot without problems from local files and with the official images.

The workaround for now is to copy the kernel Image and dtb onto this node's microSD card and patch its script to:
load mmc ${devno}:1 ${kernel_addr} ${bootfile}
load mmc ${devno}:1 ${fdt_addr} ${fdtfile}
On every upgrade, none of the nodes need local changes except this one, where I have to update these two files manually. I hope I don't forget.
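Since forgetting the manual copy is the main risk of this workaround, a small check could flag stale local copies after each upgrade. A minimal Python sketch, assuming the odd C4's microSD boot partition can be mounted or otherwise reached from the server (not part of the original setup); the function names and any paths are hypothetical:

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so a ~20 MB kernel Image is not read in one go."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_copies(server_dir, local_dir, names):
    """Return the names whose local copy is missing or differs from the server copy."""
    stale = []
    for name in names:
        local = Path(local_dir) / name
        if not local.exists() or sha256_of(local) != sha256_of(Path(server_dir) / name):
            stale.append(name)
    return stale
```

For example, with the boot partition mounted at a hypothetical /mnt/c4-boot, `stale_copies("/diskless/c2", "/mnt/c4-boot/c2", ["Image", "meson-sm1-odroid-c4.dtb"])` would return the files that still need copying (the c2/ subdirectory follows the bootfile and fdtfile names used by the load commands).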

My only theory is that this particular C4 has some manufacturing tolerance that somehow makes networking fail.
It is not a warranty case because otherwise it works just fine.

...but if you have an idea what makes this one C4 different, I would like to hear it; I'm running out of ideas.
(And now I'm considering ordering another C4 AND finally a USB UART as well... another ~100 EUR... *sigh*)

Re: one is not like the others

Post by fvolk »

I bought more C4s.
Again, one from the new shipment does not boot diskless, but its behaviour is slightly different:
9:19:29 dnsmasq-tftp[2759]: sent /z/__diskless/c2/meson-sm1-odroid-c4.dtb to 192.168.1.yyy
9:19:32 dnsmasq-tftp[2759]: failed sending /z/__diskless/c2/Image to 192.168.1.yyy
The first file succeeds, the second file does not? ... wtf...

tobetter
Posts: 5557
Joined: Mon Feb 25, 2013 10:55 am
languages_spoken: Korean, English
ODROIDs: Many
Location: Paju, South Korea
Has thanked: 225 times
Been thanked: 652 times

Re: one is not like the others

Post by tobetter »

Yesterday I uploaded the Petitboot firmware for the ODROID-C4, and it supports PXE boot. It should work with dnsmasq on your server; this is the link to my PXE boot setup for installing Ubuntu. To run Petitboot on the ODROID-C4, you can refer to this link:
viewtopic.php?p=299477#p299477

EDIT: This is my PXE setup for the Netboot Installer from Petitboot:
http://ppa.linuxfactory.or.kr/installer ... C4/default

mad_ady
Posts: 8338
Joined: Wed Jul 15, 2015 5:00 pm
languages_spoken: english
ODROIDs: XU4, C1+, C2, C4, N1, N2, H2, Go, Go Advance
Location: Bucharest, Romania
Has thanked: 574 times
Been thanked: 439 times

Re: one is not like the others

Post by mad_ady »

@fvolk if the problem is consistent, could you do a packet capture on your TFTP server and see who misbehaves and how?

Re: one is not like the others

Post by fvolk »

Hmm... three options to learn more:

1) try PXE boot, but this would require me to figure out how to configure a second IP range for the C4s on the server; currently I have a single PXE range for the H2s
2) finally set up a USB UART console and see whether U-Boot shows anything on the client
3) capture the network packets with Wireshark on the server side

Re: one is not like the others

Post by fvolk »

The C4 that fails:

Code: Select all

## Executing script at 04000000
dwmac.ff3f0000 Waiting for PHY auto negotiation to complete...... done
Speed: 1000, full duplex
BOOTP broadcast 1
BOOTP broadcast 2
DHCP client bound to address 192.168.1.xx (258 ms)
Speed: 1000, full duplex
Using dwmac.ff3f0000 device
TFTP from server 192.168.1.yy; our IP address is 192.168.1.xx
Filename 'c2/Image'.
Load address: 0x11000000
Loading: T T T T #T #T T T T T
Retry count exceeded; starting again
Speed: 1000, full duplex
Using dwmac.ff3f0000 device
TFTP from server 192.168.1.yy; our IP address is 192.168.1.xx
Filename 'c2/meson-sm1-odroid-c4.dtb'.
Load address: 0x20000000
Loading: T T T T T T #T T ###T T
Retry count exceeded; starting again
[rsvmem] get fdtaddr NULL!
rsvmem - reserve memory

Usage:
rsvmem check                   - check reserved memory
rsvmem dump                    - dump reserved memory

rsvmem check failed
active_slot is <NULL>
Unknown command 'store' - try 'help'
No dtbo patitions found
load dtb from 0x1000000 ......
## Flattened Device Tree blob at 20000000
   Booting using the fdt blob at 0x20000000
libfdt fdt_path_offset() returned FDT_ERR_BADSTRUCTURE
No valid dtbo image found
libfdt fdt_getprop(): FDT_ERR_NOTFOUND
[rsvmem] fdt get size #address-cells failed.
   Loading Device Tree to 000000001fff1000, end 000000001ffffb27 ... OK
fdt_find_or_add_subnode: memory: FDT_ERR_BADSTRUCTURE
ERROR: arch-specific fdt fixup failed
 - must RESET the board to recover.

FDT creation failed! hanging...### ERROR ### Please RESET the board ###

The C4 that half-fails:

Code: Select all

## Executing script at 04000000
dwmac.ff3f0000 Waiting for PHY auto negotiation to complete...... done
Speed: 1000, full duplex
BOOTP broadcast 1
DHCP client bound to address 192.168.1.xx (5 ms)
Speed: 1000, full duplex
Using dwmac.ff3f0000 device
TFTP from server 192.168.1.yy; our IP address is 192.168.1.xx
Filename 'c2/Image'.
Load address: 0x11000000
Loading: ###T T #T #T #T ##T #T T T ##T ####
Retry count exceeded; starting again
Speed: 1000, full duplex
Using dwmac.ff3f0000 device
TFTP from server 192.168.1.yy; our IP address is 192.168.1.xx
Filename 'c2/meson-sm1-odroid-c4.dtb'.
Load address: 0x20000000
Loading: T #T ####T #
         2.9 KiB/s
done
Bytes transferred = 47912 (bb28 hex)

All C4s that boot as expected:

Code: Select all

## Executing script at 04000000
dwmac.ff3f0000 Waiting for PHY auto negotiation to complete...... done
Speed: 1000, full duplex
BOOTP broadcast 1
BOOTP broadcast 2
DHCP client bound to address 192.168.1.xx (258 ms)
Speed: 1000, full duplex
Using dwmac.ff3f0000 device
TFTP from server 192.168.1.yy; our IP address is 192.168.1.xx
Filename 'c2/Image'.
Load address: 0x11000000
Loading: #################################################################
         #################################################################
         #################################################################
         #################################################################
         #################################################################
         #################################################################
         #################################################################
         #################################################################
         #################################################################
         #################################################################
         #################################################################
         #################################################################
         #################################################################
         #################################################################
         #################################################################
         #################################################################
         #################################################################
         #################################################################
         #################################################################
         #################################################################
         #################################################################
         #####################
         4.2 MiB/s
done
Bytes transferred = 20341248 (1366200 hex)
Speed: 1000, full duplex
Using dwmac.ff3f0000 device
TFTP from server 192.168.1.yy; our IP address is 192.168.1.xx
Filename 'c2/meson-sm1-odroid-c4.dtb'.
Load address: 0x20000000
Loading: ####
         3.3 MiB/s
done
Bytes transferred = 47912 (bb28 hex)

...all booted from the same microSD card.
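The Loading: lines can also be compared by counting: in U-Boot's TFTP output, '#' marks progress and 'T' marks a timeout. A small Python sketch that counts both, using lines copied verbatim from the logs above; the function name is hypothetical:

```python
def tftp_progress(loading_line):
    """Count progress marks ('#') and timeouts ('T') in a U-Boot TFTP 'Loading:' line."""
    marks = loading_line.split("Loading:", 1)[-1]  # continuation lines have no prefix
    return {"hashes": marks.count("#"), "timeouts": marks.count("T")}


# Lines copied from the logs above.
FAILED_IMAGE = "Loading: T T T T #T #T T T T T"
FAILED_DTB = "Loading: T T T T T T #T T ###T T"
HALF_FAILED_IMAGE = "Loading: ###T T #T #T #T ##T #T T T ##T ####"
HALF_FAILED_DTB_OK = "Loading: T #T ####T #"

for line in (FAILED_IMAGE, FAILED_DTB, HALF_FAILED_IMAGE, HALF_FAILED_DTB_OK):
    print(tftp_progress(line))
# {'hashes': 2, 'timeouts': 10}
# {'hashes': 4, 'timeouts': 10}
# {'hashes': 15, 'timeouts': 10}
# {'hashes': 6, 'timeouts': 3}
```

All three aborted transfers show exactly ten T marks before "Retry count exceeded", even where the board made progress in between. That may mean the retry budget counts timeouts across the whole transfer rather than consecutive ones; this is an inference from these logs only, not something checked against the U-Boot source.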

rooted
Posts: 7910
Joined: Fri Dec 19, 2014 9:12 am
languages_spoken: english
Location: Gulf of Mexico, US
Has thanked: 724 times
Been thanked: 233 times

Re: one is not like the others

Post by rooted »

@tobetter Have you seen this thread?

Re: one is not like the others

Post by tobetter »

rooted wrote (Tue Jul 28, 2020 12:20 pm):
> @tobetter Have you seen this thread?

Yes, I have... and I'm thinking about how I can test it. Personally, I would go for PXE boot with Petitboot, but the OP has a concern about IP ranges. I think the IP ranges can be assigned based on the MAC addresses of the H2 and C4; let me figure it out.

Re: one is not like the others

Post by fvolk »

I think I now know how to set up two different IP/PXE ranges with dnsmasq: it is possible to "tag" requests based on the architecture field in the PXE request and then, in theory, serve different files. But I haven't set this up in practice yet.

But... I'm actually considering dropping PXE and booting the H2s differently too. With one network attached, everything is OK. With two networks, the H2 UEFI first tries one network for ~60 s, times out, and then tries the second (correct) network, where it boots. So it would be nice to restrict the H2 to PXE boot only from one specific network port and not the other; otherwise booting takes a long time :-( ... I also don't want to buy a "smart" switch, although that would also be a solution.

For the C4, I could still try capturing the traffic with Wireshark. Does anyone know how to run tshark and filter traffic on two specific Ethernet MAC addresses?

Re: one is not like the others

Post by mad_ady »

@fvolk: here's what you can try:

Code: Select all

sudo tshark -i enp0s25 -f 'ether host aa:aa:aa:aa:aa:aa or ether host ff:ff:ff:ff:ff:ff' -s 0 -w /tmp/capture.pcap

I see timeouts in your output. Something causes it to lose packets...

brad
Posts: 1176
Joined: Tue Mar 29, 2016 1:22 pm
languages_spoken: english
ODROIDs: C2 N1 N2 N2+ H2 H2+ (64 bit ftw)
Location: Australia
Has thanked: 60 times
Been thanked: 108 times

Re: one is not like the others

Post by brad »

Maybe see if you can adjust the server's TFTP settings. If you are using Ubuntu with tftpd-hpa as the server, you can try adding some options to the TFTP_OPTIONS parameter in /etc/default/tftpd-hpa:

--blocksize 1468 --retransmit 2000000

(This ensures each block fits into a single Ethernet frame without IP fragmentation, and that a block is only retransmitted after 2 seconds.)

You should be able to see in Wireshark whether traffic gets fragmented with a larger block size.
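The 1468 figure is a standard 1500-byte Ethernet MTU minus the IPv4, UDP and TFTP headers. Because plain TFTP sends one block and waits for its ACK before sending the next, the block size also sets how many round trips the ~20 MB kernel needs, and so how many chances a lost packet has to cause a timeout. A small Python sketch of that arithmetic, assuming IPv4 without options and no TFTP windowsize extension, using the file sizes from the U-Boot logs (20341248 and 47912 bytes); the function names are hypothetical:

```python
IPV4_HEADER = 20       # bytes, assuming no IP options
UDP_HEADER = 8
TFTP_DATA_HEADER = 4   # 2-byte opcode + 2-byte block number


def max_unfragmented_blocksize(mtu=1500):
    """Largest TFTP block size whose DATA packet still fits in one IPv4 packet of this MTU."""
    return mtu - IPV4_HEADER - UDP_HEADER - TFTP_DATA_HEADER


def data_packets_needed(file_size, blocksize=512):
    """Number of DATA packets (each acknowledged separately) needed to send a file.

    A TFTP transfer ends with the first DATA packet shorter than the block size,
    so a file that is an exact multiple of the block size needs one extra, empty packet.
    """
    return file_size // blocksize + 1


IMAGE_BYTES = 20341248  # "Bytes transferred" for Image in the working log
DTB_BYTES = 47912       # "Bytes transferred" for the dtb

print(max_unfragmented_blocksize())            # 1468
print(data_packets_needed(IMAGE_BYTES))        # 39730 with the default 512-byte blocks
print(data_packets_needed(IMAGE_BYTES, 1468))  # 13857
```

The Image happens to be an exact multiple of 512 bytes, which is why the default block size needs one trailing empty packet.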

Re: one is not like the others

Post by fvolk »

brad wrote (Tue Jul 28, 2020 5:21 pm):
> Maybe see if you can adjust the server tftp settings, if you are using Ubuntu as the server

I'm using the built-in TFTP server of dnsmasq.

Re: one is not like the others

Post by brad »

fvolk wrote (Tue Jul 28, 2020 6:00 pm):
> brad wrote (Tue Jul 28, 2020 5:21 pm):
>> Maybe see if you can adjust the server tftp settings, if you are using Ubuntu as the server
>
> Using built-in tftp of dnsmasq.

Sorry, I think you mentioned that earlier and I missed it. Try adding these to your /etc/dnsmasq.conf and restart dnsmasq to see if it helps:

tftp-mtu=1468
tftp-no-blocksize

Re: one is not like the others

Post by fvolk »

mad_ady wrote (Tue Jul 28, 2020 3:20 pm):
> @fvolk: here's what you can try:
> sudo tshark -i enp0s25 -f 'ether host aa:aa:aa:aa:aa:aa or ether host ff:ff:ff:ff:ff:ff' -s 0 -w /tmp/capture.pcap
> I see timeouts in your output. Something causes it to lose packets...

Thank you for the template! But I needed an AND, not an OR :-)
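In BPF capture filters, "ether host X" matches frames where X is either the source or the destination. So "ether host <server-mac> and ether host <c4-mac>" keeps only the conversation between the two machines, while "or" keeps everything either of them sends or receives (and, with the ff:ff:ff:ff:ff:ff placeholder, every broadcast frame on the segment). The Python sketch only mimics that matching logic on (source, destination) pairs to show the difference; it is not how tshark works internally, and the MAC addresses are made up:

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def ether_host(mac, frame):
    """Like the BPF primitive 'ether host MAC': MAC is the frame's source or destination."""
    src, dst = frame
    return mac in (src, dst)


def both_hosts(a, b, frame):
    """'ether host A and ether host B': only frames exchanged between A and B."""
    return ether_host(a, frame) and ether_host(b, frame)


def either_host(a, b, frame):
    """'ether host A or ether host B': every frame to or from A or B."""
    return ether_host(a, frame) or ether_host(b, frame)


# Hypothetical, locally administered MAC addresses in lowercase,
# so plain string comparison works.
SERVER = "02:00:00:00:00:01"
C4 = "02:00:00:00:00:02"
OTHER = "02:00:00:00:00:03"

frames = [
    (SERVER, C4),     # e.g. TFTP DATA
    (C4, SERVER),     # e.g. TFTP ACK
    (OTHER, SERVER),  # unrelated traffic to the server
    (C4, BROADCAST),  # e.g. the C4's DHCPDISCOVER
]

print([f for f in frames if both_hosts(SERVER, C4, f)])   # only the first two frames
print([f for f in frames if either_host(SERVER, C4, f)])  # all four frames
```

Note that the AND form also drops the C4's broadcast DHCP packets, which is fine when only the TFTP exchange is of interest.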

Here is a diff of the packet traces: "half booting" on the left, "OK booting" on the right.
xxx is the upstream H2 server, yyy is the C4 client.
In the working case, every block is acknowledged cleanly.
In the non-working case, it seems some packets are not acknowledged?
[Attachment: tftp.png, packet trace diff]

Re: one is not like the others

Post by fvolk »

Hmm... actually it should be easy to reproduce my setup:
1) flash U-Boot 2015.01-00128-g6443fcfcd0 (Mar 18 2020 - 09:49:13)
2) set up a TFTP server and place two dummy files there: the small dtb and a ~20 MB kernel Image
3) place the following boot.ini on the first partition (don't forget to set the IP and filenames)

Code: Select all

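ODROIDC4-UBOOT-CONFIG

# Let the kernel configure its network via DHCP as well
setenv bootargs ip=dhcp

# Load addresses, and file names relative to the TFTP root
setenv kernel_addr 0x11000000
setenv fdt_addr 0x20000000
setenv fdtfile c2/meson-sm1-odroid-c4.dtb
setenv bootfile c2/Image

# Only get an IP address; do not auto-load a file after DHCP
setenv autoload no
dhcp
setenv serverip 192.168.1.FIXME
tftp ${kernel_addr} ${bootfile}
tftp ${fdt_addr} ${fdtfile}

# Boot the kernel with no initrd ("-") and the loaded device tree
booti ${kernel_addr} - ${fdt_addr}
# Only reached if booti returns: fall back to the default boot command
boot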
4) either it loads the files successfully (and then crashes, since they are dummy files), or it gets stuck retrying while loading Image

...how many problem C4s do you have? :-)
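For step 2, any files of roughly the right size should do, since only the TFTP transfer is being tested. A minimal Python sketch that creates them, assuming a writable TFTP root and that the c2/ subdirectory matches the bootfile and fdtfile names in the boot.ini above; the function name is hypothetical. Point it at a scratch TFTP root rather than the one serving real nodes, because it overwrites Image and the dtb:

```python
import os
from pathlib import Path

# Sizes copied from the "Bytes transferred" lines of the working C4's U-Boot log.
DUMMY_FILES = {
    "Image": 20341248,
    "meson-sm1-odroid-c4.dtb": 47912,
}


def make_dummy_files(tftp_root, subdir="c2"):
    """Create random-content stand-ins for the kernel Image and dtb under tftp_root/subdir."""
    target = Path(tftp_root) / subdir
    target.mkdir(parents=True, exist_ok=True)
    created = []
    for name, size in DUMMY_FILES.items():
        path = target / name
        # Not a real kernel or device tree: booting will fail, as expected in step 4.
        path.write_bytes(os.urandom(size))
        created.append(path)
    return created
```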

Re: one is not like the others

Post by fvolk »

brad wrote (Tue Jul 28, 2020 6:24 pm):
> tftp-mtu=1468
> tftp-no-blocksize

From my packet trace, I think 1468 is already being used as the block size.

Re: one is not like the others

Post by mad_ady »

In your non-working case, it looks like xxx sometimes has to send a packet three times before it gets an acknowledgement. Is this the H2 or the H2+? Some people are complaining about the H2+'s drivers.

Can you take the network out of the equation by plugging a non-working C4 directly into the H2 and retrying the TFTP transfer? Do you still get packet loss/retransmissions?
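To check this observation across a whole capture instead of by eye, the TFTP packets could be reduced to (sender, type, block number) tuples, for example by exporting the relevant columns from Wireshark, and then counted. A Python sketch under those assumptions; the tuple format, the function names and the sample trace are hypothetical, and block numbers are assumed not to wrap around (TFTP block numbers are 16-bit):

```python
from collections import Counter


def retransmitted_blocks(trace):
    """Return {block: times_sent} for DATA blocks the server sent more than once."""
    sends = Counter(block for sender, kind, block in trace
                    if sender == "server" and kind == "DATA")
    return {block: count for block, count in sends.items() if count > 1}


def unacked_blocks(trace):
    """Return DATA block numbers for which the client never sent an ACK."""
    sent = {block for sender, kind, block in trace if sender == "server" and kind == "DATA"}
    acked = {block for sender, kind, block in trace if sender == "client" and kind == "ACK"}
    return sorted(sent - acked)


# Hypothetical excerpt: block 2 needs three sends, block 3 is never acknowledged.
trace = [
    ("server", "DATA", 1), ("client", "ACK", 1),
    ("server", "DATA", 2), ("server", "DATA", 2), ("server", "DATA", 2),
    ("client", "ACK", 2),
    ("server", "DATA", 3),
]
print(retransmitted_blocks(trace))  # {2: 3}
print(unacked_blocks(trace))        # [3]
```

Running it on both the half-booting and the OK-booting capture would show whether retransmissions cluster on particular blocks or are spread across the transfer.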

Re: one is not like the others

Post by fvolk »

mad_ady wrote (Tue Jul 28, 2020 7:28 pm):
> In your non-working case it looks like xxx has to sometimes send a packet three times before it gets an acknowledge.
> Is this the H2, or H2+? Some are complaining about H2+'s drivers.
>
> Can you remove the network from the equation with a non-working C4 and plug it directly into the H2 and retry the tftp transfer? Do you still get packet loss/retransmissions?

xxx is an H2 rev-A running dnsmasq 2.81 with its built-in TFTP server. It will be upgraded to an H2+ once the network driver situation has calmed down.

Direct network connection is not an option, sorry, other things on the network are in production.

I've swapped network cables, the network switch and the switch port, tried more and less network load, etc., and it is robustly reproducible with just these two C4s. So far it points to the C4s as the cause? To the network, the C4s differ only in their MAC addresses. If some other component on the network triggered this for specific MAC addresses, that would be weird, but possible, yeah.
