[Solved] Odroid-N1 hard crash doing NFS V4 writes

mlinuxguy
Posts: 842
Joined: Thu Feb 28, 2013 10:28 am
languages_spoken: english
ODROIDs: X, X2, XU, XU3, XU4, C1, C1+, C2, N1, USB-IO
Has thanked: 0
Been thanked: 0
Contact:

Post by mlinuxguy »

I crashed the Odroid-N1 hard doing an rsync over NFS.
I had mounted my Ubuntu 17.10.1 Ryzen 1800X Linux server over NFS v4.
I issued the command: rsync -varz /home/odroid/linux /net/linux

After getting through the 'Documentation' section it hung hard.
The last messages logged in /var/log/messages were these:

Code: Select all

Feb 15 19:58:12 odroid NetworkManager[534]: <info>  [1518724692.3125] dhcp4 (eth0):   address 192.168.1.118
Feb 15 19:58:12 odroid NetworkManager[534]: <info>  [1518724692.3126] dhcp4 (eth0):   plen 24 (255.255.255.0)
Feb 15 19:58:12 odroid NetworkManager[534]: <info>  [1518724692.3127] dhcp4 (eth0):   gateway 192.168.1.1
Feb 15 19:58:12 odroid NetworkManager[534]: <info>  [1518724692.3127] dhcp4 (eth0):   server identifier 192.168.1.1
Feb 15 19:58:12 odroid NetworkManager[534]: <info>  [1518724692.3128] dhcp4 (eth0):   lease time 600
Feb 15 19:58:12 odroid NetworkManager[534]: <info>  [1518724692.3129] dhcp4 (eth0):   nameserver '192.168.1.1'
Feb 15 19:58:12 odroid NetworkManager[534]: <info>  [1518724692.3129] dhcp4 (eth0):   nameserver '8.8.8.8'
Feb 15 19:58:12 odroid NetworkManager[534]: <info>  [1518724692.3130] dhcp4 (eth0):   nameserver '8.8.4.4'
Feb 15 19:58:12 odroid NetworkManager[534]: <info>  [1518724692.3130] dhcp4 (eth0): state changed bound -> bound
No, I don't have a serial console UART to log with.
Last edited by mlinuxguy on Wed Feb 21, 2018 2:04 am, edited 2 times in total.

rooted
Posts: 9723
Joined: Fri Dec 19, 2014 9:12 am
languages_spoken: english
Location: Gulf of Mexico, US
Has thanked: 767 times
Been thanked: 526 times
Contact:

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by rooted »

Yeah it would have been nice to get the UART kit with it, I have one but it's inside my CloudShell with the USB hanging out the side.

You can tail the syslog, writing it out to a file, then check the file after a reboot.

Code: Select all

tail -f /var/log/syslog >> /some/dir/log.file
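
A plain redirect may leave the last lines sitting in the page cache when the board locks up. A minimal sketch of a variant that flushes after every line, assuming GNU coreutils 8.24 or newer (where `sync` accepts a file argument); `persist_log` and the output path are made-up names for illustration:

```shell
#!/bin/sh
# persist_log FILE: append each line read from stdin to FILE and flush that
# file to storage right away, so the lines written just before a hard hang
# are less likely to be lost. (Hypothetical helper, not an existing tool.)
persist_log() {
    out=$1
    while IFS= read -r line; do
        printf '%s\n' "$line" >> "$out"
        sync "$out"    # flush only this file; assumes coreutils >= 8.24
    done
}

# Example (not run here): follow the kernel log and keep a local copy.
#   dmesg -w | persist_log /var/tmp/n1-dmesg.log
```

Writing the copy to local storage rather than to the NFS mount under test keeps the logger itself from blocking on the hung mount.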

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by mlinuxguy »

rooted wrote:Yeah it would have been nice to get the UART kit with it, I have one but it's inside my CloudShell with the USB hanging out the side.

You can tail the syslog writing it out then check the file after a reboot.

Code: Select all

tail -f /var/log/syslog >> /some/dir/log.file
I had a tail going just now live and it hung during a 'git clone' writing to the NFS mount
No messages logged.

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by mlinuxguy »

I need someone else to test NFS to see if the N1 also hangs for them

Code: Select all

I'm running the N1 directly to a 1G Rosewill 24-port switch
On another port on that switch is my Ubuntu 17.10 linux server sharing out /home/USER/shared
Mount options are:   mount 192.168.1.121:/home/USER/shared /net
What I've seen so far is:

Code: Select all

rsync -vars /home/odroid/linux /net/linux   # hangs the N1 hard; only the power lights stay on
chown -R odroid:odroid /net/linux           # takes hours for just a few hundred files and was still going when I stopped it
make -j8 Image                              # run within /net/linux; hangs the N1 hard
Nothing is logged to /var/log/messages

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by mlinuxguy »

Here's the "hang" when building the Linux kernel over the NFS mount.
It turns out it's not really a hang: the kernel panicked on a hung task.

Code: Select all

root@odroid:~# [  566.988959] EXT4-fs (sda1): barriers disabled
[  567.051467] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: barrier=0
[  567.147451] NFS: Registering the id_resolver key type
[  567.152574] Key type id_resolver registered
[  567.157343] Key type id_legacy registered
[ 1175.542539] nfs: server 192.168.1.121 not responding, still trying
[ 1175.542621] nfs: server 192.168.1.121 not responding, still trying
[ 1175.554936] nfs: server 192.168.1.121 not responding, still trying
[ 1175.663518] nfs: server 192.168.1.121 not responding, still trying
[ 1175.846635] nfs: server 192.168.1.121 not responding, still trying
[ 1176.056341] nfs: server 192.168.1.121 not responding, still trying

[ 1200.808682] INFO: task fixdep:2773 blocked for more than 120 seconds.
[ 1200.815144]       Not tainted 4.4.112 #6
[ 1200.819098] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1200.826939] fixdep          D ffffff80080857fc     0  2773   2691 0x00000004
[ 1200.834053] Call trace:
[ 1200.836533] [<ffffff80080857fc>] __switch_to+0x94/0xbc
[ 1200.841695] [<ffffff80087e95c4>] __schedule+0x324/0x624
[ 1200.846932] [<ffffff80087e9948>] schedule+0x84/0x98
[ 1200.851837] [<ffffff80087ebf84>] schedule_timeout+0x34/0x210
[ 1200.857510] [<ffffff80087e9268>] io_schedule_timeout+0x70/0xa8
[ 1200.863367] [<ffffff80087ea0c4>] bit_wait_io+0x20/0x64
[ 1200.868530] [<ffffff80087e9d04>] __wait_on_bit+0x74/0xc8
[ 1200.873869] [<ffffff8008166960>] wait_on_page_bit+0x7c/0x88
[ 1200.879471] [<ffffff8008166a84>] __filemap_fdatawait_range+0xb8/0x118
[ 1200.885925] [<ffffff8008166b18>] filemap_fdatawait_range+0x34/0x58
[ 1200.892137] [<ffffff80081685a4>] filemap_write_and_wait_range+0x5c/0x88
[ 1200.898855] [<ffffff80009676dc>] nfs4_file_fsync+0x94/0x1b8 [nfsv4]
[ 1200.905148] [<ffffff80081d8cf4>] vfs_fsync_range+0x94/0xb0

We panic because hung_task_panic is set
[ 1200.958882] Kernel panic - not syncing: hung_task: blocked tasks
[ 1200.964893] CPU0: stopping
[ 1200.964899] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.112 #6
[ 1200.964900] Hardware name: Hardkernel ODROID-N1 (DT)
[ 1200.964901] Call trace:
[ 1200.964906] [<ffffff800808899c>] dump_backtrace+0x0/0x21c
[ 1200.964911] [<ffffff8008088bdc>] show_stack+0x24/0x30
[ 1200.964917] [<ffffff80082d5418>] dump_stack+0x94/0xbc
[ 1200.964922] [<ffffff800808d440>] handle_IPI+0x1e0/0x260
[ 1200.964925] [<ffffff8008080ed8>] gic_handle_irq+0x128/0x180
[ 1200.964928] Exception stack(0xffffff8008c03d60 to 0xffffff8008c03e90)
[ 1200.964932] 3d60: ffffffc0f7eee1c0 00000040ef30a000 ffffffc0f7eee1c0 001857cdcb8e28cb
[ 1200.964936] 3d80: 00ffffffffffffff 00000006bec29cfc 000000000002bbfe 0000000000000000
[ 1200.964939] 3da0: 00000032b5593519 ffffff8008081800 0000000000001000 0000000000000000
[ 1200.964943] 3dc0: 0000000034d5d91d 00000040ef30a000 ffffff8008c0d350 0000000000000000
[ 1200.964946] 3de0: 0000000000000000 0000000000000000 0000000030d00800 000001179dd0e3ef
[ 1200.964950] 3e00: 0000000000000000 ffffffc0f0823400 ffffff8008cddb00 0000000000000000
[ 1200.964953] 3e20: 000001179dc907d1 ffffff8008c8f4e0 0000000002dab000 00000000027ef22c
[ 1200.964956] 3e40: 0000000002b1001c ffffff8008c03e90 ffffff80085c15ac ffffff8008c03e90
[ 1200.964960] 3e60: ffffff80085c15e0 0000000060000145 ffffff8008c03e90 ffffff80085c15ac
[ 1200.964962] 3e80: ffffffffffffffff 0000000000000000
[ 1200.964965] [<ffffff80080827b4>] el1_irq+0xb4/0x140
[ 1200.964971] [<ffffff80085c15e0>] cpuidle_enter_state+0x1cc/0x25c
[ 1200.964973] [<ffffff80085c16e4>] cpuidle_enter+0x34/0x44
[ 1200.964978] [<ffffff80080dfb7c>] call_cpuidle+0x6c/0x74
== The only messages logged on my NFS server
[417391.283107] RPC request reserved 200 but used 268
[426893.545808] RPC request reserved 192 but used 268

and NFS is still running fine on it for other systems.
Sadly none of this told me why the N1 could no longer reach the NFS server, but given that my SSH sessions locked up at the same time,
it's probably something on the networking side of the N1.

== Sure enough the N1 is setup to panic on hung tasks

Code: Select all

root@odroid:~# sysctl -a | grep hung
kernel.hung_task_check_count = 4194304
kernel.hung_task_panic = 1
kernel.hung_task_timeout_secs = 120
kernel.hung_task_warnings = 10
so add this line to /etc/sysctl.conf
root@odroid:~# tail /etc/sysctl.conf
kernel.hung_task_panic=0
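
A line in /etc/sysctl.conf only takes effect once the file is loaded, at boot or via `sysctl -p`. A small sketch that adds the setting idempotently; `set_sysctl_line` is a hypothetical helper name, and the key is matched as a regular expression, so its dots match loosely:

```shell
#!/bin/sh
# set_sysctl_line FILE KEY VALUE: make sure FILE ends up with exactly one
# "KEY=VALUE" line, replacing any earlier setting of KEY.
set_sysctl_line() {
    file=$1 key=$2 value=$3
    touch "$file" || return 1
    scratch=$(mktemp) || return 1
    # Keep every line that does not already set KEY (spaces around "=" allowed).
    grep -v -E "^[[:space:]]*${key}[[:space:]]*=" "$file" > "$scratch"
    printf '%s=%s\n' "$key" "$value" >> "$scratch"
    cat "$scratch" > "$file"
    rm -f "$scratch"
}

# Example (as root, not run here):
#   set_sysctl_line /etc/sysctl.conf kernel.hung_task_panic 0
#   sysctl -p                             # reload /etc/sysctl.conf
#   sysctl -w kernel.hung_task_panic=0    # or change only the running kernel
```

With the panic disabled, the hung-task warnings are still printed, so the blocked call traces remain available for debugging.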

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by rooted »

Good debugging mate.

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by mlinuxguy »

Now that I figured out what was causing the N1 to kernel panic I can get back to trying to build the kernel over NFS mount
To ensure there are no NFS issues I increased the thread count for NFS on my server from the default of 8 to 32

I also changed the mount options for the NFS device to:
mount 192.168.1.121:/home/USER/shared /net -o intr,timeo=10,soft,retrans=5
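
For context on these options: per nfs(5), `timeo` is given in tenths of a second (the TCP default is 600), and with `soft` the client returns an error to the application once `retrans` retransmissions have failed. So `timeo=10` means the client starts retrying after about one second. A small sketch that makes the unit explicit; `nfs_timeo_seconds` is a hypothetical helper:

```shell
#!/bin/sh
# nfs_timeo_seconds OPTS: print the timeo= value found in a comma-separated
# NFS option string, converted from deciseconds to seconds.
nfs_timeo_seconds() {
    t=$(printf '%s\n' "$1" | tr ',' '\n' \
        | sed -n 's/^timeo=\([0-9][0-9]*\)$/\1/p' | tail -n 1)
    [ -n "$t" ] || return 1    # no timeo= option present
    printf '%d.%d\n' $((t / 10)) $((t % 10))
}

# Example:
#   nfs_timeo_seconds "intr,timeo=10,soft,retrans=5"
```

Because a soft mount gives up instead of retrying forever, failures on it surface as "timed out" kernel messages and I/O errors in the application rather than as blocked tasks.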

NFS had issues much further into the build

Code: Select all

[ 2289.842356] nfs: server 192.168.1.121 not responding, timed out
[ 2289.989600] nfs: server 192.168.1.121 not responding, timed out
[ 2289.995609] nfs: server 192.168.1.121 not responding, timed out
[ 2290.001633] nfs: server 192.168.1.121 not responding, timed out
[ 2290.007666] nfs: server 192.168.1.121 not responding, timed out
... continually dumping these messages ...
Eventually it stops after the build processes all die out

Code: Select all

odroid@odroid:/net/linux$ ls -l
ls: cannot open directory '.': Input/output error
odroid@odroid:/net/linux$ ls -l
total 844
-rw-r--r--   1 odroid odroid  18693 Feb 15 20:28 COPYING
-rw-r--r--   1 odroid odroid  97181 Feb 15 20:28 CREDITS
drwxr-xr-x 112 odroid odroid  12288 Feb 15 20:28 Documentation
-rw-r--r--   1 odroid odroid   1992 Feb 15 20:28 Kbuild
-rw-r--r--   1 odroid odroid    252 Feb 15 20:28 Kconfig
-rw-r--r--   1 odroid odroid 338049 Feb 15 20:28 MAINTAINERS
-rw-r--r--   1 odroid odroid  57912 Feb 15 22:17 Makefile

So I'm left with no idea why NFS fails on the N1

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by mlinuxguy »

I switched out the NFS share to my QNAP NAS box in the basement
hoping it would work better than the Ubuntu 17.10 server sharing files to the N1

No, same issue: the NFS server becomes unavailable to the N1.
Yet it's still serving the same NFS share to other systems with no issues.
So it's definitely something in the N1.

Code: Select all

[11874.087249] nfs: server 192.168.1.44 not responding, still trying
[11880.596217] INFO: task git:15118 blocked for more than 120 seconds.
[11880.602511]       Tainted: G        W       4.4.112 #6
[11880.607673] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11880.615516] git             D ffffff80080857fc     0 15118   1004 0x00000001
[11880.622639] Call trace:
[11880.625126] [<ffffff80080857fc>] __switch_to+0x94/0xbc
[11880.630306] [<ffffff80087e95c4>] __schedule+0x324/0x624
[11880.635547] [<ffffff80087e9948>] schedule+0x84/0x98
[11880.640448] [<ffffff80087ebf84>] schedule_timeout+0x34/0x210
[11880.646121] [<ffffff80087e9268>] io_schedule_timeout+0x70/0xa8
[11880.651973] [<ffffff80087ea0c4>] bit_wait_io+0x20/0x64
[11880.657128] [<ffffff80087e9d04>] __wait_on_bit+0x74/0xc8
[11880.662465] [<ffffff8008166960>] wait_on_page_bit+0x7c/0x88
[11880.668054] [<ffffff8008166a84>] __filemap_fdatawait_range+0xb8/0x118
[11880.674510] [<ffffff8008166b18>] filemap_fdatawait_range+0x34/0x58
[11880.680723] [<ffffff8008166b74>] filemap_fdatawait+0x38/0x44
[11880.686407] [<ffffff80081684a0>] filemap_write_and_wait+0x3c/0x64
[11880.692568] [<ffffff8000918908>] nfs_wb_all+0x68/0x158 [nfs]
[11880.698288] [<ffffff800090b928>] nfs_sync_inode+0x28/0x34 [nfs]
[11880.704272] [<ffffff800090dda0>] nfs_getattr+0x11c/0x22c [nfs]
[11880.710136] [<ffffff80081af914>] vfs_getattr_nosec+0x3c/0x58
[11880.715818] [<ffffff80081afb8c>] vfs_fstat+0x40/0x6c
[11880.720820] [<ffffff80081b00ac>] SyS_newfstat+0x28/0x48
[11880.726063] [<ffffff8008082f30>] el0_svc_naked+0x24/0x28
I kill off the terminal window with the git in it and suddenly the N1 can find the NFS server

Code: Select all

[12291.358864] nfs: server 192.168.1.44 not responding, still trying
[12318.218861] nfs: server 192.168.1.44 OK
However it keeps disappearing and reappearing

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by rooted »

Are you using the Hardkernel kernel or something you compiled?

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by mlinuxguy »

rooted wrote:Are you using the Hardkernel kernel or something you compiled?
I've used both, rebooted between them to test various options with no luck

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by rooted »

Have you tried using NFSv3 to determine if it's purely a protocol issue?

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by mlinuxguy »

rooted wrote:Have you tried using NFSv3 to determine if it's purely a protocol issue?
Not yet. I did spend time checking the patches after 4.4.112 to see if any applied, to no avail.

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by mlinuxguy »

Multiple tests of the N1's NFS hang ongoing today
First up is limiting the network speed to 100Mbit (I have 1GBit switches and systems)

It's not the same Rockchip SoC, but I will try this first:
ref: https://git.kernel.org/pub/scm/linux/ke ... 1798fb7d50
arm64: dts: rockchip: limit rk3328-rock64 gmac speed to 100MBit for now
It looks like either the current kernel or the hardware has reliability
issues when the gmac is actually running at 1GBit. In my test-case
it is not able to boot on a nfsroot at this speed, as the system
will always lose the connection to the nfs-server during boot, before
reaching any login prompt and not recover from this.
Testing whether the 1GBit network is unstable (the NFS tests show the N1 losing access to the NFS server),
so I forced 100MBit:

Code: Select all

&gmac {
    phy-supply = <&vcc_phy>;
    phy-mode = "rgmii";
    clock_in_out = "input";
    /* testing if 1GBit is unstable, limit to 100 */
    max-speed = <100>;
    snps,reset-gpio = <&gpio3 15 GPIO_ACTIVE_LOW>;
checking dmesg
[ 11.007871] rk_gmac-dwmac fe300000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
^^^ limited to 100 now to test a kernel build over NFS mount
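
Besides dmesg, the negotiated speed can be read from sysfs. A minimal sketch; `link_speed` is a hypothetical helper and its second argument exists only so the function can be pointed at a fake tree for testing:

```shell
#!/bin/sh
# link_speed IFACE [SYSROOT]: print the negotiated link speed in Mb/s as the
# kernel exposes it under SYSROOT/class/net/IFACE/speed (SYSROOT defaults to /sys).
link_speed() {
    cat "${2:-/sys}/class/net/$1/speed"
}

# Example (not run here):
#   link_speed eth0
#
# A runtime alternative to editing the device tree, assuming the driver
# honours it, would be:
#   ethtool -s eth0 speed 100 duplex full autoneg on
```

The ethtool form is lost on reboot, which makes it convenient for a quick A/B test before committing a device tree change.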

5 minutes into the build and have already made it further without a NFS server gone missing message on the N1
Update: 10 minutes into the build the NFS server goes missing

Code: Select all

[ 1118.407053] nfs: server 192.168.1.121 not responding, timed out
[ 1118.412996] nfs: server 192.168.1.121 not responding, timed out
[ 1118.418947] nfs: server 192.168.1.121 not responding, timed out
[ 1118.424890] nfs: server 192.168.1.121 not responding, timed out
[ 1118.430840] nfs: server 192.168.1.121 not responding, timed out
[ 1118.436783] nfs: server 192.168.1.121 not responding, timed out
[ 1118.442732] nfs: server 192.168.1.121 not responding, timed out
Note: I'm using ssh into the N1 and that has never dropped.

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by mlinuxguy »

Next test on NFS issue is using a USB 3.0 1GBit nic
AmazonBasics USB 3.0 to 10/100/1000 Gigabit Ethernet Adapter

Plugged in new nic, disconnected built-in, ssh'd into new address on USB nic, and remounted NFS via new nic
Let the tests begin.

Code: Select all

[ 1433.328627] usb 6-1: new SuperSpeed USB device number 2 using xhci-hcd
[ 1433.352046] usb 6-1: New USB device found, idVendor=0b95, idProduct=1790
[ 1433.358794] usb 6-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1433.365953] usb 6-1: Product: AX88179
[ 1433.369634] usb 6-1: Manufacturer: ASIX Elec. Corp.
[ 1433.374533] usb 6-1: SerialNumber: 000050B62396A7
[ 1433.753222] ax88179_178a 6-1:1.0 eth1: register 'ax88179_178a' at usb-xhci-hcd.7.auto-1, ASIX AX88179 USB 3.0 Gigabit Ethernet, 00:50:b6:23:96:a7
[ 1433.766353] usbcore: registered new interface driver ax88179_178a
[ 1433.777937] ax88179_178a 6-1:1.0 enx0050b62396a7: renamed from eth1
[ 1433.803884] IPv6: ADDRCONF(NETDEV_UP): enx0050b62396a7: link is not ready
[ 1434.132418] IPv6: ADDRCONF(NETDEV_UP): enx0050b62396a7: link is not ready

root@odroid:~# [ 1564.090594] ax88179_178a 6-1:1.0 enx0050b62396a7: ax88179 - Link status is: 1
[ 1564.103982] IPv6: ADDRCONF(NETDEV_CHANGE): enx0050b62396a7: link becomes ready

root@odroid:~# umount /net
root@odroid:~# [ 1623.841209] rk_gmac-dwmac fe300000.ethernet eth0: Link is Down
The ssh sessions were working fine. I started a 'make clean' on the NFS mount, and we lost the NFS server and the ssh sessions stopped responding.

Code: Select all

[ 1942.525354] nfs: server 192.168.1.121 not responding, timed out
[ 1948.539439] nfs: server 192.168.1.121 not responding, timed out
[ 1954.561562] nfs: server 192.168.1.121 not responding, timed out
[ 1960.575498] nfs: server 192.168.1.121 not responding, timed out
[ 1966.589681] nfs: server 192.168.1.121 not responding, timed out

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by mlinuxguy »

I can't even run iperf over the USB 3.0 NIC without losing my ssh connections into the N1
Nothing logged in serial console

Apparently the USB 3.0 NIC is not a good test option, though it works great for plain ssh sessions.

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by mlinuxguy »

rooted wrote:Have you tried using NFSv3 to determine if it's purely a protocol issue?
Just tested NFSv3 and it lost the NFS server

Code: Select all

  CC      lib/dump_stack.o
  CC      drivers/gpu/drm/drm_ioctl.o
fixdep: error opening config file: include/linux/proportions.h: Input/output error
fixdep: error opening config file: include/net/netns/x_tables.h: Input/output error
scripts/Makefile.build:272: recipe for target 'net/ipv4/udp_tunnel.o' failed

Code: Select all

[ 2793.546458] nfs: server 192.168.1.121 not responding, timed out
root@odroid:~# ps -ef | grep make
root     31740  1022  0 21:16 ttyFIQ0  00:00:00 grep make
root@odroid:~# [ 2811.956970] nfs: server 192.168.1.121 not responding, timed out
The mount options I use are:

Code: Select all

192.168.1.121:/home/mmthomas/shared on /net type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=10,retrans=5,sec=sys,mountaddr=192.168.1.121,mountvers=3,mountport=54655,mountproto=udp,local_lock=none,addr=192.168.1.121)
Using the following mount command

Code: Select all

mount 192.168.1.121:/home/USER/shared /net -o nfsvers=3,intr,timeo=10,soft,retrans=5

crashoverride
Posts: 5747
Joined: Tue Dec 30, 2014 8:42 pm
languages_spoken: english
ODROIDs: C1
Has thanked: 0
Been thanked: 590 times
Contact:

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by crashoverride »

I don't think this issue is exclusive to NFS. I am seeing something similar with VNC. This suggests a network driver issue. I have also noticed some strangeness on SSH sessions but can not yet attribute them to the same issue.

[edit]
Using vino-server, the display will just freeze at some point. I do not see any messages in dmesg or any log that I can find.

[edit2]
"netstat -t" shows the connection as "ESTABLISHED" with a full Send-Q that never empties.

[edit3]
There may be a power related issue affecting this:
https://github.com/rockchip-linux/kernel/issues/27

I noticed the tx_delay in the N1 device tree is exaggerated (0x100). The same entry for RK3399 EVB is 0x28. I tested the 0x28 and 0x7f values, but the network will not even initialize.
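
The stuck Send-Q observation can be turned into a quick check. A minimal sketch that filters `netstat -tn` output for TCP connections with queued, unsent data; `stuck_sendq` is a hypothetical helper and it assumes the usual net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State):

```shell
#!/bin/sh
# stuck_sendq [MIN]: read `netstat -tn` output on stdin and print Send-Q,
# local address, remote address and state for every TCP connection whose
# Send-Q is at least MIN bytes (default 1).
stuck_sendq() {
    awk -v min="${1:-1}" \
        '$1 ~ /^tcp/ && $3 + 0 >= min + 0 { print $3, $4, $5, $6 }'
}

# Example (not run here): sample twice a few seconds apart; a connection that
# shows the same large Send-Q both times is not draining.
#   netstat -tn | stuck_sendq 1000
```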

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by mlinuxguy »

crashoverride wrote:I don't think this issue is exclusive to NFS. I am seeing something similar with VNC. This suggests a network driver issue. I have also noticed some strangeness on SSH sessions but can not yet attribute them to the same issue.
I see this logged on my NFS server when nfsd debugging is enabled
rpcdebug -m nfsd all <--- enable debugging NFSD to syslog
rpcdebug -m nfsd -c all <---- disable NFSD debug logging

Code: Select all

Feb 18 16:56:11 linux-server kernel: [688452.714605] nfsd4_sequence: slotid 0
Feb 18 16:56:11 linux-server kernel: [688452.714605] check_slot_seqid enter. seqid 64162 slot_seqid 64161
Feb 18 16:56:11 linux-server kernel: [688452.714606] nfsv4 compound op ffff892d1f07f080 opcnt 4 #1: 53: status 0
Feb 18 16:56:11 linux-server kernel: [688452.714607] nfsv4 compound op #2/4: 22 (OP_PUTFH)
Feb 18 16:56:11 linux-server kernel: [688452.714608] nfsd: fh_verify(36: 01070001 031003e3 00000000 1fd03ac1 ff4b4c16 ce1a1598)
Feb 18 16:56:11 linux-server kernel: [688452.714611] nfsv4 compound op ffff892d1f07f080 opcnt 4 #2: 22: status 0
Feb 18 16:56:11 linux-server kernel: [688452.714612] nfsv4 compound op #3/4: 4 (OP_CLOSE)
Feb 18 16:56:11 linux-server kernel: [688452.714612] NFSD: nfsd4_close on file .ftrace.o.d
Feb 18 16:56:11 linux-server kernel: [688452.714613] NFSD: nfs4_preprocess_seqid_op: seqid=0 stateid = (5a8a0101/48f2359d/00004e75/00000001)
Feb 18 16:56:11 linux-server kernel: [688452.714616] nfsv4 compound op ffff892d1f07f080 opcnt 4 #3: 4: status 0
Feb 18 16:56:11 linux-server kernel: [688452.714617] nfsv4 compound op #4/4: 9 (OP_GETATTR)
Feb 18 16:56:11 linux-server kernel: [688452.714618] nfsd: fh_verify(36: 01070001 031003e3 00000000 1fd03ac1 ff4b4c16 ce1a1598)
Feb 18 16:56:11 linux-server kernel: [688452.714620] nfsv4 compound op ffff892d1f07f080 opcnt 4 #4: 9: status 0
Feb 18 16:56:11 linux-server kernel: [688452.714620] nfsv4 compound returned 0
Feb 18 16:56:11 linux-server kernel: [688452.714621] --> nfsd4_store_cache_entry slot ffff892d34c07000
^^^ normal sequence
Then the N1 starts logging NFS server not responding

Code: Select all

Feb 18 16:56:51 linux-server kernel: [688491.969262] NFSD: laundromat service - starting
Feb 18 16:56:51 linux-server kernel: [688491.969266] NFSD: laundromat_main - sleeping for 50 seconds
Feb 18 16:57:42 linux-server kernel: [688543.169397] NFSD: laundromat service - starting
Feb 18 16:57:42 linux-server kernel: [688543.169401] NFSD: purging unused client (clientid 48f2359d)
Feb 18 16:57:42 linux-server kernel: [688543.169403] NFSD: nfs4_make_rec_clidname for Linux NFSv4.2 odroid
Feb 18 16:57:42 linux-server kernel: [688543.169427] NFSD: nfsd4_unlink_clid_dir. name af765c8f527f47d9e0fd40d5663fc7ba
Feb 18 16:57:42 linux-server kernel: [688543.217509] NFSD: laundromat_main - sleeping for 90 seconds
Feb 18 16:59:12 linux-server kernel: [688633.281271] NFSD: laundromat service - starting
Feb 18 16:59:12 linux-server kernel: [688633.281275] NFSD: laundromat_main - sleeping for 90 seconds
And the server starts logging that the session isn't found

Code: Select all

Feb 18 16:59:38 linux-server kernel: [688659.809882] nfsv4 compound op #1/4: 53 (OP_SEQUENCE)
Feb 18 16:59:38 linux-server kernel: [688659.809884] __find_in_sessionid_hashtbl: 1518993665:1223832989:35:0
Feb 18 16:59:38 linux-server kernel: [688659.809885] nfsv4 compound op #1/4: 53 (OP_SEQUENCE)
Feb 18 16:59:38 linux-server kernel: [688659.809886] __find_in_sessionid_hashtbl: session not found
Feb 18 16:59:38 linux-server kernel: [688659.809889] __find_in_sessionid_hashtbl: 1518993665:1223832989:35:0
Feb 18 16:59:38 linux-server kernel: [688659.809891] nfsv4 compound op ffff892cc266b080 opcnt 4 #1: 53: status 10052
Feb 18 16:59:38 linux-server kernel: [688659.809892] nfsv4 compound returned 10052
Feb 18 16:59:38 linux-server kernel: [688659.809893] __find_in_sessionid_hashtbl: session not found
Feb 18 16:59:38 linux-server kernel: [688659.809894] nfsd_dispatch: vers 4 proc 1

elatllat
Posts: 1896
Joined: Tue Sep 01, 2015 8:54 am
languages_spoken: english
ODROIDs: XU4, N1, N2, C4, N2+, HC4
Has thanked: 72 times
Been thanked: 139 times
Contact:

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by elatllat »

If acting only as an NFS server, you would not need VPU support etc., so you might be able to boot 4.14 LTS from kernel.org to see if the issue was caused by Linaro:

Code: Select all

> git diff --stat remotes/kernel/linux-4.4.y odroidn1-4.4.y | grep -i nfs
 fs/kernfs/file.c                                   |      2 +-
 fs/nfs/direct.c                                    |      4 +-
 fs/nfs/nfs4idmap.c                                 |      6 +-
 fs/nfs/pnfs.c                                      |      4 +-
 fs/nfs/write.c                                     |      2 -
> git diff remotes/kernel/linux-4.4.y odroidn1-4.4.y -- fs/nfs/write.c fs/nfs/pnfs.c fs/nfs/nfs4idmap.c fs/nfs/direct.c
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 5fd3cf5..4b1d08f 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -787,8 +787,10 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
 
        spin_lock(&dreq->lock);
 
-       if (test_bit(NFS_IOHDR_ERROR, &hdr->flags))
+       if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) {
+               dreq->flags = 0;
                dreq->error = hdr->error;
+       }
        if (dreq->error == 0) {
                nfs_direct_good_bytes(dreq, hdr);
                if (nfs_write_need_commit(hdr)) {
diff --git a/fs/nfs/nfs4idmap.c b/fs/nfs/nfs4idmap.c
index 1ee62e6..5ba22c6 100644
--- a/fs/nfs/nfs4idmap.c
+++ b/fs/nfs/nfs4idmap.c
@@ -567,13 +567,9 @@ static int nfs_idmap_legacy_upcall(struct key_construction *cons,
        struct idmap_msg *im;
        struct idmap *idmap = (struct idmap *)aux;
        struct key *key = cons->key;
-       int ret = -ENOKEY;
-
-       if (!aux)
-               goto out1;
+       int ret = -ENOMEM;
 
        /* msg and im are freed in idmap_pipe_destroy_msg */
-       ret = -ENOMEM;
        data = kzalloc(sizeof(*data), GFP_KERNEL);
        if (!data)
                goto out1;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index c8e75e5..7af7bed 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1943,7 +1943,7 @@ pnfs_write_through_mds(struct nfs_pageio_descriptor *desc,
                nfs_pageio_reset_write_mds(desc);
                mirror->pg_recoalesce = 1;
        }
-       hdr->completion_ops->completion(hdr);
+       hdr->release(hdr);
 }
 
 static enum pnfs_try_status
@@ -2058,7 +2058,7 @@ pnfs_read_through_mds(struct nfs_pageio_descriptor *desc,
                nfs_pageio_reset_read_mds(desc);
                mirror->pg_recoalesce = 1;
        }
-       hdr->completion_ops->completion(hdr);
+       hdr->release(hdr);
 }
 
 /*
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 6e81a5b..7a9b6e3 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1746,8 +1746,6 @@ static void nfs_commit_release_pages(struct nfs_commit_data *data)
                set_bit(NFS_CONTEXT_RESEND_WRITES, &req->wb_context->flags);
        next:
                nfs_unlock_and_release_request(req);
-               /* Latency breaker */
-               cond_resched();
        }
        nfss = NFS_SERVER(data->inode);
        if (atomic_long_read(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH)
Last edited by elatllat on Wed Feb 21, 2018 8:32 pm, edited 1 time in total.

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by mlinuxguy »

elatllat wrote:If acting only as an NFS server, you would not need VPU support etc., so you might be able to boot 4.14 LTS from kernel.org to see if the issue was caused by Linaro:
I'll try that. I attempted to pull in later versions of the NIC driver, but there were too many changes for it to work on this kernel.

The N1 is actually an NFS client of my NFS server, using its 4TB disk for kernel builds on the N1.

tkaiser
Posts: 781
Joined: Mon Nov 09, 2015 12:30 am
languages_spoken: english
ODROIDs: C1+, C2, XU4, HC1
Has thanked: 2 times
Been thanked: 25 times
Contact:

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by tkaiser »

crashoverride wrote:I noticed the tx_delay in the N1 device tree is exaggerated (0x100). The same entry for RK3399 EVB is 0x28. I tested the 0x28 and 0x7f values, but the network will not even initialize.
Did anyone already try to 'learn' from RK3328? Should be the same GbE IP block, same kernel and maybe same problems?

On RK3328 in the beginning we ended up with this https://github.com/ayufan-rock64/linux- ... 64-offload and a month ago the result of testing for ideal TX/RX delay settings via https://github.com/ayufan-rock64/linux- ... elays-test resulted in new settings -- see 'gmac2io works stable only in thresh dma mode, and with 0x24/0x18 delays' and some commits below: https://github.com/ayufan-rock64/linux- ... elease-4.4

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by mlinuxguy »

tkaiser wrote:
crashoverride wrote:I noticed the tx_delay in the N1 device tree is exaggerated (0x100). The same entry for RK3399 EVB is 0x28. I tested the 0x28 and 0x7f values, but the network will not even initialize.
Did anyone already try to 'learn' from RK3328? Should be the same GbE IP block, same kernel and maybe same problems?
I'm testing the 1st one right now

Code: Select all

root@odroid:~# ethtool -K eth0 rx on tx on ----> on
root@odroid:~# ethtool -K eth0 rx off tx off ----> off
root@odroid:~# diff on off
2,4c2,4
< rx-checksumming: on
< tx-checksumming: on
<       tx-checksum-ipv4: on
---
> rx-checksumming: off
> tx-checksumming: off
>       tx-checksum-ipv4: off
6c6
<       tx-checksum-ipv6: on
---
>       tx-checksum-ipv6: off
On the other item, it looks like we will need a similar test script to find the optimal values for the N1.
=== currently in odroid-n1 device tree:

Code: Select all

    tx_delay = <0x100>;
    rx_delay = <0x11>;
=== 3328 device tree fix:

Code: Select all

+	tx_delay = <0x24>;
+	rx_delay = <0x18>;
I have no idea how HK determined what to set these values to. So it warrants testing
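Before experimenting with other delay values, it can help to confirm what the running kernel actually loaded. The following is only a minimal sketch, assuming the GMAC node exposes `tx_delay` and `rx_delay` as single 32-bit big-endian cells somewhere under /proc/device-tree. The helper name `dt_cell_hex` is made up for illustration and is not something from this thread.

```shell
#!/bin/sh
# Print a single-cell (32-bit, big-endian) device tree property as hex.
# Hypothetical helper: the name and the idea of reading the property
# from /proc/device-tree are assumptions, not taken from the thread.
dt_cell_hex() {
    # od prints the first four bytes as hex pairs, e.g. " 00 00 01 00";
    # stripping whitespace leaves "00000100".
    bytes=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "${#bytes}" -ne 8 ]; then
        echo "unexpected property size in $1" >&2
        return 1
    fi
    printf '0x%x\n' "$((0x$bytes))"
}
```

On a live board it could be used roughly like this, letting `find` locate the properties instead of guessing the node path: `find /proc/device-tree -name tx_delay -o -name rx_delay | while read -r f; do echo "$f = $(dt_cell_hex "$f")"; done`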

mlinuxguy

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by mlinuxguy »

:o I just completed the first NFS share build of the linux kernel on the N1 :lol:
The tip of the hat goes to tkaiser for mentioning the fixes that were required for the 3328 (see his post up thread).

Code: Select all

real    29m9.928s
user    108m10.535s
sys     14m43.612s
odroid@odroid:/net/linux$

The key is to disable check-summing by running this command:

Code: Select all

# ethtool -K eth0 rx off tx off
I think we should also attempt to find the optimal values for these settings:

Code: Select all

=== currently in odroid-n1 device tree:
    tx_delay = <0x100>;
    rx_delay = <0x11>;
You can reference the range/single scripts that help arrive at optimal values here:
https://github.com/ayufan-rock64/linux- ... elays-test
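To confirm that the checksumming workaround from this post really took effect, the feature list printed by `ethtool -k` (lowercase k) can be checked. This is just a sketch: the function name `csum_state` is hypothetical, and it assumes the output contains `rx-checksumming:` and `tx-checksumming:` lines like the ones in the diff earlier in the thread.

```shell
#!/bin/sh
# Read `ethtool -k <iface>` output on stdin and print the rx/tx checksum state.
# Hypothetical helper; only the two top-level checksumming lines are inspected,
# the tab-indented sub-features (tx-checksum-ipv4 and so on) are ignored.
csum_state() {
    awk -F': *' '
        $1 == "rx-checksumming" { split($2, w, " "); rx = w[1] }
        $1 == "tx-checksumming" { split($2, w, " "); tx = w[1] }
        END { printf "rx=%s tx=%s\n", rx, tx }
    '
}
```

Usage would be something like `ethtool -k eth0 | csum_state`. The setting is not persistent across reboots, which is presumably why it ends up in a boot script.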

tkaiser
Posts: 781
Joined: Mon Nov 09, 2015 12:30 am
languages_spoken: english
ODROIDs: C1+, C2, XU4, HC1
Has thanked: 2 times
Been thanked: 25 times
Contact:

Re: Odroid-N1 hard crash doing NFS V4 writes

Post by tkaiser »

mlinuxguy wrote:3328 device tree fix
It's not just adopting some DT values but ayufan added a bunch of patches from Martin Blumenstingl related to PHY (RTL8211F) behaviour prior to that. And I've to admit that I have not even checked which RTL8211 variant is on the N1 :D

And to test conveniently like ayufan for ideal tx/rx delays would require backporting 'DT overlays through configfs' patches too. But since Hardkernel's N1 kernel and ayufan's for ROCK64 have the same origin it should be easy to just throw in all his patches...

crashoverride
Posts: 5747
Joined: Tue Dec 30, 2014 8:42 pm
languages_spoken: english
ODROIDs: C1
Has thanked: 0
Been thanked: 590 times
Contact:

Re: [Solved] Odroid-N1 hard crash doing NFS V4 writes

Post by crashoverride »

I don't think it's been determined whether this is:
1) a hardware flaw affecting all RK33xx,
2) a software flaw in the 4.4 BSP, or
3) a design flaw related to power.

Disabling checksumming and tweaking RX/TX delay suggests it's more of a "work-around" than a "fix".

tkaiser

Re: [Solved] Odroid-N1 hard crash doing NFS V4 writes

Post by tkaiser »

crashoverride wrote:Disabling checksumming and tweaking RX/TX delay suggests it's more of a "work-around" than a "fix".
I agree wrt checksumming: a real fix would be far better than the workaround we have been using for ages on RK3328, since without checksum offloading CPU utilization is unnecessarily high.

But wrt TX/RX delays this is board specific and needs adjustments. For example we realized that Cloudmedia Transformer (which is to ROCK64 what HC1 is to XU4) needs different TX/RX delay adjustments due to different PCB trace routing.

Anyway: I would assume both ayufan's kernel fork for RK3328 and Hardkernel's for RK3399 are based on the same RK 4.4 BSP. And ayufan recently pulled in a lot of patches related to RTL8211F PHY control (used on ROCK64 production boards and N1) and came up with better settings than before (IIRC +10-20 MB/s in both directions with 'real world' NAS workloads).

BTW: All these RTL8211F related patches by Martin Blumenstingl [1] were created on ODROID C2 :) He's the guy who did a lot of Amlogic related work for the mainline kernel around Ethernet and USB. And funnily I met him in person a few weeks ago. He now owns my former ODROID C1+, so maybe we will see a bunch of patches improving stuff with S805 soon too :)

[1] see https://github.com/ayufan-rock64/linux- ... xdarklight or https://github.com/ayufan-rock64/linux- ... elease-4.4 and there the commits by 'xdarklight'

tkaiser

Re: [Solved] Odroid-N1 hard crash doing NFS V4 writes

Post by tkaiser »

These commits should shed some light on one issue:

https://github.com/ayufan-rock64/linux- ... 1d6444f306
https://github.com/ayufan-rock64/linux- ... 3e1186b948

TL;DR: delays can be configured at both the MAC (SoC) and the PHY (RTL8211F) and when it's done wrong performance is trashed or no network connection can be established at all.

BTW: Similar 'delay related problems' are being discussed right now over at Pine64 IRC. Check the IRC log starting at 15:55: http://irc.pine64.uk

crashoverride

Re: [Solved] Odroid-N1 hard crash doing NFS V4 writes

Post by crashoverride »

Disabling check-summing also appears to resolve my issue with VNC:

Code: Select all

sudo ethtool -K eth0 rx off tx off
[edit]
It also appears to correct the "strangeness" I reported in SSH sessions.

tkaiser

Re: [Solved] Odroid-N1 hard crash doing NFS V4 writes

Post by tkaiser »

crashoverride wrote:Disabling check-summing also appears to resolve my issue with VNC ... It also appears to correct the "strangeness" I reported in SSH sessions.
Sure, known problem. I asked Xalius before and he said that a patch for the mainline kernel might be floating around that really solves the problem, so the work-around would no longer be needed. That would be great, since all GbE processing could be sent to the little cores once checksum offloading works correctly. At the moment, with RK33xx on both the BSP and mainline kernels, we have either unreliable Gigabit Ethernet or CPU utilization that is too high.

mdrjr
Site Admin
Posts: 11821
Joined: Fri Feb 22, 2013 11:34 pm
languages_spoken: english, portuguese
ODROIDs: -
Location: Brazil
Has thanked: 1 time
Been thanked: 52 times
Contact:

Re: [Solved] Odroid-N1 hard crash doing NFS V4 writes

Post by mdrjr »

Hi mlinuxguy,

Did you use jumbo frames in your testing, or the standard MTU (1500)?

mlinuxguy

Re: [Solved] Odroid-N1 hard crash doing NFS V4 writes

Post by mlinuxguy »

mdrjr wrote:Did you use jumbo frames in your testing, or the standard MTU (1500)?
Standard.
I was unable to get jumbo frames to work with the current NIC driver.
ref: viewtopic.php?f=153&t=30167

tkaiser

Re: [Solved] Odroid-N1 hard crash doing NFS V4 writes

Post by tkaiser »

@mlinuxguy: If you run a quick iperf3 test between your NFS server and N1... do you get past 200 Mbits/sec in RX direction? It's just starting 'iperf3 -s' on the N1 and then 'iperf3 -c $n1-ip-address' on the NFS server. The default 10s duration should be ok. When I test the other direction (adding '-R' to iperf3 client call) then everything is fine but in RX direction at least my board sucks.
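For comparing results across boards, the receiver summary line can be pulled out of the iperf3 text output. A rough sketch only: the function name `iperf_receiver_mbits` is hypothetical, and it assumes the summary is reported in Mbits/sec and ends with the word `receiver`, as in the default human-readable output.

```shell
#!/bin/sh
# Read iperf3 text output on stdin and print the bandwidth figure from the
# summary line that ends in "receiver". Hypothetical helper; it only handles
# results reported in Mbits/sec and prints nothing for other units.
iperf_receiver_mbits() {
    awk '$NF == "receiver" {
        for (i = 2; i <= NF; i++)
            if ($i == "Mbits/sec") { print $(i - 1); exit }
    }'
}
```

With the server output saved to a file (the file name here is just an example), `iperf_receiver_mbits < iperf3-server.log` prints a single number that can be compared against the 200 Mbits/sec figure mentioned above.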

mlinuxguy

Re: [Solved] Odroid-N1 hard crash doing NFS V4 writes

Post by mlinuxguy »

tkaiser wrote:@mlinuxguy: If you run a quick iperf3 test between your NFS server and N1... do you get past 200 Mbits/sec in RX direction? It's just starting 'iperf3 -s' on the N1 and then 'iperf3 -c $n1-ip-address' on the NFS server.
Odroid-N1 receiving from NFS server
Note: I'm running a custom kernel with minor additions nothing related to network
root@odroid:~# iperf3 -s

Code: Select all

-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.1.121, port 39396
[  5] local 192.168.1.118 port 5201 connected to 192.168.1.121 port 39398
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec   108 MBytes   902 Mbits/sec
[  5]   1.00-2.00   sec   112 MBytes   941 Mbits/sec
[  5]   2.00-3.00   sec   112 MBytes   941 Mbits/sec
[  5]   3.00-4.00   sec   112 MBytes   941 Mbits/sec
[  5]   4.00-5.00   sec   112 MBytes   941 Mbits/sec
[  5]   5.00-6.00   sec   112 MBytes   941 Mbits/sec
[  5]   6.00-7.00   sec   112 MBytes   941 Mbits/sec
[  5]   7.00-8.00   sec   112 MBytes   941 Mbits/sec
[  5]   8.00-9.00   sec   112 MBytes   941 Mbits/sec
[  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec
[  5]  10.00-10.04  sec  4.31 MBytes   940 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.04  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-10.04  sec  1.10 GBytes   937 Mbits/sec                  receiver
-----------------------------------------------------------
Note: I have this script that runs at boot

Code: Select all

root@odroid:~# more set_perf.sh
#!/bin/bash
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance > /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
echo 1512000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 1992000 > /sys/devices/system/cpu/cpu4/cpufreq/scaling_min_freq
echo 2 > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo 3 > /sys/class/net/eth0/queues/tx-0/xps_cpus
sudo ethtool -K eth0 rx off tx off
echo performance | sudo tee /sys/module/pcie_aspm/parameters/policy
echo 20000 | sudo tee /sys/devices/virtual/thermal/thermal_zone0/trip_point_0_temp
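The values written to `rps_cpus` and `xps_cpus` in the script above are hexadecimal CPU bitmasks, so `2` selects cpu1 and `3` selects cpu0 and cpu1. A small sketch for decoding such a mask follows; the function name `mask_to_cpus` is hypothetical, and it assumes a plain hex value without the comma-separated groups that appear on systems with more than 32 CPUs.

```shell
#!/bin/sh
# Expand a hexadecimal CPU bitmask (as written to rps_cpus / xps_cpus)
# into a comma-separated list of CPU numbers. Hypothetical helper;
# assumes a single hex word with no "0x" prefix and no comma groups.
mask_to_cpus() {
    mask=$((0x$1))
    cpu=0
    list=""
    while [ "$mask" -gt 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            # Append the CPU number, adding a comma only after the first entry.
            list="$list${list:+,}$cpu"
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$list"
}
```

For example, `mask_to_cpus "$(cat /sys/class/net/eth0/queues/rx-0/rps_cpus)"` would show which cores currently handle receive packet steering.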
The other direction test, N1 sending to NFS server
root@odroid:~# iperf3 -c 192.168.1.121

Code: Select all

Connecting to host 192.168.1.121, port 5201
[  4] local 192.168.1.118 port 51706 connected to 192.168.1.121 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   112 MBytes   939 Mbits/sec    0    351 KBytes
[  4]   1.00-2.00   sec   112 MBytes   942 Mbits/sec    0    351 KBytes
[  4]   2.00-3.00   sec   112 MBytes   941 Mbits/sec    0    351 KBytes
[  4]   3.00-4.00   sec   112 MBytes   942 Mbits/sec    0    366 KBytes
[  4]   4.00-5.00   sec   112 MBytes   941 Mbits/sec    0    366 KBytes
[  4]   5.00-6.00   sec   112 MBytes   941 Mbits/sec    0    366 KBytes
[  4]   6.00-7.00   sec   112 MBytes   942 Mbits/sec    0    383 KBytes
[  4]   7.00-8.00   sec   112 MBytes   941 Mbits/sec    0    383 KBytes
[  4]   8.00-9.00   sec   112 MBytes   941 Mbits/sec    0    383 KBytes
[  4]   9.00-10.00  sec   112 MBytes   941 Mbits/sec    0    383 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec    0             sender
[  4]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver

tkaiser

Re: [Solved] Odroid-N1 hard crash doing NFS V4 writes

Post by tkaiser »

Thank you. So something seems to be wrong with my board. I exchanged cables and the switch, and even tried a direct connection between a MacBook and the N1: in the TX direction everything is fine, but RX remains below 200 Mbits/sec.

joerg
Posts: 1723
Joined: Tue Apr 01, 2014 2:14 am
languages_spoken: german, english, español
ODROIDs: C1, C1+, C2, N1, N2, C4
Location: Germany
Has thanked: 152 times
Been thanked: 328 times
Contact:

Re: [Solved] Odroid-N1 hard crash doing NFS V4 writes

Post by joerg »

Only to confirm, I tested on my N1 with kernel as delivered:

Code: Select all

root@JW-NAS:~# iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.1.18, port 60834
[  5] local 192.168.1.15 port 5201 connected to 192.168.1.18 port 60836
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec   110 MBytes   926 Mbits/sec                  
[  5]   1.00-2.00   sec   112 MBytes   941 Mbits/sec                  
[  5]   2.00-3.00   sec   112 MBytes   941 Mbits/sec                  
[  5]   3.00-4.00   sec   112 MBytes   942 Mbits/sec                  
[  5]   4.00-5.00   sec   112 MBytes   941 Mbits/sec                  
[  5]   5.00-6.00   sec   112 MBytes   941 Mbits/sec                  
[  5]   6.00-7.00   sec   112 MBytes   942 Mbits/sec                  
[  5]   7.00-8.00   sec   112 MBytes   941 Mbits/sec                  
[  5]   8.00-9.00   sec   112 MBytes   942 Mbits/sec                  
[  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec                  
[  5]  10.00-10.02  sec  1.68 MBytes   929 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.02  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-10.02  sec  1.10 GBytes   940 Mbits/sec                  receiver
and:

Code: Select all

root@JW-NAS:~# iperf3 -c 192.168.1.18
Connecting to host 192.168.1.18, port 5201
[  4] local 192.168.1.15 port 33774 connected to 192.168.1.18 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   113 MBytes   947 Mbits/sec    0    434 KBytes       
[  4]   1.00-2.00   sec   112 MBytes   941 Mbits/sec    0    434 KBytes       
[  4]   2.00-3.00   sec   112 MBytes   942 Mbits/sec    0    457 KBytes       
[  4]   3.00-4.00   sec   112 MBytes   942 Mbits/sec    0    482 KBytes       
[  4]   4.00-5.00   sec   112 MBytes   941 Mbits/sec    0    482 KBytes       
[  4]   5.00-6.00   sec   112 MBytes   942 Mbits/sec    0    482 KBytes       
[  4]   6.00-7.00   sec   112 MBytes   942 Mbits/sec    0    482 KBytes       
[  4]   7.00-8.00   sec   112 MBytes   940 Mbits/sec    0    482 KBytes       
[  4]   8.00-9.00   sec   112 MBytes   942 Mbits/sec    0    482 KBytes       
[  4]   9.00-10.00  sec   112 MBytes   941 Mbits/sec    0    482 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0             sender
[  4]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  receiver

iperf Done.

tkaiser

Re: [Solved] Odroid-N1 hard crash doing NFS V4 writes

Post by tkaiser »

joerg wrote:Only to confirm, I tested on my N1 with kernel as delivered
Thank you too. Then it's obviously just me and tx/rx delay settings are ok already.
