Unstable HDD(USB3.0) on Cloudshell w/ kernel 4.9


Re: Unstable HDD(USB3.0) on Cloudshell w/ kernel 4.9

by crazyquark » Thu Jul 13, 2017 2:07 pm

Yeah, I think so too. On kernel 3.10 things were mostly stable.
It did happen on 3.10 as well, but it was rare and, like I said, I think it started happening after I switched my network to gigabit.
On 3.10 I could work around it by limiting the download speed in qBittorrent to 1MB/s. It still happened with this HDD, just not very often.
crazyquark
 
Posts: 176
Joined: Thu Jan 15, 2015 4:22 pm
languages_spoken: english, french, romanian
ODROIDs: C1,C1+,C2,XU4

Re: Unstable HDD(USB3.0) on Cloudshell w/ kernel 4.9

by bronco » Thu Jul 13, 2017 3:17 pm

odroid wrote:We will do "stress" command test with Samba transfer in parallel.


You as the manufacturer/engineer could do better: start measuring (since you might understand Ohm's law). If it's a voltage drop problem then it's easy to reproduce. And stress is lightweight; better to use cpuminer. I recommended that @crazyquark test with stress since he's able to trigger the well-known and old

    usb 4-1.2: reset SuperSpeed USB device number 3 using xhci-hcd

problem with rather light workloads (after switching to Gigabit Ethernet, which results in higher overall board consumption). With the script from yesterday he should be able to nail the problem down without the network being involved. But as usual people prefer developing theories over testing ;)
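
To make "testing over theories" a bit more concrete, here is a minimal sketch that counts these reset messages per USB port in a saved kernel log. It is not the script mentioned above; the function name and the idea of feeding it text captured from dmesg are assumptions for illustration.

```python
import re
from collections import Counter

# Matches kernel messages such as:
#   usb 4-1.2: reset SuperSpeed USB device number 3 using xhci-hcd
RESET_RE = re.compile(
    r"usb (?P<port>[\d.-]+): reset (?P<speed>.+?) USB device number (?P<num>\d+)"
)


def count_usb_resets(log_text):
    """Return a Counter mapping USB port path -> number of reset messages."""
    counts = Counter()
    for line in log_text.splitlines():
        match = RESET_RE.search(line)
        if match:
            counts[match.group("port")] += 1
    return counts


# Illustrative log excerpt (timestamps are made up).
SAMPLE_LOG = """\
[  100.1] usb 4-1.2: reset SuperSpeed USB device number 3 using xhci-hcd
[  101.7] r8152 4-1.1:1.0 eth0: carrier on
[  140.3] usb 4-1.2: reset SuperSpeed USB device number 3 using xhci-hcd
"""

for port, n in count_usb_resets(SAMPLE_LOG).items():
    print(f"{port}: {n} reset(s)")
```

Running it against logs from both kernels under the same load would give a simple reset count to compare instead of an impression.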

odroid wrote:there have been very few stability issues reported while we've shipped over 5 thousands of CloudShell kits in the past one year.
So I guess it could be related to the Kernel 4.9 software issue probably even we don't know what it is exactly.


You already have reports that contain the solution: viewtopic.php?f=97&t=16912#p110383

PSUs, like all other electronics, also suffer from aging effects, so it's pretty normal that these issues become more frequent over time. It's also easy for you to check whether the correlation between kernel 4.9 and higher failure rates is real or not. Grab a power meter, grab a multimeter, run the same tests with both kernels and draw a nice chart of overall consumption and voltage drops, if related.

If 4.9 is more efficient (likely) and if the problem is related to voltage drops (very likely) then of course it will be triggered with 4.9 more often. But it's still Ohm's law and not software.
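
A rough sketch of the Ohm's law argument: the voltage reaching the load is the supply voltage minus current times the resistance of everything in between. The resistance and supply values below are hypothetical, chosen only to show the trend; they are not measurements from this thread.

```python
def voltage_at_load(supply_v, current_a, path_resistance_ohm):
    """Ohm's law: the drop across the supply path is I * R."""
    return supply_v - current_a * path_resistance_ohm


# Hypothetical example values, not measurements from this thread.
PATH_RESISTANCE_OHM = 0.15  # cable + connectors + protection FETs (assumed)
SUPPLY_V = 5.1              # voltage at the PSU output (assumed)

for current_a in (1.0, 2.0, 3.0):
    v = voltage_at_load(SUPPLY_V, current_a, PATH_RESISTANCE_OHM)
    print(f"{current_a:.1f} A -> {v:.2f} V at the load")
```

The same path resistance costs more voltage as the board draws more current, which is why a heavier workload can trigger the problem without any software change.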
bronco
 
Posts: 18
Joined: Tue Jul 11, 2017 2:58 pm
languages_spoken: english

Re: Unstable HDD(USB3.0) on Cloudshell w/ kernel 4.9

by odroid » Thu Jul 13, 2017 3:45 pm

I agree.
We need to prove whether or not the bus reset issue appears only when a voltage drop happens.
odroid
Site Admin
 
Posts: 23625
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English
ODROIDs: ODROID

Re: Unstable HDD(USB3.0) on Cloudshell w/ kernel 4.9

by reza » Sat Jul 15, 2017 9:14 pm

As I have said before, when I use an external HDD which is powered by the ODROID's USB3 port and do the test, I still get the bus reset errors but the HDD doesn't go offline. So maybe the CloudShell is not the main issue but it exacerbates it.
reza
 
Posts: 43
Joined: Tue Mar 15, 2016 3:40 am
languages_spoken: english
ODROIDs: xu4

Re: Unstable HDD(USB3.0) on Cloudshell w/ kernel 4.9

by odroid » Sun Jul 16, 2017 11:01 am

Official Ubuntu 16.04 & Kernel 4.9 on eMMC with the official 5V/4A PSU.
A 2TB HDD is connected to the old CloudShell.
Running "stress" to load all 8 cores.
Copying a 10GB file from/to a Windows PC continuously in parallel (two Samba instances).
Copying a big file between eMMC and HDD continuously in parallel.
We've run the above test for 3hrs 44min. There has been no USB reset issue yet.

The voltage measured with a DMM on the DC jack is 5.1 V and the average load is 2.54 A.
The HDD SATA power pin shows 4.76~4.89 V.
We will keep running this test for 24 more hours.

We will perform the same test with the OMV image soon.

I think we can share the test result on Monday or Tuesday because it is already Friday PM 5:00 in Korea.

Re: Unstable HDD(USB3.0) on Cloudshell w/ kernel 4.9

by odroid » Sun Jul 16, 2017 1:44 pm

reza wrote: As I have said before, when I use an external HDD which is powered by the ODROID's USB3 port and do the test, I still get the bus reset errors but the HDD doesn't go offline. So maybe the CloudShell is not the main issue but it exacerbates it.


Can you tell me the brand/model name of the external HDD?
We found a slightly old Hitachi Travelstar 1TB HDD which has a very high inrush current. We will test it soon.

Re: Unstable HDD(USB3.0) on Cloudshell w/ kernel 4.9

by reza » Sun Jul 16, 2017 2:11 pm

ADATA 2TB

Re: Unstable HDD(USB3.0) on Cloudshell w/ kernel 4.9

by crazyquark » Sun Jul 16, 2017 3:19 pm

Nice stress test; odd that you cannot reproduce it. On my end it usually happened while I was also reading from the disk, like I said before: streaming a movie via NFS while all the writing was taking place.

I accidentally reproduced it on the CloudShell 2 as well, but with a very different setup :)

OK, so this weekend I upgraded to the CloudShell 2. My setup is a bit complicated but the important part is:
I connected the old HDD to the second USB 3.0 port via an external HDD dock. This dock was picked up by the usb-storage driver, while the CloudShell 2 disks were running via uas.
I started copying my old disk to my new setup via 'rsync'. It all worked fine until I also started streaming a movie via NFS... that's when the old HDD connected via usb-storage started getting reset errors, and it ended up crashing my uas disk.
So the key ingredients seem to be:
- usb-storage
- reading via NFS while also writing
The writing was happening at around 60MB/s.

Since I kept getting crashes and could not copy any more of my data, I moved my old HDD to a newer enclosure that I knew had better support. And guess what: this time the HDD was picked up by the uas driver and I was able to complete my rsync without a hitch!

Bottom line:
- use NFS for reading and write data from a local process for testing
- make sure the usb-storage driver is used
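
For anyone who wants to confirm which driver a disk is bound to before testing, here is a minimal sketch that picks the mass-storage entries out of the text printed by `lsusb -t`. The sample tree is illustrative, not captured from my board, and the helper name is made up for this example.

```python
import re

# Matches lsusb -t interface lines such as:
#   |__ Port 1: Dev 3, If 0, Class=Mass Storage, Driver=uas, 5000M
LINE_RE = re.compile(
    r"Dev (?P<dev>\d+), .*Class=Mass Storage, Driver=(?P<driver>[^,\s]*)"
)


def mass_storage_drivers(lsusb_tree_text):
    """Return a list of (device number, driver name) for mass-storage interfaces.

    Device numbers are only unique per bus, so this is meant for a quick look,
    not for identifying devices across buses.
    """
    found = []
    for line in lsusb_tree_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            found.append((int(match.group("dev")), match.group("driver")))
    return found


# Illustrative output in the format printed by `lsusb -t`.
SAMPLE_TREE = """\
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/2p, 5000M
        |__ Port 1: Dev 3, If 0, Class=Mass Storage, Driver=uas, 5000M
        |__ Port 2: Dev 4, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
"""

for dev, driver in mass_storage_drivers(SAMPLE_TREE):
    print(f"Dev {dev}: {driver}")
```

A disk showing `usb-storage` here is the configuration that failed for me; the one showing `uas` is the one that completed the rsync.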
Last edited by crazyquark on Sun Jul 16, 2017 3:27 pm, edited 1 time in total.

Re: Unstable HDD(USB3.0) on Cloudshell w/ kernel 4.9

by crazyquark » Sun Jul 16, 2017 3:26 pm

Sorry for the previous long post. I want to propose an alternative testing strategy:

On Odroid XU4 w/ Cloudshell1 and 2TB disk attached:
- Download a large file from a remote host, say via FTP, at a speed of at least 2-3MB/s.
- On a Linux host, copy a file from Odroid via NFS

Re: Unstable HDD(USB3.0) on Cloudshell w/ kernel 4.9

by studioai » Sun Jul 16, 2017 10:56 pm

crazyquark wrote:Sorry for the previous long post. I want to propose an alternative testing strategy:

On Odroid XU4 w/ Cloudshell1 and 2TB disk attached:
- Download a large file from a remote host, say via FTP, at a speed of at least 2-3MB/s.
- On a Linux host, copy a file from Odroid via NFS


I'm experiencing the same issue. In my case it occurs when I download files with the transmission daemon.

When the download or upload speed goes over 5mb/s, the USB drive resets.
studioai
 
Posts: 6
Joined: Thu Apr 21, 2016 10:43 am
languages_spoken: english
ODROIDs: xu4

Re: Unstable HDD(USB3.0) on Cloudshell w/ kernel 4.9

by crazyquark » Mon Jul 17, 2017 5:37 pm

I use qBittorrent and see the same thing: the HDD resets or, even worse, crashes when downloads exceed a certain speed.

Re: Unstable HDD(USB3.0) on Cloudshell w/ kernel 4.9

by odroid » Mon Jul 17, 2017 6:27 pm

We performed further tests with various HDDs that we acquired recently and a few different input voltages.
We found that the Seagate 2TB and HGST/Hitachi 1TB HDDs are quite sensitive to the input voltage level, while the WD 1TB/500GB, Samsung Momentum and Toshiba HDDs work well even with a 4.4 V input.
The Seagate and HGST HDDs generate the bus reset error when the voltage on the SATA port is lower than 4.7 V.
This is the main reason why we couldn't reproduce the issue for a long time.
The disk input voltage tolerance seems to differ between HDD models, even within the same brand or manufacturer.
So please let us know your HDD brand and model name so we can confirm our test result.

BTW, the typical output voltage of our 5V/4A and 5V/6A PSUs is 5.25 V at light load. When the load is very high, it can go down to 5.05 V.
But some of them can be lower than 5 V, and the SATA port voltage can then be lower than 4.7 V, due to the varying characteristics of each PSU.
We observed 200~400 mV of voltage drop caused by the resistance of the FETs in the protection ICs as well as the cables/connectors.
It is well worth measuring the VBUS voltage on the USB connector if you have a DMM.
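
The figures above can be combined into a quick margin check. This is only a sketch of the arithmetic, assuming the 4.7 V reset threshold seen on the sensitive drives and treating the 200~400 mV path drop as the only loss between the PSU output and the SATA port; the function names are made up for illustration.

```python
RESET_THRESHOLD_V = 4.7  # below this, the Seagate/HGST drives showed bus resets


def sata_voltage(psu_output_v, path_drop_mv):
    """Voltage left at the SATA port after the drop along the supply path."""
    return psu_output_v - path_drop_mv / 1000.0


def at_risk(psu_output_v, path_drop_mv, threshold_v=RESET_THRESHOLD_V):
    """True if the remaining voltage is below the reset threshold."""
    return sata_voltage(psu_output_v, path_drop_mv) < threshold_v


# PSU output at light load (5.25 V) and heavy load (5.05 V),
# combined with the observed range of path drops.
for psu_v in (5.25, 5.05):
    for drop_mv in (200, 400):
        state = "at risk" if at_risk(psu_v, drop_mv) else "ok"
        print(f"PSU {psu_v:.2f} V, drop {drop_mv} mV -> "
              f"{sata_voltage(psu_v, drop_mv):.2f} V ({state})")
```

With a heavily loaded PSU and the larger drop, the result falls just under the threshold, which is consistent with the issue appearing only on some setups.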

Other major root causes: there are two protection ICs on the XU4 board.
The first one is the NCP372, which is placed near the DC jack to block high voltage, low voltage and reverse voltage from the DC plug.
The second one is the NCP380, which is placed near the USB 3.0 ports to control the load current of the USB devices.
Refer to the full schematics of the XU4: https://dn.odroid.com/5422/ODROID-XU4/S ... OT1606.pdf

We will try to find a way to lower the resistance such as sharing the USB 3.0 VBUS or bypassing the protection ICs.
For example, @phaseshifter's approach.
viewtopic.php?f=99&t=25813#p181268

Other than that, we found that a few "S.M.A.R.T." commands can cause the bus reset error when the HDD doesn't support the command properly due to its old ATA firmware version.
But this one is not directly related to this thread.

Re: Unstable HDD(USB3.0) on Cloudshell w/ kernel 4.9

by bronco » Mon Jul 17, 2017 8:55 pm

odroid wrote:I think the disk input voltage tolerance seems to be different from each model of HDDs even with the same brand or manufacturer.


Of course :)

But that's only part of the problem since we're still talking about Ohm's law being valid everywhere. So users with a 5V/6A PSU whose power lines are thinner than yours will experience these voltage drops earlier, since the voltage available at the XU4's DC-IN jack is already lower under load.

BTW: Threads like this one viewtopic.php?f=146&t=26121#p184891 are also perfect candidates for checking layer 0 (hardware, voltage, DMM) first. If the whole setup suffers from huge voltage drops and there's a disk that tolerates low voltages, might it be possible that under full load the RTL8153 disappears because it is also affected by under-voltage?

Re: Unstable HDD(USB3.0) on Cloudshell w/ kernel 4.9

by studioai » Tue Jul 18, 2017 3:01 pm

I ordered a voltage regulator circuit board and USB 3.0 male and female ports.
I'll combine those components and attach them to the USB port on the XU4.
I'll report the result.

Re: Unstable HDD(USB3.0) on Cloudshell w/ kernel 4.9

by odroid » Tue Jul 18, 2017 3:10 pm

bronco wrote:If the whole setup suffers from huge voltage drops and there's a disk that tolerates low voltages might it be possible that in full load situations RTL8153 disappears since being also affected by under-voltage?

I don't think so, because all the power rails on the RTL8153 use only 3.3 V.
The random ETH disappearing issue went away a few weeks ago after we applied the RTL815X patch with the Kernel 4.9.33 update.
https://git.kernel.org/pub/scm/linux/ke ... ?h=v4.9.33
Our latest kernel package has 4.9.37 and we will release a new package with 4.9.38 very soon.
