Odroid HC2 - SATA might be failing?

Morlan
Posts: 9
Joined: Sat Jul 03, 2021 8:26 pm
languages_spoken: english
ODROIDs: 3x Odroid HC2
Has thanked: 1 time
Been thanked: 0

Post by Morlan »

Hello Forum!

For the last 2 years I've been using 3 Odroid HC2s as my data servers (main server, backup, offsite backup). The machines have hard drives from different vendors connected (Samsung, Seagate, WD). All of the drives are formatted with btrfs. In the last 3 weeks, 2 of the 3 machines started reporting read/write I/O errors and repeatedly switched to read-only mode. A btrfs scrub on my main machine threw a massive number of errors. At first I thought my drives were failing. But when I took the Samsung SSD from my main server and attached it to an RPi4 via a USB-to-SATA adapter, the errors disappeared and the btrfs scrub finished without errors.
A memtest on the main server reported no errors. This led me to the theory that the SATA bridge of the HC2s might be failing, especially since it affects two machines that were purchased around the same time.

Does anyone have an idea how to confirm or reject my theory? Or does someone have another explanation for the problem?

Thanks in advance!
Morlan

odroid
Site Admin
Posts: 37753
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English, Korean
ODROIDs: ODROID
Has thanked: 1910 times
Been thanked: 1184 times

Post by odroid »

Can you read the S.M.A.R.T. data from the drives with the "sudo smartctl -a /dev/sda -d sat" command to narrow down the root cause?

Post by Morlan »

This is the SSD from my main server:

Code: Select all

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 860 EVO 1TB
Serial Number:    S4X6NF0N303147Z
LU WWN Device Id: 5 002538 e903261af
Firmware Version: RVT04B6Q
User Capacity:    1.000.204.886.016 bytes [1,00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Jul  5 09:36:26 2021 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       6899
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       42
177 Wear_Leveling_Count     0x0013   098   098   000    Pre-fail  Always       -       19
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   040   040   000    Old_age   Always       -       60
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       40
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       35951111232

I can't access the SMART values of the 2nd disk right now, because it's offsite at my mom's place ;)
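
A table like the one above can also be checked mechanically. The following is a minimal Python sketch, not something posted in the thread, that pulls the raw values out of the attribute rows of "smartctl -a" output and reports any watched attribute whose raw value is non-zero. The choice of attributes 5, 187 and 199 and all function names are assumptions made for illustration.

```python
import re

# Attributes to watch. This selection is an assumption for illustration,
# not an official list: reallocations, uncorrectable errors, link CRC errors.
WATCHED = (5, 187, 199)

# One attribute row: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED,
# WHEN_FAILED, RAW_VALUE (only a purely numeric raw value is captured).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_smart_attributes(text):
    """Return {attribute_id: (name, raw_value)} for every attribute row found."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            attrs[int(match.group(1))] = (match.group(2), int(match.group(3)))
    return attrs


def nonzero_watched(attrs):
    """Return the watched attributes whose raw value is not zero."""
    return {i: attrs[i] for i in WATCHED if i in attrs and attrs[i][1] != 0}


# Three rows copied from the output above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(nonzero_watched(parse_smart_attributes(SAMPLE)))
```

For the three sample rows every watched raw value is 0, so nothing is flagged. That matches the posted table but, on its own, does not say anything about the board the drive was attached to.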

Post by Morlan »

One argument against drive failure is that it happened to two different drives (an SSD and spinning rust) within a short time of each other. Also, the SSD is working fine now that it is connected to another device.

Post by odroid »

Did you run the 'smartctl' command on the HC2?

Post by Morlan »

No, on my Pi4 actually, because the disk is now attached to it.

Post by Morlan »

Does it make a difference if it is run on the Pi4 instead of the HC2? It's about the HDD, right?

Post by odroid »

I just wanted to know if the SATA interface on the HC2 board was working or not.
The "lsusb" and "lsusb -t" outputs will give us some clues too.

Post by Morlan »

Code: Select all

root@odroidxu4:~# lsusb
Bus 006 Device 002: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 006 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Code: Select all

root@odroidxu4:~# lsusb -t
/:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
/:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=exynos-ohci/3p, 12M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=exynos-ehci/3p, 480M

This is without a disk attached.

Post by Morlan »

Ok, now with a disk attached:

Code: Select all

lsusb
Bus 006 Device 002: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 006 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 002: ID 152d:0578 JMicron Technology Corp. / JMicron USA Technology Corp. JMS567 SATA 6Gb/s bridge
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Code: Select all

lsusb -t
/:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
/:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=uas, 5000M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=exynos-ohci/3p, 12M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=exynos-ehci/3p, 480M
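
The line of interest in this tree is the Mass Storage entry and the driver bound to it. As a rough sketch (not from the thread), the Python below extracts the bus, device number, driver and speed of every Mass Storage entry from "lsusb -t" text. The function name and the returned dictionary layout are assumptions for illustration.

```python
import re

# Root hub line, e.g. "/:  Bus 04.Port 1: Dev 1, Class=root_hub, ..."
ROOT = re.compile(r"^/:\s+Bus (\d+)\.")
# Child line, e.g. "|__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=uas, 5000M"
ENTRY = re.compile(
    r"Port (\d+): Dev (\d+), If (\d+), Class=([^,]+), Driver=([^,]+), (\S+)"
)


def storage_devices(tree):
    """Return one dict per Mass Storage interface found in `lsusb -t` text."""
    bus = None
    found = []
    for line in tree.splitlines():
        root = ROOT.match(line)
        if root:
            # Remember which bus the following child lines belong to.
            bus = int(root.group(1))
            continue
        entry = ENTRY.search(line)
        if entry and entry.group(4) == "Mass Storage":
            found.append({
                "bus": bus,
                "dev": int(entry.group(2)),
                "driver": entry.group(5),
                "speed": entry.group(6),
            })
    return found


# Lines copied from the output above.
SAMPLE = """\
/:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=uas, 5000M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
"""

if __name__ == "__main__":
    for device in storage_devices(SAMPLE):
        print(device)
```

On the sample this finds a single entry on bus 4 using the uas driver at 5000M, which is the same line the next reply quotes.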

Post by odroid »

The SATA bridge controller seems to be detected properly, at least.

Code: Select all

/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=uas, 5000M

Can you see the /dev/sda* nodes?
Also check the dmesg output for clues.

Post by Morlan »

Code: Select all

ls -la /dev/ | grep sd
brw-rw----  1 root disk      8,   0 Jul  8 07:00 sda
brw-rw----  1 root disk      8,   1 Jul  8 07:00 sda1

dmesg looks fine to me:

Code: Select all

usb 4-1: new SuperSpeed USB device number 2 using xhci-hcd
[   17.967902] usb 4-1: New USB device found, idVendor=152d, idProduct=0578
[   17.967917] usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[   17.967926] usb 4-1: Product: USB to SATA bridge
[   17.967933] usb 4-1: Manufacturer: JMicron
[   17.967940] usb 4-1: SerialNumber: DB00000000013B
...
[   18.308866] sd 0:0:0:0: Attached scsi generic sg0 type 0
[   18.309457] sd 0:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/466 GiB)
[   18.309463] sd 0:0:0:0: [sda] 4096-byte physical blocks
[   18.309673] sd 0:0:0:0: [sda] Write Protect is off
[   18.309679] sd 0:0:0:0: [sda] Mode Sense: 53 00 00 08
[   18.310083] sd 0:0:0:0: [sda] Disabling FUA
[   18.310090] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   18.310689] sd 0:0:0:0: [sda] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)

These are parts of dmesg from when the original drive was remounted read-only and reported massive errors during a scrub (which were gone when I plugged it into another device):

Code: Select all

[ 1148.717824] sd 0:0:0:0: Device offlined - not ready after error recovery
[ 1148.717829] sd 0:0:0:0: Device offlined - not ready after error recovery
[ 1148.717835] sd 0:0:0:0: Device offlined - not ready after error recovery
[ 1148.757648] blk_update_request: I/O error, dev sda, sector 98030079 op 0x0:(READ) flags 0x4000 phys_seg 128 prio class 0
[ 1148.757714] blk_update_request: I/O error, dev sda, sector 98023679 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
[ 1148.757743] blk_update_request: I/O error, dev sda, sector 98024959 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
[ 1148.757768] blk_update_request: I/O error, dev sda, sector 98031871 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
[ 1148.757794] blk_update_request: I/O error, dev sda, sector 98025215 op 0x0:(READ) flags 0x0 phys_seg 128 prio class 0
[ 1148.757845] blk_update_request: I/O error, dev sda, sector 98031103 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
[ 1148.757868] blk_update_request: I/O error, dev sda, sector 98031359 op 0x0:(READ) flags 0x0 phys_seg 64 prio class 0
[ 1148.757896] blk_update_request: I/O error, dev sda, sector 98034175 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
[ 1148.757916] blk_update_request: I/O error, dev sda, sector 98033919 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
[ 1148.757936] blk_update_request: I/O error, dev sda, sector 98035455 op 0x0:(READ) flags 0x0 phys_seg 96 prio class 0
[ 1148.758025] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
[ 1148.758388] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 1, rd 1, flush 0, corrupt 0, gen 0
[ 1148.758448] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 2, rd 1, flush 0, corrupt 0, gen 0
[ 1148.758481] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 3, rd 1, flush 0, corrupt 0, gen 0
[ 1148.758515] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 4, rd 1, flush 0, corrupt 0, gen 0
[ 1148.758535] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 4, rd 2, flush 0, corrupt 0, gen 0
[ 1148.758612] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 4, rd 3, flush 0, corrupt 0, gen 0
[ 1148.758641] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 5, rd 3, flush 0, corrupt 0, gen 0
[ 1148.758663] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 6, rd 3, flush 0, corrupt 0, gen 0
[ 1148.758683] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 7, rd 3, flush 0, corrupt 0, gen 0
[ 1148.760944] BTRFS error (device sda1): unable to fixup (regular) error at logical 50158764032 on dev /dev/sda1
[ 1148.760948] BTRFS error (device sda1): unable to fixup (regular) error at logical 50158108672 on dev /dev/sda1
[ 1148.760983] BTRFS error (device sda1): unable to fixup (regular) error at logical 50158768128 on dev /dev/sda1
[ 1148.761009] BTRFS error (device sda1): unable to fixup (regular) error at logical 50158772224 on dev /dev/sda1
[ 1148.761037] BTRFS error (device sda1): unable to fixup (regular) error at logical 50158776320 on dev /dev/sda1
[ 1148.761062] BTRFS error (device sda1): unable to fixup (regular) error at logical 50158780416 on dev /dev/sda1
[ 1148.761083] BTRFS error (device sda1): unable to fixup (regular) error at logical 50158784512 on dev /dev/sda1
[ 1148.761106] BTRFS error (device sda1): unable to fixup (regular) error at logical 50158788608 on dev /dev/sda1
[ 1148.761123] BTRFS error (device sda1): unable to fixup (regular) error at logical 50158239744 on dev /dev/sda1
[ 1149.037330] BTRFS: error (device sda1) in btrfs_commit_transaction:2281: errno=-5 IO failure (Error while writing out transaction)
[ 1149.037338] BTRFS info (device sda1): forced readonly
[ 1149.037350] BTRFS warning (device sda1): Skipping commit of aborted transaction.
[ 1149.037356] BTRFS: error (device sda1) in cleanup_transaction:1833: errno=-5 IO failure
[ 1149.037364] BTRFS info (device sda1): delayed_refs has NO entry
[ 1149.058585] BTRFS info (device sda1): scrub: not finished on devid 1 with status: -125
[ 1149.277811] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=0x07 driverbyte=0x00
[ 1149.310438] xhci-hcd xhci-hcd.7.auto: WARN Can't disable streams for endpoint 0x82, streams are being disabled already
[ 1163.920784] btrfs_dev_stat_print_on_error: 38975 callbacks suppressed
[ 1163.920815] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 88, rd 46410, flush 0, corrupt 0, gen 0
[ 1163.920897] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 89, rd 46410, flush 0, corrupt 0, gen 0
[ 1163.920953] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 90, rd 46410, flush 0, corrupt 0, gen 0
[ 1163.920999] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 91, rd 46410, flush 0, corrupt 0, gen 0
[ 1163.921048] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 92, rd 46410, flush 0, corrupt 0, gen 0
[ 1163.921102] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 93, rd 46410, flush 0, corrupt 0, gen 0
[ 1163.921151] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 94, rd 46410, flush 0, corrupt 0, gen 0
[ 1311.762947] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 94, rd 46411, flush 0, corrupt 0, gen 0
[ 1311.763253] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 94, rd 46412, flush 0, corrupt 0, gen 0
[ 1311.785904] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 94, rd 46413, flush 0, corrupt 0, gen 0
[ 1311.786284] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 94, rd 46414, flush 0, corrupt 0, gen 0
[ 1311.789370] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 94, rd 46415, flush 0, corrupt 0, gen 0
[ 1311.789668] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 94, rd 46416, flush 0, corrupt 0, gen 0
[ 1695.688968] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 95, rd 46416, flush 0, corrupt 0, gen 0
[ 1714.474926] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 95, rd 46417, flush 0, corrupt 0, gen 0
[ 1714.475609] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 95, rd 46418, flush 0, corrupt 0, gen 0
[ 1794.600684] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 95, rd 46419, flush 0, corrupt 0, gen 0
[ 1794.600964] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 95, rd 46420, flush 0, corrupt 0, gen 0
[ 1794.601126] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 95, rd 46421, flush 0, corrupt 0, gen 0
[ 1794.641281] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 95, rd 46422, flush 0, corrupt 0, gen 0
[ 1794.641420] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 95, rd 46423, flush 0, corrupt 0, gen 0
[ 1794.645351] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 95, rd 46424, flush 0, corrupt 0, gen 0
[ 1794.645502] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 95, rd 46425, flush 0, corrupt 0, gen 0
[ 1794.645889] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 95, rd 46426, flush 0, corrupt 0, gen 0
[ 1794.646755] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 95, rd 46427, flush 0, corrupt 0, gen 0
[ 1794.652213] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 95, rd 46428, flush 0, corrupt 0, gen 0
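
A log this long is easier to read once it is condensed. The sketch below is an illustration in Python, not something from the thread: it records the timestamp of the first error reported by the SCSI or block layer, the timestamp of the first BTRFS counter line, and the last counter values seen. The regular expressions are written against the exact message formats in this excerpt and are an assumption for any other kernel version; the function and key names are made up for the example.

```python
import re

# Kernel timestamp such as "[ 1148.717824]".
TS = r"\[\s*(\d+\.\d+)\]"
# Errors reported below the filesystem, by the SCSI and block layers.
LOWER = re.compile(TS + r".*(?:Device offlined|I/O error, dev \S+)")
# Cumulative per-device error counters printed by btrfs.
COUNTERS = re.compile(
    TS + r" BTRFS error \(device [^)]+\): bdev \S+ errs: "
    r"wr (\d+), rd (\d+), flush (\d+), corrupt (\d+), gen (\d+)"
)
NAMES = ("wr", "rd", "flush", "corrupt", "gen")


def summarize(log):
    """Condense a dmesg excerpt into first-error timestamps and final counters."""
    summary = {"first_lower_layer": None, "first_btrfs": None, "last_counters": None}
    for line in log.splitlines():
        lower = LOWER.search(line)
        if lower and summary["first_lower_layer"] is None:
            summary["first_lower_layer"] = float(lower.group(1))
        btrfs = COUNTERS.search(line)
        if btrfs:
            if summary["first_btrfs"] is None:
                summary["first_btrfs"] = float(btrfs.group(1))
            # Counters are cumulative, so the last line seen holds the totals.
            values = (int(v) for v in btrfs.groups()[1:])
            summary["last_counters"] = dict(zip(NAMES, values))
    return summary


# Four lines copied from the log above.
SAMPLE = """\
[ 1148.717824] sd 0:0:0:0: Device offlined - not ready after error recovery
[ 1148.757648] blk_update_request: I/O error, dev sda, sector 98030079 op 0x0:(READ) flags 0x4000 phys_seg 128 prio class 0
[ 1148.758025] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
[ 1794.652213] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 95, rd 46428, flush 0, corrupt 0, gen 0
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To run it on a full log, the output of "dmesg" can be saved to a file and its contents passed to summarize(). In the excerpt above the wr and rd counters climb while corrupt and gen stay at 0, and the "Device offlined" message precedes the first BTRFS line; how to interpret that is left to the discussion in the thread.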

Post by odroid »

Thank you for the logs. I think there is no issue with the raw SATA interface.

There might be an issue with the BTRFS driver. However, I have no experience with BTRFS.
I hope other experts can help you.

Post by Morlan »

Thank you for your time and effort. I’ll try digging in the btrfs direction.
