tobetter wrote: ↑Fri Jul 05, 2019 2:17 am
Linux kernel and U-boot are updated with patches, please do the commands for HK's Ubuntu users.
Thanks for the updates, I'll try them tomorrow and report back. Today's tests showed that the bug I mentioned in my previous post was also present with the old kernel, but happened about twice as quickly. In other words, the July 1st kernel and tablesize are an improvement but still don't produce a fully stable system. Hopefully the new u-boot will improve things further.
Here's the kernel log from the moment the hard disk fails (this is with the old kernel, but the error from the July 1st kernel is basically the same):
Jul 3 21:46:37 andrews-house kernel: [16770.701997@0] xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
Jul 3 21:46:37 andrews-house kernel: [16770.707044@0] xhci-hcd xhci-hcd.0.auto: Looking for event-dma 00000000cf7bf000 trb-start 00000000cf7d4fc0 trb-end 0000000000000000 seg-start 00000000cf7d4000 seg-end 00000000cf7d4ff0
Jul 3 21:46:37 andrews-house kernel: [16770.707048@0] xhci-hcd xhci-hcd.0.auto: Looking for event-dma 00000000cf7bf000 trb-start 00000000cf7d3000 trb-end 00000000cf7d3000 seg-start 00000000cf7d3000 seg-end 00000000cf7d3ff0
Jul 3 21:46:37 andrews-house kernel: [16770.707053@0] xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
Jul 3 21:46:37 andrews-house kernel: [16770.717784@0] xhci-hcd xhci-hcd.0.auto: Looking for event-dma 00000000cf7bf020 trb-start 00000000cf7d4fc0 trb-end 0000000000000000 seg-start 00000000cf7d4000 seg-end 00000000cf7d4ff0
Jul 3 21:46:37 andrews-house kernel: [16770.717788@0] xhci-hcd xhci-hcd.0.auto: Looking for event-dma 00000000cf7bf020 trb-start 00000000cf7d3000 trb-end 00000000cf7d3000 seg-start 00000000cf7d3000 seg-end 00000000cf7d3ff0
Jul 3 21:46:37 andrews-house kernel: [16770.717792@0] xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
Jul 3 21:46:37 andrews-house kernel: [16770.728444@0] xhci-hcd xhci-hcd.0.auto: Looking for event-dma 00000000cf7bf060 trb-start 00000000cf7d4fc0 trb-end 0000000000000000 seg-start 00000000cf7d4000 seg-end 00000000cf7d4ff0
Jul 3 21:46:37 andrews-house kernel: [16770.728448@0] xhci-hcd xhci-hcd.0.auto: Looking for event-dma 00000000cf7bf060 trb-start 00000000cf7d3000 trb-end 00000000cf7d3000 seg-start 00000000cf7d3000 seg-end 00000000cf7d3ff0
Jul 3 21:46:37 andrews-house kernel: [16770.728484@0] xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 6
Jul 3 21:46:37 andrews-house kernel: [16770.739104@0] xhci-hcd xhci-hcd.0.auto: Looking for event-dma 00000000cf7bf070 trb-start 00000000cf7d4fc0 trb-end 0000000000000000 seg-start 00000000cf7d4000 seg-end 00000000cf7d4ff0
Jul 3 21:46:37 andrews-house kernel: [16770.739108@0] xhci-hcd xhci-hcd.0.auto: Looking for event-dma 00000000cf7bf070 trb-start 00000000cf7d3000 trb-end 00000000cf7d3000 seg-start 00000000cf7d3000 seg-end 00000000cf7d3ff0
Jul 3 21:47:07 andrews-house kernel: [16800.967917@0] usb 2-1.4: reset SuperSpeed USB device number 5 using xhci-hcd
Jul 3 21:47:38 andrews-house kernel: [16831.932018@1] usb 2-1.4: reset SuperSpeed USB device number 5 using xhci-hcd
Jul 3 21:47:48 andrews-house kernel: [16842.076226@1] usb 2-1.4: reset SuperSpeed USB device number 5 using xhci-hcd
Jul 3 21:48:04 andrews-house kernel: [16858.459982@1] usb 2-1.4: reset SuperSpeed USB device number 5 using xhci-hcd
Jul 3 21:48:04 andrews-house kernel: [16858.559862@0] usb 2-1.4: reset SuperSpeed USB device number 5 using xhci-hcd
Jul 3 21:48:15 andrews-house kernel: [16868.699941@1] usb 2-1.4: reset SuperSpeed USB device number 5 using xhci-hcd
Jul 3 21:48:15 andrews-house kernel: [16868.720356@0] sd 1:0:0:0: Device offlined - not ready after error recovery
Jul 3 21:48:15 andrews-house kernel: [16868.720395@0] sd 1:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
Jul 3 21:48:15 andrews-house kernel: [16868.720410@0] sd 1:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 25 07 23 20 00 00 00 40 00 00
Jul 3 21:48:15 andrews-house kernel: [16868.720417@0] blk_update_request: I/O error, dev sdb, sector 621224736
Jul 3 21:48:15 andrews-house kernel: [16868.721426@1] sd 1:0:0:0: rejecting I/O to offline device
Jul 3 21:48:15 andrews-house kernel: [16868.726921@1] blk_update_request: I/O error, dev sdb, sector 621224800
Jul 3 21:48:15 andrews-house kernel: [16868.733381@4] sd 1:0:0:0: rejecting I/O to offline device
Jul 3 21:48:15 andrews-house kernel: [16868.738660@4] blk_update_request: I/O error, dev sdb, sector 621224864
Jul 3 21:48:15 andrews-house kernel: [16868.747008@4] sd 1:0:0:0: rejecting I/O to offline device
Jul 3 21:48:15 andrews-house kernel: [16868.750458@4] blk_update_request: I/O error, dev sdb, sector 621224424
Jul 3 21:48:15 andrews-house kernel: [16868.757183@4] sd 1:0:0:0: rejecting I/O to offline device
Jul 3 21:48:15 andrews-house kernel: [16868.762421@4] blk_update_request: I/O error, dev sdb, sector 621224424
Jul 3 21:48:15 andrews-house kernel: [16868.768897@4] Buffer I/O error on dev sdb, logical block 77653053, async page read
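For anyone who wants to compare logs between kernels, here is a rough Python sketch that counts the messages making up this failure (the xhci TD errors, the USB resets, the "Device offlined" line and the block-layer I/O errors). The patterns are copied from the log above and are only an assumption about what other kernels print; the `summarize` function and the category names are made up for illustration.

```python
import re
from collections import Counter

# Patterns taken from the messages in the log above.
# Assumption: other kernel versions word these messages the same way.
PATTERNS = {
    "xhci_td_error": re.compile(
        r"xhci-hcd .*ERROR Transfer event TRB DMA ptr not part of current TD"
    ),
    "usb_reset": re.compile(r"usb [\d.-]+: reset SuperSpeed USB device"),
    "device_offlined": re.compile(
        r"Device offlined - not ready after error recovery"
    ),
    "io_error": re.compile(r"blk_update_request: I/O error, dev \w+, sector \d+"),
}


def summarize(lines):
    """Count how many lines match each failure-related pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

To use it, pass any iterable of log lines, for example an open file handle for wherever your system keeps its kernel log. Comparing the counts and the timestamp of the first match across kernels gives a quick idea of whether a change made the failure rarer.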
I was able to restart the disk simply by pulling out its USB cable and plugging it back in, so whatever's happening here certainly isn't as severe as the original bug.
@tobetter - I'm planning to spend my time tweaking the tablesize and max_sectors_kb values in the latest kernel+u-boot, but I'm happy to look into other things if anything catches your eye in the log above. Also, if I understand correctly, the new recommended workaround is:
- apt-get update && apt-get upgrade (ignoring the U-boot blob in the link above, which is no longer necessary)
- manually edit /media/boot/boot.ini as described above
- set max_sectors_kb as described in the top post
If that's all correct, I'll update the top post again.
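In case it helps anyone scripting the last step, here is a minimal Python sketch of reading and writing max_sectors_kb through sysfs. It assumes the disk shows up as a block device such as sdb (as in the log above) and that the setting lives at /sys/block/<device>/queue/max_sectors_kb. The function names and the `sysfs_root` parameter are made up for illustration, and the value to write is whatever the top post recommends, so none is hard-coded here.

```python
from pathlib import Path


def queue_attr(device, attr, sysfs_root="/sys"):
    """Build the path to a block-queue attribute, e.g. /sys/block/sdb/queue/max_sectors_kb."""
    return Path(sysfs_root) / "block" / device / "queue" / attr


def read_max_sectors_kb(device, sysfs_root="/sys"):
    """Return the current max_sectors_kb value for the device as an int."""
    return int(queue_attr(device, "max_sectors_kb", sysfs_root).read_text().strip())


def set_max_sectors_kb(device, value_kb, sysfs_root="/sys"):
    """Write a new max_sectors_kb value and return what the file reports afterwards.

    On a real system this needs root, and the kernel may reject values it
    does not accept for the device.
    """
    queue_attr(device, "max_sectors_kb", sysfs_root).write_text(f"{value_kb}\n")
    return read_max_sectors_kb(device, sysfs_root)
```

Values written through sysfs are runtime settings, so they should be expected to reset on reboot and when the disk is unplugged and plugged back in. The `sysfs_root` argument is only there so the functions can be tried against a scratch directory first.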
elatllat wrote: ↑Thu Jul 04, 2019 8:56 pm
Anyone try this on the 5.2 kernel?
Is this an issue for non UAS devices? ( I have to add my devices to quirks to get smartctl to work anyway )
This is definitely an issue for non-UAS devices. People have reported lots of USB devices with high enough sustained transfer speeds triggering the bug; it's just that UAS disk drives are the most common example (also, I didn't really understand the problem when I chose the thread title). I'll mention that in the next update to the top post.