Check your Thermal Paste...Here's why it's Important



Post by hominoid » Sat Jul 15, 2017 12:36 am

Everyone should check the thermal paste on their stock active and passive coolers, especially if you are running a demanding application or operating in a tough environment. Here are 5 typical examples from the 20 XU4s I recently inspected. Only 2 of the 20 were marginally better than the pictured examples, and none had 100% coverage of the SOC. Examine the paste impression on both the SOC and the heatsink.

S1370004s.png
S1370006s.png
S1370008s.png
S1370011s.png
S1370012s.png

I've been running 2 clusters of 10 XU4s and recently found one unit running at half or less of the typical performance. Any time the system was loaded, the A15s seemed to go offline. Burning a fresh image to the SD card did not fix it. To rule out a bad SD card corrupting the OS, I swapped SD cards with the neighboring unit, which was operating fine. The problem stayed with the same board, which pointed to a hardware problem. While monitoring the frequency and temperature with watchtemp.sh, I briefly saw the temperature hit 125°C as soon as the cores were loaded, just before the A15s shut down. watchtemp.sh then reported that there was no such file or directory for its command "cat /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq", and likewise for each of the other A15 cores. After several seconds the cores would come back online, watchtemp.sh would report the correct frequency for all A15s, and then they would drop out again, repeatedly. At that point my list of suspects was a bad fan, a bad fan connection, or a cocked heatsink.
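For anyone who wants to catch the same symptom, here is a minimal Python sketch of a watchtemp.sh-style monitor. It is not watchtemp.sh itself. It assumes the standard Linux sysfs layout (`scaling_cur_freq` under each CPU's cpufreq directory, thermal zone temperatures in millidegrees Celsius under /sys/class/thermal), assumes the A15 cores are cpu4 through cpu7, and uses a hypothetical 95 °C warning threshold picked only for illustration. A missing or unreadable cpufreq file is reported as an offline core, which is the behavior described above.

```python
import glob
import os
import time

# Assumption: standard Linux sysfs mounted at /sys; tests can pass another root.
SYSFS_ROOT = "/sys"
# Assumption: on the XU4 the Cortex-A15 ("big") cores are cpu4 through cpu7.
A15_CORES = (4, 5, 6, 7)
# Hypothetical warning threshold in degrees C, chosen only for illustration.
WARN_C = 95.0


def read_int(path):
    """Return the integer stored in a sysfs file, or None if it can't be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # A core's cpufreq node can vanish while the core is offline,
        # which is the failure watchtemp.sh ran into.
        return None


def core_freqs_khz(root=SYSFS_ROOT, cores=A15_CORES):
    """Map core number -> scaling_cur_freq in kHz, or None if unavailable."""
    return {
        cpu: read_int(os.path.join(
            root, f"devices/system/cpu/cpu{cpu}/cpufreq/scaling_cur_freq"))
        for cpu in cores
    }


def zone_temps_c(root=SYSFS_ROOT):
    """Read every thermal zone; sysfs reports millidegrees Celsius."""
    temps = []
    pattern = os.path.join(root, "class/thermal/thermal_zone*/temp")
    for path in sorted(glob.glob(pattern)):
        value = read_int(path)
        if value is not None:
            temps.append(value / 1000)
    return temps


def sample(root=SYSFS_ROOT, warn_c=WARN_C):
    """Take one reading and return (freqs, temps, warnings)."""
    freqs = core_freqs_khz(root)
    temps = zone_temps_c(root)
    warnings = [f"cpu{cpu} offline or cpufreq missing"
                for cpu, freq in freqs.items() if freq is None]
    if temps and max(temps) >= warn_c:
        warnings.append(f"hottest zone {max(temps):.0f} C >= {warn_c:.0f} C")
    return freqs, temps, warnings


def watch(interval_s=1.0, count=None):
    """Print one reading per interval; runs until interrupted if count is None."""
    taken = 0
    while count is None or taken < count:
        freqs, temps, warnings = sample()
        print(freqs, temps, *warnings)
        taken += 1
        time.sleep(interval_s)
```

On the board itself, `watch(count=60)` would print about a minute of readings. A core that keeps flipping between a frequency and None while the temperature spikes matches the pattern described above.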

I checked on the unit and found the fan running, so I shut everything down to look further. The fan connection seemed fine too. The heatsink was sitting flush, so I took it off to look at the paste job. I was a little surprised by what I saw and immediately opened up and inspected all 20 units. All 20 had insufficient thermal paste applied to the SOC. After cleaning and reapplying thermal paste on all the XU4s, the temperature and performance are where I would have expected, except for the one unit, which now appears to be permanently damaged. As stated above, I believe I have ruled out an OS, application, or SD card issue, and I need to RMA one XU4. I can't say that inadequate thermal paste directly caused this unit's problem, but it was most likely a contributing factor, possibly combined with a marginal SOC. Everyone running a stock HK cooler and thermal paste should inspect their XU4s as soon as possible and be prepared to replace the thermal paste. Hardkernel, I believe you may have a quality control issue. Please authorize an RMA for one XU4 unless I have missed something.
hominoid
 
Posts: 71
Joined: Tue Feb 28, 2017 3:55 am
Location: Lake Superior Basin, USA
languages_spoken: english
ODROIDs: XU4

Re: Check your Thermal Paste...Here's why it's Important

Post by odroid » Sat Jul 15, 2017 10:18 am

Thank you for the very detailed investigation.
We should have checked the thermal paste spreading process more carefully. :oops: We will improve the process quickly.

One of your XU4 boards must have a hardware issue.
Please contact "odroid at hardkernel dot com" with a link to this thread.
She will help with your RMA process if you purchased the board from us directly.
Otherwise, contact your local distributor. Sorry for the inconvenience caused.
odroid
Site Admin
 
Posts: 23684
Joined: Fri Feb 22, 2013 11:14 pm
languages_spoken: English
ODROIDs: ODROID

Re: Check your Thermal Paste...Here's why it's Important

Post by DarkBahamut » Sun Jul 16, 2017 5:34 am

Stock thermal paste jobs are usually pretty bad and really the best advice is to inspect and replace the thermal paste on anything that comes with a pre-fitted heatsink. That said, on most components removing the heatsink will void your warranty, so you often get stuck with a pretty poor job.

Here's the thermal paste job on a GTX 780, a GPU that cost £500 at the time and puts out >250 W of heat:

Image

It's no better than those XU4s, even though the card is far more expensive and far more power hungry. A significant portion of the die has no paste at all.

In that last XU4 picture, is that paste dry? That would be rather concerning, because it would call the quality of the paste into question regardless of how it was applied.
DarkBahamut
 
Posts: 229
Joined: Tue Jan 19, 2016 10:19 am
languages_spoken: english
ODROIDs: XU4

Re: Check your Thermal Paste...Here's why it's Important

Post by odroid » Sun Jul 16, 2017 10:53 am

I have no explanation for why the paste would appear dry.
We started to use the SG502 thermal paste over a year ago.
Please review the specification and let me know if you have any concern.
https://www.acc-silicones.com/content/p ... sg502.ashx

Re: Check your Thermal Paste...Here's why it's Important

Post by hominoid » Sun Jul 16, 2017 11:36 pm

Thanks @Odroid for posting the thermal paste specs. I had been wondering what the stock paste was for a while and just hadn't posted the question.
DarkBahamut wrote:Stock thermal paste jobs are usually pretty bad and really the best advice is to inspect and replace the thermal paste on anything that comes with a pre-fitted heatsink.

I agree and have always followed that advice except in this case. Lesson learned...again.
DarkBahamut wrote:In that last XU4 picture, is that paste dry? That would be rather concerning if that happens, because it questions the quality of the paste regardless of application.

No, it is not dry; I was looking for that myself. One unit in particular seemed to have a lot less paste than the others, and I attributed what I saw to the low volume of paste and probably higher heat. But I didn't need to get my putty knife out to scrape the paste off. :)

Listen everyone: if you're a casual user, this should be a concern but is probably not as big of a deal. If you run your system(s) hard or in a high-temperature environment, this becomes more important and should be a higher priority for you to address. The thermal stress on the SOC can be significant in these situations, and everyone just needs to be aware of it. In my view Hardkernel has responded appropriately and has stated they are making changes to improve the process. I only wish other manufacturers were as responsive.

Re: Check your Thermal Paste...Here's why it's Important

Post by odroid » Mon Jul 17, 2017 9:41 pm

I've checked with our manufacturing people.
According to them, they already changed the process in mid-June, as shown on the right side of this picture.
Paste_process.png


We opened several random samples and removed the heatsink and took pictures.
They look much better.
after_removing_heatsink.png

