Tobold's Blog
Friday, October 20, 2023
 
Weird computer repair

My new PC had some problems. Of course they were of the worst possible kind: Randomly appearing crashes that are not reproducible. So, when I brought the computer back to the shop, they couldn't find any problem; but back home the computer was still crashing at least once a day. And the Lamplighters League game crashed to desktop at least once per hour.

Now I am not an expert on computers, but I have owned various computers since my first ZX81 back in 1981. So with the help of experts on forums and various diagnostic software, I could at least exclude some possible problems: My CPU and GPU didn't seem to be overheating, my computer isn't overclocked, and there isn't anything wrong with the various drivers. And judging by the crash logs and dump files, the errors seemed to be related to memory addressing.

So I remembered a piece of software that had helped me in the past with identifying flaky RAM, called Memtest86. It's free, so I downloaded it, created a USB boot drive with it, found out how to change the boot sequence in my BIOS, and started testing. And yes, Memtest86 identified bad memory, and could even tell me where: Around the 24 GB address. Now my previous computer only had 16 GB, and I know that is sufficient for many applications. The new computer had two 16 GB DDR5 RAM bars, and I could identify which one was the "first" and which one was the "second".
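The arithmetic behind suspecting the second bar can be sketched in a few lines. This is only a rough illustration and rests on an assumption: that the two 16 GB modules are mapped one after the other in the address space. The function name `module_for_address` is hypothetical, and with dual-channel interleaving the real mapping can be more complicated, so testing each bar on its own remains the reliable confirmation.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def module_for_address(address_bytes, module_sizes_gib):
    """Return the 1-based index of the module containing the address.

    Assumption: modules are laid out back to back (no interleaving).
    """
    start = 0
    for index, size_gib in enumerate(module_sizes_gib, start=1):
        end = start + size_gib * GIB
        if start <= address_bytes < end:
            return index
        start = end
    raise ValueError("address is beyond installed memory")


# A fault reported around the 24 GB mark, with two 16 GB modules installed
print(module_for_address(24 * GIB, [16, 16]))  # 2
```

Under that assumption, anything between 16 GB and 32 GB lands on the second module, which matches the bar that was pulled out.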

So I pulled out the second RAM bar, restarted my computer, and now it doesn't crash anymore. Even Lamplighters League is now completely stable. Weird, how by simply pulling out a component I managed to repair my computer. And of course, now that I have a Memtest86 log and the problem identified, the shop I bought my computer from is willing to exchange the faulty RAM for a new one. I'll get that next week, but until then the PC is running happily on 16 GB.


Comments:
Yeah, bad memory is always tricky to diagnose. I had never heard of that memtest application, so that is good to know. I had a bad memory stick recently and it took me forever to narrow it down to that.
 
I remember having a very similar issue, with the exact same investigation and solution but a different root cause: the issue was not the RAM but a bent pin on the processor socket.

It was in 2009.

Nice to see that the same tool is still as useful 14 years later! (The tool was already old at that point.)
 
Why didn't the shop where you purchased the computer do these same tests? Glad you found the problem.
 
Well done identifying the issue!
Although it is disappointing that your PC shop wasn't able/willing to do so.
 
Tobold: "Weird, how by simply pulling out a component I managed to repair my computer."

Why is that weird? It's an electric circuit. So a faulty component affects the whole system.
 
What's weird about it is that the faulty component happened to be one that's essentially optional, and in this case you're better off without it. I knew it sounded like memory as soon as you said what it wasn't. And I also used memtest many years ago to diagnose a similar problem. It's impressive that it still does the job well after all these years.
 
Just because it's optional doesn't mean it can't affect the rest.
If you add an optional loop to a pipe system and that loop is leaky, then the whole system is leaking. In the case of a PC it will trigger safety measures and shut down.

And the first memtest might have been written years ago but it has been updated to the latest technologies. So it's not old software.
 
Huge congratulations on some very tricky debugging, Tobold. Even if you aren't a computer expert, your engineering problem-solving skills are clearly still first rate. When you get the machine back, be sure to check that your memory is working in dual channel mode (CPU-Z or HWiNFO64).
 