This post is looooong overdue. I was supposed to write this post like 2 years back and it should have been my first post. I’ve finally taken out the time and I’m determined to finish & publish this.
First, a little background. I finished my higher secondary education (also referred to as +2 in India) under the CHSE Board, Odisha. A youth empowerment initiative launched by our state government offers free laptops to the top 15,000 students of the 12th board exams. TBH, it’s pretty easy to make that cut. Fast forward to August 2016: I collected the laptop from my college and went home super excited to use my first ever laptop! It’s an HP 240 G5, an entry-level laptop that came with Windows 10. As a user, I can say that it’s very portable, performs decently on Linux Mint and sluggishly on Windows 10, and is pretty much average in every department. 4 gigs of RAM seems good enough for most use cases (if you DON’T use Google Chrome :p). All in all, I was stoked about this and started tinkering and customizing it.
Until that time, I had only a vague idea about Linux and had used only Win XP & 8. After about 2 months, when I thought I had some idea about Linux distros, I decided to give Mint a go. It installed well and everything was good. 20 minutes into testing the distro, it suddenly froze. Like, the entire thing just froze and I couldn’t even switch to a TTY :cry:. I checked the HDD indicator light and it wasn’t blinking, which made me assume it was something related to the drive. I restarted it a couple of times and it happened every single time, after random intervals. Thinking this might be a distro issue, I installed Ubuntu. Same. I tried Xubuntu too and nada. It was still happening. Cut to me booting into Windows, frantically googling the error, opening tab after tab and scrolling through each and every related answer on Stack Overflow. There were various fixes mentioned by different users. I tried each and every method. Screwed up a few times and re-installed the entire thing. The error was still not resolved. Heck, I did not even know the cause. In the meantime, I had started to think that I would never be able to use Linux on this laptop. Maybe I would have to convince my parents that I needed a new laptop, but I sadly realised that my parents don’t know how to use a laptop, and my siblings never bothered to learn about anything other than Windows and would definitely dismiss this :sob:. I was seriously sad about this, switched back to Windows and accepted the sad fate. It wasn’t until 3 days later that I suddenly decided I would not stop until I had solved this. And finally, after 12 hours of StackOverflow-ing, I got a fix that worked. I did not even properly understand what the reason was (I was desperate). The fix said I had to add intel_idle.max_cstate=1
to my GRUB configuration as a boot parameter, then update GRUB and reboot. I did, and when it booted up, I carefully did some dummy things: installed some apt packages, played HD video, and tried anything else that I thought had previously triggered the freeze. After about 1.5 hours, it had not frozen. It worked!! YES! I rebooted and started customizing it, nervously praying in my head that it wouldn’t happen again. And it did not. :smile: I was extremely excited and proceeded to set up my new desktop and play around with it. After that, this went completely out of my mind until a few days back, when I remembered the issue out of nowhere and decided to dig deeper and find out the cause of this error.
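For anyone wanting to try the same fix, here is a rough Python sketch of what that edit looks like. It assumes the usual Ubuntu/Mint setup, where the parameter goes into the double-quoted GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub, followed by `sudo update-grub` and a reboot. The function name `add_kernel_param` and the sample file content are made up for illustration; the code only transforms text and never touches the real file.

```python
import re


def add_kernel_param(grub_text: str, param: str) -> str:
    """Return grub_text with param appended to GRUB_CMDLINE_LINUX_DEFAULT.

    Hypothetical helper: assumes the value is wrapped in double quotes,
    as in the stock /etc/default/grub on Ubuntu-based distros.
    """
    pattern = re.compile(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE
    )

    def _append(match):
        current = match.group(2).split()
        # Only add the parameter if it is not already present.
        if param not in current:
            current.append(param)
        return match.group(1) + " ".join(current) + match.group(3)

    new_text, count = pattern.subn(_append, grub_text, count=1)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT line not found")
    return new_text


# Made-up sample content standing in for /etc/default/grub.
sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
print(add_kernel_param(sample, "intel_idle.max_cstate=1"))
```

Running it a second time on its own output changes nothing, so the parameter is never duplicated.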
Turns out, the bug is actually related to the CPU. This lappy has an Intel Pentium A1020 2.41 GHz processor. As can be guessed from the solution, it has something to do with cstate
and limiting it to a certain value. And it’s exactly that. C-States refer to several power modes that a CPU can be put into in order to use power efficiently and save energy. Here’s a post that very concisely explains what the C-States are. Long story short, when the CPU is idle, parts of it can be switched off, which saves energy. The levels are semantically separated and have names starting from C0 (entire CPU fully operational), to C3 (sleep; stops the CPU’s internal clocks) and finally to C6 (CPU is deeply powered down). The deeper the state, the more time it takes to get back to C0 when required. This is all good, but there are several processors from Intel’s Bay Trail family (and some other families too) that are affected by this bug. The bug arose when a certain patch was introduced to the Linux kernel. Unfortunately, my lappy’s processor is one of them. The CPU enters a sleep state that it does not support and, once there, fails to get back to its normal state. It just stays there doing nothing, and that’s what caused my laptop to freeze at random and stop responding to anything. The fix prevents the processor from going into the deeper low-power states, so it stays away from the problem region. And the fact that this was related to the Linux kernel explains why it worked just fine with Windows. So the solution is actually a workaround. It trades higher power consumption for stability. And since I rarely carry my laptop elsewhere (and even if I do, I take my charger with me), this trade wasn’t a problem for me. But if I had to guess, this had an adverse effect on my battery, which lost its backup capacity within a year. It can’t last more than 10 mins without an external power supply. Back to the bug: this ticket tracks the issue for the kernel, and it can be seen that the community has done a wonderful job reporting the error.
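If you are curious which idle states your own kernel exposes, here is a small sketch that reads them from sysfs. It assumes the cpuidle directory for the first core lives at /sys/devices/system/cpu/cpu0/cpuidle, with one stateN folder per idle state holding `name` and `latency` files; the function name `read_cstates` is my own, and on a machine without that directory it simply returns an empty list.

```python
from pathlib import Path


def read_cstates(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return a list of (state name, exit latency in microseconds).

    Hypothetical helper: assumes the cpuidle sysfs layout with
    state0, state1, ... subdirectories.
    """
    base = Path(cpuidle_dir)
    if not base.is_dir():
        return []
    states = []
    # Sort numerically so that state10 comes after state9.
    state_dirs = sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        latency_us = int((state_dir / "latency").read_text().strip())
        states.append((name, latency_us))
    return states


for name, latency_us in read_cstates():
    print(f"{name}: exit latency {latency_us} us")
```

The deeper states should show larger exit latencies, which matches the idea that they take longer to get back to C0.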
Here is the issue raised in Intel’s community forum about this, and it’s sad to see that it hasn’t been fixed on their side even after over a year. Here is a patch from someone within the community who found a fix for this bug. This SO answer briefly explains the error and the solution, although this wasn’t the answer where I found the fix. IIRC, someone had commented about it in a thread. And this comment from the above issue thread is one that I found interesting to read.
So even though this was a tiring wild goose chase, it was worth it :). It taught me so many things. For starters, kernel boot parameters (one has to be very careful while entering boot params and type them exactly as they are; if there’s a typo, it won’t throw any error, it simply won’t have any effect), C-States and P-States and how they are very different from each other ( ref ), and so on. Most importantly, this taught me to have patience and determination. I probably wouldn’t have known any of this if I had not decided to come back, and I’m extremely glad that I did. I’ve been using Linux Mint since then and have rarely booted into Windows.
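Since a typo in a boot parameter fails silently, it helps to check what the kernel actually received. Here is a minimal sketch, assuming the running kernel’s command line is readable from /proc/cmdline as on typical Linux systems. The helper name `get_boot_param` is mine, and it only does a plain token match, nothing clever.

```python
from pathlib import Path


def get_boot_param(cmdline: str, key: str):
    """Return the value of key on the kernel command line, or None if absent."""
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if name == key:
            # A bare flag without "=" is reported as an empty string.
            return value if sep else ""
    return None


proc_cmdline = Path("/proc/cmdline")
if proc_cmdline.exists():
    value = get_boot_param(proc_cmdline.read_text(), "intel_idle.max_cstate")
    # None here means the parameter never reached the kernel.
    print("intel_idle.max_cstate =", value)
```

If this prints None after a reboot, the parameter was most likely mistyped or GRUB was not updated.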
Hope this post helps someone in understanding this issue :) Cheers!
References:
- https://meta.askubuntu.com/questions/16794/handling-questions-about-the-bay-trail-c-state-bug
- https://ark.intel.com/products/codename/55844/Bay-Trail?q=bay%20trail#@All
- https://askubuntu.com/questions/803640/system-freezes-completely-with-intel-bay-trail
- https://software.intel.com/en-us/blogs/2008/03/12/c-states-and-p-states-are-very-different