Launching an AI breaks Firefox, launching a Steam game repairs it

4 April 2023

Computer science can be fun. Bugs even more.

A wobbly bug.

I recently discovered the amazing pop2piano program. After I used it, Firefox started to crash on any ‘rich’ webpage (hackernews would be one of the rare sites to survive, but anything with some heavy js would not make it). Instant crash. In safe mode, the sites would crash the tab, and Firefox would survive. I noticed that after playing a game, Firefox worked again. The next time it happened, launching a game did not fix Firefox. Oh, but this game was a Linux native one, whereas the previous one was launched with Proton. Pronto, I found some Proton game, launched it and quickly closed it. Firefox worked again. In the meantime I had noticed that the crash message was “glx: unable to require VAAPI”. This points to hardware acceleration using the GPU. The problem is clearly linked to the Nvidia driver, and it seems that the pytorch usage must have changed something on disk that made Firefox unable to use it properly, while Proton would undo this change. Without investigating more, I have a plausible explanation. Nonetheless, if a normal user came with such a bug report, I would certainly assume they think computers are magic.

Last year, the same computer started to take around 5 minutes to boot instead of 30 seconds. I never liked the motherboard (it’s a weird legacy/UEFI hybrid, and 30 seconds is too damn much). Once booted, there were a few weird graphical glitches, performance would be worse than usual, and power management would not work (so impossible to suspend). On the software side, I had not made any significant change. I tried updating the system, without success. I decided to upgrade the motherboard firmware, and try to optimize boot parameters. After a long time searching the vendor’s sites to find the download links and documentation, including messages “please avoid bricking your own hardware”, I installed and all I got was a new boot splash screen.
So I removed all peripherals, printer, tablet, gamepad, …, which changed nothing. Next step, I opened the case to dust and clean everything off. Still no success. At this point I wondered if it could be an issue with the CPU because of thermal paste degradation, but I had run all extensive hardware diagnostics (the Phoronix suite and anything that I found documented) and not found anything abnormal, so I refrained from touching it. That is when I noticed one back USB port was covered by a 2 millimeters black plastic thing. I had put that bluetooth dongle to connect a keyboard once the week before. I removed it, and everything worked as before. Boot time was even significantly reduced by the firmware and parameter updates. Turns out that this series of bugs happened if that dongle was connected at boot time. This is probably the worst bug I encountered, as I spent 2 weeks wondering what was broken, what I should upgrade, and how expensive that would be. I still have zero idea how such a thing can happen (especially the graphical interface glitches), but once it was solved I did not have the heart to lose more time over it.

There are many crazy bugs that don’t seem to make sense. The most famous being probably the emails that would not go beyond 500 miles, but most people probably encounter similarly inane issues without even knowing.

Sometimes I’m just in awe at how the tower of leaky abstractions we have between human and hardware can even work at all.