I recently started running into a strange issue while administering my homelab Proxmox cluster: attempting to open a Console on a VM using the default noVNC viewer would consistently fail with “Failed to Connect to Server” errors. This was puzzling, because it had always worked for me before, and yet no amount of restarting the Proxmox host machines resolved the issue.
There are myriad proposed solutions to this issue in Google search results, involving things like updating node certificates and tweaking proxy server and SSH configurations. But none of those “solutions” worked for me, and all of my SSH keys worked without issue for authenticating between nodes via the shell. So I ended up just letting the issue go. It was only a minor inconvenience in my environment, since I could SSH into the nodes directly instead of opening a Console window in the Proxmox WebUI.
I ended up discovering the actual solution indirectly, while trying to troubleshoot why an unrelated Replication task I had set up was failing. I noticed this odd error in the Proxmox logs…
That “malformed JSON string” error quickly raised an eyebrow, and pointed to the possibility that the remote end of the SSH connection was returning data that the pvesr (Proxmox Storage Replication) process was not expecting.
Thankfully, I obsessively log and document everything I do when messing around with anything in my homelab, and it was these logs that ultimately pointed to the real origin of the problem. My logs reminded me that a few weeks earlier, I had installed fastfetch on all of my Proxmox nodes (as I do on all of the shell accounts I commonly SSH into) and added it to my .bashrc, so that I’m greeted by a pretty ASCII-art overview of the current state of the system whenever I log in via SSH.
This seems like such a benign thing to have in your .bashrc file that it was the farthest thing from my mind while I was initially troubleshooting. But as it turns out, the pvesr process was failing because it choked on the output of fastfetch: pvesr uses SSH to do its work on the other node, and it wasn’t expecting all that additional data to come back when it opened the SSH connection.
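To make the failure mode concrete, here is a minimal Python sketch of what happens when a login banner lands on the same channel a tool expects to carry JSON. It is only an analogy: pvesr is not a Python program, so its real error text differs, and the payload and banner strings below are made up for illustration. A parser that expects nothing but a JSON document on stdout gives up as soon as anything else arrives first.

```python
import json

# Hypothetical JSON payload standing in for what the remote command should return.
payload = json.dumps({"status": "ok", "volumes": ["vm-100-disk-0"]})

# Hypothetical stand-in for the ASCII-art banner that fastfetch prints from .bashrc.
banner = "        .--.\n       |o_o |   root@pve1\n       |:_/ |   OS: Debian\n"


def parse_remote_output(stdout: str) -> dict:
    """Parse stdout from a remote command, expecting exactly one JSON document."""
    return json.loads(stdout)


def try_parse(stdout: str) -> str:
    """Return a short status string instead of raising, for easy comparison."""
    try:
        parse_remote_output(stdout)
        return "parsed OK"
    except json.JSONDecodeError as err:
        return f"failed: {err.msg}"


if __name__ == "__main__":
    # Clean channel: only the JSON payload comes back.
    print(try_parse(payload))  # prints "parsed OK"
    # Noisy channel: the login banner arrives before the payload.
    print(try_parse(banner + payload))  # prints "failed: Expecting value"
```

The payload itself is perfectly valid in both cases; the only difference is the text printed ahead of it, which is exactly what a chatty .bashrc adds to every SSH session.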
I quickly put two and two together and realized that noVNC was most likely failing for the very same reason. As soon as I removed the fastfetch command from the .bashrc file on all of the nodes, both my Replication tasks and the noVNC console immediately started working again!
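If you want to check your own nodes for the same problem, one rough approach is to run a command that prints nothing by itself, such as true, over a non-interactive SSH session and see whether anything comes back on stdout. Below is a minimal Python sketch of that idea. It assumes key-based SSH between the nodes already works, and the script and node names are hypothetical placeholders. If stdout is not empty, something in the remote shell startup files (such as ~/.bashrc) is most likely printing, just as fastfetch was.

```python
import subprocess
import sys


def is_channel_clean(stdout: str) -> bool:
    """True if a no-op remote command produced no stdout at all."""
    return stdout == ""


def probe_node(host: str, timeout: float = 10.0) -> str:
    """Run `true` on a node over SSH and describe what came back.

    `true` prints nothing on its own, so any stdout most likely comes from
    shell startup files on the remote side.
    """
    try:
        result = subprocess.run(
            # BatchMode=yes makes ssh fail instead of prompting for a password.
            ["ssh", "-o", "BatchMode=yes", host, "true"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timed out"
    except FileNotFoundError:
        return "ssh client not found"
    if result.returncode != 0:
        return f"ssh failed (exit {result.returncode})"
    return "clean" if is_channel_clean(result.stdout) else "extra output on stdout"


if __name__ == "__main__":
    # Node names come from the command line (hypothetical hostnames), e.g.
    #   python3 check_ssh_noise.py pve1 pve2
    nodes = sys.argv[1:]
    if not nodes:
        print("usage: check_ssh_noise.py NODE [NODE ...]")
    for node in nodes:
        print(f"{node}: {probe_node(node)}")
```

A failed connection is reported separately from a clean one, because an unreachable node also returns empty stdout and would otherwise look healthy.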
This vindicated my obsession with keeping meticulous logs of everything I do on my systems, but it was also a lesson in the wisdom of Occam’s Razor. In the end, removing just 9 characters of text from a .bashrc file instantly solved two of my problems at the same time.