My pet Kubernetes cluster had a spontaneous outage yesterday. Well, what else would you expect when the data center has a power failure?
I found time to look into the matter today. First of all, I checked the cluster nodes - hm, OK, alive and happy.
The kube-system namespace was not doing as well: the flannel and coredns deployments were failing, their pods already in crash-loop back-off.
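For reference, the checks above boil down to a few kubectl invocations (assuming kubectl is already configured against the cluster; the pod name is a placeholder):

```shell
# Node health: all nodes should report Ready
kubectl get nodes

# Workloads in kube-system: failing pods show up as CrashLoopBackOff here
kubectl -n kube-system get pods

# Details and recent events for a failing pod (substitute a real pod name)
kubectl -n kube-system describe pod <pod-name>
```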
I managed to uncover a number of issues (systemd mount units for bind mounts getting triggered too late, etc.) but failed to resolve the outage at first - even after completely purging and re-installing the latest flannel (kubectl delete ..., kubectl apply ...).
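As an illustration of the bind-mount ordering problem: a bind mount that kubelet depends on can be ordered explicitly in its systemd mount unit, so it is in place before kubelet starts. A minimal sketch, with hypothetical paths (systemd requires the unit file name to match the mount point, so /var/lib/kubelet becomes var-lib-kubelet.mount):

```ini
# /etc/systemd/system/var-lib-kubelet.mount (hypothetical example)
[Unit]
# Ensure the source filesystem is mounted before the bind mount
RequiresMountsFor=/data/kubelet
# Make kubelet wait for this mount
Before=kubelet.service

[Mount]
What=/data/kubelet
Where=/var/lib/kubelet
Type=none
Options=bind

[Install]
WantedBy=local-fs.target
```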
Finally, the kubelet logs on one of the nodes revealed the actual root cause: systemd-resolved was not running - and apparently it is neither enabled nor started by default!
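The fix itself is a one-liner, plus a sanity check that kubelet points at the resolved-managed resolv.conf (the config path shown is the common kubeadm default and may differ on other setups):

```shell
# Enable systemd-resolved for future boots and start it right now
systemctl enable --now systemd-resolved

# Verify it is actually running
systemctl is-active systemd-resolved

# kubelet should use the stub-free resolv.conf managed by resolved, e.g.:
#   resolvConf: /run/systemd/resolve/resolv.conf
grep resolvConf /var/lib/kubelet/config.yaml
```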
Never mind the greetings from the "how did this ever work!?" department...
After adding the necessary steps to my k8s management fabric and running the task, things automagically started to settle :)
And again: it's always DNS.