For about a year now, my VMware host has been restarting every few weeks without any obvious reason. At first I didn't pay much attention to it; it was inconvenient, but I wasn't doing any daily work on it, so it didn't bother me too much.
But a few months ago I started using a Windows VM on a daily basis, and spontaneous restarts (while having way too many files open at once, of course) really get on your nerves.
I put it down to the age of my server (hardware slowly dying is always a possibility), or maybe the fact that ESXi 6.7 is not really supported, or that I added an unsupported second NIC.
However, after a while I started to notice a pattern: according to the UptimeRobot service I use, the server restarts always happened a few minutes after midnight, which seemed weird. So, time for some digging :-)
SSH on the ESXi host, as root:
This log uses UTC, so the majority of host boots indeed happened 6 minutes after midnight (local time). And all on day 6 or 7 of the month - so this must be software related, not hardware.
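The UTC-to-local reasoning above can be checked with a small script. This is only a sketch under assumptions: the timestamp format (ISO 8601 with a trailing `Z`), the UTC+1 offset and the sample timestamps are all made up for illustration and are not taken from my actual log; `to_local` and `just_after_midnight` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time is UTC+1. Change this to your own offset.
LOCAL_TZ = timezone(timedelta(hours=1))


def to_local(utc_text: str) -> datetime:
    """Parse a UTC timestamp like '2022-01-06T23:06:00Z' and convert it to local time."""
    utc_dt = datetime.strptime(utc_text, "%Y-%m-%dT%H:%M:%SZ")
    return utc_dt.replace(tzinfo=timezone.utc).astimezone(LOCAL_TZ)


def just_after_midnight(local_dt: datetime, window_minutes: int = 15) -> bool:
    """True if the local time falls within the first few minutes after midnight."""
    return local_dt.hour == 0 and local_dt.minute < window_minutes


# Made-up timestamps, for illustration only.
sample_boots = ["2022-01-06T23:06:00Z", "2022-02-14T09:30:00Z"]

for text in sample_boots:
    local = to_local(text)
    # Print local time, day of the month, and whether it is just after midnight.
    print(local.isoformat(), local.day, just_after_midnight(local))
```

A fixed offset ignores daylight saving time, so for timestamps spread over a whole year a real time zone database would be the better choice.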
Next step, the kernel log (a few minutes before the latest restart):
Notice the combination of "Reset" and "vmm0:win10-server-01" on some lines; that is suspicious... VM "win10-server-01" is one of my Windows 10 clients, mainly used as a file and backup server.
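Picking out those lines by eye gets tedious in a long log, so a filter along these lines might help. It is a minimal sketch: `suspicious_lines` is a hypothetical helper, and the sample lines are placeholders I made up, not real kernel log output. Only the two search strings come from the log above.

```python
from typing import Iterable, List


def suspicious_lines(
    lines: Iterable[str],
    vm_tag: str = "vmm0:win10-server-01",
    keyword: str = "Reset",
) -> List[str]:
    """Return the lines that contain both the keyword and the VM tag."""
    return [
        line.rstrip("\n")
        for line in lines
        if keyword in line and vm_tag in line
    ]


# Placeholder lines, for illustration only (not real log entries).
sample = [
    "placeholder entry one: Reset requested by vmm0:win10-server-01\n",
    "placeholder entry two: unrelated message\n",
    "placeholder entry three: vmm0:other-vm Reset\n",
]

for line in suspicious_lines(sample):
    print(line)
```

For a real log, an open file object can be passed in directly, since iterating over it yields one line at a time.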
So, on to the event logs on that Windows VM: Event Viewer -> Event Viewer (Local) -> Windows Logs -> System
I could match ALL the "6 minutes after midnight" host boot events to entries like this:
APC PowerChute Personal Edition is an application that monitors the APC UPS attached to the ESXi host. As to why it would periodically shut down the client at the beginning of each month: no idea...
I experimented with this application in the past, in an attempt to have it gracefully shut down the ESXi host during a power cut. As far as I can recall I never managed to get that to work, but maybe I did somehow. Even so, I certainly didn't intend it to restart the host instead of shutting it down!...
This nice VMware article, "Determining why an ESXi host was powered off or restarted (1019238)", did not provide any additional info this time, but it might come in handy in the future.
Maybe I will dive into this again some day, but for now I will just uninstall the APC application and unplug the USB cable of the UPS - and see what happens on April 6th or 7th 2022...
Update on April 7th 2022: Still up and running!