Solving macOS VM kernel panics on heavily-loaded Proxmox/QEMU/KVM servers

Recently I needed to solve a problem where macOS VMs running on an overloaded Proxmox server (regularly pegged at 100% CPU, load >100) would kernel-panic and reboot about once every 15 minutes. All of the VMs on the box were running a CI workload, so Proxmox was effectively running a CPU torture-test similar to building Chrome in a loop. However, only the macOS guests were experiencing kernel panics.

Because multiple VMs were running a high-CPU workload simultaneously and the host’s cores were significantly oversubscribed, there was heavy contention for server resources, so each guest experienced high and variable latency. This wasn’t considered a showstopper because the system was running non-interactive batch jobs, and throughput was more important than latency.

I collected 5 macOS kernel panic logs and compared them. They were very consistent: the main complaint was a message like this:

Panic(CPU 3, time 12345678910): NMIPI for unresponsive processor: TLB flush timeout, TLB state:0x0

The stacktraces varied a little, but I spotted this trace that contained a call to _panic:

panic(cpu 7, caller 0x....): "Uninterruptible processor(s): CPU bitmap: 0x800, NMIPI acks: 0x0, now: 0x1, deadline=xxx"@/AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/xnu/xnu-6153.121.2/osfmk/x86_64/pmap.c:2696

mach_kernel : _panic_trap_to_debugger + 0x277
mach_kernel : _panic + 0x54
mach_kernel : _pmap_flush + 0x4a6
mach_kernel : _vm_page_sleep + 0x9e2
mach_kernel : _vm_map_msync + 0x18c
mach_kernel : _madvise + 0xce
mach_kernel : _unix_syscall64 + 0x287
mach_kernel : _hndl_unix_scall64 + 0x16

The XNU kernel is open source, so I was able to take a closer look at the panicking function _pmap_flush in pmap.c.

This function flushes the TLB of the current core, signals all of the other cores to do the same, and then waits for them to comply. If they don’t comply in time, this error handler is run:

if (cpus_to_respond && (mach_absolute_time() > deadline)) {
	if (machine_timeout_suspended()) {
		continue;
	}
	if (TLBTimeOut == 0) {
		if (is_timeout_traced) {
			continue;
		}

		PMAP_TRACE_CONSTANT(PMAP_CODE(PMAP__FLUSH_TLBS_TO),
		    NULL, cpus_to_signal, cpus_to_respond);

		is_timeout_traced = TRUE;
		continue;
	}
	orig_acks = NMIPI_acks;
	NMIPI_panic(cpus_to_respond, TLB_FLUSH_TIMEOUT);
	panic("Uninterruptible processor(s): CPU bitmap: 0x%llx, NMIPI acks: 0x%lx, now: 0x%lx, deadline: %llu",
	    cpus_to_respond, orig_acks, NMIPI_acks, deadline);
}
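
To make the branch structure easier to follow, here is a simplified, standalone model of the decision this handler makes once the deadline has passed. It is only a sketch: flush_action_t and flush_timeout_action are hypothetical names of mine, not XNU symbols, and the real code also sends NMIs and emits trace events rather than returning a value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the timeout branch quoted above; not XNU code. */
typedef enum {
	FLUSH_NO_TIMEOUT,     /* all CPUs acked, or the deadline has not passed */
	FLUSH_KEEP_WAITING,   /* timeout reached but ignored; keep spinning */
	FLUSH_TRACE_AND_WAIT, /* timeout disabled: trace once, keep spinning */
	FLUSH_PANIC           /* timeout enforced: NMI the stragglers and panic */
} flush_action_t;

flush_action_t
flush_timeout_action(uint64_t cpus_to_respond, uint64_t now, uint64_t deadline,
    uint64_t tlb_timeout, bool timeouts_suspended, bool already_traced)
{
	if (cpus_to_respond == 0 || now <= deadline) {
		return FLUSH_NO_TIMEOUT;
	}
	if (timeouts_suspended) {
		return FLUSH_KEEP_WAITING;
	}
	if (tlb_timeout == 0) {
		/* A zero timeout lands here: never panic, trace only the first time. */
		return already_traced ? FLUSH_KEEP_WAITING : FLUSH_TRACE_AND_WAIT;
	}
	return FLUSH_PANIC;
}
```

Under this model, any delay past the deadline with a non-zero timeout reaches the panic case, whatever the cause of the delay.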

My hypothesis was that there was no actual fault with the host or emulated CPU, and the timeout was simply being triggered because the host was so overloaded that the other guest threads didn’t get scheduled in time to meet the flush deadline. So maybe I could just disable the TLBTimeOut to avoid the panic? In machine_routines.c you can see how to change that setting:

/*
 * TLBTimeOut dictates the TLB flush timeout period. It defaults to
 * LockTimeOut but can be overriden separately. In particular, a
 * zero value inhibits the timeout-panic and cuts a trace evnt instead
 * - see pmap_flush_tlbs().
 */
if (PE_parse_boot_argn("tlbto_us", &slto, sizeof (slto))) {
	default_timeout_ns = slto * NSEC_PER_USEC;
	nanoseconds_to_absolutetime(default_timeout_ns, &abstime);
	TLBTimeOut = (uint32_t) abstime;
} else {
	TLBTimeOut = LockTimeOut;
}

So I added tlbto_us=0 to the kernel’s boot-args in OpenCore in order to disable the TLB flush timeout panic completely. This solved that crash! But in its place the kernel began panicking on spinlock acquisition timeouts :(.

Panic(CPU 9, time 12345678910): NMIPI for spinlock acquisition timeout, spinlock: 0xffffff12345678 ...

It was clear that lock timeouts needed to be increased globally in order to make the kernel happy. Thankfully I noticed this kernel routine:

virtualized = ((cpuid_features() & CPUID_FEATURE_VMM) != 0);
if (virtualized) {
	int	vti;
		
	if (!PE_parse_boot_argn("vti", &vti, sizeof (vti)))
		vti = 6;
	printf("Timeouts adjusted for virtualization (<<%d)\n", vti);
	kprintf("Timeouts adjusted for virtualization (<<%d):\n", vti);

	VIRTUAL_TIMEOUT_INFLATE32(LockTimeOutUsec);
	VIRTUAL_TIMEOUT_INFLATE64(LockTimeOut);
	VIRTUAL_TIMEOUT_INFLATE64(LockTimeOutTSC);
	VIRTUAL_TIMEOUT_INFLATE64(TLBTimeOut);
	VIRTUAL_TIMEOUT_INFLATE64(MutexSpin);
	VIRTUAL_TIMEOUT_INFLATE64(reportphyreaddelayabs);
}

The very purpose of this routine was to increase kernel timeouts when macOS is running as a VM, and this included both the TLBTimeOut and the generic timeouts which were being hit when acquiring a spinlock!

I confirmed that the “VMM” CPU feature flag was already being correctly seen by the guest, which was causing timeouts to be shifted left by the value of the kernel’s vti “Virtualization Timeout Inflation” parameter, which defaults to 6. In other words, kernel timeouts were already being multiplied by 2^6 = 64 compared to macOS running on a bare-metal machine.

Since this Proxmox machine was so overloaded, I decided to bump that vti boot-args parameter up by 3 (i.e. multiply kernel timeouts by a further factor of 8). So my final boot-args were:

keepsyms=1 tlbto_us=0 vti=9
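
To put numbers on that, the inflation can be modelled as a left shift. The helper below is an illustrative sketch rather than the kernel's VIRTUAL_TIMEOUT_INFLATE64 macro: the name inflate_timeout, the saturate-on-overflow behaviour, and the 1000 microsecond base value used in the note afterwards are all my assumptions.

```c
#include <stdint.h>

/*
 * Hypothetical helper: scale a timeout by 2^vti.
 * Saturating at UINT64_MAX on overflow is an assumption made for this
 * sketch; it is not taken from the XNU macro.
 */
uint64_t
inflate_timeout(uint64_t timeout, unsigned vti)
{
	if (vti >= 64) {
		/* Shifting a 64-bit value by 64 or more is undefined in C. */
		return UINT64_MAX;
	}
	if (timeout > (UINT64_MAX >> vti)) {
		/* The shifted value would not fit in 64 bits. */
		return UINT64_MAX;
	}
	return timeout << vti;
}
```

For a hypothetical base timeout of 1000 microseconds, inflate_timeout(1000, 6) gives 64000 and inflate_timeout(1000, 9) gives 512000, which is 8 times longer than with the default inflation.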

You can confirm that the macOS kernel saw your edits and applied the new value for vti by checking the kernel’s log entries since the start of the current day, like so:

# log show --predicate "processID == 0" --start $(date "+%Y-%m-%d") --debug | grep "Timeouts adjusted"

Timeouts adjusted for virtualization (<<9)

This solved the kernel panic problem completely! In the long term, some of these VMs should either be reshuffled to other hosts or have their virtual core count reduced, in order to bring the overall load on Proxmox down. This would bring the latency seen by the VMs back into line with what they expect from a bare-metal machine. However, this set of macOS boot parameters is an easy fix to increase macOS VM stability in the event that the host has an unexpected load spike.

3 thoughts on “Solving macOS VM kernel panics on heavily-loaded Proxmox/QEMU/KVM servers”

  1. Would it not make sense to increase the CPU units count in Proxmox to give this VM more priority? I’ve done this in the past for VMs that need CPU cycles more than others.
