Working around the AMD GPU Reset bug on Proxmox using vendor-reset

Most modern AMD GPUs suffer from the AMD Reset Bug: the card cannot be reset properly, so it can only be used once per host power-on. The second time you try to use the card, Linux attempts to reset it and fails, causing the VM launch to fail, or the guest, the host, or both to hang.

gnif’s new vendor-reset project is an attempt to work around this AMD reset issue by replacing AMD’s missing FLR support with vendor-specific reset quirks.

The current lineup of supported GPUs includes various Polaris, Vega and Navi models, including GPUs in the same series as the RX 480, RX 540, RX 580, Vega 56/64, Radeon VII, 5500XT, 5700XT, Pro 5600M (see the repo for the full list of supported chipsets).

Installing vendor-reset on Proxmox

First, update the kernel to the latest version (apt update && apt dist-upgrade) and reboot. Otherwise the kernel headers fetched by pve-headers won’t match the currently-running kernel, and dkms will fail to build the module.
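To confirm you’re in a good state before building, you can check that build headers for the running kernel are actually present. This is a generic sketch, nothing Proxmox-specific: the /lib/modules/&lt;version&gt;/build symlink is where dkms looks for headers.

```shell
# Check that build headers matching the running kernel are installed.
# dkms resolves headers through the /lib/modules/<version>/build symlink.
KVER=$(uname -r)
if [ -d "/lib/modules/$KVER/build" ]; then
    echo "headers present for $KVER"
else
    echo "headers missing for $KVER; update, reboot, then install pve-headers"
fi
```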

Now you can install vendor-reset like so:

# Get latest Proxmox kernel headers:
apt install pve-headers

# Did that fail? If so make sure you have Proxmox repository set up properly! https://pve.proxmox.com/wiki/Package_Repositories

# Get required build tools:
apt install git dkms build-essential

# Perform the build:
git clone https://github.com/gnif/vendor-reset.git
cd vendor-reset
dkms install .

# Enable vendor-reset to be loaded automatically on startup:
echo "vendor-reset" >> /etc/modules
update-initramfs -u

# Reboot to load the module:
shutdown -r now

Now when you start a VM that uses an AMD GPU, you’ll see messages like this appear in your dmesg output, showing that the new reset procedure is being used:

vfio-pci 0000:03:00.0: AMD_POLARIS10: version 1.0
vfio-pci 0000:03:00.0: AMD_POLARIS10: performing pre-reset
vfio-pci 0000:03:00.0: AMD_POLARIS10: performing reset
vfio-pci 0000:03:00.0: AMD_POLARIS10: GPU pci config reset
vfio-pci 0000:03:00.0: AMD_POLARIS10: performing post-reset
vfio-pci 0000:03:00.0: AMD_POLARIS10: reset result = 0
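If you don’t see those messages, it’s worth checking that the module actually got installed and is set to load at boot. A quick sketch, using the standard paths:

```shell
# Confirm the vendor-reset module is installed and configured to autoload.
# modinfo queries the installed module; /etc/modules lists boot-time modules.
modinfo -F version vendor-reset 2>/dev/null || echo "vendor-reset module not installed"
if grep -qs '^vendor-reset$' /etc/modules; then
    echo "vendor-reset is set to load at boot"
else
    echo "vendor-reset missing from /etc/modules"
fi
```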

Unfortunately, with my RX 580, this module didn’t solve the reset issue for me, at least with macOS guests. However, for many newer AMD GPUs, vendor-reset is the answer to your prayers!

(Before adding vendor-reset, I got errors like these reported from the PCIe root port the second time the card was initialised by a macOS guest:)

pcieport 0000:00:02.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:02.0
pcieport 0000:00:02.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
pcieport 0000:00:02.0: AER: device [8086:3c04] error status/mask=00004000/00000000
pcieport 0000:00:02.0: AER: [14] CmpltTO (First)
pcieport 0000:00:02.0: AER: Device recovery successful

After that the host’s kernel threads started reporting soft lockups until the whole host was brought down.

Now I no longer see those AER messages, but I still get “DMAR: DRHD: handling fault status reg 40”, followed by soft lockups that kill the host.

41 thoughts on “Working around the AMD GPU Reset bug on Proxmox using vendor-reset”

  1. Saw this on Level1techs forum, and I had a feeling you would post about it too. I am using a WX5100 in my server, and that has been quite patchy as the reset bug sometimes affected it, sometimes it didn’t. Since I have installed this module though, it has been working properly every time the guest OS resets or shuts down.
    Very happy with the current behaviour.

    Cheers!

  2. Having a problem at this step:

    $ dkms install .

    Creating symlink /var/lib/dkms/vendor-reset/0.0.18/source ->
    /usr/src/vendor-reset-0.0.18

    DKMS: add completed.
    Error! Your kernel headers for kernel 5.4.34-1-pve cannot be found.
    Please install the linux-headers-5.4.34-1-pve package,
    or use the --kernelsourcedir option to tell DKMS where it's located

    All steps prior to that were done. Well, I’m not sure about this one since you’re not explicit about which repos are needed.

    # Get latest Proxmox kernel headers:
    apt install pve-headers

    # Did that fail? If so make sure you have Proxmox repository set up properly! https://pve.proxmox.com/wiki/Package_Repositories

    Yes, it failed and I added the following to /etc/apt/sources.list:
    # PVE pve-no-subscription repository provided by proxmox.com,
    # NOT recommended for production use
    deb http://download.proxmox.com/debian/pve buster pve-no-subscription

    Any ideas?

    I have to say, your guides are amazingly useful. You have no idea how grateful I am to have someone to learn from.

    1. After adding that repo you’ll need to run apt update to fetch it, then you’ll want to apt dist-upgrade to get all the Proxmox updates you’ve been missing out on. Then you can apt install pve-headers before running dkms.

      If during your upgrades you end up installing a new kernel, the headers on disk won’t match the kernel that is currently running until you reboot Proxmox, so give Proxmox a reboot for good luck.

    2. I ran

      dkms install .

      and had the wrong headers.

      Had to run

      apt install pve-headers-$(uname -r)

      and then

      dkms install vendor-reset/0.0.18

      in the vendor-reset directory

      1. Yep that’s why I said to upgrade the kernel and restart first. Make sure you do also have “pve-headers” installed, because otherwise the next time the kernel is updated dkms will fail to rebuild the package for you, since the new headers won’t be downloaded for you automatically. (pve-headers is a metapackage that points to the newest headers)

  3. First off… thank you Nick for this post. I was able to apply vendor-reset to my Proxmox server and now I can power my VM on and off at will with my AMD RX570 4G passed through. I use this for hardware encoding of video streams from my Emby (like Plex) Media Server VM. I also have 2 other VMs with an NVIDIA GTX 1050 Ti and Intel QuickSync (both passed through), also for Emby Media Server.

    I would also like to give my eternal gratitude to gnif for his work on this as well!

  4. Like you I still have issues getting the RX 580 to work consistently during passthrough, even with vendor-reset installed. I run into issues when sleeping on Windows VMs (which I try to avoid with any KVM) and switching between macOS and Windows. Any experiences there?

    Should probably wait for an RDNA 2 based lower end GPU or install a GTX 970, but that means not being able to use anything higher than High Sierra.

    1. I haven’t tried sleeping on any of my VMs. But yeah I have the same issues as you switching between mac and Windows. Big Navi is sounding interesting for a future purchase, maybe in a couple of generations once they’re cheap, lol

  5. Nick, thanks for all the posts. Have you tried passing your GPU’s vBIOS in via libvirt/qemu? This helped with resetting my 5700 XT via vendor-reset (even though there was nothing wrong with the identical vBIOS already stored on the card). Good luck with your rig… it is so nice to finally be able to reset reasonably well!

    1. No, I haven’t tried that one, it may be worth a go…

      Did you download a BIOS from the web or just dump your current one?

      1. Either way will work. You can see your vBIOS firmware version in GPU-z under windows. You can dump it there, or with the AMD linux utility that is floating around. Or you could just grab the version number and download it from the techpowerup website.

        FWIW, I do not need to do this hack anymore. It was fixed for me by some combination of updating my motherboard firmware and/or updating to the latest vendor-reset build.

        Thanks again for all the write-ups!
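For anyone who’d rather dump the vBIOS from the Linux host itself, the kernel exposes a ROM read interface in sysfs. This is a sketch: the PCI address is illustrative, and it needs to run as root while no driver is bound to the card.

```shell
# Dump the GPU's vBIOS via the sysfs PCI ROM interface.
# Writing 1 to 'rom' enables reads; write 0 again when finished.
BDF=0000:03:00.0                      # illustrative; substitute your GPU's address
ROM="/sys/bus/pci/devices/$BDF/rom"
if [ -e "$ROM" ] && [ -w "$ROM" ]; then
    echo 1 > "$ROM"
    cat "$ROM" > /root/vbios.rom
    echo 0 > "$ROM"
    echo "dumped to /root/vbios.rom"
else
    echo "no writable ROM node for $BDF (device absent, or not running as root)"
fi
```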

  6. Hi,

    first of all thanks for your great work!

    After following all the steps in this tutorial:
    # Enable vendor-reset to be loaded automatically on startup:
    echo "vendor-reset" >> /etc/modules
    update-initramfs -u

    I get this error in the terminal:
    Running hook script 'zz-pve-efiboot'..
    Re-executing '/etc/kernel/postinst.d/zz-pve-efiboot' in new private mount namespace..
    No /etc/kernel/pve-efiboot-uuids found, skipping ESP sync.

    I don’t know what this means or how to fix it. After rebooting I’m unable to boot into macOS. I can see the Proxmox boot logo on screen, and I can choose the disk I want to boot, but after selecting it there is no Apple logo, just a black screen, without any error in Proxmox itself.

    Would be glad if anyone could help me with this.

    1. That one’s not an error, you can ignore that message. Which GPU model are you using, and did you make sure to set vga to none to disable the emulated video?
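For reference, disabling the emulated display corresponds to a single line in the VM’s config file (the path shown is the usual Proxmox location):

```
# /etc/pve/qemu-server/<vmid>.conf
vga: none
```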

  7. Thanks for the fast reply!

    Ok, didn’t know that!

    I’m using AMD Radeon VII (Vega 20). I set the display to none.

    I get a similar error within Windows 10, where I can’t install any AMD drivers.

    I set up the GPU forwarding like in one of your tutorials.

    1. BTW. in Proxmox I get this error:
      TASK ERROR: start failed: command ‘/usr/bin/kvm -id 200 -name Catalina… +invtsc” failed: got timeout

      1. You can get timeouts like this due to a delay in allocating RAM, or it could be a failure to reset the card. Check dmesg for card reset errors.

        If it’s just caused by RAM allocation delays, start the VM like this to bypass the timeout: “qm showcmd 200 | bash”

    2. I would work on solving the Windows VM problem first since it’s the easiest platform.

      Can you show the hostpci lines from your VM config file?

      1. This is the config file of my Win 10 VM:

        agent: 1
        bios: ovmf
        boot: order=scsi0;net0
        cores: 8
        cpu: kvm64,flags=+pdpe1gb
        efidisk0: local-lvm:vm-101-disk-1,size=4M
        hostpci0: 03:00,pcie=1,x-vga=1
        localtime: 1
        machine: q35
        memory: 16384
        name: win10pro
        net0: e1000=FA:9A:B7:FA:57:65,bridge=vmbr0,firewall=1
        numa: 1
        ostype: win10
        scsi0: local-lvm:vm-101-disk-0,cache=writeback,size=60G
        scsi1: /dev/disk/by-id/ata-ST2000DX002-2DV164_Z4ZCSG09,size=1953514584K
        scsihw: virtio-scsi-pci
        smbios1: uuid=dcafd7eb-840b-446c-a220-6a4bef0ad214
        sockets: 1
        usb0: host=1-7.1.3.2,usb3=1
        usb1: host=1-7.1.1,usb3=1
        vga: none
        vmgenid: 371ef0b1-f007-453a-b18d-7d40bea27ccc

        Do you need a list of all PCI devices?

        1. That is the GPU in the PCI list:

          01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Device 14a0 (rev c1)
          02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Device 14a1
          03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 [Radeon VII] (rev c1)
          03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 HDMI Audio [Radeon VII]

          1. Not in Proxmox; in Windows it’s this:
            After every restart my AMD GPU driver installation is completely gone. I can install AMD Adrenalin, but at the end of the installation it says installation is complete and asks if I want to launch the application or restart the system. Launching the application gives me the error that no GPU is recognized, and rebooting brings me to the error mentioned before.

            1. Can you show the dmesg output that shows the card being reset successfully then? vendor-reset prints a bunch of messages during reset.

              1. Sorry for that, it’s quite a lot of log output:

                [ 38.048119] i915 0000:00:02.0: enabling device (0000 -> 0003)
                [ 38.048711] i915 0000:00:02.0: VT-d active for gfx access
                [ 38.048712] checking generic (6030000000 1300000) vs hw (4000000000 10000000)
                [ 38.049705] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
                [ 38.049706] [drm] Driver supports precise vblank timestamp query.
                [ 38.051171] iwlwifi 0000:00:14.3: Detected Intel(R) Dual Band Wireless AC 9560, REV=0x318
                [ 38.051516] snd_hda_intel 0000:03:00.1: Handle vga_switcheroo audio client
                [ 38.051765] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
                [ 38.059260] iwlwifi 0000:00:14.3: Applying debug destination EXTERNAL_DRAM
                [ 38.059554] iwlwifi 0000:00:14.3: Allocated 0x00400000 bytes for firmware monitor.
                [ 38.064592] [drm] amdgpu kernel modesetting enabled.
                [ 38.064761] amdgpu 0000:03:00.0: remove_conflicting_pci_framebuffers: bar 0: 0x6030000000 -> 0x603fffffff
                [ 38.064763] amdgpu 0000:03:00.0: remove_conflicting_pci_framebuffers: bar 2: 0x6040000000 -> 0x60401fffff
                [ 38.064764] amdgpu 0000:03:00.0: remove_conflicting_pci_framebuffers: bar 5: 0x56100000 -> 0x5617ffff
                [ 38.064765] checking generic (6030000000 1300000) vs hw (6030000000 10000000)
                [ 38.064766] fb0: switching to amdgpudrmfb from EFI VGA
                [ 38.077766] Console: switching to colour dummy device 80x25
                [ 38.077789] amdgpu 0000:03:00.0: vgaarb: deactivate vga console
                [ 38.077827] amdgpu 0000:03:00.0: enabling device (0006 -> 0007)
                [ 38.077988] [drm] initializing kernel modesetting (VEGA20 0x1002:0x66AF 0x1002:0x081E 0xC1).
                [ 38.077999] [drm] register mmio base: 0x56100000
                [ 38.078000] [drm] register mmio size: 524288
                [ 38.078009] [drm] add ip block number 0
                [ 38.078010] [drm] add ip block number 1
                [ 38.078010] [drm] add ip block number 2
                [ 38.078011] [drm] add ip block number 3
                [ 38.078011] [drm] add ip block number 4
                [ 38.078012] [drm] add ip block number 5
                [ 38.078012] [drm] add ip block number 6
                [ 38.078013] [drm] add ip block number 7
                [ 38.078013] [drm] add ip block number 8
                [ 38.078014] [drm] add ip block number 9
                [ 38.078031] ATOM BIOS: 113-D3600200-106
                [ 38.078812] [drm] UVD(0) is enabled in VM mode
                [ 38.078813] [drm] UVD(1) is enabled in VM mode
                [ 38.078813] [drm] UVD(0) ENC is enabled in VM mode
                [ 38.078813] [drm] UVD(1) ENC is enabled in VM mode
                [ 38.078814] [drm] VCE enabled in VM mode
                [ 38.078844] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
                [ 38.078853] amdgpu 0000:03:00.0: BAR 2: releasing [mem 0x6040000000-0x60401fffff 64bit pref]
                [ 38.078855] amdgpu 0000:03:00.0: BAR 0: releasing [mem 0x6030000000-0x603fffffff 64bit pref]
                [ 38.078868] pcieport 0000:02:00.0: BAR 15: releasing [mem 0x6030000000-0x60401fffff 64bit pref]
                [ 38.078869] pcieport 0000:01:00.0: BAR 15: releasing [mem 0x6030000000-0x60401fffff 64bit pref]
                [ 38.078871] pcieport 0000:00:01.0: BAR 15: releasing [mem 0x6030000000-0x60401fffff 64bit pref]
                [ 38.078879] pcieport 0000:00:01.0: BAR 15: assigned [mem 0x4200000000-0x47ffffffff 64bit pref]
                [ 38.078880] pcieport 0000:01:00.0: BAR 15: assigned [mem 0x4200000000-0x47ffffffff 64bit pref]
                [ 38.078882] pcieport 0000:02:00.0: BAR 15: assigned [mem 0x4200000000-0x47ffffffff 64bit pref]
                [ 38.078884] amdgpu 0000:03:00.0: BAR 0: assigned [mem 0x4400000000-0x47ffffffff 64bit pref]
                [ 38.078890] amdgpu 0000:03:00.0: BAR 2: assigned [mem 0x4200000000-0x42001fffff 64bit pref]
                [ 38.078897] pcieport 0000:00:01.0: PCI bridge to [bus 01-03]
                [ 38.078898] pcieport 0000:00:01.0: bridge window [io 0x5000-0x5fff]
                [ 38.078900] pcieport 0000:00:01.0: bridge window [mem 0x56100000-0x562fffff]
                [ 38.078902] pcieport 0000:00:01.0: bridge window [mem 0x4200000000-0x47ffffffff 64bit pref]
                [ 38.078905] pcieport 0000:01:00.0: PCI bridge to [bus 02-03]
                [ 38.078906] pcieport 0000:01:00.0: bridge window [io 0x5000-0x5fff]
                [ 38.078909] pcieport 0000:01:00.0: bridge window [mem 0x56100000-0x561fffff]
                [ 38.078912] pcieport 0000:01:00.0: bridge window [mem 0x4200000000-0x47ffffffff 64bit pref]
                [ 38.078915] pcieport 0000:02:00.0: PCI bridge to [bus 03]
                [ 38.078917] pcieport 0000:02:00.0: bridge window [io 0x5000-0x5fff]
                [ 38.078920] pcieport 0000:02:00.0: bridge window [mem 0x56100000-0x561fffff]
                [ 38.078923] pcieport 0000:02:00.0: bridge window [mem 0x4200000000-0x47ffffffff 64bit pref]
                [ 38.078933] amdgpu 0000:03:00.0: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
                [ 38.078934] amdgpu 0000:03:00.0: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
                [ 38.078936] amdgpu 0000:03:00.0: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
                [ 38.078941] [drm] Detected VRAM RAM=16368M, BAR=16384M
                [ 38.078942] [drm] RAM width 4096bits HBM
                [ 38.079026] [drm] amdgpu: 16368M of VRAM memory ready
                [ 38.079029] [drm] amdgpu: 16368M of GTT memory ready.
                [ 38.079038] [drm] GART: num cpu pages 131072, num gpu pages 131072
                [ 38.079115] [drm] PCIE GART of 512M enabled (table at 0x00000080012E6000).
                [ 38.084142] [drm] use_doorbell being set to: [true]
                [ 38.084887] [drm] use_doorbell being set to: [true]
                [ 38.084937] amdgpu: [powerplay] hwmgr_sw_init smu backed is vega20_smu
                [ 38.086660] [drm] Found UVD firmware ENC: 1.2 DEC: .43 Family ID: 19
                [ 38.086663] [drm] PSP loading UVD firmware
                [ 38.087844] [drm] Found VCE firmware Version: 57.6 Binary ID: 4
                [ 38.087865] [drm] PSP loading VCE firmware
                [ 38.099752] iwlwifi 0000:00:14.3: base HW address: 04:ea:56:b4:04:f6
                [ 38.165751] ieee80211 phy0: Selected rate control algorithm 'iwl-mvm-rs'
                [ 38.165970] thermal thermal_zone3: failed to read out thermal zone (-61)
                [ 38.167056] iwlwifi 0000:00:14.3 wlo1: renamed from wlan0
                [ 38.330188] [drm] failed to retrieve link info, disabling eDP
                [ 38.514118] [drm] reserve 0x400000 from 0x83fe800000 for PSP TMR
                [ 38.554564] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 655360 ms ovfl timer
                [ 38.554565] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
                [ 38.554565] RAPL PMU: hw unit of domain package 2^-14 Joules
                [ 38.554565] RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules
                [ 38.557387] cryptd: max_cpu_qlen set to 1000
                [ 38.561196] AVX2 version of gcm_enc/dec engaged.
                [ 38.561197] AES CTR mode by8 optimization enabled
                [ 38.589962] Adding 8388604k swap on /dev/mapper/pve-swap. Priority:-2 extents:1 across:8388604k SSFS
                [ 38.699337] intel_rapl_common: Found RAPL domain package
                [ 38.699338] intel_rapl_common: Found RAPL domain core
                [ 38.699339] intel_rapl_common: Found RAPL domain uncore
                [ 38.737563] [drm] psp command failed and response status is (0x100)
                [ 38.849601] [drm] Display Core initialized with v3.2.48!
                [ 38.849675] snd_hda_intel 0000:03:00.1: bound 0000:03:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
                [ 39.020273] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
                [ 39.020273] [drm] Driver supports precise vblank timestamp query.
                [ 39.062156] [drm] UVD and UVD ENC initialized successfully.
                [ 39.261137] [drm] VCE initialized successfully.
                [ 39.262472] kfd kfd: Allocated 3969056 bytes on gart
                [ 39.263142] Virtual CRAT table created for GPU
                [ 39.263142] Parsing CRAT table with 1 nodes
                [ 39.263149] Creating topology SYSFS entries
                [ 39.263222] Topology: Add dGPU node [0x66af:0x1002]
                [ 39.263223] kfd kfd: added device 1002:66af
                [ 39.266215] [drm] fb mappable at 0x440193B000
                [ 39.266215] [drm] vram apper at 0x4400000000
                [ 39.266216] [drm] size 19906560
                [ 39.266216] [drm] fb depth is 24
                [ 39.266216] [drm] pitch is 13824
                [ 39.266341] fbcon: amdgpudrmfb (fb0) is primary device
                [ 39.392016] Console: switching to colour frame buffer device 240x67
                [ 39.410359] amdgpu 0000:03:00.0: fb0: amdgpudrmfb frame buffer device
                [ 39.721624] amdgpu 0000:03:00.0: ring gfx uses VM inv eng 0 on hub 0
                [ 39.721625] amdgpu 0000:03:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
                [ 39.721625] amdgpu 0000:03:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
                [ 39.721626] amdgpu 0000:03:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
                [ 39.721626] amdgpu 0000:03:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
                [ 39.721627] amdgpu 0000:03:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
                [ 39.721627] amdgpu 0000:03:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
                [ 39.721627] amdgpu 0000:03:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
                [ 39.721628] amdgpu 0000:03:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
                [ 39.721628] amdgpu 0000:03:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
                [ 39.721629] amdgpu 0000:03:00.0: ring sdma0 uses VM inv eng 0 on hub 1
                [ 39.721629] amdgpu 0000:03:00.0: ring page0 uses VM inv eng 1 on hub 1
                [ 39.721630] amdgpu 0000:03:00.0: ring sdma1 uses VM inv eng 4 on hub 1
                [ 39.721630] amdgpu 0000:03:00.0: ring page1 uses VM inv eng 5 on hub 1
                [ 39.721631] amdgpu 0000:03:00.0: ring uvd_0 uses VM inv eng 6 on hub 1
                [ 39.721631] amdgpu 0000:03:00.0: ring uvd_enc_0.0 uses VM inv eng 7 on hub 1
                [ 39.721632] amdgpu 0000:03:00.0: ring uvd_enc_0.1 uses VM inv eng 8 on hub 1
                [ 39.721632] amdgpu 0000:03:00.0: ring uvd_1 uses VM inv eng 9 on hub 1
                [ 39.721633] amdgpu 0000:03:00.0: ring uvd_enc_1.0 uses VM inv eng 10 on hub 1
                [ 39.721633] amdgpu 0000:03:00.0: ring uvd_enc_1.1 uses VM inv eng 11 on hub 1
                [ 39.721634] amdgpu 0000:03:00.0: ring vce0 uses VM inv eng 12 on hub 1
                [ 39.721634] amdgpu 0000:03:00.0: ring vce1 uses VM inv eng 13 on hub 1
                [ 39.721634] amdgpu 0000:03:00.0: ring vce2 uses VM inv eng 14 on hub 1
                [ 39.721635] [drm] ECC is not present.
                [ 39.721635] [drm] SRAM ECC is not present.
                [ 39.722023] Detected AMDGPU DF Counters. # of Counters = 4.
                [ 39.722036] [drm] Initialized amdgpu 3.35.0 20150101 for 0000:03:00.0 on minor 1
                [ 39.722566] [drm] Initialized i915 1.6.0 20190822 for 0000:00:02.0 on minor 0
                [ 39.722996] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
                [ 39.758385] [drm] Cannot find any crtc or sizes
                [ 39.790084] [drm] Cannot find any crtc or sizes
                [ 39.820200] [drm] Cannot find any crtc or sizes
                [ 2216.103474] [drm:amdgpu_pci_remove [amdgpu]] *ERROR* Device removal is currently not supported outside of fbcon
                [ 2216.103980] [drm] amdgpu: finishing device.
                [ 2216.223161] Console: switching to colour dummy device 80x25
                [ 2216.383460] [drm] amdgpu: ttm finalized
                [ 2216.402754] vfio-pci 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
                [ 2218.399051] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
                [ 2218.399059] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
                [ 2237.830880] usb 1-7.1.1: reset full-speed USB device number 9 using xhci_hcd
                [ 2238.218853] usb 1-7.1.3.2: reset low-speed USB device number 13 using xhci_hcd
                [ 2594.687073] vfio-pci 0000:03:00.1: Refused to change power state, currently in D0
                [ 2599.275450] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
                [ 2599.291422] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
                [ 2600.544777] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
                [ 2600.544925] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
                [ 2600.560860] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
                [ 2600.561006] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
                [ 2600.573161] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
                [ 2600.573309] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
                [ 2600.587327] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
                [ 2601.136257] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
                [ 2601.231596] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
                [ 2601.231718] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
                [ 2601.231729] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
                [ 2601.231745] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
                [ 2601.231825] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
                [ 2601.231836] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
                [ 2601.231906] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
                [ 2601.231917] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
                [ 2601.503789] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
                [ 2601.503888] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
                [ 2601.510840] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
                [ 2601.510946] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
                [ 2601.510958] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
                [ 2601.511062] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
                [ 2601.522300] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
                [ 2601.522429] vfio-pci 0000:03:00.1: vfio_bar_restore: reset recovery - restoring BARs
                [ 2601.612251] vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
                root@pve:~#

  8. You’ve got several problems there: vendor-reset isn’t being loaded at all, which is weird, but you also haven’t blacklisted amdgpu, so it’s loading during host startup, and you don’t want that to happen either.

    Try “update-initramfs -u -k all” just in case it didn’t update the kernel you’re booting from last time.
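For anyone following along, the blacklist itself is a one-line modprobe config. The file name below is illustrative; any .conf file in that directory works:

```
# /etc/modprobe.d/blacklist-amdgpu.conf
blacklist amdgpu
```

After adding it, run update-initramfs -u -k all and reboot so the host never binds amdgpu to the passthrough card.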

    1. I blacklisted “amdgpu” and ran this cmd: “update-initramfs -u -k all”.

      After restarting Proxmox, Windows seems to start but won’t give me a picture. Even with AnyDesk, where I can see that the system has started, I just see a black screen and nothing else.

      dmesg shows me this:
      [ 303.784162] vfio-pci 0000:03:00.0: BAR 0: can't reserve [mem 0x6030000000-0x603fffffff 64bit pref]
      [ 303.784177] vfio-pci 0000:03:00.0: BAR 0: can't reserve [mem 0x6030000000-0x603fffffff 64bit pref]
      [ 303.784186] vfio-pci 0000:03:00.0: BAR 0: can't reserve [mem 0x6030000000-0x603fffffff 64bit pref]

      1. I think that’s because the GPU is being used for your host console. If you have another GPU, set that as the primary instead in your host UEFI settings. If you only have this one GPU, add this to your kernel arguments to disable host video “video=vesafb:off,efifb:off”. Note that you’ll want to ensure you have a way of connecting to Proxmox that doesn’t require the screen, since it’ll stop outputting video right at the start of boot.
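On a GRUB-booted host those kernel arguments go in /etc/default/grub, followed by running update-grub:

```
# /etc/default/grub — append to the existing line, then run update-grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet video=vesafb:off,efifb:off"
```

Hosts that boot via systemd-boot keep their kernel arguments in /etc/kernel/cmdline instead (refresh with pve-efiboot-tool refresh).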

          1. Yes, if you have an iGPU, set it as primary and you don’t need to add the video=vesafb:off,efifb:off argument.

            1. Sadly Proxmox won’t even let me boot into the bootloader of the Mac VM.

              I only get this message:
              [ 1038.500624] vfio-pci 0000:03:00.0: timed out waiting for pending transaction; performing function level reset anyway

              1. vendor-reset still isn’t being loaded. Try “modprobe vendor-reset” (reboot host first to reset the already-broken GPU)

        1. So now, with the primary output changed, I can boot into Windows and see what’s on the screen.

          But after another try to install the AMD driver I got the same error message in Windows as before:
          No AMD graphics driver is installed, or the AMD driver is not functioning properly. Please install the AMD driver appropriate for your AMD hardware.

          I loaded the same (updated) GPU driver as on my natively running Windows 10 machine. So that is for sure the right one.

          1. BTW, this error message comes right after the installation has completed successfully. I can even see the AMD icon in the system tray.

  9. After using this cmd: “modprobe vendor-reset”
    I get this: FATAL: Module vendor-reset not found in directory /lib/modules/5.4.73-1-pve

          1. I installed everything in the right order, as you listed, without any complaints on the Proxmox side. I get messages that I can’t load another module because I already have them installed.

            The only problem I get (at least in the Proxmox install process) is the very last part:

            # Enable vendor-reset to be loaded automatically on startup:
            echo "vendor-reset" >> /etc/modules
            update-initramfs -u

            That results in this:
            Running hook script 'zz-pve-efiboot'..
            Re-executing '/etc/kernel/postinst.d/zz-pve-efiboot' in new private mount namespace..
            No /etc/kernel/pve-efiboot-uuids found, skipping ESP sync.

              1. I did this and completely re-installed Catalina.
                Now it seems to work fine!

                But in Windows I always get the weird error message:
                No AMD graphics driver is installed, or the AMD driver is not functioning properly. Please install the AMD driver appropriate for your AMD hardware.

                I think a Windows re-install would be the next step, maybe it also fixes this.

                All in all thank you very much for your patience, you helped me a lot!
