aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/virt
Commit message (Collapse)AuthorAgeFilesLines
...
| | * | Merge branch kvm-arm64/psci-suspend into kvmarm-master/nextMarc Zyngier2022-05-161-2/+74
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/psci-suspend: : . : Add support for PSCI SYSTEM_SUSPEND and allow userspace to : filter the wake-up events. : : Patches courtesy of Oliver. : . Documentation: KVM: Fix title level for PSCI_SUSPEND selftests: KVM: Test SYSTEM_SUSPEND PSCI call selftests: KVM: Refactor psci_test to make it amenable to new tests selftests: KVM: Use KVM_SET_MP_STATE to power off vCPU in psci_test selftests: KVM: Create helper for making SMCCC calls selftests: KVM: Rename psci_cpu_on_test to psci_test KVM: arm64: Implement PSCI SYSTEM_SUSPEND KVM: arm64: Add support for userspace to suspend a vCPU KVM: arm64: Return a value from check_vcpu_requests() KVM: arm64: Rename the KVM_REQ_SLEEP handler KVM: arm64: Track vCPU power state using MP state values KVM: arm64: Dedupe vCPU power off helpers KVM: arm64: Don't depend on fallthrough to hide SYSTEM_RESET2 Signed-off-by: Marc Zyngier <maz@kernel.org>
| | | * | Documentation: KVM: Fix title level for PSCI_SUSPENDStephen Rothwell2022-05-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The htmldoc build breaks in a funny way with: <quote> Sphinx parallel build error: docutils.utils.SystemMessage: /home/sfr/next/next/Documentation/virt/kvm/api.rst:6175: (SEVERE/4) Title level inconsistent: For arm/arm64: ^^^^^^^^^^^^^^ </quote> Swap the ^^s for a bunch of --s... Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> [maz: commit message] Signed-off-by: Marc Zyngier <maz@kernel.org>
| | | * | KVM: arm64: Implement PSCI SYSTEM_SUSPENDOliver Upton2022-05-041-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows software to request that a system be placed in the deepest possible low-power state. Effectively, software can use this to suspend itself to RAM. Unfortunately, there really is no good way to implement a system-wide PSCI call in KVM. Any precondition checks done in the kernel will need to be repeated by userspace since there is no good way to protect a critical section that spans an exit to userspace. SYSTEM_RESET and SYSTEM_OFF are equally plagued by this issue, although no users have seemingly cared for the relatively long time these calls have been supported. The solution is to just make the whole implementation userspace's problem. Introduce a new system event, KVM_SYSTEM_EVENT_SUSPEND, that indicates to userspace a calling vCPU has invoked PSCI SYSTEM_SUSPEND. Additionally, add a CAP to get buy-in from userspace for this new exit type. Only advertise the SYSTEM_SUSPEND PSCI call if userspace has opted in. If a vCPU calls SYSTEM_SUSPEND, punt straight to userspace. Provide explicit documentation of userspace's responsibilites for the exit and point to the PSCI specification to describe the actual PSCI call. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220504032446.4133305-8-oupton@google.com
| | | * | KVM: arm64: Add support for userspace to suspend a vCPUOliver Upton2022-05-041-2/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a new MP state, KVM_MP_STATE_SUSPENDED, which indicates a vCPU is in a suspended state. In the suspended state the vCPU will block until a wakeup event (pending interrupt) is recognized. Add a new system event type, KVM_SYSTEM_EVENT_WAKEUP, to indicate to userspace that KVM has recognized one such wakeup event. It is the responsibility of userspace to then make the vCPU runnable, or leave it suspended until the next wakeup event. Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220504032446.4133305-7-oupton@google.com
| | * | | Merge branch kvm-arm64/hcall-selection into kvmarm-master/nextMarc Zyngier2022-05-164-78/+157
| | |\ \ \ | | | |/ / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/hcall-selection: : . : Introduce a new set of virtual sysregs for userspace to : select the hypercalls it wants to see exposed to the guest. : : Patches courtesy of Raghavendra and Oliver. : . KVM: arm64: Fix hypercall bitmap writeback when vcpus have already run KVM: arm64: Hide KVM_REG_ARM_*_BMAP_BIT_COUNT from userspace Documentation: Fix index.rst after psci.rst renaming selftests: KVM: aarch64: Add the bitmap firmware registers to get-reg-list selftests: KVM: aarch64: Introduce hypercall ABI test selftests: KVM: Create helper for making SMCCC calls selftests: KVM: Rename psci_cpu_on_test to psci_test tools: Import ARM SMCCC definitions Docs: KVM: Add doc for the bitmap firmware registers Docs: KVM: Rename psci.rst to hypercalls.rst KVM: arm64: Add vendor hypervisor firmware register KVM: arm64: Add standard hypervisor firmware register KVM: arm64: Setup a framework for hypercall bitmap firmware registers KVM: arm64: Factor out firmware register handling from psci.c Signed-off-by: Marc Zyngier <maz@kernel.org>
| | | * | Documentation: Fix index.rst after psci.rst renamingMarc Zyngier2022-05-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the TOC in index.rst after psci.rst has been renamed to hypercalls.rst. Signed-off-by: Marc Zyngier <maz@kernel.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/r/20220504205627.18f46380@canb.auug.org.au
| | | * | Docs: KVM: Add doc for the bitmap firmware registersRaghavendra Rao Ananta2022-05-032-17/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the documentation for the bitmap firmware registers in hypercalls.rst and api.rst. This includes the details for KVM_REG_ARM_STD_BMAP, KVM_REG_ARM_STD_HYP_BMAP, and KVM_REG_ARM_VENDOR_HYP_BMAP registers. Since the document is growing to carry other hypercall related information, make necessary adjustments to present the document in a generic sense, rather than being PSCI focused. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Gavin Shan <gshan@redhat.com> [maz: small scale reformat, move things about, random typo fixes] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220502233853.1233742-7-rananta@google.com
| | | * | Docs: KVM: Rename psci.rst to hypercalls.rstRaghavendra Rao Ananta2022-05-031-0/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the doc also covers general hypercalls' details, rather than just PSCI, and the fact that the bitmap firmware registers' details will be added to this doc, rename the file to a more appropriate name- hypercalls.rst. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220502233853.1233742-6-rananta@google.com
| * | | | KVM: X86/MMU: Fix shadowing 5-level NPT for 4-level NPT L1 guestLai Jiangshan2022-04-291-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When shadowing 5-level NPT for 4-level NPT L1 guest, the root_sp is allocated with role.level = 5 and the guest pagetable's root gfn. And root_sp->spt[0] is also allocated with the same gfn and the same role except role.level = 4. Luckily that they are different shadow pages, but only root_sp->spt[0] is the real translation of the guest pagetable. Here comes a problem: If the guest switches from gCR4_LA57=0 to gCR4_LA57=1 (or vice verse) and uses the same gfn as the root page for nested NPT before and after switching gCR4_LA57. The host (hCR4_LA57=1) might use the same root_sp for the guest even the guest switches gCR4_LA57. The guest will see unexpected page mapped and L2 may exploit the bug and hurt L1. It is lucky that the problem can't hurt L0. And three special cases need to be handled: The root_sp should be like role.direct=1 sometimes: its contents are not backed by gptes, root_sp->gfns is meaningless. (For a normal high level sp in shadow paging, sp->gfns is often unused and kept zero, but it could be relevant and meaningful if sp->gfns is used because they are backed by concrete gptes.) For such root_sp in the case, root_sp is just a portal to contribute root_sp->spt[0], and root_sp->gfns should not be used and root_sp->spt[0] should not be dropped if gpte[0] of the guest root pagetable is changed. Such root_sp should not be accounted too. So add role.passthrough to distinguish the shadow pages in the hash when gCR4_LA57 is toggled and fix above special cases by using it in kvm_mmu_page_{get|set}_gfn() and sp_has_gptes(). Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Message-Id: <20220420131204.2850-3-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | Merge branch 'kvm-fixes-for-5.18-rc5' into HEADPaolo Bonzini2022-04-291-12/+15
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes for (relatively) old bugs, to be merged in both the -rc and next development trees. The merge reconciles the ABI fixes for KVM_EXIT_SYSTEM_EVENT between 5.18 and commit c24a950ec7d6 ("KVM, SEV: Add KVM_EXIT_SHUTDOWN metadata for SEV-ES", 2022-04-13).
| * | | | KVM, SEV: Add KVM_EXIT_SHUTDOWN metadata for SEV-ESPeter Gonda2022-04-131-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an SEV-ES guest requests termination, exit to userspace with KVM_EXIT_SYSTEM_EVENT and a dedicated SEV_TERM type instead of -EINVAL so that userspace can take appropriate action. See AMD's GHCB spec section '4.1.13 Termination Request' for more details. Suggested-by: Sean Christopherson <seanjc@google.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Peter Gonda <pgonda@google.com> Reported-by: kernel test robot <lkp@intel.com> Message-Id: <20220407210233.782250-1-pgonda@google.com> [Add documentatino. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | Merge branch 'kvm-older-features' into HEADPaolo Bonzini2022-04-131-14/+135
| |\ \ \ \ | | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge branch for features that did not make it into 5.18: * New ioctls to get/set TSC frequency for a whole VM * Allow userspace to opt out of hypercall patching Nested virtualization improvements for AMD: * Support for "nested nested" optimizations (nested vVMLOAD/VMSAVE, nested vGIF) * Allow AVIC to co-exist with a nested guest running * Fixes for LBR virtualizations when a nested guest is running, and nested LBR virtualization support * PAUSE filtering for nested hypervisors Guest support: * Decoupling of vcpu_is_preempted from PV spinlocks Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| | * | | KVM: x86: Accept KVM_[GS]ET_TSC_KHZ as a VM ioctl.David Woodhouse2022-04-021-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This sets the default TSC frequency for subsequently created vCPUs. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20220225145304.36166-2-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| | * | | KVM: x86/xen: Advertise and document KVM_XEN_HVM_CONFIG_EVTCHN_SENDDavid Woodhouse2022-04-021-10/+119
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the end of the patch series adding this batch of event channel acceleration features, finally add the feature bit which advertises them and document it all. For SCHEDOP_poll we need to wake a polling vCPU when a given port is triggered, even when it's masked — and we want to implement that in the kernel, for efficiency. So we want the kernel to know that it has sole ownership of event channel delivery. Thus, we allow userspace to make the 'promise' by setting the corresponding feature bit in its KVM_XEN_HVM_CONFIG call. As we implement SCHEDOP_poll bypass later, we will do so only if that promise has been made by userspace. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220303154127.202856-16-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| | * | | KVM: x86: Allow userspace to opt out of hypercall patchingOliver Upton2022-04-021-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM handles the VMCALL/VMMCALL instructions very strangely. Even though both of these instructions really should #UD when executed on the wrong vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the guest's instruction with the appropriate instruction for the vendor. Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only") do not patch in the appropriate instruction using alternatives, likely motivating KVM's intervention. Add a quirk allowing userspace to opt out of hypercall patching. If the quirk is disabled, KVM synthesizes a #UD in the guest. Signed-off-by: Oliver Upton <oupton@google.com> Message-Id: <20220316005538.2282772-2-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | | | Merge tag 'arm64-upstream' of ↵Linus Torvalds2022-05-231-0/+2
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - Initial support for the ARMv9 Scalable Matrix Extension (SME). SME takes the approach used for vectors in SVE and extends this to provide architectural support for matrix operations. No KVM support yet, SME is disabled in guests. - Support for crashkernel reservations above ZONE_DMA via the 'crashkernel=X,high' command line option. - btrfs search_ioctl() fix for live-lock with sub-page faults. - arm64 perf updates: support for the Hisilicon "CPA" PMU for monitoring coherent I/O traffic, support for Arm's CMN-650 and CMN-700 interconnect PMUs, minor driver fixes, kerneldoc cleanup. - Kselftest updates for SME, BTI, MTE. - Automatic generation of the system register macros from a 'sysreg' file describing the register bitfields. - Update the type of the function argument holding the ESR_ELx register value to unsigned long to match the architecture register size (originally 32-bit but extended since ARMv8.0). - stacktrace cleanups. - ftrace cleanups. - Miscellaneous updates, most notably: arm64-specific huge_ptep_get(), avoid executable mappings in kexec/hibernate code, drop TLB flushing from get_clear_flush() (and rename it to get_clear_contig()), ARCH_NR_GPIO bumped to 2048 for ARCH_APPLE. * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (145 commits) arm64/sysreg: Generate definitions for FAR_ELx arm64/sysreg: Generate definitions for DACR32_EL2 arm64/sysreg: Generate definitions for CSSELR_EL1 arm64/sysreg: Generate definitions for CPACR_ELx arm64/sysreg: Generate definitions for CONTEXTIDR_ELx arm64/sysreg: Generate definitions for CLIDR_EL1 arm64/sve: Move sve_free() into SVE code section arm64: Kconfig.platforms: Add comments arm64: Kconfig: Fix indentation and add comments arm64: mm: avoid writable executable mappings in kexec/hibernate code arm64: lds: move special code sections out of kernel exec segment arm64/hugetlb: Implement arm64 specific huge_ptep_get() arm64/hugetlb: Use ptep_get() to get the pte value of a huge page arm64: kdump: Do not allocate crash low memory if not needed arm64/sve: Generate ZCR definitions arm64/sme: Generate defintions for SVCR arm64/sme: Generate SMPRI_EL1 definitions arm64/sme: Automatically generate SMPRIMAP_EL2 definitions arm64/sme: Automatically generate SMIDR_EL1 defines arm64/sme: Automatically generate defines for SMCR ...
| * | | | | KVM: arm64: uapi: Add kvm_debug_exit_arch.hsr_highAlexandru Elisei2022-04-291-0/+2
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When userspace is debugging a VM, the kvm_debug_exit_arch part of the kvm_run struct contains arm64 specific debug information: the ESR_EL2 value, encoded in the field "hsr", and the address of the instruction that caused the exception, encoded in the field "far". Linux has moved to treating ESR_EL2 as a 64-bit register, but unfortunately kvm_debug_exit_arch.hsr cannot be changed because that would change the memory layout of the struct on big endian machines: Current layout: | Layout with "hsr" extended to 64 bits: | offset 0: ESR_EL2[31:0] (hsr) | offset 0: ESR_EL2[61:32] (hsr[61:32]) offset 4: padding | offset 4: ESR_EL2[31:0] (hsr[31:0]) offset 8: FAR_EL2[61:0] (far) | offset 8: FAR_EL2[61:0] (far) which breaks existing code. The padding is inserted by the compiler because the "far" field must be aligned to 8 bytes (each field must be naturally aligned - aapcs64 [1], page 18), and the struct itself must be aligned to 8 bytes (the struct must be aligned to the maximum alignment of its fields - aapcs64, page 18), which means that "hsr" must be aligned to 8 bytes as it is the first field in the struct. To avoid changing the struct size and layout for the existing fields, add a new field, "hsr_high", which replaces the existing padding. "hsr_high" will be used to hold the ESR_EL2[61:32] bits of the register. The memory layout, both on big and little endian machine, becomes: offset 0: ESR_EL2[31:0] (hsr) offset 4: ESR_EL2[61:32] (hsr_high) offset 8: FAR_EL2[61:0] (far) The padding that the compiler inserts for the current struct layout is unitialized. To prevent an updated userspace running on an old kernel mistaking the padding for a valid "hsr_high" value, add a new flag, KVM_DEBUG_ARCH_HSR_HIGH_VALID, to kvm_run->flags to let userspace know that "hsr_high" holds a valid ESR_EL2[61:32] value. [1] https://github.com/ARM-software/abi-aa/releases/download/2021Q3/aapcs64.pdf Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220425114444.368693-6-alexandru.elisei@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* | | | | Merge tag 'x86_sev_for_v5.19_rc1' of ↵Linus Torvalds2022-05-232-0/+156
|\ \ \ \ \ | |_|_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull AMD SEV-SNP support from Borislav Petkov: "The third AMD confidential computing feature called Secure Nested Paging. Add to confidential guests the necessary memory integrity protection against malicious hypervisor-based attacks like data replay, memory remapping and others, thus achieving a stronger isolation from the hypervisor. At the core of the functionality is a new structure called a reverse map table (RMP) with which the guest has a say in which pages get assigned to it and gets notified when a page which it owns, gets accessed/modified under the covers so that the guest can take an appropriate action. In addition, add support for the whole machinery needed to launch a SNP guest, details of which is properly explained in each patch. And last but not least, the series refactors and improves parts of the previous SEV support so that the new code is accomodated properly and not just bolted on" * tag 'x86_sev_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) x86/entry: Fixup objtool/ibt validation x86/sev: Mark the code returning to user space as syscall gap x86/sev: Annotate stack change in the #VC handler x86/sev: Remove duplicated assignment to variable info x86/sev: Fix address space sparse warning x86/sev: Get the AP jump table address from secrets page x86/sev: Add missing __init annotations to SEV init routines virt: sevguest: Rename the sevguest dir and files to sev-guest virt: sevguest: Change driver name to reflect generic SEV support x86/boot: Put globals that are accessed early into the .data section x86/boot: Add an efi.h header for the decompressor virt: sevguest: Fix bool function returning negative value virt: sevguest: Fix return value check in alloc_shared_pages() x86/sev-es: Replace open-coded hlt-loop with sev_es_terminate() virt: sevguest: Add documentation for SEV-SNP CPUID Enforcement virt: sevguest: Add support to get extended report virt: sevguest: Add support to derive key virt: Add SEV-SNP guest driver x86/sev: Register SEV-SNP guest request platform device x86/sev: Provide support for SNP guest request NAEs ...
| * | | | virt: sevguest: Rename the sevguest dir and files to sev-guestTom Lendacky2022-04-272-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename the drivers/virt/coco/sevguest directory and files to sev-guest so as to match the driver name. [ bp: Rename Documentation/virt/coco/sevguest.rst too, as reported by sfr: https://lore.kernel.org/r/20220427101059.3bf55262@canb.auug.org.au ] Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/2f5c9cb16e3a67599c8e3170f6c72c8712c47d53.1650464054.git.thomas.lendacky@amd.com
| * | | | virt: sevguest: Add documentation for SEV-SNP CPUID EnforcementMichael Roth2022-04-071-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update the documentation with information regarding SEV-SNP CPUID Enforcement details and what sort of assurances it provides to guests. Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-47-brijesh.singh@amd.com
| * | | | virt: sevguest: Add support to get extended reportBrijesh Singh2022-04-071-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Version 2 of GHCB specification defines Non-Automatic-Exit (NAE) to get extended guest report which is similar to the SNP_GET_REPORT ioctl. The main difference is related to the additional data that will be returned. That additional data returned is a certificate blob that can be used by the SNP guest user. The certificate blob layout is defined in the GHCB specification. The driver simply treats the blob as a opaque data and copies it to userspace. [ bp: Massage commit message, cast 1st arg of access_ok() ] Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-46-brijesh.singh@amd.com
| * | | | virt: sevguest: Add support to derive keyBrijesh Singh2022-04-071-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SNP_GET_DERIVED_KEY ioctl interface can be used by the SNP guest to ask the firmware to provide a key derived from a root key. The derived key may be used by the guest for any purposes it chooses, such as a sealing key or communicating with the external entities. See SEV-SNP firmware spec for more information. [ bp: No need to memset "req" - it will get overwritten. ] Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Link: https://lore.kernel.org/r/20220307213356.2797205-45-brijesh.singh@amd.com
| * | | | virt: Add SEV-SNP guest driverBrijesh Singh2022-04-072-0/+87
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SEV-SNP specification provides the guest a mechanism to communicate with the PSP without risk from a malicious hypervisor who wishes to read, alter, drop or replay the messages sent. The driver uses snp_issue_guest_request() to issue GHCB SNP_GUEST_REQUEST or SNP_EXT_GUEST_REQUEST NAE events to submit the request to PSP. The PSP requires that all communication should be encrypted using key specified through a struct snp_guest_platform_data descriptor. Userspace can use SNP_GET_REPORT ioctl() to query the guest attestation report. See SEV-SNP spec section Guest Messages for more details. [ bp: Remove the "what" from the commit message, massage. ] Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-44-brijesh.singh@amd.com
* | | | KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENTPaolo Bonzini2022-04-291-7/+17
| |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When KVM_EXIT_SYSTEM_EVENT was introduced, it included a flags member that at the time was unused. Unfortunately this extensibility mechanism has several issues: - x86 is not writing the member, so it would not be possible to use it on x86 except for new events - the member is not aligned to 64 bits, so the definition of the uAPI struct is incorrect for 32- on 64-bit userspace. This is a problem for RISC-V, which supports CONFIG_KVM_COMPAT, but fortunately usage of flags was only introduced in 5.18. Since padding has to be introduced, place a new field in there that tells if the flags field is valid. To allow further extensibility, in fact, change flags to an array of 16 values, and store how many of the values are valid. The availability of the new ndata field is tied to a system capability; all architectures are changed to fill in the field. To avoid breaking compilation of userspace that was using the flags field, provide a userspace-only union to overlap flags with data[0]. The new field is placed at the same offset for both 32- and 64-bit userspace. Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Peter Gonda <pgonda@google.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Message-Id: <20220422103013.34832-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | Documentation: KVM: Add SPDX-License-Identifier tagLike Xu2022-04-114-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | +new file mode 100644 +WARNING: Missing or malformed SPDX-License-Identifier tag in line 1 +#27: FILE: Documentation/virt/kvm/x86/errata.rst:1: Opportunistically update all other non-added KVM documents and remove a new extra blank line at EOF for x86/errata.rst. Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220406063715.55625-5-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | Documentation: kvm: Add missing line break in api.rstBagas Sanjaya2022-04-051-0/+1
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add missing line break separator between literal block and description of KVM_EXIT_RISCV_SBI. This fixes: </path/to/linux>/Documentation/virt/kvm/api.rst:6118: WARNING: Literal block ends without a blank line; unexpected unindent. Fixes: da40d85805937d (RISC-V: KVM: Document RISC-V specific parts of KVM API, 2021-09-27) Cc: Anup Patel <anup.patel@wdc.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-riscv@lists.infradead.org Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Message-Id: <20220403065735.23859-1-bagasdotme@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2022-04-0219-34/+176
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm fixes from Paolo Bonzini: - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr - Documentation improvements - Prevent module exit until all VMs are freed - PMU Virtualization fixes - Fix for kvm_irq_delivery_to_apic_fast() NULL-pointer dereferences - Other miscellaneous bugfixes * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits) KVM: x86: fix sending PV IPI KVM: x86/mmu: do compare-and-exchange of gPTE via the user address KVM: x86: Remove redundant vm_entry_controls_clearbit() call KVM: x86: cleanup enter_rmode() KVM: x86: SVM: fix tsc scaling when the host doesn't support it kvm: x86: SVM: remove unused defines KVM: x86: SVM: move tsc ratio definitions to svm.h KVM: x86: SVM: fix avic spec based definitions again KVM: MIPS: remove reference to trap&emulate virtualization KVM: x86: document limitations of MSR filtering KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr KVM: x86/emulator: Emulate RDPID only if it is enabled in guest KVM: x86/pmu: Fix and isolate TSX-specific performance event logic KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs KVM: x86: Trace all APICv inhibit changes and capture overall status KVM: x86: Add wrappers for setting/clearing APICv inhibits KVM: x86: Make APICv inhibit reasons an enum and cleanup naming KVM: X86: Handle implicit supervisor access with SMAP KVM: X86: Rename variable smap to not_smap in permission_fault() ...
| * KVM: MIPS: remove reference to trap&emulate virtualizationPaolo Bonzini2022-04-021-6/+0
| | | | | | | | | | | | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220313140522.1307751-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * KVM: x86: document limitations of MSR filteringPaolo Bonzini2022-04-021-0/+5
| | | | | | | | | | | | | | | | MSR filtering requires an exit to userspace that is hard to implement and would be very slow in the case of nested VMX vmexit and vmentry MSR accesses. Document the limitation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * KVM: Remove dirty handling from gfn_to_pfn_cache completelyDavid Woodhouse2022-04-021-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It isn't OK to cache the dirty status of a page in internal structures for an indefinite period of time. Any time a vCPU exits the run loop to userspace might be its last; the VMM might do its final check of the dirty log, flush the last remaining dirty pages to the destination and complete a live migration. If we have internal 'dirty' state which doesn't get flushed until the vCPU is finally destroyed on the source after migration is complete, then we have lost data because that will escape the final copy. This problem already exists with the use of kvm_vcpu_unmap() to mark pages dirty in e.g. VMX nesting. Note that the actual Linux MM already considers the page to be dirty since we have a writeable mapping of it. This is just about the KVM dirty logging. For the nesting-style use cases (KVM_GUEST_USES_PFN) we will need to track which gfn_to_pfn_caches have been used and explicitly mark the corresponding pages dirty before returning to userspace. But we would have needed external tracking of that anyway, rather than walking the full list of GPCs to find those belonging to this vCPU which are dirty. So let's rely *solely* on that external tracking, and keep it simple rather than laying a tempting trap for callers to fall into. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220303154127.202856-3-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * KVM: Don't actually set a request when evicting vCPUs for GFN cache invdSean Christopherson2022-04-021-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't actually set a request bit in vcpu->requests when making a request purely to force a vCPU to exit the guest. Logging a request but not actually consuming it would cause the vCPU to get stuck in an infinite loop during KVM_RUN because KVM would see the pending request and bail from VM-Enter to service the request. Note, it's currently impossible for KVM to set KVM_REQ_GPC_INVALIDATE as nothing in KVM is wired up to set guest_uses_pa=true. But, it'd be all too easy for arch code to introduce use of kvm_gfn_to_pfn_cache_init() without implementing handling of the request, especially since getting test coverage of MMU notifier interaction with specific KVM features usually requires a directed test. Opportunistically rename gfn_to_pfn_cache_invalidate_start()'s wake_vcpus to evict_vcpus. The purpose of the request is to get vCPUs out of guest mode, it's supposed to _avoid_ waking vCPUs that are blocking. Opportunistically rename KVM_REQ_GPC_INVALIDATE to be more specific as to what it wants to accomplish, and to genericize the name so that it can used for similar but unrelated scenarios, should they arise in the future. Add a comment and documentation to explain why the "no action" request exists. Add compile-time assertions to help detect improper usage. Use the inner assertless helper in the one s390 path that makes requests without a hardcoded request. Cc: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220223165302.3205276-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * Documentation: KVM: add API issues sectionPaolo Bonzini2022-03-291-0/+46
| | | | | | | | | | | | | | | | | | | | | | Add a section to document all the different ways in which the KVM API sucks. I am sure there are way more, give people a place to vent so that userspace authors are aware. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220322110712.222449-4-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * Documentation: KVM: add virtual CPU errata documentationPaolo Bonzini2022-03-292-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | Add a file to document all the different ways in which the virtual CPU emulation is imperfect. Include an example to show how to document such errata. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Message-Id: <20220322110712.222449-3-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * Documentation: KVM: add separate directories for architecture-specific ↵Paolo Bonzini2022-03-2915-19/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | documentation ARM already has an arm/ subdirectory, but s390 and x86 do not even though they have a relatively large number of files specific to them. Create new directories in Documentation/virt/kvm for these two architectures as well. While at it, group the API documentation and the developer documentation in the table of contents. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220322110712.222449-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * Documentation: kvm: include new locksPaolo Bonzini2022-03-291-0/+15
| | | | | | | | | | | | | | | | | | kvm->mn_invalidate_lock and kvm->slots_arch_lock were not included in the documentation, add them. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220322110720.222499-3-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * Documentation: kvm: fixes for locking.rstPaolo Bonzini2022-03-291-9/+19
| | | | | | | | | | | | | | | | | | Separate the various locks clearly, and include the new names of blocked_vcpu_on_cpu_lock and blocked_vcpu_on_cpu. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220322110720.222499-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | Merge tag 'for-linus-5.18-rc1' of ↵Linus Torvalds2022-03-311-0/+20
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - Devicetree support (for testing) - Various cleanups and fixes: UBD, port_user, uml_mconsole - Maintainer update * tag 'for-linus-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: run_helper: Write error message to kernel log on exec failure on host um: port_user: Improve error handling when port-helper is not found um: port_user: Allow setting path to port-helper using UML_PORT_HELPER envvar um: port_user: Search for in.telnetd in PATH um: clang: Strip out -mno-global-merge from USER_CFLAGS docs: UML: Mention telnetd for port channel um: Remove unused timeval_to_ns() function um: Fix uml_mconsole stop/go um: Cleanup syscall_handler_t definition/cast, fix warning uml: net: vector: fix const issue um: Fix WRITE_ZEROES in the UBD Driver um: Migrate vector drivers to NAPI um: Fix order of dtb unflatten/early init um: fix and optimize xor select template for CONFIG64 and timetravel mode um: Document dtb command line option lib/logic_iomem: correct fallback config references um: Remove duplicated include in syscalls_64.c MAINTAINERS: Update UserModeLinux entry
| * | docs: UML: Mention telnetd for port channelVincent Whitchurch2022-03-111-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is not obvious from the documentation that using the "port" channel for the console requires telnetd to be installed (see port_connection() in arch/um/drivers/port_user.c). Mention this, and the fact that UML will not boot until a client connects. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | um: Document dtb command line optionAnton Ivanov2022-03-111-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add documentation for the dtb command line option and the ability to load/parse device trees. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Reviewed-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2022-03-243-72/+247
|\ \ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm updates from Paolo Bonzini: "ARM: - Proper emulation of the OSLock feature of the debug architecture - Scalibility improvements for the MMU lock when dirty logging is on - New VMID allocator, which will eventually help with SVA in VMs - Better support for PMUs in heterogenous systems - PSCI 1.1 support, enabling support for SYSTEM_RESET2 - Implement CONFIG_DEBUG_LIST at EL2 - Make CONFIG_ARM64_ERRATUM_2077057 default y - Reduce the overhead of VM exit when no interrupt is pending - Remove traces of 32bit ARM host support from the documentation - Updated vgic selftests - Various cleanups, doc updates and spelling fixes RISC-V: - Prevent KVM_COMPAT from being selected - Optimize __kvm_riscv_switch_to() implementation - RISC-V SBI v0.3 support s390: - memop selftest - fix SCK locking - adapter interruptions virtualization for secure guests - add Claudio Imbrenda as maintainer - first step to do proper storage key checking x86: - Continue switching kvm_x86_ops to static_call(); introduce static_call_cond() and __static_call_ret0 when applicable. - Cleanup unused arguments in several functions - Synthesize AMD 0x80000021 leaf - Fixes and optimization for Hyper-V sparse-bank hypercalls - Implement Hyper-V's enlightened MSR bitmap for nested SVM - Remove MMU auditing - Eager splitting of page tables (new aka "TDP" MMU only) when dirty page tracking is enabled - Cleanup the implementation of the guest PGD cache - Preparation for the implementation of Intel IPI virtualization - Fix some segment descriptor checks in the emulator - Allow AMD AVIC support on systems with physical APIC ID above 255 - Better API to disable virtualization quirks - Fixes and optimizations for the zapping of page tables: - Zap roots in two passes, avoiding RCU read-side critical sections that last too long for very large guests backed by 4 KiB SPTEs. - Zap invalid and defunct roots asynchronously via concurrency-managed work queue. - Allowing yielding when zapping TDP MMU roots in response to the root's last reference being put. - Batch more TLB flushes with an RCU trick. Whoever frees the paging structure now holds RCU as a proxy for all vCPUs running in the guest, i.e. to prolongs the grace period on their behalf. It then kicks the the vCPUs out of guest mode before doing rcu_read_unlock(). Generic: - Introduce __vcalloc and use it for very large allocations that need memcg accounting" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits) KVM: use kvcalloc for array allocations KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2 kvm: x86: Require const tsc for RT KVM: x86: synthesize CPUID leaf 0x80000021h if useful KVM: x86: add support for CPUID leaf 0x80000021 KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()" kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU KVM: arm64: fix typos in comments KVM: arm64: Generalise VM features into a set of flags KVM: s390: selftests: Add error memop tests KVM: s390: selftests: Add more copy memop tests KVM: s390: selftests: Add named stages for memop test KVM: s390: selftests: Add macro as abstraction for MEM_OP KVM: s390: selftests: Split memop tests KVM: s390x: fix SCK locking RISC-V: KVM: Implement SBI HSM suspend call RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function RISC-V: Add SBI HSM suspend related defines RISC-V: KVM: Implement SBI v0.3 SRST extension ...
| * | KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2Oliver Upton2022-03-211-0/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM_CAP_DISABLE_QUIRKS is irrevocably broken. The capability does not advertise the set of quirks which may be disabled to userspace, so it is impossible to predict the behavior of KVM. Worse yet, KVM_CAP_DISABLE_QUIRKS will tolerate any value for cap->args[0], meaning it fails to reject attempts to set invalid quirk bits. The only valid workaround for the quirky quirks API is to add a new CAP. Actually advertise the set of quirks that can be disabled to userspace so it can predict KVM's behavior. Reject values for cap->args[0] that contain invalid bits. Finally, add documentation for the new capability and describe the existing quirks. Signed-off-by: Oliver Upton <oupton@google.com> Message-Id: <20220301060351.442881-5-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | Merge tag 'kvmarm-5.18' of ↵Paolo Bonzini2022-03-182-46/+82
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 5.18 - Proper emulation of the OSLock feature of the debug architecture - Scalibility improvements for the MMU lock when dirty logging is on - New VMID allocator, which will eventually help with SVA in VMs - Better support for PMUs in heterogenous systems - PSCI 1.1 support, enabling support for SYSTEM_RESET2 - Implement CONFIG_DEBUG_LIST at EL2 - Make CONFIG_ARM64_ERRATUM_2077057 default y - Reduce the overhead of VM exit when no interrupt is pending - Remove traces of 32bit ARM host support from the documentation - Updated vgic selftests - Various cleanups, doc updates and spelling fixes
| | * \ Merge branch kvm-arm64/misc-5.18 into kvmarm-master/nextMarc Zyngier2022-03-092-45/+44
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/misc-5.18: : . : Misc fixes for KVM/arm64 5.18: : : - Drop unused kvm parameter to kvm_psci_version() : : - Implement CONFIG_DEBUG_LIST at EL2 : : - Make CONFIG_ARM64_ERRATUM_2077057 default y : : - Only do the interrupt dance if we have exited because of an interrupt : : - Remove traces of 32bit ARM host support from the documentation : . Documentation: KVM: Update documentation to indicate KVM is arm64-only KVM: arm64: Only open the interrupt window on exit due to an interrupt KVM: arm64: Enable Cortex-A510 erratum 2077057 by default Signed-off-by: Marc Zyngier <maz@kernel.org>
| | | * | Documentation: KVM: Update documentation to indicate KVM is arm64-onlyOliver Upton2022-03-092-45/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM support for 32-bit ARM hosts (KVM/arm) has been removed from the kernel since commit 541ad0150ca4 ("arm: Remove 32bit KVM host support"). There still exists some remnants of the old architecture in the KVM documentation. Remove all traces of 32-bit host support from the documentation. Note that AArch32 guests are still supported. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220308172856.2997250-1-oupton@google.com
| | * | | Merge branch kvm-arm64/psci-1.1 into kvmarm-master/nextMarc Zyngier2022-02-251-0/+5
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/psci-1.1: : . : Limited PSCI-1.1 support from Will Deacon: : : This small series exposes the PSCI SYSTEM_RESET2 call to guests, which : allows the propagation of a "reset_type" and a "cookie" back to the VMM. : Although Linux guests only ever pass 0 for the type ("SYSTEM_WARM_RESET"), : the vendor-defined range can be used by a bootloader to provide additional : information about the reset, such as an error code. : . KVM: arm64: Remove unneeded semicolons KVM: arm64: Indicate SYSTEM_RESET2 in kvm_run::system_event flags field KVM: arm64: Expose PSCI SYSTEM_RESET2 call to the guest KVM: arm64: Bump guest PSCI version to 1.1 Signed-off-by: Marc Zyngier <maz@kernel.org>
| | | * | | KVM: arm64: Indicate SYSTEM_RESET2 in kvm_run::system_event flags fieldWill Deacon2022-02-211-0/+5
| | | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When handling reset and power-off PSCI calls from the guest, we initialise X0 to PSCI_RET_INTERNAL_FAILURE in case the VMM tries to re-run the vCPU after issuing the call. Unfortunately, this also means that the VMM cannot see which PSCI call was issued and therefore cannot distinguish between PSCI SYSTEM_RESET and SYSTEM_RESET2 calls, which is necessary in order to determine the validity of the "reset_type" in X1. Allocate bit 0 of the previously unused 'flags' field of the system_event structure so that we can indicate the PSCI call used to initiate the reset. Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220221153524.15397-4-will@kernel.org
| | * | | KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPUAlexandru Elisei2022-02-081-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Userspace can assign a PMU to a VCPU with the KVM_ARM_VCPU_PMU_V3_SET_PMU device ioctl. If the VCPU is scheduled on a physical CPU which has a different PMU, the perf events needed to emulate a guest PMU won't be scheduled in and the guest performance counters will stop counting. Treat it as an userspace error and refuse to run the VCPU in this situation. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220127161759.53553-7-alexandru.elisei@arm.com
| | * | | KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attributeAlexandru Elisei2022-02-081-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When KVM creates an event and there are more than one PMUs present on the system, perf_init_event() will go through the list of available PMUs and will choose the first one that can create the event. The order of the PMUs in this list depends on the probe order, which can change under various circumstances, for example if the order of the PMU nodes change in the DTB or if asynchronous driver probing is enabled on the kernel command line (with the driver_async_probe=armv8-pmu option). Another consequence of this approach is that on heteregeneous systems all virtual machines that KVM creates will use the same PMU. This might cause unexpected behaviour for userspace: when a VCPU is executing on the physical CPU that uses this default PMU, PMU events in the guest work correctly; but when the same VCPU executes on another CPU, PMU events in the guest will suddenly stop counting. Fortunately, perf core allows user to specify on which PMU to create an event by using the perf_event_attr->type field, which is used by perf_init_event() as an index in the radix tree of available PMUs. Add the KVM_ARM_VCPU_PMU_V3_CTRL(KVM_ARM_VCPU_PMU_V3_SET_PMU) VCPU attribute to allow userspace to specify the arm_pmu that KVM will use when creating events for that VCPU. KVM will make no attempt to run the VCPU on the physical CPUs that share the PMU, leaving it up to userspace to manage the VCPU threads' affinity accordingly. To ensure that KVM doesn't expose an asymmetric system to the guest, the PMU set for one VCPU will be used by all other VCPUs. Once a VCPU has run, the PMU cannot be changed in order to avoid changing the list of available events for a VCPU, or to change the semantics of existing events. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220127161759.53553-6-alexandru.elisei@arm.com
| | * | | KVM: arm64: Do not change the PMU event filter after a VCPU has runMarc Zyngier2022-02-081-1/+1
| | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Userspace can specify which events a guest is allowed to use with the KVM_ARM_VCPU_PMU_V3_FILTER attribute. The list of allowed events can be identified by a guest from reading the PMCEID{0,1}_EL0 registers. Changing the PMU event filter after a VCPU has run can cause reads of the registers performed before the filter is changed to return different values than reads performed with the new event filter in place. The architecture defines the two registers as read-only, and this behaviour contradicts that. Keep track when the first VCPU has run and deny changes to the PMU event filter to prevent this from happening. Signed-off-by: Marc Zyngier <maz@kernel.org> [ Alexandru E: Added commit message, updated ioctl documentation ] Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220127161759.53553-2-alexandru.elisei@arm.com
| * | | Merge branch 'kvm-bugfixes' into HEADPaolo Bonzini2022-03-041-1/+1
| |\ \ \ | | | |/ | | |/| | | | | Merge bugfixes from 5.17 before merging more tricky work.