aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
| * | kvm: Disallow wraparound in kvm_gfn_to_hva_cache_initJim Mattson2018-12-211-20/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, in the case where (gpa + len) wrapped around, the entire region was not validated, as the comment claimed. It doesn't actually seem that wraparound should be allowed here at all. Furthermore, since some callers don't check the return code from this function, it seems prudent to clear ghc->memslot in the event of an error. Fixes: 8f964525a121f ("KVM: Allow cross page reads and writes from cached translations.") Reported-by: Cfir Cohen <cfir@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Cfir Cohen <cfir@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Cc: Andrew Honig <ahonig@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
| * | KVM: VMX: Remove duplicated include from vmx.cYueHaibing2018-12-211-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
| * | selftests: kvm: report failed stage when exit reason is unexpectedVitaly Kuznetsov2018-12-212-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we get a report like ==== Test Assertion Failure ==== x86_64/state_test.c:157: run->exit_reason == KVM_EXIT_IO pid=955 tid=955 - Success 1 0x0000000000401350: main at state_test.c:154 2 0x00007fc31c9e9412: ?? ??:0 3 0x000000000040159d: _start at ??:? Unexpected exit reason: 8 (SHUTDOWN), it is not obvious which particular stage failed. Add the info. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
| * | KVM: x86: svm: report MSR_IA32_MCG_EXT_CTL as unsupportedVitaly Kuznetsov2018-12-211-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AMD doesn't seem to implement MSR_IA32_MCG_EXT_CTL and svm code in kvm knows nothing about it, however, this MSR is among emulated_msrs and thus returned with KVM_GET_MSR_INDEX_LIST. The consequent KVM_GET_MSRS, of course, fails. Report the MSR as unsupported to not confuse userspace. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
| * | KVM: x86: fix size of x86_fpu_cache objectsPaolo Bonzini2018-12-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The memory allocation in b666a4b69739 ("kvm: x86: Dynamically allocate guest_fpu", 2018-11-06) is wrong, there are other members in struct fpu before the fpregs_state union and the patch should be doing something similar to the code in fpu__init_task_struct_size. It's enough to run a guest and then rmmod kvm to see slub errors which are actually caused by memory corruption. For now let's revert it to sizeof(struct fpu), which is conservative. I have plans to move fsave/fxsave/xsave directly in KVM, without using the kernel FPU helpers, and once it's done, the size of the object in the cache will be something like kvm_xstate_size. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | Merge tag 'kvm-ppc-next-4.21-1' of ↵Radim Krčmář2018-12-2018-70/+418
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc PPC KVM update for 4.21 from Paul Mackerras The main new feature this time is support in HV nested KVM for passing a device that is emulated by a level 0 hypervisor and presented to level 1 as a PCI device through to a level 2 guest using VFIO. Apart from that there are improvements for migration of radix guests under HV KVM and some other fixes and cleanups.
| | * KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L3 guestSuraj Jitindar Singh2018-12-172-10/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously when a device was being emulated by an L1 guest for an L2 guest, that device couldn't then be passed through to an L3 guest. This was because the L1 guest had no method for accessing L3 memory. The hcall H_COPY_TOFROM_GUEST provides this access. Thus this setup for passthrough can now be allowed. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * KVM: PPC: Book3S: Introduce new hcall H_COPY_TOFROM_GUEST to access ↵Suraj Jitindar Singh2018-12-175-4/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | quadrants 1 & 2 A guest cannot access quadrants 1 or 2 as this would result in an exception. Thus introduce the hcall H_COPY_TOFROM_GUEST to be used by a guest when it wants to perform an access to quadrants 1 or 2, for example when it wants to access memory for one of its nested guests. Also provide an implementation for the kvm-hv module. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L2 guestSuraj Jitindar Singh2018-12-175-11/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow for a device which is being emulated at L0 (the host) for an L1 guest to be passed through to a nested (L2) guest. The existing kvmppc_hv_emulate_mmio function can be used here. The main challenge is that for a load the result must be stored into the L2 gpr, not an L1 gpr as would normally be the case after going out to qemu to complete the operation. This presents a challenge as at this point the L2 gpr state has been written back into L1 memory. To work around this we store the address in L1 memory of the L2 gpr where the result of the load is to be stored and use the new io_gpr value KVM_MMIO_REG_NESTED_GPR to indicate that this is a nested load for which completion must be done when returning back into the kernel. Then in kvmppc_complete_mmio_load() the resultant value is written into L1 memory at the location of the indicated L2 gpr. Note that we don't currently let an L1 guest emulate a device for an L2 guest which is then passed through to an L3 guest. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * KVM: PPC: Update kvmppc_st and kvmppc_ld to use quadrantsSuraj Jitindar Singh2018-12-171-2/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The functions kvmppc_st and kvmppc_ld are used to access guest memory from the host using a guest effective address. They do so by translating through the process table to obtain a guest real address and then using kvm_read_guest or kvm_write_guest to make the access with the guest real address. This method of access however only works for L1 guests and will give the incorrect results for a nested guest. We can however use the store_to_eaddr and load_from_eaddr kvmppc_ops to perform the access for a nested guesti (and a L1 guest). So attempt this method first and fall back to the old method if this fails and we aren't running a nested guest. At this stage there is no fall back method to perform the access for a nested guest and this is left as a future improvement. For now we will return to the nested guest and rely on the fact that a translation should be faulted in before retrying the access. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * KVM: PPC: Add load_from_eaddr and store_to_eaddr to the kvmppc_ops structSuraj Jitindar Singh2018-12-172-0/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kvmppc_ops struct is used to store function pointers to kvm implementation specific functions. Introduce two new functions load_from_eaddr and store_to_eaddr to be used to load from and store to a guest effective address respectively. Also implement these for the kvm-hv module. If we are using the radix mmu then we can call the functions to access quadrant 1 and 2. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2Suraj Jitindar Singh2018-12-174-0/+111
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The POWER9 radix mmu has the concept of quadrants. The quadrant number is the two high bits of the effective address and determines the fully qualified address to be used for the translation. The fully qualified address consists of the effective lpid, the effective pid and the effective address. This gives then 4 possible quadrants 0, 1, 2, and 3. When accessing these quadrants the fully qualified address is obtained as follows: Quadrant | Hypervisor | Guest -------------------------------------------------------------------------- | EA[0:1] = 0b00 | EA[0:1] = 0b00 0 | effLPID = 0 | effLPID = LPIDR | effPID = PIDR | effPID = PIDR -------------------------------------------------------------------------- | EA[0:1] = 0b01 | 1 | effLPID = LPIDR | Invalid Access | effPID = PIDR | -------------------------------------------------------------------------- | EA[0:1] = 0b10 | 2 | effLPID = LPIDR | Invalid Access | effPID = 0 | -------------------------------------------------------------------------- | EA[0:1] = 0b11 | EA[0:1] = 0b11 3 | effLPID = 0 | effLPID = LPIDR | effPID = 0 | effPID = 0 -------------------------------------------------------------------------- In the Guest; Quadrant 3 is normally used to address the operating system since this uses effPID=0 and effLPID=LPIDR, meaning the PID register doesn't need to be switched. Quadrant 0 is normally used to address user space since the effLPID and effPID are taken from the corresponding registers. In the Host; Quadrant 0 and 3 are used as above, however the effLPID is always 0 to address the host. Quadrants 1 and 2 can be used by the host to address guest memory using a guest effective address. Since the effLPID comes from the LPID register, the host loads the LPID of the guest it would like to access (and the PID of the process) and can perform accesses to a guest effective address. This means quadrant 1 can be used to address the guest user space and quadrant 2 can be used to address the guest operating system from the hypervisor, using a guest effective address. Access to the quadrants can cause a Hypervisor Data Storage Interrupt (HDSI) due to being unable to perform partition scoped translation. Previously this could only be generated from a guest and so the code path expects us to take the KVM trampoline in the interrupt handler. This is no longer the case so we modify the handler to call bad_page_fault() to check if we were expecting this fault so we can handle it gracefully and just return with an error code. In the hash mmu case we still raise an unknown exception since quadrants aren't defined for the hash mmu. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * KVM: PPC: Book3S HV: Add function kvmhv_vcpu_is_radix()Suraj Jitindar Singh2018-12-172-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There exists a function kvm_is_radix() which is used to determine if a kvm instance is using the radix mmu. However this only applies to the first level (L1) guest. Add a function kvmhv_vcpu_is_radix() which can be used to determine if the current execution context of the vcpu is radix, accounting for if the vcpu is running a nested guest. Currently all nested guests must be radix but this may change in the future. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * KVM: PPC: Book3S: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machinesSuraj Jitindar Singh2018-12-171-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kvm capability KVM_CAP_SPAPR_TCE_VFIO is used to indicate the availability of in kernel tce acceleration for vfio. However it is currently the case that this is only available on a powernv machine, not for a pseries machine. Thus make this capability dependent on having the cpu feature CPU_FTR_HVMODE. [paulus@ozlabs.org - fixed compilation for Book E.] Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * KVM: PPC: Book3S HV: Flush guest mappings when turning dirty tracking on/offPaul Mackerras2018-12-174-4/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds code to flush the partition-scoped page tables for a radix guest when dirty tracking is turned on or off for a memslot. Only the guest real addresses covered by the memslot are flushed. The reason for this is to get rid of any 2M PTEs in the partition-scoped page tables that correspond to host transparent huge pages, so that page dirtiness is tracked at a system page (4k or 64k) granularity rather than a 2M granularity. The page tables are also flushed when turning dirty tracking off so that the memslot's address space can be repopulated with THPs if possible. To do this, we add a new function kvmppc_radix_flush_memslot(). Since this does what's needed for kvmppc_core_flush_memslot_hv() on a radix guest, we now make kvmppc_core_flush_memslot_hv() call the new kvmppc_radix_flush_memslot() rather than calling kvm_unmap_radix() for each page in the memslot. This has the effect of fixing a bug in that kvmppc_core_flush_memslot_hv() was previously calling kvm_unmap_radix() without holding the kvm->mmu_lock spinlock, which is required to be held. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * KVM: PPC: Book3S HV: Cleanups - constify memslots, fix commentsPaul Mackerras2018-12-175-13/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds 'const' to the declarations for the struct kvm_memory_slot pointer parameters of some functions, which will make it possible to call those functions from kvmppc_core_commit_memory_region_hv() in the next patch. This also fixes some comments about locking. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * KVM: PPC: Book3S HV: Map single pages when doing dirty page loggingPaul Mackerras2018-12-171-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For radix guests, this makes KVM map guest memory as individual pages when dirty page logging is enabled for the memslot corresponding to the guest real address. Having a separate partition-scoped PTE for each system page mapped to the guest means that we have a separate dirty bit for each page, thus making the reported dirty bitmap more accurate. Without this, if part of guest memory is backed by transparent huge pages, the dirty status is reported at a 2MB granularity rather than a 64kB (or 4kB) granularity for that part, causing userspace to have to transmit more data when migrating the guest. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * KVM: PPC: Pass change type down to memslot commit functionBharata B Rao2018-12-176-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, kvm_arch_commit_memory_region() gets called with a parameter indicating what type of change is being made to the memslot, but it doesn't pass it down to the platform-specific memslot commit functions. This adds the `change' parameter to the lower-level functions so that they can use it in future. [paulus@ozlabs.org - fix book E also.] Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * KVM: PPC: Book3S PR: Set hflag to indicate that POWER9 supports 1T segmentsSuraj Jitindar Singh2018-12-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When booting a kvm-pr guest on a POWER9 machine the following message is observed: "qemu-system-ppc64: KVM does not support 1TiB segments which guest expects" This is because the guest is expecting to be able to use 1T segments however we don't indicate support for it. This is because we don't set the BOOK3S_HFLAG_MULTI_PGSIZE flag in the hflags in kvmppc_set_pvr_pr() on POWER9. POWER9 does indeed have support for 1T segments, so add a case for POWER9 to the switch statement to ensure it is set. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * KVM: PPC: Book3S HV: Change to use DEFINE_SHOW_ATTRIBUTE macroYangtao Li2018-12-142-22/+2
| | | | | | | | | | | | | | | | | | | | | Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * KVM: PPC: Book3S HV: Fix race between kvm_unmap_hva_range and MMU mode switchPaul Mackerras2018-12-142-6/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Testing has revealed an occasional crash which appears to be caused by a race between kvmppc_switch_mmu_to_hpt and kvm_unmap_hva_range_hv. The symptom is a NULL pointer dereference in __find_linux_pte() called from kvm_unmap_radix() with kvm->arch.pgtable == NULL. Looking at kvmppc_switch_mmu_to_hpt(), it does indeed clear kvm->arch.pgtable (via kvmppc_free_radix()) before setting kvm->arch.radix to NULL, and there is nothing to prevent kvm_unmap_hva_range_hv() or the other MMU callback functions from being called concurrently with kvmppc_switch_mmu_to_hpt() or kvmppc_switch_mmu_to_radix(). This patch therefore adds calls to spin_lock/unlock on the kvm->mmu_lock around the assignments to kvm->arch.radix, and makes sure that the partition-scoped radix tree or HPT is only freed after changing kvm->arch.radix. This also takes the kvm->mmu_lock in kvmppc_rmap_reset() to make sure that the clearing of each rmap array (one per memslot) doesn't happen concurrently with use of the array in the kvm_unmap_hva_range_hv() or the other MMU callbacks. Fixes: 18c3640cefc7 ("KVM: PPC: Book3S HV: Add infrastructure for running HPT guests on radix host") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | Merge tag 'kvm-s390-next-4.21-1' of ↵Paolo Bonzini2018-12-191-7/+17
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fixes for 4.21 Just two small fixes.
| | * | KVM: s390: fix kmsg component kvm-s390Michael Mueller2018-12-181-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Relocate #define statement for kvm related kernel messages before the include of printk to become effective. Signed-off-by: Michael Mueller <mimu@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| | * | KVM: s390: unregister debug feature on failing arch initMichael Mueller2018-12-181-3/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure the debug feature and its allocated resources get released upon unsuccessful architecture initialization. A related indication of the issue will be reported as kernel message. Signed-off-by: Michael Mueller <mimu@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-Id: <20181130143215.69496-2-mimu@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * | | Merge tag 'kvmarm-for-v4.21' of ↵Paolo Bonzini2018-12-1929-296/+595
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm updates for 4.21 - Large PUD support for HugeTLB - Single-stepping fixes - Improved tracing - Various timer and vgic fixups
| | * | | arm: KVM: Add S2_PMD_{MASK,SIZE} constantsMarc Zyngier2018-12-191-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They were missing, and it turns out that we do need them now. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | arm/arm64: KVM: Add ARM_EXCEPTION_IS_TRAP macroMarc Zyngier2018-12-193-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 32 and 64bit use different symbols to identify the traps. 32bit has a fine grained approach (prefetch abort, data abort and HVC), while 64bit is pretty happy with just "trap". This has been fine so far, except that we now need to decode some of that in tracepoints that are common to both architectures. Introduce ARM_EXCEPTION_IS_TRAP which abstracts the trap symbols and make the tracepoint use it. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | arm64: KVM: Avoid setting the upper 32 bits of VTCR_EL2 to 1Will Deacon2018-12-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although bit 31 of VTCR_EL2 is RES1, we inadvertently end up setting all of the upper 32 bits to 1 as well because we define VTCR_EL2_RES1 as signed, which is sign-extended when assigning to kvm->arch.vtcr. Lucky for us, the architecture currently treats these upper bits as RES0 so, whilst we've been naughty, we haven't set fire to anything yet. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm/arm64: Fix unintended stage 2 PMD mappingsChristoffer Dall2018-12-191-22/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two things we need to take care of when we create block mappings in the stage 2 page tables: (1) The alignment within a PMD between the host address range and the guest IPA range must be the same, since otherwise we end up mapping pages with the wrong offset. (2) The head and tail of a memory slot may not cover a full block size, and we have to take care to not map those with block descriptors, since we could expose memory to the guest that the host did not intend to expose. So far, we have been taking care of (1), but not (2), and our commentary describing (1) was somewhat confusing. This commit attempts to factor out the checks of both into a common function, and if we don't pass the check, we won't attempt any PMD mappings for neither hugetlbfs nor THP. Note that we used to only check the alignment for THP, not for hugetlbfs, but as far as I can tell the check needs to be applied to both scenarios. Cc: Ralph Palutke <ralph.palutke@fau.de> Cc: Lukas Braun <koomi@moshbit.net> Reported-by: Lukas Braun <koomi@moshbit.net> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | arm/arm64: KVM: vgic: Force VM halt when changing the active state of GICv3 ↵Marc Zyngier2018-12-191-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PPIs/SGIs We currently only halt the guest when a vCPU messes with the active state of an SPI. This is perfectly fine for GICv2, but isn't enough for GICv3, where all vCPUs can access the state of any other vCPU. Let's broaden the condition to include any GICv3 interrupt that has an active state (i.e. all but LPIs). Cc: stable@vger.kernel.org Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | arm64: KVM: Add trapped system register access tracepointMarc Zyngier2018-12-193-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're pretty blind when it comes to system register tracing, and rely on the ESR value displayed by kvm_handle_sys, which isn't much. Instead, let's add an actual name to the sysreg entries, so that we can finally print it as we're about to perform the access itself. The new tracepoint is conveniently called kvm_sys_access. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm64: Make vcpu const in vcpu_read_sys_regChristoffer Dall2018-12-192-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vcpu_read_sys_reg should not be modifying the VCPU structure. Eventually, to handle EL2 sysregs for nested virtualization, we will call vcpu_read_sys_reg from places that have a const vcpu pointer, which will complain about the lack of the const modifier on the read path. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm/arm64: arch_timer: Simplify kvm_timer_vcpu_terminateChristoffer Dall2018-12-191-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvm_timer_vcpu_terminate can only be called in two scenarios: 1. As part of cleanup during a failed VCPU create 2. As part of freeing the whole VM (struct kvm refcount == 0) In the first case, we cannot have programmed any timers or mapped any IRQs, and therefore we do not have to cancel anything or unmap anything. In the second case, the VCPU will have gone through kvm_timer_vcpu_put, which will have canceled the emulated physical timer's hrtimer, and we do not need to that here as well. We also do not care if the irq is recorded as mapped or not in the VGIC data structure, because the whole VM is going away. That leaves us only with having to ensure that we cancel the bg_timer if we were blocking the last time we called kvm_timer_vcpu_put(). Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm/arm64: Remove arch timer workqueueChristoffer Dall2018-12-192-31/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The use of a work queue in the hrtimer expire function for the bg_timer is a leftover from the time when we would inject interrupts when the bg_timer expired. Since we are no longer doing that, we can instead call kvm_vcpu_wake_up() directly from the hrtimer function and remove all workqueue functionality from the arch timer code. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm/arm64: Fixup the kvm_exit tracepointChristoffer Dall2018-12-193-13/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kvm_exit tracepoint strangely always reported exits as being IRQs. This seems to be because either the __print_symbolic or the tracepoint macros use a variable named idx. Take this chance to update the fields in the tracepoint to reflect the concepts in the arm64 architecture that we pass to the tracepoint and move the exception type table to the same location and header files as the exits code. We also clear out the exception code to 0 for IRQ exits (which translates to UNKNOWN in text) to make it slighyly less confusing to parse the trace output. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm/arm64: vgic: Consider priority and active state for pending irqChristoffer Dall2018-12-191-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When checking if there are any pending IRQs for the VM, consider the active state and priority of the IRQs as well. Otherwise we could be continuously scheduling a guest hypervisor without it seeing an IRQ. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm/arm64: vgic: Fix off-by-one bug in vgic_get_irq()Gustavo A. R. Silva2018-12-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using the nospec API, it should be taken into account that: "...if the CPU speculates past the bounds check then * array_index_nospec() will clamp the index within the range of [0, * size)." The above is part of the header for macro array_index_nospec() in linux/nospec.h Now, in this particular case, if intid evaluates to exactly VGIC_MAX_SPI or to exaclty VGIC_MAX_PRIVATE, the array_index_nospec() macro ends up returning VGIC_MAX_SPI - 1 or VGIC_MAX_PRIVATE - 1 respectively, instead of VGIC_MAX_SPI or VGIC_MAX_PRIVATE, which, based on the original logic: /* SGIs and PPIs */ if (intid <= VGIC_MAX_PRIVATE) return &vcpu->arch.vgic_cpu.private_irqs[intid]; /* SPIs */ if (intid <= VGIC_MAX_SPI) return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS]; are valid values for intid. Fix this by calling array_index_nospec() macro with VGIC_MAX_PRIVATE + 1 and VGIC_MAX_SPI + 1 as arguments for its parameter size. Fixes: 41b87599c743 ("KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_get_irq()") Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> [dropped the SPI part which was fixed separately] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm/arm64: vgic: Cap SPIs to the VM-defined maximumMarc Zyngier2018-12-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SPIs should be checked against the VMs specific configuration, and not the architectural maximum. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm64: Clarify explanation of STAGE2_PGTABLE_LEVELSChristoffer Dall2018-12-182-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In attempting to re-construct the logic for our stage 2 page table layout I found the reasoning in the comment explaining how we calculate the number of levels used for stage 2 page tables a bit backwards. This commit attempts to clarify the comment, to make it slightly easier to read without having the Arm ARM open on the right page. While we're at it, fixup a typo in a comment that was recently changed. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm/arm64: vgic: Do not cond_resched_lock() with IRQs disabledJulien Thierry2018-12-181-21/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To change the active state of an MMIO, halt is requested for all vcpus of the affected guest before modifying the IRQ state. This is done by calling cond_resched_lock() in vgic_mmio_change_active(). However interrupts are disabled at this point and we cannot reschedule a vcpu. We actually don't need any of this, as kvm_arm_halt_guest ensures that all the other vcpus are out of the guest. Let's just drop that useless code. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Suggested-by: Christoffer Dall <christoffer.dall@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm64: Add support for creating PUD hugepages at stage 2Punit Agrawal2018-12-186-6/+143
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM only supports PMD hugepages at stage 2. Now that the various page handling routines are updated, extend the stage 2 fault handling to map in PUD hugepages. Addition of PUD hugepage support enables additional page sizes (e.g., 1G with 4K granule) which can be useful on cores that support mapping larger block sizes in the TLB entries. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replace BUG() => WARN_ON(1) for arm32 PUD helpers ] Signed-off-by: Suzuki Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm64: Update age handlers to support PUD hugepagesPunit Agrawal2018-12-184-19/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for creating larger hugepages at Stage 2, add support to the age handling notifiers for PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing code. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON(1) for arm32 PUD helpers ] Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm64: Support handling access faults for PUD hugepagesPunit Agrawal2018-12-184-11/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for creating larger hugepages at Stage 2, extend the access fault handling at Stage 2 to support PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing of code. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON(1) in PUD helpers ] Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm64: Support PUD hugepage in stage2_is_exec()Punit Agrawal2018-12-184-5/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for creating PUD hugepages at stage 2, add support for detecting execute permissions on PUD page table entries. Faults due to lack of execute permissions on page table entries is used to perform i-cache invalidation on first execute. Provide trivial implementations of arm32 helpers to allow sharing of code. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON(1) in arm32 PUD helpers ] Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm64: Support dirty page tracking for PUD hugepagesPunit Agrawal2018-12-183-4/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for creating PUD hugepages at stage 2, add support for write protecting PUD hugepages when they are encountered. Write protecting guest tables is used to track dirty pages when migrating VMs. Also, provide trivial implementations of required kvm_s2pud_* helpers to allow sharing of code with arm32. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON() in arm32 pud helpers ] Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm/arm64: Introduce helpers to manipulate page table entriesPunit Agrawal2018-12-183-6/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce helpers to abstract architectural handling of the conversion of pfn to page table entries and marking a PMD page table entry as a block entry. The helpers are introduced in preparation for supporting PUD hugepages at stage 2 - which are supported on arm64 but do not exist on arm. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on faultPunit Agrawal2018-12-181-13/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stage 2 fault handler marks a page as executable if it is handling an execution fault or if it was a permission fault in which case the executable bit needs to be preserved. The logic to decide if the page should be marked executable is duplicated for PMD and PTE entries. To avoid creating another copy when support for PUD hugepages is introduced refactor the code to share the checks needed to mark a page table entry as executable. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm/arm64: Share common code in user_mem_abort()Punit Agrawal2018-12-181-19/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code for operations such as marking the pfn as dirty, and dcache/icache maintenance during stage 2 fault handling is duplicated between normal pages and PMD hugepages. Instead of creating another copy of the operations when we introduce PUD hugepages, let's share them across the different pagesizes. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm/arm64: vgic-v2: Set active_source to 0 when restoring stateChristoffer Dall2018-12-181-1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When restoring the active state from userspace, we don't know which CPU was the source for the active state, and this is not architecturally exposed in any of the register state. Set the active_source to 0 in this case. In the future, we can expand on this and exposse the information as additional information to userspace for GICv2 if anyone cares. Cc: stable@vger.kernel.org Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | KVM: arm/arm64: Log PSTATE for unhandled sysregsMark Rutland2018-12-182-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When KVM traps an unhandled sysreg/coproc access from a guest, it logs the guest PC. To aid debugging, it would be helpful to know which exception level the trap came from, along with other PSTATE/CPSR bits, so let's log the PSTATE/CPSR too. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>