aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kernel
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'random-6.1-rc1-for-linus' of ↵Linus Torvalds2022-10-161-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull more random number generator updates from Jason Donenfeld: "This time with some large scale treewide cleanups. The intent of this pull is to clean up the way callers fetch random integers. The current rules for doing this right are: - If you want a secure or an insecure random u64, use get_random_u64() - If you want a secure or an insecure random u32, use get_random_u32() The old function prandom_u32() has been deprecated for a while now and is just a wrapper around get_random_u32(). Same for get_random_int(). - If you want a secure or an insecure random u16, use get_random_u16() - If you want a secure or an insecure random u8, use get_random_u8() - If you want secure or insecure random bytes, use get_random_bytes(). The old function prandom_bytes() has been deprecated for a while now and has long been a wrapper around get_random_bytes() - If you want a non-uniform random u32, u16, or u8 bounded by a certain open interval maximum, use prandom_u32_max() I say "non-uniform", because it doesn't do any rejection sampling or divisions. Hence, it stays within the prandom_*() namespace, not the get_random_*() namespace. I'm currently investigating a "uniform" function for 6.2. We'll see what comes of that. By applying these rules uniformly, we get several benefits: - By using prandom_u32_max() with an upper-bound that the compiler can prove at compile-time is ≤65536 or ≤256, internally get_random_u16() or get_random_u8() is used, which wastes fewer batched random bytes, and hence has higher throughput. - By using prandom_u32_max() instead of %, when the upper-bound is not a constant, division is still avoided, because prandom_u32_max() uses a faster multiplication-based trick instead. - By using get_random_u16() or get_random_u8() in cases where the return value is intended to indeed be a u16 or a u8, we waste fewer batched random bytes, and hence have higher throughput. This series was originally done by hand while I was on an airplane without Internet. Later, Kees and I worked on retroactively figuring out what could be done with Coccinelle and what had to be done manually, and then we split things up based on that. So while this touches a lot of files, the actual amount of code that's hand fiddled is comfortably small" * tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: prandom: remove unused functions treewide: use get_random_bytes() when possible treewide: use get_random_u32() when possible treewide: use get_random_{u8,u16}() when possible, part 2 treewide: use get_random_{u8,u16}() when possible, part 1 treewide: use prandom_u32_max() when possible, part 2 treewide: use prandom_u32_max() when possible, part 1
| * treewide: use prandom_u32_max() when possible, part 1Jason A. Donenfeld2022-10-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* | Merge tag 'powerpc-6.1-2' of ↵Linus Torvalds2022-10-144-17/+53
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix 32-bit syscall wrappers with 64-bit arguments of unaligned register-pairs. Notably this broke ftruncate64 & pread/write64, which can lead to file corruption. - Fix lost interrupts when returning to soft-masked context on 64-bit. - Fix build failure when CONFIG_DTL=n. Thanks to Nicholas Piggin, Jason A. Donenfeld, Guenter Roeck, Arnd Bergmann, and Sachin Sant. * tag 'powerpc-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/pseries: Fix CONFIG_DTL=n build powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs
| * | powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked contextNicholas Piggin2022-10-131-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's possible for an interrupt returning to an irqs-disabled context to lose a pending soft-masked irq because it branches to part of the exit code for irqs-enabled contexts, which is meant to clear only the PACA_IRQS_HARD_DIS flag from PACAIRQHAPPENED by zeroing the byte. This just looks like a simple thinko from a recent commit (if there was no hard mask pending, there would be no reason to clear it anyway). This also adds comment to the code that actually does need to clear the flag. Fixes: e485f6c751e0a ("powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending") Reported-by: Sachin Sant <sachinp@linux.ibm.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221013064418.1311104-1-npiggin@gmail.com
| * | powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned ↵Nicholas Piggin2022-10-133-15/+40
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | register-pairs powerpc 32-bit system call (and function) calling convention for 64-bit arguments requires the next available odd-pair (two sequential registers with the first being odd-numbered) from the standard register argument allocation. The first argument register is r3, so a 64-bit argument that appears at an even position in the argument list must skip a register (unless there were preceding 64-bit arguments, which might throw things off). This requires non-standard compat definitions to deal with the holes in the argument register allocation. With pt_regs syscall wrappers which use a standard mapper to map pt_regs GPRs to function arguments, 32-bit kernels hit the same basic problem, the standard definitions don't cope with the unused argument registers. Fix this by having 32-bit kernels share those syscall definitions with compat. Thanks to Jason for spending a lot of time finding and bisecting this and developing a trivial reproducer. The perfect bug report. Reported-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Fixes: 7e92e01b72452 ("powerpc: Provide syscall wrapper") Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221012035335.866440-1-npiggin@gmail.com
* | Merge tag 'mm-nonmm-stable-2022-10-11' of ↵Linus Torvalds2022-10-121-5/+0
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - hfs and hfsplus kmap API modernization (Fabio Francesco) - make crash-kexec work properly when invoked from an NMI-time panic (Valentin Schneider) - ntfs bugfixes (Hawkins Jiawei) - improve IPC msg scalability by replacing atomic_t's with percpu counters (Jiebin Sun) - nilfs2 cleanups (Minghao Chi) - lots of other single patches all over the tree! * tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits) include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype proc: test how it holds up with mapping'less process mailmap: update Frank Rowand email address ia64: mca: use strscpy() is more robust and safer init/Kconfig: fix unmet direct dependencies ia64: update config files nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure fork: remove duplicate included header files init/main.c: remove unnecessary (void*) conversions proc: mark more files as permanent nilfs2: remove the unneeded result variable nilfs2: delete unnecessary checks before brelse() checkpatch: warn for non-standard fixes tag style usr/gen_init_cpio.c: remove unnecessary -1 values from int file ipc/msg: mitigate the lock contention with percpu counter percpu: add percpu_counter_add_local and percpu_counter_sub_local fs/ocfs2: fix repeated words in comments relay: use kvcalloc to alloc page array in relay_alloc_page_array proc: make config PROC_CHILDREN depend on PROC_FS fs: uninline inode_maybe_inc_iversion() ...
| * kernel: exit: cleanup release_thread()Kefeng Wang2022-09-111-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only x86 has own release_thread(), introduce a new weak release_thread() function to clean empty definitions in other ARCHs. Link: https://lkml.kernel.org/r/20220819014406.32266-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Guo Ren <guoren@kernel.org> [csky] Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Brian Cain <bcain@quicinc.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Stafford Horne <shorne@gmail.com> [openrisc] Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Huacai Chen <chenhuacai@kernel.org> [LoongArch] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Guo Ren <guoren@kernel.org> [csky] Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Jonas Bonn <jonas@southpole.se> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Xuerui Wang <kernel@xen0n.name> Cc: Yoshinori Sato <ysato@users.osdn.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | powerpc: Fix 85xx buildJoel Stanley2022-10-111-1/+1
| | | | | | | | | | | | | | | | | | The merge of the kbuild tree dropped the renaming of the FSL_BOOKE kconfig option. Fixes: 8afc66e8d43b ("Merge tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild") Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge tag 'mm-stable-2022-10-08' of ↵Linus Torvalds2022-10-101-3/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It it apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially onverted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-unintialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs. - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1] * tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits) hugetlb: allocate vma lock for all sharable vmas hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer hugetlb: fix vma lock handling during split vma and range unmapping mglru: mm/vmscan.c: fix imprecise comments mm/mglru: don't sync disk for each aging cycle mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol mm: memcontrol: use do_memsw_account() in a few more places mm: memcontrol: deprecate swapaccounting=0 mode mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled mm/secretmem: remove reduntant return value mm/hugetlb: add available_huge_pages() func mm: remove unused inline functions from include/linux/mm_inline.h selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd selftests/vm: add thp collapse shmem testing selftests/vm: add thp collapse file and tmpfs testing selftests/vm: modularize thp collapse memory operations selftests/vm: dedup THP helpers mm/khugepaged: add tracepoint to hpage_collapse_scan_file() mm/madvise: add file and shmem support to MADV_COLLAPSE ...
| * | powerpc: remove mmap linked list walksMatthew Wilcox (Oracle)2022-09-261-3/+3
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the VMA iterator instead. Link: https://lkml.kernel.org/r/20220906194824.2110408-34-Liam.Howlett@oracle.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | Merge tag 'bitmap-6.1-rc1' of https://github.com/norov/linuxLinus Torvalds2022-10-101-0/+4
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull bitmap updates from Yury Norov: - Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES (Phil Auld) - cleanup nr_cpu_ids vs nr_cpumask_bits mess (me) This series cleans that mess and adds new config FORCE_NR_CPUS that allows to optimize cpumask subsystem if the number of CPUs is known at compile-time. - optimize find_bit() functions (me) Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT() macros. - add find_nth_bit() (me) Adds find_nth_bit(), which is ~70 times faster than bitcounting with for_each() loop: for_each_set_bit(bit, mask, size) if (n-- == 0) return bit; Also adds bitmap_weight_and() to let people replace this pattern: tmp = bitmap_alloc(nbits); bitmap_and(tmp, map1, map2, nbits); weight = bitmap_weight(tmp, nbits); bitmap_free(tmp); with a single bitmap_weight_and() call. - repair cpumask_check() (me) After switching cpumask to use nr_cpu_ids, cpumask_check() started generating many false-positive warnings. This series fixes it. - Add for_each_cpu_andnot() and for_each_cpu_andnot() (Valentin Schneider) Extends the API with one more function and applies it in sched/core. * tag 'bitmap-6.1-rc1' of https://github.com/norov/linux: (28 commits) sched/core: Merge cpumask_andnot()+for_each_cpu() into for_each_cpu_andnot() lib/test_cpumask: Add for_each_cpu_and(not) tests cpumask: Introduce for_each_cpu_andnot() lib/find_bit: Introduce find_next_andnot_bit() cpumask: fix checking valid cpu range lib/bitmap: add tests for for_each() loops lib/find: optimize for_each() macros lib/bitmap: introduce for_each_set_bit_wrap() macro lib/find_bit: add find_next{,_and}_bit_wrap cpumask: switch for_each_cpu{,_not} to use for_each_bit() net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and} cpumask: add cpumask_nth_{,and,andnot} lib/bitmap: remove bitmap_ord_to_pos lib/bitmap: add tests for find_nth_bit() lib: add find_nth{,_and,_andnot}_bit() lib/bitmap: add bitmap_weight_and() lib/bitmap: don't call __bitmap_weight() in kernel code tools: sync find_bit() implementation lib/find_bit: optimize find_next_bit() functions lib/find_bit: create find_first_zero_bit_le() ...
| * | powerpc/64: don't refer nr_cpu_ids in asm code when it's undefinedYury Norov2022-09-201-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | generic_secondary_common_init() calls LOAD_REG_ADDR(r7, nr_cpu_ids) conditionally on CONFIG_SMP. However, if 'NR_CPUS == 1', kernel doesn't use the nr_cpu_ids, and in C code, it's just: #if NR_CPUS == 1 #define nr_cpu_ids ... This series makes declaration of nr_cpu_ids conditional on NR_CPUS == 1, and that reveals the issue, because compiler can't link the LOAD_REG_ADDR(r7, nr_cpu_ids) against nonexisting symbol. Current code looks unsafe for those who build kernel with CONFIG_SMP=y and NR_CPUS == 1. This is weird configuration, but not disallowed. Fix the linker error by replacing LOAD_REG_ADDR() with LOAD_REG_IMMEDIATE() conditionally on NR_CPUS == 1. As the following patch adds CONFIG_FORCE_NR_CPUS option that has the similar effect on nr_cpu_ids, make the generic_secondary_common_init() conditional on it too. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Yury Norov <yury.norov@gmail.com>
* | | Merge tag 'kbuild-v6.1' of ↵Linus Torvalds2022-10-101-10/+10
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Remove potentially incomplete targets when Kbuid is interrupted by SIGINT etc in case GNU Make may miss to do that when stderr is piped to another program. - Rewrite the single target build so it works more correctly. - Fix rpm-pkg builds with V=1. - List top-level subdirectories in ./Kbuild. - Ignore auto-generated __kstrtab_* and __kstrtabns_* symbols in kallsyms. - Avoid two different modules in lib/zstd/ having shared code, which potentially causes building the common code as build-in and modular back-and-forth. - Unify two modpost invocations to optimize the build process. - Remove head-y syntax in favor of linker scripts for placing particular sections in the head of vmlinux. - Bump the minimal GNU Make version to 3.82. - Clean up misc Makefiles and scripts. * tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (41 commits) docs: bump minimal GNU Make version to 3.82 ia64: simplify esi object addition in Makefile Revert "kbuild: Check if linker supports the -X option" kbuild: rebuild .vmlinux.export.o when its prerequisite is updated kbuild: move modules.builtin(.modinfo) rules to Makefile.vmlinux_o zstd: Fixing mixed module-builtin objects kallsyms: ignore __kstrtab_* and __kstrtabns_* symbols kallsyms: take the input file instead of reading stdin kallsyms: drop duplicated ignore patterns from kallsyms.c kbuild: reuse mksysmap output for kallsyms mksysmap: update comment about __crc_* kbuild: remove head-y syntax kbuild: use obj-y instead extra-y for objects placed at the head kbuild: hide error checker logs for V=1 builds kbuild: re-run modpost when it is updated kbuild: unify two modpost invocations kbuild: move vmlinux.o rule to the top Makefile kbuild: move .vmlinux.objs rule to Makefile.modpost kbuild: list sub-directories in ./Kbuild Makefile.compiler: replace cc-ifversion with compiler-specific macros ...
| * | | kbuild: use obj-y instead extra-y for objects placed at the headMasahiro Yamada2022-10-021-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The objects placed at the head of vmlinux need special treatments: - arch/$(SRCARCH)/Makefile adds them to head-y in order to place them before other archives in the linker command line. - arch/$(SRCARCH)/kernel/Makefile adds them to extra-y instead of obj-y to avoid them going into built-in.a. This commit gets rid of the latter. Create vmlinux.a to collect all the objects that are unconditionally linked to vmlinux. The objects listed in head-y are moved to the head of vmlinux.a by using 'ar m'. With this, arch/$(SRCARCH)/kernel/Makefile can consistently use obj-y for builtin objects. There is no *.o that is directly linked to vmlinux. Drop unneeded code in scripts/clang-tools/gen_compile_commands.py. $(AR) mPi needs 'T' to workaround the llvm-ar bug. The fix was suggested by Nathan Chancellor [1]. [1]: https://lore.kernel.org/llvm/YyjjT5gQ2hGMH0ni@dev-arch.thelio-3990X/ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
* | | | Merge tag 'perf-core-2022-10-07' of ↵Linus Torvalds2022-10-101-13/+40
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events updates from Ingo Molnar: "PMU driver updates: - Add AMD Last Branch Record Extension Version 2 (LbrExtV2) feature support for Zen 4 processors. - Extend the perf ABI to provide branch speculation information, if available, and use this on CPUs that have it (eg. LbrExtV2). - Improve Intel PEBS TSC timestamp handling & integration. - Add Intel Raptor Lake S CPU support. - Add 'perf mem' and 'perf c2c' memory profiling support on AMD CPUs by utilizing IBS tagged load/store samples. - Clean up & optimize various x86 PMU details. HW breakpoints: - Big rework to optimize the code for systems with hundreds of CPUs and thousands of breakpoints: - Replace the nr_bp_mutex global mutex with the bp_cpuinfo_sem per-CPU rwsem that is read-locked during most of the key operations. - Improve the O(#cpus * #tasks) logic in toggle_bp_slot() and fetch_bp_busy_slots(). - Apply micro-optimizations & cleanups. - Misc cleanups & enhancements" * tag 'perf-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits) perf/hw_breakpoint: Annotate tsk->perf_event_mutex vs ctx->mutex perf: Fix pmu_filter_match() perf: Fix lockdep_assert_event_ctx() perf/x86/amd/lbr: Adjust LBR regardless of filtering perf/x86/utils: Fix uninitialized var in get_branch_type() perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR perf/x86/amd: Support PERF_SAMPLE_ADDR perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT} perf/x86/amd: Support PERF_SAMPLE_DATA_SRC perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} perf/x86/uncore: Add new Raptor Lake S support perf/x86/cstate: Add new Raptor Lake S support perf/x86/msr: Add new Raptor Lake S support perf/x86: Add new Raptor Lake S support bpf: Check flags for branch stack in bpf_read_branch_records helper perf, hw_breakpoint: Fix use-after-free if perf_event_open() fails perf: Use sample_flags for raw_data perf: Use sample_flags for addr ...
| * | | | Merge branch 'v6.0-rc7'Peter Zijlstra2022-09-294-2/+16
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge upstream to get RAPTORLAKE_S Signed-off-by: Peter Zijlstra <peterz@infradead.org>
| * | | | powerpc/hw_breakpoint: Avoid relying on caller synchronizationMarco Elver2022-08-301-13/+40
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Internal data structures (cpu_bps, task_bps) of powerpc's hw_breakpoint implementation have relied on nr_bp_mutex serializing access to them. Before overhauling synchronization of kernel/events/hw_breakpoint.c, introduce 2 spinlocks to synchronize cpu_bps and task_bps respectively, thus avoiding reliance on callers synchronizing powerpc's hw_breakpoint. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20220829124719.675715-10-elver@google.com
* | | | Merge tag 'powerpc-6.1-1' of ↵Linus Torvalds2022-10-0973-2830/+2699
|\ \ \ \ | |_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Remove our now never-true definitions for pgd_huge() and p4d_leaf(). - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit. - Add support for syscall wrappers. - Add support for KFENCE on 64-bit. - Update 64-bit HV KVM to use the new guest state entry/exit accounting API. - Support execute-only memory when using the Radix MMU (P9 or later). - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests. - Updates to our linker script to move more data into read-only sections. - Allow the VDSO to be randomised on 32-bit. - Many other small features and fixes. Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Christophe Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas, Gaosheng Cui, Gustavo A. R. Silva, Haren Myneni, Hari Bathini, Jilin Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Rohan McLure, Russell Currey, Sachin Sant, Segher Boessenkool, Shrikanth Hegde, Tyrel Datwyler, Wolfram Sang, ye xingchen, and Zheng Yongjun. * tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits) KVM: PPC: Book3S HV: Fix stack frame regs marker powerpc: Don't add __powerpc_ prefix to syscall entry points powerpc/64s/interrupt: Fix stack frame regs marker powerpc/64: Fix msr_check_and_set/clear MSR[EE] race powerpc/64s/interrupt: Change must-hard-mask interrupt check from BUG to WARN powerpc/pseries: Add firmware details to the hardware description powerpc/powernv: Add opal details to the hardware description powerpc: Add device-tree model to the hardware description powerpc/64: Add logical PVR to the hardware description powerpc: Add PVR & CPU name to hardware description powerpc: Add hardware description string powerpc/configs: Enable PPC_UV in powernv_defconfig powerpc/configs: Update config files for removed/renamed symbols powerpc/mm: Fix UBSAN warning reported on hugetlb powerpc/mm: Always update max/min_low_pfn in mem_topology_setup() powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range powerpc: Drops STABS_DEBUG from linker scripts powerpc/64s: Remove lost/old comment powerpc/64s: Remove old STAB comment powerpc: remove orphan systbl_chk.sh ...
| * | | powerpc: Don't add __powerpc_ prefix to syscall entry pointsMichael Ellerman2022-10-072-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using syscall wrappers the __SYSCALL_DEFINEx() and related macros add a "__powerpc_" prefix to all syscall entry points. So for example sys_mmap becomes __powerpc_sys_mmap. This risks breaking workflows and tools that expect the old naming scheme. At a minimum setting a breakpoint on eg. sys_mmap with gdb no longer works. There seems to be no compelling reason to add the "__powerpc_" prefix, other than that it follows what some other arches do (x86, arm64, s390). But unlike other arches powerpc doesn't always enable syscall wrappers, so the syscall entry points can change name depending on CONFIG options. For those reasons drop the "__powerpc_" prefix, reverting to the existing naming. Doing so reveals two prototypes in signal.h that have the incorrect type when syscall wrappers are enabled. There are already prototypes for both functions in syscalls.h, so drop the ones from signal.h. Fixes: 7e92e01b7245 ("powerpc: Provide syscall wrapper") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221006135940.1223988-1-mpe@ellerman.id.au
| * | | powerpc/64s/interrupt: Fix stack frame regs markerNicholas Piggin2022-10-051-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The value of the stack frame regs marker that gets saved on the stack in interrupt entry code does not match the regs marker value, which breaks stack frame marker matching. This stray instruction looks to have been introduced in a mismerge. Fixes: bf75a3258a403 ("powerpc/64s/interrupt: move early boot ILE fixup into a macro") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Mismerge by yours truly -_-] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221004132952.984341-1-npiggin@gmail.com
| * | | powerpc/64: Fix msr_check_and_set/clear MSR[EE] raceNicholas Piggin2022-10-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | irq soft-masking means that when Linux irqs are disabled, the MSR[EE] value can change from 1 to 0 asynchronously: if a masked interrupt of the PACA_IRQ_MUST_HARD_MASK variety fires while irqs are disabled, the masked handler will return with MSR[EE]=0. This means a sequence like mtmsr(mfmsr() | MSR_FP) is racy if it can be called with local irqs disabled, unless a hard_irq_disable has been done. Reported-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221004051157.308999-2-npiggin@gmail.com
| * | | powerpc/64s/interrupt: Change must-hard-mask interrupt check from BUG to WARNNicholas Piggin2022-10-041-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This new assertion added is generally harmless and gets fixed up naturally, but it does indicate a problem with MSR manipulation somewhere. Fixes: c39fb71a54f0 ("powerpc/64s/interrupt: masked handler debug check for previous hard disable") Reported-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221004051157.308999-1-npiggin@gmail.com
| * | | powerpc: Add device-tree model to the hardware descriptionMichael Ellerman2022-09-301-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the model of the machine we're on to the hardware description, which is printed at boot and in case of an oops. eg: Hardware name: IBM,8247-22L Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220930082709.55830-4-mpe@ellerman.id.au
| * | | powerpc/64: Add logical PVR to the hardware descriptionMichael Ellerman2022-09-301-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we detect a logical PVR add that to the hardware description, which is printed at boot and in case of an oops. eg: Hardware name: ... 0xf000004 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220930082709.55830-3-mpe@ellerman.id.au
| * | | powerpc: Add PVR & CPU name to hardware descriptionMichael Ellerman2022-09-301-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the PVR and CPU name to the hardware description, which is printed at boot and in case of an oops. eg: Hardware name: ... POWER8E (raw) 0x4b0201 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220930082709.55830-2-mpe@ellerman.id.au
| * | | powerpc: Add hardware description stringMichael Ellerman2022-09-301-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create a hardware description string, which we will use to record various details of the hardware platform we are running on. Print the accumulated description at boot, and use it to set the generic description which is printed in oopses. To begin with add ppc_md.name, aka the "machine description". Example output at boot with the full series applied: Linux version 6.0.0-rc2-gcc-11.1.0-00199-g893f9007a5ce-dirty (michael@alpine1-p1) (powerpc64-linux-gcc (GCC) 11.1.0, GNU ld (GNU Binutils) 2.36.1) #844 SMP Thu Sep 29 22:29:53 AEST 2022 Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1200 0xf000005 of:SLOF,git-5b4c5a pSeries printk: bootconsole [udbg0] enabled Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220930082709.55830-1-mpe@ellerman.id.au
| * | | powerpc: Drops STABS_DEBUG from linker scriptsMichael Ellerman2022-09-303-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No toolchain we support should be generating stabs debug information anymore. Drop the sections entirely from our linker scripts. We removed all the manual stabs annotations in commit 12318163737c ("powerpc/32: Remove remaining .stabs annotations"). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220928130951.1732983-1-mpe@ellerman.id.au
| * | | powerpc/64s: Remove lost/old commentMichael Ellerman2022-09-301-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bulk of this was moved/reworded in: 57f266497d81 ("powerpc: Use gas sections for arranging exception vectors") And now appears around line 700 in arch/powerpc/kernel/exceptions-64s.S. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220928130941.1732818-1-mpe@ellerman.id.au
| * | | powerpc/64s: Remove old STAB commentMichael Ellerman2022-09-301-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This used to be about the 0x4300 handler, but that was moved in commit da2bc4644c75 ("powerpc/64s: Add new exception vector macros"). Note that "STAB" here refers to "Segment Table" not the debug format. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220928130912.1732466-1-mpe@ellerman.id.au
| * | | powerpc: remove orphan systbl_chk.shNicholas Piggin2022-09-301-30/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | arch/powerpc/kernel/systbl_chk.sh has not been referenced since commit ab66dcc76d6a ("powerpc: generate uapi header and system call table files"). Remove it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220929032120.3592593-1-npiggin@gmail.com
| * | | powerpc: Ignore DSI error caused by the copy/paste instructionHaren Myneni2022-09-281-10/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The data storage interrupt (DSI) error will be generated when the paste operation is issued on the suspended Nest Accelerator (NX) window due to NX state changes. The hypervisor expects the partition to ignore this error during page fault handling. To differentiate DSI caused by an actual HW configuration or by the NX window, a new “ibm,pi-features” type value is defined. Byte 0, bit 3 of pi-attribute-specifier-type is now defined to indicate this DSI error. If this error is not ignored, the user space can get SIGBUS when the NX request is issued. This patch adds changes to read ibm,pi-features property and ignore DSI error during page fault handling if MMU_FTR_NX_DSI is defined. Signed-off-by: Haren Myneni <haren@linux.ibm.com> [mpe: Mention PAPR version in comment] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b9cd844b85eb8f70459109ce1b14e44c4cc85fa7.camel@linux.ibm.com
| * | | powerpc/kprobes: Fix null pointer reference in arch_prepare_kprobe()Li Huafei2022-09-281-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I found a null pointer reference in arch_prepare_kprobe(): # echo 'p cmdline_proc_show' > kprobe_events # echo 'p cmdline_proc_show+16' >> kprobe_events Kernel attempted to read user page (0) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000000 Faulting instruction address: 0xc000000000050bfc Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: CPU: 0 PID: 122 Comm: sh Not tainted 6.0.0-rc3-00007-gdcf8e5633e2e #10 NIP: c000000000050bfc LR: c000000000050bec CTR: 0000000000005bdc REGS: c0000000348475b0 TRAP: 0300 Not tainted (6.0.0-rc3-00007-gdcf8e5633e2e) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 88002444 XER: 20040006 CFAR: c00000000022d100 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0 ... NIP arch_prepare_kprobe+0x10c/0x2d0 LR arch_prepare_kprobe+0xfc/0x2d0 Call Trace: 0xc0000000012f77a0 (unreliable) register_kprobe+0x3c0/0x7a0 __register_trace_kprobe+0x140/0x1a0 __trace_kprobe_create+0x794/0x1040 trace_probe_create+0xc4/0xe0 create_or_delete_trace_kprobe+0x2c/0x80 trace_parse_run_command+0xf0/0x210 probes_write+0x20/0x40 vfs_write+0xfc/0x450 ksys_write+0x84/0x140 system_call_exception+0x17c/0x3a0 system_call_vectored_common+0xe8/0x278 --- interrupt: 3000 at 0x7fffa5682de0 NIP: 00007fffa5682de0 LR: 0000000000000000 CTR: 0000000000000000 REGS: c000000034847e80 TRAP: 3000 Not tainted (6.0.0-rc3-00007-gdcf8e5633e2e) MSR: 900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 44002408 XER: 00000000 The address being probed has some special: cmdline_proc_show: Probe based on ftrace cmdline_proc_show+16: Probe for the next instruction at the ftrace location The ftrace-based kprobe does not generate kprobe::ainsn::insn, it gets set to NULL. In arch_prepare_kprobe() it will check for: ... prev = get_kprobe(p->addr - 1); preempt_enable_no_resched(); if (prev && ppc_inst_prefixed(ppc_inst_read(prev->ainsn.insn))) { ... If prev is based on ftrace, 'ppc_inst_read(prev->ainsn.insn)' will occur with a null pointer reference. At this point prev->addr will not be a prefixed instruction, so the check can be skipped. Check if prev is ftrace-based kprobe before reading 'prev->ainsn.insn' to fix this problem. Fixes: b4657f7650ba ("powerpc/kprobes: Don't allow breakpoints on suffixes") Signed-off-by: Li Huafei <lihuafei1@huawei.com> [mpe: Trim oops] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220923093253.177298-1-lihuafei1@huawei.com
| * | | powerpc/smp: poll cpu_callin_map more aggressively in __cpu_up()Nathan Lynch2022-09-281-16/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At boot time, it is not necessary to delay between polls of cpu_callin_map when waiting for a kicked CPU to come up. Remove the delay intervals, but preserve the overall deadline (five seconds). At run time, the first poll result is usually negative and we incur a sleeping wait. If we spin on the callin word for a short time first, we can reduce __cpu_up() from dozens of milliseconds to under 1ms in the common case on a P9 LPAR: $ ppc64_cpu --smt=off $ bpftrace -e 'kprobe:__cpu_up { @start[tid] = nsecs; } kretprobe:__cpu_up /@start[tid]/ { @us = hist((nsecs - @start[tid]) / 1000); delete(@start[tid]); }' -c 'ppc64_cpu --smt=on' Before: @us: [16K, 32K) 85 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [32K, 64K) 13 |@@@@@@@ | After: @us: [128, 256) 95 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [256, 512) 3 |@ | Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926220250.157022-1-nathanl@linux.ibm.com
| * | | powerpc/rtas: block error injection when locked downNathan Lynch2022-09-281-1/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The error injection facility on pseries VMs allows corruption of arbitrary guest memory, potentially enabling a sufficiently privileged user to disable lockdown or perform other modifications of the running kernel via the rtas syscall. Block the PAPR error injection facility from being opened or called when locked down. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Acked-by: Paul Moore <paul@paul-moore.com> (LSM) Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926131643.146502-3-nathanl@linux.ibm.com
| * | | powerpc/64s/interrupt: halt early boot interrupts if paca is not set upNicholas Piggin2022-09-283-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure r13 is zero from very early in boot until it gets set to the boot paca pointer. This allows early program and mce handlers to halt if there is no valid paca, rather than potentially run off into the weeds. This preserves register and memory contents for low level debugging tools. Nothing could be printed to console at this point in any case because even udbg is only set up after the boot paca is set, so this shouldn't be missed. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926055620.2676869-6-npiggin@gmail.com
| * | | powerpc/64: don't set boot CPU's r13 to paca until the structure is set upNicholas Piggin2022-09-281-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The idea is to get to the point where if r13 is non-zero, then it should contain a reasonable paca. This can be used in early boot program check and machine check handlers to avoid running off into the weeds if they hit before r13 has a paca. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926055620.2676869-5-npiggin@gmail.com
| * | | powerpc/64: avoid using r13 in relocateNicholas Piggin2022-09-281-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | relocate() uses r13 in early boot before it is used for the paca. Use a different register for this so r13 is kept unchanged until it is set to the paca pointer. Avoid r14 as well while we're here, there's no reason not to use the volatile registers which is a bit less surprising, and r14 could be used as another fixed reg one day. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926055620.2676869-4-npiggin@gmail.com
| * | | powerpc/64s: early boot machine check handlerNicholas Piggin2022-09-283-1/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the early boot interrupt fixup in the machine check handler to allow the machine check handler to run before interrupt endian is set up. Branch to an early boot handler that just does a basic crash, which allows it to run before ppc_md is set up. MSR[ME] is enabled on the boot CPU earlier, and the machine check stack is temporarily set to the middle of the init task stack. This allows machine checks (e.g., due to invalid data access in real mode) to print something useful earlier in boot (as soon as udbg is set up, if CONFIG_PPC_EARLY_DEBUG=y). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926055620.2676869-3-npiggin@gmail.com
| * | | powerpc/64s/interrupt: move early boot ILE fixup into a macroNicholas Piggin2022-09-281-45/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for using this sequence in machine check interrupt, move it into a macro, with a small change to make it position independent. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926055620.2676869-2-npiggin@gmail.com
| * | | powerpc/64e: provide an addressing macro for use with TOC in alternate registerNicholas Piggin2022-09-281-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The interrupt entry code carefully saves a minimal number of registers, so in some places the TOC is required, it is loaded into a different register, so provide a macro that can supply an alternate TOC register. This continues to use got addressing because TOC-relative results in "got/toc optimization is not supported" messages by the linker. Having r2 be one of the saved registers and using that for TOC addressing may be the best way to avoid that and switch this to TOC addressing. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926034057.2360083-6-npiggin@gmail.com
| * | | powerpc/64: provide a helper macro to load r2 with the kernel TOCNicholas Piggin2022-09-287-21/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A later change stops the kernel using r2 and loads it with a poison value. Provide a PACATOC loading abstraction which can hide this detail. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926034057.2360083-5-npiggin@gmail.com
| * | | powerpc/64: asm use consistent global variable declaration and accessNicholas Piggin2022-09-283-21/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use helper macros to access global variables, and place them in .data sections rather than in .toc. Putting addresses in TOC is not required because the kernel is linked with a single TOC. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926034057.2360083-3-npiggin@gmail.com
| * | | powerpc/64: use 32-bit immediate for STACK_FRAME_REGS_MARKERNicholas Piggin2022-09-285-21/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using a 32-bit constant for this marker allows it to be loaded with two ALU instructions, like 32-bit. This avoids a TOC entry and a TOC load that depends on the r2 value that has just been loaded from the PACA. This changes the value for 32-bit as well, so both have the same value in the low 4 bytes and 64-bit has 0 in the top bytes. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926034057.2360083-2-npiggin@gmail.com
| * | | powerpc/64/irq: tidy soft-masked irq replay and improve documentationNicholas Piggin2022-09-281-32/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | irq replay is quite complicated because of softirq processing which itself enables and disables irqs. Several considerations need to be accounted for due to this, and they are not clearly documented. Refactor the irq replay code a bit to tidy and deduplicate some common functions. Add comments, debug checks. This has a minor functional change that irq tracing enable/disable is done after each interrupt replayed, rather than after a batch. It also re-sets state to IRQS_ALL_DISABLED after an interrupt, which doesn't matter much because interrupts are hard disabled at this point, but it is more consistent with how interrupt handlers are called. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926054305.2671436-8-npiggin@gmail.com
| * | | powerpc/64s/interrupt: masked handler debug check for previous hard disableNicholas Piggin2022-09-281-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior changes eliminated cases of masked PACA_IRQ_MUST_HARD_MASK interrupts that re-fire due to MSR[EE] being enabled while they are pending. Add a debug check in the masked interrupt handler to catch if this occurs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926054305.2671436-6-npiggin@gmail.com
| * | | powerpc/64/interrupt: Fix return to masked context after hard-mask irq ↵Nicholas Piggin2022-09-282-13/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | becomes pending If a synchronous interrupt (e.g., hash fault) is taken inside an irqs-disabled region which has MSR[EE]=1, then an asynchronous interrupt that is PACA_IRQ_MUST_HARD_MASK (e.g., PMI) is taken inside the synchronous interrupt handler, then the synchronous interrupt will return with MSR[EE]=1 and the asynchronous interrupt fires again. If the asynchronous interrupt is a PMI and the original context does not have PMIs disabled (only Linux IRQs), the asynchronous interrupt will fire despite having the PMI marked soft pending. This can confuse the perf code and cause warnings. This patch changes the interrupt return so that irqs-disabled MSR[EE]=1 contexts will be returned to with MSR[EE]=0 if a PACA_IRQ_MUST_HARD_MASK interrupt has become pending in the meantime. The longer explanation for what happens: 1. local_irq_disable() 2. Hash fault interrupt fires, do_hash_fault handler runs 3. interrupt_enter_prepare() sets IRQS_ALL_DISABLED 4. interrupt_enter_prepare() sets MSR[EE]=1 5. PMU interrupt fires, masked handler runs 6. Masked handler marks PMI pending 7. Masked handler returns with PACA_IRQ_HARD_DIS set, MSR[EE]=0 8. do_hash_fault interrupt return handler runs 9. interrupt_exit_kernel_prepare() clears PACA_IRQ_HARD_DIS 10. interrupt returns with MSR[EE]=1 11. PMU interrupt fires, perf handler runs Fixes: 4423eb5ae32e ("powerpc/64/interrupt: make normal synchronous interrupts enable MSR[EE] if possible") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926054305.2671436-4-npiggin@gmail.com
| * | | powerpc/64: mark irqs hard disabled in boot pacaNicholas Piggin2022-09-281-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This prevents interrupts in early boot (e.g., program check) from enabling MSR[EE], potentially causing endian mismatch or other crashes when reporting early boot traps. Fixes: 4423eb5ae32ec ("powerpc/64/interrupt: make normal synchronous interrupts enable MSR[EE] if possible") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926054305.2671436-3-npiggin@gmail.com
| * | | powerpc: add ISA v3.0 / v3.1 wait opcode macroNicholas Piggin2022-09-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The wait instruction encoding changed between ISA v2.07 and ISA v3.0. In v3.1 the instruction gained a new field. Update the PPC_WAIT macro to the current encoding. Rename the older incompatible one with a _v203 suffix as it was introduced in v2.03 (the WC field was introduced in v2.07 but the kernel only uses WC=0). Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220920122259.363092-1-npiggin@gmail.com
| * | | powerpc/time: avoid programming DEC at the start of the timer interruptNicholas Piggin2022-09-281-14/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Setting DEC to maximum at the start of the timer interrupt is not necessary and can be avoided for performance when MSR[EE] is not enabled during the handler as explained in commit 0faf20a1ad16 ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use"), where this change was first attempted. The idea is that the timer interrupt runs with MSR[EE]=0, and at the end of the interrupt DEC is programmed to the next timer interval, so there is no need to clear the decrementer exception before then. When the above commit was merged, that was not quite true. The low res timer subsystem had some cases in the oneshot timer code where if the tick was to be stopped and no timers active, the clock device would not get the ->set_state_oneshot_stopped() call, so DEC would not get reprogrammed, and this would hang taking continual timer interrupts. So this was reverted in commit d2b9be1f4af5 ("powerpc/time: Always set decrementer in timer_interrupt()"), which was a partial revert of the above commit. Commit 62c1256d5447 ("timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped") was later merged to fix this missing case in the timer subsystem, so now the behaviour can be restored. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220909142457.278032-1-npiggin@gmail.com
| * | | powerpc: Add support for early debugging via Serial 16550 consolePali Rohár2022-09-282-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently powerpc early debugging contains lot of platform specific options, but does not support standard UART / serial 16550 console. Later legacy_serial.c code supports registering UART as early debug console from device tree but it is not early during booting, but rather later after machine description code finishes. So for real early debugging via UART is current code unsuitable. Add support for new early debugging option CONFIG_PPC_EARLY_DEBUG_16550 which enable Serial 16550 console on address defined by new option CONFIG_PPC_EARLY_DEBUG_16550_PHYSADDR and by stride by option CONFIG_PPC_EARLY_DEBUG_16550_STRIDE. With this change it is possible to debug powerpc machine descriptor code. For example this early debugging code can print on serial console also "No suitable machine description found" error which is done before legacy_serial.c code. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220822231501.16827-1-pali@kernel.org