Commit message (Author, Age, Files, Lines)
* io_uring: fix multishot recv request leaks (Pavel Begunkov, 2022-11-17, 1 file, -9/+7)
    Having REQ_F_POLLED set doesn't guarantee that the request is executed as a
    multishot from the polling path. Fortunately for us, if the code thinks it's a
    multishot issue when it's not, it can only ask to skip completion, thus leaking
    the request. Use issue_flags to mark multipoll issues.
    Cc: stable@vger.kernel.org
    Fixes: 1300ebb20286b ("io_uring: multishot recv")
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/37762040ba9c52b81b92a2f5ebfd4ee484088951.1668710222.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
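    A minimal sketch of the approach described above, assuming the issue path
    carries an IO_URING_F_MULTISHOT bit in issue_flags (name inferred from the
    message, not copied from the patch):

        /* Hedged sketch: decide "is this a multishot issue?" from issue_flags
         * instead of REQ_F_POLLED; REQ_F_POLLED only says the request went
         * through poll arming, not that this invocation is a multishot one. */
        static inline bool io_issue_is_multishot(unsigned int issue_flags)
        {
                return issue_flags & IO_URING_F_MULTISHOT;
        }

    A recv/accept handler would then only return the "skip posting a completion"
    style result when this check is true, so a single-shot retry coming from the
    poll path still posts its CQE and the request is recycled rather than leaked.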
* io_uring: fix multishot accept request leaks (Pavel Begunkov, 2022-11-17, 4 files, -8/+8)
    Having REQ_F_POLLED set doesn't guarantee that the request is executed as a
    multishot from the polling path. Fortunately for us, if the code thinks it's a
    multishot issue when it's not, it can only ask to skip completion, thus leaking
    the request. Use issue_flags to mark multipoll issues.
    Cc: stable@vger.kernel.org
    Fixes: 390ed29b5e425 ("io_uring: add IORING_ACCEPT_MULTISHOT for accept")
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/7700ac57653f2823e30b34dc74da68678c0c5f13.1668710222.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: fix tw losing poll events (Pavel Begunkov, 2022-11-17, 1 file, -0/+7)
    We may never try to process a poll wake-up and its mask if there were multiple
    wake-ups racing to queue up a tw. Force io_poll_check_events() to update the
    mask by calling vfs_poll().
    Cc: stable@vger.kernel.org
    Fixes: aa43477b04025 ("io_uring: poll rework")
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/00344d60f8b18907171178d7cf598de71d127b0b.1668710222.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: update res mask in io_poll_check_events (Pavel Begunkov, 2022-11-17, 1 file, -0/+3)
    When io_poll_check_events() collides with someone attempting to queue a task
    work, it'll spin for one more time. However, it'll continue to use the mask
    from the first iteration instead of updating it. For example, if the first
    wake-up was an EPOLLIN and the second an EPOLLOUT, userspace will not get
    EPOLLOUT in time. Clear the mask for all subsequent iterations to force
    vfs_poll().
    Cc: stable@vger.kernel.org
    Fixes: aa43477b04025 ("io_uring: poll rework")
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/2dac97e8f691231049cb259c4ae57e79e40b537c.1668710222.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
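    Roughly what "clear the mask for all subsequent iterations" means inside the
    io_poll_check_events() retry loop; a hedged sketch only, and the loop
    termination helper below is an invented stand-in for illustration:

        do {
                /* First pass may use the mask delivered by the wake-up; a pass
                 * forced by a racing task_work queue attempt starts from 0 so
                 * vfs_poll() is consulted again and a later event (e.g. EPOLLOUT
                 * after an initial EPOLLIN) is not lost. */
                if (!req->cqe.res) {
                        struct poll_table_struct pt = { ._key = req->apoll_events };

                        req->cqe.res = vfs_poll(req->file, &pt) & req->apoll_events;
                }

                /* ... deliver/queue the event as before ... */

                req->cqe.res = 0;   /* the added line: force a fresh poll next spin */
        } while (io_poll_refs_still_held(req));   /* assumed stand-in condition */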
* io_uring/poll: lockdep annotate io_poll_req_insert_locked (Pavel Begunkov, 2022-11-11, 1 file, -0/+2)
    Add a lockdep annotation in io_poll_req_insert_locked().
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/8115d8e702733754d0aea119e9b5bb63d1eb8b24.1668184658.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/poll: fix double poll req->flags races (Pavel Begunkov, 2022-11-11, 1 file, -12/+17)
    io_poll_double_prepare()            |  io_poll_wake()
                                        |  poll->head = NULL
    smp_load(&poll->head); /* NULL */   |
    flags = req->flags;                 |
                                        |  req->flags &= ~SINGLE_POLL;
    req->flags = flags | DOUBLE_POLL    |

    The idea behind io_poll_double_prepare() is to serialise with the first poll
    entry by taking the wq lock. However, it's not safe to assume that
    io_poll_wake() is not running when we can't grab the lock and so we may race
    modifying req->flags. Skip double poll setup if that happens. It's ok because
    the first poll entry will only be removed when it's definitely completing,
    e.g. pollfree or oneshot with a valid mask.
    Fixes: 49f1c68e048f1 ("io_uring: optimise submission side poll_refs")
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/b7fab2d502f6121a7d7b199fe4d914a43ca9cdfd.1668184658.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: check for rollover of buffer ID when providing buffers (Jens Axboe, 2022-11-10, 1 file, -0/+2)
    We already check that the chosen starting offset for the buffer IDs fits within
    an unsigned short, as 65535 is the maximum value for a provided buffer. But if
    the caller asks to add N buffers at offset M, and M + N would exceed the size
    of the unsigned short, we simply add buffers with the ID wrapping around. This
    is not necessarily a bug and could in fact be a valid use case, but it seems
    confusing and inconsistent with the initial check for the starting offset.
    Let's check for wrap consistently, and fail the addition if we would need to
    wrap.
    Reported-by: Olivier Langlois <olivier@trillion01.com>
    Link: https://github.com/axboe/liburing/issues/726
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
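    A hedged sketch of the added consistency check, with field names assumed from
    the provide-buffers uapi (bid = starting buffer ID, nbufs = number of buffers
    the caller asks to add):

        /* Buffer IDs are unsigned shorts, so the last ID we would assign,
         * bid + nbufs - 1, must still fit in 0..USHRT_MAX.  Reject the
         * request instead of silently wrapping the IDs around. */
        if ((unsigned long)p->bid + p->nbufs > USHRT_MAX + 1)
                return -EINVAL;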
* io_uring: calculate CQEs from the user visible value (Dylan Yudaken, 2022-11-08, 1 file, -2/+8)
    io_cqring_wait() (and its wake function io_has_work()) used cached_cq_tail to
    calculate the number of CQEs. cached_cq_tail is set strictly before the user
    visible rings->cq.tail. However, as far as userspace is concerned, if
    io_uring_enter(2) is called with a minimum number of events, it will verify by
    checking rings->cq.tail. It is therefore possible for io_uring_enter(2) to
    return early with fewer events visible to the user.

    Instead make the wait functions read from the user visible value, so there
    will be no discrepancy. This is triggered eventually by the following
    reproducer:

        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        unsigned int cqe_ready;
        struct io_uring ring;
        int ret, i;

        ret = io_uring_queue_init(N, &ring, 0);
        assert(!ret);
        while (true) {
                for (i = 0; i < N; i++) {
                        sqe = io_uring_get_sqe(&ring);
                        io_uring_prep_nop(sqe);
                        sqe->flags |= IOSQE_ASYNC;
                }
                ret = io_uring_submit(&ring);
                assert(ret == N);
                do {
                        ret = io_uring_wait_cqes(&ring, &cqe, N, NULL, NULL);
                } while (ret == -EINTR);
                cqe_ready = io_uring_cq_ready(&ring);
                assert(!ret);
                assert(cqe_ready == N);
                io_uring_cq_advance(&ring, N);
        }

    Fixes: ad3eb2c89fb2 ("io_uring: split overflow state into SQ and CQ side")
    Signed-off-by: Dylan Yudaken <dylany@meta.com>
    Link: https://lore.kernel.org/r/20221108153016.1854297-1-dylany@meta.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
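    The core of the change, as a hedged sketch: count ready CQEs from the tail
    value userspace actually observes instead of the kernel-side cached copy
    (struct layout per the io_uring shared rings):

        /* Number of CQEs visible to userspace right now: derived from the
         * user-visible rings->cq.tail, not from ctx->cached_cq_tail, which
         * is advanced before the user-visible store and can run ahead of it. */
        static unsigned int io_cqring_events_user(struct io_ring_ctx *ctx)
        {
                return READ_ONCE(ctx->rings->cq.tail) - READ_ONCE(ctx->rings->cq.head);
        }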
* io_uring: fix typo in io_uring.h comment (Jens Axboe, 2022-11-06, 1 file, -1/+1)
| | | | | | | | Just a basic s/thig/this swap, fixing up a typo introduced by a commit added in the 6.1 release. Fixes: 9cda70f622cd ("io_uring: introduce fixed buffer support for io_uring_cmd") Signed-off-by: Jens Axboe <axboe@kernel.dk>
* selftests/net: don't test batched TCP io_uring zc (Pavel Begunkov, 2022-11-02, 1 file, -1/+1)
    It doesn't make sense to batch-submit io_uring requests to a single TCP socket
    without linking or some other kind of ordering. Moreover, it causes spurious
    -EINTR failures due to interaction with task_work. Disable it for now and keep
    queue depth=1.
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/b547698d5938b1b1a898af1c260188d8546ded9a.1666700897.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: unlock if __io_run_local_work locked inside (Dylan Yudaken, 2022-10-27, 2 files, -6/+15)
    It is possible for tw to lock the ring, and this was not propagated out to
    io_run_local_work. This can cause an unlock to be missed. Instead pass a
    pointer to locked into __io_run_local_work.
    Fixes: 8ac5d85a89b4 ("io_uring: add local task_work run helper that is entered locked")
    Signed-off-by: Dylan Yudaken <dylany@meta.com>
    Link: https://lore.kernel.org/r/20221027144429.3971400-3-dylany@meta.com
    [axboe: WARN_ON() -> WARN_ON_ONCE() and add a minor comment]
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
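    A hedged sketch of the shape of the fix: the inner helper reports through a
    pointer whether task_work ended up taking the ring lock, and the outer helper
    drops it on the way out (function names from the message, bodies abbreviated):

        static int io_run_local_work(struct io_ring_ctx *ctx)
        {
                bool locked = false;
                int ret;

                /* tw handlers may grab ctx->uring_lock; __io_run_local_work()
                 * now records that in *locked instead of losing the fact */
                ret = __io_run_local_work(ctx, &locked);
                if (locked)
                        mutex_unlock(&ctx->uring_lock);
                return ret;
        }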
* io_uring: use io_run_local_work_locked helper (Dylan Yudaken, 2022-10-27, 1 file, -2/+1)
    Prefer the io_run_local_work_locked() helper for consistency.
    Signed-off-by: Dylan Yudaken <dylany@meta.com>
    Link: https://lore.kernel.org/r/20221027144429.3971400-2-dylany@meta.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/net: fail zc sendmsg when unsupported by socket (Pavel Begunkov, 2022-10-22, 1 file, -0/+2)
    The previous patch fails zerocopy send requests for protocols that don't
    support it; do the same for zerocopy sendmsg.
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/0854e7bb4c3d810a48ec8b5853e2f61af36a0467.1666346426.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/net: fail zc send when unsupported by socket (Pavel Begunkov, 2022-10-22, 1 file, -0/+2)
| | | | | | | | | | | If a protocol doesn't support zerocopy it will silently fall back to copying. This type of behaviour has always been a source of troubles so it's better to fail such requests instead. Cc: <stable@vger.kernel.org> # 6.0 Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2db3c7f16bb6efab4b04569cd16e6242b40c5cb3.1666346426.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* net: flag sockets supporting msghdr originated zerocopy (Pavel Begunkov, 2022-10-22, 3 files, -0/+3)
    We need an efficient way in io_uring to check whether a socket supports
    zerocopy with msghdr-provided ubuf_info. Add a new flag to the struct socket
    flags field.
    Cc: <stable@vger.kernel.org> # 6.0
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Link: https://lore.kernel.org/r/3dafafab822b1c66308bb58a0ac738b1e3f53f74.1666346426.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
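    Taken together with the two io_uring patches above, the mechanism sketches out
    roughly as below; SOCK_SUPPORT_ZC is the assumed name of the new flag, and the
    exact call sites differ per protocol:

        /* Protocol init path (e.g. TCP/UDP socket creation): advertise that
         * msghdr-provided ubuf_info zerocopy actually works here. */
        set_bit(SOCK_SUPPORT_ZC, &sock->flags);

        /* io_uring zerocopy send/sendmsg prep: refuse up front instead of
         * silently falling back to copying when the socket never set it. */
        if (!test_bit(SOCK_SUPPORT_ZC, &sock->flags))
                return -EOPNOTSUPP;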
* io-wq: Fix memory leak in worker creation (Rafael Mendonca, 2022-10-20, 1 file, -1/+1)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the CPU mask allocation for a node fails, then the memory allocated for the 'io_wqe' struct of the current node doesn't get freed on the error handling path, since it has not yet been added to the 'wqes' array. This was spotted when fuzzing v6.1-rc1 with Syzkaller: BUG: memory leak unreferenced object 0xffff8880093d5000 (size 1024): comm "syz-executor.2", pid 7701, jiffies 4295048595 (age 13.900s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000cb463369>] __kmem_cache_alloc_node+0x18e/0x720 [<00000000147a3f9c>] kmalloc_node_trace+0x2a/0x130 [<000000004e107011>] io_wq_create+0x7b9/0xdc0 [<00000000c38b2018>] io_uring_alloc_task_context+0x31e/0x59d [<00000000867399da>] __io_uring_add_tctx_node.cold+0x19/0x1ba [<000000007e0e7a79>] io_uring_setup.cold+0x1b80/0x1dce [<00000000b545e9f6>] __x64_sys_io_uring_setup+0x5d/0x80 [<000000008a8a7508>] do_syscall_64+0x5d/0x90 [<000000004ac08bec>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 0e03496d1967 ("io-wq: use private CPU mask") Cc: stable@vger.kernel.org Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Link: https://lore.kernel.org/r/20221020014710.902201-1-rafaelmendsr@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
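    From the description (one file, one added line), the fix is essentially freeing
    the not-yet-published per-node structure on the error path; a hedged
    reconstruction of that spot:

        /* io_wq_create(), per-node setup: wqe is not yet in wq->wqes[], so
         * the error path must free it explicitly or it leaks. */
        wqe = kzalloc_node(sizeof(struct io_wqe), GFP_KERNEL, alloc_node);
        if (!wqe)
                goto err;
        if (!alloc_cpumask_var(&wqe->cpu_mask, GFP_KERNEL)) {
                kfree(wqe);             /* the added line */
                goto err;
        }
        wq->wqes[node] = wqe;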
* io_uring/msg_ring: Fix NULL pointer dereference in io_msg_send_fd() (Harshit Mogalapalli, 2022-10-19, 1 file, -0/+3)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Syzkaller produced the below call trace: BUG: KASAN: null-ptr-deref in io_msg_ring+0x3cb/0x9f0 Write of size 8 at addr 0000000000000070 by task repro/16399 CPU: 0 PID: 16399 Comm: repro Not tainted 6.1.0-rc1 #28 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 Call Trace: <TASK> dump_stack_lvl+0xcd/0x134 ? io_msg_ring+0x3cb/0x9f0 kasan_report+0xbc/0xf0 ? io_msg_ring+0x3cb/0x9f0 kasan_check_range+0x140/0x190 io_msg_ring+0x3cb/0x9f0 ? io_msg_ring_prep+0x300/0x300 io_issue_sqe+0x698/0xca0 io_submit_sqes+0x92f/0x1c30 __do_sys_io_uring_enter+0xae4/0x24b0 .... RIP: 0033:0x7f2eaf8f8289 RSP: 002b:00007fff40939718 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2eaf8f8289 RDX: 0000000000000000 RSI: 0000000000006f71 RDI: 0000000000000004 RBP: 00007fff409397a0 R08: 0000000000000000 R09: 0000000000000039 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004006d0 R13: 00007fff40939880 R14: 0000000000000000 R15: 0000000000000000 </TASK> Kernel panic - not syncing: panic_on_warn set ... We don't have a NULL check on file_ptr in io_msg_send_fd() function, so when file_ptr is NUL src_file is also NULL and get_file() dereferences a NULL pointer and leads to above crash. Add a NULL check to fix this issue. Fixes: e6130eba8a84 ("io_uring: add support for passing fixed file descriptors") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Link: https://lore.kernel.org/r/20221019171218.1337614-1-harshit.m.mogalapalli@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
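    A hedged sketch of the added guard in io_msg_send_fd(): an unused fixed-file
    slot holds a zero file_ptr, so bail out before get_file() dereferences NULL
    (surrounding names assumed from the trace above, not quoted from the patch):

        file_ptr = io_fixed_file_slot(&ctx->file_table, slot)->file_ptr;
        if (!file_ptr) {
                ret = -EBADF;           /* empty slot: nothing to send */
                goto out_unlock;
        }
        src_file = (struct file *)(file_ptr & FFS_MASK);
        get_file(src_file);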
* io_uring/rw: remove leftover debug statement (Jens Axboe, 2022-10-16, 1 file, -2/+0)
| | | | | | | | | This debug statement was never meant to go into the upstream release, kill it off before it ends up in a release. It was just part of the testing for the initial version of the patch. Fixes: 2ec33a6c3cca ("io_uring/rw: ensure kiocb_end_write() is always called") Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: don't iopoll from io_ring_ctx_wait_and_kill() (Pavel Begunkov, 2022-10-16, 1 file, -8/+5)
| | | | | | | | | | | | We should not be completing requests from a task context that has already undergone io_uring cancellations, i.e. __io_uring_cancel(), as there are some assumptions, e.g. around cached task refs draining. Remove iopolling from io_ring_ctx_wait_and_kill() as it can be called later after PF_EXITING is set with the last task_work run. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7c03cc91455c4a1af49c6b9cbda4e57ea467aa11.1665891182.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: reuse io_alloc_req() (Pavel Begunkov, 2022-10-16, 1 file, -6/+2)
| | | | | | | | | Don't duplicate io_alloc_req() in io_req_caches_free() but reuse the helper. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6005fc88274864a49fc3096c22d8bdd605cf8576.1665891182.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: kill hot path fixed file bitmap debug checks (Pavel Begunkov, 2022-10-16, 2 files, -1/+1)
    We test file_table.bitmap in io_file_get_fixed() to check invariants. Don't do
    it; it's expensive, was showing up in profiles, and no reports of it triggering
    have come in. Move the check to the file clear instead, which will still catch
    any wrong usage.
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/cf77f2ded68d2e5b2bc7355784d969837d48e023.1665891182.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: remove FFS_SCM (Pavel Begunkov, 2022-10-16, 4 files, -25/+3)
    The lifetime of SCM'ed files is bound to ring_sock, which is destroyed strictly
    after we're done with registered file tables. This means there is no need for
    the FFS_SCM hack, which was not available on 32-bit builds anyway.
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/984226a1045adf42dc35d8bd7fb5a8bbfa472ce1.1665891182.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Linux 6.1-rc1 (Linus Torvalds, 2022-10-16, 1 file, -2/+2)
* Merge tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random (Linus Torvalds, 2022-10-16, 185 files, -421/+378)
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull more random number generator updates from Jason Donenfeld: "This time with some large scale treewide cleanups. The intent of this pull is to clean up the way callers fetch random integers. The current rules for doing this right are: - If you want a secure or an insecure random u64, use get_random_u64() - If you want a secure or an insecure random u32, use get_random_u32() The old function prandom_u32() has been deprecated for a while now and is just a wrapper around get_random_u32(). Same for get_random_int(). - If you want a secure or an insecure random u16, use get_random_u16() - If you want a secure or an insecure random u8, use get_random_u8() - If you want secure or insecure random bytes, use get_random_bytes(). The old function prandom_bytes() has been deprecated for a while now and has long been a wrapper around get_random_bytes() - If you want a non-uniform random u32, u16, or u8 bounded by a certain open interval maximum, use prandom_u32_max() I say "non-uniform", because it doesn't do any rejection sampling or divisions. Hence, it stays within the prandom_*() namespace, not the get_random_*() namespace. I'm currently investigating a "uniform" function for 6.2. We'll see what comes of that. By applying these rules uniformly, we get several benefits: - By using prandom_u32_max() with an upper-bound that the compiler can prove at compile-time is ≤65536 or ≤256, internally get_random_u16() or get_random_u8() is used, which wastes fewer batched random bytes, and hence has higher throughput. - By using prandom_u32_max() instead of %, when the upper-bound is not a constant, division is still avoided, because prandom_u32_max() uses a faster multiplication-based trick instead. - By using get_random_u16() or get_random_u8() in cases where the return value is intended to indeed be a u16 or a u8, we waste fewer batched random bytes, and hence have higher throughput. This series was originally done by hand while I was on an airplane without Internet. Later, Kees and I worked on retroactively figuring out what could be done with Coccinelle and what had to be done manually, and then we split things up based on that. So while this touches a lot of files, the actual amount of code that's hand fiddled is comfortably small" * tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: prandom: remove unused functions treewide: use get_random_bytes() when possible treewide: use get_random_u32() when possible treewide: use get_random_{u8,u16}() when possible, part 2 treewide: use get_random_{u8,u16}() when possible, part 1 treewide: use prandom_u32_max() when possible, part 2 treewide: use prandom_u32_max() when possible, part 1
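    A short illustration of the conversion rules listed above, written as a
    self-contained kernel-style helper (the surrounding function and variable
    names are invented for the example):

        #include <linux/random.h>
        #include <linux/prandom.h>

        static void example_random_usage(void *buf, size_t len, u32 nr_slots)
        {
                u32 seq    = get_random_u32();          /* was prandom_u32() / get_random_int() */
                u16 id     = get_random_u16();          /* was (u16)prandom_u32() */
                u8  jitter = get_random_u8();           /* was prandom_u32() & 0xff */
                u32 slot   = prandom_u32_max(nr_slots); /* was prandom_u32() % nr_slots */

                get_random_bytes(buf, len);             /* was prandom_bytes(buf, len) */
                (void)seq; (void)id; (void)jitter; (void)slot;
        }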
| * prandom: remove unused functions (Jason A. Donenfeld, 2022-10-11, 3 files, -23/+5)
| | | | | | | | | | | | | | | | | | | | | | | | With no callers left of prandom_u32() and prandom_bytes(), as well as get_random_int(), remove these deprecated wrappers, in favor of get_random_u32() and get_random_bytes(). Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
| * treewide: use get_random_bytes() when possible (Jason A. Donenfeld, 2022-10-11, 19 files, -24/+24)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The prandom_bytes() function has been a deprecated inline wrapper around get_random_bytes() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> # powerpc Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
| * treewide: use get_random_u32() when possible (Jason A. Donenfeld, 2022-10-11, 71 files, -100/+100)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The prandom_u32() function has been a deprecated inline wrapper around get_random_u32() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. The same also applies to get_random_int(), which is just a wrapper around get_random_u32(). This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Acked-by: Helge Deller <deller@gmx.de> # for parisc Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
| * treewide: use get_random_{u8,u16}() when possible, part 2 (Jason A. Donenfeld, 2022-10-11, 6 files, -6/+6)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value, simply use the get_random_{u8,u16}() functions, which are faster than wasting the additional bytes from a 32-bit value. This was done by hand, identifying all of the places where one of the random integer functions was used in a non-32-bit context. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
| * treewide: use get_random_{u8,u16}() when possible, part 1 (Jason A. Donenfeld, 2022-10-11, 21 files, -34/+34)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value, simply use the get_random_{u8,u16}() functions, which are faster than wasting the additional bytes from a 32-bit value. This was done mechanically with this coccinelle script: @@ expression E; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u16; typedef __be16; typedef __le16; typedef u8; @@ ( - (get_random_u32() & 0xffff) + get_random_u16() | - (get_random_u32() & 0xff) + get_random_u8() | - (get_random_u32() % 65536) + get_random_u16() | - (get_random_u32() % 256) + get_random_u8() | - (get_random_u32() >> 16) + get_random_u16() | - (get_random_u32() >> 24) + get_random_u8() | - (u16)get_random_u32() + get_random_u16() | - (u8)get_random_u32() + get_random_u8() | - (__be16)get_random_u32() + (__be16)get_random_u16() | - (__le16)get_random_u32() + (__le16)get_random_u16() | - prandom_u32_max(65536) + get_random_u16() | - prandom_u32_max(256) + get_random_u8() | - E->inet_id = get_random_u32() + E->inet_id = get_random_u16() ) @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u16; identifier v; @@ - u16 v = get_random_u32(); + u16 v = get_random_u16(); @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u8; identifier v; @@ - u8 v = get_random_u32(); + u8 v = get_random_u8(); @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u16; u16 v; @@ - v = get_random_u32(); + v = get_random_u16(); @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u8; u8 v; @@ - v = get_random_u32(); + v = get_random_u8(); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Examine limits @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value < 256: coccinelle.RESULT = cocci.make_ident("get_random_u8") elif value < 65536: coccinelle.RESULT = cocci.make_ident("get_random_u16") else: print("Skipping large mask of %s" % (literal)) cocci.include_match(False) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; identifier add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + (RESULT() & LITERAL) Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
| * treewide: use prandom_u32_max() when possible, part 2 (Jason A. Donenfeld, 2022-10-11, 4 files, -19/+8)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done by hand, covering things that coccinelle could not do on its own. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> # for ext2, ext4, and sbitmap Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
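    How prandom_u32_max() manages to avoid the division is worth spelling out; a
    hedged sketch equivalent to the upstream helper (modulo exact types):

        /* Map a full-width 32-bit random value into [0, ceil) without a
         * division: take the high 32 bits of the 64-bit product. */
        static inline u32 bounded_random(u32 ceil)
        {
                return (u32)(((u64)get_random_u32() * ceil) >> 32);
        }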
| * treewide: use prandom_u32_max() when possible, part 1 (Jason A. Donenfeld, 2022-10-11, 89 files, -218/+204)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* | Merge tag 'perf-tools-for-v6.1-2-2022-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux (Linus Torvalds, 2022-10-16, 36 files, -71/+1265)
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull more perf tools updates from Arnaldo Carvalho de Melo: - Use BPF CO-RE (Compile Once, Run Everywhere) to support old kernels when using bperf (perf BPF based counters) with cgroups. - Support HiSilicon PCIe Performance Monitoring Unit (PMU), that monitors bandwidth, latency, bus utilization and buffer occupancy. Documented in Documentation/admin-guide/perf/hisi-pcie-pmu.rst. - User space tasks can migrate between CPUs, so when tracing selected CPUs, system-wide sideband is still needed, fix it in the setup of Intel PT on hybrid systems. - Fix metricgroups title message in 'perf list', it should state that the metrics groups are to be used with the '-M' option, not '-e'. - Sync the msr-index.h copy with the kernel sources, adding support for using "AMD64_TSC_RATIO" in filter expressions in 'perf trace' as well as decoding it when printing the MSR tracepoint arguments. - Fix program header size and alignment when generating a JIT ELF in 'perf inject'. - Add multiple new Intel PT 'perf test' entries, including a jitdump one. - Fix the 'perf test' entries for 'perf stat' CSV and JSON output when running on PowerPC due to an invalid topology number in that arch. - Fix the 'perf test' for arm_coresight failures on the ARM Juno system. - Fix the 'perf test' attr entry for PERF_FORMAT_LOST, adding this option to the or expression expected in the intercepted perf_event_open() syscall. - Add missing condition flags ('hs', 'lo', 'vc', 'vs') for arm64 in the 'perf annotate' asm parser. - Fix 'perf mem record -C' option processing, it was being chopped up when preparing the underlying 'perf record -e mem-events' and thus being ignored, requiring using '-- -C CPUs' as a workaround. - Improvements and tidy ups for 'perf test' shell infra. - Fix Intel PT information printing segfault in uClibc, where a NULL format was being passed to fprintf. 
    * tag 'perf-tools-for-v6.1-2-2022-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (23 commits)
      tools arch x86: Sync the msr-index.h copy with the kernel sources
      perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packet
      perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driver
      perf auxtrace arm: Refactor event list iteration in auxtrace_record__init()
      perf tests stat+json_output: Include sanity check for topology
      perf tests stat+csv_output: Include sanity check for topology
      perf intel-pt: Fix system_wide dummy event for hybrid
      perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc
      perf test: Fix attr tests for PERF_FORMAT_LOST
      perf test: test_intel_pt.sh: Add 9 tests
      perf inject: Fix GEN_ELF_TEXT_OFFSET for jit
      perf test: test_intel_pt.sh: Add jitdump test
      perf test: test_intel_pt.sh: Tidy some alignment
      perf test: test_intel_pt.sh: Print a message when skipping kernel tracing
      perf test: test_intel_pt.sh: Tidy some perf record options
      perf test: test_intel_pt.sh: Fix return checking again
      perf: Skip and warn on unknown format 'configN' attrs
      perf list: Fix metricgroups title message
      perf mem: Fix -C option behavior for perf mem record
      perf annotate: Add missing condition flags for arm64
      ...
| * | tools arch x86: Sync the msr-index.h copy with the kernel sources (Arnaldo Carvalho de Melo, 2022-10-15, 1 file, -0/+18)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To pick up the changes in: b8d1d163604bd1e6 ("x86/apic: Don't disable x2APIC if locked") ca5b7c0d9621702e ("perf/x86/amd/lbr: Add LbrExtV2 branch record support") Addressing these tools/perf build warnings: diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' That makes the beautification scripts to pick some new entries: $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after --- before 2022-10-14 18:06:34.294561729 -0300 +++ after 2022-10-14 18:06:41.285744044 -0300 @@ -264,6 +264,7 @@ [0xc0000102 - x86_64_specific_MSRs_offset] = "KERNEL_GS_BASE", [0xc0000103 - x86_64_specific_MSRs_offset] = "TSC_AUX", [0xc0000104 - x86_64_specific_MSRs_offset] = "AMD64_TSC_RATIO", + [0xc000010e - x86_64_specific_MSRs_offset] = "AMD64_LBR_SELECT", [0xc000010f - x86_64_specific_MSRs_offset] = "AMD_DBG_EXTN_CFG", [0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS", [0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL", $ Now one can trace systemwide asking to see backtraces to where that MSR is being read/written, see this example with a previous update: # perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB" ^C# If we use -v (verbose mode) we can see what it does behind the scenes: # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB" Using CPUID AuthenticAMD-25-21-0 0x6a0 0x6a8 New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313) 0x6a0 0x6a8 New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313) mmap size 528384B ^C# Example with a frequent msr: # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2 Using CPUID AuthenticAMD-25-21-0 0x48 New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841) 0x48 New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841) mmap size 528384B Looking at the vmlinux_path (8 entries long) symsrc__init: build id mismatch for vmlinux. 
Using /proc/kcore for kernel data Using /proc/kallsyms for symbols 0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) __switch_to_xtra ([kernel.kallsyms]) __switch_to ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule ([kernel.kallsyms]) futex_wait_queue_me ([kernel.kallsyms]) futex_wait ([kernel.kallsyms]) do_futex ([kernel.kallsyms]) __x64_sys_futex ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) entry_SYSCALL_64_after_hwframe ([kernel.kallsyms]) __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so) 0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) __switch_to_xtra ([kernel.kallsyms]) __switch_to ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule_idle ([kernel.kallsyms]) do_idle ([kernel.kallsyms]) cpu_startup_entry ([kernel.kallsyms]) secondary_startup_64_no_verify ([kernel.kallsyms]) # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/lkml/Y0nQkz2TUJxwfXJd@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packet (Qi Liu, 2022-10-15, 7 files, -0/+396)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for using 'perf report --dump-raw-trace' to parse PTT packet. Example usage: Output will contain raw PTT data and its textual representation, such as (8DW format): 0 0 0x5810 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000 offset: 0 ref: 0xa5d50c725 idx: 0 tid: -1 cpu: 0 . . ... HISI PTT data: size 4194304 bytes . 00000000: 00 00 00 00 Prefix . 00000004: 08 20 00 60 Header DW0 . 00000008: ff 02 00 01 Header DW1 . 0000000c: 20 08 00 00 Header DW2 . 00000010: 10 e7 44 ab Header DW3 . 00000014: 2a a8 1e 01 Time . 00000020: 00 00 00 00 Prefix . 00000024: 01 00 00 60 Header DW0 . 00000028: 0f 1e 00 01 Header DW1 . 0000002c: 04 00 00 00 Header DW2 . 00000030: 40 00 81 02 Header DW3 . 00000034: ee 02 00 00 Time .... This patch only add basic parsing support according to the definition of the PTT packet described in Documentation/trace/hisi-ptt.rst. And the fields of each packet can be further decoded following the PCIe Spec's definition of TLP packet. Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: John Garry <john.garry@huawei.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Liu <liuqi6124@gmail.com> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Zeng Prime <prime.zeng@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/20220927081400.14364-4-yangyicong@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driver (Qi Liu, 2022-10-15, 7 files, -1/+273)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | HiSilicon PCIe tune and trace device (PTT) could dynamically tune the PCIe link's events, and trace the TLP headers). This patch add support for PTT device in perf tool, so users could use 'perf record' to get TLP headers trace data. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Acked-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Liu <liuqi6124@gmail.com> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Zeng Prime <prime.zeng@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/20220927081400.14364-3-yangyicong@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf auxtrace arm: Refactor event list iteration in auxtrace_record__init() (Qi Liu, 2022-10-15, 1 file, -19/+34)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add find_pmu_for_event() and use to simplify logic in auxtrace_record_init(). find_pmu_for_event() will be reused in subsequent patches. Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Liu <liuqi6124@gmail.com> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Zeng Prime <prime.zeng@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/20220927081400.14364-2-yangyicong@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf tests stat+json_output: Include sanity check for topology (Athira Rajeev, 2022-10-15, 1 file, -4/+39)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Testcase stat+json_output.sh fails in powerpc: 86: perf stat JSON output linter : FAILED! The testcase "stat+json_output.sh" verifies perf stat JSON output. The test covers aggregation modes like per-socket, per-core, per-die, -A (no_aggr mode) along with few other tests. It counts expected fields for various commands. For example say -A (i.e, AGGR_NONE mode), expects 7 fields in the output having "CPU" as first field. Same way, for per-socket, it expects the first field in result to point to socket id. The testcases compares the result with expected count. The values for socket, die, core and cpu are fetched from topology directory: /sys/devices/system/cpu/cpu*/topology. For example, socket value is fetched from "physical_package_id" file of topology directory. (cpu__get_topology_int() in util/cpumap.c) If a platform fails to fetch the topology information, values will be set to -1. For example, incase of pSeries platform of powerpc, value for "physical_package_id" is restricted and not exposed. So, -1 will be assigned. Perf code has a checks for valid cpu id in "aggr_printout" (stat-display.c), which displays the fields. So, in cases where topology values not exposed, first field of the output displaying will be empty. This cause the testcase to fail, as it counts number of fields in the output. Incase of -A (AGGR_NONE mode,), testcase expects 7 fields in the output, becos of -1 value obtained from topology files for some, only 6 fields are printed. Hence a testcase failure reported due to mismatch in number of fields in the output. Patch here adds a sanity check in the testcase for topology. Check will help to skip the test if -1 value found. Fixes: 0c343af2a2f82844 ("perf test: JSON format checking") Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com> Suggested-by: Ian Rogers <irogers@google.com> Suggested-by: James Clark <james.clark@arm.com> Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Claire Jensen <cjense@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Link: https://lore.kernel.org/r/20221006155149.67205-2-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf tests stat+csv_output: Include sanity check for topology (Athira Rajeev, 2022-10-15, 1 file, -4/+39)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Testcase stat+csv_output.sh fails in powerpc: 84: perf stat CSV output linter: FAILED! The testcase "stat+csv_output.sh" verifies perf stat CSV output. The test covers aggregation modes like per-socket, per-core, per-die, -A (no_aggr mode) along with few other tests. It counts expected fields for various commands. For example say -A (i.e, AGGR_NONE mode), expects 7 fields in the output having "CPU" as first field. Same way, for per-socket, it expects the first field in result to point to socket id. The testcases compares the result with expected count. The values for socket, die, core and cpu are fetched from topology directory: /sys/devices/system/cpu/cpu*/topology. For example, socket value is fetched from "physical_package_id" file of topology directory. (cpu__get_topology_int() in util/cpumap.c) If a platform fails to fetch the topology information, values will be set to -1. For example, incase of pSeries platform of powerpc, value for "physical_package_id" is restricted and not exposed. So, -1 will be assigned. Perf code has a checks for valid cpu id in "aggr_printout" (stat-display.c), which displays the fields. So, in cases where topology values not exposed, first field of the output displaying will be empty. This cause the testcase to fail, as it counts number of fields in the output. Incase of -A (AGGR_NONE mode,), testcase expects 7 fields in the output, becos of -1 value obtained from topology files for some, only 6 fields are printed. Hence a testcase failure reported due to mismatch in number of fields in the output. Patch here adds a sanity check in the testcase for topology. Check will help to skip the test if -1 value found. Fixes: 7473ee56dbc91c98 ("perf test: Add checking for perf stat CSV output.") Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com> Suggested-by: Ian Rogers <irogers@google.com> Suggested-by: James Clark <james.clark@arm.com> Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Claire Jensen <cjense@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Link: https://lore.kernel.org/r/20221006155149.67205-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf intel-pt: Fix system_wide dummy event for hybrid (Adrian Hunter, 2022-10-15, 1 file, -1/+1)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | User space tasks can migrate between CPUs, so when tracing selected CPUs, system-wide sideband is still needed, however evlist->core.has_user_cpus is not set in the hybrid case, so check the target cpu_list instead. Fixes: 7d189cadbeebc778 ("perf intel-pt: Track sideband system-wide when needed") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221012082259.22394-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
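    A hedged sketch of the changed condition (structure and field names are
    approximated, not quoted from the patch): treat an explicit --cpu list the
    same as has_user_cpus when deciding whether system-wide sideband is needed:

        /* Hybrid systems don't set evlist->core.has_user_cpus even when the
         * user restricted tracing to specific CPUs, so also look at the
         * requested cpu_list before dropping system-wide sideband. */
        bool tracing_selected_cpus = evlist->core.has_user_cpus ||
                                     !!opts->target.cpu_list;

        if (tracing_selected_cpus)
                need_system_wide_sideband = true;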
| * | perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc (Adrian Hunter, 2022-10-15, 1 file, -2/+7)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | uClibc segfaulted because NULL was passed as the format to fprintf(). That happened because one of the format strings was missing and intel_pt_print_info() didn't check that before calling fprintf(). Add the missing format string, and check format is not NULL before calling fprintf(). Fixes: 11fa7cb86b56d361 ("perf tools: Pass Intel PT information for decoding MTC and CYC") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221012082259.22394-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
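    A hedged sketch of the defensive part of the fix (array and function names
    assumed): skip any table entry whose format string is missing rather than
    passing NULL to fprintf(), which is what crashed uClibc:

        static void intel_pt_print_info(__u64 *arr, int start, int finish)
        {
                int i;

                for (i = start; i <= finish; i++) {
                        const char *fmt = intel_pt_info_fmts[i];

                        if (fmt)        /* added check: NULL format crashes uClibc */
                                fprintf(stdout, fmt, arr[i]);
                }
        }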
| * | perf test: Fix attr tests for PERF_FORMAT_LOST (James Clark, 2022-10-15, 6 files, -11/+11)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since PERF_FORMAT_LOST was added, the default read format has that bit set, so add it to the tests. Keep the old value as well so that the test still passes on older kernels. This fixes the following failure: expected read_format=0|4, got 20 FAILED './tests/attr/test-record-C0' - match failure Fixes: 85b425f31c8866e0 ("perf record: Set PERF_FORMAT_LOST by default") Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221012094633.21669-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf test: test_intel_pt.sh: Add 9 tests (Ammy Yi, 2022-10-15, 1 file, -1/+194)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add tests: Test with MTC and TSC disabled Test with branches disabled Test with/without CYC Test recording with sample mode Test with kernel trace Test virtual LBR Test power events Test with TNT packets disabled Test with event_trace These tests mostly check that perf record works with the corresponding Intel PT config terms, sometimes also checking that certain packets do or do not appear in the resulting trace as appropriate. The "Test virtual LBR" is slightly trickier, using a Python script to check that branch stacks are actually synthesized. Signed-off-by: Ammy Yi <ammy.yi@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20221014170905.64069-8-adrian.hunter@intel.com Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf inject: Fix GEN_ELF_TEXT_OFFSET for jit (Adrian Hunter, 2022-10-15, 1 file, -1/+3)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a program header was added, it moved the text section but GEN_ELF_TEXT_OFFSET was not updated. Fix by adding the program header size and aligning. Fixes: babd04386b1df8c3 ("perf jit: Include program header in ELF files") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Lieven Hey <lieven.hey@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20221014170905.64069-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
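    From the description, the macro now has to account for the program header and
    keep the alignment; a hedged reconstruction of what that looks like:

        /* Text starts after the ELF header plus the (newly emitted) program
         * header, rounded up so the code keeps its expected alignment. */
        #define GEN_ELF_TEXT_OFFSET round_up(sizeof(Elf_Ehdr) + sizeof(Elf_Phdr), 16)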
| * | perf test: test_intel_pt.sh: Add jitdump test (Adrian Hunter, 2022-10-15, 1 file, -0/+162)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a test for decoding self-modifying code using a jitdump file. The test creates a workload that uses self-modifying code and generates its own jitdump file. The result is processed with perf inject --jit and checked for decoding errors. Note the test will fail without patch "perf inject: Fix GEN_ELF_TEXT_OFFSET for jit" applied. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20221014170905.64069-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf test: test_intel_pt.sh: Tidy some alignment (Adrian Hunter, 2022-10-15, 1 file, -3/+3)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tidy alignment of test function lines to make them more readable. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20221014170905.64069-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf test: test_intel_pt.sh: Print a message when skipping kernel tracing (Adrian Hunter, 2022-10-15, 1 file, -1/+8)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Messages display with the perf test -v option. Add a message to show when skipping a test because the user cannot do kernel tracing. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20221014170905.64069-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf test: test_intel_pt.sh: Tidy some perf record options (Adrian Hunter, 2022-10-15, 1 file, -4/+12)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When not decoding, the options "-B -N --no-bpf-event" speed up perf record. Make a common function for them. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20221014170905.64069-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf test: test_intel_pt.sh: Fix return checking again (Adrian Hunter, 2022-10-15, 1 file, -4/+3)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | count_result() does not always reset ret=0 which means the value can spill into the next test result. Fix by explicitly setting it to zero between tests. Committer testing: # perf test "Miscellaneous Intel PT testing" 110: Miscellaneous Intel PT testing : Ok # Tested as well with: # perf test -v "Miscellaneous Intel PT testing" Fixes: fd9b45e39cfaf885 ("perf test: test_intel_pt.sh: Fix return checking") Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20221014170905.64069-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf: Skip and warn on unknown format 'configN' attrs (Rob Herring, 2022-10-14, 5 files, -13/+26)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the kernel exposes a new perf_event_attr field in a format attr, perf will return an error stating the specified PMU can't be found. For example, a format attr with 'config3:0-63' causes an error as config3 is unknown to perf. This causes a compatibility issue between a newer kernel with older perf tool. Before this change with a kernel adding 'config3' I get: $ perf record -e arm_spe// -- true event syntax error: 'arm_spe//' \___ Cannot find PMU `arm_spe'. Missing kernel support? Run 'perf list' for a list of valid events Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -e, --event <event> event selector. use 'perf list' to list available events After this change, I get: $ perf record -e arm_spe// -- true WARNING: 'arm_spe_0' format 'inv_event_filter' requires 'perf_event_attr::config3' which is not supported by this version of perf! [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.091 MB perf.data ] To support unknown configN formats, rework the YACC implementation to pass any config[0-9]+ format to perf_pmu__new_format() to handle with a warning. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Rob Herring <robh@kernel.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220914-arm-perf-tool-spe1-2-v2-v4-1-83c098e6212e@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf list: Fix metricgroups title message (Andi Kleen, 2022-10-14, 1 file, -1/+1)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | $ perf list metricgroups gives List of pre-defined events (to be used in -e): Metric Groups: Backend Bad BadSpec But that's incorrect of course because metric groups or metrics can only be specified with -M. So fix the message to say -e or -M Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20221004192634.998984-1-ak@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>