| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Having REQ_F_POLLED set doesn't guarantee that the request is
executed as a multishot from the polling path. Fortunately for us, if
the code thinks it's multishot issue when it's not, it can only ask to
skip completion so leaking the request. Use issue_flags to mark
multipoll issues.
Cc: stable@vger.kernel.org
Fixes: 1300ebb20286b ("io_uring: multishot recv")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/37762040ba9c52b81b92a2f5ebfd4ee484088951.1668710222.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Having REQ_F_POLLED set doesn't guarantee that the request is
executed as a multishot from the polling path. Fortunately for us, if
the code thinks it's multishot issue when it's not, it can only ask to
skip completion so leaking the request. Use issue_flags to mark
multipoll issues.
Cc: stable@vger.kernel.org
Fixes: 390ed29b5e425 ("io_uring: add IORING_ACCEPT_MULTISHOT for accept")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7700ac57653f2823e30b34dc74da68678c0c5f13.1668710222.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We may never try to process a poll wake and its mask if there was
multiple wake ups racing for queueing up a tw. Force
io_poll_check_events() to update the mask by vfs_poll().
Cc: stable@vger.kernel.org
Fixes: aa43477b04025 ("io_uring: poll rework")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/00344d60f8b18907171178d7cf598de71d127b0b.1668710222.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When io_poll_check_events() collides with someone attempting to queue a
task work, it'll spin for one more time. However, it'll continue to use
the mask from the first iteration instead of updating it. For example,
if the first wake up was a EPOLLIN and the second EPOLLOUT, the
userspace will not get EPOLLOUT in time.
Clear the mask for all subsequent iterations to force vfs_poll().
Cc: stable@vger.kernel.org
Fixes: aa43477b04025 ("io_uring: poll rework")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2dac97e8f691231049cb259c4ae57e79e40b537c.1668710222.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
| |
Add a lockdep annotation in io_poll_req_insert_locked().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8115d8e702733754d0aea119e9b5bb63d1eb8b24.1668184658.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
io_poll_double_prepare() | io_poll_wake()
| poll->head = NULL
smp_load(&poll->head); /* NULL */ |
flags = req->flags; |
| req->flags &= ~SINGLE_POLL;
req->flags = flags | DOUBLE_POLL |
The idea behind io_poll_double_prepare() is to serialise with the
first poll entry by taking the wq lock. However, it's not safe to assume
that io_poll_wake() is not running when we can't grab the lock and so we
may race modifying req->flags.
Skip double poll setup if that happens. It's ok because the first poll
entry will only be removed when it's definitely completing, e.g.
pollfree or oneshot with a valid mask.
Fixes: 49f1c68e048f1 ("io_uring: optimise submission side poll_refs")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b7fab2d502f6121a7d7b199fe4d914a43ca9cdfd.1668184658.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already check if the chosen starting offset for the buffer IDs fit
within an unsigned short, as 65535 is the maximum value for a provided
buffer. But if the caller asks to add N buffers at offset M, and M + N
would exceed the size of the unsigned short, we simply add buffers with
wrapping around the ID.
This is not necessarily a bug and could in fact be a valid use case, but
it seems confusing and inconsistent with the initial check for starting
offset. Let's check for wrap consistently, and error the addition if we
do need to wrap.
Reported-by: Olivier Langlois <olivier@trillion01.com>
Link: https://github.com/axboe/liburing/issues/726
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
io_cqring_wait (and it's wake function io_has_work) used cached_cq_tail in
order to calculate the number of CQEs. cached_cq_tail is set strictly
before the user visible rings->cq.tail
However as far as userspace is concerned, if io_uring_enter(2) is called
with a minimum number of events, they will verify by checking
rings->cq.tail.
It is therefore possible for io_uring_enter(2) to return early with fewer
events visible to the user.
Instead make the wait functions read from the user visible value, so there
will be no discrepency.
This is triggered eventually by the following reproducer:
struct io_uring_sqe *sqe;
struct io_uring_cqe *cqe;
unsigned int cqe_ready;
struct io_uring ring;
int ret, i;
ret = io_uring_queue_init(N, &ring, 0);
assert(!ret);
while(true) {
for (i = 0; i < N; i++) {
sqe = io_uring_get_sqe(&ring);
io_uring_prep_nop(sqe);
sqe->flags |= IOSQE_ASYNC;
}
ret = io_uring_submit(&ring);
assert(ret == N);
do {
ret = io_uring_wait_cqes(&ring, &cqe, N, NULL, NULL);
} while(ret == -EINTR);
cqe_ready = io_uring_cq_ready(&ring);
assert(!ret);
assert(cqe_ready == N);
io_uring_cq_advance(&ring, N);
}
Fixes: ad3eb2c89fb2 ("io_uring: split overflow state into SQ and CQ side")
Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20221108153016.1854297-1-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
| |
Just a basic s/thig/this swap, fixing up a typo introduced by a commit
added in the 6.1 release.
Fixes: 9cda70f622cd ("io_uring: introduce fixed buffer support for io_uring_cmd")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
| |
It doesn't make sense batch submitting io_uring requests to a single TCP
socket without linking or some other kind of ordering. Moreover, it
causes spurious -EINTR fails due to interaction with task_work. Disable
it for now and keep queue depth=1.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b547698d5938b1b1a898af1c260188d8546ded9a.1666700897.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is possible for tw to lock the ring, and this was not propogated out to
io_run_local_work. This can cause an unlock to be missed.
Instead pass a pointer to locked into __io_run_local_work.
Fixes: 8ac5d85a89b4 ("io_uring: add local task_work run helper that is entered locked")
Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20221027144429.3971400-3-dylany@meta.com
[axboe: WARN_ON() -> WARN_ON_ONCE() and add a minor comment]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
| |
prefer to use io_run_local_work_locked helper for consistency
Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20221027144429.3971400-2-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
| |
The previous patch fails zerocopy send requests for protocols that don't
support it, do the same for zerocopy sendmsg.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0854e7bb4c3d810a48ec8b5853e2f61af36a0467.1666346426.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
| |
If a protocol doesn't support zerocopy it will silently fall back to
copying. This type of behaviour has always been a source of troubles
so it's better to fail such requests instead.
Cc: <stable@vger.kernel.org> # 6.0
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2db3c7f16bb6efab4b04569cd16e6242b40c5cb3.1666346426.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need an efficient way in io_uring to check whether a socket supports
zerocopy with msghdr provided ubuf_info. Add a new flag into the struct
socket flags fields.
Cc: <stable@vger.kernel.org> # 6.0
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/3dafafab822b1c66308bb58a0ac738b1e3f53f74.1666346426.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the CPU mask allocation for a node fails, then the memory allocated for
the 'io_wqe' struct of the current node doesn't get freed on the error
handling path, since it has not yet been added to the 'wqes' array.
This was spotted when fuzzing v6.1-rc1 with Syzkaller:
BUG: memory leak
unreferenced object 0xffff8880093d5000 (size 1024):
comm "syz-executor.2", pid 7701, jiffies 4295048595 (age 13.900s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000cb463369>] __kmem_cache_alloc_node+0x18e/0x720
[<00000000147a3f9c>] kmalloc_node_trace+0x2a/0x130
[<000000004e107011>] io_wq_create+0x7b9/0xdc0
[<00000000c38b2018>] io_uring_alloc_task_context+0x31e/0x59d
[<00000000867399da>] __io_uring_add_tctx_node.cold+0x19/0x1ba
[<000000007e0e7a79>] io_uring_setup.cold+0x1b80/0x1dce
[<00000000b545e9f6>] __x64_sys_io_uring_setup+0x5d/0x80
[<000000008a8a7508>] do_syscall_64+0x5d/0x90
[<000000004ac08bec>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: 0e03496d1967 ("io-wq: use private CPU mask")
Cc: stable@vger.kernel.org
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Link: https://lore.kernel.org/r/20221020014710.902201-1-rafaelmendsr@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Syzkaller produced the below call trace:
BUG: KASAN: null-ptr-deref in io_msg_ring+0x3cb/0x9f0
Write of size 8 at addr 0000000000000070 by task repro/16399
CPU: 0 PID: 16399 Comm: repro Not tainted 6.1.0-rc1 #28
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7
Call Trace:
<TASK>
dump_stack_lvl+0xcd/0x134
? io_msg_ring+0x3cb/0x9f0
kasan_report+0xbc/0xf0
? io_msg_ring+0x3cb/0x9f0
kasan_check_range+0x140/0x190
io_msg_ring+0x3cb/0x9f0
? io_msg_ring_prep+0x300/0x300
io_issue_sqe+0x698/0xca0
io_submit_sqes+0x92f/0x1c30
__do_sys_io_uring_enter+0xae4/0x24b0
....
RIP: 0033:0x7f2eaf8f8289
RSP: 002b:00007fff40939718 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2eaf8f8289
RDX: 0000000000000000 RSI: 0000000000006f71 RDI: 0000000000000004
RBP: 00007fff409397a0 R08: 0000000000000000 R09: 0000000000000039
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004006d0
R13: 00007fff40939880 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Kernel panic - not syncing: panic_on_warn set ...
We don't have a NULL check on file_ptr in io_msg_send_fd() function,
so when file_ptr is NUL src_file is also NULL and get_file()
dereferences a NULL pointer and leads to above crash.
Add a NULL check to fix this issue.
Fixes: e6130eba8a84 ("io_uring: add support for passing fixed file descriptors")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Link: https://lore.kernel.org/r/20221019171218.1337614-1-harshit.m.mogalapalli@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
| |
This debug statement was never meant to go into the upstream release,
kill it off before it ends up in a release. It was just part of the
testing for the initial version of the patch.
Fixes: 2ec33a6c3cca ("io_uring/rw: ensure kiocb_end_write() is always called")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We should not be completing requests from a task context that has already
undergone io_uring cancellations, i.e. __io_uring_cancel(), as there are
some assumptions, e.g. around cached task refs draining. Remove
iopolling from io_ring_ctx_wait_and_kill() as it can be called later
after PF_EXITING is set with the last task_work run.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7c03cc91455c4a1af49c6b9cbda4e57ea467aa11.1665891182.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
| |
Don't duplicate io_alloc_req() in io_req_caches_free() but reuse the
helper.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6005fc88274864a49fc3096c22d8bdd605cf8576.1665891182.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
| |
We test file_table.bitmap in io_file_get_fixed() to check invariants,
don't do it, it's expensive and was showing up in profiles. No reports of
this triggering has come in. Move the check to the file clear instead,
which will still catch any wrong usage.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cf77f2ded68d2e5b2bc7355784d969837d48e023.1665891182.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
| |
THe lifetime of SCM'ed files is bound to ring_sock, which is destroyed
strictly after we're done with registered file tables. This means there
is no need for the FFS_SCM hack, which was not available on 32-bit builds
anyway.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/984226a1045adf42dc35d8bd7fb5a8bbfa472ce1.1665891182.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| |
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull more random number generator updates from Jason Donenfeld:
"This time with some large scale treewide cleanups.
The intent of this pull is to clean up the way callers fetch random
integers. The current rules for doing this right are:
- If you want a secure or an insecure random u64, use get_random_u64()
- If you want a secure or an insecure random u32, use get_random_u32()
The old function prandom_u32() has been deprecated for a while
now and is just a wrapper around get_random_u32(). Same for
get_random_int().
- If you want a secure or an insecure random u16, use get_random_u16()
- If you want a secure or an insecure random u8, use get_random_u8()
- If you want secure or insecure random bytes, use get_random_bytes().
The old function prandom_bytes() has been deprecated for a while
now and has long been a wrapper around get_random_bytes()
- If you want a non-uniform random u32, u16, or u8 bounded by a
certain open interval maximum, use prandom_u32_max()
I say "non-uniform", because it doesn't do any rejection sampling
or divisions. Hence, it stays within the prandom_*() namespace, not
the get_random_*() namespace.
I'm currently investigating a "uniform" function for 6.2. We'll see
what comes of that.
By applying these rules uniformly, we get several benefits:
- By using prandom_u32_max() with an upper-bound that the compiler
can prove at compile-time is ≤65536 or ≤256, internally
get_random_u16() or get_random_u8() is used, which wastes fewer
batched random bytes, and hence has higher throughput.
- By using prandom_u32_max() instead of %, when the upper-bound is
not a constant, division is still avoided, because
prandom_u32_max() uses a faster multiplication-based trick instead.
- By using get_random_u16() or get_random_u8() in cases where the
return value is intended to indeed be a u16 or a u8, we waste fewer
batched random bytes, and hence have higher throughput.
This series was originally done by hand while I was on an airplane
without Internet. Later, Kees and I worked on retroactively figuring
out what could be done with Coccinelle and what had to be done
manually, and then we split things up based on that.
So while this touches a lot of files, the actual amount of code that's
hand fiddled is comfortably small"
* tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
prandom: remove unused functions
treewide: use get_random_bytes() when possible
treewide: use get_random_u32() when possible
treewide: use get_random_{u8,u16}() when possible, part 2
treewide: use get_random_{u8,u16}() when possible, part 1
treewide: use prandom_u32_max() when possible, part 2
treewide: use prandom_u32_max() when possible, part 1
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
With no callers left of prandom_u32() and prandom_bytes(), as well as
get_random_int(), remove these deprecated wrappers, in favor of
get_random_u32() and get_random_bytes().
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The prandom_bytes() function has been a deprecated inline wrapper around
get_random_bytes() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function. This was done as a basic find and replace.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> # powerpc
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The prandom_u32() function has been a deprecated inline wrapper around
get_random_u32() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function. The same also applies to get_random_int(), which is
just a wrapper around get_random_u32(). This was done as a basic find
and replace.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Acked-by: Helge Deller <deller@gmx.de> # for parisc
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value,
simply use the get_random_{u8,u16}() functions, which are faster than
wasting the additional bytes from a 32-bit value. This was done by hand,
identifying all of the places where one of the random integer functions
was used in a non-32-bit context.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value,
simply use the get_random_{u8,u16}() functions, which are faster than
wasting the additional bytes from a 32-bit value. This was done
mechanically with this coccinelle script:
@@
expression E;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u16;
typedef __be16;
typedef __le16;
typedef u8;
@@
(
- (get_random_u32() & 0xffff)
+ get_random_u16()
|
- (get_random_u32() & 0xff)
+ get_random_u8()
|
- (get_random_u32() % 65536)
+ get_random_u16()
|
- (get_random_u32() % 256)
+ get_random_u8()
|
- (get_random_u32() >> 16)
+ get_random_u16()
|
- (get_random_u32() >> 24)
+ get_random_u8()
|
- (u16)get_random_u32()
+ get_random_u16()
|
- (u8)get_random_u32()
+ get_random_u8()
|
- (__be16)get_random_u32()
+ (__be16)get_random_u16()
|
- (__le16)get_random_u32()
+ (__le16)get_random_u16()
|
- prandom_u32_max(65536)
+ get_random_u16()
|
- prandom_u32_max(256)
+ get_random_u8()
|
- E->inet_id = get_random_u32()
+ E->inet_id = get_random_u16()
)
@@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u16;
identifier v;
@@
- u16 v = get_random_u32();
+ u16 v = get_random_u16();
@@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u8;
identifier v;
@@
- u8 v = get_random_u32();
+ u8 v = get_random_u8();
@@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u16;
u16 v;
@@
- v = get_random_u32();
+ v = get_random_u16();
@@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u8;
u8 v;
@@
- v = get_random_u32();
+ v = get_random_u8();
// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@
((T)get_random_u32()@p & (LITERAL))
// Examine limits
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@
value = None
if literal.startswith('0x'):
value = int(literal, 16)
elif literal[0] in '123456789':
value = int(literal, 10)
if value is None:
print("I don't know how to handle %s" % (literal))
cocci.include_match(False)
elif value < 256:
coccinelle.RESULT = cocci.make_ident("get_random_u8")
elif value < 65536:
coccinelle.RESULT = cocci.make_ident("get_random_u16")
else:
print("Skipping large mask of %s" % (literal))
cocci.include_match(False)
// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
identifier add_one.RESULT;
identifier FUNC;
@@
- (FUNC()@p & (LITERAL))
+ (RESULT() & LITERAL)
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done by hand, covering things that coccinelle could not do on its own.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext2, ext4, and sbitmap
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:
@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u64;
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((u64)(E) * get_random_u32() >> 32)
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)
@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@
- RAND = get_random_u32();
... when != RAND
- RAND %= (E);
+ RAND = prandom_u32_max(E);
// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@
((T)get_random_u32()@p & (LITERAL))
// Add one to the literal.
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@
value = None
if literal.startswith('0x'):
value = int(literal, 16)
elif literal[0] in '123456789':
value = int(literal, 10)
if value is None:
print("I don't know how to handle %s" % (literal))
cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
print("Skipping 0x%x for cleanup elsewhere" % (value))
cocci.include_match(False)
elif value & (value + 1) != 0:
print("Skipping 0x%x because it's not a power of two minus one" % (value))
cocci.include_match(False)
elif literal.startswith('0x'):
coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))
// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@
- (FUNC()@p & (LITERAL))
+ prandom_u32_max(RESULT)
@collapse_ret@
type T;
identifier VAR;
expression E;
@@
{
- T VAR;
- VAR = (E);
- return VAR;
+ return E;
}
@drop_var@
type T;
identifier VAR;
@@
{
- T VAR;
... when != VAR
}
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: KP Singh <kpsingh@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull more perf tools updates from Arnaldo Carvalho de Melo:
- Use BPF CO-RE (Compile Once, Run Everywhere) to support old kernels
when using bperf (perf BPF based counters) with cgroups.
- Support HiSilicon PCIe Performance Monitoring Unit (PMU), that
monitors bandwidth, latency, bus utilization and buffer occupancy.
Documented in Documentation/admin-guide/perf/hisi-pcie-pmu.rst.
- User space tasks can migrate between CPUs, so when tracing selected
CPUs, system-wide sideband is still needed, fix it in the setup of
Intel PT on hybrid systems.
- Fix metricgroups title message in 'perf list', it should state that
the metrics groups are to be used with the '-M' option, not '-e'.
- Sync the msr-index.h copy with the kernel sources, adding support for
using "AMD64_TSC_RATIO" in filter expressions in 'perf trace' as well
as decoding it when printing the MSR tracepoint arguments.
- Fix program header size and alignment when generating a JIT ELF in
'perf inject'.
- Add multiple new Intel PT 'perf test' entries, including a jitdump
one.
- Fix the 'perf test' entries for 'perf stat' CSV and JSON output when
running on PowerPC due to an invalid topology number in that arch.
- Fix the 'perf test' for arm_coresight failures on the ARM Juno
system.
- Fix the 'perf test' attr entry for PERF_FORMAT_LOST, adding this
option to the or expression expected in the intercepted
perf_event_open() syscall.
- Add missing condition flags ('hs', 'lo', 'vc', 'vs') for arm64 in the
'perf annotate' asm parser.
- Fix 'perf mem record -C' option processing, it was being chopped up
when preparing the underlying 'perf record -e mem-events' and thus
being ignored, requiring using '-- -C CPUs' as a workaround.
- Improvements and tidy ups for 'perf test' shell infra.
- Fix Intel PT information printing segfault in uClibc, where a NULL
format was being passed to fprintf.
* tag 'perf-tools-for-v6.1-2-2022-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (23 commits)
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packet
perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driver
perf auxtrace arm: Refactor event list iteration in auxtrace_record__init()
perf tests stat+json_output: Include sanity check for topology
perf tests stat+csv_output: Include sanity check for topology
perf intel-pt: Fix system_wide dummy event for hybrid
perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc
perf test: Fix attr tests for PERF_FORMAT_LOST
perf test: test_intel_pt.sh: Add 9 tests
perf inject: Fix GEN_ELF_TEXT_OFFSET for jit
perf test: test_intel_pt.sh: Add jitdump test
perf test: test_intel_pt.sh: Tidy some alignment
perf test: test_intel_pt.sh: Print a message when skipping kernel tracing
perf test: test_intel_pt.sh: Tidy some perf record options
perf test: test_intel_pt.sh: Fix return checking again
perf: Skip and warn on unknown format 'configN' attrs
perf list: Fix metricgroups title message
perf mem: Fix -C option behavior for perf mem record
perf annotate: Add missing condition flags for arm64
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
To pick up the changes in:
b8d1d163604bd1e6 ("x86/apic: Don't disable x2APIC if locked")
ca5b7c0d9621702e ("perf/x86/amd/lbr: Add LbrExtV2 branch record support")
Addressing these tools/perf build warnings:
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
That makes the beautification scripts to pick some new entries:
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
$ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
$ diff -u before after
--- before 2022-10-14 18:06:34.294561729 -0300
+++ after 2022-10-14 18:06:41.285744044 -0300
@@ -264,6 +264,7 @@
[0xc0000102 - x86_64_specific_MSRs_offset] = "KERNEL_GS_BASE",
[0xc0000103 - x86_64_specific_MSRs_offset] = "TSC_AUX",
[0xc0000104 - x86_64_specific_MSRs_offset] = "AMD64_TSC_RATIO",
+ [0xc000010e - x86_64_specific_MSRs_offset] = "AMD64_LBR_SELECT",
[0xc000010f - x86_64_specific_MSRs_offset] = "AMD_DBG_EXTN_CFG",
[0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS",
[0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL",
$
Now one can trace systemwide asking to see backtraces to where that MSR
is being read/written, see this example with a previous update:
# perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
^C#
If we use -v (verbose mode) we can see what it does behind the scenes:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
Using CPUID AuthenticAMD-25-21-0
0x6a0
0x6a8
New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
0x6a0
0x6a8
New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
mmap size 528384B
^C#
Example with a frequent msr:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
Using CPUID AuthenticAMD-25-21-0
0x48
New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
0x48
New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
mmap size 528384B
Looking at the vmlinux_path (8 entries long)
symsrc__init: build id mismatch for vmlinux.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule ([kernel.kallsyms])
futex_wait_queue_me ([kernel.kallsyms])
futex_wait ([kernel.kallsyms])
do_futex ([kernel.kallsyms])
__x64_sys_futex ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
__futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule_idle ([kernel.kallsyms])
do_idle ([kernel.kallsyms])
cpu_startup_entry ([kernel.kallsyms])
secondary_startup_64_no_verify ([kernel.kallsyms])
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/lkml/Y0nQkz2TUJxwfXJd@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add support for using 'perf report --dump-raw-trace' to parse PTT packet.
Example usage:
Output will contain raw PTT data and its textual representation, such
as (8DW format):
0 0 0x5810 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000 offset: 0
ref: 0xa5d50c725 idx: 0 tid: -1 cpu: 0
.
. ... HISI PTT data: size 4194304 bytes
. 00000000: 00 00 00 00 Prefix
. 00000004: 08 20 00 60 Header DW0
. 00000008: ff 02 00 01 Header DW1
. 0000000c: 20 08 00 00 Header DW2
. 00000010: 10 e7 44 ab Header DW3
. 00000014: 2a a8 1e 01 Time
. 00000020: 00 00 00 00 Prefix
. 00000024: 01 00 00 60 Header DW0
. 00000028: 0f 1e 00 01 Header DW1
. 0000002c: 04 00 00 00 Header DW2
. 00000030: 40 00 81 02 Header DW3
. 00000034: ee 02 00 00 Time
....
This patch only add basic parsing support according to the definition of
the PTT packet described in Documentation/trace/hisi-ptt.rst. And the
fields of each packet can be further decoded following the PCIe Spec's
definition of TLP packet.
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Liu <liuqi6124@gmail.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zeng Prime <prime.zeng@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/20220927081400.14364-4-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
HiSilicon PCIe tune and trace device (PTT) could dynamically tune the
PCIe link's events, and trace the TLP headers).
This patch add support for PTT device in perf tool, so users could use
'perf record' to get TLP headers trace data.
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Liu <liuqi6124@gmail.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zeng Prime <prime.zeng@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/20220927081400.14364-3-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add find_pmu_for_event() and use to simplify logic in
auxtrace_record_init(). find_pmu_for_event() will be reused in
subsequent patches.
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Liu <liuqi6124@gmail.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zeng Prime <prime.zeng@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/20220927081400.14364-2-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Testcase stat+json_output.sh fails in powerpc:
86: perf stat JSON output linter : FAILED!
The testcase "stat+json_output.sh" verifies perf stat JSON output. The
test covers aggregation modes like per-socket, per-core, per-die, -A
(no_aggr mode) along with few other tests. It counts expected fields for
various commands. For example say -A (i.e, AGGR_NONE mode), expects 7
fields in the output having "CPU" as first field. Same way, for
per-socket, it expects the first field in result to point to socket id.
The testcases compares the result with expected count.
The values for socket, die, core and cpu are fetched from topology
directory:
/sys/devices/system/cpu/cpu*/topology.
For example, socket value is fetched from "physical_package_id" file of
topology directory. (cpu__get_topology_int() in util/cpumap.c)
If a platform fails to fetch the topology information, values will be
set to -1. For example, incase of pSeries platform of powerpc, value for
"physical_package_id" is restricted and not exposed. So, -1 will be
assigned.
Perf code has a checks for valid cpu id in "aggr_printout"
(stat-display.c), which displays the fields. So, in cases where topology
values not exposed, first field of the output displaying will be empty.
This cause the testcase to fail, as it counts number of fields in the
output.
Incase of -A (AGGR_NONE mode,), testcase expects 7 fields in the output,
becos of -1 value obtained from topology files for some, only 6 fields
are printed. Hence a testcase failure reported due to mismatch in number
of fields in the output.
Patch here adds a sanity check in the testcase for topology. Check will
help to skip the test if -1 value found.
Fixes: 0c343af2a2f82844 ("perf test: JSON format checking")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Suggested-by: Ian Rogers <irogers@google.com>
Suggested-by: James Clark <james.clark@arm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Claire Jensen <cjense@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Link: https://lore.kernel.org/r/20221006155149.67205-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Testcase stat+csv_output.sh fails in powerpc:
84: perf stat CSV output linter: FAILED!
The testcase "stat+csv_output.sh" verifies perf stat CSV output. The
test covers aggregation modes like per-socket, per-core, per-die, -A
(no_aggr mode) along with few other tests. It counts expected fields for
various commands. For example say -A (i.e, AGGR_NONE mode), expects 7
fields in the output having "CPU" as first field. Same way, for
per-socket, it expects the first field in result to point to socket id.
The testcases compares the result with expected count.
The values for socket, die, core and cpu are fetched from topology
directory:
/sys/devices/system/cpu/cpu*/topology.
For example, socket value is fetched from "physical_package_id" file of
topology directory. (cpu__get_topology_int() in util/cpumap.c)
If a platform fails to fetch the topology information, values will be
set to -1. For example, incase of pSeries platform of powerpc, value for
"physical_package_id" is restricted and not exposed. So, -1 will be
assigned.
Perf code has a checks for valid cpu id in "aggr_printout"
(stat-display.c), which displays the fields. So, in cases where topology
values not exposed, first field of the output displaying will be empty.
This cause the testcase to fail, as it counts number of fields in the
output.
Incase of -A (AGGR_NONE mode,), testcase expects 7 fields in the output,
becos of -1 value obtained from topology files for some, only 6 fields
are printed. Hence a testcase failure reported due to mismatch in number
of fields in the output.
Patch here adds a sanity check in the testcase for topology. Check will
help to skip the test if -1 value found.
Fixes: 7473ee56dbc91c98 ("perf test: Add checking for perf stat CSV output.")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Suggested-by: Ian Rogers <irogers@google.com>
Suggested-by: James Clark <james.clark@arm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Claire Jensen <cjense@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Link: https://lore.kernel.org/r/20221006155149.67205-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
User space tasks can migrate between CPUs, so when tracing selected CPUs,
system-wide sideband is still needed, however evlist->core.has_user_cpus
is not set in the hybrid case, so check the target cpu_list instead.
Fixes: 7d189cadbeebc778 ("perf intel-pt: Track sideband system-wide when needed")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221012082259.22394-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
uClibc segfaulted because NULL was passed as the format to fprintf().
That happened because one of the format strings was missing and
intel_pt_print_info() didn't check that before calling fprintf().
Add the missing format string, and check format is not NULL before calling
fprintf().
Fixes: 11fa7cb86b56d361 ("perf tools: Pass Intel PT information for decoding MTC and CYC")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221012082259.22394-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Since PERF_FORMAT_LOST was added, the default read format has that bit
set, so add it to the tests. Keep the old value as well so that the test
still passes on older kernels.
This fixes the following failure:
expected read_format=0|4, got 20
FAILED './tests/attr/test-record-C0' - match failure
Fixes: 85b425f31c8866e0 ("perf record: Set PERF_FORMAT_LOST by default")
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221012094633.21669-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add tests:
Test with MTC and TSC disabled
Test with branches disabled
Test with/without CYC
Test recording with sample mode
Test with kernel trace
Test virtual LBR
Test power events
Test with TNT packets disabled
Test with event_trace
These tests mostly check that perf record works with the corresponding
Intel PT config terms, sometimes also checking that certain packets do or
do not appear in the resulting trace as appropriate.
The "Test virtual LBR" is slightly trickier, using a Python script to
check that branch stacks are actually synthesized.
Signed-off-by: Ammy Yi <ammy.yi@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-8-adrian.hunter@intel.com
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When a program header was added, it moved the text section but
GEN_ELF_TEXT_OFFSET was not updated.
Fix by adding the program header size and aligning.
Fixes: babd04386b1df8c3 ("perf jit: Include program header in ELF files")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Lieven Hey <lieven.hey@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add a test for decoding self-modifying code using a jitdump file.
The test creates a workload that uses self-modifying code and generates its
own jitdump file. The result is processed with perf inject --jit and
checked for decoding errors.
Note the test will fail without patch "perf inject: Fix GEN_ELF_TEXT_OFFSET
for jit" applied.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Tidy alignment of test function lines to make them more readable.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Messages display with the perf test -v option. Add a message to show when
skipping a test because the user cannot do kernel tracing.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When not decoding, the options "-B -N --no-bpf-event" speed up perf record.
Make a common function for them.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
count_result() does not always reset ret=0 which means the value can spill
into the next test result.
Fix by explicitly setting it to zero between tests.
Committer testing:
# perf test "Miscellaneous Intel PT testing"
110: Miscellaneous Intel PT testing : Ok
#
Tested as well with:
# perf test -v "Miscellaneous Intel PT testing"
Fixes: fd9b45e39cfaf885 ("perf test: test_intel_pt.sh: Fix return checking")
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221014170905.64069-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If the kernel exposes a new perf_event_attr field in a format attr, perf
will return an error stating the specified PMU can't be found. For
example, a format attr with 'config3:0-63' causes an error as config3 is
unknown to perf. This causes a compatibility issue between a newer
kernel with older perf tool.
Before this change with a kernel adding 'config3' I get:
$ perf record -e arm_spe// -- true
event syntax error: 'arm_spe//'
\___ Cannot find PMU `arm_spe'. Missing kernel support?
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list
available events
After this change, I get:
$ perf record -e arm_spe// -- true
WARNING: 'arm_spe_0' format 'inv_event_filter' requires 'perf_event_attr::config3' which is not supported by this version of perf!
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.091 MB perf.data ]
To support unknown configN formats, rework the YACC implementation to
pass any config[0-9]+ format to perf_pmu__new_format() to handle with a
warning.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220914-arm-perf-tool-spe1-2-v2-v4-1-83c098e6212e@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
$ perf list metricgroups
gives
List of pre-defined events (to be used in -e):
Metric Groups:
Backend
Bad
BadSpec
But that's incorrect of course because metric groups or metrics can only
be specified with -M. So fix the message to say -e or -M
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221004192634.998984-1-ak@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|