path: root/kernel/trace
Commit message | Author | Age | Files | Lines
* tracing: Fix misspelling of "missing"
  Author: Steven Rostedt (VMware) | Date: 2021-10-29 | Files: 1 | Lines: -1/+1

  My snake instinct was on and I wrote "misssing" instead of "missing".

  Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* ftrace: Fix kernel-doc formatting issues
  Author: Steven Rostedt (VMware) | Date: 2021-10-29 | Files: 1 | Lines: -8/+10

  Some functions had kernel-doc that used a comma instead of a dash to
  separate the function name from the one line description. Also,
  "ftrace_is_dead()" had an incomplete description.

  Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Do not warn when connecting eprobe to non existing event
  Author: Steven Rostedt (VMware) | Date: 2021-10-27 | Files: 1 | Lines: -2/+2

  When the syscall trace points are not configured in, the kselftests for
  ftrace will try to attach an event probe (eprobe) to one of the system
  call trace points. This triggered a WARNING, because the failure only
  expects to see memory issues. But this is not the only failure. The user
  may attempt to attach to a non existent event, and the kernel must not
  warn about it.

  Link: https://lkml.kernel.org/r/20211027120854.0680aa0f@gandalf.local.home
  Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events")
  Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Have all levels of checks prevent recursion
  Author: Steven Rostedt (VMware) | Date: 2021-10-18 | Files: 1 | Lines: -2/+2

  While writing an email explaining the "bit = 0" logic for a discussion on
  making ftrace_test_recursion_trylock() disable preemption, I discovered a
  path that makes the "not do the logic if bit is zero" unsafe.

  The recursion logic is done in hot paths like the function tracer. Thus,
  any code executed causes noticeable overhead. Thus, tricks are done to try
  to limit the amount of code executed. This included the recursion testing
  logic.

  Having recursion testing is important, as there are many paths that can
  end up in an infinite recursion cycle when tracing every function in the
  kernel. Thus protection is needed to prevent that from happening.

  Because it is OK to recurse due to different running context levels (e.g.
  an interrupt preempts a trace, and then a trace occurs in the interrupt
  handler), a set of bits are used to know which context one is in (normal,
  softirq, irq and NMI). If a recursion occurs in the same level, it is
  prevented*.

  Then there are infrastructure levels of recursion as well. When more than
  one callback is attached to the same function to trace, it calls a loop
  function to iterate over all the callbacks. Both the callbacks and the
  loop function have recursion protection. The callbacks use the
  "ftrace_test_recursion_trylock()" which has a "function" set of context
  bits to test, and the loop function calls the internal
  trace_test_and_set_recursion() directly, with an "internal" set of bits.

  If an architecture does not implement all the features supported by ftrace
  then the callbacks are never called directly, and the loop function is
  called instead, which will implement the features of ftrace.

  Since both the loop function and the callbacks do recursion protection, it
  seemed unnecessary to do it in both locations. Thus, a trick was made to
  have the internal set of recursion bits at a more significant bit location
  than the function bits. Then, if any of the higher bits were set, the
  logic of the function bits could be skipped, as any new recursion would
  first have to go through the loop function.

  This is true for architectures that do not support all the ftrace
  features, because all functions being traced must first go through the
  loop function before going to the callbacks. But this is not true for
  architectures that support all the ftrace features. That's because the
  loop function could be called due to two callbacks attached to the same
  function, but then a recursion function inside the callback could be
  called that does not share any other callback, and it will be called
  directly.

  i.e.

   traced_function_1: [ more than one callback tracing it ]
     call loop_func

   loop_func:
     trace_recursion set internal bit
     call callback

   callback:
     trace_recursion [ skipped because internal bit is set, return 0 ]
     call traced_function_2

   traced_function_2: [ only traced by above callback ]
     call callback

   callback:
     trace_recursion [ skipped because internal bit is set, return 0 ]
     call traced_function_2

   [ wash, rinse, repeat, BOOM! out of shampoo! ]

  Thus, the "bit == 0 skip" trick is not safe, unless the loop function is
  called for all functions.

  Since we want to encourage architectures to implement all ftrace features,
  having them slow down due to this extra logic may encourage the
  maintainers to update to the latest ftrace features. And because this
  logic is only safe for them, remove it completely.

  [*] There is one layer of recursion that is allowed, and that is to allow
      for the transition between interrupt contexts (normal -> softirq ->
      irq -> NMI), because a trace may occur before the context update is
      visible to the trace recursion logic.

  Link: https://lore.kernel.org/all/609b565a-ed6e-a1da-f025-166691b5d994@linux.alibaba.com/
  Link: https://lkml.kernel.org/r/20211018154412.09fcad3c@gandalf.local.home
  Cc: Linus Torvalds <torvalds@linux-foundation.org>
  Cc: Petr Mladek <pmladek@suse.com>
  Cc: Ingo Molnar <mingo@redhat.com>
  Cc: "James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>
  Cc: Helge Deller <deller@gmx.de>
  Cc: Michael Ellerman <mpe@ellerman.id.au>
  Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Cc: Paul Mackerras <paulus@samba.org>
  Cc: Paul Walmsley <paul.walmsley@sifive.com>
  Cc: Palmer Dabbelt <palmer@dabbelt.com>
  Cc: Albert Ou <aou@eecs.berkeley.edu>
  Cc: Thomas Gleixner <tglx@linutronix.de>
  Cc: Borislav Petkov <bp@alien8.de>
  Cc: "H. Peter Anvin" <hpa@zytor.com>
  Cc: Josh Poimboeuf <jpoimboe@redhat.com>
  Cc: Jiri Kosina <jikos@kernel.org>
  Cc: Miroslav Benes <mbenes@suse.cz>
  Cc: Joe Lawrence <joe.lawrence@redhat.com>
  Cc: Colin Ian King <colin.king@canonical.com>
  Cc: Masami Hiramatsu <mhiramat@kernel.org>
  Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
  Cc: Nicholas Piggin <npiggin@gmail.com>
  Cc: Jisheng Zhang <jszhang@kernel.org>
  Cc: 王贇 <yun.wang@linux.alibaba.com>
  Cc: Guo Ren <guoren@kernel.org>
  Cc: stable@vger.kernel.org
  Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks")
  Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
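
The call chain in the commit above can be replayed in a small user-space model. This is only a sketch: the bit positions, the `Guard` class and the `simulate()` helper are made up for illustration and do not mirror the kernel's data layout, and the one allowed layer of context-transition recursion from the footnote is not modelled.

```python
# Toy model of the recursion guard. Bit positions and names are illustrative,
# not the kernel's actual layout.
FUNCTION_BIT = 1 << 0  # "function" context bit tested by callbacks
INTERNAL_BIT = 1 << 4  # "internal" context bit tested by the loop function


class Guard:
    def __init__(self, skip_if_higher_bit_set):
        self.bits = 0
        self.skip_if_higher_bit_set = skip_if_higher_bit_set

    def trylock(self, bit):
        """Return the bit taken, 0 if the check was skipped, -1 on recursion."""
        higher_bits = self.bits & ~((bit << 1) - 1)
        if self.skip_if_higher_bit_set and higher_bits:
            return 0  # the removed "bit == 0" shortcut
        if self.bits & bit:
            return -1
        self.bits |= bit
        return bit

    def unlock(self, token):
        if token > 0:
            self.bits &= ~token


def callback(guard, depth, max_depth):
    token = guard.trylock(FUNCTION_BIT)
    if token < 0:
        return depth  # recursion detected, stop here
    if depth >= max_depth:
        reached = depth  # cut the simulation short instead of recursing forever
    else:
        # traced_function_2 is traced only by this callback, so it is
        # entered directly, without going through the loop function.
        reached = callback(guard, depth + 1, max_depth)
    guard.unlock(token)
    return reached


def simulate(skip_if_higher_bit_set, max_depth=50):
    guard = Guard(skip_if_higher_bit_set)
    token = guard.trylock(INTERNAL_BIT)  # loop_func for traced_function_1
    reached = callback(guard, 0, max_depth)
    guard.unlock(token)
    return reached
```

With the shortcut enabled, `simulate(True)` only stops at the artificial depth limit; with the check always performed, `simulate(False)` stops at the first re-entry of the callback.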
* tracing: Fix event probe removal from dynamic events
  Author: Steven Rostedt (VMware) | Date: 2021-10-13 | Files: 1 | Lines: -3/+51

  When an event probe is to be removed via the API that created it via the
  dynamic events, an -ENOENT error is returned.

  This is because the removal of the event probe does not expect to see the
  event system and name that the event probe is attached to, even though
  that's part of the API to create it. The removal of probes is supposed to
  use the same API that created them.

  In fact, the removal is not consistent with the kprobes and uprobes
  removal. Fix that by allowing various ways to remove the eprobe.

  The eprobe is created with:

   e:[GROUP/]NAME SYSTEM/EVENT [OPTIONS]

  Have it get removed by echoing in the following into dynamic_events:

   # Remove all eprobes with NAME
   echo '-:NAME' >> dynamic_events

   # Remove a specific eprobe
   echo '-:GROUP/NAME' >> dynamic_events
   echo '-:GROUP/NAME SYSTEM/EVENT' >> dynamic_events
   echo '-:NAME SYSTEM/EVENT' >> dynamic_events
   echo '-:GROUP/NAME SYSTEM/EVENT OPTIONS' >> dynamic_events
   echo '-:NAME SYSTEM/EVENT OPTIONS' >> dynamic_events

  Link: https://lkml.kernel.org/r/20211012081925.0e19cc4f@gandalf.local.home
  Link: https://lkml.kernel.org/r/20211013205533.630722129@goodmis.org
  Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
  Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
  Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events")
  Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
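
The accepted removal forms can be read as "every field that is given must match". The sketch below models that reading in user space; the `Eprobe` tuple and `matches()` are hypothetical names, and the kernel's real matching code may differ in details such as how OPTIONS are compared.

```python
from typing import NamedTuple


class Eprobe(NamedTuple):
    group: str
    name: str
    target: str   # "SYSTEM/EVENT" the probe is attached to
    options: str  # fetch arguments as one string, "" if none


def matches(spec: str, probe: Eprobe) -> bool:
    """Return True if a '-:...' removal spec selects the given probe."""
    if not spec.startswith("-:"):
        return False
    fields = spec[2:].split(None, 2)
    if not fields:
        return False
    # "GROUP/NAME" or just "NAME"; an omitted group matches any group.
    group, _, name = fields[0].rpartition("/")
    if group and group != probe.group:
        return False
    if name != probe.name:
        return False
    # Optional SYSTEM/EVENT and OPTIONS must match when they are given.
    if len(fields) > 1 and fields[1] != probe.target:
        return False
    if len(fields) > 2 and " ".join(fields[2].split()) != probe.options:
        return False
    return True
```

For example, `'-:open'` would select every probe named `open` in any group, while `'-:mygrp/open'` selects only the one in `mygrp`.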
* tracing: Fix missing * in comment block
  Author: Colin Ian King | Date: 2021-10-10 | Files: 1 | Lines: -1/+1

  There is a missing * in a comment block, add it in.

  Link: https://lkml.kernel.org/r/20211006172830.1025336-1-colin.king@canonical.com
  Signed-off-by: Colin Ian King <colin.king@canonical.com>
  Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Fix memory leak in eprobe_register()
  Author: Vamshi K Sthambamkadi | Date: 2021-10-10 | Files: 1 | Lines: -0/+7

  kmemleak report:

  unreferenced object 0xffff900a70ec7ec0 (size 32):
    comm "ftracetest", pid 2770, jiffies 4295042510 (age 311.464s)
    hex dump (first 32 bytes):
      c8 31 23 45 0a 90 ff ff 40 85 c7 6e 0a 90 ff ff  .1#E....@..n....
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<000000009d3751fd>] kmem_cache_alloc_trace+0x2a2/0x440
      [<0000000088b8124b>] eprobe_register+0x1e3/0x350
      [<000000002a9a0517>] __ftrace_event_enable_disable+0x7c/0x240
      [<0000000019109321>] event_enable_write+0x93/0xe0
      [<000000007d85b320>] vfs_write+0xb9/0x260
      [<00000000b94c5e41>] ksys_write+0x67/0xe0
      [<000000005a08c81d>] __x64_sys_write+0x1a/0x20
      [<00000000240bf576>] do_syscall_64+0x3b/0xc0
      [<0000000043d5d9f6>] entry_SYSCALL_64_after_hwframe+0x44/0xae

  unreferenced object 0xffff900a56bbf280 (size 128):
    comm "ftracetest", pid 2770, jiffies 4295042510 (age 311.464s)
    hex dump (first 32 bytes):
      ff ff ff ff ff ff ff ff 00 00 00 00 01 00 00 00  ................
      80 69 3b b2 ff ff ff ff 20 69 3b b2 ff ff ff ff  .i;..... i;.....
    backtrace:
      [<000000009d3751fd>] kmem_cache_alloc_trace+0x2a2/0x440
      [<00000000c4e90fad>] eprobe_register+0x1fc/0x350
      [<000000002a9a0517>] __ftrace_event_enable_disable+0x7c/0x240
      [<0000000019109321>] event_enable_write+0x93/0xe0
      [<000000007d85b320>] vfs_write+0xb9/0x260
      [<00000000b94c5e41>] ksys_write+0x67/0xe0
      [<000000005a08c81d>] __x64_sys_write+0x1a/0x20
      [<00000000240bf576>] do_syscall_64+0x3b/0xc0
      [<0000000043d5d9f6>] entry_SYSCALL_64_after_hwframe+0x44/0xae

  In new_eprobe_trigger(), allocated edata and trigger variables are
  never freed.

  To fix, free memory in disable_eprobe().

  Link: https://lkml.kernel.org/r/20211008071802.GA2098@cosmos
  Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events")
  Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
  Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing: Fix missing osnoise tracer on max_latency
  Author: Jackie Liu | Date: 2021-10-07 | Files: 1 | Lines: -7/+4

  The compiler warns when the function is actually unused:

    kernel/trace/trace.c:1712:13: error: ‘trace_create_maxlat_file’ defined but not used [-Werror=unused-function]
     1712 | static void trace_create_maxlat_file(struct trace_array *tr,
          |             ^~~~~~~~~~~~~~~~~~~~~~~~

  [Why]
  With CONFIG_HWLAT_TRACER=n, CONFIG_TRACER_MAX_TRACE=n and
  CONFIG_OSNOISE_TRACER=y, gcc reports this warning.

  [How]
  Currently trace_create_maxlat_file() only takes effect when
  CONFIG_HWLAT_TRACER=y or CONFIG_TRACER_MAX_TRACE=y. After the osnoise
  tracer was added, it also needs to take effect for
  CONFIG_OSNOISE_TRACER=y.

  Link: https://lore.kernel.org/all/c1d9e328-ad7c-920b-6c24-9e1598a6421c@infradead.org/
  Link: https://lkml.kernel.org/r/20210922025122.3268022-1-liu.yun@linux.dev
  Fixes: bce29ac9ce0b ("trace: Add osnoise tracer")
  Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
  Suggested-by: Steven Rostedt <rostedt@goodmis.org>
  Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org>
  Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
  Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
  Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* blktrace: Fix uaf in blk_trace access after removing by sysfs
  Author: Zhihao Cheng | Date: 2021-09-24 | Files: 1 | Lines: -0/+8

  There is a use-after-free problem triggered by the following process:

        P1(sda)                       P2(sdb)
                              echo 0 > /sys/block/sdb/trace/enable
                                blk_trace_remove_queue
                                  synchronize_rcu
                                  blk_trace_free
                                    relay_close
  rcu_read_lock
  __blk_add_trace
    trace_note_tsk
    (Iterate running_trace_list)
                                      relay_close_buf
                                        relay_destroy_buf
                                          kfree(buf)
      trace_note(sdb's bt)
        relay_reserve
          buf->offset <- nullptr dereference (use-after-free) !!!
  rcu_read_unlock

  [  502.714379] BUG: kernel NULL pointer dereference, address: 0000000000000010
  [  502.715260] #PF: supervisor read access in kernel mode
  [  502.715903] #PF: error_code(0x0000) - not-present page
  [  502.716546] PGD 103984067 P4D 103984067 PUD 17592b067 PMD 0
  [  502.717252] Oops: 0000 [#1] SMP
  [  502.720308] RIP: 0010:trace_note.isra.0+0x86/0x360
  [  502.732872] Call Trace:
  [  502.733193]  __blk_add_trace.cold+0x137/0x1a3
  [  502.733734]  blk_add_trace_rq+0x7b/0xd0
  [  502.734207]  blk_add_trace_rq_issue+0x54/0xa0
  [  502.734755]  blk_mq_start_request+0xde/0x1b0
  [  502.735287]  scsi_queue_rq+0x528/0x1140
  ...
  [  502.742704]  sg_new_write.isra.0+0x16e/0x3e0
  [  502.747501]  sg_ioctl+0x466/0x1100

  Reproduce method:
    ioctl(/dev/sda, BLKTRACESETUP, blk_user_trace_setup[buf_size=127])
    ioctl(/dev/sda, BLKTRACESTART)
    ioctl(/dev/sdb, BLKTRACESETUP, blk_user_trace_setup[buf_size=127])
    ioctl(/dev/sdb, BLKTRACESTART)

    echo 0 > /sys/block/sdb/trace/enable &
    // Add delay(mdelay/msleep) before kernel enters blk_trace_free()

    ioctl$SG_IO(/dev/sda, SG_IO, ...)
    // Enters trace_note_tsk() after blk_trace_free() returned
    // Use mdelay in rcu region rather than msleep(which may schedule out)

  Remove blk_trace from running_list before calling blk_trace_free() by
  sysfs if blk_trace is in the Blktrace_running state.

  Fixes: c71a896154119f ("blktrace: add ftrace plugin")
  Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
  Link: https://lore.kernel.org/r/20210923134921.109194-1-chengzhihao1@huawei.com
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'trace-v5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
  Author: Linus Torvalds | Date: 2021-09-11 | Files: 1 | Lines: -19/+18

  Pull tracing fixes from Steven Rostedt:
   "Minor fixes to the processing of the bootconfig tree"

  * tag 'trace-v5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    bootconfig: Rename xbc_node_find_child() to xbc_node_find_subkey()
    tracing/boot: Fix to check the histogram control param is a leaf node
    tracing/boot: Fix trace_boot_hist_add_array() to check array is value
|\
| * bootconfig: Rename xbc_node_find_child() to xbc_node_find_subkey()
|   Author: Masami Hiramatsu | Date: 2021-09-09 | Files: 1 | Lines: -12/+12
|
|   Rename xbc_node_find_child() to xbc_node_find_subkey() to clarify that
|   the function returns a key node (never a value node). Since there are
|   xbc_node_for_each_child() (loop on all child nodes) and
|   xbc_node_for_each_subkey() (loop on only subkey nodes), this name
|   distinction is necessary to avoid confusing users.
|
|   Link: https://lkml.kernel.org/r/163119459826.161018.11200274779483115300.stgit@devnote2
|   Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
|   Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing/boot: Fix to check the histogram control param is a leaf node
|   Author: Masami Hiramatsu | Date: 2021-09-09 | Files: 1 | Lines: -3/+3
|
|   Since xbc_node_find_child() doesn't ensure the returned node is a leaf
|   node (a key-value pair, or a key that does not have subkeys), use
|   xbc_node_find_value() to ensure the histogram control parameter is a
|   leaf node in trace_boot_compose_hist_cmd().
|
|   Link: https://lkml.kernel.org/r/163119459059.161018.18341288218424528962.stgit@devnote2
|   Fixes: e66ed86ca6c5 ("tracing/boot: Add per-event histogram action options")
|   Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
|   Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing/boot: Fix trace_boot_hist_add_array() to check array is value
|   Author: Masami Hiramatsu | Date: 2021-09-09 | Files: 1 | Lines: -4/+3
|
|   trace_boot_hist_add_array() uses the combination of
|   xbc_node_find_child() and xbc_node_get_child() to get the child node of
|   the key node. But since it did not check whether the child node is a
|   data node, the user could pass a subkey node as the array node (anode).
|   To avoid this issue, check that the array node is a data node.
|   Actually, there is xbc_node_find_value(node, key, vnode), which ensures
|   that @vnode is a value node, so use it in trace_boot_hist_add_array()
|   to fix this issue.
|
|   Link: https://lkml.kernel.org/r/163119458308.161018.1516455973625940212.stgit@devnote2
|   Fixes: e66ed86ca6c5 ("tracing/boot: Add per-event histogram action options")
|   Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
|   Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* | Merge tag 'trace-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
    Author: Linus Torvalds | Date: 2021-09-09 | Files: 8 | Lines: -21/+50

    Pull more tracing updates from Steven Rostedt:

     - Add migrate-disable counter to tracing header
     - Fix error handling in event probes
     - Fix missed unlock in osnoise in error path
     - Fix merge issue with tools/bootconfig
     - Clean up bootconfig data when init memory is removed
     - Fix bootconfig to loop only on subkeys
     - Have kernel command lines override bootconfig options
     - Increase field counts for synthetic events
     - Have histograms dynamic allocate event elements to save space
     - Fixes in testing and documentation

    * tag 'trace-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
      tracing/boot: Fix to loop on only subkeys
      selftests/ftrace: Exclude "(fault)" in testing add/remove eprobe events
      tracing: Dynamically allocate the per-elt hist_elt_data array
      tracing: synth events: increase max fields count
      tools/bootconfig: Show whole test command for each test case
      bootconfig: Fix missing return check of xbc_node_compose_key function
      tools/bootconfig: Fix tracing_on option checking in ftrace2bconf.sh
      docs: bootconfig: Add how to use bootconfig for kernel parameters
      init/bootconfig: Reorder init parameter from bootconfig and cmdline
      init: bootconfig: Remove all bootconfig data when the init memory is removed
      tracing/osnoise: Fix missed cpus_read_unlock() in start_per_cpu_kthreads()
      tracing: Fix some alloc_event_probe() error handling bugs
      tracing: Add migrate-disabled counter to tracing output.
|\|
| * tracing/boot: Fix to loop on only subkeys
|   Author: Masami Hiramatsu | Date: 2021-09-08 | Files: 1 | Lines: -3/+3
|
|   Since commit e5efaeb8a8f5 ("bootconfig: Support mixing a value and
|   subkeys under a key") allows a value node and key nodes to co-exist
|   under a node, xbc_node_for_each_child() returns not only key nodes but
|   also a value node.
|   Boot-time tracing uses xbc_node_for_each_child() to iterate over the
|   events, groups and instances, but those must be key nodes. Thus it must
|   use xbc_node_for_each_subkey().
|
|   Link: https://lkml.kernel.org/r/163112988361.74896.2267026262061819145.stgit@devnote2
|   Fixes: e5efaeb8a8f5 ("bootconfig: Support mixing a value and subkeys under a key")
|   Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
|   Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
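
The difference between iterating over all children and iterating over subkeys only, which the bootconfig commits above rely on, can be sketched with a toy tree. The `Node` class and the helper names below are illustrative stand-ins and not the kernel's `struct xbc_node` API.

```python
class Node:
    """Toy stand-in for a bootconfig node; not the kernel's struct xbc_node."""

    def __init__(self, data, is_value=False):
        self.data = data
        self.is_value = is_value
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child


def for_each_child(node):
    """Yield every child: value nodes and key nodes alike."""
    yield from node.children


def for_each_subkey(node):
    """Yield only the children that are key nodes."""
    return (child for child in node.children if not child.is_value)


def find_subkey(node, key):
    """Return the key node named `key`, or None; never a value node."""
    for child in for_each_subkey(node):
        if child.data == key:
            return child
    return None


# "event" carries a value ("on") and two subkeys, as the mixed layout allows.
event = Node("event")
event.add(Node("on", is_value=True))
event.add(Node("sched"))
event.add(Node("irq"))
```

Iterating with `for_each_child(event)` would hand the value node `"on"` to code that expects an event name, whereas `for_each_subkey(event)` yields only `"sched"` and `"irq"`.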
| * tracing: Dynamically allocate the per-elt hist_elt_data array
|   Author: Tom Zanussi | Date: 2021-09-08 | Files: 1 | Lines: -2/+12
|
|   Setting the hist_elt_data.field_var_str[] array unconditionally to a
|   size of SYNTH_FIELDS_MAX elements wastes space unnecessarily. The
|   actual number of elements needed can be calculated at run-time
|   instead.
|
|   In most cases, this will save a lot of space since it's a per-elt
|   array which isn't normally close to being full. It also allows us to
|   increase SYNTH_FIELDS_MAX without worrying about even more wastage
|   when we do that.
|
|   Link: https://lkml.kernel.org/r/d52ae0ad5e1b59af7c4f54faf3fc098461fd82b3.camel@kernel.org
|   Signed-off-by: Tom Zanussi <zanussi@kernel.org>
|   Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
|   Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: synth events: increase max fields count
|   Author: Artem Bityutskiy | Date: 2021-09-08 | Files: 1 | Lines: -1/+1
|
|   Sometimes it is useful to construct larger synthetic trace events.
|   Increase 'SYNTH_FIELDS_MAX' (maximum number of fields in a synthetic
|   event) from 32 to 64.
|
|   Link: https://lkml.kernel.org/r/20210901135513.3087062-1-dedekind1@gmail.com
|   Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
|   Acked-by: Tom Zanussi <zanussi@kernel.org>
|   Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing/osnoise: Fix missed cpus_read_unlock() in start_per_cpu_kthreads()
|   Author: Qiang.Zhang | Date: 2021-09-08 | Files: 1 | Lines: -3/+3
|
|   When start_kthread() returns an error, cpus_read_unlock() needs to be
|   called.
|
|   Link: https://lkml.kernel.org/r/20210831022919.27630-1-qiang.zhang@windriver.com
|   Cc: <stable@vger.kernel.org>
|   Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations")
|   Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
|   Signed-off-by: Qiang.Zhang <qiang.zhang@windriver.com>
|   Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Fix some alloc_event_probe() error handling bugs
|   Author: Dan Carpenter | Date: 2021-09-07 | Files: 1 | Lines: -2/+3
|
|   There are two bugs in this code. First, if the kzalloc() fails it
|   leads to a NULL dereference of "ep" on the next line. Second, if the
|   alloc_event_probe() function returns an error then it leads to an
|   error pointer dereference in the caller.
|
|   Link: https://lkml.kernel.org/r/20210824115150.GI31143@kili
|   Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events")
|   Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
|   Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Add migrate-disabled counter to tracing output.
|   Author: Thomas Gleixner | Date: 2021-09-03 | Files: 3 | Lines: -10/+28
|
|   migrate_disable() forbids task migration to another CPU. It has been
|   available since v5.11 and already has users such as highmem or BPF.
|   It is useful to observe this task state in tracing, which already has
|   other states like the preemption counter.
|
|   Instead of adding the migrate disable counter as a new entry to struct
|   trace_entry, which would extend the whole struct by four bytes, it is
|   squashed into the preempt-disable counter. The lower four bits
|   represent the preemption counter, the upper four bits represent the
|   migrate disable counter. Both counters shouldn't exceed 15 but if they
|   do, there is a safety net which caps the value at 15.
|
|   Add the migrate-disable counter to the trace entry so it shows up in
|   the trace. Due to the users mentioned above, it is already possible to
|   observe it:
|
|   |  bash-1108    [000] ...21    73.950578: rss_stat: mm_id=2213312838 curr=0 type=MM_ANONPAGES size=8192B
|   |  bash-1108    [000] d..31    73.951222: irq_disable: caller=flush_tlb_mm_range+0x115/0x130 parent=ptep_clear_flush+0x42/0x50
|   |  bash-1108    [000] d..31    73.951222: tlb_flush: pages:1 reason:local mm shootdown (3)
|
|   The last value is the migrate-disable counter.
|
|   Things that popped up:
|   - trace_print_lat_context() does not print the migrate counter. Not
|     sure if it should. It is used in "verbose" mode and uses 8 digits
|     and I'm not sure there is something processing the value.
|
|   - trace_define_common_fields() now defines a different variable. This
|     probably breaks things. No idea what to do in order to preserve the
|     old behaviour. Since this is used as a filter it should be split
|     somehow to be able to match both nibbles here.
|
|   Link: https://lkml.kernel.org/r/20210810132625.ylssabmsrkygokuv@linutronix.de
|   Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|   [bigeasy: patch description.]
|   Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|   [ SDR: Removed change to common_preempt_count field name ]
|   Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
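
The nibble packing described in the commit above can be written out as a few lines of arithmetic. This is a sketch of the scheme as the message states it (lower four bits for the preemption counter, upper four for the migrate-disable counter, each capped at 15); `pack_counts()` and `unpack_counts()` are hypothetical helper names, not kernel functions.

```python
def pack_counts(preempt_count: int, migrate_disable: int) -> int:
    """Squash both counters into one byte, capping each at 15."""
    low = min(preempt_count, 0xF)     # lower nibble: preemption counter
    high = min(migrate_disable, 0xF)  # upper nibble: migrate-disable counter
    return (high << 4) | low


def unpack_counts(packed: int) -> tuple:
    """Return (preempt_count, migrate_disable) from the packed byte."""
    return packed & 0xF, (packed >> 4) & 0xF
```

A filter that wants to match only one of the two counters has to mask the corresponding nibble first, which is the concern raised in the "Things that popped up" list.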
* | Merge tag 'kgdb-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux
    Author: Linus Torvalds | Date: 2021-09-07 | Files: 1 | Lines: -3/+9

    Pull kgdb updates from Daniel Thompson:
     "Changes for kgdb/kdb this cycle are dominated by a change from Sumit
      that removes a small (256K) private heap from kdb. This is a change
      I've hoped for ever since I discovered how few users of this heap
      remained in the kernel, so many thanks to Sumit for hunting these
      down.

      The other change is an incremental step towards SPDX headers"

    * tag 'kgdb-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
      kernel: debug: Convert to SPDX identifier
      kdb: Rename members of struct kdbtab_t
      kdb: Simplify kdb_defcmd macro logic
      kdb: Get rid of redundant kdb_register_flags()
      kdb: Rename struct defcmd_set to struct kdb_macro
      kdb: Get rid of custom debug heap allocator
|\ \
| * | kdb: Rename members of struct kdbtab_t
|     Author: Sumit Garg | Date: 2021-07-27 | Files: 1 | Lines: -5/+5
|
|     Remove the redundant prefix "cmd_" from the names of members in
|     struct kdbtab_t for better readability.
|
|     Suggested-by: Doug Anderson <dianders@chromium.org>
|     Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
|     Reviewed-by: Douglas Anderson <dianders@chromium.org>
|     Link: https://lore.kernel.org/r/20210712134620.276667-5-sumit.garg@linaro.org
|     Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
| * | kdb: Get rid of redundant kdb_register_flags()
|     Author: Sumit Garg | Date: 2021-07-27 | Files: 1 | Lines: -3/+9
|
|     Commit e4f291b3f7bb ("kdb: Simplify kdb commands registration")
|     allowed registration of pre-allocated kdb commands with a pointer to
|     struct kdbtab_t. Let's switch other users as well to register
|     pre-allocated kdb commands via:
|
|     - Changing the prototype of kdb_register() to pass a pointer to
|       struct kdbtab_t instead.
|     - Embedding the kdbtab_t structure in kdb_macro_t rather than
|       individual params.
|
|     With these changes kdb_register_flags() becomes redundant and hence
|     is removed. Also, since we have switched all users to register
|     pre-allocated commands, the "is_dynamic" flag in struct kdbtab_t
|     becomes redundant and hence is removed as well.
|
|     Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
|     Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
|     Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|     Reviewed-by: Douglas Anderson <dianders@chromium.org>
|     Link: https://lore.kernel.org/r/20210712134620.276667-3-sumit.garg@linaro.org
|     Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
* | | Merge tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
      Author: Linus Torvalds | Date: 2021-09-05 | Files: 21 | Lines: -172/+1594

      Pull tracing updates from Steven Rostedt:

       - simplify the Kconfig use of FTRACE and TRACE_IRQFLAGS_SUPPORT
       - bootconfig can now start histograms
       - bootconfig supports group/all enabling
       - histograms now can put values in linear size buckets
       - execnames can be passed to synthetic events
       - introduce "event probes" that attach to other events and can
         retrieve data from pointers of fields, or record fields as
         different types (a pointer to a string as a string instead of
         just a hex number)
       - various fixes and clean ups

      * tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (35 commits)
        tracing/doc: Fix table format in histogram code
        selftests/ftrace: Add selftest for testing duplicate eprobes and kprobes
        selftests/ftrace: Add selftest for testing eprobe events on synthetic events
        selftests/ftrace: Add test case to test adding and removing of event probe
        selftests/ftrace: Fix requirement check of README file
        selftests/ftrace: Add clear_dynamic_events() to test cases
        tracing: Add a probe that attaches to trace events
        tracing/probes: Reject events which have the same name of existing one
        tracing/probes: Have process_fetch_insn() take a void * instead of pt_regs
        tracing/probe: Change traceprobe_set_print_fmt() to take a type
        tracing/probes: Use struct_size() instead of defining custom macros
        tracing/probes: Allow for dot delimiter as well as slash for system names
        tracing/probe: Have traceprobe_parse_probe_arg() take a const arg
        tracing: Have dynamic events have a ref counter
        tracing: Add DYNAMIC flag for dynamic events
        tracing: Replace deprecated CPU-hotplug functions.
        MAINTAINERS: Add an entry for os noise/latency
        tracepoint: Fix kerneldoc comments
        bootconfig/tracing/ktest: Update ktest example for boot-time tracing
        tools/bootconfig: Use per-group/all enable option in ftrace2bconf script
        ...
|\ \ \
| * | tracing: Add a probe that attaches to trace events
|     Author: Tzvetomir Stoyanov (VMware) | Date: 2021-08-20 | Files: 8 | Lines: -13/+958
|
|     A new dynamic event is introduced: event probe. The event is
|     attached to an existing tracepoint and uses its fields as arguments.
|     The user can specify a custom format string for the new event,
|     select which tracepoint arguments will be printed and how to print
|     them.
|     An event probe is created by writing a configuration string to the
|     'dynamic_events' ftrace file:
|
|      e[:[SNAME/]ENAME] SYSTEM/EVENT [FETCHARGS]  - Set an event probe
|      -:SNAME/ENAME                               - Delete an event probe
|
|     Where:
|      SNAME     - System name, if omitted 'eprobes' is used.
|      ENAME     - Name of the new event in SNAME, if omitted the
|                  SYSTEM_EVENT is used.
|      SYSTEM    - Name of the system, where the tracepoint is defined,
|                  mandatory.
|      EVENT     - Name of the tracepoint event in SYSTEM, mandatory.
|      FETCHARGS - Arguments:
|       <name>=$<field>[:TYPE] - Fetch the given field of the tracepoint
|                  and print it as the given TYPE with the given name.
|                  Supported types are:
|                   basic types (u8/u16/u32/u64/s8/s16/s32/s64),
|                   hexadecimal types (x8/x16/x32/x64),
|                   "string", "ustring" and bitfield.
|
|     Example, attach an event probe on the openat system call and print
|     the name of the file that will be opened:
|
|      echo "e:esys/eopen syscalls/sys_enter_openat file=\$filename:string" >> dynamic_events
|
|     A new dynamic event is created in the events/esys/eopen/ directory.
|     It can be deleted with:
|
|      echo "-:esys/eopen" >> dynamic_events
|
|     Filters, triggers and histograms can be attached to the new event,
|     and it can be matched in synthetic events. There is one limitation:
|     an event probe cannot be attached to a kprobe, a uprobe or another
|     event probe.
|
|     Link: https://lkml.kernel.org/r/20210812145805.2292326-1-tz.stoyanov@gmail.com
|     Link: https://lkml.kernel.org/r/20210819152825.142428383@goodmis.org
|     Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
|     Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|     Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
|     Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
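
To make the defaults in the syntax above concrete, here is a rough user-space parser for the `e[:[SNAME/]ENAME] SYSTEM/EVENT [FETCHARGS]` form. It is a sketch under stated assumptions: `parse_eprobe()` is a hypothetical helper, it does no type validation, and it does not claim to reproduce the kernel parser's error handling.

```python
def parse_eprobe(line: str) -> dict:
    """Split an event probe definition into its named parts."""
    parts = line.split()
    if len(parts) < 2:
        raise ValueError("expected 'e[:[SNAME/]ENAME] SYSTEM/EVENT [FETCHARGS]'")
    head, target, fetchargs = parts[0], parts[1], parts[2:]
    if head != "e" and not head.startswith("e:"):
        raise ValueError("definition must start with 'e'")

    system, sep, event = target.partition("/")
    if not (sep and system and event):
        raise ValueError("SYSTEM/EVENT is mandatory")

    # Everything after "e:" is "[SNAME/]ENAME"; both parts may be omitted.
    sname, _, ename = head[2:].rpartition("/")

    args = []
    for arg in fetchargs:
        name, eq, rest = arg.partition("=")
        if not (eq and name and rest.startswith("$")):
            raise ValueError(f"bad fetch argument: {arg!r}")
        field, _, type_name = rest[1:].partition(":")
        args.append((name, field, type_name or None))

    return {
        "sname": sname or "eprobes",           # default system name
        "ename": ename or f"{system}_{event}",  # default event name
        "system": system,
        "event": event,
        "args": args,
    }
```

Parsing the example from the commit message, `e:esys/eopen syscalls/sys_enter_openat file=$filename:string`, gives system name `esys`, event name `eopen` and one fetch argument named `file`.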
| * | tracing/probes: Reject events which have the same name of existing one
|     Author: Masami Hiramatsu | Date: 2021-08-19 | Files: 4 | Lines: -2/+36
|
|     Since kprobe_events and uprobe_events only check whether another
|     probe event of the same type has the same name, if the user gives
|     the name of an existing tracepoint event (or of another type of
|     probe event), creating the tracefs entry silently fails (although
|     the event is registered), as below.
|
|     /sys/kernel/tracing # ls events/task/task_rename
|     enable  filter  format  hist  id  trigger
|     /sys/kernel/tracing # echo p:task/task_rename vfs_read >> kprobe_events
|     [  113.048508] Could not create tracefs 'task_rename' directory
|     /sys/kernel/tracing # cat kprobe_events
|     p:task/task_rename vfs_read
|
|     To fix this issue, check whether an existing event has the same name
|     in trace_probe_register_event_call(). If one exists, registration of
|     the new event is rejected.
|
|     Link: https://lkml.kernel.org/r/162936876189.187130.17558311387542061930.stgit@devnote2
|     Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
|     Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | tracing/probes: Have process_fetch_insn() take a void * instead of pt_regs  Steven Rostedt (VMware)  2021-08-19  3  -5/+7

In preparation to allow event probes to use the process_fetch_insn() callback in trace_probe_tmpl.h, change the data passed to it from a pointer to pt_regs, as the event probe will not be using regs, and make it a void pointer instead.

Update the process_fetch_insn() callers for kprobe and uprobe events to have the regs defined in the function and just typecast the void pointer parameter.

Link: https://lkml.kernel.org/r/20210819041842.291622924@goodmis.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
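The pattern this commit message describes, a shared callback that takes an opaque pointer and lets each user cast it back to its own record type, can be sketched in plain C. Everything below is a stand-in: regs_stub, event_rec_stub and the fetch functions are hypothetical names, not the kernel's pt_regs or process_fetch_insn().

```c
/* Stand-ins for the real record types; both are hypothetical. */
struct regs_stub {
    unsigned long ip;
};

struct event_rec_stub {
    unsigned long field;
};

/* One callback signature shared by every kind of probe. */
typedef unsigned long (*fetch_fn)(void *rec);

/* A kprobe-like user: defines its regs locally from the void pointer. */
unsigned long fetch_from_regs(void *rec)
{
    struct regs_stub *regs = rec;

    return regs->ip;
}

/* An event-probe-like user: the same pointer is an event record instead. */
unsigned long fetch_from_event(void *rec)
{
    struct event_rec_stub *ev = rec;

    return ev->field;
}

/* Generic code only forwards the opaque pointer. */
unsigned long run_fetch(fetch_fn fn, void *rec)
{
    return fn(rec);
}
```

The trade-off is the usual one for void pointers: the generic code no longer needs to know about registers, but the compiler can no longer check that a caller passes the record type its callback expects.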
| * | tracing/probe: Change traceprobe_set_print_fmt() to take a type  Steven Rostedt (VMware)  2021-08-19  4  -11/+31

Instead of a boolean "is_return", have traceprobe_set_print_fmt() take a type (currently just PROBE_PRINT_NORMAL and PROBE_PRINT_RETURN). This will simplify adding different types. For example, the event_probe under development will need its own type, as it prints an event and not an IP.

Link: https://lkml.kernel.org/r/20210819041842.104626301@goodmis.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | tracing/probes: Use struct_size() instead of defining custom macros  Steven Rostedt (VMware)  2021-08-18  2  -10/+2

Remove SIZEOF_TRACE_KPROBE() and SIZEOF_TRACE_UPROBE() and use struct_size(), as that is what it is made for. There is no need for custom macros, especially since struct_size() has some extra memory checks for correctness.

Link: https://lkml.kernel.org/r/20210817035027.795000217@goodmis.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
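For readers outside the kernel, the kind of check that struct_size() adds can be sketched in user space. The flex_size() helper below is a hypothetical stand-in that, like struct_size(), saturates to SIZE_MAX on overflow; struct probe and struct probe_arg are illustrative types, not the kernel's trace_kprobe or trace_uprobe.

```c
#include <stdint.h>
#include <stdlib.h>

struct probe_arg {
    int offset;                 /* hypothetical payload */
};

struct probe {
    unsigned int nr_args;
    struct probe_arg args[];    /* flexible array member */
};

/*
 * User-space stand-in for struct_size(): size of a header plus
 * count trailing elements, saturating to SIZE_MAX on overflow.
 */
static size_t flex_size(size_t base, size_t elem, size_t count)
{
    if (elem != 0 && count > (SIZE_MAX - base) / elem)
        return SIZE_MAX;
    return base + elem * count;
}

/* Allocate a probe with room for nr_args arguments, or NULL on failure. */
struct probe *probe_alloc(unsigned int nr_args)
{
    size_t sz = flex_size(sizeof(struct probe),
                          sizeof(struct probe_arg), nr_args);
    struct probe *p;

    if (sz == SIZE_MAX)
        return NULL;            /* the size computation overflowed */

    p = calloc(1, sz);
    if (p)
        p->nr_args = nr_args;
    return p;
}
```

A hand-written macro such as sizeof(header) + n * sizeof(element) silently wraps when n is huge, which is the class of bug the saturating form avoids.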
| * | tracing/probes: Allow for dot delimiter as well as slash for system names  Steven Rostedt (VMware)  2021-08-18  1  -0/+3

Kprobe and uprobe events can add a "system" to the events that are created via the kprobe_events and uprobe_events files respectively. If they do not include a "system" in the name, then the default "kprobes" or "uprobes" is used. The current notation to specify a system for one of these probe events is to add a '/' delimiter in the name, where the content before the '/' will be the system to use, and the content after will be the event name.

    echo 'p:my_system/my_event' > kprobe_events

But this is inconsistent with the way histogram triggers separate their system / event names. The histogram triggers use a '.' delimiter, which can be confusing.

To allow this to be more consistent, as well as keep backward compatibility, allow the kprobe and uprobe events to denote a system name with either a '/' or a '.'.

That is:

    echo 'p:my_system/my_event' > kprobe_events

is equivalent to:

    echo 'p:my_system.my_event' > kprobe_events

Link: https://lore.kernel.org/linux-trace-devel/20210813004448.51c7de69ce432d338f4d226b@kernel.org/
Link: https://lkml.kernel.org/r/20210817035027.580493202@goodmis.org
Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
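The delimiter rule from this commit message can be sketched as a small in-place splitter. The function name split_probe_name() is hypothetical, and checking for '/' before '.' is an assumption made for this sketch rather than a quote of the kernel source.

```c
#include <string.h>

/*
 * Split "SYSTEM/EVENT" or "SYSTEM.EVENT" in place.
 * On return *system points at the system name (NULL if none was given,
 * in which case the caller would fall back to its default) and *event
 * points at the event name. Hypothetical helper for illustration.
 */
void split_probe_name(char *name, const char **system, const char **event)
{
    char *sep = strchr(name, '/');

    /* Assumption: '/' takes precedence, '.' is the fallback delimiter. */
    if (!sep)
        sep = strchr(name, '.');

    if (sep) {
        *sep = '\0';
        *system = name;
        *event = sep + 1;
    } else {
        *system = NULL;
        *event = name;
    }
}
```

With this rule, "my_system/my_event" and "my_system.my_event" both yield the system "my_system" and the event "my_event", while a bare "my_event" leaves the system unset.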
| * | tracing/probe: Have traceprobe_parse_probe_arg() take a const arg  Steven Rostedt (VMware)  2021-08-18  4  -36/+31

The two places that call traceprobe_parse_probe_arg() allocate a temporary buffer to copy the argv[i] into, because argv[i] is constant and the traceprobe_parse_probe_arg() will modify it to do the parsing. These two places allocate this buffer and then free it right after calling this function, leaving the onus of this allocation to the caller.

As there's about to be a third user of this function that will have to do the same thing, instead of having the caller allocate the temporary buffer, simply move that allocation into the traceprobe_parse_probe_arg() itself, which will simplify the code of the callers.

Link: https://lkml.kernel.org/r/20210817035027.385422828@goodmis.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
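The refactoring described here, moving the scratch copy from the callers into the parser so that the parser can accept a const string, looks roughly like the following user-space sketch. parse_arg() and its NAME=VALUE grammar are hypothetical and much simpler than the real traceprobe_parse_probe_arg().

```c
#include <stdlib.h>
#include <string.h>

/*
 * Parse "NAME=VALUE" from a constant string into caller-supplied buffers.
 * The parser wants to write a terminator into the text, so it makes and
 * frees its own temporary copy instead of asking every caller to do so.
 * Hypothetical helper; returns 0 on success, -1 on error.
 */
int parse_arg(const char *arg, char *name, size_t name_len,
              char *value, size_t value_len)
{
    size_t len = strlen(arg) + 1;
    char *tmp = malloc(len);    /* the temporary buffer now lives here */
    char *eq;
    int ret = -1;

    if (!tmp)
        return -1;
    memcpy(tmp, arg, len);

    eq = strchr(tmp, '=');
    if (eq && eq != tmp) {
        *eq = '\0';             /* modifies the private copy only */
        if (strlen(tmp) < name_len && strlen(eq + 1) < value_len) {
            strcpy(name, tmp);
            strcpy(value, eq + 1);
            ret = 0;
        }
    }

    free(tmp);
    return ret;
}
```

Callers can now pass a string literal or an argv entry directly, and the allocate, call, free sequence exists in one place rather than at every call site.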
| * | tracing: Have dynamic events have a ref counter  Steven Rostedt (VMware)  2021-08-18  8  -23/+80

As dynamic events are not created by modules, if something is attached to one, calling "try_module_get()" on its "mod" field, is not going to keep the dynamic event from going away.

Since dynamic events do not need the "mod" pointer of the event structure, make a union out of it in order to save memory (there's one structure for each of the thousand+ events in the kernel), and have any event with the DYNAMIC flag set to use a ref counter instead.

Link: https://lore.kernel.org/linux-trace-devel/20210813004448.51c7de69ce432d338f4d226b@kernel.org/
Link: https://lkml.kernel.org/r/20210817035027.174869074@goodmis.org
Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
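The layout idea in this commit message, one slot that holds either a module pointer or a reference count depending on a flag, can be sketched as follows. All names (event_call, module_stub, EVENT_FL_DYNAMIC and the helpers) are hypothetical, and the counters are plain integers for brevity, whereas real kernel code would need atomic operations.

```c
#include <stdbool.h>
#include <stddef.h>

#define EVENT_FL_DYNAMIC 0x1u   /* illustrative flag value */

struct module_stub {
    int refs;                   /* stand-in for a module's use count */
};

struct event_call {
    unsigned int flags;
    union {
        struct module_stub *mod;    /* static events: owner, NULL for core */
        int refcnt;                 /* dynamic events: attachment count */
    };
};

/* Take a reference through whichever mechanism the event uses. */
void event_get(struct event_call *call)
{
    if (call->flags & EVENT_FL_DYNAMIC)
        call->refcnt++;
    else if (call->mod)
        call->mod->refs++;
}

/* Drop a reference taken with event_get(). */
void event_put(struct event_call *call)
{
    if (call->flags & EVENT_FL_DYNAMIC)
        call->refcnt--;
    else if (call->mod)
        call->mod->refs--;
}

/* A dynamic event must not be deleted while something is attached. */
bool dyn_event_busy(const struct event_call *call)
{
    return (call->flags & EVENT_FL_DYNAMIC) && call->refcnt > 0;
}
```

Because the two fields share storage, the flag must be checked before either one is read; reading mod on a dynamic event would interpret the counter as a pointer.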
| * | tracing: Add DYNAMIC flag for dynamic events  Steven Rostedt (VMware)  2021-08-18  4  -6/+8

To differentiate between static and dynamic events, add a new flag DYNAMIC to the event flags that all dynamic events have set. This makes it possible to tell a dynamic event from a static event when attaching to it.

Static events have a mod pointer that references the module they were created in (or NULL for core kernel). This can be incremented when the event has something attached to it. But there exists no such mechanism for dynamic events. This is dangerous as the dynamic events may now disappear without the "attachment" knowing that it no longer exists.

To enforce the dynamic flag, change dyn_event_add() to pass the event that is being created such that it can set the DYNAMIC flag of the event. This helps make sure that no location that creates a dynamic event misses setting this flag.

Link: https://lore.kernel.org/linux-trace-devel/20210813004448.51c7de69ce432d338f4d226b@kernel.org/
Link: https://lkml.kernel.org/r/20210817035026.936958254@goodmis.org
Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | tracing: Replace deprecated CPU-hotplug functions.  Sebastian Andrzej Siewior  2021-08-17  3  -26/+26

The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock().

Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged.

Link: https://lkml.kernel.org/r/20210803141621.780504-37-bigeasy@linutronix.de
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | tracing/boot: Show correct histogram error command  Masami Hiramatsu  2021-08-16  1  -4/+10

Since trigger_process_regex() modifies given trigger actions while parsing, the error message couldn't show what command was passed to the trigger_process_regex() when it returns an error.

To fix that, show the backed up trigger action command instead of parsed buffer.

Link: https://lkml.kernel.org/r/162856126413.203126.9465564928450701424.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | tracing/boot: Support multiple histograms for each event  Masami Hiramatsu  2021-08-16  1  -9/+32

Add support for multiple histograms per event. This allows the user to set multiple histograms on an event.

    ftrace.[instance.INSTANCE.]event.GROUP.EVENT.hist[.N] {
        ...
    }

The 'N' is a string starting with a digit, and it can be omitted for the default histogram.

For example, the multiple hist triggers example in Documentation/trace/histogram.rst can be written as below:

    ftrace.event.net.netif_receive_skb.hist {
        1 {
            keys = skbaddr.hex
            values = len
            filter = len < 0
        }
        2 {
            keys = skbaddr.hex
            values = len
            filter = len > 4096
        }
        3 {
            keys = skbaddr.hex
            values = len
            filter = len == 256
        }
        4 {
            keys = skbaddr.hex
            values = len
        }
        5 {
            keys = len
            values = common_preempt_count
        }
    }

Link: https://lkml.kernel.org/r/162856125628.203126.15846930277378572120.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | tracing/boot: Support multiple handlers for per-event histogram  Masami Hiramatsu  2021-08-16  1  -8/+35

Support multiple handlers for per-event histogram in boot-time tracing. Since the histogram can register multiple handler-actions of the same kind with different parameters, this expands the syntax to support such cases.

With this update, the 'onmax', 'onchange' and 'onmatch' handler subkeys under the per-event histogram option optionally take a numeric subkey as below (see [.N]).

    ftrace.[instance.INSTANCE.]event.GROUP.EVENT.hist {
        onmax|onchange[.N] { var = <VAR>; <ACTION> [= <PARAM>] }
        onmatch[.N] { event = <EVENT>; <ACTION> [= <PARAM>] }
    }

The 'N' must be a digit (or a word starting with a digit).

Thus the user can add several handler-actions to the histogram, for example:

    ftrace.event.SOMEGROUP.SOMEEVENT.hist {
        keys = SOME_ID; lat = common_timestamp.usecs-$ts0
        onmatch.1 {
            event = GROUP1.STARTEVENT1
            trace = latency_event, SOME_ID, $lat
        }
        onmatch.2 {
            event = GROUP2.STARTEVENT2
            trace = latency_event, SOME_ID, $lat
        }
    }

Then, it can trace the elapsed time from GROUP1.STARTEVENT1 to SOMEGROUP.SOMEEVENT, and from GROUP2.STARTEVENT2 to SOMEGROUP.SOMEEVENT with SOME_ID key.

Link: https://lkml.kernel.org/r/162856124905.203126.14913731908137885922.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | tracing/boot: Add per-event histogram action options  Masami Hiramatsu  2021-08-16  1  -0/+231

Add hist-trigger action syntax support to boot-time tracing. Currently, boot-time tracing supports per-event actions as option strings. However, the histogram action has a special syntax and usually needs a long action definition. To make it readable and fit the bootconfig syntax, this introduces new options for histograms.

Here are the histogram action options for boot-time tracing.

    ftrace.[instance.INSTANCE.]event.GROUP.EVENT.hist {
        keys = <KEY>[,...]
        values = <VAL>[,...]
        sort = <SORT-KEY>[,...]
        size = <ENTRIES>
        name = <HISTNAME>
        var { <VAR> = <EXPR> ... }
        pause|continue|clear
        onmax|onchange { var = <VAR>; <ACTION> [= <PARAM>] }
        onmatch { event = <EVENT>; <ACTION> [= <PARAM>] }
        filter = <FILTER>
    }

Where <ACTION> is one of below:

    trace = <EVENT>, <ARG1>[, ...]
    save = <ARG1>[, ...]
    snapshot

Link: https://lkml.kernel.org/r/162856124106.203126.10501871028479029087.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | tracing: Simplify the Kconfig dependency of FTRACE  Masahiro Yamada  2021-08-16  1  -5/+1

The entire FTRACE block is surrounded by 'if TRACING_SUPPORT' ... 'endif'.

Using 'depends on' is a simpler way to guard FTRACE.

Link: https://lkml.kernel.org/r/20210731052233.4703-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | tracing: Allow execnames to be passed as args for synthetic events  Steven Rostedt (VMware)  2021-08-16  1  -4/+42

Allow common_pid.execname to be saved in a variable in one histogram to be passed to another histogram that can pass it as a parameter to a synthetic event.

    ># echo 'hist:keys=pid:__arg__1=common_timestamp.usecs:arg2=common_pid.execname' \
        > events/sched/sched_waking/trigger
    ># echo 'wakeup_lat s32 pid; u64 delta; char wake_comm[]' > synthetic_events
    ># echo 'hist:keys=next_pid:pid=next_pid,delta=common_timestamp.usecs-$__arg__1,exec=$arg2'\
    ':onmatch(sched.sched_waking).trace(wakeup_lat,$pid,$delta,$exec)' \
        > events/sched/sched_switch/trigger

The above is a wake up latency synthetic event setup that passes the execname of the common_pid that woke the task to the scheduling of that task, which triggers a synthetic event that passes the original execname as a parameter to display it.

    ># echo 1 > events/synthetic/enable
    ># cat trace
        <idle>-0     [006] d..4   186.863801: wakeup_lat: pid=1306 delta=65 wake_comm=kworker/u16:3
        <idle>-0     [000] d..4   186.863858: wakeup_lat: pid=163 delta=27 wake_comm=<idle>
        <idle>-0     [001] d..4   186.863903: wakeup_lat: pid=1307 delta=36 wake_comm=kworker/u16:4
        <idle>-0     [000] d..4   186.863927: wakeup_lat: pid=163 delta=5 wake_comm=<idle>
        <idle>-0     [006] d..4   186.863957: wakeup_lat: pid=1306 delta=24 wake_comm=kworker/u16:3
        sshd-1306    [006] d..4   186.864051: wakeup_lat: pid=61 delta=62 wake_comm=<idle>
        <idle>-0     [000] d..4   186.965030: wakeup_lat: pid=609 delta=18 wake_comm=<idle>
        <idle>-0     [006] d..4   186.987582: wakeup_lat: pid=1306 delta=65 wake_comm=kworker/u16:3
        <idle>-0     [000] d..4   186.987639: wakeup_lat: pid=163 delta=27 wake_comm=<idle>

Link: https://lkml.kernel.org/r/20210722142837.458596338@goodmis.org
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | tracing: Have histogram types be constant when possible  Steven Rostedt (VMware)  2021-08-16  1  -18/+14

Instead of kstrdup("const", GFP_KERNEL), have the hist_field type simply assign the constant hist_field->type = "const"; And when the value passed to it is a variable, use "kstrdup_const(var, GFP_KERNEL);" which will just reuse the pointer, without allocating, if the variable is already a constant. This saves on having to allocate when not needed.

All frees of the hist_field->type will need to use kfree_const().

Link: https://lkml.kernel.org/r/20210722142837.280718447@goodmis.org
Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | tracing/histogram: Update the documentation for the buckets modifier  Steven Rostedt (VMware)  2021-08-16  1  -0/+1

Update both the tracefs README file as well as the histogram.rst to include an explanation of what the buckets modifier is and how to use it. Include an example with the wakeup_latency example for both log2 and the buckets modifiers as there was no existing log2 example.

Link: https://lkml.kernel.org/r/20210707213922.167218794@goodmis.org
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | tracing: Add linear buckets to histogram logic  Steven Rostedt (VMware)  2021-08-16  1  -7/+58

There have been several times I wished the histogram logic had a "grouping" feature for the buckets. Currently, each bucket has a size of one. That is, if you trace the amount of requested allocations, each allocation is its own bucket, even if you are interested in what allocates 100 bytes or less, 100 to 200, 200 to 300, etc.

Also, without grouping, it fills up the allocated histogram buckets quickly. If you are tracking latency and don't care whether something is 200 microseconds off or 201 microseconds off, but want to track latencies in groups of, say, 10 microseconds each, this cannot currently be done.

There is a log2 modifier, but that grouping gets too big too fast for a lot of cases.

Introduce a "buckets=SIZE" command to each field where it will record in a rounded number. For example:

    ># echo 'hist:keys=bytes_req.buckets=100:sort=bytes_req' > events/kmem/kmalloc/trigger
    ># cat events/kmem/kmalloc/hist
    # event histogram
    #
    # trigger info: hist:keys=bytes_req.buckets=100:vals=hitcount:sort=bytes_req.buckets=100:size=2048 [active]
    #

    { bytes_req: ~ 0-99 } hitcount:       3149
    { bytes_req: ~ 100-199 } hitcount:       1468
    { bytes_req: ~ 200-299 } hitcount:         39
    { bytes_req: ~ 300-399 } hitcount:        306
    { bytes_req: ~ 400-499 } hitcount:        364
    { bytes_req: ~ 500-599 } hitcount:         32
    { bytes_req: ~ 600-699 } hitcount:         69
    { bytes_req: ~ 700-799 } hitcount:         37
    { bytes_req: ~ 1200-1299 } hitcount:         16
    { bytes_req: ~ 1400-1499 } hitcount:         30
    { bytes_req: ~ 2000-2099 } hitcount:          6
    { bytes_req: ~ 4000-4099 } hitcount:       2168
    { bytes_req: ~ 5000-5099 } hitcount:          6

    Totals:
        Hits: 7690
        Entries: 13
        Dropped: 0

Link: https://lkml.kernel.org/r/20210707213921.980359719@goodmis.org
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
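The grouping shown in the histogram output above (0-99, 100-199, and so on for buckets=100) corresponds to rounding each value down to a multiple of the bucket size. The following is a minimal sketch of that arithmetic; the function names are hypothetical and the code is not taken from the kernel's histogram implementation.

```c
#include <stdint.h>

/* Round val down to the first value of its bucket; size must be non-zero. */
uint64_t bucket_start(uint64_t val, uint64_t size)
{
    return val / size * size;
}

/* Last value that falls into the same bucket as val. */
uint64_t bucket_end(uint64_t val, uint64_t size)
{
    return bucket_start(val, size) + size - 1;
}
```

With a size of 100, a 150-byte request lands in the 100-199 bucket and a 4096-byte request in the 4000-4099 bucket, which matches the ranges printed in the example output. Unlike log2 grouping, the bucket width stays constant across the whole range.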
| * | tracing/boot: Fix a hist trigger dependency for boot time tracing  Masami Hiramatsu  2021-08-16  1  -6/+9

Fixes a build error when CONFIG_HIST_TRIGGERS=n with boot-time tracing. Since the trigger_process_regex() is defined only when CONFIG_HIST_TRIGGERS=y, if it is disabled, the 'actions' event option also must be disabled.

Link: https://lkml.kernel.org/r/162856123376.203126.582144262622247352.stgit@devnote2
Fixes: 81a59555ff15 ("tracing/boot: Add per-event settings")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* | | Merge tag 'printk-for-5.15' of ↵  Linus Torvalds  2021-09-01  1  -2/+0
|\ \ \

git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - Optionally, provide an index of possible printk messages via <debugfs>/printk/index/. It can be used when monitoring important kernel messages on a farm of various hosts. The monitor has to be updated when some messages have changed or are no longer available in a newly deployed kernel.

 - Add printk.console_no_auto_verbose boot parameter. It allows a crash dump to be generated in a reasonable time frame even with slow consoles.

 - Remove printk_safe buffers. The messages are always stored directly to the main logbuffer, even in NMI or recursive context. This also allows syslog operations to be serialized by a mutex instead of a spin lock.

 - Misc clean up and build fixes.

* tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk/index: Fix -Wunused-function warning
  lib/nmi_backtrace: Serialize even messages about idle CPUs
  printk: Add printk.console_no_auto_verbose boot parameter
  printk: Remove console_silent()
  lib/test_scanf: Handle n_bits == 0 in random tests
  printk: syslog: close window between wait and read
  printk: convert @syslog_lock to mutex
  printk: remove NMI tracking
  printk: remove safe buffers
  printk: track/limit recursion
  lib/nmi_backtrace: explicitly serialize banner and regs
  printk: Move the printk() kerneldoc comment to its new home
  printk/index: Fix warning about missing prototypes
  MIPS/asm/printk: Fix build failure caused by printk
  printk: index: Add indexing support to dev_printk
  printk: Userspace format indexing support
  printk: Rework parse_prefix into printk_parse_prefix
  printk: Straighten out log_flags into printk_info_flags
  string_helpers: Escape double quotes in escape_special
  printk/console: Check consistent sequence number when handling race in console_unlock()
| * \ \ Merge branch 'rework/printk_safe-removal' into for-linus  Petr Mladek  2021-08-30  1  -2/+0
| |\ \ \
| | * | | printk: remove NMI tracking  John Ogness  2021-07-26  1  -2/+0

All NMI contexts are handled the same as the safe context: store the message and defer printing. There is no need to have special NMI context tracking for this. Using in_nmi() is enough.

There are several parts of the kernel that are manually calling into the printk NMI context tracking in order to cause general printk deferred printing:

    arch/arm/kernel/smp.c
    arch/powerpc/kexec/crash.c
    kernel/trace/trace.c

For arm/kernel/smp.c and powerpc/kexec/crash.c, provide a new function pair printk_deferred_enter/exit that explicitly achieves the same objective.

For ftrace, remove the printk context manipulation completely. It was added in commit 03fc7f9c99c1 ("printk/nmi: Prevent deadlock when accessing the main log buffer in NMI"). The purpose was to enforce storing messages directly into the ring buffer even in NMI context. It really should have only modified the behavior in NMI context. There is no need for a special behavior any longer. All messages are always stored directly now. The console deferring is handled transparently in vprintk().

Signed-off-by: John Ogness <john.ogness@linutronix.de>
[pmladek@suse.com: Remove special handling in ftrace.c completely.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210715193359.25946-5-john.ogness@linutronix.de
* | | | | Merge tag 'net-next-5.15' of ↵  Linus Torvalds  2021-08-31  1  -12/+100
|\ \ \ \ \

git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:

"Core:

 - Enable memcg accounting for various networking objects.

BPF:

 - Introduce bpf timers.
 - Add perf link and opaque bpf_cookie which the program can read out again, to be used in libbpf-based USDT library.
 - Add bpf_task_pt_regs() helper to access user space pt_regs in kprobes, to help user space stack unwinding.
 - Add support for UNIX sockets for BPF sockmap.
 - Extend BPF iterator support for UNIX domain sockets.
 - Allow BPF TCP congestion control progs and bpf iterators to call bpf_setsockopt(), e.g. to switch to another congestion control algorithm.

Protocols:

 - Support IOAM Pre-allocated Trace with IPv6.
 - Support Management Component Transport Protocol.
 - bridge: multicast: add vlan support.
 - netfilter: add hooks for the SRv6 lightweight tunnel driver.
 - tcp:
    - enable mid-stream window clamping (by user space or BPF)
    - allow data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
    - more accurate DSACK processing for RACK-TLP
 - mptcp:
    - add full mesh path manager option
    - add partial support for MP_FAIL
    - improve use of backup subflows
    - optimize option processing
 - af_unix: add OOB notification support.
 - ipv6: add IFLA_INET6_RA_MTU to expose MTU value advertised by the router.
 - mac80211: Target Wake Time support in AP mode.
 - can: j1939: extend UAPI to notify about RX status.

Driver APIs:

 - Add page frag support in page pool API.
 - Many improvements to the DSA (distributed switch) APIs.
 - ethtool: extend IRQ coalesce uAPI with timer reset modes.
 - devlink: control which auxiliary devices are created.
 - Support CAN PHYs via the generic PHY subsystem.
 - Proper cross-chip support for tag_8021q.
 - Allow TX forwarding for the software bridge data path to be offloaded to capable devices.

Drivers:

 - veth: more flexible channels number configuration.
 - openvswitch: introduce per-cpu upcall dispatch.
 - Add internet mix (IMIX) mode to pktgen.
 - Transparently handle XDP operations in the bonding driver.
 - Add LiteETH network driver.
 - Renesas (ravb):
    - support Gigabit Ethernet IP
 - NXP Ethernet switch (sja1105):
    - fast aging support
    - support for "H" switch topologies
    - traffic termination for ports under VLAN-aware bridge
 - Intel 1G Ethernet
    - support getcrosststamp() with PCIe PTM (Precision Time Measurement) for better time sync
    - support Credit-Based Shaper (CBS) offload, enabling HW traffic prioritization and bandwidth reservation
 - Broadcom Ethernet (bnxt)
    - support pulse-per-second output
    - support larger Rx rings
 - Mellanox Ethernet (mlx5)
    - support ethtool RSS contexts and MQPRIO channel mode
    - support LAG offload with bridging
    - support devlink rate limit API
    - support packet sampling on tunnels
 - Huawei Ethernet (hns3):
    - basic devlink support
    - add extended IRQ coalescing support
    - report extended link state
 - Netronome Ethernet (nfp):
    - add conntrack offload support
 - Broadcom WiFi (brcmfmac):
    - add WPA3 Personal with FT to supported cipher suites
    - support 43752 SDIO device
 - Intel WiFi (iwlwifi):
    - support scanning hidden 6GHz networks
    - support for a new hardware family (Bz)
 - Xen pv driver:
    - harden netfront against malicious backends
 - Qualcomm mobile
    - ipa: refactor power management and enable automatic suspend
    - mhi: move MBIM to WWAN subsystem interfaces

Refactor:

 - Ambient BPF run context and cgroup storage cleanup.
 - Compat rework for ndo_ioctl.

Old code removal:

 - prism54 remove the obsoleted driver, deprecated by the p54 driver.
 - wan: remove sbni/granch driver"

* tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1715 commits)
  net: Add depends on OF_NET for LiteX's LiteETH
  ipv6: seg6: remove duplicated include
  net: hns3: remove unnecessary spaces
  net: hns3: add some required spaces
  net: hns3: clean up a type mismatch warning
  net: hns3: refine function hns3_set_default_feature()
  ipv6: remove duplicated 'net/lwtunnel.h' include
  net: w5100: check return value after calling platform_get_resource()
  net/mlxbf_gige: Make use of devm_platform_ioremap_resourcexxx()
  net: mdio: mscc-miim: Make use of the helper function devm_platform_ioremap_resource()
  net: mdio-ipq4019: Make use of devm_platform_ioremap_resource()
  fou: remove sparse errors
  ipv4: fix endianness issue in inet_rtm_getroute_build_skb()
  octeontx2-af: Set proper errorcode for IPv4 checksum errors
  octeontx2-af: Fix static code analyzer reported issues
  octeontx2-af: Fix mailbox errors in nix_rss_flowkey_cfg
  octeontx2-af: Fix loop in free and unmap counter
  af_unix: fix potential NULL deref in unix_dgram_connect()
  dpaa2-eth: Replace strlcpy with strscpy
  octeontx2-af: Use NDC TX for transmit packet data
  ...
| * \ \ \ \ Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next  Jakub Kicinski  2021-08-30  1  -10/+62
| |\ \ \ \ \

Daniel Borkmann says:

====================
bpf-next 2021-08-31

We've added 116 non-merge commits during the last 17 day(s) which contain a total of 126 files changed, 6813 insertions(+), 4027 deletions(-).

The main changes are:

1) Add opaque bpf_cookie to perf link which the program can read out again, to be used in libbpf-based USDT library, from Andrii Nakryiko.

2) Add bpf_task_pt_regs() helper to access userspace pt_regs, from Daniel Xu.

3) Add support for UNIX stream type sockets for BPF sockmap, from Jiang Wang.

4) Allow BPF TCP congestion control progs to call bpf_setsockopt() e.g. to switch to another congestion control algorithm during init, from Martin KaFai Lau.

5) Extend BPF iterator support for UNIX domain sockets, from Kuniyuki Iwashima.

6) Allow bpf_{set,get}sockopt() calls from setsockopt progs, from Prankur Gupta.

7) Add bpf_get_netns_cookie() helper for BPF_PROG_TYPE_{SOCK_OPS,CGROUP_SOCKOPT} progs, from Xu Liu and Stanislav Fomichev.

8) Support for __weak typed ksyms in libbpf, from Hao Luo.

9) Shrink struct cgroup_bpf by 504 bytes through refactoring, from Dave Marchevsky.

10) Fix a smatch complaint in verifier's narrow load handling, from Andrey Ignatov.

11) Fix BPF interpreter's tail call count limit, from Daniel Borkmann.

12) Big batch of improvements to BPF selftests, from Magnus Karlsson, Li Zhijian, Yucong Sun, Yonghong Song, Ilya Leoshkevich, Jussi Maki, Ilya Leoshkevich, others.

13) Another big batch to revamp XDP samples in order to give them consistent look and feel, from Kumar Kartikeya Dwivedi.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (116 commits)
  MAINTAINERS: Remove self from powerpc BPF JIT
  selftests/bpf: Fix potential unreleased lock
  samples: bpf: Fix uninitialized variable in xdp_redirect_cpu
  selftests/bpf: Reduce more flakyness in sockmap_listen
  bpf: Fix bpf-next builds without CONFIG_BPF_EVENTS
  bpf: selftests: Add dctcp fallback test
  bpf: selftests: Add connect_to_fd_opts to network_helpers
  bpf: selftests: Add sk_state to bpf_tcp_helpers.h
  bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt
  selftests: xsk: Preface options with opt
  selftests: xsk: Make enums lower case
  selftests: xsk: Generate packets from specification
  selftests: xsk: Generate packet directly in umem
  selftests: xsk: Simplify cleanup of ifobjects
  selftests: xsk: Decrease sending speed
  selftests: xsk: Validate tx stats on tx thread
  selftests: xsk: Simplify packet validation in xsk tests
  selftests: xsk: Rename worker_* functions that are not thread entry points
  selftests: xsk: Disassociate umem size with packets sent
  selftests: xsk: Remove end-of-test packet
  ...
====================

Link: https://lore.kernel.org/r/20210830225618.11634-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | | | bpf: Fix bpf-next builds without CONFIG_BPF_EVENTS  Daniel Xu  2021-08-25  1  -2/+0
This commit fixes linker errors along the lines of:

    s390-linux-ld: task_iter.c:(.init.text+0xa4): undefined reference to `btf_task_struct_ids'`

Fix by defining btf_task_struct_ids unconditionally in kernel/bpf/btf.c since there exists code that unconditionally uses btf_task_struct_ids.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/05d94748d9f4b3eecedc4fddd6875418a396e23c.1629942444.git.dxu@dxuuu.xyz