aboutsummaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
* Revert "printk: add kthread console printers"Petr Mladek2022-06-231-307/+22
| | | | | | | | | | | | | | | | | | | | | | This reverts commit 09c5ba0aa2fcfdadb17d045c3ee6f86d69270df7. This reverts commit b87f02307d3cfbda768520f0687c51ca77e14fc3. The testing of 5.19 release candidates revealed missing synchronization between early and regular console functionality. It would be possible to start the console kthreads later as a workaround. But it is clear that console lock serialized console drivers between each other. It opens a big area of possible problems that were not considered by people involved in the development and review. printk() is crucial for debugging kernel issues and console output is very important part of it. The number of consoles is huge and a proper review would take some time. As a result it need to be reverted for 5.19. Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220623145157.21938-6-pmladek@suse.com
* Revert "printk: extend console_lock for per-console locking"Petr Mladek2022-06-231-205/+56
| | | | | | | | | | | | | | | | | | | | This reverts commit 8e274732115f63c1d09136284431b3555bd5cc56. The testing of 5.19 release candidates revealed missing synchronization between early and regular console functionality. It would be possible to start the console kthreads later as a workaround. But it is clear that console lock serialized console drivers between each other. It opens a big area of possible problems that were not considered by people involved in the development and review. printk() is crucial for debugging kernel issues and console output is very important part of it. The number of consoles is huge and a proper review would take some time. As a result it need to be reverted for 5.19. Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220623145157.21938-5-pmladek@suse.com
* Revert "printk: remove @console_locked"Petr Mladek2022-06-231-14/+15
| | | | | | | | | | | | | | | | | | | | This reverts commit ab406816fca009349b89cbde885daf68a8c77e33. The testing of 5.19 release candidates revealed missing synchronization between early and regular console functionality. It would be possible to start the console kthreads later as a workaround. But it is clear that console lock serialized console drivers between each other. It opens a big area of possible problems that were not considered by people involved in the development and review. printk() is crucial for debugging kernel issues and console output is very important part of it. The number of consoles is huge and a proper review would take some time. As a result it need to be reverted for 5.19. Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220623145157.21938-4-pmladek@suse.com
* Revert "printk: Block console kthreads when direct printing will be required"Petr Mladek2022-06-231-3/+1
| | | | | | | | | | | | | | | | | | | | This reverts commit c3230283e2819a69dad2cf7a63143fde8bab8b5c. The testing of 5.19 release candidates revealed missing synchronization between early and regular console functionality. It would be possible to start the console kthreads later as a workaround. But it is clear that console lock serialized console drivers between each other. It opens a big area of possible problems that were not considered by people involved in the development and review. printk() is crucial for debugging kernel issues and console output is very important part of it. The number of consoles is huge and a proper review would take some time. As a result it need to be reverted for 5.19. Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220623145157.21938-3-pmladek@suse.com
* Revert "printk: Wait for the global console lock when the system is going down"Petr Mladek2022-06-235-42/+0
| | | | | | | | | | | | | | | | | | | | This reverts commit b87f02307d3cfbda768520f0687c51ca77e14fc3. The testing of 5.19 release candidates revealed missing synchronization between early and regular console functionality. It would be possible to start the console kthreads later as a workaround. But it is clear that console lock serialized console drivers between each other. It opens a big area of possible problems that were not considered by people involved in the development and review. printk() is crucial for debugging kernel issues and console output is very important part of it. The number of consoles is huge and a proper review would take some time. As a result it need to be reverted for 5.19. Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220623145157.21938-2-pmladek@suse.com
* printk: Wait for the global console lock when the system is going downPetr Mladek2022-06-155-0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | There are reports that the console kthreads block the global console lock when the system is going down, for example, reboot, panic. First part of the solution was to block kthreads in these problematic system states so they stopped handling newly added messages. Second part of the solution is to wait when for the kthreads when they are actively printing. It solves the problem when a message was printed before the system entered the problematic state and the kthreads managed to step in. A busy waiting has to be used because panic() can be called in any context and in an unknown state of the scheduler. There must be a timeout because the kthread might get stuck or sleeping and never release the lock. The timeout 10s is an arbitrary value inspired by the softlockup timeout. Link: https://lore.kernel.org/r/20220610205038.GA3050413@paulmck-ThinkPad-P17-Gen-1 Link: https://lore.kernel.org/r/CAMdYzYpF4FNTBPZsEFeWRuEwSies36QM_As8osPWZSr2q-viEA@mail.gmail.com Signed-off-by: Petr Mladek <pmladek@suse.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20220615162805.27962-3-pmladek@suse.com
* printk: Block console kthreads when direct printing will be requiredPetr Mladek2022-06-151-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are known situations when the console kthreads are not reliable or does not work in principle, for example, early boot, panic, shutdown. For these situations there is the direct (legacy) mode when printk() tries to get console_lock() and flush the messages directly. It works very well during the early boot when the console kthreads are not available at all. It gets more complicated in the other situations when console kthreads might be actively printing and block console_trylock() in printk(). The same problem is in the legacy code as well. Any console_lock() owner could block console_trylock() in printk(). It is solved by a trick that the current console_lock() owner is responsible for printing all pending messages. It is actually the reason why there is the risk of softlockups and why the console kthreads were introduced. The console kthreads use the same approach. They are responsible for printing the messages by definition. So that they handle the messages anytime when they are awake and see new ones. The global console_lock is available when there is nothing to do. It should work well when the problematic context is correctly detected and printk() switches to the direct mode. But it seems that it is not enough in practice. There are reports that the messages are not printed during panic() or shutdown() even though printk() tries to use the direct mode here. The problem seems to be that console kthreads become active in these situation as well. They steel the job before other CPUs are stopped. Then they are stopped in the middle of the job and block the global console_lock. First part of the solution is to block console kthreads when the system is in a problematic state and requires the direct printk() mode. Link: https://lore.kernel.org/r/20220610205038.GA3050413@paulmck-ThinkPad-P17-Gen-1 Link: https://lore.kernel.org/r/CAMdYzYpF4FNTBPZsEFeWRuEwSies36QM_As8osPWZSr2q-viEA@mail.gmail.com Suggested-by: John Ogness <john.ogness@linutronix.de> Tested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220615162805.27962-2-pmladek@suse.com
* Revert "printk: wake up all waiters"John Ogness2022-05-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | This reverts commit 938ba4084abcf6fdd21d9078513c52f8fb9b00d0. The wait queue @log_wait never has exclusive waiters, so there is no need to use wake_up_interruptible_all(). Using wake_up_interruptible() was the correct function to wake all waiters. Since there are no exclusive waiters, erroneously changing wake_up_interruptible() to wake_up_interruptible_all() did not result in any behavior change. However, using wake_up_interruptible_all() on a wait queue without exclusive waiters is fundamentally wrong. Go back to using wake_up_interruptible() to wake all waiters. Signed-off-by: John Ogness <john.ogness@linutronix.de> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220526203056.81123-1-john.ogness@linutronix.de
* Merge tag 'printk-for-5.19' of ↵Linus Torvalds2022-05-257-291/+953
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Offload writing printk() messages on consoles to per-console kthreads. It prevents soft-lockups when an extensive amount of messages is printed. It was observed, for example, during boot of large systems with a lot of peripherals like disks or network interfaces. It prevents live-lockups that were observed, for example, when messages about allocation failures were reported and a CPU handled consoles instead of reclaiming the memory. It was hard to solve even with rate limiting because it would need to take into account the amount of messages and the speed of all consoles. It is a must to have for real time. Otherwise, any printk() might break latency guarantees. The per-console kthreads allow to handle each console on its own speed. Slow consoles do not longer slow down faster ones. And printk() does not longer unpredictably slows down various code paths. There are situations when the kthreads are either not available or not reliable, for example, early boot, suspend, or panic. In these situations, printk() uses the legacy mode and tries to handle consoles immediately. - Add documentation for the printk index. * tag 'printk-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk, tracing: fix console tracepoint printk: remove @console_locked printk: extend console_lock for per-console locking printk: add kthread console printers printk: add functions to prefer direct printing printk: add pr_flush() printk: move buffer definitions into console_emit_next_record() caller printk: refactor and rework printing logic printk: add con_printk() macro for console details printk: call boot_delay_msec() in printk_delay() printk: get caller_id/timestamp after migration disable printk: wake waiters for safe and NMI contexts printk: wake up all waiters printk: add missing memory barrier to wake_up_klogd() printk: cpu sync always disable interrupts printk: rename cpulock functions printk/index: Printk index feature documentation MAINTAINERS: Add printk indexing maintainers on mention of printk_index
| * printk, tracing: fix console tracepointMarco Elver2022-05-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original intent of the 'console' tracepoint per the commit 95100358491a ("printk/tracing: Add console output tracing") had been to "[...] record any printk messages into the trace, regardless of the current console loglevel. This can help correlate (existing) printk debugging with other tracing." Petr points out [1] that calling trace_console_rcuidle() in call_console_driver() had been the wrong thing for a while, because "printk() always used console_trylock() and the message was flushed to the console only when the trylock succeeded. And it was always deferred in NMI or when printed via printk_deferred()." With the commit 09c5ba0aa2fc ("printk: add kthread console printers"), things only got worse, and calls to call_console_driver() no longer happen with typical printk() calls but always appear deferred [2]. As such, the tracepoint can no longer serve its purpose to clearly correlate printk() calls and other tracing, as well as breaks usecases that expect every printk() call to result in a callback of the console tracepoint. Notably, the KFENCE and KCSAN test suites, which want to capture console output and assume a printk() immediately gives us a callback to the console tracepoint. Fix the console tracepoint by moving it into printk_sprint() [3]. One notable difference is that by moving tracing into printk_sprint(), the 'text' will no longer include the "header" (loglevel and timestamp), but only the raw message. Arguably this is less of a problem now that the console tracepoint happens on the printk() call and isn't delayed. Link: https://lore.kernel.org/all/Ym+WqKStCg%2FEHfh3@alley/ [1] Link: https://lore.kernel.org/all/CA+G9fYu2kS0wR4WqMRsj2rePKV9XLgOU1PiXnMvpT+Z=c2ucHA@mail.gmail.com/ [2] Link: https://lore.kernel.org/all/87fslup9dx.fsf@jogness.linutronix.de/ [3] Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by: Marco Elver <elver@google.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: John Ogness <john.ogness@linutronix.de> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220503073844.4148944-1-elver@google.com
| * printk: remove @console_lockedJohn Ogness2022-04-261-15/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The static global variable @console_locked is used to help debug VT code to make sure that certain code paths are running with the console_lock held. However, this information is also available with the static global variable @console_kthreads_blocked (for locking via console_lock()), and the static global variable @console_kthreads_active (for locking via console_trylock()). Remove @console_locked and update is_console_locked() to use the alternative variables. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220421212250.565456-16-john.ogness@linutronix.de
| * printk: extend console_lock for per-console lockingJohn Ogness2022-04-261-56/+205
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently threaded console printers synchronize against each other using console_lock(). However, different console drivers are unrelated and do not require any synchronization between each other. Removing the synchronization between the threaded console printers will allow each console to print at its own speed. But the threaded consoles printers do still need to synchronize against console_lock() callers. Introduce a per-console mutex and a new console boolean field @blocked to provide this synchronization. console_lock() is modified so that it must acquire the mutex of each console in order to set the @blocked field. Console printing threads will acquire their mutex while printing a record. If @blocked was set, the thread will go back to sleep instead of printing. The reason for the @blocked boolean field is so that console_lock() callers do not need to acquire multiple console mutexes simultaneously, which would introduce unnecessary complexity due to nested mutex locking. Also, a new field was chosen instead of adding a new @flags value so that the blocked status could be checked without concern of reading inconsistent values due to @flags updates from other contexts. Threaded console printers also need to synchronize against console_trylock() callers. Since console_trylock() may be called from any context, the per-console mutex cannot be used for this synchronization. (mutex_trylock() cannot be called from atomic contexts.) Introduce a global atomic counter to identify if any threaded printers are active. The threaded printers will also check the atomic counter to identify if the console has been locked by another task via console_trylock(). Note that @console_sem is still used to provide synchronization between console_lock() and console_trylock() callers. A locking overview for console_lock(), console_trylock(), and the threaded printers is as follows (pseudo code): console_lock() { down(&console_sem); for_each_console(con) { mutex_lock(&con->lock); con->blocked = true; mutex_unlock(&con->lock); } /* console_lock acquired */ } console_trylock() { if (down_trylock(&console_sem) == 0) { if (atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0) { /* console_lock acquired */ } } } threaded_printer() { mutex_lock(&con->lock); if (!con->blocked) { /* console_lock() callers blocked */ if (atomic_inc_unless_negative(&console_kthreads_active)) { /* console_trylock() callers blocked */ con->write(); atomic_dec(&console_lock_count); } } mutex_unlock(&con->lock); } The console owner and waiter logic now only applies between contexts that have taken the console_lock via console_trylock(). Threaded printers never take the console_lock, so they do not have a console_lock to handover. Tasks that have used console_lock() will block the threaded printers using a mutex and if the console_lock is handed over to an atomic context, it would be unable to unblock the threaded printers. However, the console_trylock() case is really the only scenario that is interesting for handovers anyway. @panic_console_dropped must change to atomic_t since it is no longer protected exclusively by the console_lock. Since threaded printers remain asleep if they see that the console is locked, they now must be explicitly woken in __console_unlock(). This means wake_up_klogd() calls following a console_unlock() are no longer necessary and are removed. Also note that threaded printers no longer need to check @console_suspended. The check for the @blocked field implicitly covers the suspended console case. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/878rrs6ft7.fsf@jogness.linutronix.de
| * printk: add kthread console printersJohn Ogness2022-04-221-22/+307
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create a kthread for each console to perform console printing. During normal operation (@system_state == SYSTEM_RUNNING), the kthread printers are responsible for all printing on their respective consoles. During non-normal operation, console printing is done as it has been: within the context of the printk caller or within irqwork triggered by the printk caller, referred to as direct printing. Since threaded console printers are responsible for all printing during normal operation, this also includes messages generated via deferred printk calls. If direct printing is in effect during a deferred printk call, the queued irqwork will perform the direct printing. To make it clear that this is the only time that the irqwork will perform direct printing, rename the flag PRINTK_PENDING_OUTPUT to PRINTK_PENDING_DIRECT_OUTPUT. Threaded console printers synchronize against each other and against console lockers by taking the console lock for each message that is printed. Note that the kthread printers do not care about direct printing. They will always try to print if new records are available. They can be blocked by direct printing, but will be woken again once direct printing is finished. Console unregistration is a bit tricky because the associated kthread printer cannot be stopped while the console lock is held. A policy is implemented that states: whichever task clears con->thread (under the console lock) is responsible for stopping the kthread. unregister_console() will clear con->thread while the console lock is held and then stop the kthread after releasing the console lock. For consoles that have implemented the exit() callback, the kthread is stopped before exit() is called. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220421212250.565456-14-john.ogness@linutronix.de
| * printk: add functions to prefer direct printingJohn Ogness2022-04-227-2/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once kthread printing is available, console printing will no longer occur in the context of the printk caller. However, there are some special contexts where it is desirable for the printk caller to directly print out kernel messages. Using pr_flush() to wait for threaded printers is only possible if the caller is in a sleepable context and the kthreads are active. That is not always the case. Introduce printk_prefer_direct_enter() and printk_prefer_direct_exit() functions to explicitly (and globally) activate/deactivate preferred direct console printing. The term "direct console printing" refers to printing to all enabled consoles from the context of the printk caller. The term "prefer" is used because this type of printing is only best effort. If the console is currently locked or other printers are already actively printing, the printk caller will need to rely on the other contexts to handle the printing. This preferred direct printing is how all printing has been handled until now (unless it was explicitly deferred). When kthread printing is introduced, there may be some unanticipated problems due to kthreads being unable to flush important messages. In order to minimize such risks, preferred direct printing is activated for the primary important messages when the system experiences general types of major errors. These are: - emergency reboot/shutdown - cpu and rcu stalls - hard and soft lockups - hung tasks - warn - sysrq Note that since kthread printing does not yet exist, no behavior changes result from this commit. This is only implementing the counter and marking the various places where preferred direct printing is active. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> # for RCU Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220421212250.565456-13-john.ogness@linutronix.de
| * printk: add pr_flush()John Ogness2022-04-221-0/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide a might-sleep function to allow waiting for console printers to catch up to the latest logged message. Use pr_flush() whenever it is desirable to get buffered messages printed before continuing: suspend_console(), resume_console(), console_stop(), console_start(), console_unblank(). Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220421212250.565456-12-john.ogness@linutronix.de
| * printk: move buffer definitions into console_emit_next_record() callerJohn Ogness2022-04-221-17/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extended consoles print extended messages and do not print messages about dropped records. Non-extended consoles print "normal" messages as well as extra messages about dropped records. Currently the buffers for these various message types are defined within the functions that might use them and their usage is based upon the CON_EXTENDED flag. This will be a problem when moving to kthread printers because each printer must be able to provide its own buffers. Move all the message buffer definitions outside of console_emit_next_record(). The caller knows if extended or dropped messages should be printed and can specify the appropriate buffers to use. The console_emit_next_record() and call_console_driver() functions can know what to print based on whether specified buffers are non-NULL. With this change, buffer definition/allocation/specification is separated from the code that does the various types of string printing. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220421212250.565456-11-john.ogness@linutronix.de
| * printk: refactor and rework printing logicJohn Ogness2022-04-221-213/+228
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Refactor/rework printing logic in order to prepare for moving to threaded console printing. - Move @console_seq into struct console so that the current "position" of each console can be tracked individually. - Move @console_dropped into struct console so that the current drop count of each console can be tracked individually. - Modify printing logic so that each console independently loads, prepares, and prints its next record. - Remove exclusive_console logic. Since console positions are handled independently, replaying past records occurs naturally. - Update the comments explaining why preemption is disabled while printing from printk() context. With these changes, there is a change in behavior: the console replaying the log (formerly exclusive console) will no longer block other consoles. New messages appear on the other consoles while the newly added console is still replaying. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220421212250.565456-10-john.ogness@linutronix.de
| * printk: add con_printk() macro for console detailsJohn Ogness2022-04-221-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It is useful to generate log messages that include details about the related console. Rather than duplicate the code to assemble the details, put that code into a macro con_printk(). Once console printers become threaded, this macro will find more users. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220421212250.565456-9-john.ogness@linutronix.de
| * printk: call boot_delay_msec() in printk_delay()John Ogness2022-04-221-3/+4
| | | | | | | | | | | | | | | | | | | | | | boot_delay_msec() is always called immediately before printk_delay() so just call it from within printk_delay(). Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220421212250.565456-8-john.ogness@linutronix.de
| * printk: get caller_id/timestamp after migration disableJohn Ogness2022-04-221-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the local CPU timestamp and caller_id for the record are collected while migration is enabled. Since this information is CPU-specific, it should be collected with migration disabled. Migration is disabled immediately after collecting this information anyway, so just move the information collection to after the migration disabling. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220421212250.565456-7-john.ogness@linutronix.de
| * printk: wake waiters for safe and NMI contextsJohn Ogness2022-04-221-12/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When printk() is called from safe or NMI contexts, it will directly store the record (vprintk_store()) and then defer the console output. However, defer_console_output() only causes console printing and does not wake any waiters of new records. Wake waiters from defer_console_output() so that they also are aware of the new records from safe and NMI contexts. Fixes: 03fc7f9c99c1 ("printk/nmi: Prevent deadlock when accessing the main log buffer in NMI") Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220421212250.565456-6-john.ogness@linutronix.de
| * printk: wake up all waitersJohn Ogness2022-04-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | There can be multiple tasks waiting for new records. They should all be woken. Use wake_up_interruptible_all() instead of wake_up_interruptible(). Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220421212250.565456-5-john.ogness@linutronix.de
| * printk: add missing memory barrier to wake_up_klogd()John Ogness2022-04-221-3/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is important that any new records are visible to preparing waiters before the waker checks if the wait queue is empty. Otherwise it is possible that: - there are new records available - the waker sees an empty wait queue and does not wake - the preparing waiter sees no new records and begins to wait This is exactly the problem that the function description of waitqueue_active() warns about. Use wq_has_sleeper() instead of waitqueue_active() because it includes the necessary full memory barrier. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220421212250.565456-4-john.ogness@linutronix.de
| * printk: rename cpulock functionsJohn Ogness2022-04-221-35/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the printk cpulock is CPU-reentrant and since it is used in all contexts, its usage must be carefully considered and most likely will require programming locklessly. To avoid mistaking the printk cpulock as a typical lock, rename it to cpu_sync. The main functions then become: printk_cpu_sync_get_irqsave(flags); printk_cpu_sync_put_irqrestore(flags); Add extra notes of caution in the function description to help developers understand the requirements for correct usage. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220421212250.565456-2-john.ogness@linutronix.de
* | Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds2022-05-241-3/+4
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull page cache updates from Matthew Wilcox: - Appoint myself page cache maintainer - Fix how scsicam uses the page cache - Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS - Remove the AOP flags entirely - Remove pagecache_write_begin() and pagecache_write_end() - Documentation updates - Convert several address_space operations to use folios: - is_dirty_writeback - readpage becomes read_folio - releasepage becomes release_folio - freepage becomes free_folio - Change filler_t to require a struct file pointer be the first argument like ->read_folio * tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits) nilfs2: Fix some kernel-doc comments Appoint myself page cache maintainer fs: Remove aops->freepage secretmem: Convert to free_folio nfs: Convert to free_folio orangefs: Convert to free_folio fs: Add free_folio address space operation fs: Convert drop_buffers() to use a folio fs: Change try_to_free_buffers() to take a folio jbd2: Convert release_buffer_page() to use a folio jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio reiserfs: Convert release_buffer_page() to use a folio fs: Remove last vestiges of releasepage ubifs: Convert to release_folio reiserfs: Convert to release_folio orangefs: Convert to release_folio ocfs2: Convert to release_folio nilfs2: Remove comment about releasepage nfs: Convert to release_folio jfs: Convert to release_folio ...
| * | mm,fs: Remove aops->readpageMatthew Wilcox (Oracle)2022-05-091-3/+2
| | | | | | | | | | | | | | | | | | | | | With all implementations of aops->readpage converted to aops->read_folio, we can stop checking whether it's set and remove the member from aops. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
| * | fs: Introduce aops->read_folioMatthew Wilcox (Oracle)2022-05-091-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | Change all the callers of ->readpage to call ->read_folio in preference, if it exists. This is a transitional duplication, and will be removed by the end of the series. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
* | | Merge tag 'pm-5.19-rc1' of ↵Linus Torvalds2022-05-245-66/+49
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add support for 'artificial' Energy Models in which power numbers for different entities may be in different scales, add support for some new hardware, fix bugs and clean up code in multiple places. Specifics: - Update the Energy Model support code to allow the Energy Model to be artificial, which means that the power values may not be on a uniform scale with other devices providing power information, and update the cpufreq_cooling and devfreq_cooling thermal drivers to support artificial Energy Models (Lukasz Luba). - Make DTPM check the Energy Model type (Lukasz Luba). - Fix policy counter decrementation in cpufreq if Energy Model is in use (Pierre Gondois). - Add CPU-based scaling support to passive devfreq governor (Saravana Kannan, Chanwoo Choi). - Update the rk3399_dmc devfreq driver (Brian Norris). - Export dev_pm_ops instead of suspend() and resume() in the IIO chemical scd30 driver (Jonathan Cameron). - Add namespace variants of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and PM-runtime counterparts (Jonathan Cameron). - Move symbol exports in the IIO chemical scd30 driver into the IIO_SCD30 namespace (Jonathan Cameron). - Avoid device PM-runtime usage count underflows (Rafael Wysocki). - Allow dynamic debug to control printing of PM messages (David Cohen). - Fix some kernel-doc comments in hibernation code (Yang Li, Haowen Bai). - Preserve ACPI-table override during hibernation (Amadeusz Sławiński). - Improve support for suspend-to-RAM for PSCI OSI mode (Ulf Hansson). - Make Intel RAPL power capping driver support the RaptorLake and AlderLake N processors (Zhang Rui, Sumeet Pawnikar). - Remove redundant store to value after multiply in the RAPL power capping driver (Colin Ian King). - Add AlderLake processor support to the intel_idle driver (Zhang Rui). - Fix regression leading to no genpd governor in the PSCI cpuidle driver and fix the riscv-sbi cpuidle driver to allow a genpd governor to be used (Ulf Hansson). - Fix cpufreq governor clean up code to avoid using kfree() directly to free kobject-based items (Kevin Hao). - Prepare cpufreq for powerpc's asm/prom.h cleanup (Christophe Leroy). - Make intel_pstate notify frequency invariance code when no_turbo is turned on and off (Chen Yu). - Add Sapphire Rapids OOB mode support to intel_pstate (Srinivas Pandruvada). - Make cpufreq avoid unnecessary frequency updates due to mismatch between hardware and the frequency table (Viresh Kumar). - Make remove_cpu_dev_symlink() clear the real_cpus mask to simplify code (Viresh Kumar). - Rearrange cpufreq_offline() and cpufreq_remove_dev() to make the calling convention for some driver callbacks consistent (Rafael Wysocki). - Avoid accessing half-initialized cpufreq policies from the show() and store() sysfs functions (Schspa Shi). - Rearrange cpufreq_offline() to make the calling convention for some driver callbacks consistent (Schspa Shi). - Update CPPC handling in cpufreq (Pierre Gondois). - Extend dev_pm_domain_detach() doc (Krzysztof Kozlowski). - Move genpd's time-accounting to ktime_get_mono_fast_ns() (Ulf Hansson). - Improve the way genpd deals with its governors (Ulf Hansson). - Update the turbostat utility to version 2022.04.16 (Len Brown, Dan Merillat, Sumeet Pawnikar, Zephaniah E. Loss-Cutler-Hull, Chen Yu)" * tag 'pm-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (94 commits) PM: domains: Trust domain-idle-states from DT to be correct by genpd PM: domains: Measure power-on/off latencies in genpd based on a governor PM: domains: Allocate governor data dynamically based on a genpd governor PM: domains: Clean up some code in pm_genpd_init() and genpd_remove() PM: domains: Fix initialization of genpd's next_wakeup PM: domains: Fixup QoS latency measurements for IRQ safe devices in genpd PM: domains: Measure suspend/resume latencies in genpd based on governor PM: domains: Move the next_wakeup variable into the struct gpd_timing_data PM: domains: Allocate gpd_timing_data dynamically based on governor PM: domains: Skip another warning in irq_safe_dev_in_sleep_domain() PM: domains: Rename irq_safe_dev_in_no_sleep_domain() in genpd PM: domains: Don't check PM_QOS_FLAG_NO_POWER_OFF in genpd PM: domains: Drop redundant code for genpd always-on governor PM: domains: Add GENPD_FLAG_RPM_ALWAYS_ON for the always-on governor powercap: intel_rapl: remove redundant store to value after multiply cpufreq: CPPC: Enable dvfs_possible_from_any_cpu cpufreq: CPPC: Enable fast_switch ACPI: CPPC: Assume no transition latency if no PCCT ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported ACPI: CPPC: Check _OSC for flexible address space ...
| | \ \
| | \ \
| *-. \ \ Merge branches 'pm-em' and 'pm-cpuidle'Rafael J. Wysocki2022-05-231-29/+36
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Marge Energy Model support updates and cpuidle updates for 5.19-rc1: - Update the Energy Model support code to allow the Energy Model to be artificial, which means that the power values may not be on a uniform scale with other devices providing power information, and update the cpufreq_cooling and devfreq_cooling thermal drivers to support artificial Energy Models (Lukasz Luba). - Make DTPM check the Energy Model type (Lukasz Luba). - Fix policy counter decrementation in cpufreq if Energy Model is in use (Pierre Gondois). - Add AlderLake processor support to the intel_idle driver (Zhang Rui). - Fix regression leading to no genpd governor in the PSCI cpuidle driver and fix the riscv-sbi cpuidle driver to allow a genpd governor to be used (Ulf Hansson). * pm-em: PM: EM: Decrement policy counter powercap: DTPM: Check for Energy Model type thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling Documentation: EM: Add artificial EM registration description PM: EM: Remove old debugfs files and print all 'flags' PM: EM: Change the order of arguments in the .active_power() callback PM: EM: Use the new .get_cost() callback while registering EM PM: EM: Add artificial EM flag PM: EM: Add .get_cost() callback * pm-cpuidle: cpuidle: riscv-sbi: Fix code to allow a genpd governor to be used cpuidle: psci: Fix regression leading to no genpd governor intel_idle: Add AlderLake support
| | * | | | PM: EM: Decrement policy counterPierre Gondois2022-05-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit e458716a92b57 ("PM: EM: Mark inefficiencies in CPUFreq"), cpufreq_cpu_get() is called without a cpufreq_cpu_put(), permanently increasing the reference counts of the policy struct. Decrement the reference count once the policy struct is not used anymore. Fixes: e458716a92b57 ("PM: EM: Mark inefficiencies in CPUFreq") Tested-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Vincent Donnefort <vincent.donnefort@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | PM: EM: Remove old debugfs files and print all 'flags'Lukasz Luba2022-04-131-19/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Energy Model gets more bits used in 'flags'. Avoid adding another debugfs file just to print what is the status of a new flag. Simply remove old debugfs files and add one generic which prints all flags as a hex value. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | PM: EM: Change the order of arguments in the .active_power() callbackLukasz Luba2022-04-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The .active_power() callback passes the device pointer when it's called. Aligned with a convetion present in other subsystems and pass the 'dev' as a first argument. It looks more cleaner. Adjust all affected drivers which implement that API callback. Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | PM: EM: Use the new .get_cost() callback while registering EMLukasz Luba2022-04-131-11/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Energy Model (EM) allows to provide the 'cost' values when the device driver provides the .get_cost() optional callback. This removes restriction which is in the EM calculation function of the 'cost' for each performance state. Now, the driver is in charge of providing the right values which are then used by Energy Aware Scheduler. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | PM: EM: Add artificial EM flagPierre Gondois2022-04-131-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Energy Model (EM) can be used on platforms which are missing real power information. Those platforms would implement .get_cost() which populates needed values for the Energy Aware Scheduler (EAS). The EAS doesn't use 'power' fields from EM, but other frameworks might use them. Thus, to avoid miss-usage of this specific type of EM, introduce a new flags which can be checked by other frameworks. Signed-off-by: Pierre Gondois <Pierre.Gondois@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | | | |
| | \ \ \ \
| | \ \ \ \
| | \ \ \ \
| *---. \ \ \ \ Merge branches 'pm-core', 'pm-sleep' and 'powercap'Rafael J. Wysocki2022-05-234-37/+13
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge PM core changes, updates related to system sleep and power capping updates for 5.19-rc1: - Export dev_pm_ops instead of suspend() and resume() in the IIO chemical scd30 driver (Jonathan Cameron). - Add namespace variants of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and PM-runtime counterparts (Jonathan Cameron). - Move symbol exports in the IIO chemical scd30 driver into the IIO_SCD30 namespace (Jonathan Cameron). - Avoid device PM-runtime usage count underflows (Rafael Wysocki). - Allow dynamic debug to control printing of PM messages (David Cohen). - Fix some kernel-doc comments in hibernation code (Yang Li, Haowen Bai). - Preserve ACPI-table override during hibernation (Amadeusz Sławiński). - Improve support for suspend-to-RAM for PSCI OSI mode (Ulf Hansson). - Make Intel RAPL power capping driver support the RaptorLake and AlderLake N processors (Zhang Rui, Sumeet Pawnikar). - Remove redundant store to value after multiply in the RAPL power capping driver (Colin Ian King). * pm-core: PM: runtime: Avoid device usage count underflows iio: chemical: scd30: Move symbol exports into IIO_SCD30 namespace PM: core: Add NS varients of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and runtime pm equiv iio: chemical: scd30: Export dev_pm_ops instead of suspend() and resume() * pm-sleep: cpuidle: PSCI: Improve support for suspend-to-RAM for PSCI OSI mode PM: runtime: Allow to call __pm_runtime_set_status() from atomic context PM: hibernate: Don't mark comment as kernel-doc x86/ACPI: Preserve ACPI-table override during hibernation PM: hibernate: Fix some kernel-doc comments PM: sleep: enable dynamic debug support within pm_pr_dbg() PM: sleep: Narrow down -DDEBUG on kernel/power/ files * powercap: powercap: intel_rapl: remove redundant store to value after multiply powercap: intel_rapl: add support for ALDERLAKE_N powercap: RAPL: Add Power Limit4 support for RaptorLake powercap: intel_rapl: add support for RaptorLake
| | | * | | | | | PM: hibernate: Don't mark comment as kernel-docHaowen Bai2022-04-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the comment to a normal (non-kernel-doc) comment to avoid these kernel-doc warnings: kernel/power/snapshot.c:335: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Data types related to memory bitmaps. Signed-off-by: Haowen Bai <baihaowen@meizu.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * | | | | | PM: hibernate: Fix some kernel-doc commentsYang Li2022-04-131-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add parameter description in alloc_rtree_node() kernel-doc comment and fix several inconsistent function name descriptions. Remove some warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'. kernel/power/snapshot.c:438: warning: Function parameter or member 'gfp_mask' not described in 'alloc_rtree_node' kernel/power/snapshot.c:438: warning: Function parameter or member 'safe_needed' not described in 'alloc_rtree_node' kernel/power/snapshot.c:438: warning: Function parameter or member 'ca' not described in 'alloc_rtree_node' kernel/power/snapshot.c:438: warning: Function parameter or member 'list' not described in 'alloc_rtree_node' kernel/power/snapshot.c:916: warning: expecting prototype for memory_bm_rtree_next_pfn(). Prototype was for memory_bm_next_pfn() instead kernel/power/snapshot.c:1947: warning: expecting prototype for alloc_highmem_image_pages(). Prototype was for alloc_highmem_pages() instead kernel/power/snapshot.c:2230: warning: expecting prototype for load header(). Prototype was for load_header() instead Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> [ rjw: Comment adjustments to avoid line breaks ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * | | | | | PM: sleep: enable dynamic debug support within pm_pr_dbg()David Cohen2022-04-131-29/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently pm_pr_dbg() is used to filter kernel pm debug messages based on pm_debug_messages_on flag. The problem is if we enable/disable this flag it will affect all pm_pr_dbg() calls at once, so we can't individually control them. This patch changes pm_pr_dbg() implementation as such: - If pm_debug_messages_on is enabled, print the message. - If pm_debug_messages_on is disabled and CONFIG_DYNAMIC_DEBUG is enabled, only print the messages explicitly enabled on /sys/kernel/debug/dynamic_debug/control. - If pm_debug_messages_on is disabled and CONFIG_DYNAMIC_DEBUG is disabled, don't print the message. Signed-off-by: David Cohen <dacohen@pm.me> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * | | | | | PM: sleep: Narrow down -DDEBUG on kernel/power/ filesDavid Cohen2022-04-132-4/+5
| | | | |/ / / / | | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The macro -DDEBUG is broadly enabled on kernel/power/ directory if CONFIG_DYNAMIC_DEBUG is enabled. As side effect all debug messages using pr_debug() and dev_dbg() are enabled by default on dynamic debug. We're reworking pm_pr_dbg() to support dynamic debug, where pm_pr_dbg() will print message if either pm_debug_messages_on flag is set or if it's explicitly enabled on dynamic debug's control. That means if we let -DDEBUG broadly set, the pm_debug_messages_on flag will be bypassed by default on pm_pr_dbg() if dynamic debug is also enabled. The files that directly use pr_debug() and dev_dbg() on kernel/power/ are: - swap.c - snapshot.c - energy_model.c And those files do not use pm_pr_dbg(). So if we limit -DDEBUG to them, we keep the same functional behavior while allowing the pm_pr_dbg() refactor. Signed-off-by: David Cohen <dacohen@pm.me> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | | | | | Merge tag 'seccomp-v5.19-rc1' of ↵Linus Torvalds2022-05-241-3/+41
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: - Rework USER_NOTIF notification ordering and kill logic (Sargun Dhillon) - Improved PTRACE_O_SUSPEND_SECCOMP selftest (Jann Horn) - Gracefully handle failed unshare() in selftests (Yang Guang) - Spelling fix (Colin Ian King) * tag 'seccomp-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: selftests/seccomp: Fix spelling mistake "Coud" -> "Could" selftests/seccomp: Add test for wait killable notifier selftests/seccomp: Refactor get_proc_stat to split out file reading code seccomp: Add wait_killable semantic to seccomp user notifier selftests/seccomp: Ensure that notifications come in FIFO order seccomp: Use FIFO semantics to order notifications selftests/seccomp: Add SKIP for failed unshare() selftests/seccomp: Test PTRACE_O_SUSPEND_SECCOMP without CAP_SYS_ADMIN
| * | | | | | | | seccomp: Add wait_killable semantic to seccomp user notifierSargun Dhillon2022-05-031-2/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This introduces a per-filter flag (SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV) that makes it so that when notifications are received by the supervisor the notifying process will transition to wait killable semantics. Although wait killable isn't a set of semantics formally exposed to userspace, the concept is searchable. If the notifying process is signaled prior to the notification being received by the userspace agent, it will be handled as normal. One quirk about how this is handled is that the notifying process only switches to TASK_KILLABLE if it receives a wakeup from either an addfd or a signal. This is to avoid an unnecessary wakeup of the notifying task. The reasons behind switching into wait_killable only after userspace receives the notification are: * Avoiding unncessary work - Often, workloads will perform work that they may abort (request racing comes to mind). This allows for syscalls to be aborted safely prior to the notification being received by the supervisor. In this, the supervisor doesn't end up doing work that the workload does not want to complete anyways. * Avoiding side effects - We don't want the syscall to be interruptible once the supervisor starts doing work because it may not be trivial to reverse the operation. For example, unmounting a file system may take a long time, and it's hard to rollback, or treat that as reentrant. * Avoid breaking runtimes - Various runtimes do not GC when they are during a syscall (or while running native code that subsequently calls a syscall). If many notifications are blocked, and not picked up by the supervisor, this can get the application into a bad state. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220503080958.20220-2-sargun@sargun.me
| * | | | | | | | seccomp: Use FIFO semantics to order notificationsSargun Dhillon2022-04-291-1/+1
| | |_|/ / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, the seccomp notifier used LIFO semantics, where each notification would be added on top of the stack, and notifications were popped off the top of the stack. This could result one process that generates a large number of notifications preventing other notifications from being handled. This patch moves from LIFO (stack) semantics to FIFO (queue semantics). Signed-off-by: Sargun Dhillon <sargun@sargun.me> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220428015447.13661-1-sargun@sargun.me
* | | | | | | | Merge tag 'kernel-hardening-v5.19-rc1' of ↵Linus Torvalds2022-05-242-43/+64
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kernel hardening updates from Kees Cook: - usercopy hardening expanded to check other allocation types (Matthew Wilcox, Yuanzheng Song) - arm64 stackleak behavioral improvements (Mark Rutland) - arm64 CFI code gen improvement (Sami Tolvanen) - LoadPin LSM block dev API adjustment (Christoph Hellwig) - Clang randstruct support (Bill Wendling, Kees Cook) * tag 'kernel-hardening-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (34 commits) loadpin: stop using bdevname mm: usercopy: move the virt_addr_valid() below the is_vmalloc_addr() gcc-plugins: randstruct: Remove cast exception handling af_unix: Silence randstruct GCC plugin warning niu: Silence randstruct warnings big_keys: Use struct for internal payload gcc-plugins: Change all version strings match kernel randomize_kstack: Improve docs on requirements/rationale lkdtm/stackleak: fix CONFIG_GCC_PLUGIN_STACKLEAK=n arm64: entry: use stackleak_erase_on_task_stack() stackleak: add on/off stack variants lkdtm/stackleak: check stack boundaries lkdtm/stackleak: prevent unexpected stack usage lkdtm/stackleak: rework boundary management lkdtm/stackleak: avoid spurious failure stackleak: rework poison scanning stackleak: rework stack high bound handling stackleak: clarify variable names stackleak: rework stack low bound handling stackleak: remove redundant check ...
| * | | | | | | | stackleak: add on/off stack variantsMark Rutland2022-05-081-3/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The stackleak_erase() code dynamically handles being on a task stack or another stack. In most cases, this is a fixed property of the caller, which the caller is aware of, as an architecture might always return using the task stack, or might always return using a trampoline stack. This patch adds stackleak_erase_on_task_stack() and stackleak_erase_off_task_stack() functions which callers can use to avoid on_thread_stack() check and associated redundant work when the calling stack is known. The existing stackleak_erase() is retained as a safe default. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220427173128.2603085-13-mark.rutland@arm.com
| * | | | | | | | stackleak: rework poison scanningMark Rutland2022-05-081-14/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we over-estimate the region of stack which must be erased. To determine the region to be erased, we scan downwards for a contiguous block of poison values (or the low bound of the stack). There are a few minor problems with this today: * When we find a block of poison values, we include this block within the region to erase. As this is included within the region to erase, this causes us to redundantly overwrite 'STACKLEAK_SEARCH_DEPTH' (128) bytes with poison. * As the loop condition checks 'poison_count <= depth', it will run an additional iteration after finding the contiguous block of poison, decrementing 'erase_low' once more than necessary. As this is included within the region to erase, this causes us to redundantly overwrite an additional unsigned long with poison. * As we always decrement 'erase_low' after checking an element on the stack, we always include the element below this within the region to erase. As this is included within the region to erase, this causes us to redundantly overwrite an additional unsigned long with poison. Note that this is not a functional problem. As the loop condition checks 'erase_low > task_stack_low', we'll never clobber the STACK_END_MAGIC. As we always decrement 'erase_low' after this, we'll never fail to erase the element immediately above the STACK_END_MAGIC. In total, this can cause us to erase `128 + 2 * sizeof(unsigned long)` bytes more than necessary, which is unfortunate. This patch reworks the logic to find the address immediately above the poisoned region, by finding the lowest non-poisoned address. This is factored into a stackleak_find_top_of_poison() helper both for clarity and so that this can be shared with the LKDTM test in subsequent patches. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220427173128.2603085-8-mark.rutland@arm.com
| * | | | | | | | stackleak: rework stack high bound handlingMark Rutland2022-05-081-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to returning to userspace, we reset current->lowest_stack to a reasonable high bound. Currently we do this by subtracting the arbitrary value `THREAD_SIZE/64` from the top of the stack, for reasons lost to history. Looking at configurations today: * On i386 where THREAD_SIZE is 8K, the bound will be 128 bytes. The pt_regs at the top of the stack is 68 bytes (with 0 to 16 bytes of padding above), and so this covers an additional portion of 44 to 60 bytes. * On x86_64 where THREAD_SIZE is at least 16K (up to 32K with KASAN) the bound will be at least 256 bytes (up to 512 with KASAN). The pt_regs at the top of the stack is 168 bytes, and so this cover an additional 88 bytes of stack (up to 344 with KASAN). * On arm64 where THREAD_SIZE is at least 16K (up to 64K with 64K pages and VMAP_STACK), the bound will be at least 256 bytes (up to 1024 with KASAN). The pt_regs at the top of the stack is 336 bytes, so this can fall within the pt_regs, or can cover an additional 688 bytes of stack. Clearly the `THREAD_SIZE/64` value doesn't make much sense -- in the worst case, this will cause more than 600 bytes of stack to be erased for every syscall, even if actual stack usage were substantially smaller. This patches makes this slightly less nonsensical by consistently resetting current->lowest_stack to the base of the task pt_regs. For clarity and for consistency with the handling of the low bound, the generation of the high bound is split into a helper with commentary explaining why. Since the pt_regs at the top of the stack will be clobbered upon the next exception entry, we don't need to poison these at exception exit. By using task_pt_regs() as the high stack boundary instead of current_top_of_stack() we avoid some redundant poisoning, and the compiler can share the address generation between the poisoning and resetting of `current->lowest_stack`, making the generated code more optimal. It's not clear to me whether the existing `THREAD_SIZE/64` offset was a dodgy heuristic to skip the pt_regs, or whether it was attempting to minimize the number of times stackleak_check_stack() would have to update `current->lowest_stack` when stack usage was shallow at the cost of unconditionally poisoning a small portion of the stack for every exit to userspace. For now I've simply removed the offset, and if we need/want to minimize updates for shallow stack usage it should be easy to add a better heuristic atop, with appropriate commentary so we know what's going on. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220427173128.2603085-7-mark.rutland@arm.com
| * | | | | | | | stackleak: clarify variable namesMark Rutland2022-05-081-16/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The logic within __stackleak_erase() can be a little hard to follow, as `boundary` switches from being the low bound to the high bound mid way through the function, and `kstack_ptr` is used to represent the start of the region to erase while `boundary` represents the end of the region to erase. Make this a little clearer by consistently using clearer variable names. The `boundary` variable is removed, the bounds of the region to erase are described by `erase_low` and `erase_high`, and bounds of the task stack are described by `task_stack_low` and `task_stack_high`. As the same time, remove the comment above the variables, since it is unclear whether it's intended as rationale, a complaint, or a TODO, and is more confusing than helpful. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220427173128.2603085-6-mark.rutland@arm.com
| * | | | | | | | stackleak: rework stack low bound handlingMark Rutland2022-05-081-10/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In stackleak_task_init(), stackleak_track_stack(), and __stackleak_erase(), we open-code skipping the STACK_END_MAGIC at the bottom of the stack. Each case is implemented slightly differently, and only the __stackleak_erase() case is commented. In stackleak_task_init() and stackleak_track_stack() we unconditionally add sizeof(unsigned long) to the lowest stack address. In stackleak_task_init() we use end_of_stack() for this, and in stackleak_track_stack() we use task_stack_page(). In __stackleak_erase() we handle this by detecting if `kstack_ptr` has hit the stack end boundary, and if so, conditionally moving it above the magic. This patch adds a new stackleak_task_low_bound() helper which is used in all three cases, which unconditionally adds sizeof(unsigned long) to the lowest address on the task stack, with commentary as to why. This uses end_of_stack() as stackleak_task_init() did prior to this patch, as this is consistent with the code in kernel/fork.c which initializes the STACK_END_MAGIC value. In __stackleak_erase() we no longer need to check whether we've spilled into the STACK_END_MAGIC value, as stackleak_track_stack() ensures that `current->lowest_stack` stops immediately above this, and similarly the poison scan will stop immediately above this. For stackleak_task_init() and stackleak_track_stack() this results in no change to code generation. For __stackleak_erase() the generated assembly is slightly simpler and shorter. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220427173128.2603085-5-mark.rutland@arm.com
| * | | | | | | | stackleak: remove redundant checkMark Rutland2022-05-081-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In __stackleak_erase() we check that the `erase_low` value derived from `current->lowest_stack` is above the lowest legitimate stack pointer value, but this is already enforced by stackleak_track_stack() when recording the lowest stack value. Remove the redundant check. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220427173128.2603085-4-mark.rutland@arm.com
| * | | | | | | | stackleak: move skip_erasing() check earlierMark Rutland2022-05-081-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In stackleak_erase() we check skip_erasing() after accessing some fields from current. As generating the address of current uses asm which hazards with the static branch asm, this work is always performed, even when the static branch is patched to jump to the return at the end of the function. This patch avoids this redundant work by moving the skip_erasing() check earlier. To avoid complicating initialization within stackleak_erase(), the body of the function is split out into a __stackleak_erase() helper, with the check left in a wrapper function. The __stackleak_erase() helper is marked __always_inline to ensure that this is inlined into stackleak_erase() and not instrumented. Before this patch, on x86-64 w/ GCC 11.1.0 the start of the function is: <stackleak_erase>: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 00 00 48 8b 48 20 mov 0x20(%rax),%rcx 48 8b 80 98 0a 00 00 mov 0xa98(%rax),%rax 66 90 xchg %ax,%ax <------------ static branch 48 89 c2 mov %rax,%rdx 48 29 ca sub %rcx,%rdx 48 81 fa ff 3f 00 00 cmp $0x3fff,%rdx After this patch, on x86-64 w/ GCC 11.1.0 the start of the function is: <stackleak_erase>: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) <--- static branch 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 00 00 48 8b 48 20 mov 0x20(%rax),%rcx 48 8b 80 98 0a 00 00 mov 0xa98(%rax),%rax 48 89 c2 mov %rax,%rdx 48 29 ca sub %rcx,%rdx 48 81 fa ff 3f 00 00 cmp $0x3fff,%rdx Before this patch, on arm64 w/ GCC 11.1.0 the start of the function is: <stackleak_erase>: d503245f bti c d5384100 mrs x0, sp_el0 f9401003 ldr x3, [x0, #32] f9451000 ldr x0, [x0, #2592] d503201f nop <------------------------------- static branch d503233f paciasp cb030002 sub x2, x0, x3 d287ffe1 mov x1, #0x3fff eb01005f cmp x2, x1 After this patch, on arm64 w/ GCC 11.1.0 the start of the function is: <stackleak_erase>: d503245f bti c d503201f nop <------------------------------- static branch d503233f paciasp d5384100 mrs x0, sp_el0 f9401003 ldr x3, [x0, #32] d287ffe1 mov x1, #0x3fff f9451000 ldr x0, [x0, #2592] cb030002 sub x2, x0, x3 eb01005f cmp x2, x1 While this may not be a huge win on its own, moving the static branch will permit further optimization of the body of the function in subsequent patches. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220427173128.2603085-3-mark.rutland@arm.com