| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
"This time the total number of ACPI commits is slightly greater than
the number of cpufreq commits, but Viresh Kumar (who works on cpufreq)
remains the most active patch submitter.
To me, the most significant change is the addition of offline/online
device operations to the driver core (with the Greg's blessing) and
the related modifications of the ACPI core hotplug code. Next are the
freezer updates from Colin Cross that should make the freezing of
tasks a bit less heavy weight.
We also have a couple of regression fixes, a number of fixes for
issues that have not been identified as regressions, two new drivers
and a bunch of cleanups all over.
Highlights:
- Hotplug changes to support graceful hot-removal failures.
It sometimes is necessary to fail device hot-removal operations
gracefully if they cannot be carried out completely. For example,
if memory from a memory module being hot-removed has been allocated
for the kernel's own use and cannot be moved elsewhere, it's
desirable to fail the hot-removal operation in a graceful way
rather than to crash the kernel, but currenty a success or a kernel
crash are the only possible outcomes of an attempted memory
hot-removal. Needless to say, that is not a very attractive
alternative and it had to be addressed.
However, in order to make it work for memory, I first had to make
it work for CPUs and for this purpose I needed to modify the ACPI
processor driver. It's been split into two parts, a resident one
handling the low-level initialization/cleanup and a modular one
playing the actual driver's role (but it binds to the CPU system
device objects rather than to the ACPI device objects representing
processors). That's been sort of like a live brain surgery on a
patient who's riding a bike.
So this is a little scary, but since we found and fixed a couple of
regressions it caused to happen during the early linux-next testing
(a month ago), nobody has complained.
As a bonus we remove some duplicated ACPI hotplug code, because the
ACPI-based CPU hotplug is now going to use the common ACPI hotplug
code.
- Lighter weight freezing of tasks.
These changes from Colin Cross and Mandeep Singh Baines are
targeted at making the freezing of tasks a bit less heavy weight
operation. They reduce the number of tasks woken up every time
during the freezing, by using the observation that the freezer
simply doesn't need to wake up some of them and wait for them all
to call refrigerator(). The time needed for the freezer to decide
to report a failure is reduced too.
Also reintroduced is the check causing a lockdep warining to
trigger when try_to_freeze() is called with locks held (which is
generally unsafe and shouldn't happen).
- cpufreq updates
First off, a commit from Srivatsa S Bhat fixes a resume regression
introduced during the 3.10 cycle causing some cpufreq sysfs
attributes to return wrong values to user space after resume. The
fix is kind of fresh, but also it's pretty obvious once Srivatsa
has identified the root cause.
Second, we have a new freqdomain_cpus sysfs attribute for the
acpi-cpufreq driver to provide information previously available via
related_cpus. From Lan Tianyu.
Finally, we fix a number of issues, mostly related to the
CPUFREQ_POSTCHANGE notifier and cpufreq Kconfig options and clean
up some code. The majority of changes from Viresh Kumar with bits
from Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia,
Arnd Bergmann, and Tang Yuantian.
- ACPICA update
A usual bunch of updates from the ACPICA upstream.
During the 3.4 cycle we introduced support for ACPI 5 extended
sleep registers, but they are only supposed to be used if the
HW-reduced mode bit is set in the FADT flags and the code attempted
to use them without checking that bit. That caused suspend/resume
regressions to happen on some systems. Fix from Lv Zheng causes
those registers to be used only if the HW-reduced mode bit is set.
Apart from this some other ACPICA bugs are fixed and code cleanups
are made by Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and
Zhang Rui.
- cpuidle updates
New driver for Xilinx Zynq processors is added by Michal Simek.
Multidriver support simplification, addition of some missing
kerneldoc comments and Kconfig-related fixes come from Daniel
Lezcano.
- ACPI power management updates
Changes to make suspend/resume work correctly in Xen guests from
Konrad Rzeszutek Wilk, sparse warning fix from Fengguang Wu and
cleanups and fixes of the ACPI device power state selection
routine.
- ACPI documentation updates
Some previously missing pieces of ACPI documentation are added by
Lv Zheng and Aaron Lu (hopefully, that will help people to
uderstand how the ACPI subsystem works) and one outdated doc is
updated by Hanjun Guo.
- Assorted ACPI updates
We finally nailed down the IA-64 issue that was the reason for
reverting commit 9f29ab11ddbf ("ACPI / scan: do not match drivers
against objects having scan handlers"), so we can fix it and move
the ACPI scan handler check added to the ACPI video driver back to
the core.
A mechanism for adding CMOS RTC address space handlers is
introduced by Lan Tianyu to allow some EC-related breakage to be
fixed on some systems.
A spec-compliant implementation of acpi_os_get_timer() is added by
Mika Westerberg.
The evaluation of _STA is added to do_acpi_find_child() to avoid
situations in which a pointer to a disabled device object is
returned instead of an enabled one with the same _ADR value. From
Jeff Wu.
Intel BayTrail PCH (Platform Controller Hub) support is added to
the ACPI driver for Intel Low-Power Subsystems (LPSS) and that
driver is modified to work around a couple of known BIOS issues.
Changes from Mika Westerberg and Heikki Krogerus.
The EC driver is fixed by Vasiliy Kulikov to use get_user() and
put_user() instead of dereferencing user space pointers blindly.
Code cleanups are made by Bjorn Helgaas, Nicholas Mazzuca and Toshi
Kani.
- Assorted power management updates
The "runtime idle" helper routine is changed to take the return
values of the callbacks executed by it into account and to call
rpm_suspend() if they return 0, which allows us to reduce the
overall code bloat a bit (by dropping some code that's not
necessary any more after that modification).
The runtime PM documentation is updated by Alan Stern (to reflect
the "runtime idle" behavior change).
New trace points for PM QoS are added by Sahara
(<keun-o.park@windriver.com>).
PM QoS documentation is updated by Lan Tianyu.
Code cleanups are made and minor issues are addressed by Bernie
Thompson, Bjorn Helgaas, Julius Werner, and Shuah Khan.
- devfreq updates
New driver for the Exynos5-bus device from Abhilash Kesavan.
Minor cleanups, fixes and MAINTAINERS update from MyungJoo Ham,
Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and Wei Yongjun.
- OMAP power management updates
Adaptive Voltage Scaling (AVS) SmartReflex voltage control driver
updates from Andrii Tseglytskyi and Nishanth Menon."
* tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits)
cpufreq: Fix cpufreq regression after suspend/resume
ACPI / PM: Fix possible NULL pointer deref in acpi_pm_device_sleep_state()
PM / Sleep: Warn about system time after resume with pm_trace
cpufreq: don't leave stale policy pointer in cdbs->cur_policy
acpi-cpufreq: Add new sysfs attribute freqdomain_cpus
cpufreq: make sure frequency transitions are serialized
ACPI: implement acpi_os_get_timer() according the spec
ACPI / EC: Add HP Folio 13 to ec_dmi_table in order to skip DSDT scan
ACPI: Add CMOS RTC Operation Region handler support
ACPI / processor: Drop unused variable from processor_perflib.c
cpufreq: tegra: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: s3c64xx: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: omap: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: imx6q: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: exynos: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: dbx500: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: davinci: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: arm-big-little: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: powernow-k8: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: pcc: call CPUFREQ_POSTCHANGE notfier in error cases
...
|
| |\
| | |
| | |
| | |
| | | |
* pm-assorted:
PM / Sleep: Warn about system time after resume with pm_trace
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
pm_trace uses the system's Real Time Clock (RTC) to save the magic
number. The reason for this is that the RTC is the only reliably
available piece of hardware during resume operations where a value
can be set that will survive a reboot.
Consequence is that after a resume (even if it is successful) your
system clock will have a value corresponding to the magic number
instead of the correct date/time! It is therefore advisable to use
a program like ntp-date or rdate to reset the correct date/time from
an external time source when using this trace option.
There is no run-time message to warn users of the consequences of
enabling pm_trace. Adding a warning message to pm_trace_store()
will serve as a reminder to users to set the system date and time
after resume.
Signed-off-by: Shuah Khan <shuah.kh@samsung.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| |\|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* pm-assorted:
PM / QoS: Add pm_qos and dev_pm_qos to events-power.txt
PM / QoS: Add dev_pm_qos_request tracepoints
PM / QoS: Add pm_qos_request tracepoints
PM / QoS: Add pm_qos_update_target/flags tracepoints
PM / QoS: Update Documentation/power/pm_qos_interface.txt
PM / Sleep: Print last wakeup source on failed wakeup_count write
PM / QoS: correct the valid range of pm_qos_class
PM / wakeup: Adjust messaging for wake events during suspend
PM / Runtime: Update .runtime_idle() callback documentation
PM / Runtime: Rework the "runtime idle" helper routine
PM / Hibernate: print physical addresses consistently with other parts of kernel
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Adds tracepoints to pm_qos_add_request, pm_qos_update_request,
pm_qos_remove_request, and pm_qos_update_request_timeout.
It's useful for checking pm_qos_class, value, and timeout_us.
Signed-off-by: Sahara <keun-o.park@windriver.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch adds tracepoints to pm_qos_update_target and
pm_qos_update_flags. It's useful for checking pm qos action,
previous value and current value.
Signed-off-by: Sahara <keun-o.park@windriver.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Commit a938da06 introduced a useful little log message to tell
users/debuggers which wakeup source aborted a suspend. However,
this message is only printed if the abort happens during the
in-kernel suspend path (after writing /sys/power/state).
The full specification of the /sys/power/wakeup_count facility
allows user-space power managers to double-check if wakeups have
already happened before it actually tries to suspend (e.g. while it
was running user-space pre-suspend hooks), by writing the last known
wakeup_count value to /sys/power/wakeup_count. This patch changes
the sysfs handler for that node to also print said log message if
that write fails, so that we can figure out the offending wakeup
source for both kinds of suspend aborts.
Signed-off-by: Julius Werner <jwerner@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The valid start index for pm_qos_array is not 0, but
PM_QOS_CPU_DMA_LATENCY. There is a null_pm_qos at index 0 of
pm_qos_array. However, null_pm_qos is not created as misc device so
that inclusion of 0 index for checking pm_qos_class especially for
file operations is not proper here.
[rjw: Changelog, a bit]
Signed-off-by: Sahara <keun-o.park@windriver.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This adds in a new message to the wakeup code which adds an
indication to the log that suspend was cancelled due to a wake event
occouring during the suspend sequence. It also adjusts the message
printed in suspend.c to reflect the potential that a suspend was
aborted, as opposed to a device failing to suspend.
Without these message adjustments one can end up with a kernel log
that says that a device failed to suspend with no actual device
suspend failures, which can be confusing to the log examiner.
Signed-off-by: Bernie Thompson <bhthompson@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Print physical address info in a style consistent with the %pR style
used elsewhere in the kernel.
Commit 69f1d475cc did this for a similar printk in this file, but I
must have missed this one.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
* freezer:
af_unix: use freezable blocking calls in read
sigtimedwait: use freezable blocking call
nanosleep: use freezable blocking call
futex: use freezable blocking call
select: use freezable blocking call
epoll: use freezable blocking call
binder: use freezable blocking calls
freezer: add new freezable helpers using freezer_do_not_count()
freezer: convert freezable helpers to static inline where possible
freezer: convert freezable helpers to freezer_do_not_count()
freezer: skip waking up tasks with PF_FREEZER_SKIP set
freezer: shorten freezer sleep time using exponential backoff
lockdep: check that no locks held at freeze time
lockdep: remove task argument from debug_check_no_locks_held
freezer: add unsafe versions of freezable helpers for CIFS
freezer: add unsafe versions of freezable helpers for NFS
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Avoid waking up every thread sleeping in a sigtimedwait call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.
This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Avoid waking up every thread sleeping in a nanosleep call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.
This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Avoid waking up every thread sleeping in a futex_wait call during
suspend and resume by calling a freezable blocking call. Previous
patches modified the freezer to avoid sending wakeups to threads
that are blocked in freezable blocking calls.
This call was selected to be converted to a freezable call because
it doesn't hold any locks or release any resources when interrupted
that might be needed by another freezing task or a kernel driver
during suspend, and is a common site where idle userspace tasks are
blocked.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Android goes through suspend/resume very often (every few seconds when
on a busy wifi network with the screen off), and a significant portion
of the energy used to go in and out of suspend is spent in the
freezer. If a task has called freezer_do_not_count(), don't bother
waking it up. If it happens to wake up later it will call
freezer_count() and immediately enter the refrigerator.
Combined with patches to convert freezable helpers to use
freezer_do_not_count() and convert common sites where idle userspace
tasks are blocked to use the freezable helpers, this reduces the
time and energy required to suspend and resume.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
All tasks can easily be frozen in under 10 ms, switch to using
an initial 1 ms sleep followed by exponential backoff until
8 ms. Also convert the printed time to ms instead of centiseconds.
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The only existing caller to debug_check_no_locks_held calls it
with 'current' as the task, and the freezer needs to call
debug_check_no_locks_held but doesn't already have a current
task pointer, so remove the argument. It is already assuming
that the current task is relevant by dumping the current stack
trace as part of the warning.
This was originally part of 6aa9707099c (lockdep: check that
no locks held at freeze time) which was reverted in
dbf520a9d7d4.
Original-author: Mandeep Singh Baines <msb@chromium.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cpuset changes from Tejun Heo:
"cpuset has always been rather odd about its configurations - a cgroup
right after creation didn't allow any task executions before
configuration, changing configuration in the parent modifies the
descendants irreversibly and so on. These behaviors are inherently
nasty and almost hostile against sharing the hierarchy with other
controllers making it very difficult to use in unified hierarchy.
Li is currently in the process of updating the behaviors for
__DEVEL__sane_behavior which is the bulk of changes in this pull
request. It isn't complete yet and the behaviors will change further
but all changes are gated behind sane_behavior. In the process, the
rather hairy work-item punting which was used to work around the
limitations of cgroup descendant iterator was simplified."
* 'for-3.11-cpuset' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: rename @cont to @cgrp
cpuset: fix to migrate mm correctly in a corner case
cpuset: allow to move tasks to empty cpusets
cpuset: allow to keep tasks in empty cpusets
cpuset: introduce effective_{cpumask|nodemask}_cpuset()
cpuset: record old_mems_allowed in struct cpuset
cpuset: remove async hotplug propagation work
cpuset: let hotplug propagation work wait for task attaching
cpuset: re-structure update_cpumask() a bit
cpuset: remove cpuset_test_cpumask()
cpuset: remove unnecessary variable in cpuset_attach()
cpuset: cleanup guarantee_online_{cpus|mems}()
cpuset: remove redundant check in cpuset_cpus_allowed_fallback()
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Cont is short for container. control group was named process container
at first, but then people found container already has a meaning in
linux kernel.
Clean up the leftover variable name @cont.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Before moving tasks out of empty cpusets, update_tasks_nodemask()
is called, which calls do_migrate_pages(xx, from, to). Then those
tasks are moved to an ancestor, and do_migrate_pages() is called
again.
The first time: from = node_to_be_offlined, to = empty.
The second time: from = empty, to = ancestor's nodemask.
so looks like no pages will be migrated.
Fix this by:
- Don't call update_tasks_nodemask() on empty cpusets.
- Pass cs->old_mems_allowed to do_migrate_pages().
v4: added comment in cpuset_hotplug_update_tasks() and rephased comment
in cpuset_attach().
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Currently some cpuset behaviors are not friendly when cpuset is co-mounted
with other cgroup controllers.
Now with this patchset if cpuset is mounted with sane_behavior option,
it behaves differently:
- Tasks will be kept in empty cpusets when hotplug happens and take
masks of ancestors with non-empty cpus/mems, instead of being moved to
an ancestor.
- A task can be moved into an empty cpuset, and again it takes masks of
ancestors, so the user can drop a task into a newly created cgroup without
having to do anything for it.
As tasks can reside in empy cpusets, here're some rules:
- They can be moved to another cpuset, regardless it's empty or not.
- Though it takes masks from ancestors, it takes other configs from the
empty cpuset.
- If the ancestors' masks are changed, those tasks will also be updated
to take new masks.
v2: add documentation in include/linux/cgroup.h
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
To achieve this:
- We call update_tasks_cpumask/nodemask() for empty cpusets when
hotplug happens, instead of moving tasks out of them.
- When a cpuset's masks are changed by writing cpuset.cpus/mems,
we also update tasks in child cpusets which are empty.
v3:
- do propagation work in one place for both hotplug and unplug
v2:
- drop rcu_read_lock before calling update_task_nodemask() and
update_task_cpumask(), instead of using workqueue.
- add documentation in include/linux/cgroup.h
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
effective_cpumask_cpuset() returns an ancestor cpuset which has
non-empty cpumask.
If a cpuset is empty and the tasks in it need to update their
cpus_allowed, they take on the ancestor cpuset's cpumask.
This currently won't change any behavior, but it will later allow us
to keep tasks in empty cpusets.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When we update a cpuset's mems_allowed and thus update tasks'
mems_allowed, it's required to pass the old mems_allowed and new
mems_allowed to cpuset_migrate_mm().
Currently we save old mems_allowed in a temp local variable before
changing cpuset->mems_allowed. This patch changes it by saving
old mems_allowed in cpuset->old_mems_allowed.
This currently won't change any behavior, but it will later allow
us to keep tasks in empty cpusets.
v3: restored "cpuset_attach_nodemask_to = cs->mems_allowed"
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
As we can drop rcu read lock while iterating cgroup hierarchy,
we don't have to do propagation asynchronously via workqueue.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Instead of triggering propagation work in cpuset_attach(), we make
hotplug propagation work wait until there's no task attaching in
progress.
IMO this is more robust. We won't see empty masks in cpuset_attach().
Also it's a preparation for removing propagation work. Without asynchronous
propagation we can't call move_tasks_in_empty_cpuset() in cpuset_attach(),
because otherwise we'll deadlock on cgroup_mutex.
tj: typo fixes.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Check if cpus_allowed is to be changed before calling validate_change().
This won't change any behavior, but later it will allow us to do this:
# mkdir /cpuset/child
# echo $$ > /cpuset/child/tasks /* empty cpuset */
# echo > /cpuset/child/cpuset.cpus /* do nothing, won't fail */
Without this patch, the last operation will fail.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The test is done in set_cpus_allowed_ptr(), so it's redundant.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We can just use oldcs->mems_allowed.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
- We never pass a NULL @cs to these functions.
- The top cpuset always has some online cpus/mems.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
task_cs() will never return NULL.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup changes from Tejun Heo:
"This pull request contains the following changes.
- cgroup_subsys_state (css) reference counting has been converted to
percpu-ref. css is what each resource controller embeds into its
own control structure and perform reference count against. It may
be used in hot paths of various subsystems and is similar to module
refcnt in that aspect. For example, block-cgroup's css refcnting
was showing up a lot in Mikulaus's device-mapper scalability work
and this should alleviate it.
- cgroup subtree iterator has been updated so that RCU read lock can
be released after grabbing reference. This allows simplifying its
users which requires blocking which used to build iteration list
under RCU read lock and then traverse it outside. This pull
request contains simplification of cgroup core and device-cgroup.
A separate pull request will update cpuset.
- Fixes for various bugs including corner race conditions and RCU
usage bugs.
- A lot of cleanups and some prepartory work for the planned unified
hierarchy support."
* 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (48 commits)
cgroup: CGRP_ROOT_SUBSYS_BOUND should also be ignored when mounting an existing hierarchy
cgroup: CGRP_ROOT_SUBSYS_BOUND should be ignored when comparing mount options
cgroup: fix deadlock on cgroup_mutex via drop_parsed_module_refcounts()
cgroup: always use RCU accessors for protected accesses
cgroup: fix RCU accesses around task->cgroups
cgroup: fix RCU accesses to task->cgroups
cgroup: grab cgroup_mutex in drop_parsed_module_refcounts()
cgroup: fix cgroupfs_root early destruction path
cgroup: reserve ID 0 for dummy_root and 1 for unified hierarchy
cgroup: implement for_each_[builtin_]subsys()
cgroup: move init_css_set initialization inside cgroup_mutex
cgroup: s/for_each_subsys()/for_each_root_subsys()/
cgroup: clean up find_css_set() and friends
cgroup: remove cgroup->actual_subsys_mask
cgroup: prefix global variables with "cgroup_"
cgroup: convert CFTYPE_* flags to enums
cgroup: rename cont to cgrp
cgroup: clean up cgroup_serial_nr_cursor
cgroup: convert cgroup_cft_commit() to use cgroup_for_each_descendant_pre()
cgroup: make serial_nr_cursor available throughout cgroup.c
...
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
existing hierarchy
0ce6cba357 ("cgroup: CGRP_ROOT_SUBSYS_BOUND should be ignored when
comparing mount options") only updated the remount path but
CGRP_ROOT_SUBSYS_BOUND should also be ignored when comparing options
while mounting an existing hierarchy. As option mismatch triggers a
warning but doesn't fail the mount without sane_behavior, this only
triggers a spurious warning message.
Fix it by only comparing CGRP_ROOT_OPTION_MASK bits when comparing new
and existing root options.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
1672d04070 ("cgroup: fix cgroupfs_root early destruction path")
introduced CGRP_ROOT_SUBSYS_BOUND which is used to mark completion of
subsys binding on a new root; however, this broke remounts.
cgroup_remount() doesn't allow changing root options via remount and
CGRP_ROOT_SUBSYS_BOUND, which is set on all fully initialized roots,
makes the function reject all remounts.
Fix it by putting the options part in the lower 16 bits of root->flags
and masking the comparions. While at it, make cgroup_remount() emit
an error message explaining why it's rejecting a remount request, so
that it's less of a mystery.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
eb178d06332 ("cgroup: grab cgroup_mutex in
drop_parsed_module_refcounts()") made drop_parsed_module_refcounts()
grab cgroup_mutex to make lockdep assertion in for_each_subsys()
happy. Unfortunately, cgroup_remount() calls the function while
holding cgroup_mutex in its failure path leading to the following
deadlock.
# mount -t cgroup -o remount,memory,blkio cgroup blkio
cgroup: option changes via remount are deprecated (pid=525 comm=mount)
=============================================
[ INFO: possible recursive locking detected ]
3.10.0-rc4-work+ #1 Not tainted
---------------------------------------------
mount/525 is trying to acquire lock:
(cgroup_mutex){+.+.+.}, at: [<ffffffff8110a3e1>] drop_parsed_module_refcounts+0x21/0xb0
but task is already holding lock:
(cgroup_mutex){+.+.+.}, at: [<ffffffff8110e4e1>] cgroup_remount+0x51/0x200
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(cgroup_mutex);
lock(cgroup_mutex);
*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by mount/525:
#0: (&type->s_umount_key#30){+.+...}, at: [<ffffffff811e9a0d>] do_mount+0x2bd/0xa30
#1: (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff8110e4d3>] cgroup_remount+0x43/0x200
#2: (cgroup_mutex){+.+.+.}, at: [<ffffffff8110e4e1>] cgroup_remount+0x51/0x200
#3: (cgroup_root_mutex){+.+.+.}, at: [<ffffffff8110e4ef>] cgroup_remount+0x5f/0x200
stack backtrace:
CPU: 2 PID: 525 Comm: mount Not tainted 3.10.0-rc4-work+ #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
ffffffff829651f0 ffff88000ec2fc28 ffffffff81c24bb1 ffff88000ec2fce8
ffffffff810f420d 0000000000000006 0000000000000001 0000000000000056
ffff8800153b4640 ffff880000000000 ffffffff81c2e468 ffff8800153b4640
Call Trace:
[<ffffffff81c24bb1>] dump_stack+0x19/0x1b
[<ffffffff810f420d>] __lock_acquire+0x15dd/0x1e60
[<ffffffff810f531c>] lock_acquire+0x9c/0x1f0
[<ffffffff81c2a805>] mutex_lock_nested+0x65/0x410
[<ffffffff8110a3e1>] drop_parsed_module_refcounts+0x21/0xb0
[<ffffffff8110e63e>] cgroup_remount+0x1ae/0x200
[<ffffffff811c9bb2>] do_remount_sb+0x82/0x190
[<ffffffff811e9d41>] do_mount+0x5f1/0xa30
[<ffffffff811ea203>] SyS_mount+0x83/0xc0
[<ffffffff81c2fb82>] system_call_fastpath+0x16/0x1b
Fix it by moving the drop_parsed_module_refcounts() invocation outside
cgroup_mutex.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
kernel/cgroup.c still has places where a RCU pointer is set and
accessed directly without going through RCU_INIT_POINTER() or
rcu_dereference_protected(). They're all properly protected accesses
so nothing is broken but it leads to spurious sparse RCU address space
warnings.
Substitute direct accesses with RCU_INIT_POINTER() and
rcu_dereference_protected(). Note that %true is specified as the
extra condition for all derference updates. This isn't ideal as all
it does is suppressing warning without actually policing
synchronization rules; however, most are scheduled to be removed
pretty soon along with css_id itself, so no reason to be more
elaborate.
Combined with the previous changes, this removes all RCU related
sparse warnings from cgroup.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by; Li Zefan <lizefan@huawei.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
There are several places in kernel/cgroup.c where task->cgroups is
accessed and modified without going through proper RCU accessors.
None is broken as they're all lock protected accesses; however, this
still triggers sparse RCU address space warnings.
* Consistently use task_css_set() for task->cgroups dereferencing.
* Use RCU_INIT_POINTER() to clear task->cgroups to &init_css_set on
exit.
* Remove unnecessary rcu_dereference_raw() from cset->subsys[]
dereference in cgroup_exit().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Li Zefan <lizefan@huawei.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This isn't strictly necessary as all subsystems specified in
@subsys_mask are guaranteed to be pinned; however, it does spuriously
trigger lockdep warning. Let's grab cgroup_mutex around it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
cgroupfs_root used to have ->actual_subsys_mask in addition to
->subsys_mask. a8a648c4ac ("cgroup: remove
cgroup->actual_subsys_mask") removed it noting that the subsys_mask is
essentially temporary and doesn't belong in cgroupfs_root; however,
the patch made it impossible to tell whether a cgroupfs_root actually
has the subsystems bound or just have the bits set leading to the
following BUG when trying to mount with subsystems which are already
mounted elsewhere.
kernel BUG at kernel/cgroup.c:1038!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
...
CPU: 1 PID: 7973 Comm: mount Tainted: G W 3.10.0-rc7-next-20130625-sasha-00011-g1c1dc0e #1105
task: ffff880fc0ae8000 ti: ffff880fc0b9a000 task.ti: ffff880fc0b9a000
RIP: 0010:[<ffffffff81249b29>] [<ffffffff81249b29>] rebind_subsystems+0x409/0x5f0
...
Call Trace:
[<ffffffff8124bd4f>] cgroup_kill_sb+0xff/0x210
[<ffffffff813d21af>] deactivate_locked_super+0x4f/0x90
[<ffffffff8124f3b3>] cgroup_mount+0x673/0x6e0
[<ffffffff81257169>] cpuset_mount+0xd9/0x110
[<ffffffff813d2580>] mount_fs+0xb0/0x2d0
[<ffffffff81404afd>] vfs_kern_mount+0xbd/0x180
[<ffffffff814070b5>] do_new_mount+0x145/0x2c0
[<ffffffff814085d6>] do_mount+0x356/0x3c0
[<ffffffff8140873d>] SyS_mount+0xfd/0x140
[<ffffffff854eb600>] tracesys+0xdd/0xe2
We still want rebind_subsystems() to take added/removed masks, so
let's fix it by marking whether a cgroupfs_root has finished binding
or not. Also, document what's going on around ->subsys_mask
initialization so that similar mistakes aren't repeated.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Li Zefan <lizefan@huawei.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Before 1a57423166 ("cgroup: make hierarchy_id use cyclic idr"),
hierarchy IDs were allocated from 0. As the dummy hierarchy was
always the one first initialized, it got assigned 0 and all other
hierarchies from 1. The patch accidentally changed the minimum
useable ID to 2.
Let's restore ID 0 for dummy_root and while at it reserve 1 for
unified hierarchy.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: stable@vger.kernel.org
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
There are quite a few places where all loaded [builtin] subsys are
iterated. Implement for_each_[builtin_]subsys() and replace manual
iterations with those to simplify those places a bit. The new
iterators automatically skip NULL subsystems. This shouldn't cause
any functional difference.
Iteration loops which scan all subsystems and then skipping modular
ones explicitly are converted to use for_each_builtin_subsys().
While at it, reorder variable declarations and adjust whitespaces a
bit in the affected functions.
v2: Add lockdep_assert_held() in for_each_subsys() and add comments
about synchronization as suggested by Li.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
cgroup_init() was doing init_css_set initialization outside
cgroup_mutex, which is fine but we want to add lockdep annotation on
subsystem iterations and cgroup_init() will trigger it spuriously.
Move init_css_set initialization inside cgroup_mutex.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
for_each_subsys() walks over subsystems attached to a hierarchy and
we're gonna add iterators which walk over all available subsystems.
Rename for_each_subsys() to for_each_root_subsys() so that it's more
appropriately named and for_each_subsys() can be used to iterate all
subsystems.
While at it, remove unnecessary underbar prefix from macro arguments,
put them inside parentheses, and adjust indentation for the two
for_each_*() macros.
This patch is purely cosmetic.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
find_css_set() passes uninitialized on-stack template[] array to
find_existing_css_set() which sets the entries for all subsystems.
Passing around an uninitialized array is a bit icky and we want to
introduce an iterator which only iterates loaded subsystems. Let's
initialize it on definition.
While at it, also make the following cosmetic cleanups.
* Convert to proper /** comments.
* Reorder variable declarations.
* Replace comment on synchronization with lockdep_assert_held().
This patch doesn't make any functional differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
cgroup curiously has two subsystem masks, ->subsys_mask and
->actual_subsys_mask. The latter only exists because the new target
subsys_mask is passed into rebind_subsystems() via @root>subsys_mask.
rebind_subsystems() needs to know what the current mask is to decide
how to reach the target mask so ->actual_subsys_mask is used as the
temp location to remember the current state.
Adding a temporary field to a permanent data structure is rather silly
and can be misleading. Update rebind_subsystems() to take @added_mask
and @removed_mask instead and remove @root->actual_subsys_mask.
This patch shouldn't introduce any behavior changes.
v2: Comment and description updated as suggested by Li.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Global variable names in kernel/cgroup.c are asking for trouble -
subsys, roots, rootnode and so on. Rename them to have "cgroup_"
prefix.
* s/subsys/cgroup_subsys/
* s/rootnode/cgroup_dummy_root/
* s/dummytop/cgroup_cummy_top/
* s/roots/cgroup_roots/
* s/root_count/cgroup_root_count/
This patch is purely cosmetic.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Cont is short for container. control group was named process container
at first, but then people found container already has a meaning in
linux kernel.
Clean up the leftover variable name @cont.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
cgroup_serial_nr_cursor was created atomic64_t because I thought it
was never gonna used for anything other than assigning unique numbers
to cgroups and didn't want to worry about synchronization; however,
now we're using it as an event-stamp to distinguish cgroups created
before and after certain point which assumes that it's protected by
cgroup_mutex.
Let's make it clear by making it a u64. Also, rename it to
cgroup_serial_nr_next and make it point to the next nr to allocate so
that where it's pointing to is clear and more conventional.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
We used root->allcg_list to iterate cgroup hierarchy because at that time
cgroup_for_each_descendant_pre() hasn't been invented.
tj: In cgroup_cfts_commit(), s/@serial_nr/@update_upto/, move the
assignment right above releasing cgroup_mutex and explain what's
going on there.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The next patch will use it to determine if a cgroup is newly created
while we're iterating the cgroup hierarchy.
tj: Rephrased the comment on top of cgroup_serial_nr_cursor.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|