aboutsummaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'pm+acpi-4.5-rc1-1' of ↵Linus Torvalds2016-01-122-15/+11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull oower management and ACPI updates from Rafael Wysocki: "As far as the number of commits goes, ACPICA takes the lead this time, followed by cpufreq and the device properties framework changes. The most significant new feature is the debugfs-based interface to the ACPICA's AML debugger added in the previous cycle and a new user space tool for accessing it. On the cpufreq front, the core is updated to handle governors more efficiently, particularly on systems where a single cpufreq policy object is shared between multiple CPUs, and there are quite a few changes in drivers (intel_pstate, cpufreq-dt etc). The device properties framework is updated to handle built-in (ie included in the kernel itself) device properties better, among other things by adding a fallback mechanism that will allow drivers to provide default properties to be used in case the plaform firmware doesn't provide the properties expected by them. The Operating Performance Points (OPP) framework gets new DT bindings and debugfs support. A new cpufreq driver for ST platforms is added and the ACPI driver for AMD SoCs will now support the APM X-Gene ACPI I2C device. The rest is mostly fixes and cleanups all over. Specifics: - Add a debugfs-based interface for interacting with the ACPICA's AML debugger introduced in the previous cycle and a new user space tool for that, fix some bugs related to the AML debugger and clean up the code in question (Lv Zheng, Dan Carpenter, Colin Ian King, Markus Elfring). - Update ACPICA to upstream revision 20151218 including a number of fixes and cleanups in the ACPICA core (Bob Moore, Lv Zheng, Labbe Corentin, Prarit Bhargava, Colin Ian King, David E Box, Rafael Wysocki). In particular, the previously added erroneous support for the _SUB object is dropped, the concatenate operator will support all ACPI objects now, the Debug Object handling is improved, the SuperName handling of parameters being control methods is fixed, the ObjectType operator handling is updated to follow ACPI 5.0A and the handling of CondRefOf and RefOf is updated accordingly, module- level code will be executed after loading each ACPI table now (instead of being run once after all tables containing AML have been loaded), the Operation Region handlers management is updated to fix some reported problems and a the ACPICA code in the kernel is more in line with the upstream now. - Update the ACPI backlight driver to provide information on whether or not it will generate key-presses for brightness change hotkeys and update some platform drivers (dell-wmi, thinkpad_acpi) to use that information to avoid sending double key-events to users pace for these, add new ACPI backlight quirks (Hans de Goede, Aaron Lu, Adrien Schildknecht). - Improve the ACPI handling of interrupt GPIOs (Christophe Ricard). - Fix the handling of the list of device IDs of device objects found in the ACPI namespace and add a helper for checking if there is a device object for a given device ID (Lukas Wunner). - Change the logic in the ACPI namespace scanning code to create struct acpi_device objects for all ACPI device objects found in the namespace even if _STA fails for them which helps to avoid device enumeration problems on Microsoft Surface 3 (Aaron Lu). - Add support for the APM X-Gene ACPI I2C device to the ACPI driver for AMD SoCs (Loc Ho). - Fix the long-standing issue with the DMA controller on Intel SoCs where ACPI tables have no power management support for the DMA controller itself, but it can be powered off automatically when the last (other) device on the SoC is powered off via ACPI and clean up the ACPI driver for Intel SoCs (acpi-lpss) after previous attempts to fix that problem (Andy Shevchenko). - Assorted ACPI fixes and cleanups (Andy Lutomirski, Colin Ian King, Javier Martinez Canillas, Ken Xue, Mathias Krause, Rafael Wysocki, Sinan Kaya). - Update the device properties framework for better handling of built-in properties, add support for built-in properties to the platform bus type, update the MFD subsystem's handling of device properties and add support for passing default configuration data as device properties to the intel-lpss MFD drivers, convert the designware I2C driver to use the unified device properties API and add a fallback mechanism for using default built-in properties if the platform firmware fails to provide the properties as expected by drivers (Andy Shevchenko, Mika Westerberg, Heikki Krogerus, Andrew Morton). - Add new Device Tree bindings to the Operating Performance Points (OPP) framework and update the exynos4412 DT binding accordingly, introduce debugfs support for the OPP framework (Viresh Kumar, Bartlomiej Zolnierkiewicz). - Migrate the mt8173 cpufreq driver to the new OPP bindings (Pi-Cheng Chen). - Update the cpufreq core to make the handling of governors more efficient, especially on systems where policy objects are shared between multiple CPUs (Viresh Kumar, Rafael Wysocki). - Fix cpufreq governor handling on configurations with CONFIG_HZ_PERIODIC set (Chen Yu). - Clean up the cpufreq core code related to the boost sysfs knob support and update the ACPI cpufreq driver accordingly (Rafael Wysocki). - Add a new cpufreq driver for ST platforms and corresponding Device Tree bindings (Lee Jones). - Update the intel_pstate driver to allow the P-state selection algorithm used by it to depend on the CPU ID of the processor it is running on, make it use a special P-state selection algorithm (with an IO wait time compensation tweak) on Atom CPUs based on the Airmont and Silvermont cores so as to reduce their energy consumption and improve intel_pstate documentation (Philippe Longepe, Srinivas Pandruvada). - Update the cpufreq-dt driver to support registering cooling devices that use the (P * V^2 * f) dynamic power draw formula where V is the voltage, f is the frequency and P is a constant coefficient provided by Device Tree and update the arm_big_little cpufreq driver to use that support (Punit Agrawal). - Assorted cpufreq driver (cpufreq-dt, qoriq, pcc-cpufreq, blackfin-cpufreq) updates (Andrzej Hajda, Hongtao Jia, Jacob Tanenbaum, Markus Elfring). - cpuidle core tweaks related to polling and measured_us calculation (Rik van Riel). - Removal of modularity from a few cpuidle drivers (clps711x, ux500, exynos) that cannot be built as modules in practice (Paul Gortmaker). - PM core update to prevent devices from being probed during system suspend/resume which is generally problematic and may lead to inconsistent behavior (Grygorii Strashko). - Assorted updates of the PM core and related code (Julia Lawall, Manuel Pégourié-Gonnard, Maruthi Bayyavarapu, Rafael Wysocki, Ulf Hansson). - PNP bus type updates (Christophe Le Roy, Heiner Kallweit). - PCI PM code cleanups (Jarkko Nikula, Julia Lawall). - cpupower tool updates (Jacob Tanenbaum, Thomas Renninger)" * tag 'pm+acpi-4.5-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (177 commits) PM / clk: don't leave clocks enabled when driver not bound i2c: dw: Add APM X-Gene ACPI I2C device support ACPI / APD: Add APM X-Gene ACPI I2C device support ACPI / LPSS: change 'does not have' to 'has' in comment Revert "dmaengine: dw: platform: provide platform data for Intel" dmaengine: dw: return immediately from IRQ when DMA isn't in use dmaengine: dw: platform: power on device on shutdown ACPI / LPSS: override power state for LPSS DMA device PM / OPP: Use snprintf() instead of sprintf() Documentation: cpufreq: intel_pstate: enhance documentation ACPI, PCI, irq: remove redundant check for null string pointer ACPI / video: driver must be registered before checking for keypresses cpufreq-dt: fix handling regulator_get_voltage() result cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC PM / sleep: Add support for read-only sysfs attributes ACPI: Fix white space in a structure definition ACPI / SBS: fix inconsistent indenting inside if statement PNP: respect PNP_DRIVER_RES_DO_NOT_CHANGE when detaching ACPI / PNP: constify device IDs ACPI / PCI: Simplify acpi_penalize_isa_irq() ...
| *-. Merge branches 'pm-sleep' and 'pm-tools'Rafael J. Wysocki2016-01-122-15/+11
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * pm-sleep: PM / sleep: Add support for read-only sysfs attributes * pm-tools: cpupower: fix how "cpupower frequency-info" interprets latency cpupower: rework the "cpupower frequency-info" command cpupower: Do not analyse offlined cpus cpupower: Provide STATIC variable in Makefile for debug builds cpupower: Fix precedence issue
| | * | PM / sleep: Add support for read-only sysfs attributesRafael J. Wysocki2016-01-042-15/+11
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | Some sysfs attributes in /sys/power/ should really be read-only, so add support for that, convert those attributes to read-only and drop the stub .show() routines from them. Original-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | Merge tag 'trace-v4.5' of ↵Linus Torvalds2016-01-126-228/+300
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "Not much new with tracing for this release. Mostly just clean ups and minor fixes. Here's what else is new: - A new TRACE_EVENT_FN_COND macro, combining both _FN and _COND for those that want both. - New selftest to test the instance create and delete - Better debug output when ftrace fails" * tag 'trace-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (24 commits) ftrace: Fix the race between ftrace and insmod ftrace: Add infrastructure for delayed enabling of module functions x86: ftrace: Fix the comments for ftrace_modify_code_direct() tracing: Fix comment to use tracing_on over tracing_enable metag: ftrace: Fix the comments for ftrace_modify_code sh: ftrace: Fix the comments for ftrace_modify_code() ia64: ftrace: Fix the comments for ftrace_modify_code() ftrace: Clean up ftrace_module_init() code ftrace: Join functions ftrace_module_init() and ftrace_init_module() tracing: Introduce TRACE_EVENT_FN_COND macro tracing: Use seq_buf_used() in seq_buf_to_user() instead of len bpf: Constify bpf_verifier_ops structure ftrace: Have ftrace_ops_get_func() handle RCU and PER_CPU flags too ftrace: Remove use of control list and ops ftrace: Fix output of enabled_functions for showing tramp ftrace: Fix a typo in comment ftrace: Show all tramps registered to a record on ftrace_bug() ftrace: Add variable ftrace_expected for archs to show expected code ftrace: Add new type to distinguish what kind of ftrace_bug() tracing: Update cond flag when enabling or disabling a trigger ...
| * | | ftrace: Fix the race between ftrace and insmodQiu Peiyang2016-01-071-9/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We hit ftrace_bug report when booting Android on a 64bit ATOM SOC chip. Basically, there is a race between insmod and ftrace_run_update_code. After load_module=>ftrace_module_init, another thread jumps in to call ftrace_run_update_code=>ftrace_arch_code_modify_prepare =>set_all_modules_text_rw, to change all modules as RW. Since the new module is at MODULE_STATE_UNFORMED, the text attribute is not changed. Then, the 2nd thread goes ahead to change codes. However, load_module continues to call complete_formation=>set_section_ro_nx, then 2nd thread would fail when probing the module's TEXT. The patch fixes it by using notifier to delay the enabling of ftrace records to the time when module is at state MODULE_STATE_COMING. Link: http://lkml.kernel.org/r/567CE628.3000609@intel.com Signed-off-by: Qiu Peiyang <peiyangx.qiu@intel.com> Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | ftrace: Add infrastructure for delayed enabling of module functionsSteven Rostedt (Red Hat)2016-01-071-55/+106
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Qiu Peiyang pointed out that there's a race when enabling function tracing and loading a module. In order to make the modifications of converting nops in the prologue of functions into callbacks, the text needs to be converted from read-only to read-write. When enabling function tracing, the text permission is updated, the functions are modified, and then they are put back. When loading a module, the updates to convert function calls to mcount is done before the module text is set to read-only. But after it is done, the module text is visible by the function tracer. Thus we have the following race: CPU 0 CPU 1 ----- ----- start function tracing set text to read-write load_module add functions to ftrace set module text read-only update all functions to callbacks modify module functions too < Can't it's read-only > When this happens, ftrace detects the issue and disables itself till the next reboot. To fix this, a new DISABLED flag is added for ftrace records, which all module functions get when they are added. Then later, after the module code is all set, the records will have the DISABLED flag cleared, and they will be enabled if any callback wants all functions to be traced. Note, this doesn't add the delay to later. It simply changes the ftrace_module_init() to do both the setting of DISABLED records, and then immediately calls the enable code. This helps with testing this new code as it has the same behavior as previously. Another change will come after this to have the ftrace_module_enable() called after the text is set to read-only. Cc: Qiu Peiyang <peiyangx.qiu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | tracing: Fix comment to use tracing_on over tracing_enableChuyu Hu2015-12-231-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The file tracing_enable is obsolete and does not exist anymore. Replace the comment that references it with the proper tracing_on file. Link: http://lkml.kernel.org/r/1450787141-45544-1-git-send-email-chuhu@redhat.com Signed-off-by: Chuyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | ftrace: Clean up ftrace_module_init() codeSteven Rostedt (Red Hat)2015-12-231-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The start and end variables were only used when ftrace_module_init() was split up into multiple functions. No need to keep them around after the merger. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | ftrace: Join functions ftrace_module_init() and ftrace_init_module()Abel Vesa2015-12-231-9/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Simple cleanup. No need for two functions here. The whole work can simply be done inside 'ftrace_module_init'. Link: http://lkml.kernel.org/r/1449067197-5718-1-git-send-email-abelvesa@linux.com Signed-off-by: Abel Vesa <abelvesa@linux.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | bpf: Constify bpf_verifier_ops structureJulia Lawall2015-12-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This bpf_verifier_ops structure is never modified, like the other bpf_verifier_ops structures, so declare it as const. Done with the help of Coccinelle. Link: http://lkml.kernel.org/r/1449855359-13724-1-git-send-email-Julia.Lawall@lip6.fr Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | ftrace: Have ftrace_ops_get_func() handle RCU and PER_CPU flags tooSteven Rostedt (Red Hat)2015-12-231-12/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jiri Olsa noted that the change to replace the control_ops did not update the trampoline for when running perf on a single CPU and with CONFIG_PREEMPT disabled (where dynamic ops, like perf, can use trampolines directly). The result was that perf function could be called when RCU is not watching as well as not handle the ftrace_local_disable(). Modify the ftrace_ops_get_func() to also check the RCU and PER_CPU ops flags and use the recursive function if they are set. The recursive function is modified to check those flags and execute the appropriate checks if they are set. Link: http://lkml.kernel.org/r/20151201134213.GA14155@krava.brq.redhat.com Reported-by: Jiri Olsa <jolsa@redhat.com> Patch-fixed-up-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | ftrace: Remove use of control list and opsSteven Rostedt (Red Hat)2015-12-233-91/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently perf has its own list function within the ftrace infrastructure that seems to be used only to allow for it to have per-cpu disabling as well as a check to make sure that it's not called while RCU is not watching. It uses something called the "control_ops" which is used to iterate over ops under it with the control_list_func(). The problem is that this control_ops and control_list_func unnecessarily complicates the code. By replacing FTRACE_OPS_FL_CONTROL with two new flags (FTRACE_OPS_FL_RCU and FTRACE_OPS_FL_PER_CPU) we can remove all the code that is special with the control ops and add the needed checks within the generic ftrace_list_func(). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | ftrace: Fix output of enabled_functions for showing trampSteven Rostedt (Red Hat)2015-12-231-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When showing all tramps registered to a ftrace record in the file enabled_functions, it exits the loop with ops == NULL. But then it is suppose to show the function on the ops->trampoline and add_trampoline_func() is called with the given ops. But because ops is now NULL (to exit the loop), it always shows the static trampoline instead of the one that is really registered to the record. The call to add_trampoline_func() that shows the trampoline for the given ops needs to be called at every iteration. Fixes: 39daa7b9e895 "ftrace: Show all tramps registered to a record on ftrace_bug()" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | ftrace: Fix a typo in commentLi Bin2015-12-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | s/ARCH_SUPPORT_FTARCE_OPS/ARCH_SUPPORTS_FTRACE_OPS/ Link: http://lkml.kernel.org/r/1448879016-8659-1-git-send-email-huawei.libin@huawei.com Signed-off-by: Li Bin <huawei.libin@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | ftrace: Show all tramps registered to a record on ftrace_bug()Steven Rostedt (Red Hat)2015-11-251-9/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an anomaly is detected in the function call modification code, ftrace_bug() is called to disable function tracing as well as give any information that may help debug the problem. Currently, only the first found trampoline that is attached to the failed record is reported. Instead, show all trampolines that are hooked to it. Also, not only show the ops pointer but also report the function it calls. While at it, add this info to the enabled_functions debug file too. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | ftrace: Add variable ftrace_expected for archs to show expected codeSteven Rostedt (Red Hat)2015-11-251-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an anomaly is found while modifying function code, ftrace_bug() is called which disables the function tracing infrastructure and reports information about what failed. If the code that is to be replaced does not match what is expected, then actual code is shown. Currently there is no arch generic way to show what was expected. Add a new variable pointer calld ftrace_expected that the arch code can set to point to what it expected so that ftrace_bug() can report the actual text as well as the text that was expected to be there. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | ftrace: Add new type to distinguish what kind of ftrace_bug()Steven Rostedt (Red Hat)2015-11-251-1/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ftrace function hook utility has several internal checks to make sure that whatever it modifies is exactly what it expects to be modifying. This is essential as modifying running code can be extremely dangerous to the system. When an anomaly is detected, ftrace_bug() is called which sends a splat to the console and disables function tracing. There's some extra information that is printed to help diagnose the issue. One thing that is missing though is output of what ftrace was doing at the time of the crash. Was it updating a call site or perhaps converting a call site to a nop? A new global enum variable is created to state what ftrace was doing at the time of the anomaly, and this is reported in ftrace_bug(). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | tracing: Update cond flag when enabling or disabling a triggerTom Zanussi2015-11-251-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a trigger is enabled, the cond flag should be set beforehand, otherwise a trigger that's expecting to process a trace record (e.g. one with post_trigger set) could be invoked without one. Likewise a trigger's cond flag should be reset after it's disabled, not before. Link: http://lkml.kernel.org/r/a420b52a67b1c2d3cab017914362d153255acb99.1448303214.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | ring-buffer: Process commits whenever moving to a new page.Steven Rostedt (Red Hat)2015-11-251-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When crossing over to a new page, commit the current work. This will allow readers to get data with less latency, and also simplifies the work to get timestamps working for interrupted events. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | ring-buffer: Remove redundant update of page timestampSteven Rostedt (Red Hat)2015-11-241-24/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The first commit of a buffer page updates the timestamp of that page. No need to have the update to the next page add the timestamp too. It will only be replaced by the first commit on that page anyway. Only update to a page if it contains an event. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | ring-buffer: Use READ_ONCE() for most tail_page accessSteven Rostedt (Red Hat)2015-11-241-7/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As cpu_buffer->tail_page may be modified by interrupts at almost any time, the flow of logic is very important. Do not let gcc get smart with re-reading cpu_buffer->tail_page by adding READ_ONCE() around most of its accesses. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* | | | Merge branch 'for-4.5' of ↵Linus Torvalds2016-01-126-62/+48
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - cgroup v2 interface is now official. It's no longer hidden behind a devel flag and can be mounted using the new cgroup2 fs type. Unfortunately, cpu v2 interface hasn't made it yet due to the discussion around in-process hierarchical resource distribution and only memory and io controllers can be used on the v2 interface at the moment. - The existing documentation which has always been a bit of mess is relocated under Documentation/cgroup-v1/. Documentation/cgroup-v2.txt is added as the authoritative documentation for the v2 interface. - Some features are added through for-4.5-ancestor-test branch to enable netfilter xt_cgroup match to use cgroup v2 paths. The actual netfilter changes will be merged through the net tree which pulled in the said branch. - Various cleanups * 'for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: rename cgroup documentations cgroup: fix a typo. cgroup: Remove resource_counter.txt in Documentation/cgroup-legacy/00-INDEX. cgroup: demote subsystem init messages to KERN_DEBUG cgroup: Fix uninitialized variable warning cgroup: put controller Kconfig options in meaningful order cgroup: clean up the kernel configuration menu nomenclature cgroup_pids: fix a typo. Subject: cgroup: Fix incomplete dd command in blkio documentation cgroup: kill cgrp_ss_priv[CGROUP_CANFORK_COUNT] and friends cpuset: Replace all instances of time_t with time64_t cgroup: replace unified-hierarchy.txt with a proper cgroup v2 documentation cgroup: rename Documentation/cgroups/ to Documentation/cgroup-legacy/ cgroup: replace __DEVEL__sane_behavior with cgroup2 fs type
| * | | | cgroup: fix a typo.Rami Rosen2016-01-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a typo in a comment in cgroup.c. Signed-off-by: Rami Rosen <rami.rosen@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | cgroup: demote subsystem init messages to KERN_DEBUGTejun Heo2016-01-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These are noisy during boot and not all that interesting. Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | cgroup_pids: fix a typo.Rami Rosen2015-12-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a typo in pids_charge() method. Signed-off-by: Rami Rosen <rami.rosen@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | Merge branch 'for-4.5-ancestor-test' of ↵Tejun Heo2015-12-071-27/+44
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup into for-4.5 Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | cgroup: kill cgrp_ss_priv[CGROUP_CANFORK_COUNT] and friendsOleg Nesterov2015-12-035-31/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that nobody use the "priv" arg passed to can_fork/cancel_fork/fork we can kill CGROUP_CANFORK_COUNT/SUBSYS_TAG/etc and cgrp_ss_priv[] in copy_process(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | Merge branch 'for-4.4-fixes' into for-4.5Tejun Heo2015-12-037-110/+107
| |\ \ \ \ \
| * | | | | | cpuset: Replace all instances of time_t with time64_tArnd Bergmann2015-11-251-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following patch replaces all instances of time_t with time64_t i.e. change the type used for representing time from 32-bit to 64-bit. All 32-bit kernels to date use a signed 32-bit time_t type, which can only represent time until January 2038. Since embedded systems running 32-bit Linux are going to survive beyond that date, we have to change all current uses, in a backwards compatible way. The patch also changes the function get_seconds() that returns a 32-bit integer to ktime_get_seconds() that returns seconds as 64-bit integer. The patch changes the type of ticks from time_t to u32. We keep ticks as 32-bits as the function uses 32-bit arithmetic which would prove less expensive than 64-bit arithmetic and the function is expected to be called atleast once every 32 seconds. Signed-off-by: Heena Sirwani <heenasirwani@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | cgroup: replace __DEVEL__sane_behavior with cgroup2 fs typeTejun Heo2015-11-161-24/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With major controllers - cpu, memory and io - shaping up for the unified hierarchy, cgroup2 is about ready to be, gradually, released into the wild. Replace __DEVEL__sane_behavior flag which was used to select the unified hierarchy with a separate filesystem type "cgroup2" so that unified hierarchy can be mounted as follows. mount -t cgroup2 none $MOUNT_POINT The cgroup2 fs has its own magic number - 0x63677270 ("cgrp"). v2: Assign a different magic number to cgroup2 fs. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org>
* | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2016-01-126-59/+187
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from Davic Miller: 1) Support busy polling generically, for all NAPI drivers. From Eric Dumazet. 2) Add byte/packet counter support to nft_ct, from Floriani Westphal. 3) Add RSS/XPS support to mvneta driver, from Gregory Clement. 4) Implement IPV6_HDRINCL socket option for raw sockets, from Hannes Frederic Sowa. 5) Add support for T6 adapter to cxgb4 driver, from Hariprasad Shenai. 6) Add support for VLAN device bridging to mlxsw switch driver, from Ido Schimmel. 7) Add driver for Netronome NFP4000/NFP6000, from Jakub Kicinski. 8) Provide hwmon interface to mlxsw switch driver, from Jiri Pirko. 9) Reorganize wireless drivers into per-vendor directories just like we do for ethernet drivers. From Kalle Valo. 10) Provide a way for administrators "destroy" connected sockets via the SOCK_DESTROY socket netlink diag operation. From Lorenzo Colitti. 11) Add support to add/remove multicast routes via netlink, from Nikolay Aleksandrov. 12) Make TCP keepalive settings per-namespace, from Nikolay Borisov. 13) Add forwarding and packet duplication facilities to nf_tables, from Pablo Neira Ayuso. 14) Dead route support in MPLS, from Roopa Prabhu. 15) TSO support for thunderx chips, from Sunil Goutham. 16) Add driver for IBM's System i/p VNIC protocol, from Thomas Falcon. 17) Rationalize, consolidate, and more completely document the checksum offloading facilities in the networking stack. From Tom Herbert. 18) Support aborting an ongoing scan in mac80211/cfg80211, from Vidyullatha Kanchanapally. 19) Use per-bucket spinlock for bpf hash facility, from Tom Leiming. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1375 commits) net: bnxt: always return values from _bnxt_get_max_rings net: bpf: reject invalid shifts phonet: properly unshare skbs in phonet_rcv() dwc_eth_qos: Fix dma address for multi-fragment skbs phy: remove an unneeded condition mdio: remove an unneed condition mdio_bus: NULL dereference on allocation error net: Fix typo in netdev_intersect_features net: freescale: mac-fec: Fix build error from phy_device API change net: freescale: ucc_geth: Fix build error from phy_device API change bonding: Prevent IPv6 link local address on enslaved devices IB/mlx5: Add flow steering support net/mlx5_core: Export flow steering API net/mlx5_core: Make ipv4/ipv6 location more clear net/mlx5_core: Enable flow steering support for the IB driver net/mlx5_core: Initialize namespaces only when supported by device net/mlx5_core: Set priority attributes net/mlx5_core: Connect flow tables net/mlx5_core: Introduce modify flow table command net/mlx5_core: Managing root flow table ...
| * | | | | | | net: bpf: reject invalid shiftsRabin Vincent2016-01-121-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On ARM64, a BUG() is triggered in the eBPF JIT if a filter with a constant shift that can't be encoded in the immediate field of the UBFM/SBFM instructions is passed to the JIT. Since these shifts amounts, which are negative or >= regsize, are invalid, reject them in the eBPF verifier and the classic BPF filter checker, for all architectures. Signed-off-by: Rabin Vincent <rabin@rab.in> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2016-01-061-0/+1
| |\ \ \ \ \ \ \
| * | | | | | | | bpf: hash: use per-bucket spinlocktom.leiming@gmail.com2015-12-291-18/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both htab_map_update_elem() and htab_map_delete_elem() can be called from eBPF program, and they may be in kernel hot path, so it isn't efficient to use a per-hashtable lock in this two helpers. The per-hashtable spinlock is used for protecting bucket's hlist, and per-bucket lock is just enough. This patch converts the per-hashtable lock into per-bucket spinlock, so that contention can be decreased a lot. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | bpf: hash: move select_bucket() out of htab's spinlocktom.leiming@gmail.com2015-12-291-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The spinlock is just used for protecting the per-bucket hlist, so it isn't needed for selecting bucket. Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | bpf: hash: use atomic counttom.leiming@gmail.com2015-12-291-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Preparing for removing global per-hashtable lock, so the counter need to be defined as aotmic_t first. Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | bpf: move clearing of A/X into classic to eBPF migration prologueDaniel Borkmann2015-12-181-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Back in the days where eBPF (or back then "internal BPF" ;->) was not exposed to user space, and only the classic BPF programs internally translated into eBPF programs, we missed the fact that for classic BPF A and X needed to be cleared. It was fixed back then via 83d5b7ef99c9 ("net: filter: initialize A and X registers"), and thus classic BPF specifics were added to the eBPF interpreter core to work around it. This added some confusion for JIT developers later on that take the eBPF interpreter code as an example for deriving their JIT. F.e. in f75298f5c3fe ("s390/bpf: clear correct BPF accumulator register"), at least X could leak stack memory. Furthermore, since this is only needed for classic BPF translations and not for eBPF (verifier takes care that read access to regs cannot be done uninitialized), more complexity is added to JITs as they need to determine whether they deal with migrations or native eBPF where they can just omit clearing A/X in their prologue and thus reduce image size a bit, see f.e. cde66c2d88da ("s390/bpf: Only clear A and X for converted BPF programs"). In other cases (x86, arm64), A and X is being cleared in the prologue also for eBPF case, which is unnecessary. Lets move this into the BPF migration in bpf_convert_filter() where it actually belongs as long as the number of eBPF JITs are still few. It can thus be done generically; allowing us to remove the quirk from __bpf_prog_run() and to slightly reduce JIT image size in case of eBPF, while reducing code duplication on this matter in current(/future) eBPF JITs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Zi Shen Lim <zlim.lnx@gmail.com> Cc: Yang Shi <yang.shi@linaro.org> Acked-by: Yang Shi <yang.shi@linaro.org> Acked-by: Zi Shen Lim <zlim.lnx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-12-1723-177/+270
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/geneve.c Here we had an overlapping change, where in 'net' the extraneous stats bump was being removed whilst in 'net-next' the final argument to udp_tunnel6_xmit_skb() was being changed. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | net, cgroup: cgroup_sk_updat_lock was missing initializerTejun Heo2015-12-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup") added global spinlock cgroup_sk_update_lock but erroneously skipped initializer leading to uninitialized spinlock warning. Fix it by using DEFINE_SPINLOCK(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Dexuan Cui <decui@microsoft.com> Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup") Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | bpf, inode: allow for rename and link opsDaniel Borkmann2015-12-121-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for renaming and hard links to the fs. Most of this can be implemented by using simple library operations under the same constraints that we don't use a reserved name like elsewhere. Linking can be useful to share/manage things like maps across subsystem users. It works within the file system boundary, but is not allowed for directories. Symbolic links are explicitly not implemented here, as it can be better done already by doing bind mounts inside bpf fs to set up shared directories f.e. useful when using volumes in docker containers that map a private working directory into /sys/fs/bpf/ which contains itself a bind mounted path from the host's /sys/fs/bpf/ mount that is shared among multiple containers. For single maps instead of whole directory, hard links can be easily used to do the same. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | sock, cgroup: add sock->sk_cgroupTejun Heo2015-12-081-1/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In cgroup v1, dealing with cgroup membership was difficult because the number of membership associations was unbound. As a result, cgroup v1 grew several controllers whose primary purpose is either tagging membership or pull in configuration knobs from other subsystems so that cgroup membership test can be avoided. net_cls and net_prio controllers are examples of the latter. They allow configuring network-specific attributes from cgroup side so that network subsystem can avoid testing cgroup membership; unfortunately, these are not only cumbersome but also problematic. Both net_cls and net_prio aren't properly hierarchical. Both inherit configuration from the parent on creation but there's no interaction afterwards. An ancestor doesn't restrict the behavior in its subtree in anyway and configuration changes aren't propagated downwards. Especially when combined with cgroup delegation, this is problematic because delegatees can mess up whatever network configuration implemented at the system level. net_prio would allow the delegatees to set whatever priority value regardless of CAP_NET_ADMIN and net_cls the same for classid. While it is possible to solve these issues from controller side by implementing hierarchical allowable ranges in both controllers, it would involve quite a bit of complexity in the controllers and further obfuscate network configuration as it becomes even more difficult to tell what's actually being configured looking from the network side. While not much can be done for v1 at this point, as membership handling is sane on cgroup v2, it'd be better to make cgroup matching behave like other network matches and classifiers than introducing further complications. In preparation, this patch updates sock->sk_cgrp_data handling so that it points to the v2 cgroup that sock was created in until either net_prio or net_cls is used. Once either of the two is used, sock->sk_cgrp_data reverts to its previous role of carrying prioidx and classid. This is to avoid adding yet another cgroup related field to struct sock. As the mode switching can happen at most once per boot, the switching mechanism is aimed at lowering hot path overhead. It may leak a finite, likely small, number of cgroup refs and report spurious prioidx or classid on switching; however, dynamic updates of prioidx and classid have always been racy and lossy - socks between creation and fd installation are never updated, config changes don't update existing sockets at all, and prioidx may index with dead and recycled cgroup IDs. Non-critical inaccuracies from small race windows won't make any noticeable difference. This patch doesn't make use of the pointer yet. The following patch will implement netfilter match for cgroup2 membership. v2: Use sock_cgroup_data to avoid inflating struct sock w/ another cgroup specific field. v3: Add comments explaining why sock_data_prioidx() and sock_data_classid() use different fallback values. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Daniel Wagner <daniel.wagner@bmw-carit.de> CC: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | Merge branch 'for-4.5-ancestor-test' of ↵David S. Miller2015-12-081-27/+44
| |\ \ \ \ \ \ \ \ \ | | | |_|_|_|/ / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Preparatory changes for some new socket cgroup infrastructure and netfilter targets. Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | cgroup: implement cgroup_get_from_path() and expose cgroup_put()Tejun Heo2015-11-201-5/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement cgroup_get_from_path() using kernfs_walk_and_get() which obtains a default hierarchy cgroup from its path. This will be used to allow cgroup path based matching from outside cgroup proper - e.g. networking and perf. v2: Add EXPORT_SYMBOL_GPL(cgroup_get_from_path). Signed-off-by: Tejun Heo <tj@kernel.org>
| | * | | | | | | | cgroup: record ancestor IDs and reimplement cgroup_is_descendant() using itTejun Heo2015-11-201-22/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cgroup_is_descendant() currently walks up the hierarchy and compares each ancestor to the cgroup in question. While enough for cgroup core usages, this can't be used in hot paths to test cgroup membership. This patch adds cgroup->ancestor_ids[] which records the IDs of all ancestors including self and cgroup->level for the nesting level. This allows testing whether a given cgroup is a descendant of another in three finite steps - testing whether the two belong to the same hierarchy, whether the descendant candidate is at the same or a higher level than the ancestor and comparing the recorded ancestor_id at the matching level. cgroup_is_descendant() is accordingly reimplmented and made inline. Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-12-0311-46/+107
| |\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/renesas/ravb_main.c kernel/bpf/syscall.c net/ipv4/ipmr.c All three conflicts were cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | | bpf: add show_fdinfo handler for mapsDaniel Borkmann2015-11-201-1/+21
| | |/ / / / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a handler for show_fdinfo() to be used by the anon-inodes backend for eBPF maps, and dump the map specification there. Not only useful for admins, but also it provides a minimal way to compare specs from ELF vs pinned object. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | Merge branch 'work.misc' of ↵Linus Torvalds2016-01-125-104/+51
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "All kinds of stuff. That probably should've been 5 or 6 separate branches, but by the time I'd realized how large and mixed that bag had become it had been too close to -final to play with rebasing. Some fs/namei.c cleanups there, memdup_user_nul() introduction and switching open-coded instances, burying long-dead code, whack-a-mole of various kinds, several new helpers for ->llseek(), assorted cleanups and fixes from various people, etc. One piece probably deserves special mention - Neil's lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets called without ->i_mutex and tries to avoid ever taking it. That, of course, means that it's not useful for any directory modifications, but things like getting inode attributes in nfds readdirplus are fine with that. I really should've asked for moratorium on lookup-related changes this cycle, but since I hadn't done that early enough... I *am* asking for that for the coming cycle, though - I'm going to try and get conversion of i_mutex to rwsem with ->lookup() done under lock taken shared. There will be a patch closer to the end of the window, along the lines of the one Linus had posted last May - mechanical conversion of ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/ inode_is_locked()/inode_lock_nested(). To quote Linus back then: ----- | This is an automated patch using | | sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/' | sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/' | sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/' | sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/' | sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/' | | with a very few manual fixups ----- I'm going to send that once the ->i_mutex-affecting stuff in -next gets mostly merged (or when Linus says he's about to stop taking merges)" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) nfsd: don't hold i_mutex over userspace upcalls fs:affs:Replace time_t with time64_t fs/9p: use fscache mutex rather than spinlock proc: add a reschedule point in proc_readfd_common() logfs: constify logfs_block_ops structures fcntl: allow to set O_DIRECT flag on pipe fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE fs: xattr: Use kvfree() [s390] page_to_phys() always returns a multiple of PAGE_SIZE nbd: use ->compat_ioctl() fs: use block_device name vsprintf helper lib/vsprintf: add %*pg format specifier fs: use gendisk->disk_name where possible poll: plug an unused argument to do_poll amdkfd: don't open-code memdup_user() cdrom: don't open-code memdup_user() rsxx: don't open-code memdup_user() mtip32xx: don't open-code memdup_user() [um] mconsole: don't open-code memdup_user_nul() [um] hostaudio: don't open-code memdup_user() ...
| * | | | | | | | | | kernel/*: switch to memdup_user_nul()Al Viro2016-01-044-95/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | | | convert a bunch of open-coded instances of memdup_user_nul()Al Viro2016-01-041-9/+3
| | |_|/ / / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A _lot_ of ->write() instances were open-coding it; some are converted to memdup_user_nul(), a lot more remain... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | | | | | Merge branch 'work.copy_file_range' of ↵Linus Torvalds2016-01-121-0/+1
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs copy_file_range updates from Al Viro: "Several series around copy_file_range/CLONE" * 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: btrfs: use new dedupe data function pointer vfs: hoist the btrfs deduplication ioctl to the vfs vfs: wire up compat ioctl for CLONE/CLONE_RANGE cifs: avoid unused variable and label nfsd: implement the NFSv4.2 CLONE operation nfsd: Pass filehandle to nfs4_preprocess_stateid_op() vfs: pull btrfs clone API to vfs layer locks: new locks_mandatory_area calling convention vfs: Add vfs_copy_file_range() support for pagecache copies btrfs: add .copy_file_range file operation x86: add sys_copy_file_range to syscall tables vfs: add copy_file_range syscall and vfs helper