| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Nothing really major this release, despite quite a lot of activity.
Just lots of things all over the place.
Some things of note include:
- Access via perf to a new type of PMU (IMC) on Power9, which can
count both core events as well as nest unit events (Memory
controller etc).
- Optimisations to the radix MMU TLB flushing, mostly to avoid
unnecessary Page Walk Cache (PWC) flushes when the structure of the
tree is not changing.
- Reworks/cleanups of do_page_fault() to modernise it and bring it
closer to other architectures where possible.
- Rework of our page table walking so that THP updates only need to
send IPIs to CPUs where the affected mm has run, rather than all
CPUs.
- The size of our vmalloc area is increased to 56T on 64-bit hash MMU
systems. This avoids problems with the percpu allocator on systems
with very sparse NUMA layouts.
- STRICT_KERNEL_RWX support on PPC32.
- A new sched domain topology for Power9, to capture the fact that
pairs of cores may share an L2 cache.
- Power9 support for VAS, which is a new mechanism for accessing
coprocessors, and initial support for using it with the NX
compression accelerator.
- Major work on the instruction emulation support, adding support for
many new instructions, and reworking it so it can be used to
implement the emulation needed to fixup alignment faults.
- Support for guests under PowerVM to use the Power9 XIVE interrupt
controller.
And probably that many things again that are almost as interesting,
but I had to keep the list short. Plus the usual fixes and cleanups as
always.
Thanks to: Alexey Kardashevskiy, Alistair Popple, Andreas Schwab,
Aneesh Kumar K.V, Anju T Sudhakar, Arvind Yadav, Balbir Singh,
Benjamin Herrenschmidt, Bhumika Goyal, Breno Leitao, Bryant G. Ly,
Christophe Leroy, Cédric Le Goater, Dan Carpenter, Dou Liyang,
Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff Levand, Hannes
Reinecke, Haren Myneni, Ivan Mikhaylov, John Allen, Julia Lawall,
LABBE Corentin, Laurentiu Tudor, Madhavan Srinivasan, Markus Elfring,
Masahiro Yamada, Matt Brown, Michael Neuling, Murilo Opsfelder Araujo,
Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran,
Paul Mackerras, Rashmica Gupta, Rob Herring, Rui Teng, Sam Bobroff,
Santosh Sivaraj, Scott Wood, Shilpasri G Bhat, Sukadev Bhattiprolu,
Suraj Jitindar Singh, Tobin C. Harding, Victor Aoqui"
* tag 'powerpc-4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (321 commits)
powerpc/xive: Fix section __init warning
powerpc: Fix kernel crash in emulation of vector loads and stores
powerpc/xive: improve debugging macros
powerpc/xive: add XIVE Exploitation Mode to CAS
powerpc/xive: introduce H_INT_ESB hcall
powerpc/xive: add the HW IRQ number under xive_irq_data
powerpc/xive: introduce xive_esb_write()
powerpc/xive: rename xive_poke_esb() in xive_esb_read()
powerpc/xive: guest exploitation of the XIVE interrupt controller
powerpc/xive: introduce a common routine xive_queue_page_alloc()
powerpc/sstep: Avoid used uninitialized error
axonram: Return directly after a failed kzalloc() in axon_ram_probe()
axonram: Improve a size determination in axon_ram_probe()
axonram: Delete an error message for a failed memory allocation in axon_ram_probe()
powerpc/powernv/npu: Move tlb flush before launching ATSD
powerpc/macintosh: constify wf_sensor_ops structures
powerpc/iommu: Use permission-specific DEVICE_ATTR variants
powerpc/eeh: Delete an error out of memory message at init time
powerpc/mm: Use seq_putc() in two functions
macintosh: Convert to using %pOF instead of full_name
...
|
| |
| |
| |
| |
| |
| |
| |
| | |
Cc: Markus Elfring <elfring@users.sourceforge.net>
Cc: Jim Paris <jim@jtan.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
ps3vram_cache_init()
Omit an extra message for a memory allocation failure in this function.
This issue was detected by using the Coccinelle software.
Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Cc: Jim Paris <jim@jtan.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The .rw_page in struct block_device_operations is used by the swap
subsystem to read/write the page contents from/into the corresponding
swap slot in the swap device. To support the THP (Transparent Huge
Page) swap optimization, the .rw_page is enhanced to support to
read/write THP if possible.
Link: http://lkml.kernel.org/r/20170724051840.2309-6-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch adds document and kconfig for using of writeback feature.
Link: http://lkml.kernel.org/r/1498459987-24562-10-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch enables read IO from backing device. For the feature, it
implements two IO read functions to transfer data from backing storage.
One is asynchronous IO function and other is synchronous one.
A reason I need synchrnous IO is due to partial write which need to
complete read IO before the overwriting partial data.
We can make the partial IO's case asynchronous, too but at the moment, I
don't feel adding more complexity to support such rare use cases so want
to go with simple.
[xieyisheng1@huawei.com: read_from_bdev_async(): return 1 to avoid call page_endio() in zram_rw_page()]
Link: http://lkml.kernel.org/r/1502707447-6944-1-git-send-email-xieyisheng1@huawei.com
Link: http://lkml.kernel.org/r/1498459987-24562-9-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch enables write IO to transfer data to backing device. For
that, it implements write_to_bdev function which creates new bio and
chaining with parent bio to make the parent bio asynchrnous.
For rw_page which don't have parent bio, it submit owned bio and handle
IO completion by zram_page_end_io.
Also, this patch defines new flag ZRAM_WB to mark written page for later
read IO.
[xieyisheng1@huawei.com: fix typo in comment]
Link: http://lkml.kernel.org/r/1502707447-6944-2-git-send-email-xieyisheng1@huawei.com
Link: http://lkml.kernel.org/r/1498459987-24562-8-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
For upcoming asynchronous IO like writeback, zram_rw_page should be
aware of that whether requested IO was completed or submitted
successfully, otherwise error.
For the goal, zram_bvec_rw has three return values.
-errno: returns error number
0: IO request is done synchronously
1: IO request is issued successfully.
Link: http://lkml.kernel.org/r/1498459987-24562-7-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
With backing device, zram needs management of free space of backing
device.
This patch adds bitmap logic to manage free space which is very naive.
However, it would be simple enough as considering uncompressible pages's
frequenty in zram.
Link: http://lkml.kernel.org/r/1498459987-24562-6-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
For writeback feature, user should set up backing device before the zram
working.
This patch enables the interface via /sys/block/zramX/backing_dev.
Currently, it supports block device only but it could be enhanced for
file as well.
Link: http://lkml.kernel.org/r/1498459987-24562-5-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
zram_decompress_page naming is not proper because it doesn't decompress
if page was dedup hit or stored with compression.
Use more abstract term and consistent with write path function
__zram_bvec_write.
Link: http://lkml.kernel.org/r/1498459987-24562-4-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
zram_compress does several things, compress, entry alloc and check
limitation. I did for just readbility but it hurts modulization.:(
So this patch removes zram_compress functions and inline it in
__zram_bvec_write for upcoming patches.
Link: http://lkml.kernel.org/r/1498459987-24562-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Patch series "writeback incompressible pages to storage", v1.
zRam is useful for memory saving with compressible pages but sometime,
workload can be changed and system has lots of incompressible pages
which is very harmful for zram.
This patch supports writeback feature of zram so admin can set up a
block device and with it, zram can save the memory via writing out the
incompressile pages once it found it's incompressible pages (1/4 comp
ratio) instead of keeping the page in memory.
[1-3] is just clean up and [4-8] is step by step feature enablement.
[4-8] is logically not bisectable(ie, logical unit separation)
although I tried to compiled out without breaking but I think it would
be better to review.
This patch (of 9):
__zram_bvec_write has some of duplicated logic for zram meta data
handling of same_page|compressed_page. This patch aims to clean it up
without behavior change.
[xieyisheng1@huawei.com: fix compr_data_size stat]
Link: http://lkml.kernel.org/r/1502707447-6944-1-git-send-email-xieyisheng1@huawei.com
Link: http://lkml.kernel.org/r/1496019048-27016-1-git-send-email-minchan@kernel.org
Link: http://lkml.kernel.org/r/1498459987-24562-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Juneho Choi <juno.choi@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Add 'cross-release' support to lockdep, which allows APIs like
completions, where it's not the 'owner' who releases the lock, to be
tracked. It's all activated automatically under
CONFIG_PROVE_LOCKING=y.
- Clean up (restructure) the x86 atomics op implementation to be more
readable, in preparation of KASAN annotations. (Dmitry Vyukov)
- Fix static keys (Paolo Bonzini)
- Add killable versions of down_read() et al (Kirill Tkhai)
- Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini)
- Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra)
- Remove smp_mb__before_spinlock() and convert its usages, introduce
smp_mb__after_spinlock() (Peter Zijlstra)
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
locking/lockdep/selftests: Fix mixed read-write ABBA tests
sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK()
acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse
locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures
smp: Avoid using two cache lines for struct call_single_data
locking/lockdep: Untangle xhlock history save/restore from task independence
locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being
futex: Remove duplicated code and fix undefined behaviour
Documentation/locking/atomic: Finish the document...
locking/lockdep: Fix workqueue crossrelease annotation
workqueue/lockdep: 'Fix' flush_work() annotation
locking/lockdep/selftests: Add mixed read-write ABBA tests
mm, locking/barriers: Clarify tlb_flush_pending() barriers
locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive
locking/lockdep: Explicitly initialize wq_barrier::done::map
locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS
locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config
locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING
locking/refcounts, x86/asm: Implement fast refcount overflow protection
locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease
...
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Conflicts:
mm/page_alloc.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
struct call_single_data is used in IPIs to transfer information between
CPUs. Its size is bigger than sizeof(unsigned long) and less than
cache line size. Currently it is not allocated with any explicit alignment
requirements. This makes it possible for allocated call_single_data to
cross two cache lines, which results in double the number of the cache lines
that need to be transferred among CPUs.
This can be fixed by requiring call_single_data to be aligned with the
size of call_single_data. Currently the size of call_single_data is the
power of 2. If we add new fields to call_single_data, we may need to
add padding to make sure the size of new definition is the power of 2
as well.
Fortunately, this is enforced by GCC, which will report bad sizes.
To set alignment requirements of call_single_data to the size of
call_single_data, a struct definition and a typedef is used.
To test the effect of the patch, I used the vm-scalability multiple
thread swap test case (swap-w-seq-mt). The test will create multiple
threads and each thread will eat memory until all RAM and part of swap
is used, so that huge number of IPIs are triggered when unmapping
memory. In the test, the throughput of memory writing improves ~5%
compared with misaligned call_single_data, because of faster IPIs.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Huang, Ying <ying.huang@intel.com>
[ Add call_single_data_t and align with size of call_single_data. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/87bmnqd6lz.fsf@yhuang-mobile.sh.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|\ \ \ \
| |_|/ /
|/| | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
- Introduce the ORC unwinder, which can be enabled via
CONFIG_ORC_UNWINDER=y.
The ORC unwinder is a lightweight, Linux kernel specific debuginfo
implementation, which aims to be DWARF done right for unwinding.
Objtool is used to generate the ORC unwinder tables during build, so
the data format is flexible and kernel internal: there's no
dependency on debuginfo created by an external toolchain.
The ORC unwinder is almost two orders of magnitude faster than the
(out of tree) DWARF unwinder - which is important for perf call graph
profiling. It is also significantly simpler and is coded defensively:
there has not been a single ORC related kernel crash so far, even
with early versions. (knock on wood!)
But the main advantage is that enabling the ORC unwinder allows
CONFIG_FRAME_POINTERS to be turned off - which speeds up the kernel
measurably:
With frame pointers disabled, GCC does not have to add frame pointer
instrumentation code to every function in the kernel. The kernel's
.text size decreases by about 3.2%, resulting in better cache
utilization and fewer instructions executed, resulting in a broad
kernel-wide speedup. Average speedup of system calls should be
roughly in the 1-3% range - measurements by Mel Gorman [1] have shown
a speedup of 5-10% for some function execution intense workloads.
The main cost of the unwinder is that the unwinder data has to be
stored in RAM: the memory cost is 2-4MB of RAM, depending on kernel
config - which is a modest cost on modern x86 systems.
Given how young the ORC unwinder code is it's not enabled by default
- but given the performance advantages the plan is to eventually make
it the default unwinder on x86.
See Documentation/x86/orc-unwinder.txt for more details.
- Remove lguest support: its intended role was that of a temporary
proof of concept for virtualization, plus its removal will enable the
reduction (removal) of the paravirt API as well, so Rusty agreed to
its removal. (Juergen Gross)
- Clean up and fix FSGS related functionality (Andy Lutomirski)
- Clean up IO access APIs (Andy Shevchenko)
- Enhance the symbol namespace (Jiri Slaby)
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
objtool: Handle GCC stack pointer adjustment bug
x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone()
x86/fpu/math-emu: Add ENDPROC to functions
x86/boot/64: Extract efi_pe_entry() from startup_64()
x86/boot/32: Extract efi_pe_entry() from startup_32()
x86/lguest: Remove lguest support
x86/paravirt/xen: Remove xen_patch()
objtool: Fix objtool fallthrough detection with function padding
x86/xen/64: Fix the reported SS and CS in SYSCALL
objtool: Track DRAP separately from callee-saved registers
objtool: Fix validate_branch() return codes
x86: Clarify/fix no-op barriers for text_poke_bp()
x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs
selftests/x86/fsgsbase: Test selectors 1, 2, and 3
x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common
x86/asm: Fix UNWIND_HINT_REGS macro for older binutils
x86/asm/32: Fix regs_get_register() on segment registers
x86/xen/64: Rearrange the SYSCALL entries
x86/asm/32: Remove a bunch of '& 0xffff' from pt_regs segment reads
...
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Lguest seems to be rather unused these days. It has seen only patches
ensuring it still builds the last two years and its official state is
"Odd Fixes".
Remove it in order to be able to clean up the paravirt code.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: boris.ostrovsky@oracle.com
Cc: lguest@lists.ozlabs.org
Cc: rusty@rustcorp.com.au
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170816173157.8633-3-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull block fixes from Jens Axboe:
"Unfortunately a few issues that warrant sending another pull request,
even if I had hoped to avoid it. This contains:
- A fix for multiqueue xen-blkback, on tear down / disconnect.
- A few fixups for NVMe, including a wrong bit definition, fix for
host memory buffers, and an nvme rdma page size fix"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme: fix the definition of the doorbell buffer config support bit
nvme-pci: use dma memory for the host memory buffer descriptors
nvme-rdma: default MR page size to 4k
xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect
|
| |\ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus
Pull xen-blkback fix from Konrad:
"[...] A bug-fix when shutting down xen block backend driver with
multiple queues and the driver not clearing all of them."
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
In xen_blkif_disconnect, before checking inflight I/O, following code
stops the blkback thread,
if (ring->xenblkd) {
kthread_stop(ring->xenblkd);
wake_up(&ring->shutdown_wq);
}
If there is inflight I/O in any non-last queue, blkback returns -EBUSY
directly, and above code would not be called to stop thread of remaining
queue and processs them. When removing vbd device with lots of disk I/O
load, some queues with inflight I/O still have blkback thread running even
though the corresponding vbd device or guest is gone.
And this could cause some problems, for example, if the backend device type
is file, some loop devices and blkback thread always lingers there forever
after guest is destroyed, and this causes failure of umounting repositories
unless rebooting the dom0.
This patch allows thread of every queue has the chance to get stopped.
Otherwise, only thread of queue previous to(including) first busy one get
stopped, blkthread of remaining queue will still run. So stop all threads
properly and return -EBUSY if any queue has inflight I/O.
Signed-off-by: Annie Li <annie.li@oracle.com>
Reviewed-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Adnan Misherfi <adnan.misherfi@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|\ \ \ \ \ \
| |/ / / / /
|/| | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Pull virtio fixes from Michael Tsirkin:
"Fixes two obvious bugs in virtio pci"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_pci: fix cpu affinity support
virtio_blk: fix incorrect message when disk is resized
|
| | |_|/ /
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The message printed on disk resize is incorrect. The following is
printed when resizing to 2 GiB:
$ truncate -s 1G test.img
$ qemu -device virtio-blk-pci,logical_block_size=4096,...
(qemu) block_resize drive1 2G
virtio_blk virtio0: new size: 4194304 4096-byte logical blocks (17.2 GB/16.0 GiB)
The virtio_blk capacity config field is in 512-byte sector units
regardless of logical_block_size as per the VIRTIO specification.
Therefore the message should read:
virtio_blk virtio0: new size: 524288 4096-byte logical blocks (2.15 GB/2.0 GiB)
Note that this only affects the printed message. Thankfully the actual
block device has the correct size because the block layer expects
capacity in sectors.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|/ / / /
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
There's some stuff still up in the air, let's not get stuck with a
subpar ABI. I'll follow up with something better for 4.14.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|\| | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus
Pull xen block changes from Konrad:
Two fixes, both of them spotted by Amazon:
1) Fix in Xen-blkfront caused by the re-write in 4.8 time-frame.
2) Fix in the xen_biovec_phys_mergeable which allowed guest
requests when using NVMe - to slurp up more data than allowed
leading to an XSA (which has been made public today).
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Since commit d05d7f40791c ("Merge branch 'for-4.8/core' of
git://git.kernel.dk/linux-block") and 3fc9d690936f ("Merge branch
'for-4.8/drivers' of git://git.kernel.dk/linux-block"), blkfront_resume()
has been using an index for iterating ring_info to check request when
iterating blk_shadow in an inner loop. This seems to have been
accidentally introduced during the massive rewrite of the block layer
macros in the commits.
This may cause crash like this:
[11798.057074] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[11798.058832] IP: [<ffffffff814411fa>] blkfront_resume+0x10a/0x610
....
[11798.061063] Call Trace:
[11798.061063] [<ffffffff8139ce93>] xenbus_dev_resume+0x53/0x140
[11798.061063] [<ffffffff8139ce40>] ? xenbus_dev_probe+0x150/0x150
[11798.061063] [<ffffffff813f359e>] dpm_run_callback+0x3e/0x110
[11798.061063] [<ffffffff813f3a08>] device_resume+0x88/0x190
[11798.061063] [<ffffffff813f4cc0>] dpm_resume+0x100/0x2d0
[11798.061063] [<ffffffff813f5221>] dpm_resume_end+0x11/0x20
[11798.061063] [<ffffffff813950a8>] do_suspend+0xe8/0x1a0
[11798.061063] [<ffffffff813954bd>] shutdown_handler+0xfd/0x130
[11798.061063] [<ffffffff8139aba0>] ? split+0x110/0x110
[11798.061063] [<ffffffff8139ac26>] xenwatch_thread+0x86/0x120
[11798.061063] [<ffffffff810b4570>] ? prepare_to_wait_event+0x110/0x110
[11798.061063] [<ffffffff8108fe57>] kthread+0xd7/0xf0
[11798.061063] [<ffffffff811da811>] ? kfree+0x121/0x170
[11798.061063] [<ffffffff8108fd80>] ? kthread_park+0x60/0x60
[11798.061063] [<ffffffff810863b0>] ? call_usermodehelper_exec_work+0xb0/0xb0
[11798.061063] [<ffffffff810864ea>] ? call_usermodehelper_exec_async+0x13a/0x140
[11798.061063] [<ffffffff81534a45>] ret_from_fork+0x25/0x30
Use the right index in the inner loop.
Fixes: d05d7f40791c ("Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block")
Fixes: 3fc9d690936f ("Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-block")
Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
Reviewed-by: Thomas Friebel <friebelt@amazon.de>
Reviewed-by: Eduardo Valentin <eduval@amazon.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Roger Pau Monne <roger.pau@citrix.com>
Cc: xen-devel@lists.xenproject.org
Cc: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
comp_algorithm_store() passes the size of the source buffer to strlcpy()
instead of the destination buffer size. Make it explicit that the two
buffers have the same size and use strcpy() instead of strlcpy(). The
latter can be done safely since the function ensures that the string in
the source buffer is terminated.
Link: http://lkml.kernel.org/r/20170803163350.45245-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |/ /
|/| |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Using mpgroup to define multiple paths for a virtual disk causes multiple
virtual-device-port ports to be created for that virtual device.
Each virtual-device-port port then gets a vdisk created for it by the Linux
sunvdc driver. As mpgroup is not supported by the Linux sunvdc driver it
cannot handle multiple ports for a single vdisk, leading to a kernel panic
at startup.
This fix prevents more than one vdisk per virtual-device-port being created
until full virtual disk multipathing (mpgroup) support is implemented.
Signed-off-by: Jim Quigley <Jim.Quigley@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Aaron Young <aaron.young@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pull block fixes from Jens Axboe:
"A small collection of fixes that should go into this series. This
contains:
- NVMe pull request from Christoph, with various fixes for nvme
proper and nvme-fc.
- disable runtime PM for blk-mq for now.
With scsi now defaulting to using blk-mq, this reared its head as
an issue. Longer term we'll fix up runtime PM for blk-mq, for now
just disable it to prevent a hang on laptop resume for some folks.
- blk-mq CPU <-> hw queue map fix from Christoph.
- xen/blkfront pull request from Konrad, with two small fixes for the
blkfront driver.
- a few fixups for nbd from Joseph.
- a stable fix for pblk from Javier"
* 'for-linus' of git://git.kernel.dk/linux-block:
lightnvm: pblk: advance bio according to lba index
nvme: validate admin queue before unquiesce
nbd: clear disconnected on reconnect
nvme-pci: fix HMB size calculation
nvme-fc: revise TRADDR parsing
nvme-fc: address target disconnect race conditions in fcp io submit
nvme: fabrics commands should use the fctype field for data direction
nvme: also provide a UUID in the WWID sysfs attribute
xen/blkfront: always allocate grants first from per-queue persistent grants
xen-blkfront: fix mq start/stop race
blk-mq: map queues to all present CPUs
block: disable runtime-pm for blk-mq
xen-blkfront: Fix handling of non-supported operations
nbd: only set sndtimeo if we have a timeout set
nbd: take tx_lock before disconnecting
nbd: allow multiple disconnects to be sent
|
| |\| |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus
Pull xen-blkfront fixes from Konrad for 4.13.
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This patch partially reverts 3df0e50 ("xen/blkfront: pseudo support for
multi hardware queues/rings"). The xen-blkfront queue/ring might hang due
to grants allocation failure in the situation when gnttab_free_head is
almost empty while many persistent grants are reserved for this queue/ring.
As persistent grants management was per-queue since 73716df ("xen/blkfront:
make persistent grants pool per-queue"), we should always allocate from
persistent grants first.
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When ring buf full, hw queue will be stopped. While blkif interrupt consume
request and make free space in ring buf, hw queue will be started again.
But since start queue is protected by spin lock while stop not, that will
cause a race.
interrupt: process:
blkif_interrupt() blkif_queue_rq()
kick_pending_request_queues_locked()
blk_mq_start_stopped_hw_queues()
clear_bit(BLK_MQ_S_STOPPED, &hctx->state)
blk_mq_stop_hw_queue(hctx)
blk_mq_run_hw_queue(hctx, async)
If ring buf is made empty in this case, interrupt will never come, then the
hw queue will be stopped forever, all processes waiting for the pending io
in the queue will hung.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If our device loses its connection for longer than the dead timeout we
will set NBD_DISCONNECTED in order to quickly fail any pending IO's that
flood in after the IO's that were waiting during the dead timer.
However if we re-connect at some point in the future we'll still see
this DISCONNECTED flag set if we then lose our connection again after
that, which means we won't get notifications for our newly lost
connections. Fix this by just clearing the DISCONNECTED flag on
reconnect in order to make sure everything works as it's supposed to.
Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch fixes the following sparse warnings:
drivers/block/xen-blkfront.c:916:45: warning: incorrect type in argument 2 (different base types)
drivers/block/xen-blkfront.c:916:45: expected restricted blk_status_t [usertype] error
drivers/block/xen-blkfront.c:916:45: got int [signed] error
drivers/block/xen-blkfront.c:1599:47: warning: incorrect type in assignment (different base types)
drivers/block/xen-blkfront.c:1599:47: expected int [signed] error
drivers/block/xen-blkfront.c:1599:47: got restricted blk_status_t [usertype] <noident>
drivers/block/xen-blkfront.c:1607:55: warning: incorrect type in assignment (different base types)
drivers/block/xen-blkfront.c:1607:55: expected int [signed] error
drivers/block/xen-blkfront.c:1607:55: got restricted blk_status_t [usertype] <noident>
drivers/block/xen-blkfront.c:1625:55: warning: incorrect type in assignment (different base types)
drivers/block/xen-blkfront.c:1625:55: expected int [signed] error
drivers/block/xen-blkfront.c:1625:55: got restricted blk_status_t [usertype] <noident>
drivers/block/xen-blkfront.c:1628:62: warning: restricted blk_status_t degrades to integer
Compile-tested only.
Fixes: commit 2a842acab109 ("block: introduce new block status code type")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: <xen-devel@lists.xenproject.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
A user reported that he was getting immediate disconnects with my
sndtimeo patch applied. This is because by default the OSS nbd client
doesn't set a timeout, so we end up setting the sndtimeo to 0, which of
course means we have send errors a lot. Instead only set our sndtimeo
if the user specified a timeout, otherwise we'll just wait forever like
we did previously.
Fixes: dc88e34d69d8 ("nbd: set sk->sk_sndtimeo for our sockets")
Reported-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We need to take the tx_lock so we don't interleave our disconnect
request between real data going down the wire.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| | |
There's no reason to limit ourselves to one disconnect message per
socket. Sometimes networks do strange things, might as well let
sysadmins hit the panic button as much as they want.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|/
|
|
|
|
|
|
|
|
| |
Use sysfs_match_string() helper instead of open coded variant.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
|
|
|
|
|
|
|
| |
No need to return value in queue work, kill ret variable.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull more block updates from Jens Axboe:
"This is a followup for block changes, that didn't make the initial
pull request. It's a bit of a mixed bag, this contains:
- A followup pull request from Sagi for NVMe. Outside of fixups for
NVMe, it also includes a series for ensuring that we properly
quiesce hardware queues when browsing live tags.
- Set of integrity fixes from Dmitry (mostly), fixing various issues
for folks using DIF/DIX.
- Fix for a bug introduced in cciss, with the req init changes. From
Christoph.
- Fix for a bug in BFQ, from Paolo.
- Two followup fixes for lightnvm/pblk from Javier.
- Depth fix from Ming for blk-mq-sched.
- Also from Ming, performance fix for mtip32xx that was introduced
with the dynamic initialization of commands"
* 'for-linus' of git://git.kernel.dk/linux-block: (44 commits)
block: call bio_uninit in bio_endio
nvmet: avoid unneeded assignment of submit_bio return value
nvme-pci: add module parameter for io queue depth
nvme-pci: compile warnings in nvme_alloc_host_mem()
nvmet_fc: Accept variable pad lengths on Create Association LS
nvme_fc/nvmet_fc: revise Create Association descriptor length
lightnvm: pblk: remove unnecessary checks
lightnvm: pblk: control I/O flow also on tear down
cciss: initialize struct scsi_req
null_blk: fix error flow for shared tags during module_init
block: Fix __blkdev_issue_zeroout loop
nvme-rdma: unconditionally recycle the request mr
nvme: split nvme_uninit_ctrl into stop and uninit
virtio_blk: quiesce/unquiesce live IO when entering PM states
mtip32xx: quiesce request queues to make sure no submissions are inflight
nbd: quiesce request queues to make sure no submissions are inflight
nvme: kick requeue list when requeueing a request instead of when starting the queues
nvme-pci: quiesce/unquiesce admin_q instead of start/stop its hw queues
nvme-loop: quiesce/unquiesce admin_q instead of start/stop its hw queues
nvme-fc: quiesce/unquiesce admin_q instead of start/stop its hw queues
...
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull followup NVMe (mostly) changes from Sagi:
I added the quiesce/unquiesce patches in here as it's
easy for me easily apply changes on top. It has accumulated
reviews and includes mostly nvme anyway, please tell me if
you don't want to take them with this.
This includes:
- quiesce/unquiesce fixes in nvme and others from me
- nvme-fc add create association padding spec updates from James
- some more quirking from MKP
- nvmet nit cleanup from Max
- Fix nvme-rdma racy RDMA completion signalling from Marta
- some centralization patches from me
- add tagset nr_hw_queues updates on controller resets in
nvme drivers from me
- nvme-rdma fix resources recycling when doing error recovery from me
- minor cleanups in nvme-fc from me
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Without it its not guaranteed that no .queue_rq is inflight.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: virtio-dev@lists.oasis-open.org
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Unlike blk_mq_stop_hw_queues, blk_mq_quiesce_queue respects the
submission path rcu grace. quiesce the queue before iterating
on live tags, or performing device io quiescing.
While were at it, verify that the request started in mtip_abort_cmd
amd mtip_queue_cmd tag iteration calls.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Unlike blk_mq_stop_hw_queues, blk_mq_quiesce_queue respects the
submission path rcu grace. quiesce the queue before iterating
on live tags.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The changes in "block: Make most scsi_req_init() calls implicit" mean
that every driver that supports the generic scsi ioctls needs to
call scsi_req_init on newly allocated requests, but that commit didn't
add the call to the ccіss driver. Fix that to avoid crashes when
udev issues SG_IO commands.
Fixes: ca18d6f7 ("block: Make most scsi_req_init() calls implicit")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In case we use shared tags feature, blk_mq_alloc_tag_set might fail
during module initialization. In that case, fail the load with a
suitable error code. Also move the tagset initialization process after
defining the amount of submission queues.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
It is observed reading the register from HW takes a bit long,
for example in my box, the following difference of 'perf report
--no-children fio ...' can be seen when running I/O:
1) V4.12 without patch
+ 9.28% fio [mtip32xx] [k] mtip_irq_handler
+ 8.48% fio [mtip32xx] [k] mtip_init_cmd_header
2) V4.12 with the following patch
+ 9.14% fio [mtip32xx] [k] mtip_irq_handler
......
+ 1.14% fio [mtip32xx] [k] mtip_init_cmd_header
IOPS can be increased by ~5% with this patch too.
Fixes: a4e84aae8139(mtip32xx: use runtime tag to initialize command header)
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The global variable 'rd_size' is declared as 'int' in source file
arch/arm/kernel/atags_parse.c and as 'unsigned long' in
drivers/block/brd.c. Fix this inconsistency.
Additionally, remove the declarations of rd_image_start, rd_prompt and
rd_doload from parse_tag_ramdisk() since these duplicate existing
declarations in <linux/initrd.h>.
Link: http://lkml.kernel.org/r/20170627065024.12347-1-bart.vanassche@wdc.com
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: Zhaohongjiang <zhaohongjiang@huawei.com>
Cc: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with
const attribute_group. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
8293 841 4 9138 23b2 drivers/block/zram/zram_drv.o
File size After adding 'const':
text data bss dec hex filename
8357 777 4 9138 23b2 drivers/block/zram/zram_drv.o
Link: http://lkml.kernel.org/r/65680c1c4d85818f7094cbfa31c91bf28185ba1b.1499061182.git.arvind.yadav.cs@gmail.com
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Use __sysfs_match_string() helper instead of open coded variant.
Link: http://lkml.kernel.org/r/20170609120835.22156-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|