linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	md/r5cache: write-out phase and reclaim support	Song Liu	2016-11-18	3	-41/+430
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are two limited resources, stripe cache and journal disk space. For better performance, we priotize reclaim of full stripe writes. To free up more journal space, we free earliest data on the journal. In current implementation, reclaim happens when: 1. Periodically (every R5C_RECLAIM_WAKEUP_INTERVAL, 30 seconds) reclaim if there is no reclaim in the past 5 seconds. 2. when there are R5C_FULL_STRIPE_FLUSH_BATCH (256) cached full stripes, or cached stripes is enough for a full stripe (chunk size / 4k) (r5c_check_cached_full_stripe) 3. when there is pressure on stripe cache (r5c_check_stripe_cache_usage) 4. when there is pressure on journal space (r5l_write_stripe, r5c_cache_data) r5c_do_reclaim() contains new logic of reclaim. For stripe cache: When stripe cache pressure is high (more than 3/4 stripes are cached, or there is empty inactive lists), flush all full stripe. If fewer than R5C_RECLAIM_STRIPE_GROUP (NR_STRIPE_HASH_LOCKS * 2) full stripes are flushed, flush some paritial stripes. When stripe cache pressure is moderate (1/2 to 3/4 of stripes are cached), flush all full stripes. For log space: To avoid deadlock due to log space, we need to reserve enough space to flush cached data. The size of required log space depends on total number of cached stripes (stripe_in_journal_count). In current implementation, the writing-out phase automatically include pending data writes with parity writes (similar to write through case). Therefore, we need up to (conf->raid_disks + 1) pages for each cached stripe (1 page for meta data, raid_disks pages for all data and parity). r5c_log_required_to_flush_cache() calculates log space required to flush cache. In the following, we refer to the space calculated by r5c_log_required_to_flush_cache() as reclaim_required_space. Two flags are added to r5conf->cache_state: R5C_LOG_TIGHT and R5C_LOG_CRITICAL. R5C_LOG_TIGHT is set when free space on the log device is less than 3x of reclaim_required_space. R5C_LOG_CRITICAL is set when free space on the log device is less than 2x of reclaim_required_space. r5c_cache keeps all data in cache (not fully committed to RAID) in a list (stripe_in_journal_list). These stripes are in the order of their first appearance on the journal. So the log tail (last_checkpoint) should point to the journal_start of the first item in the list. When R5C_LOG_TIGHT is set, r5l_reclaim_thread starts flushing out stripes at the head of stripe_in_journal. When R5C_LOG_CRITICAL is set, the state machine only writes data that are already in the log device (in stripe_in_journal_list). This patch includes a fix to improve performance by Shaohua Li <shli@fb.com>. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md/r5cache: caching phase of r5cache	Song Liu	2016-11-18	3	-32/+381
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As described in previous patch, write back cache operates in two phases: caching and writing-out. The caching phase works as: 1. write data to journal (r5c_handle_stripe_dirtying, r5c_cache_data) 2. call bio_endio (r5c_handle_data_cached, r5c_return_dev_pending_writes). Then the writing-out phase is as: 1. Mark the stripe as write-out (r5c_make_stripe_write_out) 2. Calcualte parity (reconstruct or RMW) 3. Write parity (and maybe some other data) to journal device 4. Write data and parity to RAID disks This patch implements caching phase. The cache is integrated with stripe cache of raid456. It leverages code of r5l_log to write data to journal device. Writing-out phase of the cache is implemented in the next patch. With r5cache, write operation does not wait for parity calculation and write out, so the write latency is lower (1 write to journal device vs. read and then write to raid disks). Also, r5cache will reduce RAID overhead (multipile IO due to read-modify-write of parity) and provide more opportunities of full stripe writes. This patch adds 2 flags to stripe_head.state: - STRIPE_R5C_PARTIAL_STRIPE, - STRIPE_R5C_FULL_STRIPE, Instead of inactive_list, stripes with cached data are tracked in r5conf->r5c_full_stripe_list and r5conf->r5c_partial_stripe_list. STRIPE_R5C_FULL_STRIPE and STRIPE_R5C_PARTIAL_STRIPE are flags for stripes in these lists. Note: stripes in r5c_full/partial_stripe_list are not considered as "active". For RMW, the code allocates an extra page for each data block being updated. This is stored in r5dev->orig_page and the old data is read into it. Then the prexor calculation subtracts ->orig_page from the parity block, and the reconstruct calculation adds the ->page data back into the parity block. r5cache naturally excludes SkipCopy. When the array has write back cache, async_copy_data() will not skip copy. There are some known limitations of the cache implementation: 1. Write cache only covers full page writes (R5_OVERWRITE). Writes of smaller granularity are write through. 2. Only one log io (sh->log_io) for each stripe at anytime. Later writes for the same stripe have to wait. This can be improved by moving log_io to r5dev. 3. With writeback cache, read path must enter state machine, which is a significant bottleneck for some workloads. 4. There is no per stripe checkpoint (with r5l_payload_flush) in the log, so recovery code has to replay more than necessary data (sometimes all the log from last_checkpoint). This reduces availability of the array. This patch includes a fix proposed by ZhengYuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md/r5cache: State machine for raid5-cache write back mode	Song Liu	2016-11-18	3	-8/+211
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds state machine for raid5-cache. With log device, the raid456 array could operate in two different modes (r5c_journal_mode): - write-back (R5C_MODE_WRITE_BACK) - write-through (R5C_MODE_WRITE_THROUGH) Existing code of raid5-cache only has write-through mode. For write-back cache, it is necessary to extend the state machine. With write-back cache, every stripe could operate in two different phases: - caching - writing-out In caching phase, the stripe handles writes as: - write to journal - return IO In writing-out phase, the stripe behaviors as a stripe in write through mode R5C_MODE_WRITE_THROUGH. STRIPE_R5C_CACHING is added to sh->state to differentiate caching and writing-out phase. Please note: this is a "no-op" patch for raid5-cache write-through mode. The following detailed explanation is copied from the raid5-cache.c: /* * raid5 cache state machine * * With rhe RAID cache, each stripe works in two phases: * - caching phase * - writing-out phase * * These two phases are controlled by bit STRIPE_R5C_CACHING: * if STRIPE_R5C_CACHING == 0, the stripe is in writing-out phase * if STRIPE_R5C_CACHING == 1, the stripe is in caching phase * * When there is no journal, or the journal is in write-through mode, * the stripe is always in writing-out phase. * * For write-back journal, the stripe is sent to caching phase on write * (r5c_handle_stripe_dirtying). r5c_make_stripe_write_out() kicks off * the write-out phase by clearing STRIPE_R5C_CACHING. * * Stripes in caching phase do not write the raid disks. Instead, all * writes are committed from the log device. Therefore, a stripe in * caching phase handles writes as: * - write to log device * - return IO * * Stripes in writing-out phase handle writes as: * - calculate parity * - write pending data and parity to journal * - write data and parity to raid disks * - return IO for pending writes */ Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md/r5cache: move some code to raid5.h	Song Liu	2016-11-18	2	-71/+77
\| \| \| \| \| \| \| \|	Move some define and inline functions to raid5.h, so they can be used in raid5-cache.c Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md/r5cache: Check array size in r5l_init_log	Song Liu	2016-11-18	1	-10/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, r5l_write_stripe checks meta size for each stripe write, which is not necessary. With this patch, r5l_init_log checks maximal meta size of the array, which is (r5l_meta_block + raid_disks x r5l_payload_data_parity). If this is too big to fit in one page, r5l_init_log aborts. With current meta data, r5l_log support raid_disks up to 203. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md: add blktrace event for writes to superblock	Shaohua Li	2016-11-18	1	-0/+3
\| \| \| \| \| \| \| \|	superblock write is an expensive operation. With raid5-cache, it can be called regularly. Tracing to help performance debug. Signed-off-by: Shaohua Li <shli@fb.com> Cc: NeilBrown <neilb@suse.com>
*	md/raid1, raid10: add blktrace records when IO is delayed	NeilBrown	2016-11-18	2	-0/+16
\| \| \| \| \| \| \| \| \| \| \|	Both raid1 and raid10 will sometimes delay handling an IO request, such as when resync is happening or there are too many requests queued. Add some blktrace messsages so we can see when that is happening when looking for performance artefacts. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md/bitmap: add blktrace event for writes to the bitmap	NeilBrown	2016-11-18	1	-1/+10
\| \| \| \| \| \| \| \| \| \| \|	We trace wheneven bitmap_unplug() finds that it needs to write to the bitmap, or when bitmap_daemon_work() find there is work to do. This makes it easier to correlate bitmap updates with data writes. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md: add block tracing for bio_remapping	NeilBrown	2016-11-18	4	-13/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The block tracing infrastructure (accessed with blktrace/blkparse) supports the tracing of mapping bios from one device to another. This is currently used when a bio in a partition is mapped to the whole device, when bios are mapped by dm, and for mapping in md/raid5. Other md personalities do not include this tracing yet, so add it. When a read-error is detected we redirect the request to a different device. This could justifiably be seen as a new mapping for the originial bio, or a secondary mapping for the bio that errors. This patch uses the second option. When md is used under dm-raid, the mappings are not traced as we do not have access to the block device number of the parent. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	raid5-cache: fix lockdep warning	Shaohua Li	2016-11-17	1	-2/+14
\| \| \| \| \| \| \|	lockdep reports warning of the rcu_dereference usage. Using normal rdev access pattern to avoid the warning. Signed-off-by: Shaohua Li <shli@fb.com>
*	md: remove md_super_wait() call after bitmap_flush()	NeilBrown	2016-11-09	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	bitmap_flush() finishes with bitmap_update_sb(), and that finishes with write_page(..., 1), so write_page() will wait for all writes to complete. So there is no point calling md_super_wait() immediately afterwards. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md: define mddev flags, recovery flags and r1bio state bits using enums	NeilBrown	2016-11-09	2	-48/+46
\| \| \| \| \| \| \|	This is less error prone than using individual #defines. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md/raid1: fix: IO can block resync indefinitely	NeilBrown	2016-11-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While performing a resync/recovery, raid1 divides the array space into three regions: - before the resync - at or shortly after the resync point - much further ahead of the resync point. Write requests to the first or third do not need to wait. Write requests to the middle region do need to wait if resync requests are pending. If there are any active write requests in the middle region, resync will wait for them. Due to an accounting error, there is a small range of addresses, between conf->next_resync and conf->start_next_window, where write requests will not be blocked, but will be counted in the middle region. This can effectively block resync indefinitely if filesystem writes happen repeatedly to this region. As ->next_window_requests is incremented when the sector is after conf->start_next_window + NEXT_NORMALIO_DISTANCE the same boundary should be used for determining when write requests should wait. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md/bitmap: Don't write bitmap while earlier writes might be in-flight	NeilBrown	2016-11-07	1	-5/+22
\| \| \| \| \| \| \| \| \| \| \| \|	As we don't wait for writes to complete in bitmap_daemon_work, they could still be in-flight when bitmap_unplug writes again. Or when bitmap_daemon_work tries to write again. This can be confusing and could risk the wrong data being written last. So make sure we wait for old writes to complete before new writes start. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md/raid10: abort delayed writes when device fails.	NeilBrown	2016-11-07	1	-6/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When writing to an array with a bitmap enabled, the writes are grouped in batches which are preceded by an update to the bitmap. It is quite likely if that a drive develops a problem which is not media related, that the bitmap write will be the first to report an error and cause the device to be marked faulty (as the bitmap write is at the start of a batch). In this case, there is point submiting the subsequent writes to the failed device - that just wastes times. So re-check the Faulty state of a device before submitting a delayed write. This requires that we keep the 'rdev', rather than the 'bdev' in the bio, then swap in the bdev just before final submission. Reported-by: Hannes Reinecke <hare@suse.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md/raid1: abort delayed writes when device fails.	NeilBrown	2016-11-07	1	-5/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When writing to an array with a bitmap enabled, the writes are grouped in batches which are preceded by an update to the bitmap. It is quite likely if that a drive develops a problem which is not media related, that the bitmap write will be the first to report an error and cause the device to be marked faulty (as the bitmap write is at the start of a batch). In this case, there is point submiting the subsequent writes to the failed device - that just wastes times. So re-check the Faulty state of a device before submitting a delayed write. This requires that we keep the 'rdev', rather than the 'bdev' in the bio, then swap in the bdev just before final submission. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md: perform async updates for metadata where possible.	NeilBrown	2016-11-07	1	-4/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When adding devices to, or removing device from, an array we need to update the metadata. However we don't need to do it synchronously as data integrity doesn't depend on these changes being recorded instantly. So avoid the synchronous call to md_update_sb and just set a flag so that the thread will do it. This can reduce the number of updates performed when lots of devices are being added or removed. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	raid5-cache: restrict the use area of the log_offset variable	JackieLiu	2016-11-07	1	-11/+8
\| \| \| \| \| \| \| \|	We can calculate this offset by using ctx->meta_total_blocks, without passing in from the function Signed-off-by: JackieLiu <liuyun01@kylinos.cn> Signed-off-by: Shaohua Li <shli@fb.com>
*	md/raid5: change printk() to pr_*()	NeilBrown	2016-11-07	1	-121/+86
\| \| \| \| \|	Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md/raid10: change printk() to pr_*()	NeilBrown	2016-11-07	1	-85/+56
\| \| \| \| \|	Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md/raid1: change printk() to pr_*()	NeilBrown	2016-11-07	1	-58/+41
\| \| \| \| \|	Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md/raid0: replace printk() with pr_*()	NeilBrown	2016-11-07	1	-45/+44
\| \| \| \| \| \| \| \|	This makes md/raid0 much less verbose as the messages about the array geometry are now pr_debug() Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md/multipath: replace printk() with pr_*()	NeilBrown	2016-11-07	1	-57/+33
\| \| \| \| \| \| \| \|	Also remove all messages about memory allocation failure. page_alloc() reports those. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md/linear: replace printk() with pr_*()	NeilBrown	2016-11-07	1	-8/+5
\| \| \| \| \|	Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md/bitmap: change all printk() to pr_*()	NeilBrown	2016-11-07	1	-57/+50
\| \| \| \| \| \| \| \|	Follow err/warn distinction introduced in md.c Join multi-part strings into single string. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md: change all printk() to pr_err() or pr_warn() etc.	NeilBrown	2016-11-07	1	-219/+170
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	1/ using pr_debug() for a number of messages reduces the noise of md, but still allows them to be enabled when needed. 2/ try to be consistent in the usage of pr_err() and pr_warn(), and document the intention 3/ When strings have been split onto multiple lines, rejoin into a single string. The cost of having lines > 80 chars is less than the cost of not being able to easily search for a particular message. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md: fix some issues with alloc_disk_sb()	NeilBrown	2016-11-07	1	-10/+10
\| \| \| \| \| \| \| \| \|	1/ don't print a warning if allocation fails. page_alloc() does that already. 2/ always check return status for error. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md/bitmap: call bitmap_file_unmap once bitmap_storage_alloc returns -ENOMEM	Guoqing Jiang	2016-11-07	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \|	It is possible that bitmap_storage_alloc could return -ENOMEM, and some member inside store could be allocated such as filemap. To avoid memory leak, we need to call bitmap_file_unmap to free those members in the bitmap_resize. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	raid5: revert commit 11367799f3d1	Tomasz Majchrzak	2016-11-07	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Revert commit 11367799f3d1 ("md: Prevent IO hold during accessing to faulty raid5 array") as it doesn't comply with commit c3cce6cda162 ("md/raid5: ensure device failure recorded before write request returns."). That change is not required anymore as the problem is resolved by commit 16f889499a52 ("md: report 'write_pending' state when array in sync") - read request is stuck as array state is not reported correctly via sysfs attribute. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md: wake up personality thread after array state update	Tomasz Majchrzak	2016-11-07	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When raid1/raid10 array fails to write to one of the drives, the request is added to bio_end_io_list and finished by personality thread. The thread doesn't handle it as long as MD_CHANGE_PENDING flag is set. In case of external metadata this flag is cleared, however the thread is not woken up. It causes request to be blocked for few seconds (until another action on the array wakes up the thread) or to get stuck indefinitely. Wake up personality thread once MD_CHANGE_PENDING has been cleared. Moving 'restart_array' call after the flag is cleared it not a solution because in read-write mode the call doesn't wake up the thread. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md: don't fail an array if there are unacknowledged bad blocks	Tomasz Majchrzak	2016-11-07	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If external metadata handler supports bad blocks and unacknowledged bad blocks are present, don't report disk via sysfs as faulty. Such situation can be still handled so disk just has to be blocked for a moment. It makes it consistent with kernel state as corresponding rdev flag is also not set. When the disk in being unblocked there are few cases: 1. Disk has been in blocked and faulty state, it is being unblocked but it still remains in faulty state. Metadata handler will remove it from array in the next call. 2. There is no bad block support in external metadata handler and bad blocks are present - put the disk in blocked and faulty state (see case 1). 3. There is bad block support in external metadata handler and all bad blocks are acknowledged - clear all flags, continue. 4. There is bad block support in external metadata handler but there are still unacknowledged bad blocks - clear all flags, continue. It is fine to clear Blocked flag because it was probably not set anyway (if it was it is case 1). BlockedBadBlocks flag can also be cleared because the request waiting for it will set it again when it finds out that some bad block is still not acknowledged. Recovery is not necessary but there are no problems if the flag is set. Sysfs rdev state is still reported as blocked (due to unacknowledged bad blocks) so metadata handler will process remaining bad blocks and unblock disk again. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	md: add bad block support for external metadata	Tomasz Majchrzak	2016-11-07	2	-39/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add new rdev flag which external metadata handler can use to switch on/off bad block support. If new bad block is encountered, notify it via rdev 'unacknowledged_bad_blocks' sysfs file. If bad block has been cleared, notify update to rdev 'bad_blocks' sysfs file. When bad blocks support is being removed, just clear rdev flag. It is not necessary to reset badblocks->shift field. If there are bad blocks cleared or added at the same time, it is ok for those changes to be applied to the structure. The array is in blocked state and the drive which cannot handle bad blocks any more will be removed from the array before it is unlocked. Simplify state_show function by adding a separator at the end of each string and overwrite last separator with new line. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	lib/raid6: Add AVX2 optimized xor_syndrome functions	Gayatri Kammela	2016-11-07	1	-3/+229
\| \| \| \| \| \| \| \| \| \| \|	Implement the AVX2 optimization of RAID6 xor_syndrome functions which is simply based on sse2.c written by hpa. Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Yuanhan Liu <yuanhan.liu@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
*	Merge tag 'arm64-fixes' of ↵	Linus Torvalds	2016-11-07	4	-21/+42
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Will Deacon: "It's been pretty quiet on the fixes side of things for us, but Artem reported a build failure introduced during the merge window that appears with older GCCs that do not support asm goto. The fix is bigger than I'd like, but it's a mechnical move of some constants to break an include dependency between atomic.h and jump_label.h when !HAVE_JUMP_LABEL. Summary: - Fix build failure on compilers without asm goto" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Fix circular include of asm/lse.h through linux/jump_label.h
\| *	arm64: Fix circular include of asm/lse.h through linux/jump_label.h	Catalin Marinas	2016-11-05	4	-21/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit efd9e03facd0 ("arm64: Use static keys for CPU features") introduced support for static keys in asm/cpufeature.h, including linux/jump_label.h. When CC_HAVE_ASM_GOTO is not defined, this causes a circular dependency via linux/atomic.h, asm/lse.h and asm/cpufeature.h. This patch moves the capability macros out out of asm/cpufeature.h into a separate asm/cpucaps.h and modifies some of the #includes accordingly. Fixes: efd9e03facd0 ("arm64: Use static keys for CPU features") Reported-by: Artem Savkov <asavkov@redhat.com> Tested-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
* \|	Merge tag 'openrisc-for-linus-v4.9-rc5' of ↵	Linus Torvalds	2016-11-07	1	-0/+2
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull openrisc fix from Guenter Roeck: "Fix openrisc crash caused by ro_init changes" * tag 'openrisc-for-linus-v4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: openrisc: Define __ro_after_init to avoid crash
\| * \|	openrisc: Define __ro_after_init to avoid crash	Guenter Roeck	2016-11-06	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	openrisc qemu tests fail with the following crash. Unable to handle kernel access at virtual address 0xc0300c34 Oops#: 0001 CPU #: 0 PC: c016c710 SR: 0000ae67 SP: c1017e04 GPR00: 00000000 GPR01: c1017e04 GPR02: c0300c34 GPR03: c0300c34 GPR04: 00000000 GPR05: c0300cb0 GPR06: c0300c34 GPR07: 000000ff GPR08: c107f074 GPR09: c0199ef4 GPR10: c1016000 GPR11: 00000000 GPR12: 00000000 GPR13: c107f044 GPR14: c0473774 GPR15: 07ce0000 GPR16: 00000000 GPR17: c107ed8a GPR18: 00009600 GPR19: c107f044 GPR20: c107ee74 GPR21: 00000003 GPR22: c0473770 GPR23: 00000033 GPR24: 000000bf GPR25: 00000019 GPR26: c046400c GPR27: 00000001 GPR28: c0464028 GPR29: c1018000 GPR30: 00000006 GPR31: ccf37483 RES: 00000000 oGPR11: ffffffff Process swapper (pid: 1, stackpage=c1001960) Stack: Stack dump [0xc1017cf8]: sp + 00: 0xc1017e04 sp + 04: 0xc0300c34 sp + 08: 0xc0300c34 sp + 12: 0x00000000 ... Bisect points to commit d2ec3f77de8e ("pty: make ptmx file ops read-only after init"). Fix by defining __ro_after_init for the openrisc architecture, similar to parisc. Fixes: d2ec3f77de8e ("pty: make ptmx file ops read-only after init") Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Stafford Horne <shorne@gmail.com>
* \| \|	Merge tag 'hwmon-for-linus-v4.9-rc5' of ↵	Linus Torvalds	2016-11-07	1	-2/+4
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fix from Guenter Roeck: "Fix resource leak on devm_kcalloc failure" * tag 'hwmon-for-linus-v4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (core) fix resource leak on devm_kcalloc failure
\| * \| \|	hwmon: (core) fix resource leak on devm_kcalloc failure	Colin Ian King	2016-10-24	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If dev_kcalloc fails to allocate hw_dev->groups then the current exit path is a direct return, causing a leak of resources such as hwdev and ida is not removed. Fix this by exiting via the free_hwmon exit path that performs the necessary resource cleanup. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* \| \| \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2016-11-07	6	-36/+95
\|\ \ \ \ \| \|_\|/ / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID fixes from Jiri Kosina: - modprobe-after-rmmod load failure bugfix for intel-ish, from Even Xu - IRQ probing bugfix for intel-ish, from Srinivas Pandruvada - attribute parsing fix in hid-sensor, from Ooi, Joyce - other small misc fixes / quirky device additions * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: sensor: fix attributes in HID sensor interface HID: intel-ish-hid: request_irq failure HID: intel-ish-hid: Fix driver reinit failure HID: intel-ish-hid: Move DMA disable code to new function HID: intel-ish-hid: consolidate ish wake up operation HID: usbhid: add ATEN CS962 to list of quirky devices HID: intel-ish-hid: Fix !CONFIG_PM build warning HID: sensor-hub: Fix packing of result buffer for feature report
\| * \| \|	HID: sensor: fix attributes in HID sensor interface	Ooi, Joyce	2016-11-05	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	User is unable to access to input-X-yyy and feature-X-yyy where X is a hex value and more than 9 (e.g. input-a-yyy, feature-b-yyy) in HID sensor custom sysfs interface. This is because when creating the attribute, the attribute index is written to using %x (hex). However, when reading and writing values into the attribute, the attribute index is scanned using %d (decimal). Hence, user is unable to access to attributes with index in hex values (e.g. 'a', 'b', 'c') but able to access to attributes with index in decimal values (e.g. 1, 2, 3,..). This fix will change input-%d-%x-%s and feature-%d-%x-%s to input-%x-%x-%s and feature-%x-%x-%s in show_values() and store_values() accordingly. Signed-off-by: Ooi, Joyce <joyce.ooi@intel.com> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
\| * \| \|	HID: intel-ish-hid: request_irq failure	Srinivas Pandruvada	2016-11-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On some platforms ISH interrupt is shared, which causes request_irq to fail. This requires IRQF_SHARED irq flag. But IRQF_NO_SUSPEND and IRQF_SHARED should not be used together, so removed IRQF_NO_SUSPEND flag. Anyway this driver doesn't require IRQF_NO_SUSPEND, as this interrupt is not required during "noirq" phases of suspending and resuming devices as well as during the time when nonboot CPUs are taken offline and brought back online. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
\| * \| \|	HID: intel-ish-hid: Fix driver reinit failure	Even Xu	2016-11-05	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When built as a module, modprobe followed by rmmod can fail because DMA was still active. So to fix this, DMA needs to be disabled during module exit. This change disables DMA during modules exit and change the ISH PCI device status to D3. Signed-off-by: Even Xu <even.xu@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
\| * \| \|	HID: intel-ish-hid: Move DMA disable code to new function	Even Xu	2016-11-05	1	-10/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new function ish_disable_dma() and move DMA disable operations here, so that this functionality can be reused. Signed-off-by: Even Xu <even.xu@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
\| * \| \|	HID: intel-ish-hid: consolidate ish wake up operation	Even Xu	2016-11-05	1	-19/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Same operations are done in ish_hw_start() and _ish_hw_reset() to wakeup ISH device. Consolidate them by introducing a new function ish_wakeup() and move the code there. Signed-off-by: Even Xu <even.xu@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
\| * \| \|	HID: usbhid: add ATEN CS962 to list of quirky devices	Oliver Neukum	2016-11-03	2	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Like many similar devices it needs a quirk to work. Issuing the request gets the device into an irrecoverable state. Signed-off-by: Oliver Neukum <oneukum@suse.com> CC: stable@vger.kernel.org Signed-off-by: Jiri Kosina <jkosina@suse.cz>
\| * \| \|	HID: intel-ish-hid: Fix !CONFIG_PM build warning	Borislav Petkov	2016-11-03	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix drivers/hid/intel-ish-hid/ipc/pci-ish.c:247:12: warning: ‘ish_suspend’ defined but not used [-Wunused-function] static int ish_suspend(struct device device) ^ drivers/hid/intel-ish-hid/ipc/pci-ish.c:282:12: warning: ‘ish_resume’ defined but not used [-Wunused-function] static int ish_resume(struct device device) ^ by sticking them in the CONFIG_PM range too. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: linux-input@vger.kernel.org Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
\| * \| \|	HID: sensor-hub: Fix packing of result buffer for feature report	Srinivas Pandruvada	2016-11-03	1	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When report count is more than one and report size is not 4 bytes, then we need some packing into result buffer from the caller of function sensor_hub_get_feature. By default the value extracted from a field is 4 bytes from hid core (using hid_hw_request(hsdev->hdev, report, HID_REQ_GET_REPORT)), even if report size if less than 4 byte. So when we copy data to user buffer in sensor_hub_get_feature, we need to only copy report size bytes even when report count is more than 1. This is not an issue for most of the sensor hub fields as report count will be 1 where we already copy only report size bytes, but some string fields like description, it is a problem as the report count will be more than 1. For example: Field(6) Physical(Sensor.OtherCustom) Application(Sensor.Sensor) Usage(11) Sensor.0306 Sensor.0306 Sensor.0306 Sensor.0306 Sensor.0306 Sensor.0306 Sensor.0306 Sensor.0306 Sensor.0306 Sensor.0306 Sensor.0306 Report Size(16) Report Count(11) Here since the report size is 2 bytes, we will have 2 additional bytes of 0s copied into user buffer, if we directly copy to user buffer from report->field[]->value This change will copy report size bytes into the buffer of caller for each usage report->field[]->value. So for example without this change, the data displayed for a custom sensor field "sensor-model": 76 00 101 00 110 00 111 00 118 00 111 (truncated to report count of 11) With change 76 101 110 111 118 111 32 89 111 103 97 ("Lenovo Yoga" in ASCII ) Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* \| \| \|	Linux 4.9-rc4v4.9-rc4	Linus Torvalds	2016-11-05	1	-1/+1
\| \| \| \|
* \| \| \|	Merge branch 'i2c/for-current' of ↵	Linus Torvalds	2016-11-05	1	-1/+1
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: "A bugfix for the I2C core fixing a (rare) race condition" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: core: fix NULL pointer dereference under race condition