aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'exfat-for-5.8-rc1' of ↵Linus Torvalds2020-06-0912-469/+423
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat update from Namjae Jeon: "Bug fixes: - Fix memory leak on mount failure with iocharset= option - Fix incorrect update of stream entry - Fix cluster range validation error Clean-ups: - Remove unused code and unneeded assignment - Rename variables in exfat structure as specification - Reorganize boot sector analysis code - Simplify exfat_utf8_d_hash and exfat_utf8_d_cmp() - Optimize exfat entry cache functions - Improve wording of EXFAT_DEFAULT_IOCHARSET config option New Feature: - Add boot region verification" * tag 'exfat-for-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: Fix potential use after free in exfat_load_upcase_table() exfat: fix range validation error in alloc and free cluster exfat: fix incorrect update of stream entry in __exfat_truncate() exfat: fix memory leak in exfat_parse_param() exfat: remove unnecessary reassignment of p_uniname->name_len exfat: standardize checksum calculation exfat: add boot region verification exfat: separate the boot sector analysis exfat: redefine PBR as boot_sector exfat: optimize dir-cache exfat: replace 'time_ms' with 'time_cs' exfat: remove the assignment of 0 to bool variable exfat: Remove unused functions exfat_high_surrogate() and exfat_low_surrogate() exfat: Simplify exfat_utf8_d_hash() for code points above U+FFFF exfat: Improve wording of EXFAT_DEFAULT_IOCHARSET config option exfat: Use a more common logging style exfat: Simplify exfat_utf8_d_cmp() for code points above U+FFFF
| * exfat: Fix potential use after free in exfat_load_upcase_table()Dan Carpenter2020-06-091-1/+1
| | | | | | | | | | | | | | | | | | | | This code calls brelse(bh) and then dereferences "bh" on the next line resulting in a possible use after free. The brelse() should just be moved down a line. Fixes: b676fdbcf4c8 ("exfat: standardize checksum calculation") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
| * exfat: fix range validation error in alloc and free clusterhyeongseok.kim2020-06-091-2/+2
| | | | | | | | | | | | | | | | | | | | There is check error in range condition that can never be entered even with invalid input. Replace incorrent checking code with already existing valid checker. Signed-off-by: hyeongseok.kim <hyeongseok@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
| * exfat: fix incorrect update of stream entry in __exfat_truncate()Namjae Jeon2020-06-091-4/+4
| | | | | | | | | | | | | | | | | | | | | | At truncate, there is a problem of incorrect updating in the file entry pointer instead of stream entry. This will cause the problem of overwriting the time field of the file entry to new_size. Fix it to update stream entry. Fixes: 98d917047e8b ("exfat: add file operations") Cc: stable@vger.kernel.org # v5.7 Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
| * exfat: fix memory leak in exfat_parse_param()Al Viro2020-06-091-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | butt3rflyh4ck reported memory leak found by syzkaller. A param->string held by exfat_mount_options. BUG: memory leak unreferenced object 0xffff88801972e090 (size 8): comm "syz-executor.2", pid 16298, jiffies 4295172466 (age 14.060s) hex dump (first 8 bytes): 6b 6f 69 38 2d 75 00 00 koi8-u.. backtrace: [<000000005bfe35d6>] kstrdup+0x36/0x70 mm/util.c:60 [<0000000018ed3277>] exfat_parse_param+0x160/0x5e0 fs/exfat/super.c:276 [<000000007680462b>] vfs_parse_fs_param+0x2b4/0x610 fs/fs_context.c:147 [<0000000097c027f2>] vfs_parse_fs_string+0xe6/0x150 fs/fs_context.c:191 [<00000000371bf78f>] generic_parse_monolithic+0x16f/0x1f0 fs/fs_context.c:231 [<000000005ce5eb1b>] do_new_mount fs/namespace.c:2812 [inline] [<000000005ce5eb1b>] do_mount+0x12bb/0x1b30 fs/namespace.c:3141 [<00000000b642040c>] __do_sys_mount fs/namespace.c:3350 [inline] [<00000000b642040c>] __se_sys_mount fs/namespace.c:3327 [inline] [<00000000b642040c>] __x64_sys_mount+0x18f/0x230 fs/namespace.c:3327 [<000000003b024e98>] do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 [<00000000ce2b698c>] entry_SYSCALL_64_after_hwframe+0x49/0xb3 exfat_free() should call exfat_free_iocharset(), to prevent a leak in case we fail after parsing iocharset= but before calling get_tree_bdev(). Additionally, there's no point copying param->string in exfat_parse_param() - just steal it, leaving NULL in param->string. That's independent from the leak or fix thereof - it's simply avoiding an extra copy. Fixes: 719c1e182916 ("exfat: add super block operations") Cc: stable@vger.kernel.org # v5.7 Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
| * exfat: remove unnecessary reassignment of p_uniname->name_lenNamjae Jeon2020-06-091-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | kbuild test robot reported : fs/exfat/nls.c:531:22: warning: Variable 'p_uniname->name_len' is reassigned a value before the old one has been used. The reassignment of p_uniname->name_len is not needed and remove it. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
| * exfat: standardize checksum calculationTetsuhiro Kohada2020-06-094-27/+19
| | | | | | | | | | | | | | | | | | | | | | To clarify that it is a 16-bit checksum, the parts related to the 16-bit checksum are renamed and change type to u16. Furthermore, replace checksum calculation in exfat_load_upcase_table() with exfat_calc_checksum32(). Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
| * exfat: add boot region verificationTetsuhiro Kohada2020-06-093-0/+65
| | | | | | | | | | | | | | | | | | | | Add Boot-Regions verification specified in exFAT specification. Note that the checksum type is strongly related to the raw structure, so the'u32 'type is used to clarify the number of bits. Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
| * exfat: separate the boot sector analysisTetsuhiro Kohada2020-06-092-43/+56
| | | | | | | | | | | | | | | | | | | | | | Separate the boot sector analysis to read_boot_sector(). And add a check for the fs_name field. Furthermore, add a strict consistency check, because overlapping areas can cause serious corruption. Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
| * exfat: redefine PBR as boot_sectorTetsuhiro Kohada2020-06-093-93/+72
| | | | | | | | | | | | | | | | | | | | Aggregate PBR related definitions and redefine as "boot_sector" to comply with the exFAT specification. And, rename variable names including 'pbr'. Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
| * exfat: optimize dir-cacheTetsuhiro Kohada2020-06-095-182/+124
| | | | | | | | | | | | | | | | | | | | | | | | | | Optimize directory access based on exfat_entry_set_cache. - Hold bh instead of copied d-entry. - Modify bh->data directly instead of the copied d-entry. - Write back the retained bh instead of rescanning the d-entry-set. And - Remove unused cache related definitions. Signed-off-by: Tetsuhiro Kohada <kohada.tetsuhiro@dc.mitsubishielectric.co.jp> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
| * exfat: replace 'time_ms' with 'time_cs'Tetsuhiro Kohada2020-06-097-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | Replace time_ms with time_cs in the file directory entry structure and related functions. The unit of create_time_ms/modify_time_ms in File Directory Entry are not 'milli-second', but 'centi-second'. The exfat specification uses the term '10ms', but instead use 'cs' as in msdos_fs.h. Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
| * exfat: remove the assignment of 0 to bool variableJason Yan2020-06-091-1/+1
| | | | | | | | | | | | | | | | | | | | There is no need to init 'sync' in exfat_set_vol_flags(). This also fixes the following coccicheck warning: fs/exfat/super.c:104:6-10: WARNING: Assignment of 0/1 to bool variable Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
| * exfat: Remove unused functions exfat_high_surrogate() and exfat_low_surrogate()Pali Rohár2020-06-092-15/+0
| | | | | | | | | | | | | | After applying previous two patches, these functions are not used anymore. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
| * exfat: Simplify exfat_utf8_d_hash() for code points above U+FFFFPali Rohár2020-06-091-8/+2
| | | | | | | | | | | | | | | | | | Function partial_name_hash() takes long type value into which can be stored one Unicode code point. Therefore conversion from UTF-32 to UTF-16 is not needed. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
| * exfat: Improve wording of EXFAT_DEFAULT_IOCHARSET config optionGeert Uytterhoeven2020-06-091-3/+4
| | | | | | | | | | | | | | | | | | - Use consistent capitalization for "exFAT". - Fix grammar, - Split long sentence. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
| * exfat: Use a more common logging styleJoe Perches2020-06-098-74/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | Remove the direct use of KERN_<LEVEL> in functions by creating separate exfat_<level> macros. Miscellanea: o Remove several unnecessary terminating newlines in formats o Realign arguments and fit to 80 columns where appropriate Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
| * exfat: Simplify exfat_utf8_d_cmp() for code points above U+FFFFPali Rohár2020-06-091-7/+2
| | | | | | | | | | | | | | | | | | If two Unicode code points represented in UTF-16 are different then also their UTF-32 representation must be different. Therefore conversion from UTF-32 to UTF-16 is not needed. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
* | Merge tag 'linux-kselftest-kunit-5.8-rc1' of ↵Linus Torvalds2020-06-091-1/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kunit updates from Shuah Khan: "This consists of: - Several config fragment fixes from Anders Roxell to improve test coverage. - Improvements to kunit run script to use defconfig as default and restructure the code for config/build/exec/parse from Vitor Massaru Iha and David Gow. - Miscellaneous documentation warn fix" * tag 'linux-kselftest-kunit-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: security: apparmor: default KUNIT_* fragments to KUNIT_ALL_TESTS fs: ext4: default KUNIT_* fragments to KUNIT_ALL_TESTS drivers: base: default KUNIT_* fragments to KUNIT_ALL_TESTS lib: Kconfig.debug: default KUNIT_* fragments to KUNIT_ALL_TESTS kunit: default KUNIT_* fragments to KUNIT_ALL_TESTS kunit: Kconfig: enable a KUNIT_ALL_TESTS fragment kunit: Fix TabError, remove defconfig code and handle when there is no kunitconfig kunit: use KUnit defconfig by default kunit: use --build_dir=.kunit as default Documentation: test.h - fix warnings kunit: kunit_tool: Separate out config/build/exec/parse
| * | fs: ext4: default KUNIT_* fragments to KUNIT_ALL_TESTSAnders Roxell2020-06-011-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes it easier to enable all KUnit fragments. Adding 'if !KUNIT_ALL_TESTS' so individual tests can not be turned off. Therefore if KUNIT_ALL_TESTS is enabled that will hide the prompt in menuconfig. Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
* | | mmap locking API: convert mmap_sem commentsMichel Lespinasse2020-06-0911-34/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert comments that reference mmap_sem to reference mmap_lock instead. [akpm@linux-foundation.org: fix up linux-next leftovers] [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil] [akpm@linux-foundation.org: more linux-next fixups, per Michel] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | mmap locking API: convert mmap_sem API commentsMichel Lespinasse2020-06-092-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert comments that reference old mmap_sem APIs to reference corresponding new mmap locking APIs instead. Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()Michel Lespinasse2020-06-091-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add new APIs to assert that mmap_sem is held. Using this instead of rwsem_is_locked and lockdep_assert_held[_write] makes the assertions more tolerant of future changes to the lock type. Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-10-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | mmap locking API: convert mmap_sem call sites missed by coccinelleMichel Lespinasse2020-06-091-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the last few remaining mmap_sem rwsem calls to use the new mmap locking API. These were missed by coccinelle for some reason (I think coccinelle does not support some of the preprocessor constructs in these files ?) [akpm@linux-foundation.org: convert linux-next leftovers] [akpm@linux-foundation.org: more linux-next leftovers] [akpm@linux-foundation.org: more linux-next leftovers] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-6-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | mmap locking API: use coccinelle to convert mmap_sem rwsem call sitesMichel Lespinasse2020-06-098-53/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | mm: don't include asm/pgtable.h if linux/mm.h is already includedMike Rapoport2020-06-094-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "mm: consolidate definitions of page table accessors", v2. The low level page table accessors (pXY_index(), pXY_offset()) are duplicated across all architectures and sometimes more than once. For instance, we have 31 definition of pgd_offset() for 25 supported architectures. Most of these definitions are actually identical and typically it boils down to, e.g. static inline unsigned long pmd_index(unsigned long address) { return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1); } static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) { return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address); } These definitions can be shared among 90% of the arches provided XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined. For architectures that really need a custom version there is always possibility to override the generic version with the usual ifdefs magic. These patches introduce include/linux/pgtable.h that replaces include/asm-generic/pgtable.h and add the definitions of the page table accessors to the new header. This patch (of 12): The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the functions involving page table manipulations, e.g. pte_alloc() and pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h> in the files that include <linux/mm.h>. The include statements in such cases are remove with a simple loop: for f in $(git grep -l "include <linux/mm.h>") ; do sed -i -e '/include <asm\/pgtable.h>/ d' $f done Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'ceph-for-5.8-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds2020-06-0816-184/+807
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph updates from Ilya Dryomov: "The highlights are: - OSD/MDS latency and caps cache metrics infrastructure for the filesytem (Xiubo Li). Currently available through debugfs and will be periodically sent to the MDS in the future. - support for replica reads (balanced and localized reads) for rbd and the filesystem (myself). The default remains to always read from primary, users can opt-in with the new crush_location and read_from_replica options. Note that reading from replica is safe for general use only since Octopus. - support for RADOS allocation hint flags (myself). Currently used by rbd to propagate the compressible/incompressible hint given with the new compression_hint map option and ready for passing on more advanced hints, e.g. based on fadvise() from the filesystem. - support for efficient cross-quota-realm renames (Luis Henriques) - assorted cap handling improvements and cleanups, particularly untangling some of the locking (Jeff Layton)" * tag 'ceph-for-5.8-rc1' of git://github.com/ceph/ceph-client: (29 commits) rbd: compression_hint option libceph: support for alloc hint flags libceph: read_from_replica option libceph: support for balanced and localized reads libceph: crush_location infrastructure libceph: decode CRUSH device/bucket types and names libceph: add non-asserting rbtree insertion helper ceph: skip checking caps when session reconnecting and releasing reqs ceph: make sure mdsc->mutex is nested in s->s_mutex to fix dead lock ceph: don't return -ESTALE if there's still an open file libceph, rbd: replace zero-length array with flexible-array ceph: allow rename operation under different quota realms ceph: normalize 'delta' parameter usage in check_quota_exceeded ceph: ceph_kick_flushing_caps needs the s_mutex ceph: request expedited service on session's last cap flush ceph: convert mdsc->cap_dirty to a per-session list ceph: reset i_requested_max_size if file write is not wanted ceph: throw a warning if we destroy session with mutex still locked ceph: fix potential race in ceph_check_caps ceph: document what protects i_dirty_item and i_flushing_item ...
| * | | ceph: skip checking caps when session reconnecting and releasing reqsXiubo Li2020-06-014-4/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It make no sense to check the caps when reconnecting to mds. And for the async dirop caps, they will be put by its _cb() function, so when releasing the requests, it will make no sense too. URL: https://tracker.ceph.com/issues/45635 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: make sure mdsc->mutex is nested in s->s_mutex to fix dead lockXiubo Li2020-06-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | send_mds_reconnect takes the s_mutex while the mdsc->mutex is already held. That inverts the locking order documented in mds_client.h. Drop the mdsc->mutex, acquire the s_mutex and then reacquire the mdsc->mutex to prevent a deadlock. URL: https://tracker.ceph.com/issues/45609 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: don't return -ESTALE if there's still an open fileLuis Henriques2020-06-011-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similarly to commit 03f219041fdb ("ceph: check i_nlink while converting a file handle to dentry"), this fixes another corner case with name_to_handle_at/open_by_handle_at. The issue has been detected by xfstest generic/467, when doing: - name_to_handle_at("/cephfs/myfile") - open("/cephfs/myfile") - unlink("/cephfs/myfile") - sync; sync; - drop caches - open_by_handle_at() The call to open_by_handle_at should not fail because the file hasn't been deleted yet (only unlinked) and we do have a valid handle to it. -ESTALE shall be returned only if i_nlink is 0 *and* i_count is 1. This patch also makes sure we have LINK caps before checking i_nlink. Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: allow rename operation under different quota realmsLuis Henriques2020-06-013-6/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Returning -EXDEV when trying to 'mv' files/directories from different quota realms results in copy+unlink operations instead of the faster CEPH_MDS_OP_RENAME. This will occur even when there aren't any quotas set in the destination directory, or if there's enough space left for the new file(s). This patch adds a new helper function to be called on rename operations which will allow these operations if they can be executed. This patch mimics userland fuse client commit b8954e5734b3 ("client: optimize rename operation under different quota root"). Since ceph_quota_is_same_realm() is now called only from this new helper, make it static. URL: https://tracker.ceph.com/issues/44791 Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: normalize 'delta' parameter usage in check_quota_exceededLuis Henriques2020-06-011-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function check_quota_exceeded() uses delta parameter only for the QUOTA_CHECK_MAX_BYTES_OP operation. Using this parameter also for MAX_FILES will makes the code cleaner and will be required to support cross-quota-tree renames. Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: ceph_kick_flushing_caps needs the s_mutexJeff Layton2020-06-014-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The mdsc->cap_dirty_lock is not held while walking the list in ceph_kick_flushing_caps, which is not safe. ceph_early_kick_flushing_caps does something similar, but the s_mutex is held while it's called and I think that guards against changes to the list. Ensure we hold the s_mutex when calling ceph_kick_flushing_caps, and add some clarifying comments. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: request expedited service on session's last cap flushJeff Layton2020-06-011-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When flushing a lot of caps to the MDS's at once (e.g. for syncfs), we can end up waiting a substantial amount of time for MDS replies, due to the fact that it may delay some of them so that it can batch them up together in a single journal transaction. This can lead to stalls when calling sync or syncfs. What we'd really like to do is request expedited service on the _last_ cap we're flushing back to the server. If the CHECK_CAPS_FLUSH flag is set on the request and the current inode was the last one on the session->s_cap_dirty list, then mark the request with CEPH_CLIENT_CAPS_SYNC. Note that this heuristic is not perfect. New inodes can race onto the list after we've started flushing, but it does seem to fix some common use cases. URL: https://tracker.ceph.com/issues/44744 Reported-by: Jan Fajerski <jfajerski@suse.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: convert mdsc->cap_dirty to a per-session listJeff Layton2020-06-014-19/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a per-sb list now, but that makes it difficult to tell when the cap is the last dirty one associated with the session. Switch this to be a per-session list, but continue using the mdsc->cap_dirty_lock to protect the lists. This list is only ever walked in ceph_flush_dirty_caps, so change that to walk the sessions array and then flush the caps for inodes on each session's list. If the auth cap ever changes while the inode has dirty caps, then move the inode to the appropriate session for the new auth_cap. Also, ensure that we never remove an auth cap while the inode is still on the s_cap_dirty list. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: reset i_requested_max_size if file write is not wantedYan, Zheng2020-06-011-10/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | write can stuck at waiting for larger max_size in following sequence of events: - client opens a file and writes to position 'A' (larger than unit of max size increment) - client closes the file handle and updates wanted caps (not wanting file write caps) - client opens and truncates the file, writes to position 'A' again. At the 1st event, client set inode's requested_max_size to 'A'. At the 2nd event, mds removes client's writable range, but client does not reset requested_max_size. At the 3rd event, client does not request max size because requested_max_size is already larger than 'A'. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: throw a warning if we destroy session with mutex still lockedJeff Layton2020-06-011-0/+1
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: fix potential race in ceph_check_capsJeff Layton2020-06-011-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nothing ensures that session will still be valid by the time we dereference the pointer. Take and put a reference. In principle, we should always be able to get a reference here, but throw a warning if that's ever not the case. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: document what protects i_dirty_item and i_flushing_itemJeff Layton2020-06-011-1/+3
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: don't take i_ceph_lock in handle_cap_importJeff Layton2020-06-011-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just take it before calling it. This means we have to do a couple of minor in-memory operations under the spinlock now, but those shouldn't be an issue. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: don't release i_ceph_lock in handle_cap_truncJeff Layton2020-06-011-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no reason to do this here. Just have the caller handle it. Also, add a lockdep assertion. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: add comments for handle_cap_flush_ack logicJeff Layton2020-06-011-1/+13
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: split up __finish_cap_flushJeff Layton2020-06-011-31/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function takes a mdsc argument or ci argument, but if both are passed in, it ignores the ci arg. Fortunately, nothing does that, but there's no good reason to have the same function handle both cases. Also, get rid of some branches and just use |= to set the wake_* vals. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: reorganize __send_cap for less spinlock abuseJeff Layton2020-06-011-79/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Get rid of the __releases annotation by breaking it up into two functions: __prep_cap which is done under the spinlock and __send_cap that is done outside it. Add new fields to cap_msg_args for the wake boolean and old_xattr_buf pointer. Nothing checks the return value from __send_cap, so make it void return. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: add metadata perf metric supportXiubo Li2020-06-015-0/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new "r_ended" field to struct ceph_mds_request and use that to maintain the average latency of MDS requests. URL: https://tracker.ceph.com/issues/43215 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: add read/write latency metric supportXiubo Li2020-06-015-1/+186
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calculate the latency for OSD read requests. Add a new r_end_stamp field to struct ceph_osd_request that will hold the time of that the reply was received. Use that to calculate the RTT for each call, and divide the sum of those by number of calls to get averate RTT. Keep a tally of RTT for OSD writes and number of calls to track average latency of OSD writes. URL: https://tracker.ceph.com/issues/43215 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: add caps perf metric for each superblockXiubo Li2020-06-019-14/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Count hits and misses in the caps cache. If the client has all of the necessary caps when a task needs references, then it's counted as a hit. Any other situation is a miss. URL: https://tracker.ceph.com/issues/43215 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | ceph: add dentry lease metric supportXiubo Li2020-06-018-7/+113
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For dentry leases, only count the hit/miss info triggered from the vfs calls. For the cases like request reply handling and ceph_trim_dentries, ignore them. For now, these are only viewable using debugfs. Future patches will allow the client to send the stats to the MDS. The output looks like: item total miss hit ------------------------------------------------- d_lease 11 7 141 URL: https://tracker.ceph.com/issues/43215 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | | Merge tag 'gfs2-for-5.8' of ↵Linus Torvalds2020-06-0816-76/+396
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - An iopen glock locking scheme rework that speeds up deletes of inodes accessed from multiple nodes - Various bug fixes and debugging improvements - Convert gfs2-glocks.txt to ReST * tag 'gfs2-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: fix use-after-free on transaction ail lists gfs2: new slab for transactions gfs2: initialize transaction tr_ailX_lists earlier gfs2: Smarter iopen glock waiting gfs2: Wake up when setting GLF_DEMOTE gfs2: Check inode generation number in delete_work_func gfs2: Move inode generation number check into gfs2_inode_lookup gfs2: Minor gfs2_lookup_by_inum cleanup gfs2: Try harder to delete inodes locally gfs2: Give up the iopen glock on contention gfs2: Turn gl_delete into a delayed work gfs2: Keep track of deleted inode generations in LVBs gfs2: Allow ASPACE glocks to also have an lvb gfs2: instrumentation wrt log_flush stuck gfs2: introduce new gfs2_glock_assert_withdraw gfs2: print mapping->nrpages in glock dump for address space glocks gfs2: Only do glock put in gfs2_create_inode for free inodes gfs2: Allow lock_nolock mount to specify jid=X gfs2: Don't ignore inode write errors during inode_go_sync docs: filesystems: convert gfs2-glocks.txt to ReST
| * \ \ Merge branch 'gfs2-iopen' into for-nextAndreas Gruenbacher2020-06-059-38/+289
| |\ \ \