linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge tag 'libnvdimm-for-5.17' of ↵	Linus Torvalds	2022-01-12	1	-2/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

	Pull dax and libnvdimm updates from Dan Williams:

	"The bulk of this is a rework of the dax_operations API after discovering the obstacles it posed to the work-in-progress DAX+reflink support for XFS and other copy-on-write filesystem mechanics. Primarily the need to plumb a block_device through the API to handle partition offsets was a sticking point and Christoph untangled that dependency in addition to other cleanups to make landing the DAX+reflink support easier.

	The DAX_PMEM_COMPAT option has been around for 4 years and not only are distributions shipping userspace that understand the current configuration API, but some are not even bothering to turn this option on anymore, so it seems a good time to remove it per the deprecation schedule. Recall that this was added after the device-dax subsystem moved from /sys/class/dax to /sys/bus/dax for its sysfs organization. All recent functionality depends on /sys/bus/dax.

	Some other miscellaneous cleanups and reflink prep patches are included as well.

	Summary:

	- Simplify the dax_operations API:
	  - Eliminate bdev_dax_pgoff() in favor of the filesystem maintaining and applying a partition offset to all its DAX iomap operations.
	  - Remove wrappers and device-mapper stacked callbacks for ->copy_from_iter() and ->copy_to_iter() in favor of moving block_device relative offset responsibility to the dax_direct_access() caller.
	  - Remove the need for an @bdev in filesystem-DAX infrastructure
	  - Remove unused uio helpers copy_from_iter_flushcache() and copy_mc_to_iter() as only the non-check_copy_size() versions are used for DAX.
	- Prepare XFS for the pending (next merge window) DAX+reflink support
	- Remove deprecated DEV_DAX_PMEM_COMPAT support
	- Cleanup a straggling misuse of the GUID api"

	* tag 'libnvdimm-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (38 commits)
	  iomap: Fix error handling in iomap_zero_iter()
	  ACPI: NFIT: Import GUID before use
	  dax: remove the copy_from_iter and copy_to_iter methods
	  dax: remove the DAXDEV_F_SYNC flag
	  dax: simplify dax_synchronous and set_dax_synchronous
	  uio: remove copy_from_iter_flushcache() and copy_mc_to_iter()
	  iomap: turn the byte variable in iomap_zero_iter into a ssize_t
	  memremap: remove support for external pgmap refcounts
	  fsdax: don't require CONFIG_BLOCK
	  iomap: build the block based code conditionally
	  dax: fix up some of the block device related ifdefs
	  fsdax: shift partition offset handling into the file systems
	  dax: return the partition offset from fs_dax_get_by_bdev
	  iomap: add a IOMAP_DAX flag
	  xfs: pass the mapping flags to xfs_bmbt_to_iomap
	  xfs: use xfs_direct_write_iomap_ops for DAX zeroing
	  xfs: move dax device handling into xfs_{alloc,free}_buftarg
	  ext4: cleanup the dax handling in ext4_fill_super
	  ext2: cleanup the dax handling in ext2_fill_super
	  fsdax: decouple zeroing from the iomap buffered I/O code
	  ...
\| *	xfs: add xfs_zero_range and xfs_truncate_page helpers	Shiyang Ruan	2021-12-04	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add helpers to prepare for using different DAX operations. Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> [hch: split from a larger patch + slight cleanups] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-16-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* \|	xfs: only run COW extent recovery when there are no live extents	Darrick J. Wong	2021-12-21	1	-1/+4
	As part of multiple customer escalations due to file data corruption after copy on write operations, I wrote some fstests that use fsstress to hammer on COW to shake things loose. Regrettably, I caught some filesystem shutdowns due to incorrect rmap operations with the following loop:

	    mount <filesystem>                      # (0)
	    fsstress <run only readonly ops> &      # (1)
	    while true; do
	        fsstress <run all ops>
	        mount -o remount,ro                 # (2)
	        fsstress <run only readonly ops>
	        mount -o remount,rw                 # (3)
	    done

	When (2) happens, notice that (1) is still running. xfs_remount_ro will call xfs_blockgc_stop to walk the inode cache to free all the COW extents, but the blockgc mechanism races with (1)'s reader threads to take IOLOCKs and loses, which means that it doesn't clean them all out. Call such a file (A).

	When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which walks the ondisk refcount btree and frees any COW extent that it finds. This function does not check the inode cache, which means that the incore COW fork of inode (A) is now inconsistent with the ondisk metadata. If one of those former COW extents is allocated and mapped into another file (B) and someone triggers a COW to the stale reservation in (A), A's dirty data will be written into (B) and once that's done, those blocks will be transferred to (A)'s data fork without bumping the refcount.

	The results are catastrophic -- file (B) and the refcount btree are now corrupt. In the first patch, we fixed the race condition in (2) so that (A) will always flush the COW fork. In this second patch, we move the _recover_cow call to the initial mount call in (0) for safety.

	As mentioned previously, xfs_reflink_recover_cow walks the refcount btree looking for COW staging extents, and frees them. This was intended to be run at mount time (when we know there are no live inodes) to clean up any leftover staging extents that may have been left behind during an unclean shutdown. As a time "optimization" for readonly mounts, we deferred this to the ro->rw transition, not realizing that any failure to clean all COW forks during a rw->ro transition would result in catastrophic corruption.

	Therefore, remove this optimization and only run the recovery routine when we're guaranteed not to have any COW staging extents anywhere, which means we always run this at mount time. While we're at it, move the callsite to xfs_log_mount_finish because any refcount btree expansion (however unlikely given that we're removing records from the right side of the index) must be fed by a per-AG reservation, which doesn't exist in its current location.

	Fixes: 174edb0e46e5 ("xfs: store in-progress CoW allocations in the refcount btree")
	Signed-off-by: Darrick J. Wong <djwong@kernel.org>
	Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
	Reviewed-by: Dave Chinner <dchinner@redhat.com>
*	xfs: rename xfs_bmap_add_free to xfs_free_extent_later	Darrick J. Wong	2021-10-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	xfs_bmap_add_free isn't a block mapping function; it schedules deferred freeing operations for a later point in a compound transaction chain. While it's primarily used by bunmapi, its use has expanded beyond that. Move it to xfs_alloc.c and rename the function since it's now general freeing functionality. Bring the slab cache bits in line with the way we handle the other intent items. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
*	xfs: replace xfs_sb_version checks with feature flag checks	Dave Chinner	2021-08-19	1	-2/+2
	Convert the xfs_sb_version_hasfoo() checks to checks against mp->m_features. Checks of the superblock itself during disk operations (e.g. in the read/write verifiers and the to/from disk formatters) are not converted - they operate purely on the superblock state. Everything else should use the mount features.

	Large parts of this conversion were done with sed with commands like this:

	    for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do
	        sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f
	    done

	With manual cleanups for things like "xfs_has_extflgbit" and other little inconsistencies in naming.

	The result is a lot less typing to check features and an XFS binary size reduced by a bit over 3kB:

	    $ size -t fs/xfs/built-in.a
	               text    data  bss      dec     hex  filename
	    before  1130866  311352  484  1442702  16038e  (TOTALS)
	    after   1127727  311352  484  1439563  15f74b  (TOTALS)

	Signed-off-by: Dave Chinner <dchinner@redhat.com>
	Reviewed-by: Christoph Hellwig <hch@lst.de>
	Reviewed-by: Darrick J. Wong <djwong@kernel.org>
	Signed-off-by: Darrick J. Wong <djwong@kernel.org>
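The shape of that conversion can be sketched in a few lines of standalone C. This is only an illustration, not the kernel's code: the `demo_` structures, the `DEMO_FEAT_REFLINK` bit, the `0x4` on-disk bit value, and all three helpers are hypothetical stand-ins. The one idea taken from the commit message is that callers test a feature mask cached on the mount (`m_features`) instead of interrogating the superblock each time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory feature bit; the kernel defines its own set. */
#define DEMO_FEAT_REFLINK	(1ULL << 0)

/* Hypothetical on-disk bit that advertises the feature. */
#define DEMO_SB_RO_COMPAT_REFLINK	0x4u

/* Hypothetical superblock with one feature field. */
struct demo_sb {
	uint32_t sb_features_ro_compat;
};

/* Hypothetical mount structure that caches the decoded feature mask. */
struct demo_mount {
	struct demo_sb	m_sb;
	uint64_t	m_features;
};

/* Old style: every caller passes &mp->m_sb and decodes the on-disk bit. */
static inline bool demo_sb_version_hasreflink(const struct demo_sb *sbp)
{
	return (sbp->sb_features_ro_compat & DEMO_SB_RO_COMPAT_REFLINK) != 0;
}

/* Decode the superblock once, for example at mount time. */
static inline void demo_init_features(struct demo_mount *mp)
{
	mp->m_features = 0;
	if (demo_sb_version_hasreflink(&mp->m_sb))
		mp->m_features |= DEMO_FEAT_REFLINK;
}

/* New style: callers pass the mount and test the cached mask. */
static inline bool demo_has_reflink(const struct demo_mount *mp)
{
	return (mp->m_features & DEMO_FEAT_REFLINK) != 0;
}
```

With helpers of this shape, a call site changes from `demo_sb_version_hasreflink(&mp->m_sb)` to `demo_has_reflink(mp)`, which is the textual rewrite the sed expression above performs.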
*	xfs: convert refcount btree cursor to use perags	Dave Chinner	2021-06-02	1	-2/+2
\| \| \| \| \| \| \|	Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
*	xfs: add a perag to the btree cursor	Dave Chinner	2021-06-02	1	-1/+1
\| \| \| \| \| \| \| \| \|	Which will eventually completely replace the agno in it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: convert raw ag walks to use for_each_perag	Dave Chinner	2021-06-02	1	-3/+6
\| \| \| \| \| \| \| \| \| \|	Convert the raw walks to an iterator, pulling the current AG out of pag->pag_agno instead of the loop iterator variable. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
*	xfs: move xfs_perag_get/put to xfs_ag.[ch]	Dave Chinner	2021-06-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	They are AG functions, not superblock functions, so move them to the appropriate location. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
*	xfs: fix xfs_reflink_unshare usage of filemap_write_and_wait_range	Darrick J. Wong	2021-04-29	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	The final parameter of filemap_write_and_wait_range is the end of the range to flush, not the length of the range to flush. Fixes: 46afb0628b86 ("xfs: only flush the unshared range in xfs_reflink_unshare") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
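The difference between "end of range" and "length of range" is easy to get wrong, so a small standalone sketch may help. The helper name `demo_range_end` is hypothetical and is not part of the kernel; it only shows the arithmetic a caller needs when an API takes an inclusive end offset rather than a byte count.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert (offset, len) into the inclusive offset of
 * the last byte in the range, for APIs whose final argument is the end of
 * the range rather than its length. len must be greater than zero.
 */
static inline int64_t demo_range_end(int64_t offset, int64_t len)
{
	assert(len > 0);
	return offset + len - 1;
}
```

For a flush of 8192 bytes starting at offset 4096, the end argument is 12287. Passing the length (8192) as the end would cover only bytes 4096 through 8192 and leave the rest of the range unflushed.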
*	xfs: move the XFS_IFEXTENTS check into xfs_iread_extents	Christoph Hellwig	2021-04-15	1	-5/+3
\| \| \| \| \| \| \| \| \| \|	Move the XFS_IFEXTENTS check from the callers into xfs_iread_extents to simplify the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
*	xfs: move the di_flags2 field to struct xfs_inode	Christoph Hellwig	2021-04-07	1	-4/+4
\| \| \| \| \| \| \| \| \|	In preparation of removing the historic icinode struct, move the flags2 field into the containing xfs_inode structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
*	xfs: move the di_cowextsize field to struct xfs_inode	Christoph Hellwig	2021-04-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	In preparation of removing the historic icinode struct, move the cowextsize field into the containing xfs_inode structure. Also switch to use the xfs_extlen_t instead of a uint32_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
*	xfs: move the di_size field to struct xfs_inode	Christoph Hellwig	2021-04-07	1	-2/+2
\| \| \| \| \| \| \| \| \|	In preparation of removing the historic icinode struct, move the on-disk size field into the containing xfs_inode structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
*	xfs: flush eof/cowblocks if we can't reserve quota for file blocks	Darrick J. Wong	2021-02-03	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a fs modification (data write, reflink, xattr set, fallocate, etc.) is unable to reserve enough quota to handle the modification, try clearing whatever space the filesystem might have been hanging onto in the hopes of speeding up the filesystem. The flushing behavior will become particularly important when we add deferred inode inactivation because that will increase the amount of space that isn't actively tied to user data. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: try worst case space reservation upfront in xfs_reflink_remap_extent	Darrick J. Wong	2021-02-03	1	-3/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we've converted xfs_reflink_remap_extent to use the new xfs_trans_alloc_inode API, we can focus on its slightly unusual behavior with regard to quota reservations. Since it's valid to remap written blocks into a hole, we must be able to increase the quota count by the number of blocks in the mapping. However, the incore space reservation process requires us to supply an asymptotic guess before we can gain exclusive access to resources. We'd like to reserve all the quota we need up front, but we also don't want to fail a written -> allocated remap operation unnecessarily. The solution is to make the remap_extents function call the transaction allocation function twice. The first time we ask to reserve enough space and quota to handle the absolute worst case situation, but if that fails, we can fall back to the old strategy: ask for the bare minimum space reservation upfront and increase the quota reservation later if we need to. Later in this patchset we change the transaction and quota code to try to reclaim space if we cannot reserve free space or quota. Restructuring the remap_extent function in this manner means that if the fallback increase fails, we can pass that back to the caller knowing that the transaction allocation already tried freeing space. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
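The two-attempt strategy described above can be sketched as standalone C. Everything here is a hypothetical model: `demo_pool`, `demo_resv`, and both functions are invented for illustration and do not correspond to kernel APIs. The sketch only shows the control flow of asking for the worst case first and falling back to the minimum when that fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pool standing in for free space and quota. */
struct demo_pool {
	uint64_t free_blocks;
};

/* Hypothetical record of what was actually reserved. */
struct demo_resv {
	uint64_t blocks;
	bool	 worst_case;	/* true if the full worst case was reserved */
};

/* Take 'want' blocks from the pool, or fail without side effects. */
static inline int demo_try_reserve(struct demo_pool *pool, uint64_t want)
{
	if (want > pool->free_blocks)
		return -1;	/* stand-in for an ENOSPC or EDQUOT error */
	pool->free_blocks -= want;
	return 0;
}

/*
 * First ask for the absolute worst case. If that fails, fall back to the
 * bare minimum and let the caller grow the reservation later if needed.
 */
static inline int demo_reserve_for_remap(struct demo_pool *pool,
					 uint64_t worst, uint64_t minimum,
					 struct demo_resv *out)
{
	if (demo_try_reserve(pool, worst) == 0) {
		out->blocks = worst;
		out->worst_case = true;
		return 0;
	}
	if (demo_try_reserve(pool, minimum) == 0) {
		out->blocks = minimum;
		out->worst_case = false;
		return 0;
	}
	return -1;
}
```

The `worst_case` flag tells the caller whether a later increase might still be required, which mirrors the "increase the quota reservation later if we need to" fallback in the commit message.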
*	xfs: refactor reflink functions to use xfs_trans_alloc_inode	Darrick J. Wong	2021-02-03	1	-32/+21
\| \| \| \| \| \| \| \| \| \|	The two remaining callers of xfs_trans_reserve_quota_nblks are in the reflink code. These conversions aren't as uniform as the previous conversions, so call that out in a separate patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
*	xfs: reserve data and rt quota at the same time	Darrick J. Wong	2021-02-03	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Modify xfs_trans_reserve_quota_nblks so that we can reserve data and realtime blocks from the dquot at the same time. This change has the theoretical side effect that for allocations to realtime files we will reserve from the dquot both the number of rtblocks being allocated and the number of bmbt blocks that might be needed to add the mapping. However, since the mount code disables quota if it finds a realtime device, this should not result in any behavior changes. Now that we've moved the inode creation callers away from using the _nblks function, we can repurpose the (now unused) ninos argument for realtime blocks, so make that change. This also replaces the flags argument with a boolean parameter to force the reservation since we don't need to distinguish between data and rt quota reservations any more, and the only flag being passed in was FORCE_RES. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: remove xfs_trans_unreserve_quota_nblks completely	Darrick J. Wong	2021-02-03	1	-4/+1
\| \| \| \| \| \| \| \| \|	xfs_trans_cancel will release all the quota resources that were reserved on behalf of the transaction, so get rid of the explicit unreserve step. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: create convenience wrappers for incore quota block reservations	Darrick J. Wong	2021-02-03	1	-3/+2
\| \| \| \| \| \| \| \| \|	Create a couple of convenience wrappers for creating and deleting quota block reservations against future changes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: clean up quota reservation callsites	Darrick J. Wong	2021-02-03	1	-2/+2
\| \| \| \| \| \| \| \| \|	Convert a few xfs_trans_reserve callsites that are open-coding other convenience functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: Check for extent overflow when remapping an extent	Chandan Babu R	2021-01-22	1	-0/+11
	Remapping an extent involves unmapping the existing extent and mapping in the new extent. When unmapping, an extent containing the entire unmap range can be split into two extents, i.e.

	    | Old extent | hole | Old extent |

	Hence extent count increases by 1.

	Mapping in the new extent into the destination file can increase the extent count by 1.

	Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
	Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
	Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
	Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: Check for extent overflow when moving extent from cow to data fork	Chandan Babu R	2021-01-22	1	-0/+5
	Moving an extent to data fork can cause a sub-interval of an existing extent to be unmapped. This will increase extent count by 1. Mapping in the new extent can increase the extent count by 1 again i.e.

	    | Old extent | New extent | Old extent |

	Hence number of extents increases by 2.

	Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
	Reviewed-by: Christoph Hellwig <hch@lst.de>
	Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
	Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
	Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
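The extent-count arithmetic in the two commits above can be written as a small overflow check. This is a sketch under stated assumptions: the `DEMO_` constants and `demo_extent_count_ok` are hypothetical names, and the maximum extent count is passed in as a parameter rather than taken from any real on-disk limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical deltas, taken from the counts given in the commit messages. */
#define DEMO_REMAP_UNMAP_DELTA	1	/* unmap may split one extent in two */
#define DEMO_REMAP_MAP_DELTA	1	/* mapping in the new extent */
#define DEMO_END_COW_DELTA	2	/* split by unmap, plus the new extent */

/*
 * Return true if adding 'delta' extents to a fork that currently holds
 * 'nextents' stays within 'max_extents'. Written as a subtraction so the
 * check itself cannot wrap around.
 */
static inline bool demo_extent_count_ok(uint64_t nextents, uint64_t delta,
					uint64_t max_extents)
{
	return nextents <= max_extents && delta <= max_extents - nextents;
}
```

A caller would run this check before starting the operation, so that a fork already near its limit fails early instead of overflowing its extent counter midway through a remap.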
*	xfs: only flush the unshared range in xfs_reflink_unshare	Darrick J. Wong	2020-11-04	1	-1/+2
\| \| \| \| \| \| \| \|	There's no reason to flush an entire file when we're unsharing part of a file. Therefore, only initiate writeback on the selected range. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
*	xfs: delete duplicated words + other fixes	Randy Dunlap	2020-08-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Delete repeated words in fs/xfs/. {we, that, the, a, to, fork} Change "it it" to "it is" in one location. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> To: linux-fsdevel@vger.kernel.org Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: linux-xfs@vger.kernel.org Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: move helpers that lock and unlock two inodes against userspace IO	Darrick J. Wong	2020-07-06	1	-95/+2
\| \| \| \| \| \| \| \|	Move the double-inode locking helpers to xfs_inode.c since they're not specific to reflink. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: refactor locking and unlocking two inodes against userspace IO	Darrick J. Wong	2020-07-06	1	-20/+32
\| \| \| \| \| \| \| \| \|	Refactor the two functions that we use to lock and unlock two inodes to block userspace from initiating IO against a file, whether via system calls or mmap activity. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: fix xfs_reflink_remap_prep calling conventions	Darrick J. Wong	2020-07-06	1	-3/+3
\| \| \| \| \| \| \| \|	Fix the return value of xfs_reflink_remap_prep so that its return value conventions match the rest of xfs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: reflink can skip remap existing mappings	Darrick J. Wong	2020-07-06	1	-0/+16
\| \| \| \| \| \| \| \|	If the source and destination map are identical, we can skip the remap step to save some time. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: only reserve quota blocks if we're mapping into a hole	Darrick J. Wong	2020-07-06	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \|	When logging quota block count updates during a reflink operation, we only log the /delta/ of the block count changes to the dquot. Since we now know ahead of time the extent type of both dmap and smap (and that they have the same length), we know that we only need to reserve quota blocks for dmap's blockcount if we're mapping it into a hole. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: only reserve quota blocks for bmbt changes if we're changing the data fork	Darrick J. Wong	2020-07-06	1	-7/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we've reworked xfs_reflink_remap_extent to remap only one extent per transaction, we actually know if the extent being removed is an allocated mapping. This means that we now know ahead of time if we're going to be touching the data fork. Since we only need blocks for a bmbt split if we're going to update the data fork, we only need to get quota reservation if we know we're going to touch the data fork. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: redesign the reflink remap loop to fix blkres depletion crash	Darrick J. Wong	2020-07-06	1	-109/+129
	The existing reflink remapping loop has some structural problems that need addressing:

	The biggest problem is that we create one transaction for each extent in the source file without accounting for the number of mappings there are for the same range in the destination file. In other words, we don't know the number of remap operations that will be necessary and we therefore cannot guess the block reservation required. On highly fragmented filesystems (e.g. ones with active dedupe) we guess wrong, run out of block reservation, and fail.

	The second problem is that we don't actually use the bmap intents to their full potential -- instead of calling bunmapi directly and having to deal with its backwards operation, we could call the deferred ops xfs_bmap_unmap_extent and xfs_refcount_decrease_extent instead. This makes the frontend loop much simpler.

	Solve all of these problems by refactoring the remapping loops so that we only perform one remapping operation per transaction, and each operation only tries to remap a single extent from source to dest.

	Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
	Reviewed-by: Brian Foster <bfoster@redhat.com>
	Reported-by: Edwin Török <edwin@etorok.net>
	Tested-by: Edwin Török <edwin@etorok.net>
*	xfs: rename xfs_bmap_is_real_extent to is_written_extent	Darrick J. Wong	2020-07-06	1	-3/+3
\| \| \| \| \| \| \| \| \|	The name of this predicate is a little misleading -- it decides if the extent mapping is allocated and written. Change the name to be more direct, as we're going to add a new predicate in the next patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: fix reflink quota reservation accounting error	Darrick J. Wong	2020-07-06	1	-7/+14
\| \| \| \| \| \| \| \| \| \| \|	Quota reservations are supposed to account for the blocks that might be allocated due to a bmap btree split. Reflink doesn't do this, so fix this to make the quota accounting more accurate before we start rearranging things. Fixes: 862bb360ef56 ("xfs: reflink extents from one file to another") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: fix partially uninitialized structure in xfs_reflink_remap_extent	Darrick J. Wong	2020-04-13	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	In the reflink extent remap function, it turns out that uirec (the block mapping corresponding only to the part of the passed-in mapping that got unmapped) was not fully initialized. Specifically, br_state was not being copied from the passed-in struct to the uirec. This could lead to unpredictable results such as the reflinked mapping being marked unwritten in the destination file. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: remove unnecessary null pointer checks from _read_agf callers	Darrick J. Wong	2020-01-26	1	-2/+0
\| \| \| \| \| \| \| \| \| \|	Drop the null buffer pointer checks in all code that calls xfs_alloc_read_agf and doesn't pass XFS_ALLOC_FLAG_TRYLOCK because they're no longer necessary. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
*	xfs: change return value of xfs_inode_need_cow to int	zhengbin	2020-01-20	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Fixes coccicheck warning: fs/xfs/xfs_reflink.c:236:9-10: WARNING: return of 0/1 in function 'xfs_inode_need_cow' with return type bool Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> [darrick: rename the function so it doesn't sound like a predicate] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: introduce XFS_MAX_FILEOFF	Darrick J. Wong	2020-01-14	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \|	Introduce a new #define for the maximum supported file block offset. We'll use this in the next patch to make it more obvious that we're doing some operation for all possible inode fork mappings after a given offset. We can't use ULLONG_MAX here because bunmapi uses that to detect when it's done. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
*	xfs: don't set bmapi total block req where minleft is	Brian Foster	2019-10-23	1	-2/+2
	xfs_bmapi_write() takes a total block requirement parameter that is passed down to the block allocation code and is used to specify the total block requirement of the associated transaction. This is used to try and select an AG that can not only satisfy the requested extent allocation, but can also accommodate subsequent allocations that might be required to complete the transaction. For example, additional bmbt block allocations may be required on insertion of the resulting extent to an inode data fork.

	While it's important for callers to calculate and reserve such extra blocks in the transaction, it is not necessary to pass the total value to xfs_bmapi_write() in all cases. The latter automatically sets minleft to ensure that sufficient free blocks remain after the allocation attempt to expand the format of the associated inode (i.e., such as extent to btree conversion, btree splits, etc). Therefore, any callers that pass a total block requirement of the bmap mapping length plus worst case bmbt expansion essentially specify the additional reservation requirement twice. These callers can pass a total of zero to rely on the bmapi minleft policy.

	Beyond being superfluous, the primary motivation for this change is that the total reservation logic in the bmbt code is dubious in scenarios where minlen < maxlen and a maxlen extent cannot be allocated (which is more common for data extent allocations where contiguity is not required). The total value is based on maxlen in the xfs_bmapi_write() caller. If the bmbt code falls back to an allocation between minlen and maxlen, that allocation will not succeed until total is reset to minlen, which essentially throws away any additional reservation included in total by the caller. In addition, the total value is not reset until after alignment is dropped, which means that such callers drop alignment far more aggressively than necessary.

	Update all callers of xfs_bmapi_write() that pass a total block value of the mapping length plus bmbt reservation to instead pass zero and rely on xfs_bmapi_minleft() to enforce the bmbt reservation requirement. This trades off slightly less conservative AG selection for the ability to preserve alignment in more scenarios. xfs_bmapi_write() callers that incorporate unrelated or additional reservations in total beyond what is already included in minleft must continue to use the former.

	Signed-off-by: Brian Foster <bfoster@redhat.com>
	Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
	Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: split the iomap ops for buffered vs direct writes	Christoph Hellwig	2019-10-21	1	-2/+3
	Instead of lots of magic conditionals in the main write_begin handler this makes the intent very clear. Things will become even better once we support delayed allocations for extent size hints and realtime allocations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: pass two imaps to xfs_reflink_allocate_cow	Christoph Hellwig	2019-10-21	1	-15/+15
\| \| \| \| \| \| \| \| \| \| \|	xfs_reflink_allocate_cow consumes the source data fork imap, and potentially returns the COW fork imap. Split the arguments in two to clear up the calling conventions and to prepare for returning a source iomap from ->iomap_begin. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: remove xfs_reflink_dirty_extents	Christoph Hellwig	2019-10-21	1	-98/+5
\| \| \| \| \| \| \| \| \|	Now that xfs_file_unshare is not completely dumb we can just call it directly without iterating the extent and reflink btrees ourselves. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	iomap: ignore non-shared or non-data blocks in xfs_file_dirty	Christoph Hellwig	2019-10-21	1	-1/+1
	xfs_file_dirty is used to unshare reflink blocks. Rename the function to xfs_file_unshare to better document that purpose, and skip iomaps that are not shared and don't need zeroing. This will allow the caller to be simplified. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: remove unnecessary int returns from deferred bmap functions	Darrick J. Wong	2019-08-28	1	-6/+2
\| \| \| \| \| \| \| \|	Remove the return value from the functions that schedule deferred bmap operations since they never fail and do not return status. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
*	xfs: remove unnecessary int returns from deferred refcount functions	Darrick J. Wong	2019-08-28	1	-11/+4
\| \| \| \| \| \| \| \|	Remove the return value from the functions that schedule deferred refcount operations since they never fail and do not return status. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
*	xfs: fix reflink source file racing with directio writes	Darrick J. Wong	2019-08-18	1	-26/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While trawling through the dedupe file comparison code trying to fix page deadlocking problems, Dave Chinner noticed that the reflink code only takes shared IOLOCK/MMAPLOCKs on the source file. Because page_mkwrite and directio writes do not take the EXCL versions of those locks, this means that reflink can race with writer processes. For pure remapping this can lead to undefined behavior and file corruption; for dedupe this means that we cannot be sure that the contents are identical when we decide to go ahead with the remapping. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
*	xfs: remove XFS_TRANS_NOFS	Christoph Hellwig	2019-06-30	1	-2/+2
	Instead of a magic flag for xfs_trans_alloc, just ensure all callers that can't reclaim through the file system use memalloc_nofs_save to set the per-task nofs flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: remove unused header files	Eric Sandeen	2019-06-28	1	-11/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are many, many xfs header files which are included but unneeded (or included twice) in the xfs code, so remove them. nb: xfs_linux.h includes about 9 headers for everyone, so those explicit includes get removed by this. I'm not sure what the preference is, but if we wanted explicit includes everywhere, a followup patch could remove those xfs_*.h includes from xfs_linux.h and move them into the files that need them. Or it could be left as-is. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: fix uninitialized error variables	Darrick J. Wong	2019-02-25	1	-1/+1
\| \| \| \| \| \| \|	smatch complained about some uninitialized error returns, so fix those. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
*	xfs: don't pass iomap flags to xfs_reflink_allocate_cow	Darrick J. Wong	2019-02-25	1	-2/+2
\| \| \| \| \| \| \| \|	Don't pass raw iomap flags to xfs_reflink_allocate_cow; signal our intention with a boolean argument. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>