| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
MTD_NAND is large and encloses much more than what the symbol is
actually used for: raw NAND. Clarify the symbol by naming it
MTD_RAW_NAND instead.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This list is a mess, while some items should probably not be in the
raw/ sub-directory, others are definitely at the right place but not
with the right description. Write uniform titles and group IPs by
vendor.
NAND controllers will appear under the list named "Raw/parallel NAND
flash controllers" while the other drivers will appear under
"Misc". Software ECC engines will later be moved out of the raw/
directory.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
|
|
| |
The software Hamming ECC correction implementation is referred as
MTD_NAND_ECC which is too generic. Rename it
MTD_NAND_ECC_SW_HAMMING. Also rename MTD_NAND_ECC_SMC which is an
SMC quirk in the Hamming implementation as
MTD_NAND_ECC_SW_HAMMING_SMC.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
| |
There is no point in having two distinct entries, merge them and
rename the symbol for more clarity: MTD_NAND_ECC_SW_BCH
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
nand_device embeds a nand_ecc_req object which contains the minimum
strength and step-size required by the NAND device.
Drop the chip->ecc_{strength,step}_ds fields and use
chip->base.eccreq.{strength,step_size} instead.
Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Frieder Schrempf <frieder.schrempf@kontron.de>
|
|
|
|
|
|
|
|
| |
The same information is provided by nanddev_ntargets().
Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Frieder Schrempf <frieder.schrempf@kontron.de>
|
|
|
|
|
|
|
|
|
| |
The target size can now be returned by nanddev_get_targetsize(). Get
rid of the chip->chipsize field and use this helper instead.
Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
Reviewed-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
|
|
| |
Now that we inherit from nand_device, we can use
nand_device->memorg.bits_per_cell instead of having our own field at
the nand_chip level.
Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Frieder Schrempf <frieder.schrempf@kontron.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
nanddev_mtd_max_bad_blocks() is implemented by the generic NAND layer
and is already doing what we need. Reuse this function instead of
having our own implementation.
While at it, get rid of the ->max_bb_per_die and ->blocks_per_die
fields which are now unused.
Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Frieder Schrempf <frieder.schrempf@kontron.de>
|
|
|
|
|
|
|
|
|
|
| |
Looking at the field names it's hard to tell what ->data_buf, ->pagebuf
and ->pagebuf_bitflips are for. Clarify that by moving those fields
in a sub-struct named pagecache.
Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Frieder Schrempf <frieder.schrempf@kontron.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We plan to move cache related fields to a pagecache struct in nand_chip
but some drivers access ->pagebuf directly to invalidate the cache
before they start using ->data_buf.
Let's provide an helper that returns a pointer to ->data_buf after
invalidating the cache.
Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Frieder Schrempf <frieder.schrempf@kontron.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to use some of the nanddev_xxx() helpers, we need to
initialize the nand_device object embedded in nand_chip using
nanddev_init(). This requires implementing nand_ops.
We also drop useless mtd->xxx initialization when they're already taken
case of by nanddev_init().
Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Frieder Schrempf <frieder.schrempf@kontron.de>
|
|
|
|
|
|
|
|
|
|
| |
If we want to use the generic NAND layer, we need to have the memorg
struct appropriately filled. Patch the detection code to fill this
struct.
Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
Reviewed-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
| |
We just have to use nanddev_mtd_max_bad_blocks().
Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Frieder Schrempf <frieder.schrempf@kontron.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NAND datasheets usually give the maximum number of bad blocks per LUN
and this number can be used to help upper layers decide how much blocks
they should reserve for bad block handling.
Add a max_bad_eraseblocks_per_lun to the nand_memory_organization
struct and update the NAND_MEMORG() macro (and its users) accordingly.
We also provide a default mtd->_max_bad_blocks() implementation.
Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Frieder Schrempf <frieder.schrempf@kontron.de>
|
|
|
|
|
|
|
|
|
|
| |
Specify the oob layout operation to avoid no oob scheme defined for
some nand flash.
Fixes: 8fae856c5350 ("mtd: rawnand: meson: add support for Amlogic NAND flash controller")
Signed-off-by: Liang Yang <liang.yang@amlogic.com>
Tested-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
|
| |
of_match_device can return NULL if there is no matching device. Avoid
a potential NULL pointer dereference by checking for the return value
and passing the error upstream.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
| |
The generic layout for BBT markers will most likely overlap with our
ECC bytes in the OOB, so move the BBT markers outside the OOB area.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
| |
The Ben Nanonote from Qi Hardware expects a specific OOB layout on its
NAND.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The boot ROM of the JZ4725B SoC expects a specific OOB layout on the
NAND, so we use it unconditionally in the ingenic-nand driver.
Also add the jz4725b-bch driver to support the JZ4725B-specific BCH
hardware.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
|
| |
Add support for probing the ingenic-nand driver on the JZ4740 SoC from
Ingenic, and the jz4740-ecc driver to support the JZ4740-specific
ECC hardware.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
| |
Use the 'ecc-engine' standard property instead of the custom
'ingenic,bch-controller' custom property, which is now deprecated.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
|
|
| |
The ingenic-nand driver uses an API provided by the jz4780-bch driver.
This makes it difficult to support other SoCs in the jz4780-bch driver.
To work around this, we separate the API functions from the SoC-specific
code, so that these API functions are SoC-agnostic.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
|
| |
The jz4780_bch_init name was confusing, as it suggested that its content
should be executed once at init time, whereas what the function really
does is reset the hardware for a new ECC operation.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
|
| |
The jz4780_nand driver will be modified to handle all the Ingenic
JZ47xx SoCs that the upstream Linux kernel supports (JZ4740, JZ4725B,
JZ4770, JZ4780), so it makes sense to rename it.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
| |
Use SPDX license notifiers instead of GPLv2 license text in the headers.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Reviewed-by: Boris Brezillon <bbrezillon@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
| |
Before adding support for more SoCs and seeing the number of files for
these drivers grow, we move them to their own subfolder to keep it tidy.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
meson_nfc_dma_buffer_setup() is called with the "info" buffer which is
allocated a few lines before using kzalloc(). If
meson_nfc_dma_buffer_setup() fails we need to free the allocated "info"
buffer instead of only freeing it upon success.
Fixes: 8fae856c53500a ("mtd: rawnand: meson: add support for Amlogic NAND flash controller")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Liang Yang <liang.yang@amlogic.com>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
kzalloc() can return NULL if memory could not be allocated. Check the
return value of the kzalloc() call in meson_nfc_read_buf() to make it
consistent with other memory allocations within the meson_nand driver.
Fixes: 8fae856c53500a ("mtd: rawnand: meson: add support for Amlogic NAND flash controller")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Liang Yang <liang.yang@amlogic.com>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
| |
Adopt the SPDX license identifiers to ease license compliance
management.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The sam9x60 board defines the CCFG_EBICSA register under SFR,
and not as a MATRIX register, as previous boards do.
NAND Flash I/Os are connected to D16–D23, thus
SFR_CCFG_EBICSA.NFD0_ON_D16 is set to 1.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
|
|
| |
The sam9x60 board defines the CCFG_EBICSA register under SFR,
and not as a MATRIX register, as previous boards do. Add a
more generic name for the EBICSA regmap, as a prerequisite for
sam9x60 nand controller support.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
|
| |
The sam9x60 board defines the CCFG_EBICSA register under SFR,
and not as a MATRIX register, as previous boards do.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The sam9x60 board defines the CCFG_EBICSA register under SFR,
and not as a MATRIX register, as previous boards do. Add a
more generic name for the EBI regmap as a prerequisite for
sam9x60 ebi support.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warning:
drivers/mtd/nand/raw/diskonchip.c: In function ‘doc_probe’:
./include/linux/printk.h:303:2: warning: this statement may fall through [-Wimplicit-fallthrough=]
printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/mtd/nand/raw/diskonchip.c:1479:4: note: in expansion of macro ‘pr_err’
pr_err("DiskOnChip Millennium Plus 32MB is not supported, ignoring.\n");
^~~~~~
drivers/mtd/nand/raw/diskonchip.c:1480:3: note: here
default:
^~~~~~~
drivers/mtd/nand/raw/nandsim.c: In function ‘ns_init_module’:
drivers/mtd/nand/raw/nandsim.c:2254:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
chip->bbt_options |= NAND_BBT_NO_OOB;
drivers/mtd/nand/raw/nandsim.c:2255:2: note: here
case 1:
^~~~
drivers/mtd/nand/raw/nuc900_nand.c: In function ‘nuc900_nand_command_lp’:
./arch/x86/include/asm/io.h:91:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
#define __raw_writel __writel
drivers/mtd/nand/raw/nuc900_nand.c:52:2: note: in expansion of macro ‘__raw_writel’
__raw_writel((val), (dev)->reg + REG_SMCMD)
^~~~~~~~~~~~
drivers/mtd/nand/raw/nuc900_nand.c:196:3: note: in expansion of macro ‘write_cmd_reg’
write_cmd_reg(nand, NAND_CMD_READSTART);
^~~~~~~~~~~~~
drivers/mtd/nand/raw/nuc900_nand.c:197:2: note: here
default:
^~~~~~~
drivers/mtd/nand/raw/omap_elm.c: In function ‘elm_context_restore’:
drivers/mtd/nand/raw/omap_elm.c:512:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
elm_write_reg(info, ELM_SYNDROME_FRAGMENT_4 + offset,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regs->elm_syndrome_fragment_4[i]);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/mtd/nand/raw/omap_elm.c:514:3: note: here
case BCH8_ECC:
^~~~
drivers/mtd/nand/raw/omap_elm.c:517:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
elm_write_reg(info, ELM_SYNDROME_FRAGMENT_2 + offset,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regs->elm_syndrome_fragment_2[i]);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/mtd/nand/raw/omap_elm.c:519:3: note: here
case BCH4_ECC:
^~~~
drivers/mtd/nand/raw/omap_elm.c: In function ‘elm_context_save’:
drivers/mtd/nand/raw/omap_elm.c:466:37: warning: this statement may fall through [-Wimplicit-fallthrough=]
regs->elm_syndrome_fragment_4[i] = elm_read_reg(info,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
ELM_SYNDROME_FRAGMENT_4 + offset);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/mtd/nand/raw/omap_elm.c:468:3: note: here
case BCH8_ECC:
^~~~
drivers/mtd/nand/raw/omap_elm.c:471:37: warning: this statement may fall through [-Wimplicit-fallthrough=]
regs->elm_syndrome_fragment_2[i] = elm_read_reg(info,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
ELM_SYNDROME_FRAGMENT_2 + offset);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/mtd/nand/raw/omap_elm.c:473:3: note: here
case BCH4_ECC:
^~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
| |
Introduce a GPMI_IS_MXS() macro to take into account the cases
when mx23 or mx28 are used, which helps readability.
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
|
|
|
|
|
|
|
|
| |
Make use of the spi-mem direct mapping API to let advanced controllers
optimize read/write operations when they support direct mapping.
Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
Cc: Stefan Roese <sr@denx.de>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Tested-by: Stefan Roese <sr@denx.de>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fix from Juergen Gross:
"A fix for a Xen bug introduced by David's series for excluding
ballooned pages in vmcores"
* tag 'for-linus-5.1b-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/balloon: Fix mapping PG_offline pages to user space
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The XEN balloon driver - in contrast to other balloon drivers - allows
to map some inflated pages to user space. Such pages are allocated via
alloc_xenballooned_pages() and freed via free_xenballooned_pages().
The pfn space of these allocated pages is used to map other things
by the hypervisor using hypercalls.
Pages marked with PG_offline must never be mapped to user space (as
this page type uses the mapcount field of struct pages).
So what we can do is, clear/set PG_offline when allocating/freeing an
inflated pages. This way, most inflated pages can be excluded by
dumping tools and the "reused for other purpose" balloon pages are
correctly not marked as PG_offline.
Fixes: 77c4adf6a6df (xen/balloon: mark inflated pages PG_offline)
Reported-by: Julien Grall <julien.grall@arm.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull device-dax updates from Dan Williams:
"New device-dax infrastructure to allow persistent memory and other
"reserved" / performance differentiated memories, to be assigned to
the core-mm as "System RAM".
Some users want to use persistent memory as additional volatile
memory. They are willing to cope with potential performance
differences, for example between DRAM and 3D Xpoint, and want to use
typical Linux memory management apis rather than a userspace memory
allocator layered over an mmap() of a dax file. The administration
model is to decide how much Persistent Memory (pmem) to use as System
RAM, create a device-dax-mode namespace of that size, and then assign
it to the core-mm. The rationale for device-dax is that it is a
generic memory-mapping driver that can be layered over any "special
purpose" memory, not just pmem. On subsequent boots udev rules can be
used to restore the memory assignment.
One implication of using pmem as RAM is that mlock() no longer keeps
data off persistent media. For this reason it is recommended to enable
NVDIMM Security (previously merged for 5.0) to encrypt pmem contents
at rest. We considered making this recommendation an actively enforced
requirement, but in the end decided to leave it as a distribution /
administrator policy to allow for emulation and test environments that
lack security capable NVDIMMs.
Summary:
- Replace the /sys/class/dax device model with /sys/bus/dax, and
include a compat driver so distributions can opt-in to the new ABI.
- Allow for an alternative driver for the device-dax address-range
- Introduce the 'kmem' driver to hotplug / assign a device-dax
address-range to the core-mm.
- Arrange for the device-dax target-node to be onlined so that the
newly added memory range can be uniquely referenced by numa apis"
NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because
we currently have special - and very annoying rules in the kernel about
accessing PMEM only with the "MC safe" accessors, because machine checks
inside the regular repeat string copy functions can be fatal in some
(not described) circumstances.
And apparently the PMEM modules can cause that a lot more than regular
RAM. The argument is that this happens because PMEM doesn't necessarily
get scrubbed at boot like RAM does, but that is planned to be added for
the user space tooling.
Quoting Dan from another email:
"The exposure can be reduced in the volatile-RAM case by scanning for
and clearing errors before it is onlined as RAM. The userspace tooling
for that can be in place before v5.1-final. There's also runtime
notifications of errors via acpi_nfit_uc_error_notify() from
background scrubbers on the DIMM devices. With that mechanism the
kernel could proactively clear newly discovered poison in the volatile
case, but that would be additional development more suitable for v5.2.
I understand the concern, and the need to highlight this issue by
tapping the brakes on feature development, but I don't see PMEM as RAM
making the situation worse when the exposure is also there via DAX in
the PMEM case. Volatile-RAM is arguably a safer use case since it's
possible to repair pages where the persistent case needs active
application coordination"
* tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
device-dax: "Hotplug" persistent memory for use like normal RAM
mm/resource: Let walk_system_ram_range() search child resources
mm/memory-hotplug: Allow memory resources to be children
mm/resource: Move HMM pr_debug() deeper into resource code
mm/resource: Return real error codes from walk failures
device-dax: Add a 'modalias' attribute to DAX 'bus' devices
device-dax: Add a 'target_node' attribute
device-dax: Auto-bind device after successful new_id
acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node
device-dax: Add /sys/class/dax backwards compatibility
device-dax: Add support for a dax override driver
device-dax: Move resource pinning+mapping into the common driver
device-dax: Introduce bus + driver model
device-dax: Start defining a dax bus model
device-dax: Remove multi-resource infrastructure
device-dax: Kill dax_region base
device-dax: Kill dax_region ida
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This is intended for use with NVDIMMs that are physically persistent
(physically like flash) so that they can be used as a cost-effective
RAM replacement. Intel Optane DC persistent memory is one
implementation of this kind of NVDIMM.
Currently, a persistent memory region is "owned" by a device driver,
either the "Direct DAX" or "Filesystem DAX" drivers. These drivers
allow applications to explicitly use persistent memory, generally
by being modified to use special, new libraries. (DIMM-based
persistent memory hardware/software is described in great detail
here: Documentation/nvdimm/nvdimm.txt).
However, this limits persistent memory use to applications which
*have* been modified. To make it more broadly usable, this driver
"hotplugs" memory into the kernel, to be managed and used just like
normal RAM would be.
To make this work, management software must remove the device from
being controlled by the "Device DAX" infrastructure:
echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
and then tell the new driver that it can bind to the device:
echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
After this, there will be a number of new memory sections visible
in sysfs that can be onlined, or that may get onlined by existing
udev-initiated memory hotplug rules.
This rebinding procedure is currently a one-way trip. Once memory
is bound to "kmem", it's there permanently and can not be
unbound and assigned back to device_dax.
The kmem driver will never bind to a dax device unless the device
is *explicitly* bound to the driver. There are two reasons for
this: One, since it is a one-way trip, it can not be undone if
bound incorrectly. Two, the kmem driver destroys data on the
device. Think of if you had good data on a pmem device. It
would be catastrophic if you compile-in "kmem", but leave out
the "device_dax" driver. kmem would take over the device and
write volatile data all over your good data.
This inherits any existing NUMA information for the newly-added
memory from the persistent memory device that came from the
firmware. On Intel platforms, the firmware has guarantees that
require each socket's persistent memory to be in a separate
memory-only NUMA node. That means that this patch is not expected
to create NUMA nodes, but will simply hotplug memory into existing
nodes.
Because NUMA nodes are created, the existing NUMA APIs and tools
are sufficient to create policies for applications or memory areas
to have affinity for or an aversion to using this memory.
There is currently some metadata at the beginning of pmem regions.
The section-size memory hotplug restrictions, plus this small
reserved area can cause the "loss" of a section or two of capacity.
This should be fixable in follow-on patches. But, as a first step,
losing 256MB of memory (worst case) out of hundreds of gigabytes
is a good tradeoff vs. the required code to fix this up precisely.
This calculation is also the reason we export
memory_block_size_bytes().
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: linux-nvdimm@lists.01.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Huang Ying <ying.huang@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add a 'modalias' attribute to devices under the DAX bus so that userspace
is able to dynamically load modules as needed.
Normally, udev can get the modalias from 'uevent', and that is correctly
set up by the DAX bus. However other tooling such as 'libndctl' for
interacting with drivers/nvdimm/, and 'libdaxctl' for drivers/dax/ can
also use the modalias to dynamically load modules via libkmod lookups.
The 'nd' bus set up by the libnvdimm subsystem exports a modalias
attribute. Imitate this to export the same for the 'dax' bus.
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The target-node attribute is the Linux numa-node that a device-dax
instance may create when it is online. Prior to being online the
device's 'numa_node' property reflects the closest online cpu node which
is the typical expectation of a device 'numa_node'. Once it is online it
becomes its own distinct numa node, i.e. 'target_node'.
Export the 'target_node' property to give userspace tooling the ability
to predict the effective numa-node from a device-dax instance configured
to provide 'System RAM' capacity.
Cc: Vishal Verma <vishal.l.verma@intel.com>
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The typical 'new_id' attribute behavior is to immediately attach a
device to its driver after a new device-id is added. Implement this
behavior for the dax bus.
Reported-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reported-by: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware
Interface Table), is the first known instance of a memory range
described by a unique "target" proximity domain. Where "initiator" and
"target" proximity domains is an approach that the ACPI HMAT
(Heterogeneous Memory Attributes Table) uses to described the unique
performance properties of a memory range relative to a given initiator
(e.g. CPU or DMA device).
Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y
char-device follows the traditional notion of 'numa-node' where the
attribute conveys the closest online numa-node. That numa-node attribute
is useful for cpu-binding and memory-binding processes *near* the
device. However, when the memory range backing a 'pmem', or 'dax' device
is onlined (memory hot-add) the memory-only-numa-node representing that
address needs to be differentiated from the set of online nodes. In
other words, the numa-node association of the device depends on whether
you can bind processes *near* the cpu-numa-node in the offline
device-case, or bind process *on* the memory-range directly after the
backing address range is onlined.
Allow for the case that platform firmware describes persistent memory
with a unique proximity domain, i.e. when it is distinct from the
proximity of DRAM and CPUs that are on the same socket. Plumb the Linux
numa-node translation of that proximity through the libnvdimm region
device to namespaces that are in device-dax mode. With this in place the
proposed kmem driver [1] can optionally discover a unique numa-node
number for the address range as it transitions the memory from an
offline state managed by a device-driver to an online memory range
managed by the core-mm.
[1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.com
Reported-by: Fan Du <fan.du@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Oliver O'Halloran" <oohall@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
On the expectation that some environments may not upgrade libdaxctl
(userspace component that depends on the /sys/class/dax hierarchy),
provide a default / legacy dax_pmem_compat driver. The dax_pmem_compat
driver implements the original /sys/class/dax sysfs layout rather than
/sys/bus/dax. When userspace is upgraded it can blacklist this module
and switch to the dax_pmem driver going forward.
CONFIG_DEV_DAX_PMEM_COMPAT and supporting code will be deleted according
to the dax_pmem entry in Documentation/ABI/obsolete/.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Introduce the 'new_id' concept for enabling a custom device-driver attach
policy for dax-bus drivers. The intended use is to have a mechanism for
hot-plugging device-dax ranges into the page allocator on-demand. With
this in place the default policy of using device-dax for performance
differentiated memory can be overridden by user-space policy that can
arrange for the memory range to be managed as 'System RAM' with
user-defined NUMA and other performance attributes.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Move the responsibility of calling devm_request_resource() and
devm_memremap_pages() into the common device-dax driver. This is another
preparatory step to allowing an alternate personality driver for a
device-dax range.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In support of multiple device-dax instances per device-dax-region and
allowing the 'kmem' driver to attach to dax-instances instead of the
current device-node access, convert the dax sub-system from a class to a
bus. Recall that the kmem driver takes reserved / special purpose
memories and assigns them to be managed by the core-mm.
Aside from the fact the device-dax instances are registered and probed
on a bus, two other lifetime-management changes are made:
1/ Delay attaching a cdev until driver probe time
2/ A new run_dax() helper is introduced to allow restoring dax-operation
after a kill_dax() event. So, at driver ->probe() time we run_dax()
and at ->remove() time we kill_dax() and invalidate all mappings.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Towards eliminating the dax_class, move the dax-device-attribute
enabling to a new bus.c file in the core. The amount of code
thrash of sub-sequent patches is reduced as no logic changes are made,
just pure code movement.
A temporary export of unregister_dex_dax() and dax_attribute_groups is
needed to preserve compilation, but those symbols become static again in
a follow-on patch.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|