Commit 88793e5c authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm

Pull libnvdimm subsystem from Dan Williams:
 "The libnvdimm sub-system introduces, in addition to the
  libnvdimm-core, 4 drivers / enabling modules:

  NFIT:
    Instantiates an "nvdimm bus" with the core and registers memory
    devices (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware
    Interface table).

    After registering NVDIMMs the NFIT driver then registers "region"
    devices.  A libnvdimm-region defines an access mode and the
    boundaries of persistent memory media.  A region may span multiple
    NVDIMMs that are interleaved by the hardware memory controller.  In
    turn, a libnvdimm-region can be carved into a "namespace" device and
    bound to the PMEM or BLK driver which will attach a Linux block
    device (disk) interface to the memory.

  PMEM:
    Initially merged in v4.1 this driver for contiguous spans of
    persistent memory address ranges is re-worked to drive
    PMEM-namespaces emitted by the libnvdimm-core.

    In this update the PMEM driver, on x86, gains the ability to assert
    that writes to persistent memory have been flushed all the way
    through the caches and buffers in the platform to persistent media.
    See memcpy_to_pmem() and wmb_pmem().

  BLK:
    This new driver enables access to persistent memory media through
    "Block Data Windows" as defined by the NFIT.  The primary difference
    of this driver to PMEM is that only a small window of persistent
    memory is mapped into system address space at any given point in
    time.

    Per-NVDIMM windows are reprogrammed at run time, per-I/O, to access
    different portions of the media.  BLK-mode, by definition, does not
    support DAX.

  BTT:
    This is a library, optionally consumed by either PMEM or BLK, that
    converts a byte-accessible namespace into a disk with atomic sector
    update semantics (prevents sector tearing on crash or power loss).

    The sinister aspect of sector tearing is that most applications do
    not know they have a atomic sector dependency.  At least today's
    disk's rarely ever tear sectors and if they do one almost certainly
    gets a CRC error on access.  NVDIMMs will always tear and always
    silently.  Until an application is audited to be robust in the
    presence of sector-tearing the usage of BTT is recommended.

  Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig,
  Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox,
  Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael
  Wysocki, and Bob Moore"

* tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm: (33 commits)
  arch, x86: pmem api for ensuring durability of persistent memory updates
  libnvdimm: Add sysfs numa_node to NVDIMM devices
  libnvdimm: Set numa_node to NVDIMM devices
  acpi: Add acpi_map_pxm_to_online_node()
  libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only
  pmem: flag pmem block devices as non-rotational
  libnvdimm: enable iostat
  pmem: make_request cleanups
  libnvdimm, pmem: fix up max_hw_sectors
  libnvdimm, blk: add support for blk integrity
  libnvdimm, btt: add support for blk integrity
  fs/block_dev.c: skip rw_page if bdev has integrity
  libnvdimm: Non-Volatile Devices
  tools/testing/nvdimm: libnvdimm unit test infrastructure
  libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory
  nd_btt: atomic sector updates
  libnvdimm: infrastructure for btt devices
  libnvdimm: write blk label set
  libnvdimm: write pmem label set
  libnvdimm: blk labels and namespace instantiation
  ...
parents 1bc5e157 61031952
This diff is collapsed.
This diff is collapsed.
...@@ -6102,6 +6102,39 @@ M: Sasha Levin <sasha.levin@oracle.com> ...@@ -6102,6 +6102,39 @@ M: Sasha Levin <sasha.levin@oracle.com>
S: Maintained S: Maintained
F: tools/lib/lockdep/ F: tools/lib/lockdep/
LIBNVDIMM: NON-VOLATILE MEMORY DEVICE SUBSYSTEM
M: Dan Williams <dan.j.williams@intel.com>
L: linux-nvdimm@lists.01.org
Q: https://patchwork.kernel.org/project/linux-nvdimm/list/
S: Supported
F: drivers/nvdimm/*
F: include/linux/nd.h
F: include/linux/libnvdimm.h
F: include/uapi/linux/ndctl.h
LIBNVDIMM BLK: MMIO-APERTURE DRIVER
M: Ross Zwisler <ross.zwisler@linux.intel.com>
L: linux-nvdimm@lists.01.org
Q: https://patchwork.kernel.org/project/linux-nvdimm/list/
S: Supported
F: drivers/nvdimm/blk.c
F: drivers/nvdimm/region_devs.c
F: drivers/acpi/nfit*
LIBNVDIMM BTT: BLOCK TRANSLATION TABLE
M: Vishal Verma <vishal.l.verma@intel.com>
L: linux-nvdimm@lists.01.org
Q: https://patchwork.kernel.org/project/linux-nvdimm/list/
S: Supported
F: drivers/nvdimm/btt*
LIBNVDIMM PMEM: PERSISTENT MEMORY DRIVER
M: Ross Zwisler <ross.zwisler@linux.intel.com>
L: linux-nvdimm@lists.01.org
Q: https://patchwork.kernel.org/project/linux-nvdimm/list/
S: Supported
F: drivers/nvdimm/pmem.c
LINUX FOR IBM pSERIES (RS/6000) LINUX FOR IBM pSERIES (RS/6000)
M: Paul Mackerras <paulus@au.ibm.com> M: Paul Mackerras <paulus@au.ibm.com>
W: http://www.ibm.com/linux/ltc/projects/ppc W: http://www.ibm.com/linux/ltc/projects/ppc
...@@ -8363,12 +8396,6 @@ S: Maintained ...@@ -8363,12 +8396,6 @@ S: Maintained
F: Documentation/blockdev/ramdisk.txt F: Documentation/blockdev/ramdisk.txt
F: drivers/block/brd.c F: drivers/block/brd.c
PERSISTENT MEMORY DRIVER
M: Ross Zwisler <ross.zwisler@linux.intel.com>
L: linux-nvdimm@lists.01.org
S: Supported
F: drivers/block/pmem.c
RANDOM NUMBER DRIVER RANDOM NUMBER DRIVER
M: "Theodore Ts'o" <tytso@mit.edu> M: "Theodore Ts'o" <tytso@mit.edu>
S: Maintained S: Maintained
......
...@@ -158,6 +158,7 @@ static __init int is_reserve_region(efi_memory_desc_t *md) ...@@ -158,6 +158,7 @@ static __init int is_reserve_region(efi_memory_desc_t *md)
case EFI_BOOT_SERVICES_CODE: case EFI_BOOT_SERVICES_CODE:
case EFI_BOOT_SERVICES_DATA: case EFI_BOOT_SERVICES_DATA:
case EFI_CONVENTIONAL_MEMORY: case EFI_CONVENTIONAL_MEMORY:
case EFI_PERSISTENT_MEMORY:
return 0; return 0;
default: default:
break; break;
......
...@@ -1222,6 +1222,10 @@ efi_initialize_iomem_resources(struct resource *code_resource, ...@@ -1222,6 +1222,10 @@ efi_initialize_iomem_resources(struct resource *code_resource,
flags |= IORESOURCE_DISABLED; flags |= IORESOURCE_DISABLED;
break; break;
case EFI_PERSISTENT_MEMORY:
name = "Persistent Memory";
break;
case EFI_RESERVED_TYPE: case EFI_RESERVED_TYPE:
case EFI_RUNTIME_SERVICES_CODE: case EFI_RUNTIME_SERVICES_CODE:
case EFI_RUNTIME_SERVICES_DATA: case EFI_RUNTIME_SERVICES_DATA:
......
...@@ -27,6 +27,7 @@ config X86 ...@@ -27,6 +27,7 @@ config X86
select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FAST_MULTIPLIER select ARCH_HAS_FAST_MULTIPLIER
select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_PMEM_API
select ARCH_HAS_SG_CHAIN select ARCH_HAS_SG_CHAIN
select ARCH_HAVE_NMI_SAFE_CMPXCHG select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
...@@ -1419,6 +1420,9 @@ source "mm/Kconfig" ...@@ -1419,6 +1420,9 @@ source "mm/Kconfig"
config X86_PMEM_LEGACY config X86_PMEM_LEGACY
bool "Support non-standard NVDIMMs and ADR protected memory" bool "Support non-standard NVDIMMs and ADR protected memory"
depends on PHYS_ADDR_T_64BIT
depends on BLK_DEV
select LIBNVDIMM
help help
Treat memory marked using the non-standard e820 type of 12 as used Treat memory marked using the non-standard e820 type of 12 as used
by the Intel Sandy Bridge-EP reference BIOS as protected memory. by the Intel Sandy Bridge-EP reference BIOS as protected memory.
......
...@@ -1224,6 +1224,10 @@ static efi_status_t setup_e820(struct boot_params *params, ...@@ -1224,6 +1224,10 @@ static efi_status_t setup_e820(struct boot_params *params,
e820_type = E820_NVS; e820_type = E820_NVS;
break; break;
case EFI_PERSISTENT_MEMORY:
e820_type = E820_PMEM;
break;
default: default:
continue; continue;
} }
......
...@@ -4,6 +4,7 @@ ...@@ -4,6 +4,7 @@
/* Caches aren't brain-dead on the intel. */ /* Caches aren't brain-dead on the intel. */
#include <asm-generic/cacheflush.h> #include <asm-generic/cacheflush.h>
#include <asm/special_insns.h> #include <asm/special_insns.h>
#include <asm/uaccess.h>
/* /*
* The set_memory_* API can be used to change various attributes of a virtual * The set_memory_* API can be used to change various attributes of a virtual
...@@ -108,4 +109,75 @@ static inline int rodata_test(void) ...@@ -108,4 +109,75 @@ static inline int rodata_test(void)
} }
#endif #endif
#ifdef ARCH_HAS_NOCACHE_UACCESS
/**
* arch_memcpy_to_pmem - copy data to persistent memory
* @dst: destination buffer for the copy
* @src: source buffer for the copy
* @n: length of the copy in bytes
*
* Copy data to persistent memory media via non-temporal stores so that
* a subsequent arch_wmb_pmem() can flush cpu and memory controller
* write buffers to guarantee durability.
*/
static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src,
size_t n)
{
int unwritten;
/*
* We are copying between two kernel buffers, if
* __copy_from_user_inatomic_nocache() returns an error (page
* fault) we would have already reported a general protection fault
* before the WARN+BUG.
*/
unwritten = __copy_from_user_inatomic_nocache((void __force *) dst,
(void __user *) src, n);
if (WARN(unwritten, "%s: fault copying %p <- %p unwritten: %d\n",
__func__, dst, src, unwritten))
BUG();
}
/**
* arch_wmb_pmem - synchronize writes to persistent memory
*
* After a series of arch_memcpy_to_pmem() operations this drains data
* from cpu write buffers and any platform (memory controller) buffers
* to ensure that written data is durable on persistent memory media.
*/
static inline void arch_wmb_pmem(void)
{
/*
* wmb() to 'sfence' all previous writes such that they are
* architecturally visible to 'pcommit'. Note, that we've
* already arranged for pmem writes to avoid the cache via
* arch_memcpy_to_pmem().
*/
wmb();
pcommit_sfence();
}
static inline bool __arch_has_wmb_pmem(void)
{
#ifdef CONFIG_X86_64
/*
* We require that wmb() be an 'sfence', that is only guaranteed on
* 64-bit builds
*/
return static_cpu_has(X86_FEATURE_PCOMMIT);
#else
return false;
#endif
}
#else /* ARCH_HAS_NOCACHE_UACCESS i.e. ARCH=um */
extern void arch_memcpy_to_pmem(void __pmem *dst, const void *src, size_t n);
extern void arch_wmb_pmem(void);
static inline bool __arch_has_wmb_pmem(void)
{
return false;
}
#endif
#endif /* _ASM_X86_CACHEFLUSH_H */ #endif /* _ASM_X86_CACHEFLUSH_H */
...@@ -248,6 +248,12 @@ static inline void flush_write_buffers(void) ...@@ -248,6 +248,12 @@ static inline void flush_write_buffers(void)
#endif #endif
} }
static inline void __pmem *arch_memremap_pmem(resource_size_t offset,
unsigned long size)
{
return (void __force __pmem *) ioremap_cache(offset, size);
}
#endif /* __KERNEL__ */ #endif /* __KERNEL__ */
extern void native_io_delay(void); extern void native_io_delay(void);
......
...@@ -32,6 +32,7 @@ ...@@ -32,6 +32,7 @@
#define E820_ACPI 3 #define E820_ACPI 3
#define E820_NVS 4 #define E820_NVS 4
#define E820_UNUSABLE 5 #define E820_UNUSABLE 5
#define E820_PMEM 7
/* /*
* This is a non-standardized way to represent ADR or NVDIMM regions that * This is a non-standardized way to represent ADR or NVDIMM regions that
......
...@@ -149,6 +149,7 @@ static void __init e820_print_type(u32 type) ...@@ -149,6 +149,7 @@ static void __init e820_print_type(u32 type)
case E820_UNUSABLE: case E820_UNUSABLE:
printk(KERN_CONT "unusable"); printk(KERN_CONT "unusable");
break; break;
case E820_PMEM:
case E820_PRAM: case E820_PRAM:
printk(KERN_CONT "persistent (type %u)", type); printk(KERN_CONT "persistent (type %u)", type);
break; break;
...@@ -918,11 +919,32 @@ static inline const char *e820_type_to_string(int e820_type) ...@@ -918,11 +919,32 @@ static inline const char *e820_type_to_string(int e820_type)
case E820_ACPI: return "ACPI Tables"; case E820_ACPI: return "ACPI Tables";
case E820_NVS: return "ACPI Non-volatile Storage"; case E820_NVS: return "ACPI Non-volatile Storage";
case E820_UNUSABLE: return "Unusable memory"; case E820_UNUSABLE: return "Unusable memory";
case E820_PRAM: return "Persistent RAM"; case E820_PRAM: return "Persistent Memory (legacy)";
case E820_PMEM: return "Persistent Memory";
default: return "reserved"; default: return "reserved";
} }
} }
static bool do_mark_busy(u32 type, struct resource *res)
{
/* this is the legacy bios/dos rom-shadow + mmio region */
if (res->start < (1ULL<<20))
return true;
/*
* Treat persistent memory like device memory, i.e. reserve it
* for exclusive use of a driver
*/
switch (type) {
case E820_RESERVED:
case E820_PRAM:
case E820_PMEM:
return false;
default:
return true;
}
}
/* /*
* Mark e820 reserved areas as busy for the resource manager. * Mark e820 reserved areas as busy for the resource manager.
*/ */
...@@ -952,9 +974,7 @@ void __init e820_reserve_resources(void) ...@@ -952,9 +974,7 @@ void __init e820_reserve_resources(void)
* pci device BAR resource and insert them later in * pci device BAR resource and insert them later in
* pcibios_resource_survey() * pcibios_resource_survey()
*/ */
if (((e820.map[i].type != E820_RESERVED) && if (do_mark_busy(e820.map[i].type, res)) {
(e820.map[i].type != E820_PRAM)) ||
res->start < (1ULL<<20)) {
res->flags |= IORESOURCE_BUSY; res->flags |= IORESOURCE_BUSY;
insert_resource(&iomem_resource, res); insert_resource(&iomem_resource, res);
} }
......
/* /*
* Copyright (c) 2015, Christoph Hellwig. * Copyright (c) 2015, Christoph Hellwig.
* Copyright (c) 2015, Intel Corporation.
*/ */
#include <linux/memblock.h>
#include <linux/platform_device.h> #include <linux/platform_device.h>
#include <linux/slab.h> #include <linux/libnvdimm.h>
#include <linux/module.h>
#include <asm/e820.h> #include <asm/e820.h>
#include <asm/page_types.h>
#include <asm/setup.h>
static __init void register_pmem_device(struct resource *res) static void e820_pmem_release(struct device *dev)
{ {
struct platform_device *pdev; struct nvdimm_bus *nvdimm_bus = dev->platform_data;
int error;
pdev = platform_device_alloc("pmem", PLATFORM_DEVID_AUTO); if (nvdimm_bus)
if (!pdev) nvdimm_bus_unregister(nvdimm_bus);
return; }
error = platform_device_add_resources(pdev, res, 1); static struct platform_device e820_pmem = {
if (error) .name = "e820_pmem",
goto out_put_pdev; .id = -1,
.dev = {
.release = e820_pmem_release,
},
};
error = platform_device_add(pdev); static const struct attribute_group *e820_pmem_attribute_groups[] = {
if (error) &nvdimm_bus_attribute_group,
goto out_put_pdev; NULL,
return; };
out_put_pdev: static const struct attribute_group *e820_pmem_region_attribute_groups[] = {
dev_warn(&pdev->dev, "failed to add 'pmem' (persistent memory) device!\n"); &nd_region_attribute_group,
platform_device_put(pdev); &nd_device_attribute_group,
} NULL,
};
static __init int register_pmem_devices(void) static __init int register_e820_pmem(void)
{ {
int i; static struct nvdimm_bus_descriptor nd_desc;
struct device *dev = &e820_pmem.dev;
struct nvdimm_bus *nvdimm_bus;
int rc, i;
rc = platform_device_register(&e820_pmem);
if (rc)
return rc;
nd_desc.attr_groups = e820_pmem_attribute_groups;
nd_desc.provider_name = "e820";
nvdimm_bus = nvdimm_bus_register(dev, &nd_desc);
if (!nvdimm_bus)
goto err;
dev->platform_data = nvdimm_bus;
for (i = 0; i < e820.nr_map; i++) { for (i = 0; i < e820.nr_map; i++) {
struct e820entry *ei = &e820.map[i]; struct e820entry *ei = &e820.map[i];
if (ei->type == E820_PRAM) {
struct resource res = { struct resource res = {
.flags = IORESOURCE_MEM, .flags = IORESOURCE_MEM,
.start = ei->addr, .start = ei->addr,
.end = ei->addr + ei->size - 1, .end = ei->addr + ei->size - 1,
}; };
register_pmem_device(&res); struct nd_region_desc ndr_desc;
}
if (ei->type != E820_PRAM)
continue;
memset(&ndr_desc, 0, sizeof(ndr_desc));
ndr_desc.res = &res;
ndr_desc.attr_groups = e820_pmem_region_attribute_groups;
ndr_desc.numa_node = NUMA_NO_NODE;
if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc))
goto err;
} }
return 0; return 0;
err:
dev_err(dev, "failed to register legacy persistent memory ranges\n");
platform_device_unregister(&e820_pmem);
return -ENXIO;
} }
device_initcall(register_pmem_devices); device_initcall(register_e820_pmem);
...@@ -174,6 +174,9 @@ static void __init do_add_efi_memmap(void) ...@@ -174,6 +174,9 @@ static void __init do_add_efi_memmap(void)
case EFI_UNUSABLE_MEMORY: case EFI_UNUSABLE_MEMORY:
e820_type = E820_UNUSABLE; e820_type = E820_UNUSABLE;
break; break;
case EFI_PERSISTENT_MEMORY:
e820_type = E820_PMEM;
break;
default: default:
/* /*
* EFI_RESERVED_TYPE EFI_RUNTIME_SERVICES_CODE * EFI_RESERVED_TYPE EFI_RUNTIME_SERVICES_CODE
......
...@@ -182,4 +182,6 @@ source "drivers/thunderbolt/Kconfig" ...@@ -182,4 +182,6 @@ source "drivers/thunderbolt/Kconfig"
source "drivers/android/Kconfig" source "drivers/android/Kconfig"
source "drivers/nvdimm/Kconfig"
endmenu endmenu
...@@ -64,6 +64,7 @@ obj-$(CONFIG_FB_INTEL) += video/fbdev/intelfb/ ...@@ -64,6 +64,7 @@ obj-$(CONFIG_FB_INTEL) += video/fbdev/intelfb/
obj-$(CONFIG_PARPORT) += parport/ obj-$(CONFIG_PARPORT) += parport/
obj-y += base/ block/ misc/ mfd/ nfc/ obj-y += base/ block/ misc/ mfd/ nfc/
obj-$(CONFIG_LIBNVDIMM) += nvdimm/
obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf/ obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf/
obj-$(CONFIG_NUBUS) += nubus/ obj-$(CONFIG_NUBUS) += nubus/
obj-y += macintosh/ obj-y += macintosh/
......
...@@ -386,6 +386,32 @@ config ACPI_REDUCED_HARDWARE_ONLY ...@@ -386,6 +386,32 @@ config ACPI_REDUCED_HARDWARE_ONLY
If you are unsure what to do, do not enable this option. If you are unsure what to do, do not enable this option.
config ACPI_NFIT
tristate "ACPI NVDIMM Firmware Interface Table (NFIT)"
depends on PHYS_ADDR_T_64BIT
depends on BLK_DEV
select LIBNVDIMM
help
Infrastructure to probe ACPI 6 compliant platforms for
NVDIMMs (NFIT) and register a libnvdimm device tree. In
addition to storage devices this also enables libnvdimm to pass
ACPI._DSM messages for platform/dimm configuration.
To compile this driver as a module, choose M here:
the module will be called nfit.
config ACPI_NFIT_DEBUG
bool "NFIT DSM debug"
depends on ACPI_NFIT
depends on DYNAMIC_DEBUG
default n
help
Enabling this option causes the nfit driver to dump the
input and output buffers of _DSM operations on the ACPI0012
device and its children. This can be very verbose, so leave
it disabled unless you are debugging a hardware / firmware
issue.
source "drivers/acpi/apei/Kconfig" source "drivers/acpi/apei/Kconfig"
config ACPI_EXTLOG config ACPI_EXTLOG
......
...@@ -68,6 +68,7 @@ obj-$(CONFIG_ACPI_PCI_SLOT) += pci_slot.o ...@@ -68,6 +68,7 @@ obj-$(CONFIG_ACPI_PCI_SLOT) += pci_slot.o
obj-$(CONFIG_ACPI_PROCESSOR) += processor.o obj-$(CONFIG_ACPI_PROCESSOR) += processor.o
obj-y += container.o obj-y += container.o
obj-$(CONFIG_ACPI_THERMAL) += thermal.o obj-$(CONFIG_ACPI_THERMAL) += thermal.o
obj-$(CONFIG_ACPI_NFIT) += nfit.o
obj-y += acpi_memhotplug.o obj-y += acpi_memhotplug.o
obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
obj-$(CONFIG_ACPI_BATTERY) += battery.o obj-$(CONFIG_ACPI_BATTERY) += battery.o
......
This diff is collapsed.
/*
* NVDIMM Firmware Interface Table - NFIT
*
* Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*/
#ifndef __NFIT_H__
#define __NFIT_H__
#include <linux/libnvdimm.h>
#include <linux/types.h>
#include <linux/uuid.h>
#include <linux/acpi.h>
#include <acpi/acuuid.h>
#define UUID_NFIT_BUS "2f10e7a4-9e91-11e4-89d3-123b93f75cba"
#define UUID_NFIT_DIMM "4309ac30-0d11-11e4-9191-0800200c9a66"
#define ACPI_NFIT_MEM_FAILED_MASK (ACPI_NFIT_MEM_SAVE_FAILED \
| ACPI_NFIT_MEM_RESTORE_FAILED | ACPI_NFIT_MEM_FLUSH_FAILED \
| ACPI_NFIT_MEM_ARMED)
enum nfit_uuids {
NFIT_SPA_VOLATILE,
NFIT_SPA_PM,
NFIT_SPA_DCR,
NFIT_SPA_BDW,
NFIT_SPA_VDISK,
NFIT_SPA_VCD,
NFIT_SPA_PDISK,
NFIT_SPA_PCD,
NFIT_DEV_BUS,
NFIT_DEV_DIMM,
NFIT_UUID_MAX,
};
struct nfit_spa {
struct acpi_nfit_system_address *spa;
struct list_head list;
};
struct nfit_dcr {
struct acpi_nfit_control_region *dcr;
struct list_head list;
};
struct nfit_bdw {
struct acpi_nfit_data_region *bdw;
struct list_head list;
};
struct nfit_idt {
struct acpi_nfit_interleave *idt;
struct list_head list;
};
struct nfit_memdev {
struct acpi_nfit_memory_map *memdev;
struct list_head list;
};
/* assembled tables for a given dimm/memory-device */
struct nfit_mem {
struct nvdimm *nvdimm;
struct acpi_nfit_memory_map *memdev_dcr;
struct acpi_nfit_memory_map *memdev_pmem;
struct acpi_nfit_memory_map *memdev_bdw;
struct acpi_nfit_control_region *dcr;
struct acpi_nfit_data_region *bdw;
struct acpi_nfit_system_address *spa_dcr;
struct acpi_nfit_system_address *spa_bdw;
struct acpi_nfit_interleave *idt_dcr;
struct acpi_nfit_interleave *idt_bdw;
struct list_head list;
struct acpi_device *adev;
unsigned long dsm_mask;
};
struct acpi_nfit_desc {
struct nvdimm_bus_descriptor nd_desc;
struct acpi_table_nfit *nfit;
struct mutex spa_map_mutex;
struct list_head spa_maps;
struct list_head memdevs;
struct list_head dimms;
struct list_head spas;
struct list_head dcrs;
struct list_head bdws;
struct list_head idts;
struct nvdimm_bus *nvdimm_bus;
struct device *dev;
unsigned long dimm_dsm_force_en;
int (*blk_do_io)(struct nd_blk_region *ndbr, resource_size_t dpa,
void *iobuf, u64 len, int rw);
};
enum nd_blk_mmio_selector {
BDW,
DCR,
};
struct nfit_blk {
struct nfit_blk_mmio {
union {
void __iomem *base;
void *aperture;
};
u64 size;
u64 base_offset;
u32 line_size;
u32 num_lines;
u32 table_size;
struct acpi_nfit_interleave *idt;
struct acpi_nfit_system_address *spa;
} mmio[2];
struct nd_region *nd_region;
u64 bdw_offset; /* post interleave offset */
u64 stat_offset;
u64 cmd_offset;
};
struct nfit_spa_mapping {
struct acpi_nfit_desc *acpi_desc;
struct acpi_nfit_system_address *spa;
struct list_head list;
struct kref kref;
void __iomem *iomem;
};
static inline struct nfit_spa_mapping *to_spa_map(struct kref *kref)
{
return container_of(kref, struct nfit_spa_mapping, kref);
}
static inline struct acpi_nfit_memory_map *__to_nfit_memdev(
struct nfit_mem *nfit_mem)
{
if (nfit_mem->memdev_dcr)
return nfit_mem->memdev_dcr;
return nfit_mem->memdev_pmem;
}
static inline struct acpi_nfit_desc *to_acpi_desc(
struct nvdimm_bus_descriptor *nd_desc)
{
return container_of(nd_desc, struct acpi_nfit_desc, nd_desc);
}
const u8 *to_nfit_uuid(enum nfit_uuids id);
int acpi_nfit_init(struct acpi_nfit_desc *nfit, acpi_size sz);
extern const struct attribute_group *acpi_nfit_attribute_groups[];
#endif /* __NFIT_H__ */
...@@ -29,6 +29,8 @@ ...@@ -29,6 +29,8 @@
#include <linux/errno.h> #include <linux/errno.h>
#include <linux/acpi.h> #include <linux/acpi.h>
#include <linux/numa.h> #include <linux/numa.h>
#include <linux/nodemask.h>
#include <linux/topology.h>
#define PREFIX "ACPI: " #define PREFIX "ACPI: "
...@@ -70,7 +72,12 @@ static void __acpi_map_pxm_to_node(int pxm, int node) ...@@ -70,7 +72,12 @@ static void __acpi_map_pxm_to_node(int pxm, int node)
int acpi_map_pxm_to_node(int pxm) int acpi_map_pxm_to_node(int pxm)
{ {
int node = pxm_to_node_map[pxm]; int node;
if (pxm < 0 || pxm >= MAX_PXM_DOMAINS)
return NUMA_NO_NODE;
node = pxm_to_node_map[pxm];
if (node == NUMA_NO_NODE) { if (node == NUMA_NO_NODE) {
if (nodes_weight(nodes_found_map) >= MAX_NUMNODES) if (nodes_weight(nodes_found_map) >= MAX_NUMNODES)
...@@ -83,6 +90,45 @@ int acpi_map_pxm_to_node(int pxm) ...@@ -83,6 +90,45 @@ int acpi_map_pxm_to_node(int pxm)
return node; return node;
} }
/**
* acpi_map_pxm_to_online_node - Map proximity ID to online node
* @pxm: ACPI proximity ID
*
* This is similar to acpi_map_pxm_to_node(), but always returns an online
* node. When the mapped node from a given proximity ID is offline, it
* looks up the node distance table and returns the nearest online node.
*
* ACPI device drivers, which are called after the NUMA initialization has
* completed in the kernel, can call this interface to obtain their device
* NUMA topology from ACPI tables. Such drivers do not have to deal with
* offline nodes. A node may be offline when a device proximity ID is
* unique, SRAT memory entry does not exist, or NUMA is disabled, ex.
* "numa=off" on x86.
*/
int acpi_map_pxm_to_online_node(int pxm)
{
int node, n, dist, min_dist;
node = acpi_map_pxm_to_node(pxm);
if (node == NUMA_NO_NODE)
node = 0;
if (!node_online(node)) {
min_dist = INT_MAX;
for_each_online_node(n) {
dist = node_distance(node, n);
if (dist < min_dist) {
min_dist = dist;
node = n;
}
}
}
return node;
}
EXPORT_SYMBOL(acpi_map_pxm_to_online_node);
static void __init static void __init
acpi_table_print_srat_entry(struct acpi_subtable_header *header) acpi_table_print_srat_entry(struct acpi_subtable_header *header)
{ {
...@@ -328,8 +374,6 @@ int acpi_get_node(acpi_handle handle) ...@@ -328,8 +374,6 @@ int acpi_get_node(acpi_handle handle)
int pxm; int pxm;
pxm = acpi_get_pxm(handle); pxm = acpi_get_pxm(handle);
if (pxm < 0 || pxm >= MAX_PXM_DOMAINS)
return NUMA_NO_NODE;
return acpi_map_pxm_to_node(pxm); return acpi_map_pxm_to_node(pxm);
} }
......
...@@ -404,18 +404,6 @@ config BLK_DEV_RAM_DAX ...@@ -404,18 +404,6 @@ config BLK_DEV_RAM_DAX
and will prevent RAM block device backing store memory from being and will prevent RAM block device backing store memory from being
allocated from highmem (only a problem for highmem systems). allocated from highmem (only a problem for highmem systems).
config BLK_DEV_PMEM
tristate "Persistent memory block device support"
depends on HAS_IOMEM
help
Saying Y here will allow you to use a contiguous range of reserved
memory as one or more persistent block devices.
To compile this driver as a module, choose M here: the module will be
called 'pmem'.
If unsure, say N.
config CDROM_PKTCDVD config CDROM_PKTCDVD
tristate "Packet writing on CD/DVD media" tristate "Packet writing on CD/DVD media"
depends on !UML depends on !UML
......
...@@ -14,7 +14,6 @@ obj-$(CONFIG_PS3_VRAM) += ps3vram.o ...@@ -14,7 +14,6 @@ obj-$(CONFIG_PS3_VRAM) += ps3vram.o
obj-$(CONFIG_ATARI_FLOPPY) += ataflop.o obj-$(CONFIG_ATARI_FLOPPY) += ataflop.o
obj-$(CONFIG_AMIGA_Z2RAM) += z2ram.o obj-$(CONFIG_AMIGA_Z2RAM) += z2ram.o
obj-$(CONFIG_BLK_DEV_RAM) += brd.o obj-$(CONFIG_BLK_DEV_RAM) += brd.o
obj-$(CONFIG_BLK_DEV_PMEM) += pmem.o
obj-$(CONFIG_BLK_DEV_LOOP) += loop.o obj-$(CONFIG_BLK_DEV_LOOP) += loop.o
obj-$(CONFIG_BLK_CPQ_DA) += cpqarray.o obj-$(CONFIG_BLK_CPQ_DA) += cpqarray.o
obj-$(CONFIG_BLK_CPQ_CISS_DA) += cciss.o obj-$(CONFIG_BLK_CPQ_CISS_DA) += cciss.o
......
menuconfig LIBNVDIMM
tristate "NVDIMM (Non-Volatile Memory Device) Support"
depends on PHYS_ADDR_T_64BIT
depends on BLK_DEV
help
Generic support for non-volatile memory devices including
ACPI-6-NFIT defined resources. On platforms that define an
NFIT, or otherwise can discover NVDIMM resources, a libnvdimm
bus is registered to advertise PMEM (persistent memory)
namespaces (/dev/pmemX) and BLK (sliding mmio window(s))
namespaces (/dev/ndblkX.Y). A PMEM namespace refers to a
memory resource that may span multiple DIMMs and support DAX
(see CONFIG_DAX). A BLK namespace refers to an NVDIMM control
region which exposes an mmio register set for windowed access
mode to non-volatile memory.
if LIBNVDIMM
config BLK_DEV_PMEM
tristate "PMEM: Persistent memory block device support"
default LIBNVDIMM
depends on HAS_IOMEM
select ND_BTT if BTT
help
Memory ranges for PMEM are described by either an NFIT
(NVDIMM Firmware Interface Table, see CONFIG_NFIT_ACPI), a
non-standard OEM-specific E820 memory type (type-12, see
CONFIG_X86_PMEM_LEGACY), or it is manually specified by the
'memmap=nn[KMG]!ss[KMG]' kernel command line (see
Documentation/kernel-parameters.txt). This driver converts
these persistent memory ranges into block devices that are
capable of DAX (direct-access) file system mappings. See
Documentation/nvdimm/nvdimm.txt for more details.
Say Y if you want to use an NVDIMM
config ND_BLK
tristate "BLK: Block data window (aperture) device support"
default LIBNVDIMM
select ND_BTT if BTT
help
Support NVDIMMs, or other devices, that implement a BLK-mode
access capability. BLK-mode access uses memory-mapped-i/o
apertures to access persistent media.
Say Y if your platform firmware emits an ACPI.NFIT table
(CONFIG_ACPI_NFIT), or otherwise exposes BLK-mode
capabilities.
config ND_BTT
tristate
config BTT
bool "BTT: Block Translation Table (atomic sector updates)"
default y if LIBNVDIMM
help
The Block Translation Table (BTT) provides atomic sector
update semantics for persistent memory devices, so that
applications that rely on sector writes not being torn (a
guarantee that typical disks provide) can continue to do so.
The BTT manifests itself as an alternate personality for an
NVDIMM namespace, i.e. a namespace can be in raw mode (pmemX,
ndblkX.Y, etc...), or 'sectored' mode, (pmemXs, ndblkX.Ys,
etc...).
Select Y if unsure
endif
obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
obj-$(CONFIG_ND_BTT) += nd_btt.o
obj-$(CONFIG_ND_BLK) += nd_blk.o
nd_pmem-y := pmem.o
nd_btt-y := btt.o
nd_blk-y := blk.o
libnvdimm-y := core.o
libnvdimm-y += bus.o
libnvdimm-y += dimm_devs.o
libnvdimm-y += dimm.o
libnvdimm-y += region_devs.o
libnvdimm-y += region.o
libnvdimm-y += namespace_devs.o
libnvdimm-y += label.o
libnvdimm-$(CONFIG_BTT) += btt_devs.o
/*
* NVDIMM Block Window Driver
* Copyright (c) 2014, Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*/
#include <linux/blkdev.h>
#include <linux/fs.h>
#include <linux/genhd.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/nd.h>
#include <linux/sizes.h>
#include "nd.h"
struct nd_blk_device {
struct request_queue *queue;
struct gendisk *disk;
struct nd_namespace_blk *nsblk;
struct nd_blk_region *ndbr;
size_t disk_size;
u32 sector_size;
u32 internal_lbasize;
};
static int nd_blk_major;
static u32 nd_blk_meta_size(struct nd_blk_device *blk_dev)
{
return blk_dev->nsblk->lbasize - blk_dev->sector_size;
}
static resource_size_t to_dev_offset(struct nd_namespace_blk *nsblk,
resource_size_t ns_offset, unsigned int len)
{
int i;
for (i = 0; i < nsblk->num_resources; i++) {
if (ns_offset < resource_size(nsblk->res[i])) {
if (ns_offset + len > resource_size(nsblk->res[i])) {
dev_WARN_ONCE(&nsblk->common.dev, 1,
"illegal request\n");
return SIZE_MAX;
}
return nsblk->res[i]->start + ns_offset;
}
ns_offset -= resource_size(nsblk->res[i]);
}
dev_WARN_ONCE(&nsblk->common.dev, 1, "request out of range\n");
return SIZE_MAX;
}
#ifdef CONFIG_BLK_DEV_INTEGRITY
static int nd_blk_rw_integrity(struct nd_blk_device *blk_dev,
struct bio_integrity_payload *bip, u64 lba,
int rw)
{
unsigned int len = nd_blk_meta_size(blk_dev);
resource_size_t dev_offset, ns_offset;
struct nd_namespace_blk *nsblk;
struct nd_blk_region *ndbr;
int err = 0;
nsblk = blk_dev->nsblk;
ndbr = blk_dev->ndbr;
ns_offset = lba * blk_dev->internal_lbasize + blk_dev->sector_size;
dev_offset = to_dev_offset(nsblk, ns_offset, len);
if (dev_offset == SIZE_MAX)
return -EIO;
while (len) {
unsigned int cur_len;
struct bio_vec bv;
void *iobuf;
bv = bvec_iter_bvec(bip->bip_vec, bip->bip_iter);
/*
* The 'bv' obtained from bvec_iter_bvec has its .bv_len and
* .bv_offset already adjusted for iter->bi_bvec_done, and we
* can use those directly
*/
cur_len = min(len, bv.bv_len);
iobuf = kmap_atomic(bv.bv_page);
err = ndbr->do_io(ndbr, dev_offset, iobuf + bv.bv_offset,
cur_len, rw);
kunmap_atomic(iobuf);
if (err)
return err;
len -= cur_len;
dev_offset += cur_len;
bvec_iter_advance(bip->bip_vec, &bip->bip_iter, cur_len);
}
return err;
}
#else /* CONFIG_BLK_DEV_INTEGRITY */
static int nd_blk_rw_integrity(struct nd_blk_device *blk_dev,
struct bio_integrity_payload *bip, u64 lba,
int rw)
{
return 0;
}
#endif
static int nd_blk_do_bvec(struct nd_blk_device *blk_dev,
struct bio_integrity_payload *bip, struct page *page,
unsigned int len, unsigned int off, int rw,
sector_t sector)
{
struct nd_blk_region *ndbr = blk_dev->ndbr;
resource_size_t dev_offset, ns_offset;
int err = 0;
void *iobuf;
u64 lba;
while (len) {
unsigned int cur_len;
/*
* If we don't have an integrity payload, we don't have to
* split the bvec into sectors, as this would cause unnecessary
* Block Window setup/move steps. the do_io routine is capable
* of handling len <= PAGE_SIZE.
*/
cur_len = bip ? min(len, blk_dev->sector_size) : len;
lba = div_u64(sector << SECTOR_SHIFT, blk_dev->sector_size);
ns_offset = lba * blk_dev->internal_lbasize;
dev_offset = to_dev_offset(blk_dev->nsblk, ns_offset, cur_len);
if (dev_offset == SIZE_MAX)
return -EIO;
iobuf = kmap_atomic(page);
err = ndbr->do_io(ndbr, dev_offset, iobuf + off, cur_len, rw);
kunmap_atomic(iobuf);
if (err)
return err;
if (bip) {
err = nd_blk_rw_integrity(blk_dev, bip, lba, rw);
if (err)
return err;
}
len -= cur_len;
off += cur_len;
sector += blk_dev->sector_size >> SECTOR_SHIFT;
}
return err;
}
static void nd_blk_make_request(struct request_queue *q, struct bio *bio)
{
struct block_device *bdev = bio->bi_bdev;
struct gendisk *disk = bdev->bd_disk;
struct bio_integrity_payload *bip;
struct nd_blk_device *blk_dev;
struct bvec_iter iter;
unsigned long start;
struct bio_vec bvec;
int err = 0, rw;
bool do_acct;
/*
* bio_integrity_enabled also checks if the bio already has an
* integrity payload attached. If it does, we *don't* do a
* bio_integrity_prep here - the payload has been generated by
* another kernel subsystem, and we just pass it through.
*/
if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
err = -EIO;
goto out;
}
bip = bio_integrity(bio);
blk_dev = disk->private_data;
rw = bio_data_dir(bio);
do_acct = nd_iostat_start(bio, &start);
bio_for_each_segment(bvec, bio, iter) {
unsigned int len = bvec.bv_len;
BUG_ON(len > PAGE_SIZE);
err = nd_blk_do_bvec(blk_dev, bip, bvec.bv_page, len,
bvec.bv_offset, rw, iter.bi_sector);
if (err) {
dev_info(&blk_dev->nsblk->common.dev,
"io error in %s sector %lld, len %d,\n",
(rw == READ) ? "READ" : "WRITE",
(unsigned long long) iter.bi_sector, len);
break;
}
}
if (do_acct)
nd_iostat_end(bio, start);
out:
bio_endio(bio, err);
}
static int nd_blk_rw_bytes(struct nd_namespace_common *ndns,
resource_size_t offset, void *iobuf, size_t n, int rw)
{
struct nd_blk_device *blk_dev = dev_get_drvdata(ndns->claim);
struct nd_namespace_blk *nsblk = blk_dev->nsblk;
struct nd_blk_region *ndbr = blk_dev->ndbr;
resource_size_t dev_offset;
dev_offset = to_dev_offset(nsblk, offset, n);
if (unlikely(offset + n > blk_dev->disk_size)) {
dev_WARN_ONCE(&ndns->dev, 1, "request out of range\n");
return -EFAULT;
}
if (dev_offset == SIZE_MAX)
return -EIO;
return ndbr->do_io(ndbr, dev_offset, iobuf, n, rw);
}
static const struct block_device_operations nd_blk_fops = {
.owner = THIS_MODULE,
.revalidate_disk = nvdimm_revalidate_disk,
};
static int nd_blk_attach_disk(struct nd_namespace_common *ndns,
struct nd_blk_device *blk_dev)
{
resource_size_t available_disk_size;
struct gendisk *disk;
u64 internal_nlba;
internal_nlba = div_u64(blk_dev->disk_size, blk_dev->internal_lbasize);
available_disk_size = internal_nlba * blk_dev->sector_size;
blk_dev->queue = blk_alloc_queue(GFP_KERNEL);
if (!blk_dev->queue)
return -ENOMEM;
blk_queue_make_request(blk_dev->queue, nd_blk_make_request);
blk_queue_max_hw_sectors(blk_dev->queue, UINT_MAX);
blk_queue_bounce_limit(blk_dev->queue, BLK_BOUNCE_ANY);
blk_queue_logical_block_size(blk_dev->queue, blk_dev->sector_size);
queue_flag_set_unlocked(QUEUE_FLAG_NONROT, blk_dev->queue);
disk = blk_dev->disk = alloc_disk(0);
if (!disk) {
blk_cleanup_queue(blk_dev->queue);
return -ENOMEM;
}
disk->driverfs_dev = &ndns->dev;
disk->major = nd_blk_major;
disk->first_minor = 0;
disk->fops = &nd_blk_fops;
disk->private_data = blk_dev;
disk->queue = blk_dev->queue;
disk->flags = GENHD_FL_EXT_DEVT;
nvdimm_namespace_disk_name(ndns, disk->disk_name);
set_capacity(disk, 0);
add_disk(disk);
if (nd_blk_meta_size(blk_dev)) {
int rc = nd_integrity_init(disk, nd_blk_meta_size(blk_dev));
if (rc) {
del_gendisk(disk);
put_disk(disk);
blk_cleanup_queue(blk_dev->queue);
return rc;
}
}
set_capacity(disk, available_disk_size >> SECTOR_SHIFT);
revalidate_disk(disk);
return 0;
}
static int nd_blk_probe(struct device *dev)
{
struct nd_namespace_common *ndns;
struct nd_namespace_blk *nsblk;
struct nd_blk_device *blk_dev;
int rc;
ndns = nvdimm_namespace_common_probe(dev);
if (IS_ERR(ndns))
return PTR_ERR(ndns);
blk_dev = kzalloc(sizeof(*blk_dev), GFP_KERNEL);
if (!blk_dev)
return -ENOMEM;
nsblk = to_nd_namespace_blk(&ndns->dev);
blk_dev->disk_size = nvdimm_namespace_capacity(ndns);
blk_dev->ndbr = to_nd_blk_region(dev->parent);
blk_dev->nsblk = to_nd_namespace_blk(&ndns->dev);
blk_dev->internal_lbasize = roundup(nsblk->lbasize,
INT_LBASIZE_ALIGNMENT);
blk_dev->sector_size = ((nsblk->lbasize >= 4096) ? 4096 : 512);
dev_set_drvdata(dev, blk_dev);
ndns->rw_bytes = nd_blk_rw_bytes;
if (is_nd_btt(dev))
rc = nvdimm_namespace_attach_btt(ndns);
else if (nd_btt_probe(ndns, blk_dev) == 0) {
/* we'll come back as btt-blk */
rc = -ENXIO;
} else
rc = nd_blk_attach_disk(ndns, blk_dev);
if (rc)
kfree(blk_dev);
return rc;
}
static void nd_blk_detach_disk(struct nd_blk_device *blk_dev)
{
del_gendisk(blk_dev->disk);
put_disk(blk_dev->disk);
blk_cleanup_queue(blk_dev->queue);
}
static int nd_blk_remove(struct device *dev)
{
struct nd_blk_device *blk_dev = dev_get_drvdata(dev);
if (is_nd_btt(dev))
nvdimm_namespace_detach_btt(to_nd_btt(dev)->ndns);
else
nd_blk_detach_disk(blk_dev);
kfree(blk_dev);
return 0;
}
static struct nd_device_driver nd_blk_driver = {
.probe = nd_blk_probe,
.remove = nd_blk_remove,
.drv = {
.name = "nd_blk",
},
.type = ND_DRIVER_NAMESPACE_BLK,
};
static int __init nd_blk_init(void)
{
int rc;
rc = register_blkdev(0, "nd_blk");
if (rc < 0)
return rc;
nd_blk_major = rc;
rc = nd_driver_register(&nd_blk_driver);
if (rc < 0)
unregister_blkdev(nd_blk_major, "nd_blk");
return rc;
}
static void __exit nd_blk_exit(void)
{
driver_unregister(&nd_blk_driver.drv);
unregister_blkdev(nd_blk_major, "nd_blk");
}
MODULE_AUTHOR("Ross Zwisler <ross.zwisler@linux.intel.com>");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_ND_DEVICE(ND_DEVICE_NAMESPACE_BLK);
module_init(nd_blk_init);
module_exit(nd_blk_exit);
This diff is collapsed.
/*
* Block Translation Table library
* Copyright (c) 2014-2015, Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*/
#ifndef _LINUX_BTT_H
#define _LINUX_BTT_H
#include <linux/types.h>
#define BTT_SIG_LEN 16
#define BTT_SIG "BTT_ARENA_INFO\0"
#define MAP_ENT_SIZE 4
#define MAP_TRIM_SHIFT 31
#define MAP_TRIM_MASK (1 << MAP_TRIM_SHIFT)
#define MAP_ERR_SHIFT 30
#define MAP_ERR_MASK (1 << MAP_ERR_SHIFT)
#define MAP_LBA_MASK (~((1 << MAP_TRIM_SHIFT) | (1 << MAP_ERR_SHIFT)))
#define MAP_ENT_NORMAL 0xC0000000
#define LOG_ENT_SIZE sizeof(struct log_entry)
#define ARENA_MIN_SIZE (1UL << 24) /* 16 MB */
#define ARENA_MAX_SIZE (1ULL << 39) /* 512 GB */
#define RTT_VALID (1UL << 31)
#define RTT_INVALID 0
#define BTT_PG_SIZE 4096
#define BTT_DEFAULT_NFREE ND_MAX_LANES
#define LOG_SEQ_INIT 1
#define IB_FLAG_ERROR 0x00000001
#define IB_FLAG_ERROR_MASK 0x00000001
enum btt_init_state {
INIT_UNCHECKED = 0,
INIT_NOTFOUND,
INIT_READY
};
struct log_entry {
__le32 lba;
__le32 old_map;
__le32 new_map;
__le32 seq;
__le64 padding[2];
};
struct btt_sb {
u8 signature[BTT_SIG_LEN];
u8 uuid[16];
u8 parent_uuid[16];
__le32 flags;
__le16 version_major;
__le16 version_minor;
__le32 external_lbasize;
__le32 external_nlba;
__le32 internal_lbasize;
__le32 internal_nlba;
__le32 nfree;
__le32 infosize;
__le64 nextoff;
__le64 dataoff;
__le64 mapoff;
__le64 logoff;
__le64 info2off;
u8 padding[3968];
__le64 checksum;
};
struct free_entry {
u32 block;
u8 sub;
u8 seq;
};
struct aligned_lock {
union {
spinlock_t lock;
u8 cacheline_padding[L1_CACHE_BYTES];
};
};
/**
* struct arena_info - handle for an arena
* @size: Size in bytes this arena occupies on the raw device.
* This includes arena metadata.
* @external_lba_start: The first external LBA in this arena.
* @internal_nlba: Number of internal blocks available in the arena
* including nfree reserved blocks
* @internal_lbasize: Internal and external lba sizes may be different as
* we can round up 'odd' external lbasizes such as 520B
* to be aligned.
* @external_nlba: Number of blocks contributed by the arena to the number
* reported to upper layers. (internal_nlba - nfree)
* @external_lbasize: LBA size as exposed to upper layers.
* @nfree: A reserve number of 'free' blocks that is used to
* handle incoming writes.
* @version_major: Metadata layout version major.
* @version_minor: Metadata layout version minor.
* @nextoff: Offset in bytes to the start of the next arena.
* @infooff: Offset in bytes to the info block of this arena.
* @dataoff: Offset in bytes to the data area of this arena.
* @mapoff: Offset in bytes to the map area of this arena.
* @logoff: Offset in bytes to the log area of this arena.
* @info2off: Offset in bytes to the backup info block of this arena.
* @freelist: Pointer to in-memory list of free blocks
* @rtt: Pointer to in-memory "Read Tracking Table"
* @map_locks: Spinlocks protecting concurrent map writes
* @nd_btt: Pointer to parent nd_btt structure.
* @list: List head for list of arenas
* @debugfs_dir: Debugfs dentry
* @flags: Arena flags - may signify error states.
*
* arena_info is a per-arena handle. Once an arena is narrowed down for an
* IO, this struct is passed around for the duration of the IO.
*/
struct arena_info {
u64 size; /* Total bytes for this arena */
u64 external_lba_start;
u32 internal_nlba;
u32 internal_lbasize;
u32 external_nlba;
u32 external_lbasize;
u32 nfree;
u16 version_major;
u16 version_minor;
/* Byte offsets to the different on-media structures */
u64 nextoff;
u64 infooff;
u64 dataoff;
u64 mapoff;
u64 logoff;
u64 info2off;
/* Pointers to other in-memory structures for this arena */
struct free_entry *freelist;
u32 *rtt;
struct aligned_lock *map_locks;
struct nd_btt *nd_btt;
struct list_head list;
struct dentry *debugfs_dir;
/* Arena flags */
u32 flags;
};
/**
* struct btt - handle for a BTT instance
* @btt_disk: Pointer to the gendisk for BTT device
* @btt_queue: Pointer to the request queue for the BTT device
* @arena_list: Head of the list of arenas
* @debugfs_dir: Debugfs dentry
* @nd_btt: Parent nd_btt struct
* @nlba: Number of logical blocks exposed to the upper layers
* after removing the amount of space needed by metadata
* @rawsize: Total size in bytes of the available backing device
* @lbasize: LBA size as requested and presented to upper layers.
* This is sector_size + size of any metadata.
* @sector_size: The Linux sector size - 512 or 4096
* @lanes: Per-lane spinlocks
* @init_lock: Mutex used for the BTT initialization
* @init_state: Flag describing the initialization state for the BTT
* @num_arenas: Number of arenas in the BTT instance
*/
struct btt {
struct gendisk *btt_disk;
struct request_queue *btt_queue;
struct list_head arena_list;
struct dentry *debugfs_dir;
struct nd_btt *nd_btt;
u64 nlba;
unsigned long long rawsize;
u32 lbasize;
u32 sector_size;
struct nd_region *nd_region;
struct mutex init_lock;
int init_state;
int num_arenas;
};
#endif
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
/*
* Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*/
#include <linux/vmalloc.h>
#include <linux/module.h>
#include <linux/device.h>
#include <linux/sizes.h>
#include <linux/ndctl.h>
#include <linux/slab.h>
#include <linux/mm.h>
#include <linux/nd.h>
#include "label.h"
#include "nd.h"
static int nvdimm_probe(struct device *dev)
{
struct nvdimm_drvdata *ndd;
int rc;
ndd = kzalloc(sizeof(*ndd), GFP_KERNEL);
if (!ndd)
return -ENOMEM;
dev_set_drvdata(dev, ndd);
ndd->dpa.name = dev_name(dev);
ndd->ns_current = -1;
ndd->ns_next = -1;
ndd->dpa.start = 0;
ndd->dpa.end = -1;
ndd->dev = dev;
get_device(dev);
kref_init(&ndd->kref);
rc = nvdimm_init_nsarea(ndd);
if (rc)
goto err;
rc = nvdimm_init_config_data(ndd);
if (rc)
goto err;
dev_dbg(dev, "config data size: %d\n", ndd->nsarea.config_size);
nvdimm_bus_lock(dev);
ndd->ns_current = nd_label_validate(ndd);
ndd->ns_next = nd_label_next_nsindex(ndd->ns_current);
nd_label_copy(ndd, to_next_namespace_index(ndd),
to_current_namespace_index(ndd));
rc = nd_label_reserve_dpa(ndd);
nvdimm_bus_unlock(dev);
if (rc)
goto err;
return 0;
err:
put_ndd(ndd);
return rc;
}
static int nvdimm_remove(struct device *dev)
{
struct nvdimm_drvdata *ndd = dev_get_drvdata(dev);
nvdimm_bus_lock(dev);
dev_set_drvdata(dev, NULL);
nvdimm_bus_unlock(dev);
put_ndd(ndd);
return 0;
}
static struct nd_device_driver nvdimm_driver = {
.probe = nvdimm_probe,
.remove = nvdimm_remove,
.drv = {
.name = "nvdimm",
},
.type = ND_DRIVER_DIMM,
};
int __init nvdimm_init(void)
{
return nd_driver_register(&nvdimm_driver);
}
void nvdimm_exit(void)
{
driver_unregister(&nvdimm_driver.drv);
}
MODULE_ALIAS_ND_DEVICE(ND_DEVICE_DIMM);
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment