Chapter 3. Using NVDIMM persistent memory storage

As a system administrator, you can enable and manage various types of storage on Non-Volatile Dual In-line Memory Modules (NVDIMM) devices connected to your system.

For installing Red Hat Enterprise Linux 8 on NVDIMM storage, see Installing to an NVDIMM device instead.

3.1. The NVDIMM persistent memory technology

NVDIMM persistent memory, also called storage class memory or pmem, is a combination of memory and storage.

NVDIMM combines the durability of storage with the low access latency and the high bandwidth of dynamic RAM (DRAM):

  • NVDIMM storage is byte-addressable, so it can be accessed by using the CPU load and store instructions. In addition to the read() and write() system calls, which are required for accessing traditional block-based storage, NVDIMM also supports a direct load and store programming model.
  • The performance characteristics of NVDIMM are similar to those of DRAM, with very low access latency, typically in the tens to hundreds of nanoseconds.
  • Data stored on NVDIMM are preserved when the power is off, like with storage.
  • The direct access (DAX) technology enables applications to memory map storage directly, without going through the system page cache. This frees up DRAM for other purposes.

NVDIMM is beneficial in use cases such as:

Databases
The reduced storage access latency on NVDIMM can dramatically improve database performance.
Rapid restart

Rapid restart is also called the warm cache effect. For example, a file server has none of the file contents in memory after starting. As clients connect and read or write data, that data is cached in the page cache. Eventually, the cache contains mostly hot data. On traditional storage, the system must start this process again after every reboot.

NVDIMM enables an application to keep the warm cache across reboots if the application is designed properly. In this example, there would be no page cache involved: the application would cache data directly in the persistent memory.

Fast write-cache
File servers often do not acknowledge a client’s write request until the data is on durable media. Using NVDIMM as a fast write cache enables a file server to acknowledge the write request quickly thanks to the low latency.

3.2. NVDIMM interleaving and regions

NVDIMM devices support grouping into interleaved regions.

NVDIMM devices can be grouped into interleave sets in the same way as regular DRAM. An interleave set is similar to a RAID 0 level (stripe) configuration across multiple DIMMs. An interleave set is also called a region.

Interleaving has the following advantages:

  • NVDIMM devices benefit from increased performance when they are configured into interleave sets.
  • Interleaving can combine multiple smaller NVDIMM devices into a larger logical device.

NVDIMM interleave sets are configured in the system BIOS or UEFI firmware.

Red Hat Enterprise Linux creates one region device for each interleave set.

3.3. NVDIMM namespaces

NVDIMM regions are divided into one or more namespaces. Namespaces enable you to access the device using different methods, based on the type of the namespace.

Some NVDIMM devices do not support multiple namespaces on a region:

  • If your NVDIMM device supports labels, you can subdivide the region into namespaces.
  • If your NVDIMM device does not support labels, the region can only contain a single namespace. In that case, Red Hat Enterprise Linux creates a default namespace that covers the entire region.

3.4. NVDIMM access modes

You can configure NVDIMM namespaces to use one of the following modes:

sector

Presents the storage as a fast block device. This mode is useful for legacy applications that have not been modified to use NVDIMM storage, or for applications that make use of the full I/O stack, including Device Mapper.

A sector device can be used in the same way as any other block device on the system. You can create partitions or file systems on it, configure it as part of a software RAID set, or use it as the cache device for dm-cache.

Devices in this mode are available at /dev/pmemNs. See the blockdev value listed after creating the namespace.

devdax, or device direct access (DAX)

Enables NVDIMM devices to support direct access programming as described in the Storage Networking Industry Association (SNIA) Non-Volatile Memory (NVM) Programming Model specification. In this mode, I/O bypasses the storage stack of the kernel. Therefore, no Device Mapper drivers can be used.

Device DAX provides raw access to NVDIMM storage by using a DAX character device node. Data on a devdax device can be made durable using CPU cache flushing and fencing instructions. Certain databases and virtual machine hypervisors might benefit from this mode. File systems cannot be created on devdax devices.

Devices in this mode are available at /dev/daxN.M. See the chardev value listed after creating the namespace.

fsdax, or file system direct access (DAX)

Enables NVDIMM devices to support direct access programming as described in the Storage Networking Industry Association (SNIA) Non-Volatile Memory (NVM) Programming Model specification. In this mode, I/O bypasses the storage stack of the kernel, and many Device Mapper drivers therefore cannot be used.

You can create file systems on file system DAX devices.

Devices in this mode are available at /dev/pmemN. See the blockdev value listed after creating the namespace.

Important

The file system DAX technology is provided only as a Technology Preview, and is not supported by Red Hat.

raw

Presents a memory disk that does not support DAX. In this mode, namespaces have several limitations and should not be used.

Devices in this mode are available at /dev/pmemN. See the blockdev value listed after creating the namespace.
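
The four modes above can be summarized in a small lookup. The following is a minimal sketch, assuming a hypothetical helper named node_pattern_for_mode that is not part of ndctl. It only restates the device type and node pattern that this section gives for each mode, with N and M left as placeholders:

```shell
# Hypothetical helper: print the device type and the device node pattern
# that this section associates with a namespace mode.
node_pattern_for_mode() {
    case "$1" in
        sector)    echo "block /dev/pmemNs" ;;
        devdax)    echo "char /dev/daxN.M" ;;
        fsdax|raw) echo "block /dev/pmemN" ;;
        *)         echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Print one summary line for each mode.
for mode in sector devdax fsdax raw; do
    printf '%-7s %s\n' "$mode" "$(node_pattern_for_mode "$mode")"
done
```

The actual node name for a given namespace is the blockdev or chardev value that ndctl reports when the namespace is created.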

3.5. Creating a sector namespace on an NVDIMM to act as a block device

You can configure an NVDIMM device in sector mode, which is also called legacy mode, to support traditional, block-based storage.

You can either:

  • reconfigure an existing namespace to sector mode, or
  • create a new sector namespace if there is available space.

3.5.1. Prerequisites

  • An NVDIMM device is attached to your system.

3.5.2. Installing ndctl

This procedure installs the ndctl utility, which is used to configure and monitor NVDIMM devices.

Procedure
  • To install the ndctl utility, use the following command:

    # yum install ndctl

3.5.3. Reconfiguring an existing NVDIMM namespace to sector mode

This procedure reconfigures an NVDIMM namespace to sector mode for use as a fast block device.

Warning

Reconfiguring a namespace deletes all data previously stored on the namespace.

Prerequisites
  • The ndctl utility is installed. See Section 3.5.2, “Installing ndctl”.
Procedure
  1. Reconfigure the selected namespace to sector mode:

    # ndctl create-namespace \
            --force \
            --reconfig=namespace-ID \
            --mode=sector

    Example 3.1. Reconfiguring namespace1.0 in sector mode

    To reconfigure the namespace1.0 namespace to use sector mode:

    # ndctl create-namespace \
            --force \
            --reconfig=namespace1.0 \
            --mode=sector
    
    {
      "dev":"namespace1.0",
      "mode":"sector",
      "size":"11.99 GiB (12.87 GB)",
      "uuid":"5805480e-90e6-407e-96a4-23e1cde2ed78",
      "raw_uuid":"879d9e9f-fd43-4ed5-b64f-3bcd0781391a",
      "sector_size":4096,
      "blockdev":"pmem1s",
      "numa_node":1
    }
  2. The reconfigured namespace is now available under the /dev directory as /dev/pmemNs.
Additional resources
  • The ndctl-create-namespace(1) man page
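
As a follow-up to this procedure, a script might read the blockdev value from the command output to learn the exact device path. The following is a minimal sketch, assuming the JSON layout shown in Example 3.1. The blockdev_path function is a hypothetical helper, not part of ndctl:

```shell
# Hypothetical helper: read "ndctl create-namespace" JSON on stdin and
# print the /dev path built from its "blockdev" value.
blockdev_path() {
    sed -n 's|.*"blockdev": *"\([^"]*\)".*|/dev/\1|p'
}

# Sample abridged from Example 3.1. On a real system, pipe the output of
# the ndctl command into blockdev_path instead.
sample='{
  "dev":"namespace1.0",
  "mode":"sector",
  "blockdev":"pmem1s",
  "numa_node":1
}'
printf '%s\n' "$sample" | blockdev_path
```

For the sample above, the helper prints /dev/pmem1s. A text-based match like this is fragile; a JSON-aware tool is a better choice where one is available.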

3.5.4. Creating a new NVDIMM namespace in sector mode

This procedure creates a new sector namespace on an NVDIMM device, enabling you to use it as a traditional block device.

Prerequisites
  • The ndctl utility is installed. See Section 3.5.2, “Installing ndctl”.
Procedure
  1. List the pmem regions on your system that have available space. In the following example, space is available in the region5 and region4 regions:

    # ndctl list --regions
    
    [
      {
        "dev":"region5",
        "size":270582939648,
        "available_size":270582939648,
        "type":"pmem",
        "iset_id":-7337419320239190016
      },
      {
        "dev":"region4",
        "size":270582939648,
        "available_size":270582939648,
        "type":"pmem",
        "iset_id":-137289417188962304
      }
    ]
  2. On any of the available regions, allocate one or more namespaces:

    # ndctl create-namespace \
            --mode=sector \
            --region=regionN \
            --size=namespace-size

    Example 3.2. Creating a namespace on a region

    The following command creates a 36-GiB sector namespace on region4:

    # ndctl create-namespace \
            --mode=sector \
            --region=region4 \
            --size=36G
  3. The new namespace is now available under the /dev directory as /dev/pmemNs.
Additional resources
  • The ndctl-create-namespace(1) man page
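
When many regions are present, it can help to filter the region listing by available space before choosing a value for --region. The following is a minimal sketch, assuming the byte-valued output format shown in step 1 (without the --human option) and one field per line. The regions_with_space function is a hypothetical helper:

```shell
# Hypothetical helper: read "ndctl list --regions" JSON on stdin and print
# every region whose available_size is at least the given number of bytes.
# Assumes that "dev" appears before "available_size" in each region entry.
regions_with_space() {
    awk -v min="$1" -F'"' '
        /"dev":/            { dev = $4 }
        /"available_size":/ {
            size = $3
            gsub(/[^0-9]/, "", size)   # keep only the digits of the value
            if (size + 0 >= min + 0) print dev
        }
    '
}

# Sample abridged from step 1 of the procedure.
sample='[
  {
    "dev":"region5",
    "available_size":270582939648
  },
  {
    "dev":"region4",
    "available_size":270582939648
  }
]'

# List the regions that can hold a 36-GiB namespace.
printf '%s\n' "$sample" | regions_with_space $((36 * 1024 * 1024 * 1024))
```

On a real system, replace the sample with the output of ndctl list --regions.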

3.6. Creating a device DAX namespace on an NVDIMM

You can configure an NVDIMM device in device DAX mode to support character storage with direct access capabilities.

You can either:

  • reconfigure an existing namespace to device DAX mode, or
  • create a new device DAX namespace if there is available space.

3.6.1. Prerequisites

  • An NVDIMM device is attached to your system.

3.6.2. NVDIMM in device direct access mode

Device direct access (device DAX, devdax) provides a means for applications to directly access storage, without the involvement of a file system. The benefit of device DAX is that it provides a guaranteed fault granularity, which can be configured using the --align option of the ndctl utility.

For the Intel 64 and AMD64 architectures, the following fault granularities are supported:

  • 4 KiB
  • 2 MiB
  • 1 GiB
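
The granularities listed above map to byte values as follows. This is a minimal sketch; the align_to_bytes and is_multiple_of functions are hypothetical helpers, and the 4K and 1G spellings are assumed by analogy with the --align=2M form used in this chapter:

```shell
# Hypothetical helper: convert a fault granularity to bytes.
align_to_bytes() {
    case "$1" in
        4K) echo $((4 * 1024)) ;;
        2M) echo $((2 * 1024 * 1024)) ;;
        1G) echo $((1024 * 1024 * 1024)) ;;
        *)  echo "unsupported alignment: $1" >&2; return 1 ;;
    esac
}

# Succeed if the first number is a whole multiple of the second.
is_multiple_of() {
    [ $(( $1 % $2 )) -eq 0 ]
}

# Check a 36-GiB size against each granularity.
size=$((36 * 1024 * 1024 * 1024))
for align in 4K 2M 1G; do
    bytes=$(align_to_bytes "$align")
    if is_multiple_of "$size" "$bytes"; then
        echo "$align = $bytes bytes, 36 GiB is a multiple"
    else
        echo "$align = $bytes bytes, 36 GiB is not a multiple"
    fi
done
```

The value for 2M, 2097152 bytes, is the same number that ndctl reports in the align field of a device DAX namespace created with --align=2M.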

Device DAX nodes support only the following system calls:

  • open()
  • close()
  • mmap()

The read() and write() variants are not supported because the device DAX use case is tied to persistent memory programming.

3.6.3. Installing ndctl

This procedure installs the ndctl utility, which is used to configure and monitor NVDIMM devices.

Procedure
  • To install the ndctl utility, use the following command:

    # yum install ndctl

3.6.4. Reconfiguring an existing NVDIMM namespace to device DAX mode

This procedure reconfigures a namespace on an NVDIMM device to device DAX mode, and enables you to store data on the namespace.

Warning

Reconfiguring a namespace deletes all data previously stored on the namespace.

Prerequisites
  • The ndctl utility is installed. See Section 3.6.3, “Installing ndctl”.
Procedure
  1. List all namespaces on your system:

    # ndctl list --namespaces --idle
    
    [
      {
        "dev":"namespace1.0",
        "mode":"raw",
        "size":34359738368,
        "state":"disabled",
        "numa_node":1
      },
      {
        "dev":"namespace0.0",
        "mode":"raw",
        "size":34359738368,
        "state":"disabled",
        "numa_node":0
      }
    ]
  2. Reconfigure any namespace:

    # ndctl create-namespace \
            --force \
            --mode=devdax \
            --reconfig=namespace-ID

    Example 3.3. Reconfiguring a namespace as device DAX

    The following command reconfigures namespace0.0 for data storage that supports DAX. It is aligned to a 2-MiB fault granularity to ensure that the operating system faults in 2-MiB pages at a time:

    # ndctl create-namespace \
            --force \
            --mode=devdax \
            --align=2M \
            --reconfig=namespace0.0
  3. The namespace is now available at the /dev/daxN.M path.
Additional resources
  • The ndctl-create-namespace(1) man page

3.6.5. Creating a new NVDIMM namespace in device DAX mode

This procedure creates a new device DAX namespace on an NVDIMM device, enabling you to store data on the namespace.

Prerequisites
  • The ndctl utility is installed. See Section 3.6.3, “Installing ndctl”.
Procedure
  1. List the pmem regions on your system that have available space. In the following example, space is available in the region5 and region4 regions:

    # ndctl list --regions
    
    [
      {
        "dev":"region5",
        "size":270582939648,
        "available_size":270582939648,
        "type":"pmem",
        "iset_id":-7337419320239190016
      },
      {
        "dev":"region4",
        "size":270582939648,
        "available_size":270582939648,
        "type":"pmem",
        "iset_id":-137289417188962304
      }
    ]
  2. On any of the available regions, allocate one or more namespaces:

    # ndctl create-namespace \
            --mode=devdax \
            --region=regionN \
            --size=namespace-size

    Example 3.4. Creating a namespace on a region

    The following command creates a 36-GiB device DAX namespace on region4. It is aligned to a 2-MiB fault granularity to ensure that the operating system faults in 2-MiB pages at a time:

    # ndctl create-namespace \
            --mode=devdax \
            --region=region4 \
            --align=2M \
            --size=36G
    
    {
      "dev":"namespace1.2",
      "mode":"devdax",
      "map":"dev",
      "size":"35.44 GiB (38.05 GB)",
      "uuid":"5ae01b9c-1ebf-4fb6-bc0c-6085f73d31ee",
      "raw_uuid":"4c8be2b0-0842-4bcb-8a26-4bbd3b44add2",
      "daxregion":{
        "id":1,
        "size":"35.44 GiB (38.05 GB)",
        "align":2097152,
        "devices":[
          {
            "chardev":"dax1.2",
            "size":"35.44 GiB (38.05 GB)"
          }
        ]
      },
      "numa_node":1
    }
  3. The namespace is now available at the /dev/daxN.M path.
Additional resources
  • The ndctl-create-namespace(1) man page
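
To confirm that the new node is a character device, as expected for device DAX, you can test the path reported in the chardev field. The following is a minimal sketch; the node_type function is a hypothetical helper, and /dev/dax1.2 is the node from Example 3.4:

```shell
# Hypothetical helper: report whether a path is a character device,
# a block device, some other kind of file, or missing.
node_type() {
    if [ -c "$1" ]; then
        echo char
    elif [ -b "$1" ]; then
        echo block
    elif [ -e "$1" ]; then
        echo other
    else
        echo missing
    fi
}

# Reports "char" when the namespace from Example 3.4 exists,
# and "missing" on a system without that device.
node_type /dev/dax1.2
```

The same helper reports block for sector, fsdax, and raw namespaces, which are exposed as block devices.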

3.7. Creating a file system DAX namespace on an NVDIMM

You can configure an NVDIMM device in file system DAX mode to support a file system with direct access capabilities.

You can either:

  • reconfigure an existing namespace to file system DAX mode, or
  • create a new file system DAX namespace if there is available space.
Important

The file system DAX technology is provided only as a Technology Preview, and is not supported by Red Hat.

3.7.1. Prerequisites

  • An NVDIMM device is attached to your system.

3.7.2. NVDIMM in file system direct access mode

When an NVDIMM device is configured in file system direct access (file system DAX, fsdax) mode, a file system can be created on top of it.

Any application that performs an mmap() operation on a file on this file system gets direct access to its storage. This enables the direct access programming model on NVDIMM. The file system must be mounted with the -o dax option in order for direct mapping to happen.

Per-page metadata allocation

This mode requires allocating per-page metadata in the system DRAM or on the NVDIMM device itself. The overhead of this data structure is 64 bytes for each 4-KiB page:

  • On small devices, the amount of overhead is small enough to fit in DRAM with no problems. For example, a 16-GiB namespace only requires 256 MiB for page structures. Because NVDIMM devices are usually small and expensive, storing the page tracking data structures in DRAM is preferable.
  • On NVDIMM devices that are terabytes in size or larger, the amount of memory required to store the page tracking data structures might exceed the amount of DRAM in the system. One TiB of NVDIMM requires 16 GiB just for page structures. As a result, storing the data structures on the NVDIMM itself is preferable in such cases.

You can configure where per-page metadata are stored using the --map option when configuring a namespace:

  • To allocate in the system RAM, use --map=mem.
  • To allocate on the NVDIMM, use --map=dev.
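
The overhead figures above follow directly from the 64 bytes for each 4-KiB page rule. The following is a minimal sketch that reproduces them; the page_metadata_bytes function is a hypothetical helper:

```shell
# Hypothetical helper: per-page metadata size in bytes for a namespace of
# the given size in bytes, at 64 bytes for each 4096-byte page.
page_metadata_bytes() {
    echo $(( $1 / 4096 * 64 ))
}

GiB=$((1024 * 1024 * 1024))

# 16 GiB namespace: 4194304 pages x 64 bytes = 256 MiB
echo "16 GiB namespace: $(( $(page_metadata_bytes $((16 * GiB))) / 1024 / 1024 )) MiB"

# 1 TiB namespace: 268435456 pages x 64 bytes = 16 GiB
echo "1 TiB namespace: $(( $(page_metadata_bytes $((1024 * GiB))) / GiB )) GiB"
```

Comparing the result with the amount of DRAM in the system can inform the choice between --map=mem and --map=dev.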
Partitions and file systems on fsdax

When creating partitions on an fsdax device, partitions must be aligned on page boundaries. On the Intel 64 and AMD64 architectures, at least 4 KiB alignment is required for the start and end of the partition. 2 MiB is the preferred alignment.
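
The alignment rule can be checked with simple arithmetic on the byte offsets of a partition. The following is a minimal sketch, assuming a hypothetical helper named partition_alignment that takes the offset of the first byte of the partition and the offset of the first byte after it:

```shell
# Hypothetical helper: classify a partition by the alignment of its start
# and end byte offsets. Prints 2MiB, 4KiB, or unaligned.
partition_alignment() {
    pa_start=$1
    pa_end=$2
    pa_two_mib=$((2 * 1024 * 1024))
    pa_four_kib=4096
    if [ $((pa_start % pa_two_mib)) -eq 0 ] && [ $((pa_end % pa_two_mib)) -eq 0 ]; then
        echo "2MiB"
    elif [ $((pa_start % pa_four_kib)) -eq 0 ] && [ $((pa_end % pa_four_kib)) -eq 0 ]; then
        echo "4KiB"
    else
        echo "unaligned"
    fi
}

MiB=$((1024 * 1024))
partition_alignment $((2 * MiB)) $((1026 * MiB))   # starts at 2 MiB, 1 GiB long
partition_alignment $((1 * MiB)) $((1025 * MiB))   # starts at 1 MiB, 1 GiB long
```

The first call prints 2MiB, the preferred alignment. The second prints 4KiB, which meets the minimum requirement only.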

On Red Hat Enterprise Linux 8, both the XFS and ext4 file systems can be created on NVDIMM as a Technology Preview.

3.7.3. Installing ndctl

This procedure installs the ndctl utility, which is used to configure and monitor NVDIMM devices.

Procedure
  • To install the ndctl utility, use the following command:

    # yum install ndctl

3.7.4. Reconfiguring an existing NVDIMM namespace to file system DAX mode

This procedure reconfigures a namespace on an NVDIMM device to file system DAX mode, and enables you to store files on the namespace.

Warning

Reconfiguring a namespace deletes all data previously stored on the namespace.

Prerequisites
  • The ndctl utility is installed. See Section 3.7.3, “Installing ndctl”.
Procedure
  1. List all namespaces on your system:

    # ndctl list --namespaces --idle
    
    [
      {
        "dev":"namespace1.0",
        "mode":"raw",
        "size":34359738368,
        "state":"disabled",
        "numa_node":1
      },
      {
        "dev":"namespace0.0",
        "mode":"raw",
        "size":34359738368,
        "state":"disabled",
        "numa_node":0
      }
    ]
  2. Reconfigure any namespace:

    # ndctl create-namespace \
            --force \
            --mode=fsdax \
            --reconfig=namespace-ID

    Example 3.5. Reconfiguring a namespace as file system DAX

    To use namespace0.0 for a file system that supports DAX, use the following command:

    # ndctl create-namespace \
            --force \
            --mode=fsdax \
            --reconfig=namespace0.0
    
    {
      "dev":"namespace0.0",
      "mode":"fsdax",
      "size":"32.00 GiB (34.36 GB)",
      "uuid":"ab91cc8f-4c3e-482e-a86f-78d177ac655d",
      "blockdev":"pmem0",
      "numa_node":0
    }
  3. The namespace is now available at the /dev/pmemN path.
Additional resources
  • The ndctl-create-namespace(1) man page

3.7.5. Creating a new NVDIMM namespace in file system DAX mode

This procedure creates a new file system DAX namespace on an NVDIMM device, enabling you to store files on the namespace.

Prerequisites
  • The ndctl utility is installed. See Section 3.7.3, “Installing ndctl”.
Procedure
  1. List the pmem regions on your system that have available space. In the following example, space is available in the region5 and region4 regions:

    # ndctl list --regions
    
    [
      {
        "dev":"region5",
        "size":270582939648,
        "available_size":270582939648,
        "type":"pmem",
        "iset_id":-7337419320239190016
      },
      {
        "dev":"region4",
        "size":270582939648,
        "available_size":270582939648,
        "type":"pmem",
        "iset_id":-137289417188962304
      }
    ]
  2. On any of the available regions, allocate one or more namespaces:

    # ndctl create-namespace \
            --mode=fsdax \
            --region=regionN \
            --size=namespace-size

    Example 3.6. Creating a namespace on a region

    The following command creates a 36-GiB file system DAX namespace on region4:

    # ndctl create-namespace \
            --mode=fsdax \
            --region=region4 \
            --size=36G
    
    {
      "dev":"namespace4.0",
      "mode":"fsdax",
      "size":"35.44 GiB (38.05 GB)",
      "uuid":"9c5330b5-dc90-4f7a-bccd-5b558fa881fe",
      "blockdev":"pmem4",
      "numa_node":0
    }
  3. The namespace is now available at the /dev/pmemN path.
Additional resources
  • The ndctl-create-namespace(1) man page
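
The reported size of 35.44 GiB is smaller than the requested 36 GiB. One plausible accounting, assuming that the per-page metadata is stored on the NVDIMM itself (the --map=dev placement) at 64 bytes for each 4-KiB page, is sketched below. The usable_mib_after_metadata function is a hypothetical helper, and the actual value can be slightly lower because of additional alignment padding:

```shell
# Hypothetical helper: estimate the usable capacity in MiB after reserving
# 64 bytes of metadata for each 4096-byte page on the device itself.
usable_mib_after_metadata() {
    um_requested=$1
    um_metadata=$(( um_requested * 64 / 4096 ))
    echo $(( um_requested - um_metadata ))
}

requested=$((36 * 1024))                        # 36 GiB expressed in MiB
usable=$(usable_mib_after_metadata "$requested")
echo "requested $requested MiB, about $usable MiB usable"
```

The estimate is 36288 MiB, or 35.4375 GiB, which is in line with the 35.44 GiB shown in the example output.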

3.7.6. Creating a file system on a file system DAX device

This procedure creates a file system on a file system DAX device and mounts the file system.

Procedure
  1. Optionally, create a partition on the file system DAX device. See Section 1.3, “Creating a partition”.

    By default, the parted tool aligns partitions on 1 MiB boundaries. For the first partition, specify 2 MiB as the start of the partition. If the size of the partition is a multiple of 2 MiB, all other partitions are also aligned.

  2. Create an XFS or ext4 file system on the partition or the NVDIMM device.

    For XFS, disable shared copy-on-write data extents when creating the file system:

    # mkfs.xfs -m reflink=0 fsdax-partition-or-device
  3. Mount the file system with the -o dax mount option:

    # mount -o dax fsdax-partition-or-device mount-point
  4. Applications can now use persistent memory and create files in the mount-point directory, open the files, and use the mmap operation to map the files for direct access.
Additional resources
  • The mkfs.xfs(8) man page
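
The procedure above can be previewed as a command sequence before you run anything. The following is a minimal dry-run sketch that only prints commands. The fsdax_setup_commands function, the /dev/pmem0 device, the /mnt/pmem mount point, the GPT label, and the p1 partition suffix are all assumptions made for illustration:

```shell
# Hypothetical helper: print, without running them, the commands that
# partition an fsdax device, create an XFS file system, and mount it.
fsdax_setup_commands() {
    fs_device=$1
    fs_mount_point=$2
    fs_partition="${fs_device}p1"   # assumed name of the first partition

    # Start the partition at 2 MiB, the preferred alignment.
    echo "parted -s $fs_device mklabel gpt mkpart primary 2MiB 100%"
    # Disable shared copy-on-write data extents for XFS.
    echo "mkfs.xfs -m reflink=0 $fs_partition"
    # Mount with the dax option so that mmap() maps the storage directly.
    echo "mount -o dax $fs_partition $fs_mount_point"
}

fsdax_setup_commands /dev/pmem0 /mnt/pmem
```

Review the printed commands and adjust the device, partition layout, and mount point for your system before running them as root. The parted command replaces the existing partition table on the device.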

3.8. Troubleshooting NVDIMM persistent memory

You can detect and fix different kinds of errors on NVDIMM devices.

3.8.1. Prerequisites

  • An NVDIMM device is connected to your system and configured.

3.8.2. Installing ndctl

This procedure installs the ndctl utility, which is used to configure and monitor NVDIMM devices.

Procedure
  • To install the ndctl utility, use the following command:

    # yum install ndctl

3.8.3. Monitoring NVDIMM health using S.M.A.R.T.

Some NVDIMM devices support Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.) interfaces for retrieving health information.

Important

Monitor NVDIMM health regularly to prevent data loss. If S.M.A.R.T. reports problems with the health status of an NVDIMM device, replace it as described in Section 3.8.4, “Detecting and replacing a broken NVDIMM device”.

Prerequisites
  • On some systems, the acpi_ipmi driver must be loaded before health information can be retrieved. To load the driver, use the following command:

    # modprobe acpi_ipmi
Procedure
  • To access the health information, use the following command:

    # ndctl list --dimms --health
    
    ...
        {
          "dev":"nmem0",
          "id":"802c-01-1513-b3009166",
          "handle":1,
          "phys_id":22,
          "health":
          {
            "health_state":"ok",
            "temperature_celsius":25.000000,
            "spares_percentage":99,
            "alarm_temperature":false,
            "alarm_spares":false,
            "temperature_threshold":50.000000,
            "spares_threshold":20,
            "life_used_percentage":1,
            "shutdown_state":"clean"
          }
         }
    ...
Additional resources
  • The ndctl-list(1) man page
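
For regular monitoring, a script can reduce the health listing to the devices that need attention. The following is a minimal sketch, assuming the one-field-per-line output format shown above and treating any health_state value other than ok as a reason to investigate. Both functions are hypothetical helpers:

```shell
# Hypothetical helper: read "ndctl list --dimms --health" JSON on stdin and
# print one "device state" pair for each DIMM.
dimm_health_states() {
    awk -F'"' '
        /"dev":/          { dev = $4 }
        /"health_state":/ { print dev, $4 }
    '
}

# Hypothetical helper: print only the DIMMs whose state is not "ok".
dimms_needing_attention() {
    dimm_health_states | awk '$2 != "ok" { print $1 }'
}

# Sample abridged from the output above.
sample='{
  "dev":"nmem0",
  "health":
  {
    "health_state":"ok",
    "spares_percentage":99
  }
}'
printf '%s\n' "$sample" | dimm_health_states
```

For the sample, the first helper prints nmem0 ok, and dimms_needing_attention prints nothing. On a real system, pipe the output of ndctl list --dimms --health into either helper.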

3.8.4. Detecting and replacing a broken NVDIMM device

If you find error messages related to NVDIMM reported in your system log or by S.M.A.R.T., it might mean an NVDIMM device is failing. In that case, it is necessary to:

  1. Detect which NVDIMM device is failing
  2. Back up data stored on it
  3. Physically replace the device
Procedure
  1. To detect the broken device, use the following command:

    # ndctl list --dimms --regions --health --media-errors --human

    The badblocks field shows which NVDIMM is broken. Note its name in the dev field.

    Example 3.7. Health status of NVDIMM devices

    In the following example, the NVDIMM named nmem0 is broken:

    # ndctl list --dimms --regions --health --media-errors --human
    
    ...
      "regions":[
        {
          "dev":"region0",
          "size":"250.00 GiB (268.44 GB)",
          "available_size":0,
          "type":"pmem",
          "numa_node":0,
          "iset_id":"0xXXXXXXXXXXXXXXXX",
          "mappings":[
            {
              "dimm":"nmem1",
              "offset":"0x10000000",
              "length":"0x1f40000000",
              "position":1
            },
            {
              "dimm":"nmem0",
              "offset":"0x10000000",
              "length":"0x1f40000000",
              "position":0
            }
          ],
          "badblock_count":1,
          "badblocks":[
            {
              "offset":65536,
              "length":1,
              "dimms":[
                "nmem0"
              ]
            }
          ],
          "persistence_domain":"memory_controller"
        }
      ]
    }
  2. Use the following command to find the phys_id attribute of the broken NVDIMM:

    # ndctl list --dimms --human

    From the previous example, you know that nmem0 is the broken NVDIMM. Therefore, find the phys_id attribute of nmem0.

    Example 3.8. The phys_id attributes of NVDIMMs

    In the following example, the phys_id is 0x10:

    # ndctl list --dimms --human
    
    [
      {
        "dev":"nmem1",
        "id":"XXXX-XX-XXXX-XXXXXXXX",
        "handle":"0x120",
        "phys_id":"0x1c"
      },
      {
        "dev":"nmem0",
        "id":"XXXX-XX-XXXX-XXXXXXXX",
        "handle":"0x20",
        "phys_id":"0x10",
        "flag_failed_flush":true,
        "flag_smart_event":true
      }
    ]
  3. Use the following command to find the memory slot of the broken NVDIMM:

    # dmidecode

    In the output, find the entry where the Handle identifier matches the phys_id attribute of the broken NVDIMM. The Locator field lists the memory slot used by the broken NVDIMM.

    Example 3.9. NVDIMM Memory Slot Listing

    In the following example, the nmem0 device matches the 0x0010 identifier and uses the DIMM-XXX-YYYY memory slot:

    # dmidecode
    
    ...
    Handle 0x0010, DMI type 17, 40 bytes
    Memory Device
            Array Handle: 0x0004
            Error Information Handle: Not Provided
            Total Width: 72 bits
            Data Width: 64 bits
            Size: 125 GB
            Form Factor: DIMM
            Set: 1
            Locator: DIMM-XXX-YYYY
            Bank Locator: Bank0
            Type: Other
            Type Detail: Non-Volatile Registered (Buffered)
    ...
  4. Back up all data in the namespaces on the NVDIMM. If you do not back up the data before replacing the NVDIMM, the data will be lost when you remove the NVDIMM from your system.

    Warning

    In some cases, such as when the NVDIMM is completely broken, the backup might fail.

    To prevent this, regularly monitor your NVDIMM devices using S.M.A.R.T. as described in Section 3.8.3, “Monitoring NVDIMM health using S.M.A.R.T.” and replace failing NVDIMMs before they break.

    Use the following command to list the namespaces on the NVDIMM:

    # ndctl list --namespaces --dimm=DIMM-ID-number

    Example 3.10. NVDIMM namespaces listing

    In the following example, the nmem0 device contains the namespace0.0 and namespace0.2 namespaces, which you need to back up:

    # ndctl list --namespaces --dimm=0
    
    [
      {
        "dev":"namespace0.2",
        "mode":"sector",
        "size":67042312192,
        "uuid":"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        "raw_uuid":"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        "sector_size":4096,
        "blockdev":"pmem0.2s",
        "numa_node":0
      },
      {
        "dev":"namespace0.0",
        "mode":"sector",
        "size":67042312192,
        "uuid":"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        "raw_uuid":"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        "sector_size":4096,
        "blockdev":"pmem0s",
        "numa_node":0
      }
    ]
  5. Replace the broken NVDIMM physically.
Additional resources
  • The ndctl-list(1) man page
  • The dmidecode(8) man page
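
Two small conversions recur in the procedure above: matching a phys_id value such as 0x10 to a dmidecode handle such as 0x0010, and deriving the number passed to --dimm from a device name such as nmem0. The following is a minimal sketch of both. The function names are hypothetical, and the four-digit uppercase handle format is an assumption based on Example 3.9:

```shell
# Hypothetical helper: format a phys_id value the way handles appear in
# the dmidecode output of Example 3.9 (four hexadecimal digits).
phys_id_to_handle() {
    printf '0x%04X\n' "$(( $1 ))"
}

# Hypothetical helper: strip the "nmem" prefix from a DIMM device name.
dimm_number() {
    echo "${1#nmem}"
}

phys_id_to_handle 0x10    # handle to look for in the dmidecode output
dimm_number nmem0         # value for: ndctl list --namespaces --dimm=...
```

For nmem0 in the examples above, the helpers print 0x0010 and 0. You can then search the dmidecode output for the line that begins with Handle 0x0010 to find the Locator field.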