Chapter 5. Using NVDIMM persistent memory storage
As a system administrator, you can enable and manage various types of storage on Non-Volatile Dual In-line Memory Modules (NVDIMM) devices connected to your system.
For installing Red Hat Enterprise Linux 8 on NVDIMM storage, see Installing to an NVDIMM device instead.
5.1. The NVDIMM persistent memory technology
NVDIMM persistent memory, also called storage class memory or pmem, is a combination of memory and storage.
NVDIMM combines the durability of storage with the low access latency and the high bandwidth of dynamic RAM (DRAM):
- NVDIMM storage is byte-addressable, so it can be accessed by using the CPU load and store instructions. In addition to the read() and write() system calls, which are required for accessing traditional block-based storage, NVDIMM also supports a direct load and store programming model.
- The performance characteristics of NVDIMM are similar to DRAM with very low access latency, typically in the tens to hundreds of nanoseconds.
- Data stored on NVDIMM are preserved when the power is off, like with storage.
- The direct access (DAX) technology enables applications to memory map storage directly, without going through the system page cache. This frees up DRAM for other purposes.
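The difference between the two programming models can be illustrated with a minimal sketch. It assumes an ordinary scratch file rather than a real NVDIMM device, so it shows only the shape of the code, not the performance or durability characteristics; the function names and the scratch file are hypothetical.

```python
import mmap
import os
import tempfile


def write_with_syscalls(path, offset, data):
    """Block-style access: every update goes through a write system call."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.pwrite(fd, data, offset)
        os.fsync(fd)  # ask the kernel to push the data to the device
    finally:
        os.close(fd)


def write_with_mapping(path, offset, data):
    """Load/store-style access: map the file, then update it like memory."""
    fd = os.open(path, os.O_RDWR)
    try:
        with mmap.mmap(fd, 0) as mapping:  # length 0 maps the whole file
            mapping[offset:offset + len(data)] = data  # plain memory stores
            mapping.flush()  # msync(); stands in for the persistence step
    finally:
        os.close(fd)


def demo():
    """Exercise both paths on a hypothetical 4-KiB scratch file."""
    with tempfile.NamedTemporaryFile(delete=False) as scratch:
        scratch.write(b"\0" * 4096)
        path = scratch.name
    try:
        write_with_syscalls(path, 0, b"block")
        write_with_mapping(path, 16, b"mapped")
        with open(path, "rb") as f:
            return f.read(32)
    finally:
        os.remove(path)
```

On a file system mounted with DAX, the mapped variant would reach the persistent memory without passing through the page cache; on a regular file, as here, the page cache is still involved.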
NVDIMM is beneficial in use cases such as:
- Databases
The reduced storage access latency on NVDIMM can dramatically improve database performance.
- Rapid restart
Rapid restart is also called the warm cache effect. For example, a file server has none of the file contents in memory after starting. As clients connect and read or write data, that data is cached in the page cache. Eventually, the cache contains mostly hot data. After a reboot, the system must start the process again on traditional storage.
NVDIMM enables an application to keep the warm cache across reboots if the application is designed properly. In this example, there would be no page cache involved: the application would cache data directly in the persistent memory.
- Fast write-cache
File servers often do not acknowledge a client’s write request until the data is on durable media. Using NVDIMM as a fast write cache enables a file server to acknowledge the write request quickly thanks to the low latency.
5.2. NVDIMM interleaving and regions
NVDIMM devices can be grouped into interleave sets in the same way as regular DRAM. An interleave set, also called a region, is similar to a RAID 0 (stripe) configuration across multiple DIMMs.
Interleaving has the following advantages:
- NVDIMM devices benefit from increased performance when they are configured into interleave sets.
- Interleaving can combine multiple smaller NVDIMM devices into a larger logical device.
NVDIMM interleave sets are configured in the system BIOS or UEFI firmware.
Red Hat Enterprise Linux creates one region device for each interleave set.
5.3. NVDIMM namespaces
NVDIMM regions are divided into one or more namespaces. Namespaces enable you to access the device using different methods, based on the type of the namespace.
Some NVDIMM devices do not support multiple namespaces on a region:
- If your NVDIMM device supports labels, you can subdivide the region into namespaces.
- If your NVDIMM device does not support labels, the region can only contain a single namespace. In that case, Red Hat Enterprise Linux creates a default namespace that covers the entire region.
5.4. NVDIMM access modes
You can configure NVDIMM namespaces to use one of the following modes:
sector
Presents the storage as a fast block device. This mode is useful for legacy applications that have not been modified to use NVDIMM storage, or for applications that make use of the full I/O stack, including Device Mapper.
A sector device can be used in the same way as any other block device on the system. You can create partitions or file systems on it, configure it as part of a software RAID set, or use it as the cache device for dm-cache.
Devices in this mode are available at /dev/pmemNs. See the blockdev value listed after creating the namespace.
devdax, or device direct access (DAX)
Enables NVDIMM devices to support direct access programming as described in the Storage Networking Industry Association (SNIA) Non-Volatile Memory (NVM) Programming Model specification. In this mode, I/O bypasses the storage stack of the kernel. Therefore, no Device Mapper drivers can be used.
Device DAX provides raw access to NVDIMM storage by using a DAX character device node. Data on a devdax device can be made durable using CPU cache flushing and fencing instructions. Certain databases and virtual machine hypervisors might benefit from this mode. File systems cannot be created on devdax devices.
Devices in this mode are available at /dev/daxN.M. See the chardev value listed after creating the namespace.
fsdax, or file system direct access (DAX)
Enables NVDIMM devices to support direct access programming as described in the Storage Networking Industry Association (SNIA) Non-Volatile Memory (NVM) Programming Model specification. In this mode, I/O bypasses the storage stack of the kernel, and many Device Mapper drivers therefore cannot be used.
You can create file systems on file system DAX devices.
Devices in this mode are available at /dev/pmemN. See the blockdev value listed after creating the namespace.
Important
The file system DAX technology is provided only as a Technology Preview, and is not supported by Red Hat.
raw
Presents a memory disk that does not support DAX. In this mode, namespaces have several limitations and should not be used.
Devices in this mode are available at /dev/pmemN. See the blockdev value listed after creating the namespace.
5.5. Creating a sector namespace on an NVDIMM to act as a block device
You can configure an NVDIMM device in sector mode, which is also called legacy mode, to support traditional, block-based storage.
You can either:
- reconfigure an existing namespace to sector mode, or
- create a new sector namespace if there is available space.
Prerequisites
- An NVDIMM device is attached to your system.
5.5.1. Installing ndctl
This procedure installs the ndctl utility, which is used to configure and monitor NVDIMM devices.
Procedure
To install the ndctl utility, use the following command:
# yum install ndctl
5.5.2. Reconfiguring an existing NVDIMM namespace to sector mode
This procedure reconfigures an NVDIMM namespace to sector mode for use as a fast block device.
Reconfiguring a namespace deletes all data previously stored on the namespace.
Prerequisites
- The ndctl utility is installed. See Section 5.5.1, “Installing ndctl”.
Procedure
Reconfigure the selected namespace to sector mode:
# ndctl create-namespace \
    --force \
    --reconfig=namespace-ID \
    --mode=sector
Example 5.1. Reconfiguring namespace1.0 in sector mode
To reconfigure the namespace1.0 namespace to use sector mode:
# ndctl create-namespace \
    --force \
    --reconfig=namespace1.0 \
    --mode=sector
{
  "dev":"namespace1.0",
  "mode":"sector",
  "size":"11.99 GiB (12.87 GB)",
  "uuid":"5805480e-90e6-407e-96a4-23e1cde2ed78",
  "raw_uuid":"879d9e9f-fd43-4ed5-b64f-3bcd0781391a",
  "sector_size":4096,
  "blockdev":"pmem1s",
  "numa_node":1
}
- The reconfigured namespace is now available under the /dev directory as /dev/pmemNs.
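If you script this step, the device path can be derived from the blockdev value in the command output. The following is a minimal sketch, assuming the JSON shown in Example 5.1 has been captured as text; the function name is hypothetical.

```python
import json

# Output copied from Example 5.1.
CREATE_OUTPUT = """
{
  "dev":"namespace1.0",
  "mode":"sector",
  "size":"11.99 GiB (12.87 GB)",
  "uuid":"5805480e-90e6-407e-96a4-23e1cde2ed78",
  "raw_uuid":"879d9e9f-fd43-4ed5-b64f-3bcd0781391a",
  "sector_size":4096,
  "blockdev":"pmem1s",
  "numa_node":1
}
"""


def block_device_path(create_output):
    """Return the /dev path named by the blockdev field of the output."""
    namespace = json.loads(create_output)
    return "/dev/" + namespace["blockdev"]
```

For the output above, the function returns the /dev/pmem1s path, which matches the /dev/pmemNs pattern.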
Additional resources
- The ndctl-create-namespace(1) man page
5.5.3. Creating a new NVDIMM namespace in sector mode
This procedure creates a new sector namespace on an NVDIMM device, enabling you to use it as a traditional block device.
Prerequisites
- The ndctl utility is installed. See Section 5.5.1, “Installing ndctl”.
- The NVDIMM device supports labels.
Procedure
List the pmem regions on your system that have available space. In the following example, space is available in the region5 and region4 regions:
# ndctl list --regions
[
  {
    "dev":"region5",
    "size":270582939648,
    "available_size":270582939648,
    "type":"pmem",
    "iset_id":-7337419320239190016
  },
  {
    "dev":"region4",
    "size":270582939648,
    "available_size":270582939648,
    "type":"pmem",
    "iset_id":-137289417188962304
  }
]
On any of the available regions, allocate one or more namespaces:
# ndctl create-namespace \
    --mode=sector \
    --region=regionN \
    --size=namespace-size
Example 5.2. Creating a namespace on a region
The following command creates a 36-GiB sector namespace on region4:
# ndctl create-namespace \
    --mode=sector \
    --region=region4 \
    --size=36G
- The new namespace is now available under the /dev directory as /dev/pmemNs.
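The region listing step in this procedure can be automated by filtering on the available_size field. The following is a minimal sketch, assuming the region listing from this procedure has been captured as text; the function name and the size threshold are hypothetical.

```python
import json

# Region listing copied from the procedure above.
REGIONS = """
[
  {"dev":"region5", "size":270582939648, "available_size":270582939648,
   "type":"pmem", "iset_id":-7337419320239190016},
  {"dev":"region4", "size":270582939648, "available_size":270582939648,
   "type":"pmem", "iset_id":-137289417188962304}
]
"""

GIB = 1024 ** 3


def regions_with_space(listing, needed_bytes):
    """Return the names of pmem regions with at least needed_bytes free."""
    return [
        region["dev"]
        for region in json.loads(listing)
        if region["type"] == "pmem" and region["available_size"] >= needed_bytes
    ]
```

With the listing above, a request for 36 GiB matches both regions, while a request larger than the region size matches none.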
Additional resources
- The ndctl-create-namespace(1) man page
5.6. Creating a device DAX namespace on an NVDIMM
You can configure an NVDIMM device in device DAX mode to support character storage with direct access capabilities.
You can either:
- reconfigure an existing namespace to device DAX mode, or
- create a new device DAX namespace if there is available space.
Prerequisites
- An NVDIMM device is attached to your system.
5.6.1. NVDIMM in device direct access mode
Device direct access (device DAX, devdax) provides a means for applications to directly access storage, without the involvement of a file system. The benefit of device DAX is that it provides a guaranteed fault granularity, which can be configured using the --align option of the ndctl utility.
For the Intel 64 and AMD64 architecture, the following fault granularities are supported:
- 4 KiB
- 2 MiB
- 1 GiB
Device DAX nodes support only the following system calls:
- open()
- close()
- mmap()
The read() and write() variants are not supported because the device DAX use case is tied to persistent memory programming.
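Because all access to a device DAX node goes through mmap(), an application typically has to keep its mappings consistent with the fault granularity. The following sketch shows only the arithmetic. It assumes, which the text above does not state explicitly, that the start and length of a mapping must be multiples of the configured granularity; all names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Fault granularities listed above for the Intel 64 and AMD64 architecture.
SUPPORTED_ALIGNMENTS = (4 * KIB, 2 * MIB, 1 * GIB)


def round_down(length, align):
    """Largest multiple of align that does not exceed length."""
    if align not in SUPPORTED_ALIGNMENTS:
        raise ValueError("unsupported fault granularity")
    return length - (length % align)


def mapping_is_aligned(offset, length, align):
    """True if both ends of the mapping fall on an align boundary."""
    return offset % align == 0 and (offset + length) % align == 0
```

For example, with a 2-MiB granularity, a 5-MiB request rounds down to 4 MiB, and a mapping that starts at a 4-KiB offset is rejected by the check.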
5.6.2. Installing ndctl
This procedure installs the ndctl utility, which is used to configure and monitor NVDIMM devices.
Procedure
To install the ndctl utility, use the following command:
# yum install ndctl
5.6.3. Reconfiguring an existing NVDIMM namespace to device DAX mode
This procedure reconfigures a namespace on an NVDIMM device to device DAX mode, and enables you to store data on the namespace.
Reconfiguring a namespace deletes all data previously stored on the namespace.
Prerequisites
- The ndctl utility is installed. See Section 5.6.2, “Installing ndctl”.
Procedure
List all namespaces on your system:
# ndctl list --namespaces --idle
[
  {
    "dev":"namespace1.0",
    "mode":"raw",
    "size":34359738368,
    "state":"disabled",
    "numa_node":1
  },
  {
    "dev":"namespace0.0",
    "mode":"raw",
    "size":34359738368,
    "state":"disabled",
    "numa_node":0
  }
]
Reconfigure any namespace:
# ndctl create-namespace \
    --force \
    --mode=devdax \
    --reconfig=namespace-ID
Example 5.3. Reconfiguring a namespace as device DAX
The following command reconfigures namespace0.0 for data storage that supports DAX. It is aligned to a 2-MiB fault granularity to ensure that the operating system faults in 2-MiB pages at a time:
# ndctl create-namespace \
    --force \
    --mode=devdax \
    --align=2M \
    --reconfig=namespace0.0
- The namespace is now available at the /dev/daxN.M path.
Additional resources
- The ndctl-create-namespace(1) man page
5.6.4. Creating a new NVDIMM namespace in device DAX mode
This procedure creates a new device DAX namespace on an NVDIMM device, enabling you to store data on the namespace.
Prerequisites
- The ndctl utility is installed. See Section 5.6.2, “Installing ndctl”.
- The NVDIMM device supports labels.
Procedure
List the pmem regions on your system that have available space. In the following example, space is available in the region5 and region4 regions:
# ndctl list --regions
[
  {
    "dev":"region5",
    "size":270582939648,
    "available_size":270582939648,
    "type":"pmem",
    "iset_id":-7337419320239190016
  },
  {
    "dev":"region4",
    "size":270582939648,
    "available_size":270582939648,
    "type":"pmem",
    "iset_id":-137289417188962304
  }
]
On any of the available regions, allocate one or more namespaces:
# ndctl create-namespace \
    --mode=devdax \
    --region=regionN \
    --size=namespace-size
Example 5.4. Creating a namespace on a region
The following command creates a 36-GiB device DAX namespace on region4. It is aligned to a 2-MiB fault granularity to ensure that the operating system faults in 2-MiB pages at a time:
# ndctl create-namespace \
    --mode=devdax \
    --region=region4 \
    --align=2M \
    --size=36G
{
  "dev":"namespace1.2",
  "mode":"devdax",
  "map":"dev",
  "size":"35.44 GiB (38.05 GB)",
  "uuid":"5ae01b9c-1ebf-4fb6-bc0c-6085f73d31ee",
  "raw_uuid":"4c8be2b0-0842-4bcb-8a26-4bbd3b44add2",
  "daxregion":{
    "id":1,
    "size":"35.44 GiB (38.05 GB)",
    "align":2097152,
    "devices":[
      {
        "chardev":"dax1.2",
        "size":"35.44 GiB (38.05 GB)"
      }
    ]
  },
  "numa_node":1
}
- The namespace is now available at the /dev/daxN.M path.
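The character device path and the configured alignment can both be read from the daxregion section of the command output. The following is a minimal sketch, assuming the JSON from Example 5.4 has been captured as text; the function name is hypothetical.

```python
import json

# Output copied from Example 5.4.
CREATE_OUTPUT = """
{
  "dev":"namespace1.2",
  "mode":"devdax",
  "map":"dev",
  "size":"35.44 GiB (38.05 GB)",
  "uuid":"5ae01b9c-1ebf-4fb6-bc0c-6085f73d31ee",
  "raw_uuid":"4c8be2b0-0842-4bcb-8a26-4bbd3b44add2",
  "daxregion":{
    "id":1,
    "size":"35.44 GiB (38.05 GB)",
    "align":2097152,
    "devices":[
      {"chardev":"dax1.2", "size":"35.44 GiB (38.05 GB)"}
    ]
  },
  "numa_node":1
}
"""


def dax_devices(create_output):
    """Return (device path, alignment in bytes) for each DAX device."""
    region = json.loads(create_output)["daxregion"]
    return [("/dev/" + dev["chardev"], region["align"])
            for dev in region["devices"]]
```

For the output above, the result is a single entry for /dev/dax1.2 with an alignment of 2097152 bytes, which is the 2 MiB requested with the --align=2M option.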
Additional resources
- The ndctl-create-namespace(1) man page
5.7. Creating a file system DAX namespace on an NVDIMM
You can configure an NVDIMM device in file system DAX mode to support a file system with direct access capabilities.
You can either:
- reconfigure an existing namespace to file system DAX mode, or
- create a new file system DAX namespace if there is available space.
The file system DAX technology is provided only as a Technology Preview, and is not supported by Red Hat.
Prerequisites
- An NVDIMM device is attached to your system.
5.7.1. NVDIMM in file system direct access mode
When an NVDIMM device is configured in file system direct access (file system DAX, fsdax) mode, a file system can be created on top of it.
Any application that performs an mmap() operation on a file on this file system gets direct access to its storage. This enables the direct access programming model on NVDIMM. The file system must be mounted with the -o dax option in order for direct mapping to happen.
Per-page metadata allocation
This mode requires allocating per-page metadata in the system DRAM or on the NVDIMM device itself. The overhead of this data structure is 64 bytes for each 4-KiB page:
- On small devices, the amount of overhead is small enough to fit in DRAM with no problems. For example, a 16-GiB namespace only requires 256 MiB for page structures. Because NVDIMM devices are usually small and expensive, storing the page tracking data structures in DRAM is preferable.
- On NVDIMM devices that are terabytes in size or larger, the amount of memory required to store the page tracking data structures might exceed the amount of DRAM in the system. One TiB of NVDIMM requires 16 GiB just for page structures. As a result, storing the data structures on the NVDIMM itself is preferable in such cases.
You can configure where per-page metadata are stored using the --map option when configuring a namespace:
- To allocate in the system RAM, use --map=mem.
- To allocate on the NVDIMM, use --map=dev.
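The overhead figures above follow directly from the ratio of 64 bytes of metadata to each 4-KiB page. The following minimal sketch reproduces the arithmetic; the function and constant names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
TIB = 1024 * GIB

PAGE_SIZE = 4 * KIB          # page size used in the text above
METADATA_PER_PAGE = 64       # bytes of metadata per page


def page_metadata_bytes(namespace_bytes):
    """Bytes of per-page metadata needed to track a namespace."""
    pages = -(-namespace_bytes // PAGE_SIZE)  # ceiling division
    return pages * METADATA_PER_PAGE
```

The function returns 256 MiB for a 16-GiB namespace and 16 GiB for a 1-TiB namespace, matching the figures above. For a 36-GiB namespace it returns 576 MiB, which leaves about 35.44 GiB when the metadata is stored on the device. This appears consistent with the size reported in Example 5.4, whose output shows "map":"dev", although the exact reservation might include additional padding.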
Partitions and file systems on fsdax
When creating partitions on an fsdax device, partitions must be aligned on page boundaries. On the Intel 64 and AMD64 architecture, at least 4 KiB alignment is required for the start and end of the partition. 2 MiB is the preferred alignment.
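The alignment rule can be checked before a partition is created. The following is a minimal sketch of such a check, assuming the start offset and the size of the partition are known in bytes; the function name and the returned labels are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

REQUIRED_ALIGNMENT = 4 * KIB    # minimum for the start and end of a partition
PREFERRED_ALIGNMENT = 2 * MIB   # preferred alignment


def check_partition(start, size):
    """Classify a partition as 'preferred', 'minimum', or 'misaligned'."""
    end = start + size
    if start % REQUIRED_ALIGNMENT or end % REQUIRED_ALIGNMENT:
        return "misaligned"
    if start % PREFERRED_ALIGNMENT == 0 and end % PREFERRED_ALIGNMENT == 0:
        return "preferred"
    return "minimum"
```

A 10-MiB partition that starts at 2 MiB is classified as preferred, the same partition starting at 1 MiB meets only the minimum requirement, and a partition starting at a 512-byte offset is misaligned.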
On Red Hat Enterprise Linux 8, both the XFS and ext4 file systems can be created on NVDIMM as a Technology Preview.
5.7.2. Installing ndctl
This procedure installs the ndctl utility, which is used to configure and monitor NVDIMM devices.
Procedure
To install the ndctl utility, use the following command:
# yum install ndctl
5.7.3. Reconfiguring an existing NVDIMM namespace to file system DAX mode
This procedure reconfigures a namespace on an NVDIMM device to file system DAX mode, and enables you to store files on the namespace.
Reconfiguring a namespace deletes all data previously stored on the namespace.
Prerequisites
- The ndctl utility is installed. See Section 5.7.2, “Installing ndctl”.
Procedure
List all namespaces on your system:
# ndctl list --namespaces --idle
[
  {
    "dev":"namespace1.0",
    "mode":"raw",
    "size":34359738368,
    "state":"disabled",
    "numa_node":1
  },
  {
    "dev":"namespace0.0",
    "mode":"raw",
    "size":34359738368,
    "state":"disabled",
    "numa_node":0
  }
]
Reconfigure any namespace:
# ndctl create-namespace \
    --force \
    --mode=fsdax \
    --reconfig=namespace-ID
Example 5.5. Reconfiguring a namespace as file system DAX
To use namespace0.0 for a file system that supports DAX, use the following command:
# ndctl create-namespace \
    --force \
    --mode=fsdax \
    --reconfig=namespace0.0
{
  "dev":"namespace0.0",
  "mode":"fsdax",
  "size":"32.00 GiB (34.36 GB)",
  "uuid":"ab91cc8f-4c3e-482e-a86f-78d177ac655d",
  "blockdev":"pmem0",
  "numa_node":0
}
- The namespace is now available at the /dev/pmemN path.
Additional resources
- The ndctl-create-namespace(1) man page
5.7.4. Creating a new NVDIMM namespace in file system DAX mode
This procedure creates a new file system DAX namespace on an NVDIMM device, enabling you to store files on the namespace.
Prerequisites
- The ndctl utility is installed. See Section 5.7.2, “Installing ndctl”.
- The NVDIMM device supports labels.
Procedure
List the pmem regions on your system that have available space. In the following example, space is available in the region5 and region4 regions:
# ndctl list --regions
[
  {
    "dev":"region5",
    "size":270582939648,
    "available_size":270582939648,
    "type":"pmem",
    "iset_id":-7337419320239190016
  },
  {
    "dev":"region4",
    "size":270582939648,
    "available_size":270582939648,
    "type":"pmem",
    "iset_id":-137289417188962304
  }
]
On any of the available regions, allocate one or more namespaces:
# ndctl create-namespace \
    --mode=fsdax \
    --region=regionN \
    --size=namespace-size
Example 5.6. Creating a namespace on a region
The following command creates a 36-GiB file system DAX namespace on region4:
# ndctl create-namespace \
    --mode=fsdax \
    --region=region4 \
    --size=36G
{
  "dev":"namespace4.0",
  "mode":"fsdax",
  "size":"35.44 GiB (38.05 GB)",
  "uuid":"9c5330b5-dc90-4f7a-bccd-5b558fa881fe",
  "blockdev":"pmem4",
  "numa_node":0
}
- The namespace is now available at the /dev/pmemN path.
Additional resources
- The ndctl-create-namespace(1) man page
5.7.5. Creating a file system on a file system DAX device
This procedure creates a file system on a file system DAX device and mounts the file system.
Procedure
Optionally, create a partition on the file system DAX device. See Section 3.3, “Creating a partition”.
By default, the parted tool aligns partitions on 1 MiB boundaries. For the first partition, specify 2 MiB as the start of the partition. If the size of the partition is a multiple of 2 MiB, all other partitions are also aligned.
Create an XFS or ext4 file system on the partition or the NVDIMM device.
For XFS, disable shared copy-on-write data extents, because they are incompatible with the dax mount option. Additionally, in order to increase the likelihood of large page mappings, set the stripe unit and stripe width.
# mkfs.xfs -m reflink=0 -d su=2m,sw=1 fsdax-partition-or-device
Mount the file system with the -o dax mount option:
# mount -o dax fsdax-partition-or-device mount-point
- Applications can now use persistent memory and create files in the mount-point directory, open the files, and use the mmap operation to map the files for direct access.
Additional resources
- The mkfs.xfs(8) man page
5.8. Troubleshooting NVDIMM persistent memory
You can detect and fix different kinds of errors on NVDIMM devices.
Prerequisites
- An NVDIMM device is connected to your system and configured.
5.8.1. Installing ndctl
This procedure installs the ndctl utility, which is used to configure and monitor NVDIMM devices.
Procedure
To install the ndctl utility, use the following command:
# yum install ndctl
5.8.2. Monitoring NVDIMM health using S.M.A.R.T.
Some NVDIMM devices support Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.) interfaces for retrieving health information.
Monitor NVDIMM health regularly to prevent data loss. If S.M.A.R.T. reports problems with the health status of an NVDIMM device, replace it as described in Section 5.8.3, “Detecting and replacing a broken NVDIMM device”.
Prerequisites
On some systems, the acpi_ipmi driver must be loaded to retrieve health information. To load the driver, use the following command:
# modprobe acpi_ipmi
Procedure
To access the health information, use the following command:
# ndctl list --dimms --health
...
{
  "dev":"nmem0",
  "id":"802c-01-1513-b3009166",
  "handle":1,
  "phys_id":22,
  "health": {
    "health_state":"ok",
    "temperature_celsius":25.000000,
    "spares_percentage":99,
    "alarm_temperature":false,
    "alarm_spares":false,
    "temperature_threshold":50.000000,
    "spares_threshold":20,
    "life_used_percentage":1,
    "shutdown_state":"clean"
  }
}
...
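Regular monitoring can be scripted by inspecting the health section of each listed device. The following is a minimal sketch, assuming the output has been captured as a JSON array and uses only the field names shown above; the function name and the choice of which fields to check are hypothetical.

```python
import json

# One entry from the output above, wrapped in an array for parsing.
HEALTH = """
[
  {
    "dev":"nmem0",
    "id":"802c-01-1513-b3009166",
    "handle":1,
    "phys_id":22,
    "health": {
      "health_state":"ok",
      "temperature_celsius":25.000000,
      "spares_percentage":99,
      "alarm_temperature":false,
      "alarm_spares":false,
      "temperature_threshold":50.000000,
      "spares_threshold":20,
      "life_used_percentage":1,
      "shutdown_state":"clean"
    }
  }
]
"""


def dimms_needing_attention(listing):
    """Return (device, [problem fields]) for DIMMs that report a problem."""
    flagged = []
    for dimm in json.loads(listing):
        if "health" not in dimm:
            continue  # device does not report S.M.A.R.T. data
        health = dimm["health"]
        problems = []
        if health.get("health_state") != "ok":
            problems.append("health_state")
        for alarm in ("alarm_temperature", "alarm_spares"):
            if health.get(alarm):
                problems.append(alarm)
        if problems:
            flagged.append((dimm["dev"], problems))
    return flagged
```

For the healthy device shown above, the function returns an empty list. A device with a raised alarm or a health state other than ok would be reported together with the fields that triggered the report.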
Additional resources
- The ndctl-list(1) man page
5.8.3. Detecting and replacing a broken NVDIMM device
If you find error messages related to NVDIMM reported in your system log or by S.M.A.R.T., it might mean an NVDIMM device is failing. In that case, it is necessary to:
- Detect which NVDIMM device is failing
- Back up data stored on it
- Physically replace the device
Procedure
To detect the broken device, use the following command:
# ndctl list --dimms --regions --health --media-errors --human
The badblocks field shows which NVDIMM is broken. Note its name in the dev field.
Example 5.7. Health status of NVDIMM devices
In the following example, the NVDIMM named nmem0 is broken:
# ndctl list --dimms --regions --health --media-errors --human
...
  "regions":[
    {
      "dev":"region0",
      "size":"250.00 GiB (268.44 GB)",
      "available_size":0,
      "type":"pmem",
      "numa_node":0,
      "iset_id":"0xXXXXXXXXXXXXXXXX",
      "mappings":[
        {
          "dimm":"nmem1",
          "offset":"0x10000000",
          "length":"0x1f40000000",
          "position":1
        },
        {
          "dimm":"nmem0",
          "offset":"0x10000000",
          "length":"0x1f40000000",
          "position":0
        }
      ],
      "badblock_count":1,
      "badblocks":[
        {
          "offset":65536,
          "length":1,
          "dimms":[
            "nmem0"
          ]
        }
      ],
      "persistence_domain":"memory_controller"
    }
  ]
}
Use the following command to find the phys_id attribute of the broken NVDIMM:
# ndctl list --dimms --human
From the previous example, you know that nmem0 is the broken NVDIMM. Therefore, find the phys_id attribute of nmem0.
Example 5.8. The phys_id attributes of NVDIMMs
In the following example, the phys_id is 0x10:
# ndctl list --dimms --human
[
  {
    "dev":"nmem1",
    "id":"XXXX-XX-XXXX-XXXXXXXX",
    "handle":"0x120",
    "phys_id":"0x1c"
  },
  {
    "dev":"nmem0",
    "id":"XXXX-XX-XXXX-XXXXXXXX",
    "handle":"0x20",
    "phys_id":"0x10",
    "flag_failed_flush":true,
    "flag_smart_event":true
  }
]
Use the following command to find the memory slot of the broken NVDIMM:
# dmidecode
In the output, find the entry where the Handle identifier matches the phys_id attribute of the broken NVDIMM. The Locator field lists the memory slot used by the broken NVDIMM.
Example 5.9. NVDIMM Memory Slot Listing
In the following example, the nmem0 device matches the 0x0010 identifier and uses the DIMM-XXX-YYYY memory slot:
# dmidecode
...
Handle 0x0010, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x0004
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 125 GB
        Form Factor: DIMM
        Set: 1
        Locator: DIMM-XXX-YYYY
        Bank Locator: Bank0
        Type: Other
        Type Detail: Non-Volatile Registered (Buffered)
...
Back up all data in the namespaces on the NVDIMM. If you do not back up the data before replacing the NVDIMM, the data will be lost when you remove the NVDIMM from your system.
Warning
In some cases, such as when the NVDIMM is completely broken, the backup might fail.
To prevent this, regularly monitor your NVDIMM devices using S.M.A.R.T. as described in Section 5.8.2, “Monitoring NVDIMM health using S.M.A.R.T.” and replace failing NVDIMMs before they break.
Use the following command to list the namespaces on the NVDIMM:
# ndctl list --namespaces --dimm=DIMM-ID-number
Example 5.10. NVDIMM namespaces listing
In the following example, the nmem0 device contains the namespace0.0 and namespace0.2 namespaces, which you need to back up:
# ndctl list --namespaces --dimm=0
[
  {
    "dev":"namespace0.2",
    "mode":"sector",
    "size":67042312192,
    "uuid":"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
    "raw_uuid":"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
    "sector_size":4096,
    "blockdev":"pmem0.2s",
    "numa_node":0
  },
  {
    "dev":"namespace0.0",
    "mode":"sector",
    "size":67042312192,
    "uuid":"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
    "raw_uuid":"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
    "sector_size":4096,
    "blockdev":"pmem0s",
    "numa_node":0
  }
]
- Replace the broken NVDIMM physically.
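The detection steps in this procedure can be combined in a short script. The following is a minimal sketch, assuming the region listing from Example 5.7 (trimmed here to the fields that are used) and the device listing from Example 5.8 have been captured as text with the --human option. The function names are hypothetical, and formatting the handle as four uppercase hexadecimal digits is an assumption based on the 0x0010 value in Example 5.9.

```python
import json

# Trimmed from Example 5.7: only the fields used below are kept.
REGIONS = """
{"regions":[
  {"dev":"region0",
   "badblock_count":1,
   "badblocks":[{"offset":65536, "length":1, "dimms":["nmem0"]}]}
]}
"""

# Copied from Example 5.8.
DIMMS = """
[
  {"dev":"nmem1", "id":"XXXX-XX-XXXX-XXXXXXXX",
   "handle":"0x120", "phys_id":"0x1c"},
  {"dev":"nmem0", "id":"XXXX-XX-XXXX-XXXXXXXX",
   "handle":"0x20", "phys_id":"0x10",
   "flag_failed_flush":true, "flag_smart_event":true}
]
"""


def broken_dimms(regions_listing):
    """Collect the names of DIMMs referenced by any badblocks entry."""
    names = set()
    for region in json.loads(regions_listing)["regions"]:
        for bad in region.get("badblocks", []):
            names.update(bad.get("dimms", []))
    return sorted(names)


def dmidecode_handle(dimms_listing, dimm_name):
    """Format the phys_id of a DIMM the way Example 5.9 shows the Handle."""
    for dimm in json.loads(dimms_listing):
        if dimm["dev"] == dimm_name:
            return "0x{:04X}".format(int(dimm["phys_id"], 16))
    return None
```

For the listings above, the broken device is nmem0 and the handle to look for in the dmidecode output is 0x0010.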
Additional resources
- The ndctl-list(1) man page
- The dmidecode(8) man page