6. Improving Search Performance (and Balancing Read Performance)

The most effective way to improve search operations against the directory is to configure thorough indexes for entries, combined with reasonable limits on search results.

6.1. Using Indexes

An index, as the name implies, is a tag that shows that a certain entry contains a certain attribute, without holding any other detail about the entry. This saves space and makes returning search results faster. Each index is organized around a Directory Server attribute and a certain way of matching that attribute:
  • Presence index (pres) simply shows what entries contain an attribute.
  • Equality index (eq) shows which entries have an attribute value that matches a specific search string.
  • Approximate index (approx) is used for efficient sounds-like searches. It shows which entries have a value that phonetically matches a string.
  • Substring index (sub) matches any substring of an attribute value to the given search string. (This index is very expensive for the server to maintain.)
  • International index uses a matching rule to match strings in a directory which contains values in languages other than English.
  • Browsing index, or virtual list view (VLV) index, sets an index to use to display entries in the Directory Server Console.
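As a rough illustration of how the first four index types relate to search filters, the following Python sketch names the index type that can serve a single, simple filter item. The function name is hypothetical, and the sketch deliberately ignores compound filters, ordering matches, and matching-rule (international) filters.

```python
def index_type_for(filter_item: str) -> str:
    """Name the index type that can serve one simple LDAP filter item.

    Handles only single items such as "(cn=John Doe)". Compound filters
    (&, |, !), ordering matches (>=, <=), and extensible matches (:=)
    are outside the scope of this sketch.
    """
    item = filter_item.strip()
    if item.startswith("(") and item.endswith(")"):
        item = item[1:-1]
    attr, sep, value = item.partition("=")
    if not sep or not attr:
        raise ValueError(f"not a simple filter item: {filter_item!r}")
    if attr.endswith("~"):
        return "approx"  # (cn~=Jon): sounds-like match
    if attr.endswith((">", "<", ":")):
        raise ValueError("ordering and extensible matches are not covered")
    if value == "*":
        return "pres"    # (cn=*): any entry that has the attribute
    if "*" in value:
        return "sub"     # (cn=*ohn*): wildcard match
    return "eq"          # (cn=John Doe): exact match


for item in ("(cn=*)", "(cn=John Doe)", "(cn~=Jon Doe)", "(description=*widgets*)"):
    print(item, "->", index_type_for(item))
```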

Note

Indexing is described in much more detail in the Administrator's Guide.

However, creating indexes does not by itself increase server performance. Maintaining indexes puts a burden on the Directory Server for every modify, add, and delete operation, because the server must check every attribute in the change against every index that it maintains:
  1. The Directory Server receives an add or modify operation.
  2. The Directory Server examines the indexing attributes to determine whether an index is maintained for the attribute values.
  3. If the created attribute values are indexed, then the Directory Server generates the new index entries.
  4. Once the server completes the indexing, the actual attribute values are created according to the client request.
For example, the Directory Server adds the entry:
dn: cn=John Doe,ou=People,dc=example,dc=com
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
cn: John Doe
cn: John
sn: Doe
ou: Manufacturing
ou: people
telephoneNumber: 408 555 8834
description: Manufacturing lead for the Z238 line of widgets.
The Directory Server is maintaining the following indexes:
  • Equality, approximate, and substring indexes for cn (common name) and sn (surname) attributes.
  • Equality and substring indexes for the telephone number attribute.
  • Substring indexes for the description attribute.
When adding that entry to the directory, the Directory Server must perform these steps:
  1. Create the cn equality index entry for John and John Doe.
  2. Create the appropriate cn approximate index entries for John and John Doe.
  3. Create the appropriate cn substring index entries for John and John Doe.
  4. Create the sn equality index entry for Doe.
  5. Create the appropriate sn approximate index entry for Doe.
  6. Create the appropriate sn substring index entries for Doe.
  7. Create the telephone number equality index entry for 408 555 8834.
  8. Create the appropriate telephone number substring index entries for 408 555 8834.
  9. Create the appropriate description substring index entries for Manufacturing lead for the Z238 line of widgets. A large number of substring entries are generated for this string.
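The steps above can be approximated with a short Python model that counts how many index keys one add operation has to write. This is only a sketch under stated assumptions: substring keys are modeled as three-character sliding windows with ^ and $ marking the start and end of the value, and equality and approximate indexes are modeled as one key per value. The function names are hypothetical, and the real key format depends on the server version and the attribute syntax.

```python
def substring_keys(value, width=3):
    """Return the sliding-window keys for one attribute value.

    Assumption: keys are `width`-character windows over the lowercased
    value, with '^' and '$' marking its start and end.
    """
    text = "^" + value.lower() + "$"
    return {text[i:i + width] for i in range(max(1, len(text) - width + 1))}


# Indexes from the example: attribute -> index types maintained.
INDEXES = {
    "cn": ("eq", "approx", "sub"),
    "sn": ("eq", "approx", "sub"),
    "telephoneNumber": ("eq", "sub"),
    "description": ("sub",),
}

# Indexable attribute values of the example entry.
ENTRY = {
    "cn": ["John Doe", "John"],
    "sn": ["Doe"],
    "ou": ["Manufacturing", "people"],
    "telephoneNumber": ["408 555 8834"],
    "description": ["Manufacturing lead for the Z238 line of widgets."],
}


def index_keys_written(entry, indexes):
    """Count the distinct index keys an add operation has to write."""
    counts = {}
    for attr, values in entry.items():
        for index_type in indexes.get(attr, ()):
            if index_type == "sub":
                keys = set().union(*(substring_keys(v) for v in values))
            else:
                # eq and approx: one key per distinct value in this model
                # (a real approximate index stores a phonetic code instead).
                keys = {v.lower() for v in values}
            counts[(attr, index_type)] = len(keys)
    return counts


for (attr, index_type), n in index_keys_written(ENTRY, INDEXES).items():
    print(f"{attr:16} {index_type:7} {n:3} keys")
```

In this model the ou attribute produces no keys because it is not indexed, while the single description value produces far more substring keys than any equality index.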
Before creating new indexes, balance the overhead of maintaining the indexes against the potential improvements in search performance. Most importantly, match the types of indexes that you maintain to the type of information stored in the directory and the type of information users routinely search for.
  • Approximate indexes are not efficient for attributes commonly containing numbers, such as telephone numbers.
  • Substring indexes do not work for binary attributes.
  • Equality indexes should be avoided if the values are large (such as attributes intended to contain photographs or passwords containing encrypted data).
  • Maintaining indexes for attributes not commonly used in a search increases overhead without improving overall search performance.
  • Attributes that are not indexed can still be specified in search requests, although the search performance may be degraded significantly, depending on the type of search.
  • The more indexes you maintain, the more disk space you require.

Note

Creating indexes is much more effective for directories which have a high search operation load and low modify operation load.

6.2. Tuning Directory Server Resource Settings

The server's performance can be managed and improved by limiting the amount of resources the server uses to process client search requests, which is done by defining four settings:
  • The maximum number of entries the server returns to the client in response to a search operation (size limit attribute).
  • The maximum amount of real time (in seconds) for the server to spend performing a search request (time limit attribute).
  • The time (in seconds) during which the server maintains an idle connection before terminating it (idle timeout attribute).
  • The maximum number of file descriptors available to the Directory Server (max number of file descriptors attribute).
To configure Directory Server to optimize performance:
  1. In the Directory Server Console, select the Configuration tab, and then select the topmost entry in the navigation tree in the left pane.
  2. Select the Performance tab in the right pane.
  3. Set the maximum number of entries the server will return to the client in response to a search operation by entering a new value in the Size Limit text box.
    To set no limit, type -1 in this text box.
  4. Enter the maximum amount of real time (in seconds) for the server to spend performing a search request in the Time Limit text box.
    To set no limit, type -1 in this text box.
  5. Enter the time (in seconds) for the server to maintain an idle connection before terminating it in the Idle Timeout text box.
    To set no limit, type zero (0) in this text box.
  6. Set the maximum number of file descriptors available to the Directory Server in the Max Number of File Descriptors text box. For more information on this parameter, see the Directory Server Configuration, Command, and File Reference.
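The same four settings can also be changed over LDAP. The following Python sketch builds an LDIF modify record for the cn=config entry. It assumes that the Console fields correspond to the nsslapd-sizelimit, nsslapd-timelimit, nsslapd-idletimeout, and nsslapd-maxdescriptors attributes; verify the names against the Directory Server Configuration, Command, and File Reference for your version. The helper name and the values shown are illustrative only.

```python
def resource_limits_ldif(size_limit, time_limit, idle_timeout, max_descriptors):
    """Build an LDIF modify record for the four resource settings.

    Assumption: these attribute names are the cn=config equivalents of
    the Size Limit, Time Limit, Idle Timeout, and Max Number of File
    Descriptors fields in the Console.
    """
    settings = [
        ("nsslapd-sizelimit", size_limit),        # -1 means no limit
        ("nsslapd-timelimit", time_limit),        # seconds, -1 means no limit
        ("nsslapd-idletimeout", idle_timeout),    # seconds, 0 means no limit
        ("nsslapd-maxdescriptors", max_descriptors),
    ]
    lines = ["dn: cn=config", "changetype: modify"]
    for position, (attr, value) in enumerate(settings):
        if position:
            lines.append("-")  # separator between changes in one record
        lines += [f"replace: {attr}", f"{attr}: {value}"]
    return "\n".join(lines) + "\n"


# Illustrative values, not recommendations.
print(resource_limits_ldif(2000, 3600, 0, 1024))
```

The printed record can be passed to ldapmodify with the same connection options used in Example 1.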

6.3. Setting Index Scan Limits

In large directories, the search results list can get huge. In a directory with a million inetorgperson entries, a filter like (objectclass=inetorgperson) returns a million entries, and an index for the sn attribute has at least a million entries in it.
Loading a long ID list from the database significantly reduces search performance. The configuration parameter, nsslapd-idlistscanlimit, sets a limit on the number of IDs that are read before a key is considered to match the entire primary index (meaning the search is treated as an unindexed search with a different set of resource limits).
For large indexes, it is actually more efficient to treat any search which matches the index as an unindexed search. The search operation only has to look in one place to process results (the entire directory) rather than searching through an index that is nearly the size of a directory, plus the directory itself.
The default value of the nsslapd-idlistscanlimit attribute is 4000, which gives good performance for a common range of database sizes and access patterns. It is usually not necessary to change this value. If the database index is slightly larger than 4000 entries, but still significantly smaller than the overall directory, then raising the scan limit improves searches which would otherwise hit the default limit of 4000.
On the other hand, lowering the limit can significantly speed up searches that would otherwise hit the 4000 entry limit, but where it is not necessary to scan every entry.
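The effect of the scan limit can be pictured with a small Python model. This is a sketch, not the server's implementation: the index is a plain dictionary, the names are hypothetical, and the model checks the full list length, whereas the server stops reading once the limit is reached.

```python
ALLIDS = "ALLIDS"  # sentinel: the key is treated as matching every entry


def candidate_ids(index, key, scan_limit=4000):
    """Return the ID list for a key, or ALLIDS if it exceeds the limit.

    ALLIDS means the search falls back to examining entries directly,
    as an unindexed search would.
    """
    ids = index.get(key, [])
    if len(ids) > scan_limit:
        return ALLIDS
    return ids


# Toy indexes: an objectclass key shared by 10,000 entries and a
# selective sn key that matches three entries.
objectclass_index = {"inetorgperson": list(range(1, 10001))}
sn_index = {"doe": [17, 42, 977]}

print(candidate_ids(objectclass_index, "inetorgperson"))
print(candidate_ids(sn_index, "doe"))
print(len(candidate_ids(objectclass_index, "inetorgperson", scan_limit=20000)))
```

With the default limit, the objectclass key is treated as matching everything, while the selective sn key still returns its short ID list. Raising the limit above the list size makes the server load the full list again.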

6.4. Fine Grained ID List Size

In large databases, some queries can consume a large amount of CPU and RAM resources. To improve performance, you can set a default ID scan limit that applies to all indexes in the database using the nsslapd-idlistscanlimit attribute. However, in some cases it is useful to define a limit for certain indexes, or to use no ID list at all. You can set individual ID list scan limits for different types of search filters using the nsIndexIDListScanLimit attribute.
To set a limit, for example for the objectClass attribute, add the nsIndexIDListScanLimit parameter to the DN cn=objectclass,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config.
The nsIndexIDListScanLimit attribute is multi-valued and takes the following list of parameters as a value:
nsIndexIDListScanLimit: limit=NNN [type=eq[,sub,...]] [flags=AND[,XXX,...]] [values=val[,val,...]]
  • limit: The maximum size of the ID list. Valid values are:
    • -1: Unlimited.
    • 0: Do not use the index.
    • 1 to the maximum 32-bit integer (2147483647): Maximum number of IDs.
  • type: Optional. The type of the index, such as eq, sub, or pres. The value must be one of the actual nsIndexType values specified for the index definition. For example, you cannot use type=eq if you do not have nsIndexType=eq defined.
  • flags: Optional. Flags that alter the behavior of applying the scan limit. Valid values are:
    • AND: Apply the scan limit only to searches in which the attribute appears in an AND clause.
    • OR: Apply the scan limit only to searches in which the attribute appears in an OR clause.
  • values: Optional. Comma-separated list of values which must match the search filter in order for the limit to be applied. The values are compared one at a time, so the limit applies if any one of them matches.
    The values must be used with only one type at a time.
    The values must correspond to the index type, and must correspond to the syntax of the attribute to which the index is applied. For example, if you specified the integer based attribute uidNumber and it is indexed for eq, you cannot use type=eq values=abc.
    If the value contains spaces, commas, NULL, or other characters that must be escaped, use the LDAP filter escape syntax: a backslash (\) followed by the two-digit hex code for the character. In the following example, the commas in the DN value are escaped with \2C.
    nsIndexIDListScanLimit: limit=0 type=eq values=uid=user\2Cou=People\2Cdc=example\2Cdc=com
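The escaping rule and the parameter layout can be sketched in Python as follows. The helper names are hypothetical, and the set of escaped characters (space, comma, backslash, and NUL) is an assumption based on the description above rather than an exhaustive list. The check that values are combined with at most one type reflects one reading of the rule that values must be used with only one type at a time.

```python
# Characters this sketch escapes; assumed from the description above.
ESCAPED = {" ", ",", "\\", "\0"}


def escape_value(value):
    """Escape a value as a backslash followed by two hex digits."""
    return "".join(f"\\{ord(ch):02X}" if ch in ESCAPED else ch for ch in value)


def scan_limit_value(limit, types=(), flags=(), values=()):
    """Build the value of one nsIndexIDListScanLimit attribute."""
    if not -1 <= limit <= 2147483647:
        raise ValueError("limit must be between -1 and 2147483647")
    if any(flag not in ("AND", "OR") for flag in flags):
        raise ValueError("flags must be AND or OR")
    if values and len(types) > 1:
        raise ValueError("values can be combined with only one type")
    parts = [f"limit={limit}"]
    if types:
        parts.append("type=" + ",".join(types))
    if flags:
        parts.append("flags=" + ",".join(flags))
    if values:
        parts.append("values=" + ",".join(escape_value(v) for v in values))
    return " ".join(parts)


print("nsIndexIDListScanLimit: " + scan_limit_value(
    0, types=["eq"], values=["uid=user,ou=People,dc=example,dc=com"]))
```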

Example 1. Setting nsIndexIDListScanLimit

In a large database with 10 million entries that contain the object class inetOrgPerson, a search for (&(objectClass=inetOrgPerson)(uid=user)) first creates an ID list containing all 10 million IDs matching objectClass=inetOrgPerson. When the database applies the second part of the filter, it searches that list for objects matching uid=user. In such cases, it is useful to define a limit for certain indexes, or to use no ID list at all.
To set that no ID list is created for objectClass=inetOrgPerson in AND clauses, add the following nsIndexIDListScanLimit:
ldapmodify -D "cn=directory manager" -W -p 389 -h server.example.com -x
dn: cn=objectclass,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsIndexIDListScanLimit
nsIndexIDListScanLimit: limit=0 type=eq flags=AND values=inetOrgPerson

modifying entry "cn=objectclass,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config"
No ID list is created for objectClass=inetOrgPerson when used in an AND clause. In all other situations the value of nsslapd-idlistscanlimit is applied.
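The behavior described in this example can be modeled in a few lines of Python. This is a simplified sketch: the class and function names are hypothetical, values are compared case-insensitively as a stand-in for attribute syntax matching, and taking the first matching rule is an assumption about how multiple values of the attribute would be resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanLimitRule:
    """One nsIndexIDListScanLimit value; empty sets mean "not restricted"."""
    limit: int
    types: frozenset = frozenset()
    flags: frozenset = frozenset()
    values: frozenset = frozenset()  # stored lowercased in this model

    def applies(self, index_type, clause, value):
        return (
            (not self.types or index_type in self.types)
            and (not self.flags or clause in self.flags)
            and (not self.values or value.lower() in self.values)
        )


def effective_limit(rules, index_type, clause, value, default=4000):
    """Return the limit of the first matching rule, else the default.

    `default` stands for the database-wide nsslapd-idlistscanlimit.
    """
    for rule in rules:
        if rule.applies(index_type, clause, value):
            return rule.limit
    return default


# limit=0 type=eq flags=AND values=inetOrgPerson
rules = [ScanLimitRule(0, frozenset({"eq"}), frozenset({"AND"}),
                       frozenset({"inetorgperson"}))]

print(effective_limit(rules, "eq", "AND", "inetOrgPerson"))
print(effective_limit(rules, "eq", "OR", "inetOrgPerson"))
print(effective_limit(rules, "eq", "AND", "person"))
```

Only the first call matches the rule, so only that filter component skips the ID list; the other two fall back to the database-wide limit.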

6.5. Tuning the Database Cache for Searches

The database attributes that affect search performance mainly define the amount of memory available to the server. The maximum values that can be set for the database's cache size attributes depend on the amount of real memory on the machine. As a rule of thumb, the amount of available memory on the machine should always be greater than the sum of the default database cache size and all of the entry cache sizes.
Use caution when changing these cache sizing attributes. The ability to improve server performance with these attributes depends on the size of the database, the amount of physical memory available on the machine, and whether directory searches are random (that is, if the directory clients are searching for random and widely scattered directory data).
If the database does not fit into memory and if searches are random, attempting to increase the values set on these attributes does not help directory performance. In fact, changing these attributes may harm overall performance.
The attributes of each database used to store directory data, including the server configuration data in the NetscapeRoot database, can be resized.
To improve the cache hit ratio on search operations, increase the amount of data that the Directory Server maintains in the database cache, as described in Section 8.4, “Tuning the Database Cache”, by editing the values for the nsslapd-dbcachesize attribute.
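The memory rule of thumb given at the start of this section can be checked with simple arithmetic before changing any cache attribute. The following Python sketch uses hypothetical sizes and a hypothetical helper name; substitute the values configured on your own server.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def cache_budget(available_memory, db_cache_size, entry_cache_sizes):
    """Return (total cache bytes, headroom bytes) for a planned setup.

    The total is the database cache plus the entry cache of every
    database; the headroom should stay comfortably above zero.
    """
    total = db_cache_size + sum(entry_cache_sizes.values())
    return total, available_memory - total


# Hypothetical machine: 8 GiB available, a 1 GiB database cache, and
# one entry cache per database.
total, headroom = cache_budget(
    available_memory=8 * GIB,
    db_cache_size=1 * GIB,
    entry_cache_sizes={"userRoot": 2 * GIB, "NetscapeRoot": 256 * MIB},
)
print(f"caches: {total // MIB} MiB, headroom: {headroom // MIB} MiB")
if headroom <= 0:
    print("Planned cache sizes exceed the available memory.")
```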

6.6. Tuning the Database Settings for Searches

The Directory Server Console only shows the databases that contain the directory data and the NetscapeRoot database. However, the server uses another database to manage these. On this database, the following attributes can be changed to improve performance:
  • The amount of memory to make available for all databases (maximum cache size), which is described in Section 8.2, “Tuning the Entry Cache”.
  • The maximum number of entries for the server to verify in response to a search request (look-through limit).
  • The amount of memory to make available for import (import cache size).
To configure the default database attributes that apply to all other database instances:
  1. In the Directory Server Console, select the Configuration tab; then, in the navigation tree, expand the Data icon, and highlight the Database Settings node.
  2. Select the LDBM Plug-in Settings tab in the right pane.
    This tab contains the database attributes for all databases stored on this server.
  3. In the Maximum Cache Size field, enter a value corresponding to the amount of memory to make available for all databases. This value is for the total of the entire backend, meaning all databases cumulatively rather than the amount per single database instance.
  4. In the Look-Through Limit field, enter the maximum number of entries for the server to check in response to a search request.
  5. There are two ways to set the amount of memory in bytes to make available for import. The default is to have auto cache sizing, meaning 50% of the free memory is allocated for the import cache. It is also possible to set the import cache size manually by deselecting the Use Cache Auto-Size check box and then setting the value in the Import Cache Size field. For creating a very large database from LDIF, set this attribute as large as possible, depending on the memory available on the machine. The larger this parameter, the faster the database is created.

    Warning

    Setting this value too high can cause import failures because of a lack of memory.
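The two ways of sizing the import cache in step 5 amount to the following calculation, shown here as a Python sketch. The function name is hypothetical, and rejecting a manual size larger than the free memory is an added safeguard for illustration, not documented server behavior.

```python
GIB = 1024 ** 3


def import_cache_size(free_memory, autosize=True, manual_size=None):
    """Return the import cache size in bytes.

    With auto-sizing, 50% of the free memory is used. Otherwise the
    manually chosen size is returned, after a sanity check against the
    free memory (an assumption of this sketch).
    """
    if autosize:
        return free_memory // 2
    if manual_size is None:
        raise ValueError("manual_size is required when autosize is off")
    if manual_size > free_memory:
        raise ValueError("import cache larger than free memory")
    return manual_size


print(import_cache_size(4 * GIB) // GIB, "GiB with auto-sizing")
print(import_cache_size(4 * GIB, autosize=False, manual_size=3 * GIB) // GIB,
      "GiB set manually")
```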

6.7. Managing Special Entries

The cn=config entry in the simple, flat dse.ldif configuration file is not stored in the same highly scalable database as regular entries. As a result, if many entries, particularly entries that are likely to be updated frequently, are stored under cn=config, performance will probably suffer.
Although Red Hat recommends that simple user entries not be stored under cn=config for performance reasons, it can be useful to store special user entries such as the Directory Manager entry or replication manager (supplier bind DN) entry under cn=config since this centralizes configuration information.