Chapter 13. Managing Indexes

Indexing makes searching for and retrieving information easier by classifying and organizing attributes or values. This chapter describes the searching algorithm itself, placing indexing mechanisms in context, and then describes how to create, delete, and manage indexes.

13.1. About Indexes

This section provides an overview of indexing in Directory Server. It contains the following topics:

13.1.1. About Index Types

Indexes are stored in files in the directory's databases. The names of the files are based on the indexed attribute, not the type of index contained in the file. Each index file may contain multiple types of indexes if multiple indexes are maintained for the specific attribute. For example, all indexes maintained for the common name attribute are contained in the cn.db file.
Directory Server supports the following types of index:
  • Presence index (pres) contains a list of the entries that contain a particular attribute, which is very useful for searches. For example, it makes it easy to examine any entries that contain access control information. Generating an aci.db file that includes a presence index efficiently performs the search for ACI=* to generate the access control list for the server.
  • Equality index (eq) improves searches for entries containing a specific attribute value. For example, an equality index on the cn attribute allows a user to perform the search for cn=Babs Jensen far more efficiently.
  • Approximate index (approx) is used for efficient approximate or sounds-like searches. For example, an entry may include the attribute value cn=Firstname M Lastname. An approximate search would return this value for searches against cn~=Firstname Lastname, cn~=Firstname, or cn~=Lastname. Similarly, a search against l~=San Fransisco (note the misspelling) would return entries including l=San Francisco.
  • Substring index (sub) is a costly index to maintain, but it allows efficient searching against substrings within entries. Substring indexes are limited to a minimum of three characters for each entry.
    For example, searches of the form cn=*derson , match the common names containing strings such as Bill Anderson, Jill Henderson, or Steve Sanderson. Similarly, the search for telephoneNumber= *555* returns all the entries in the directory with telephone numbers that contain 555.
  • International index speeds up searches for information in international directories. The process for creating an international index is similar to the process for creating regular indexes, except that it applies a matching rule by associating an object identifier (OID) with the attributes to be indexed.
    The supported locales and their associated OIDs are listed in Appendix D, Internationalization. If there is a need to configure the Directory Server to accept additional matching rules, contact Red Hat Consulting.

13.1.2. About Default and Database Indexes

Directory Server contains a set of default indexes. When you create a new database, Directory Server copies these default indexes from cn=default indexes,cn=config,cn=ldbm database,cn=plugins,cn=config to the new database. Then the database only uses the copy of these indexes, which are stored in cn=index,cn=database_name,cn=ldbm database,cn=plugins,cn=config.

Note

Directory Server does not replicate settings in the cn=config entry. Therefore, you can configure indexes differently on servers that are part of a replication topology. For example, in an environment with cascading replication, you do not need to create custom indexes on a hub, if clients do not read data from the hub.
To display the Directory Server default indexes:
# ldapsearch -D "cn=Directory Manager" -W -p 389 -h server.example.com \
     -b "cn=default indexes,cn=config,cn=ldbm database,cn=plugins,cn=config" \
     '(objectClass=nsindex)'

Note

If you update the default index settings stored in cn=default indexes,cn=config,cn=ldbm database,cn=plugins,cn=config, the changes are not applied to the individual databases in cn=index,cn=database_name,cn=ldbm database,cn=plugins,cn=config.
To display the indexes of an individual database:
# dsconf -D "cn=Directory Manager" ldap://server.example.com backend index list database_name

13.1.3. Overview of the Searching Algorithm

Indexes are used to speed up searches. To understand how the directory uses indexes, it helps to understand the searching algorithm. Each index contains a list of attributes (such as the cn, common name, attribute) and a list of IDs of the entries which contain the indexed attribute value:
  1. An LDAP client application sends a search request to the directory.
  2. The directory examines the incoming request to make sure that the specified base DN matches a suffix contained by one or more of its databases or database links.
    • If they do match, the directory processes the request.
    • If they do not match, the directory returns an error to the client indicating that the suffix does not match. If a referral has been specified in the nsslapd-referral attribute under cn=config, the directory also returns the LDAP URL where the client can attempt to pursue the request.
    • The Directory Server examines the search filter to see what indexes apply, and it attempts to load the list of entry IDs from each index that satisfies the filter. The ID lists are combined based on whether the filter used AND or OR joins.
      Each filter component is handled independently and returns an ID list.
    • If the list of entry IDs is larger than the configured ID list scan limit or if there is no index defined for the attribute, then Directory Server sets the results for this filtercomponent to allids. If, after applying the logical operations to the results of the individual search components the list is still ALLIDs it searches every entry in the database. This is an unindexed search.
  3. The Directory Server reads every entry from the id2entry.db database or the entry cache for every entry ID in the ID list (or from the entire database for an unindexed search). The server then checks the entries to see if they match the search filter. Each match is returned as it is found.
    The server continues through the list of IDs until it has searched all candidate entries or until it hits one of the configured resource limits. (Resource limits are listed in Section 14.5.3, “Setting User and Global Resource Limits Using the Command Line”.)

    Note

    It's possible to set separate resource limits for searches using the simple paged results control. For example, administrators can set high or unlimited size and look-through limits with paged searches, but use the lower default limits for non-paged searches.

13.1.4. Approximate Searches

In addition, the directory uses a variation of the metaphone phonetic algorithm to perform searches on an approximate index. Each value is treated as a sequence of words, and a phonetic code is generated for each word.

Note

The metaphone phonetic algorithm in Directory Server supports only US-ASCII letters. Therefore, use approximate indexing only with English values.
Values entered on an approximate search are similarly translated into a sequence of phonetic codes. An entry is considered to match a query if both of the following are true:
  • All of the query string codes match the codes generated in the entry string.
  • All of the query string codes are in the same order as the entry string codes.
Name in the Directory (Phonetic Code) Query String (Phonetic code) Match Comments
Alice B Sarette (ALS B SRT) Alice Sarette (ALS SRT) Matches. Codes are specified in the correct order.
Alice Sarrette (ALS SRT) Matches. Codes are specified in the correct order, despite the misspelling of Sarette.
Surette (SRT) Matches. The generated code exists in the original name, despite the misspelling of Sarette.
Bertha Sarette (BR0 SRT) No match. The code BR0 does not exist in the original name.
Sarette, Alice (SRT ALS) No match. The codes are not specified in the correct order.

13.1.5. Balancing the Benefits of Indexing

Before creating new indexes, balance the benefits of maintaining indexes against the costs.
  • Approximate indexes are not efficient for attributes commonly containing numbers, such as telephone numbers.
  • Substring indexes do not work for binary attributes.
  • Equality indexes should be avoided if the value is big (such as attributes intended to contain photographs or passwords containing encrypted data).
  • Maintaining indexes for attributes not commonly used in a search increases overhead without improving global searching performance.
  • Attributes that are not indexed can still be specified in search requests, although the search performance may be degraded significantly, depending on the type of search.
  • The more indexes you maintain, the more disk space you require.
Indexes can become very time-consuming. For example:
  1. The Directory Server receives an add or modify operation.
  2. The Directory Server examines the indexing attributes to determine whether an index is maintained for the attribute values.
  3. If the created attribute values are indexed, then Directory Server adds or deletes the new attribute values from the index.
  4. The actual attribute values are created in the entry.
For example, the Directory Server adds the entry:
dn: cn=John Doe,ou=People,dc=example,dc=com
objectclass: top
objectClass: person
objectClass: orgperson
objectClass: inetorgperson
cn: John Doe
cn: John
sn: Doe
ou: Manufacturing
ou: people
telephoneNumber: 408 555 8834
description: Manufacturing lead for the Z238 line of widgets.
The Directory Server maintains the following indexes:
  • Equality, approximate, and substring indexes for cn (common name) and sn (surname) attributes.
  • Equality and substring indexes for the telephone number attribute.
  • Substring indexes for the description attribute.
When adding that entry to the directory, the Directory Server must perform these steps:
  1. Create the cn equality index entry for John and John Doe.
  2. Create the appropriate cn approximate index entries for John and John Doe.
  3. Create the appropriate cn substring index entries for John and John Doe.
  4. Create the sn equality index entry for Doe.
  5. Create the appropriate sn approximate index entry for Doe.
  6. Create the appropriate sn substring index entries for Doe.
  7. Create the telephone number equality index entry for 408 555 8834.
  8. Create the appropriate telephone number substring index entries for 408 555 8834.
  9. Create the appropriate description substring index entries for Manufacturing lead for the Z238 line of widgets. A large number of substring entries are generated for this string.
As this example shows, the number of actions required to create and maintain databases for a large directory can be resource-intensive.

13.1.6. Indexing Limitations

You cannot index virtual attributes, such as nsrole and cos_attribute. Virtual attributes contain computed values. If you index these attributes, Directory Server can return an invalid set of entries to direct and internal searches.