2.3. Performing a Site Survey

A site survey is a formal method for discovering and characterizing the contents of the directory. Budget plenty of time for the site survey, because preparation is key to a sound directory architecture. The site survey consists of the following tasks:
  • Identify the applications that use the directory.
    Determine the directory-enabled applications deployed across the enterprise and their data needs.
  • Identify data sources.
    Survey the enterprise and identify sources of data, such as Active Directory, other LDAP servers, PBX systems, human resources databases, and email systems.
  • Characterize the data the directory needs to contain.
    Determine what objects should be present in the directory (for example, people or groups) and what attributes of these objects to maintain in the directory (such as usernames and passwords).
  • Determine the level of service to provide.
    Decide how available the directory data needs to be to client applications, and design the architecture accordingly. How available the directory needs to be affects how data are replicated and how chaining policies are configured to connect data stored on remote servers.
    See Chapter 7, Designing the Replication Process for more information about replication and Section 6.1, “Topology Overview” for more information on chaining.
  • Identify a data master.
    A data master is the server or application that holds the primary copy of a piece of directory data. This data might be mirrored to other servers for load balancing and recovery purposes. For each piece of data, determine its data master.
  • Determine data ownership.
    For each piece of data, determine the person responsible for ensuring that the data is up-to-date.
  • Determine data access.
    If data are imported from other sources, develop a strategy for both bulk imports and incremental updates. As a part of this strategy, try to master data in a single place, and limit the number of applications that can change the data. Also, limit the number of people who write to any given piece of data. A smaller group helps protect data integrity and reduces administrative overhead.
  • Document the site survey.
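One way to document the survey is to keep a record per piece of data and check it for decisions that are still open. The sketch below is a minimal illustration under assumptions: `SurveyItem` and `open_questions` are hypothetical names, not part of any Directory Server tool, and the checks cover only the data master and data owner tasks listed above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical record for one piece of directory data found during the survey.
# Field names are illustrative; they are not part of any Directory Server tool.
@dataclass
class SurveyItem:
    name: str
    sources: List[str]              # systems where the data currently lives
    master: Optional[str] = None    # chosen data master, if decided
    owner: Optional[str] = None     # person or group keeping the data up-to-date

def open_questions(items):
    """Return (item name, problem) pairs for survey decisions that are still missing."""
    problems = []
    for item in items:
        if item.master is None:
            problems.append((item.name, "no data master"))
        elif item.master not in item.sources:
            problems.append((item.name, "data master is not a known source"))
        if item.owner is None:
            problems.append((item.name, "no data owner"))
    return problems

survey = [
    SurveyItem("email address", ["Email system"], master="Email system", owner="IS department"),
    SurveyItem("home phone number", ["Human resources database"]),
]
for name, problem in open_questions(survey):
    print(f"{name}: {problem}")
```

Running it reports that the home phone number still has neither a data master nor a data owner.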
Because of the number of organizations that can be affected by the directory, it may be helpful to create a directory deployment team that includes representatives from each affected organization to perform the site survey.
Corporations generally have a human resources department, an accounting or accounts receivable department, manufacturing organizations, sales organizations, and development organizations. Including representatives from each of these organizations can help the survey process. Furthermore, directly involving all the affected organizations can help build acceptance for the migration from local data stores to a centralized directory.

2.3.1. Identifying the Applications That Use the Directory

Generally, the applications that access the directory and the data needs of these applications drive the planning of the directory contents. Many common applications use the directory:
  • Directory browser applications, such as online telephone books. Decide what information (such as email addresses, telephone numbers, and employee names) users need, and include it in the directory.
  • Email applications, especially email servers. Most email servers require email addresses, user names, and some routing information to be available in the directory. Some, however, also require more advanced information, such as the place on disk where a user's mailbox is stored, vacation notification information, and protocol information (IMAP versus POP, for example).
  • Directory-enabled human resources applications. These require more personal information such as government identification numbers, home addresses, home telephone numbers, birth dates, salary, and job title.
  • Microsoft Active Directory. Through Windows User Sync, Windows directory services can be integrated to function in tandem with the Directory Server. Both directories can store user information (user names and passwords, email addresses, telephone numbers) and group information (members). Model the Directory Server deployment on the existing Windows server deployment (or vice versa) so that the users, groups, and other directory data can be smoothly synchronized.
When examining the applications that will use the directory, look at the types of information each application uses. The following table gives an example of applications and the information used by each:

Table 2.1. Example Application Data Needs

Application     | Class of Data         | Data
Phonebook       | People                | Name, email address, phone number, user ID, password, department number, manager, mail stop.
Web server      | People, groups        | User ID, password, group name, group members, group owner.
Calendar server | People, meeting rooms | Name, user ID, cube number, conference room name.
After identifying the applications and information used by each application, it is apparent that some types of data are used by more than one application. Performing this kind of exercise during the data planning stage can help to avoid data redundancy problems in the directory, and show more clearly what data directory-dependent applications require.
The final decision about the types of data maintained in the directory and when the information is migrated to the directory is affected by these factors:
  • The data required by various legacy applications and users
  • The ability of legacy applications to communicate with an LDAP directory
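The exercise of finding data used by more than one application can be sketched by inverting Table 2.1. This is a minimal illustration, assuming the data names are normalized to lowercase so they compare consistently; `APP_NEEDS` and `shared_data` are hypothetical names.

```python
from collections import defaultdict

# Data needs from Table 2.1, lowercased so names compare consistently.
APP_NEEDS = {
    "Phonebook": {"name", "email address", "phone number", "user id", "password",
                  "department number", "manager", "mail stop"},
    "Web server": {"user id", "password", "group name", "group members", "group owner"},
    "Calendar server": {"name", "user id", "cube number", "conference room name"},
}

def shared_data(app_needs):
    """Map each data item used by two or more applications to those applications."""
    users = defaultdict(set)
    for app, items in app_needs.items():
        for item in items:
            users[item].add(app)
    return {item: apps for item, apps in users.items() if len(apps) > 1}

for item, apps in sorted(shared_data(APP_NEEDS).items()):
    print(f"{item}: {', '.join(sorted(apps))}")
```

For Table 2.1 this flags the name, the password, and the user ID as shared data; the user ID is used by all three applications, so it is a strong candidate for a single, centrally mastered attribute.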

2.3.2. Identifying Data Sources

To identify all of the data to include in the directory, perform a survey of the existing data stores. The survey should include the following:
  • Identify organizations that provide information.
    Locate all the organizations that manage information essential to the enterprise. Typically, this includes the information services, human resources, payroll, and accounting departments.
  • Identify the tools and processes that are information sources.
    Some common sources for information are network operating systems (Windows, Novell NetWare, UNIX NIS), email systems, security systems, PBX (telephone switching) systems, and human resources applications.
  • Determine how centralizing each piece of data affects the management of data.
    Centralized data management can require new tools and new processes. Sometimes centralization requires increasing staff in some organizations while decreasing staff in others.
During the survey, consider developing a matrix that identifies all of the information sources in the enterprise, similar to Table 2.2, “Example Information Sources”:

Table 2.2. Example Information Sources

Data Source              | Class of Data  | Data
Human resources database | People         | Name, address, phone number, department number, manager.
Email system             | People, groups | Name, email address, user ID, password, email preferences.
Facilities system        | Facilities     | Building names, floor names, cube numbers, access codes.
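Once both matrices exist, they can be cross-referenced: data an application needs but no source provides must be created or collected, and data provided by several sources needs a data master decision (see Section 2.3.5). The sketch below assumes the names in Table 2.2 are normalized to lowercase singular forms; `SOURCES` and `map_sources` are hypothetical.

```python
# Data provided by each source in Table 2.2 (lowercased, singular names).
SOURCES = {
    "Human resources database": {"name", "address", "phone number",
                                 "department number", "manager"},
    "Email system": {"name", "email address", "user id", "password",
                     "email preferences"},
    "Facilities system": {"building name", "floor name", "cube number", "access code"},
}

def map_sources(needed, sources):
    """Split needed data into items with no source, exactly one source, or several."""
    missing, single, multiple = set(), {}, {}
    for item in needed:
        providers = sorted(src for src, provided in sources.items() if item in provided)
        if not providers:
            missing.add(item)
        elif len(providers) == 1:
            single[item] = providers[0]
        else:
            multiple[item] = providers
    return missing, single, multiple

# The phonebook's needs from Table 2.1.
phonebook = {"name", "email address", "phone number", "user id", "password",
             "department number", "manager", "mail stop"}
missing, single, multiple = map_sources(phonebook, SOURCES)
print("no source:", sorted(missing))
print("needs a data master decision:", multiple)
```

With these example tables, the mail stop has no source, and the name is provided by both the email system and the human resources database.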

2.3.3. Characterizing the Directory Data

All of the data identified to include in the directory can be characterized according to the following general points:
  • Format
  • Size
  • Number of occurrences in various applications
  • Data owner
  • Relationship to other directory data
Study each kind of data to include in the directory to determine what characteristics it shares with the other pieces of data. This helps save time during the schema design stage, described in more detail in Chapter 3, Designing the Directory Schema.
It is a good idea to record these characteristics in a table similar to Table 2.3, “Directory Data Characteristics”.

Table 2.3. Directory Data Characteristics

Data          | Format       | Size            | Owner           | Related to
Employee name | Text string  | 128 characters  | Human resources | User's entry
Fax number    | Phone number | 14 digits       | Facilities      | User's entry
Email address | Text         | Many characters | IS department   | User's entry
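The format and size columns can double as validation rules when data is later imported. The following is a minimal sketch under assumptions: the keys `employeeName` and `faxNumber` are hypothetical labels, and the regular expressions are illustrative stand-ins for the "Text string" and "Phone number" formats (real LDAP telephone number syntax is more permissive).

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Characteristic:
    data: str
    pattern: str    # illustrative regular expression for the format
    max_size: int   # maximum length in characters
    owner: str

# Two rows of Table 2.3; the patterns are assumptions, not schema definitions.
CHARACTERISTICS = {
    "employeeName": Characteristic("Employee name", r".+", 128, "Human resources"),
    "faxNumber": Characteristic("Fax number", r"[0-9]+", 14, "Facilities"),
}

def check(attribute, value):
    """Return a problem description, or None if the value fits its characteristics."""
    c = CHARACTERISTICS[attribute]
    if len(value) > c.max_size:
        return f"{c.data}: longer than {c.max_size} characters"
    if not re.fullmatch(c.pattern, value):
        return f"{c.data}: does not match the expected format"
    return None

print(check("faxNumber", "555-0100-99"))
```

The example value is short enough but contains hyphens, so it fails the digits-only format check.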

2.3.4. Determining Level of Service

The level of service provided depends on the expectations of the people who rely on directory-enabled applications. To determine the level of service each application expects, first determine how and when the application is used.
As the directory evolves, it may need to support a wide variety of service levels, from production to mission critical. It can be difficult to raise the level of service after the directory is deployed, so make sure the initial design can meet future needs.
For example, if the risk of total failure must be minimized, use a multi-master configuration, in which several suppliers exist for the same data.
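A rough way to see why several suppliers help is to estimate the chance that all of them are down at once. This sketch assumes each supplier fails independently with the same probability; that is a simplifying assumption, and failures with a shared cause (power, network, data center) make the real figure higher. `all_suppliers_down` is a hypothetical helper.

```python
def all_suppliers_down(p_down: float, suppliers: int) -> float:
    """Probability that every supplier is unavailable at the same time.

    Assumes each supplier is down with probability p_down, independently of
    the others. Correlated failures break this assumption.
    """
    if not 0.0 <= p_down <= 1.0 or suppliers < 1:
        raise ValueError("p_down must be in [0, 1] and suppliers must be >= 1")
    return p_down ** suppliers

for n in (1, 2, 3):
    print(f"{n} supplier(s): {all_suppliers_down(0.01, n):.6f}")
```

Each additional supplier shrinks the probability of total failure by another factor of `p_down`, but the result never reaches zero, which is why the design goal is to reduce rather than remove that risk.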

2.3.5. Considering a Data Master

A data master is a server that is the master source of data. Any time the same information is stored in multiple locations, data integrity can degrade. A data master makes sure all information stored in multiple locations is consistent and accurate. There are several scenarios that require a data master:
  • Replication among Directory Servers
  • Synchronization between Directory Server and Active Directory
  • Independent client applications that access the Directory Server data
Consider the master source of the data if there are applications that communicate indirectly with the directory. Keep the processes for changing data, and the places from which the data can be changed, as simple as possible. After deciding on a single site to master a piece of data, use the same site to master all of the other data contained there. A single site simplifies troubleshooting if the databases lose synchronization across the enterprise.
There are different ways to implement data mastering:
  • Master the data in both the directory and all applications that do not use the directory.
    Maintaining multiple data masters does not require custom scripts for moving data in and out of the directory and the other applications. However, if data changes in one place, someone has to change it on all the other sites. Maintaining master data in the directory and all applications not using the directory can result in data being unsynchronized across the enterprise (which is what the directory is supposed to prevent).
  • Master the data in some application other than the directory, and then write scripts, programs, or gateways to import that data into the directory.
    Mastering data in non-directory applications makes the most sense if there are one or two applications that are already used to master data, and the directory will be used only for lookups (for example, for online corporate telephone books).
How master copies of the data are maintained depends on the specific directory needs. However, regardless of how data masters are maintained, keep it simple and consistent. For example, do not attempt to master data in multiple sites, then automatically exchange data between competing applications. Doing so leads to a "last change wins" scenario and increases the administrative overhead.
For example, suppose the directory is going to manage an employee's home telephone number, and both the LDAP directory and a human resources database store this information. Because the human resources application is LDAP-enabled, an application can be written that automatically transfers data from the LDAP directory to the human resources database, and vice versa.
Attempting to master changes to that employee's telephone number in both the LDAP directory and the human resources database, however, means that the last place where the telephone number was changed overwrites the information in the other database. This is acceptable only if the last application to write the data had the correct information.
If that information was out of date, perhaps because the human resources data were reloaded from a backup, then the correct telephone number in the LDAP directory is overwritten with the stale value.
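The failure mode can be modeled in a few lines. This is a hypothetical model of the bidirectional transfer script described above, not of Directory Server's own replication conflict resolution; the `Write` record, the logical timestamps, and the phone numbers are all illustrative.

```python
from dataclasses import dataclass

@dataclass
class Write:
    source: str
    value: str
    timestamp: int  # logical time of the change

def last_change_wins(writes):
    """Keep whichever write is newest, regardless of whether its value is correct."""
    return max(writes, key=lambda w: w.timestamp)

history = [
    # The employee corrects the number in the directory.
    Write("LDAP directory", "555-0142", timestamp=10),
    # Later, the HR database is restored from an old backup and its stale value syncs.
    Write("HR database (restored from backup)", "555-0199", timestamp=12),
]
winner = last_change_wins(history)
print(winner.source, winner.value)
```

The newest write wins, so the stale backup value replaces the correct number; nothing in the rule itself can tell which value is right.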
With multi-master replication, Directory Server can hold master copies of the same information on more than one server. Multiple masters keep changelogs and can resolve conflicts more safely. A limited number of Directory Servers are configured as masters that accept changes; they then replicate the data to replica servers, or consumer servers.[1] Having more than one data master server provides safe failover in the event that a server goes offline. For more information about replication and multi-master replication, see Chapter 7, Designing the Replication Process.
Synchronization allows Directory Server users, groups, attributes, and passwords to be integrated with Microsoft Active Directory users, groups, attributes, and passwords. With two directory services, decide whether they will handle the same information, what amount of that information will be shared, and which service will be the data master for that information. The best course is to choose a single application to master the data and allow the synchronization process to add, update, or delete the entries on the other service.
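The "single master, one-way propagation" approach can be sketched as computing the adds, updates, and deletes that make the other service match the master. This is only a conceptual sketch, not how Windows User Sync is implemented; entries are simplified to single-valued attribute dictionaries keyed by a hypothetical user ID, and `plan_sync` and `apply_sync` are hypothetical names.

```python
from typing import Dict

Entry = Dict[str, str]  # attribute name -> value (single-valued for simplicity)

def plan_sync(master: Dict[str, Entry], target: Dict[str, Entry]) -> dict:
    """Compute the changes that make target match master; nothing flows back."""
    return {
        "add": {uid: e for uid, e in master.items() if uid not in target},
        "update": {uid: e for uid, e in master.items()
                   if uid in target and target[uid] != e},
        "delete": sorted(uid for uid in target if uid not in master),
    }

def apply_sync(target: Dict[str, Entry], plan: dict) -> Dict[str, Entry]:
    """Return a new copy of target with the planned changes applied."""
    result = {uid: dict(e) for uid, e in target.items()}
    for uid, entry in {**plan["add"], **plan["update"]}.items():
        result[uid] = dict(entry)
    for uid in plan["delete"]:
        del result[uid]
    return result

master = {"jsmith": {"mail": "jsmith@example.com"}, "akim": {"mail": "akim@example.com"}}
target = {"jsmith": {"mail": "john@example.com"}, "olduser": {"mail": "old@example.com"}}
plan = plan_sync(master, target)
print(sorted(plan["add"]), sorted(plan["update"]), plan["delete"])
```

Because changes only ever flow from the master, a stale value on the other service is corrected on the next pass instead of overwriting the master.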

2.3.6. Determining Data Ownership

Data ownership refers to the person or organization responsible for making sure the data is up-to-date. During the data design phase, decide who can write data to the directory. The following are some common strategies for deciding data ownership:
  • Allow read-only access to the directory for everyone except a small group of directory content managers.
  • Allow individual users to manage some strategic subset of information for themselves.
    This subset of information might include their passwords, descriptive information about themselves and their role within the organization, their automobile license plate number, and contact information such as telephone numbers or office numbers.
  • Allow a person's manager to write to some strategic subset of that person's information, such as contact information or job title.
  • Allow an organization's administrator to create and manage entries for that organization.
    This approach allows an organization's administrators to function as the directory content managers.
  • Create roles that give groups of people read or write access privileges.
    For example, roles can be created for human resources, finance, or accounting. Give each of these roles read access, write access, or both to the data the group needs. This could include salary information, government identification numbers, and home phone numbers and addresses.
    For more information about roles and grouping entries, see Section 4.3, “Grouping Directory Entries”.
There may be multiple individuals who need to have write access to the same information. For example, an information systems (IS) or directory management group probably requires write access to employee passwords. It may also be desirable for employees themselves to have write access to their own passwords. Although multiple people will often have write access to the same information, try to keep this group small and easy to identify. Keeping the group small helps ensure data integrity.
For information on setting access control for the directory, see Chapter 9, Designing a Secure Directory.
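Before writing access control rules, it can help to state the ownership decisions precisely. The sketch below models several of the strategies above (content managers, self-service, manager writes, and roles) as a single check. It is a hypothetical policy model, not Directory Server syntax: the server itself enforces such rules through access control instructions (ACIs). The attribute names are standard inetOrgPerson attributes, but which ones fall into each subset is an assumption.

```python
# Hypothetical attribute subsets; adjust to the site's own ownership decisions.
SELF_WRITABLE = {"userPassword", "telephoneNumber", "roomNumber", "description"}
MANAGER_WRITABLE = {"telephoneNumber", "title"}
ROLE_WRITABLE = {"hr": {"homePhone", "homePostalAddress", "employeeNumber"}}

def can_write(actor, entry, attribute, content_managers=frozenset()):
    """Decide whether actor may modify attribute on entry under the example policy."""
    if actor["uid"] in content_managers:          # small group of content managers
        return True
    if actor["uid"] == entry["uid"] and attribute in SELF_WRITABLE:
        return True                               # users manage their own subset
    if actor["uid"] == entry.get("manager") and attribute in MANAGER_WRITABLE:
        return True                               # managers update some fields
    return any(attribute in ROLE_WRITABLE.get(role, set())
               for role in actor.get("roles", ()))  # role-based write access

alice = {"uid": "alice", "manager": "bob"}
print(can_write({"uid": "bob"}, alice, "title"))        # manager may set job title
print(can_write({"uid": "alice"}, alice, "title"))      # employee may not
```

Everything not explicitly granted is denied, which keeps the set of writers for each attribute small and easy to identify.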

2.3.7. Determining Data Access

After determining data ownership, decide who can read each piece of data. For example, employees' home phone numbers can be stored in the directory. This data may be useful for a number of organizations, including the employee's manager and human resources. Employees should be able to read this information for verification purposes. However, home contact information can be considered sensitive, so it probably should not be widely available across the enterprise.
For each piece of information stored in the directory, decide the following:
  • Can the data be read anonymously?
    The LDAP protocol supports anonymous access and allows easy lookups for common information such as office sites, email addresses, and business telephone numbers. However, anonymous access lets anyone who can reach the directory read that information. Consequently, use anonymous access sparingly.
  • Can the data be read widely across the enterprise?
    Access control can be set so that the client must log in to (or bind to) the directory to read specific information. Unlike anonymous access, this form of access control ensures that only members of the organization can view directory information. It also captures login information in the directory's access log so there is a record of who accessed the information.
    For more information about access controls, see Section 9.7, “Designing Access Control”.
  • Is there an identifiable group of people or applications that need to read the data?
    Anyone who has write privileges to the data generally also needs read access (with the exception of write access to passwords). There may also be data specific to a particular organization or project group. Identifying these access needs helps determine what groups, roles, and access controls the directory needs.
    For information about groups and roles, see Chapter 4, Designing the Directory Tree. For information about access controls, see Section 9.7, “Designing Access Control”.
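The three questions above amount to assigning each attribute a read level. The sketch below models that decision, including the earlier point that employees should be able to read their own home phone number. It is a hypothetical policy model, not Directory Server access control syntax; the attribute-to-level assignments and the `hr` group name are assumptions, and unlisted attributes default to deny.

```python
from enum import Enum
from typing import Optional

class ReadAccess(Enum):
    ANONYMOUS = "anyone, without binding"
    AUTHENTICATED = "any user who has bound to the directory"
    GROUP = "members of specific groups only"

# Hypothetical per-attribute read policy: (level, groups allowed for GROUP level).
READ_POLICY = {
    "mail": (ReadAccess.ANONYMOUS, None),
    "telephoneNumber": (ReadAccess.ANONYMOUS, None),
    "title": (ReadAccess.AUTHENTICATED, None),
    "homePhone": (ReadAccess.GROUP, {"hr"}),
}

def can_read(attribute, bound_dn: Optional[str], groups=frozenset(), self_entry=False):
    """Decide read access; bound_dn is None for an anonymous client."""
    level, allowed = READ_POLICY.get(attribute, (ReadAccess.GROUP, set()))
    if level is ReadAccess.ANONYMOUS:
        return True
    if bound_dn is None:           # everything else requires a bind
        return False
    if level is ReadAccess.AUTHENTICATED or self_entry:
        return True                # users can verify their own data
    return bool(allowed & set(groups))

print(can_read("homePhone", "uid=bob,ou=People,dc=example,dc=com"))
```

The example prints `False`: an authenticated user outside the `hr` group cannot read another employee's home phone number.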
Making these decisions for each piece of directory data defines a security policy for the directory. These decisions depend upon the nature of the site and the kinds of security already available at the site. For example, having a firewall or no direct access to the Internet means it is safer to support anonymous access than if the directory is placed directly on the Internet. Additionally, some information may only need access controls and authentication measures to restrict access adequately; other sensitive information may need to be encrypted within the database as it is stored.
In many countries, data protection laws govern how enterprises must maintain personal information and restrict who has access to the personal information. For example, the laws may prohibit anonymous access to addresses and phone numbers or may require that users have the ability to view and correct information in entries that represent them. Be sure to check with the organization's legal department to ensure that the directory deployment follows all necessary laws for the countries in which the enterprise operates.
The creation of a security policy and the way it is implemented is described in detail in Chapter 9, Designing a Secure Directory.


[1] In replication, a consumer server or replica server is a server that receives updates from a supplier server or hub server.