Introduction To System Administration
For Red Hat Enterprise Linux 4
Edition 2
Copyright © 2008 Red Hat, Inc
Abstract
Introduction
- Generic overview material -- This section discusses the topic of the chapter without going into details about a specific operating system, technology, or methodology.
- Red Hat Enterprise Linux-specific material -- This section addresses aspects of the topic related to Linux in general and Red Hat Enterprise Linux in particular.
- Additional resources for further study -- This section includes pointers to other Red Hat Enterprise Linux manuals, helpful websites, and books containing information applicable to the topic.
1. Architecture-specific Information
2. More to Come
2.1. Send in Your Feedback
Chapter 1. The Philosophy of System Administration
- Automate everything
- Document everything
- Communicate as much as possible
- Know your resources
- Know your users
- Know your business
- Security cannot be an afterthought
- Plan ahead
- Expect the unexpected
1.1. Automate Everything
- Free disk space checking and reporting
- Backups
- System performance data collection
- User account maintenance (creation, deletion, etc.)
- Business-specific functions (pushing new data to a Web server, running monthly/quarterly/yearly reports, etc.)
1.2. Document Everything
- "I will get around to it later."
- Unfortunately, this is usually not true. Even if a system administrator is not kidding themselves, the nature of the job is such that everyday tasks are usually too chaotic to "do it later." Even worse, the longer it is put off, the more that is forgotten, leading to a much less detailed (and therefore, less useful) document.
- "Why write it up? I will remember it."
- Unless you are one of those rare individuals with a photographic memory, no, you will not remember it. Or worse, you will remember only half of it, not realizing that you are missing the whole story. This leads to wasted time either trying to relearn what you had forgotten or fixing what you had broken due to your incomplete understanding of the situation.
- "If I keep it in my head, they will not fire me -- I will have job security!"
- While this may work for a while, invariably it leads to less -- not more -- job security. Think for a moment about what may happen during an emergency. You may not be available; your documentation may save the day by letting someone else resolve the problem in your absence. And never forget that emergencies tend to be times when upper management pays close attention. In such cases, it is better to have your documentation be part of the solution than it is for your absence to be part of the problem. In addition, if you are part of a small but growing organization, eventually there will be a need for another system administrator. How can this person learn to back you up if everything is in your head? Worse yet, not documenting may make you so indispensable that you might not be able to advance your career. You could end up working for the very person who was hired to assist you.
- Policies
- Policies are written to formalize and clarify the relationship you have with your user community. They make it clear to your users how their requests for resources and/or assistance are handled. The nature, style, and method of disseminating policies to your community vary from organization to organization.
- Procedures
- Procedures are any step-by-step sequence of actions that must be taken to accomplish a certain task. Procedures to be documented can include backup procedures, user account management procedures, problem reporting procedures, and so on. As with automation, if a procedure is followed more than once, it is a good idea to document it.
- Changes
- A large part of a system administrator's career revolves around making changes -- configuring systems for maximum performance, tweaking scripts, modifying configuration files, and so on. All of these changes should be documented in some fashion. Otherwise, you could find yourself being completely confused about a change you made several months earlier. Some organizations use more complex methods for keeping track of changes, but in many cases a simple revision history at the start of the file being changed is all that is necessary. At a minimum, each entry in the revision history should contain:
- The name or initials of the person making the change
- The date the change was made
- The reason the change was made
This results in concise, yet useful entries:
ECB, 12-June-2002 -- Updated entry for new Accounting printer (to support the replacement printer's ability to print duplex)
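Entries in this format are easy to generate consistently with a small shell helper. The following is only a sketch: the function name revision_entry is hypothetical, the date layout is chosen to match the sample entry above, and the -d option assumes GNU date.

```shell
#!/bin/bash
# revision_entry INITIALS REASON [DATE]
# Prints one revision-history line: who made the change, when, and why.
# DATE is optional and is handed to GNU date's -d option; it defaults to today.
# (revision_entry is a hypothetical helper name, not a standard command.)
revision_entry() {
    local initials=$1 reason=$2 when=${3:-today}
    # LC_ALL=C keeps the month name in English regardless of the locale.
    printf '%s, %s -- %s\n' \
        "$initials" "$(LC_ALL=C date -d "$when" +%d-%B-%Y)" "$reason"
}
```

For a configuration file, the output could be prefixed with the file's comment character and pasted at the top of the file being changed.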
1.3. Communicate as Much as Possible
- Tell your users what you are going to do
- Tell your users what you are doing
- Tell your users what you have done
1.3.1. Tell Your Users What You Are Going to Do
- The nature of the change
- When it will take place
- Why it is happening
- Approximately how long it should take
- The impact (if any) that the users can expect due to the change
- Contact information should they have any questions or concerns
System Downtime Scheduled for Friday Night
Starting this Friday at 6pm (midnight for our associates in Berlin), all financial applications will be unavailable for a period of approximately four hours.
During this time, changes to both the hardware and software on the Finance database server will be performed. These changes should greatly reduce the time required to run the Accounts Payable and Accounts Receivable applications, and the weekly Balance Sheet report.
Other than the change in runtime, most people should notice no other change. However, those of you that have written your own SQL queries should be aware that the layout of some indices will change. This is documented on the company intranet website, on the Finance page.
Should you have any questions, comments, or concerns, please contact System Administration at extension 4321.
- Effectively communicate the start and duration of any downtime that might be involved in the change.
- Make sure you give the time of the change in such a way that it is useful to all users, no matter where they may be located.
- Use terms that your users understand. The people impacted by this work do not care that the new CPU module is a 2GHz unit with twice as much L2 cache, or that the database is being placed on a RAID 5 logical volume.
1.3.2. Tell Your Users What You Are Doing
System Downtime Scheduled for Tonight
Reminder: The system downtime announced this past Monday will take place as scheduled tonight at 6pm (midnight for the Berlin office). You can find the original announcement on the company intranet website, on the System Administration page.
Several people have asked whether they should stop working early tonight to make sure their work is backed up prior to the downtime. This will not be necessary, as the work being done tonight will not impact any work done on your personal workstations.
Remember, those of you that have written your own SQL queries should be aware that the layout of some indices will change. This is documented on the company intranet website, on the Finance page.
1.3.3. Tell Your Users What You Have Done
System Downtime Complete
The system downtime scheduled for Friday night (refer to the System Administration page on the company intranet website) has been completed. Unfortunately, hardware issues prevented one of the tasks from being completed. Due to this, the remaining tasks took longer than the originally-scheduled four hours. Instead, all systems were back in production by midnight (6am Saturday for the Berlin office).
Because of the remaining hardware issues, performance of the AP, AR, and the Balance Sheet report will be slightly improved, but not to the extent originally planned. A second downtime will be announced and scheduled as soon as the issues that prevented completion of the task have been resolved.
Please note that the downtime did change some database indices; people that have written their own SQL queries should consult the Finance page on the company intranet website.
Please contact System Administration at extension 4321 with any questions.
1.4. Know Your Resources
- System resources, such as available processing power, memory, and disk space
- Network bandwidth
- Available money in the IT budget
- The services of operations personnel, other system administrators, or even an administrative assistant
- Time (often of critical importance when the time involves things such as the amount of time during which system backups may take place)
- Knowledge (whether it is stored in books, system documentation, or the brain of a person that has worked at the company for the past twenty years)
1.5. Know Your Users
1.6. Know Your Business
- Applications that must be run within certain time frames, such as at the end of a month, quarter, or year
- The times during which system maintenance may be done
- New technologies that could be used to resolve long-standing business problems
1.7. Security Cannot be an Afterthought
- The nature of possible threats to each of the systems under your care
- The location, type, and value of the data on those systems
- The type and frequency of authorized access to the systems
1.7.1. The Risks of Social Engineering
1.8. Plan Ahead
- An offhand mention of a new project gearing up during that boring weekly staff meeting is a sure sign that you will likely need to support new users in the near future
- Talk of an impending acquisition means that you may end up being responsible for new (and possibly incompatible) systems in one or more remote locations
1.9. Expect the Unexpected
1.10. Red Hat Enterprise Linux-Specific Information
1.10.1. Automation
The cron and at commands are most commonly used in these roles.
cron can schedule the execution of commands or scripts for recurring intervals ranging in length from minutes to months. The crontab command is used to manipulate the files controlling the cron daemon that actually schedules each cron job for execution.
The at command (and the closely related command batch) are more appropriate for scheduling the execution of one-time scripts or commands. These commands implement a rudimentary batch subsystem consisting of multiple queues with varying scheduling priorities. The priorities are known as niceness levels (due to the name of the command -- nice). Both at and batch are perfect for tasks that must start at a given time but are not time-critical in terms of finishing.
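The difference between a recurring cron job and a one-time at job can be illustrated with a short sketch. The helper name cron_entry and the script path /usr/local/sbin/nightly-report are hypothetical and used only for illustration; the commands that would change the system are shown as comments.

```shell
#!/bin/bash
# cron_entry MINUTE HOUR COMMAND
# Prints a crontab(5) line that runs COMMAND once a day at HOUR:MINUTE.
# The five time fields are: minute, hour, day of month, month, day of week.
# (cron_entry is a hypothetical helper, not part of cron itself.)
cron_entry() {
    printf '%s %s * * * %s\n' "$1" "$2" "$3"
}

# Recurring job: append the entry to the current user's crontab.
#   (crontab -l 2>/dev/null; cron_entry 30 2 /usr/local/sbin/nightly-report) | crontab -
#
# One-time job: hand the same command to at, to run once at 23:00.
#   echo /usr/local/sbin/nightly-report | at 23:00
```

The crontab line describes a schedule that repeats until the entry is removed, while the at job is discarded after it has run once.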
- The bash command shell
- The perl scripting language
- The python scripting language
Scripts written using the bash shell tend to make more extensive use of the many small utility programs (for example, to perform character string manipulation), while perl scripts perform more of these types of operations using features built into the language itself. A script written using python can fully exploit the language's object-oriented capabilities, making complex scripts more easily extensible.
Shell scripting therefore requires familiarity with the many utility programs (such as grep and sed) that are part of Red Hat Enterprise Linux. Learning perl (and python), on the other hand, tends to be a more "self-contained" process. However, many perl language constructs are based on the syntax of various traditional UNIX utility programs, and as such are familiar to those Red Hat Enterprise Linux system administrators with shell scripting experience.
1.10.2. Documentation and Communication
- The gedit text editor
- The Emacs text editor
- The Vim text editor
vim and Emacs are primarily text-based in nature.
1.10.3. Security
bob to write and group finance to read the file.
syslogd, which can log system information locally (normally to files in the /var/log/ directory) or to a remote system (which acts as a dedicated log server for multiple computers).
1.11. Additional Resources
1.11.1. Installed Documentation
- crontab(1) and crontab(5) man pages -- Learn how to schedule commands and scripts for automatic execution at regular intervals.
- at(1) man page -- Learn how to schedule commands and scripts for execution at a later time.
- bash(1) man page -- Learn more about the default shell and shell script writing.
- perl(1) man page -- Review pointers to the many man pages that make up perl's online documentation.
- python(1) man page -- Learn more about options, files, and environment variables controlling the Python interpreter.
- gedit(1) man page and menu entry -- Learn how to edit text files with this graphical text editor.
- emacs(1) man page -- Learn more about this highly-flexible text editor, including how to run its online tutorial.
- vim(1) man page -- Learn how to use this powerful text editor.
- Mozilla menu entry -- Learn how to edit HTML files, read mail, and browse the Web.
- evolution(1) man page and menu entry -- Learn how to manage your email with this graphical email client.
- mutt(1) man page and files in /usr/share/doc/mutt-<version> -- Learn how to manage your email with this text-based email client.
- pam(8) man page and files in /usr/share/doc/pam-<version> -- Learn how authentication takes place under Red Hat Enterprise Linux.
1.11.2. Useful Websites
- http://www.kernel.org/pub/linux/libs/pam/ -- The Linux-PAM project homepage.
- http://www.usenix.org/ -- The USENIX homepage. A professional organization dedicated to bringing together computer professionals of all types and fostering improved communication and innovation.
- http://www.sage.org/ -- The System Administrators Guild homepage. A USENIX special technical group that is a good resource for all system administrators responsible for Linux (or Linux-like) operating systems.
- http://www.python.org/ -- The Python Language Website. An excellent site for learning more about Python.
- http://www.perl.org/ -- The Perl Mongers Website. A good place to start learning about Perl and connecting with the Perl community.
- http://www.rpm.org/ -- The RPM Package Manager homepage. The most comprehensive website for learning about RPM.
1.11.3. Related Books
- The Reference Guide; Red Hat, Inc -- Provides an overview of locations of key system files, user and group settings, and PAM configuration.
- The Security Guide; Red Hat, Inc -- Contains a comprehensive discussion of many security-related issues for Red Hat Enterprise Linux system administrators.
- The System Administrators Guide; Red Hat, Inc -- Includes chapters on managing users and groups, automating tasks, and managing log files.
- Linux Administration Handbook by Evi Nemeth, Garth Snyder, and Trent R. Hein; Prentice Hall -- Provides a good section on the policies and politics side of system administration, including several "what-if" discussions concerning ethics.
- Linux System Administration: A User's Guide by Marcel Gagne; Addison Wesley Professional -- Contains a good chapter on automating various tasks.
- Solaris System Management by John Philcox; New Riders Publishing -- Although not specifically written for Red Hat Enterprise Linux (or even Linux in general), and using the term "system manager" instead of "system administrator," this book provides a 70-page overview of the many roles that system administrators play in a typical organization.
Chapter 2. Resource Monitoring
2.1. Basic Concepts
- CPU power
- Bandwidth
- Memory
- Storage
- The system is currently experiencing performance problems at least part of the time and you would like to improve its performance.
- The system is currently running well and you would like it to stay that way.
2.2. System Performance Monitoring
- Monitoring to identify the nature and scope of the resource shortages that are causing the performance problems
- The data produced from monitoring is analyzed and a course of action (normally performance tuning and/or the procurement of additional hardware) is taken to resolve the problem
- Monitoring to ensure that the performance problem has been resolved
2.3. Monitoring System Capacity
- The monitoring is done on a more-or-less continuous basis
- The monitoring is usually not as detailed
2.4. What to Monitor?
- How much free space is available?
- How many I/O operations on average does it perform each second?
- How long on average does it take each I/O operation to be completed?
- How many of those I/O operations are reads? How many are writes?
- What is the average amount of data read/written with each I/O?
2.4.1. Monitoring CPU Power
- User Versus System
- The percentage of time spent performing user-level processing versus system-level processing can point out whether a system's load is primarily due to running applications or due to operating system overhead. High user-level percentages tend to be good (assuming users are not experiencing unsatisfactory performance), while high system-level percentages tend to point toward problems that will require further investigation.
- Context Switches
- A context switch happens when the CPU stops running one process and starts running another. Because each context switch requires the operating system to take control of the CPU, excessive context switches and high levels of system-level CPU consumption tend to go together.
- Interrupts
- As the name implies, interrupts are situations where the processing being performed by the CPU is abruptly changed. Interrupts generally occur due to hardware activity (such as an I/O device completing an I/O operation) or due to software (such as software interrupts that control application processing). Because interrupts must be serviced at a system level, high interrupt rates lead to higher system-level CPU consumption.
- Runnable Processes
- A process may be in different states. For example, it may be:
- Waiting for an I/O operation to complete
- Waiting for the memory management subsystem to handle a page fault
In these cases, the process has no need for the CPU.
However, eventually the process state changes, and the process becomes runnable. As the name implies, a runnable process is one that is capable of getting work done as soon as it is scheduled to receive CPU time. However, if more than one process is runnable at any given time, all but one[4] of the runnable processes must wait for their turn at the CPU. By monitoring the number of runnable processes, it is possible to determine how CPU-bound your system is.
2.4.2. Monitoring Bandwidth
- Bytes received/sent
- Network interface statistics provide an indication of the bandwidth utilization of one of the more visible buses -- the network.
- Interface counts and rates
- These network-related statistics can give indications of excessive collisions, transmit and receive errors, and more. Through the use of these statistics (particularly if the statistics are available for more than one system on your network), it is possible to perform a modicum of network troubleshooting even before the more common network diagnostic tools are used.
- Transfers per Second
- Normally collected for block I/O devices, such as disk and high-performance tape drives, this statistic is a good way of determining whether a particular device's bandwidth limit is being reached. Due to their electromechanical nature, disk and tape drives can only perform so many I/O operations every second; their performance degrades rapidly as this limit is reached.
2.4.3. Monitoring Memory
- Page Ins/Page Outs
- These statistics make it possible to gauge the flow of pages from system memory to attached mass storage devices (usually disk drives). High rates for both of these statistics can mean that the system is short of physical memory and is thrashing, or spending more system resources on moving pages into and out of memory than on actually running applications.
- Active/Inactive Pages
- These statistics show how heavily memory-resident pages are used. A lack of inactive pages can point toward a shortage of physical memory.
- Free, Shared, Buffered, and Cached Pages
- These statistics provide additional detail over the more simplistic active/inactive page statistics. By using these statistics, it is possible to determine the overall mix of memory utilization.
- Swap Ins/Swap Outs
- These statistics show the system's overall swapping behavior. Excessive rates here can point to physical memory shortages.
2.4.4. Monitoring Storage
- Monitoring for sufficient disk space
- Monitoring for storage-related performance problems
- Free Space
- Free space is probably the one resource all system administrators watch closely; it would be a rare administrator that never checks on free space (or has some automated way of doing so).
- File System-Related Statistics
- These statistics (such as number of files/directories, average file size, etc.) provide additional detail over a single free space percentage. As such, these statistics make it possible for system administrators to configure the system to give the best performance, as the I/O load imposed by a file system full of many small files is not the same as that imposed by a file system filled with a single massive file.
- Transfers per Second
- This statistic is a good way of determining whether a particular device's bandwidth limitations are being reached.
- Reads/Writes per Second
- A slightly more detailed breakdown of transfers per second, these statistics allow the system administrator to more fully understand the nature of the I/O loads a storage device is experiencing. This can be critical, as some storage technologies have widely different performance characteristics for read versus write operations.
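The free space check described above is a natural candidate for automation. The following is a minimal sketch rather than a complete monitoring solution: the function name full_filesystems and the threshold are hypothetical, and the parsing assumes the POSIX output format produced by df -P, with mount points that contain no spaces.

```shell
#!/bin/bash
# full_filesystems THRESHOLD
# Reads `df -P` output on standard input and prints the mount point and
# usage of every file system whose capacity is at or above THRESHOLD percent.
# (full_filesystems is a hypothetical helper name.)
full_filesystems() {
    awk -v limit="$1" '
        NR > 1 {                    # skip the header line
            use = $5                # the Capacity column, for example "91%"
            sub(/%$/, "", use)      # strip the trailing percent sign
            if (use + 0 >= limit + 0)
                printf "%s is %s%% full\n", $6, use
        }'
}
```

A typical invocation would be df -P | full_filesystems 90, run periodically so that a nearly full file system is reported before it causes problems.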
2.5. Red Hat Enterprise Linux-Specific Information
- free
- top (and GNOME System Monitor, a more graphically oriented version of top)
- vmstat
- The Sysstat suite of resource monitoring tools
- The OProfile system-wide profiler
2.5.1. free
The free command displays system memory utilization. Here is an example of its output:
             total       used       free     shared    buffers     cached
Mem:        255508     240268      15240          0       7592      86188
-/+ buffers/cache:     146488     109020
Swap:       530136      26268     503868
The Mem: row displays physical memory utilization, while the Swap: row displays the utilization of the system swap space, and the -/+ buffers/cache: row displays used and free physical memory with the memory currently devoted to system buffers and cache counted as free.
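The figures in the -/+ buffers/cache: row follow directly from the Mem: row: used minus buffers minus cached (240268 - 7592 - 86188 = 146488), and free plus buffers plus cached (15240 + 7592 + 86188 = 109020). The sketch below repeats that arithmetic; the function name buffers_cache_row is hypothetical, and the column positions assume the output layout shown in the example.

```shell
#!/bin/bash
# buffers_cache_row
# Reads `free` output (in the layout shown in the example) on standard input
# and recomputes the "-/+ buffers/cache:" figures from the "Mem:" row.
# (buffers_cache_row is a hypothetical helper name.)
buffers_cache_row() {
    awk '$1 == "Mem:" {
        used = $3; free = $4; buffers = $6; cached = $7
        # Memory used by applications excludes buffers and cache;
        # memory available to applications includes them.
        printf "%d %d\n", used - buffers - cached, free + buffers + cached
    }'
}
```

The first number printed is the memory in use by applications, and the second is the memory that could be made available to them.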
Since free by default only displays memory utilization information once, it is only useful for very short-term monitoring, or quickly determining if a memory-related problem is currently in progress. Although free has the ability to repeatedly display memory utilization figures via its -s option, the output scrolls, making it difficult to detect changes in memory utilization.
Note
A better alternative to free -s would be to run free using the watch command. For example, to display memory utilization every two seconds (the default display interval for watch), use this command:
watch free
The watch command issues the free command every two seconds, updating by clearing the screen and writing the new output to the same screen location. This makes it much easier to determine how memory utilization changes over time, since watch creates a single updated view with no scrolling. You can control the delay between updates by using the -n option, and can cause any changes between updates to be highlighted by using the -d option, as in the following command:
watch -n 1 -d free
For more information, refer to the watch man page.
The watch command runs until interrupted with Ctrl+C. The watch command is something to keep in mind; it can come in handy in many situations.
2.5.2. top
While free displays only memory-related information, the top command does a little bit of everything. CPU utilization, process statistics, memory utilization -- top monitors it all. In addition, unlike the free command, top's default behavior is to run continuously; there is no need to use the watch command. Here is a sample display:
 14:06:32  up 4 days, 21:20,  4 users,  load average: 0.00, 0.00, 0.00
77 processes: 76 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   19.6%    0.0%    0.0%   0.0%     0.0%    0.0%  180.2%
           cpu00    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
           cpu01   19.6%    0.0%    0.0%   0.0%     0.0%    0.0%   80.3%
Mem:  1028548k av,  716604k used,  311944k free,       0k shrd,  131056k buff
                    324996k actv,  108692k in_d,   13988k in_c
Swap: 1020116k av,    5276k used, 1014840k free                  382228k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
17578 root      15   0 13456  13M  9020 S    18.5  1.3  26:35   1 rhn-applet-gu
19154 root      20   0  1176 1176   892 R     0.9  0.1   0:00   1 top
    1 root      15   0   168  160   108 S     0.0  0.0   0:09   0 init
    2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 migration/0
    3 root      RT   0     0    0     0 SW    0.0  0.0   0:00   1 migration/1
    4 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 keventd
    5 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0 ksoftirqd/0
    6 root      35  19     0    0     0 SWN   0.0  0.0   0:00   1 ksoftirqd/1
    9 root      15   0     0    0     0 SW    0.0  0.0   0:07   1 bdflush
    7 root      15   0     0    0     0 SW    0.0  0.0   1:19   0 kswapd
    8 root      15   0     0    0     0 SW    0.0  0.0   0:14   1 kscand
   10 root      15   0     0    0     0 SW    0.0  0.0   0:03   1 kupdated
   11 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 mdrecoveryd
The display can be changed while top is running. For example, top by default displays both idle and non-idle processes. To display only non-idle processes, press i; a second press returns to the default display mode.
Warning
Although top appears to be a simple display-only program, this is not the case. That is because top uses single character commands to perform various operations. For example, if you are logged in as root, it is possible to change the priority of, and even kill, any process on your system. Therefore, until you have reviewed top's help screen (type ? to display it), it is safest to only type q (which exits top).
2.5.2.1. The GNOME System Monitor -- A Graphical top
Like top, the GNOME System Monitor displays information related to overall system status, process counts, memory and swap utilization, and process-level statistics.

Figure 2.1. The GNOME System Monitor Process Listing Display
2.5.3. vmstat
With vmstat, it is possible to get an overview of process, memory, swap, I/O, system, and CPU activity in one line of numbers:
   procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0   5276 315000 130744 380184    1    1     2    24   14    50  1  1 47  0
- r -- The number of runnable processes waiting for access to the CPU
- b -- The number of processes in an uninterruptible sleep state
- swpd -- The amount of virtual memory used
- free -- The amount of free memory
- buff -- The amount of memory used for buffers
- cache -- The amount of memory used as page cache
- si -- The amount of memory swapped in from disk
- so -- The amount of memory swapped out to disk
- bi -- Blocks received from a block device (reads)
- bo -- Blocks sent to a block device (writes)
- in -- The number of interrupts per second
- cs -- The number of context switches per second
- us -- The percentage of the time the CPU ran user-level code
- sy -- The percentage of the time the CPU ran system-level code
- id -- The percentage of the time the CPU was idle
- wa -- The percentage of the time the CPU was idle while waiting for I/O
When vmstat is run without any options, only one line is displayed. This line contains averages, calculated from the time the system was last booted.
vmstat can also repeatedly display resource utilization data at set intervals. For example, the command vmstat 1 displays one new line of utilization data every second, while the command vmstat 1 10 displays one new line per second, but only for the next ten seconds.
vmstat can be used to quickly determine resource utilization and performance issues. But to gain more insight into those issues, a different kind of tool is required -- a tool capable of more in-depth data collection and analysis.
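When vmstat is left running at an interval, a short filter can pick out the samples in which processes were queued for the CPU. This is only a sketch: the function name busy_samples is hypothetical, and the field positions (r in column 1, us, sy, and id in columns 13 through 15) assume the sixteen-column layout shown in the sample output, which may differ in other versions of vmstat.

```shell
#!/bin/bash
# busy_samples LIMIT
# Reads vmstat output on standard input and prints the run queue length and
# CPU percentages of every sample whose r column (field 1) exceeds LIMIT.
# Header lines are skipped because their first field is not a number.
# (busy_samples is a hypothetical helper name.)
busy_samples() {
    awk -v limit="$1" '
        $1 ~ /^[0-9]+$/ && $1 + 0 > limit + 0 {
            print "r=" $1, "us=" $13, "sy=" $14, "id=" $15
        }'
}
```

For example, vmstat 1 10 | busy_samples 2 would report only those seconds during which more than two processes were runnable.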
2.5.4. The Sysstat Suite of Resource Monitoring Tools
- iostat
- Displays an overview of CPU utilization, along with I/O statistics for one or more disk drives.
- mpstat
- Displays more in-depth CPU statistics.
- sadc
- Known as the system activity data collector, sadc collects system resource utilization information and writes it to a file.
- sar
- Producing reports from the files created by sadc, sar reports can be generated interactively or written to a file for more intensive analysis.
2.5.4.1. The iostat command
The iostat command at its most basic provides an overview of CPU and disk I/O statistics:
Linux 2.4.20-1.1931.2.231.2.10.ent (pigdog.example.com)      07/11/2003

avg-cpu:  %user   %nice    %sys   %idle
           6.11    2.56    2.15   89.18

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev3-0            1.68        15.69        22.42   31175836   44543290
iostat displays an overview of the system's average CPU utilization since the last reboot. The CPU utilization report includes the following percentages:
- Percentage of time spent in user mode (running applications, etc.)
- Percentage of time spent in user mode (for processes that have altered their scheduling priority using nice(2))
- Percentage of time spent in kernel mode
- Percentage of time spent idle
The device report that follows includes the following information for each device:
- The device specification, displayed as dev<major-number>-<sequence-number>, where <major-number> is the device's major number[6], and <sequence-number> is a sequence number starting at zero.
- The number of transfers (or I/O operations) per second.
- The number of 512-byte blocks read per second.
- The number of 512-byte blocks written per second.
- The total number of 512-byte blocks read.
- The total number of 512-byte blocks written.
For more information, refer to the iostat(1) man page.
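Because iostat reports in 512-byte blocks, it is often convenient to convert the figures to kilobytes: two blocks make one kilobyte, so the 15.69 blocks read per second in the sample output correspond to roughly 7.8 KB per second. The helper below is a small sketch of that conversion; the name blocks_to_kb is hypothetical.

```shell
#!/bin/bash
# blocks_to_kb BLOCKS
# Converts a count (or per-second rate) of 512-byte blocks to kilobytes,
# printed with two decimal places.
# (blocks_to_kb is a hypothetical helper name.)
blocks_to_kb() {
    awk -v blocks="$1" 'BEGIN { printf "%.2f\n", blocks * 512 / 1024 }'
}
```

The same function works for the cumulative Blk_read and Blk_wrtn totals as well as for the per-second rates.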
2.5.4.2. The mpstat command
The mpstat command produces the following output:
Linux 2.6.11-1.1369_FC4 (example.redhat.com)    02/07/2006

01:22:23 PM  CPU   %user   %nice %system %iowait    %irq   %soft   %idle    intr/s
01:22:23 PM  all    0.02    0.00    0.02    0.02    0.02    0.00   99.92   1011.86
mpstat allows the utilization for each CPU to be displayed individually, making it possible to determine how effectively each CPU is being used.
2.5.4.3. The sadc command
The sadc command collects system utilization data and writes it to a file for later analysis. By default, the data is written to files in the /var/log/sa/ directory. The files are named sa<dd>, where <dd> is the current day's two-digit date.
sadc is normally run by the sa1 script. This script is periodically invoked by cron via the file sysstat, which is located in /etc/cron.d/. The sa1 script invokes sadc for a single one-second measuring interval. By default, cron runs sa1 every 10 minutes, adding the data collected during each interval to the current /var/log/sa/sa<dd> file.
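The sa<dd> naming convention makes it straightforward to locate the data file for a particular day. The following sketch builds the file name from a date; the function name sa_file is hypothetical, and the -d option assumes GNU date.

```shell
#!/bin/bash
# sa_file [DATE]
# Prints the path of the sadc data file for DATE (default: today), following
# the /var/log/sa/sa<dd> naming convention. DATE is handed to GNU date's
# -d option, so values such as "yesterday" or "2003-07-11" are accepted.
# (sa_file is a hypothetical helper name.)
sa_file() {
    printf '/var/log/sa/sa%s\n' "$(date -d "${1:-today}" +%d)"
}
```

The result can be passed to sar with its -f option, for example sar -u -f "$(sa_file yesterday)", to report CPU utilization from the previous day's data.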
2.5.4.4. The sar command
The sar command produces system utilization reports based on the data collected by sadc. As configured in Red Hat Enterprise Linux, sar is automatically run to process the files collected by sadc. The report files are written to /var/log/sa/ and are named sar<dd>, where <dd> is the two-digit representation of the previous day's date.
sar is normally run by the sa2 script. This script is periodically invoked by cron via the file sysstat, which is located in /etc/cron.d/. By default, cron runs sa2 once a day at 23:53, allowing it to produce a report for the entire day's data.
2.5.4.4.1. Reading sar Reports
The sar report produced by the default Red Hat Enterprise Linux configuration consists of multiple sections, with each section containing a specific type of data, ordered by the time of day that the data was collected. Since sadc is configured to perform a one-second measurement interval every ten minutes, the default sar reports contain data in ten-minute increments, from 00:00 to 23:50[7].
Here is a sample section of a sar report, with the data from 00:30 through 23:40 removed to save space:
00:00:01          CPU     %user     %nice   %system     %idle
00:10:00          all      6.39      1.96      0.66     90.98
00:20:01          all      1.61      3.16      1.09     94.14
…
23:50:01          all     44.07      0.02      0.77     55.14
Average:          all      5.80      4.99      2.87     86.34
This section shows CPU utilization data similar to that displayed by iostat.
A section can also contain more than one line per time, as in this per-CPU utilization data from a two-CPU system:
00:00:01          CPU     %user     %nice   %system     %idle
00:10:00            0      4.19      1.75      0.70     93.37
00:10:00            1      8.59      2.18      0.63     88.60
00:20:01            0      1.87      3.21      1.14     93.78
00:20:01            1      1.35      3.12      1.04     94.49
…
23:50:01            0     42.84      0.03      0.80     56.33
23:50:01            1     45.29      0.01      0.74     53.95
Average:            0      6.00      5.01      2.74     86.25
Average:            1      5.61      4.97      2.99     86.43
The default sar configuration generates additional report sections; some are explored in upcoming chapters. For more information about the data contained in each section, refer to the sar(1) man page.
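Because each line of a sar section carries a timestamp, it is easy to search a section for the busiest interval of the day. The following sketch finds the interval with the lowest %idle value; the function name busiest_interval is hypothetical, and the parsing assumes the CPU utilization layout with 24-hour timestamps shown in the sample section.

```shell
#!/bin/bash
# busiest_interval
# Reads a sar CPU utilization section on standard input and prints the
# timestamp and %idle value of the interval with the lowest %idle on the
# "all" CPU line. The header and the Average: line are ignored.
# (busiest_interval is a hypothetical helper name.)
busiest_interval() {
    awk '$2 == "all" && $1 != "Average:" {
             # %idle is the last field on each data line.
             if (min == "" || $NF + 0 < min) { min = $NF + 0; when = $1 }
         }
         END { if (when != "") printf "%s %.2f\n", when, min }'
}
```

One way to use it is sar -u -f /var/log/sa/sa<dd> | busiest_interval, substituting the two-digit day for <dd>.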
2.5.5. OProfile
Warning
The opcontrol command supports the --list-events option, which displays the event types available for the currently installed processor, along with suggested minimum counter values for each.
2.5.5.1. OProfile Components
- Data collection software
- Data analysis software
- Administrative interface software
oprofile.o
kernel module, and the oprofiled
daemon.
- op_time -- Displays the number and relative percentages of samples taken for each executable file
- oprofpp -- Displays the number and relative percentage of samples taken by either function, individual instruction, or in gprof-style output
- op_to_source -- Displays annotated source code and/or assembly listings
- op_visualise -- Graphically displays collected data
opcontrol
command.
2.5.5.2. A Sample OProfile Session
opcontrol
to configure the type of data to be collected with the following command:
opcontrol \
    --vmlinux=/boot/vmlinux-`uname -r` \
    --ctr0-event=CPU_CLK_UNHALTED \
    --ctr0-count=6000
opcontrol
to:
- Direct OProfile to a copy of the currently running kernel (--vmlinux=/boot/vmlinux-`uname -r`)
- Specify that the processor's counter 0 is to be used and that the event to be monitored is the time when the CPU is executing instructions (--ctr0-event=CPU_CLK_UNHALTED)
- Specify that OProfile is to collect samples every 6000th time the specified event occurs (--ctr0-count=6000)
oprofile
kernel module is loaded by using the lsmod
command:
Module                  Size  Used by    Not tainted
oprofile               75616   1
…
/dev/oprofile/
) is mounted with the ls /dev/oprofile/
command:
0  buffer       buffer_watershed  cpu_type  enable       stats
1  buffer_size  cpu_buffer_size   dump      kernel_only
/root/.oprofile/daemonrc
file contains the settings required by the data collection software:
CTR_EVENT[0]=CPU_CLK_UNHALTED
CTR_COUNT[0]=6000
CTR_KERNEL[0]=1
CTR_USER[0]=1
CTR_UM[0]=0
CTR_EVENT_VAL[0]=121
CTR_EVENT[1]=
CTR_COUNT[1]=
CTR_KERNEL[1]=1
CTR_USER[1]=1
CTR_UM[1]=0
CTR_EVENT_VAL[1]=
one_enabled=1
SEPARATE_LIB_SAMPLES=0
SEPARATE_KERNEL_SAMPLES=0
VMLINUX=/boot/vmlinux-2.4.21-1.1931.2.349.2.2.entsmp
opcontrol
to actually start data collection with the opcontrol --start
command:
Using log file /var/lib/oprofile/oprofiled.log
Daemon started.
Profiler running.
oprofiled
daemon is running with the command ps x | grep -i oprofiled
:
32019 ?        S      0:00 /usr/bin/oprofiled --separate-lib-samples=0 …
32021 pts/0    S      0:00 grep -i oprofiled
oprofiled
command line displayed by ps
is much longer; however, it has been truncated here for formatting purposes.)
/var/lib/oprofile/samples/
directory. The files in this directory follow a somewhat unusual naming convention. Here is an example:
}usr}bin}less#0
/
) characters replaced by right curly brackets (}
), and ending with a pound sign (#
) followed by a number (in this case, 0
.) Therefore, the file used in this example represents data collected while /usr/bin/less
was running.
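The naming convention just described can be reversed mechanically. The sketch below is illustrative only (the helper name is hypothetical, and it assumes the executable path itself contains no right curly brackets); it recovers the executable path and the trailing number from a sample file name.

```python
def decode_sample_name(name):
    """Split an OProfile sample file name into (executable path, trailing number)."""
    # Everything before the last '#' is the encoded path; the rest is the number.
    encoded_path, _, number = name.rpartition("#")
    # Right curly brackets stand in for the slashes of the original path.
    return encoded_path.replace("}", "/"), int(number)

path, number = decode_sample_name("}usr}bin}less#0")
print(path, number)
```

For the file name used in this example, the decoded path is /usr/bin/less and the trailing number is 0.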
opcontrol --dump
command to force the samples to disk.
op_time
is used to display (in reverse order -- from highest number of samples to lowest) the samples that have been collected:
3321080  48.8021  0.0000 /boot/vmlinux-2.4.21-1.1931.2.349.2.2.entsmp
 761776  11.1940  0.0000 /usr/bin/oprofiled
 368933   5.4213  0.0000 /lib/tls/libc-2.3.2.so
 293570   4.3139  0.0000 /usr/lib/libgobject-2.0.so.0.200.2
 205231   3.0158  0.0000 /usr/lib/libgdk-x11-2.0.so.0.200.2
 167575   2.4625  0.0000 /usr/lib/libglib-2.0.so.0.200.2
 123095   1.8088  0.0000 /lib/libcrypto.so.0.9.7a
 105677   1.5529  0.0000 /usr/X11R6/bin/XFree86
…
less
is a good idea when producing a report interactively, as the reports can be hundreds of lines long. The example given here has been truncated for that reason.
<sample-count> <sample-percent> <unused-field> <executable-name>
- <sample-count> represents the number of samples collected
- <sample-percent> represents the percentage of all samples collected for this specific executable
- <unused-field> is a field that is not used
- <executable-name> represents the name of the file containing executable code for which samples were collected.
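For scripted post-processing, each report line can be split into the four fields described above. This is a minimal sketch; the function name and dictionary keys are hypothetical, and it assumes the whitespace-separated layout shown in the sample output.

```python
def parse_op_time_line(line):
    """Parse one op_time report line into its documented fields."""
    # Split on whitespace at most three times so the executable name stays whole.
    count, percent, _unused, executable = line.split(None, 3)
    return {
        "samples": int(count),
        "percent": float(percent),
        "executable": executable.strip(),
    }

entry = parse_op_time_line("761776 11.1940 0.0000 /usr/bin/oprofiled")
print(entry)
```

The unused third field is read and discarded rather than interpreted.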
XFree86
. It is worth noting that for the system running this sample session, the counter value of 6000 used represents the minimum value recommended by opcontrol --list-events
. This means that -- at least for this particular system -- OProfile overhead at its highest consumes roughly 11% of the CPU.
2.6. Additional Resources
2.6.1. Installed Documentation
- free(1) man page -- Learn how to display free and used memory statistics.
- top(1) man page -- Learn how to display CPU utilization and process-level statistics.
- watch(1) man page -- Learn how to periodically execute a user-specified program, displaying fullscreen output.
- GNOME System Monitor Help menu entry -- Learn how to graphically display process, CPU, memory, and disk space utilization statistics.
- vmstat(8) man page -- Learn how to display a concise overview of process, memory, swap, I/O, system, and CPU utilization.
- iostat(1) man page -- Learn how to display CPU and I/O statistics.
- mpstat(1) man page -- Learn how to display individual CPU statistics on multiprocessor systems.
- sadc(8) man page -- Learn how to collect system utilization data.
- sa1(8) man page -- Learn about a script that runs sadc periodically.
- sar(1) man page -- Learn how to produce system resource utilization reports.
- sa2(8) man page -- Learn how to produce daily system resource utilization report files.
- nice(1) man page -- Learn how to change process scheduling priority.
- oprofile(1) man page -- Learn how to profile system performance.
- op_visualise(1) man page -- Learn how to graphically display OProfile data.
2.6.2. Useful Websites
- http://people.redhat.com/alikins/system_tuning.html -- System Tuning Info for Linux Servers. A stream-of-consciousness approach to performance tuning and resource monitoring for servers.
- http://www.linuxjournal.com/article.php?sid=2396 -- Performance Monitoring Tools for Linux. This Linux Journal page is geared more toward the administrator interested in writing a customized performance graphing solution. Written several years ago, some of the details may no longer apply, but the overall concept and execution are sound.
- http://oprofile.sourceforge.net/ -- OProfile project website. Includes valuable OProfile resources, including pointers to mailing lists and the #oprofile IRC channel.
2.6.3. Related Books
- The System Administrators Guide; Red Hat, Inc -- Includes information on many of the resource monitoring tools described here, including OProfile.
- Linux Performance Tuning and Capacity Planning by Jason R. Fink and Matthew D. Sherer; Sams -- Provides more in-depth overviews of the resource monitoring tools presented here and includes others that might be appropriate for more specific resource monitoring needs.
- Red Hat Linux Security and Optimization by Mohammed J. Kabir; Red Hat Press -- Approximately the first 150 pages of this book discuss performance-related issues. This includes chapters dedicated to performance issues specific to network, Web, email, and file servers.
- Linux Administration Handbook by Evi Nemeth, Garth Snyder, and Trent R. Hein; Prentice Hall -- Provides a short chapter similar in scope to this book, but includes an interesting section on diagnosing a system that has suddenly slowed down.
- Linux System Administration: A User's Guide by Marcel Gagne; Addison Wesley Professional -- Contains a small chapter on performance monitoring and tuning.
ls -l
to display the desired device file in /dev/
. The major number appears after the device's group specification.
Chapter 3. Bandwidth and Processing Power
3.1. Bandwidth
- A set of electrical conductors used to make low-level communication possible
- A protocol to facilitate the efficient and reliable communication of data
- Buses
- Datapaths
3.1.1. Buses
- Standardized electrical characteristics (such as the number of conductors, voltage levels, signaling speeds, etc.)
- Standardized mechanical characteristics (such as the type of connector, card size, physical layout, etc.)
- Standardized protocol
3.1.1.1. Examples of Buses
- Mass storage buses (ATA and SCSI)
- Networks[9] (Ethernet and Token Ring)
- Memory buses (PC133 and Rambus®)
- Expansion buses (PCI, ISA, USB)
3.1.2. Datapaths
- Use a simpler protocol (if any)
- Have little (if any) mechanical standardization
3.1.3. Potential Bandwidth-Related Problems
- The bus or datapath may represent a shared resource. In this situation, high levels of contention for the bus reduce the effective bandwidth available for all devices on the bus. A SCSI bus with several highly-active disk drives would be a good example of this. The highly-active disk drives saturate the SCSI bus, leaving little bandwidth available for any other device on the same bus. The end result is that all I/O to any of the devices on this bus is slow, even if each device on the bus is not overly active.
- The bus or datapath may be a dedicated resource with a fixed number of devices attached to it. In this case, the electrical characteristics of the bus (and to some extent the nature of the protocol being used) limit the available bandwidth. This is usually more the case with datapaths than with buses. This is one reason why graphics adapters tend to perform more slowly when operating at higher resolutions and/or color depths -- for every screen refresh, there is more data that must be passed along the datapath connecting video memory and the graphics processor.
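To put rough numbers on the graphics adapter example, the sketch below computes how much data one full screen refresh represents. The two display modes are hypothetical values chosen only for illustration, and the calculation ignores any hardware-specific details.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Bytes needed to describe one full screen at the given mode."""
    return width * height * bits_per_pixel // 8

# Hypothetical display modes, not taken from any particular adapter.
low = frame_bytes(1024, 768, 16)
high = frame_bytes(1600, 1200, 32)
print(low, high, round(high / low, 2))
```

In this illustration the higher mode moves nearly five times as much data per refresh over the same datapath.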
3.1.4. Potential Bandwidth-Related Solutions
- Spread the load
- Reduce the load
- Increase the capacity
3.1.4.1. Spread the Load
3.1.4.2. Reduce the Load
3.1.4.3. Increase the Capacity
3.1.5. In Summary…
3.2. Processing Power
3.2.1. Facts About Processing Power
- Processing power is fixed
- Processing power cannot be stored
3.2.2. Consumers of Processing Power
- Applications
- The operating system itself
3.2.2.1. Applications
3.2.2.2. The Operating System
- Operating system housekeeping
- Process-related activities
3.2.3. Improving a CPU Shortage
- Reducing the load
- Increasing the capacity
3.2.3.1. Reducing the Load
- Reducing operating system overhead
- Reducing application overhead
- Eliminating applications entirely
3.2.3.1.1. Reducing Operating System Overhead
- Reducing the need for frequent process scheduling
- Reducing the amount of I/O performed
3.2.3.1.2. Reducing Application Overhead
3.2.3.1.3. Eliminating Applications Entirely
Note
3.2.3.2. Increasing the Capacity
3.2.3.2.1. Upgrading the CPU
3.2.3.2.2. Is Symmetric Multiprocessing Right for You?
3.3. Red Hat Enterprise Linux-Specific Information
3.3.1. Monitoring Bandwidth on Red Hat Enterprise Linux
Using vmstat, it is possible to determine if overall device activity is excessive by examining the bi and bo fields; in addition, taking note of the si and so fields gives you a bit more insight into how much disk activity is due to swap-related I/O:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0      0 248088 158636 480804    0    0     2     6  120   120 10  3 87  0
In this example, the bi field shows two blocks/second read from block devices (primarily disk drives), while the bo field shows six blocks/second written to block devices. We can determine that none of this activity was due to swapping, as the si and so fields both show a swap-related I/O rate of zero kilobytes/second.
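When the same check has to be repeated, the fields can be picked out by name rather than by eye. The following is a minimal sketch that pairs the vmstat column headings with the sample line shown above; the variable and function names are illustrative. Per the vmstat(8) man page, bi counts blocks received from block devices and bo counts blocks sent to them.

```python
VMSTAT_HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa"
VMSTAT_SAMPLE = "1 0 0 248088 158636 480804 0 0 2 6 120 120 10 3 87 0"

def parse_vmstat(header, line):
    """Map each vmstat column heading to the integer value beneath it."""
    return dict(zip(header.split(), map(int, line.split())))

fields = parse_vmstat(VMSTAT_HEADER, VMSTAT_SAMPLE)
swap_io = fields["si"] + fields["so"]  # zero means no swap-related I/O
print(fields["bi"], fields["bo"], swap_io)
```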
iostat
, it is possible to gain a bit more insight into disk-related activity:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (raptor.example.com)      07/21/2003

avg-cpu:  %user   %nice    %sys   %idle
           5.34    4.60    2.83   87.24

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev8-0            1.10         6.21        25.08     961342    3881610
dev8-1            0.00         0.00         0.00         16          0
The device listed as dev8-0 (/dev/sda, the first SCSI disk) averaged slightly more than one I/O operation per second (the tps field). Most of the I/O activity for this device was writes (the Blk_wrtn field), with slightly more than 25 blocks written each second (the Blk_wrtn/s field).
iostat
's -x
option:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (raptor.example.com)      07/21/2003

avg-cpu:  %user   %nice    %sys   %idle
           5.37    4.54    2.81   87.27

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
/dev/sda    13.57   2.86  0.36  0.77   32.20   29.05    16.10    14.53    54.52
/dev/sda1    0.17   0.00  0.00  0.00    0.34    0.00     0.17     0.00   133.40
/dev/sda2    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00    11.56
/dev/sda3    0.31   2.11  0.29  0.62    4.74   21.80     2.37    10.90    29.42
/dev/sda4    0.09   0.75  0.04  0.15    1.06    7.24     0.53     3.62    43.01
iostat
output is now displaying statistics on a per-partition level. By using df
to associate mount points with device names, it is possible to use this report to determine if, for example, the partition containing /home/
is experiencing an excessive workload.
iostat -x
is longer and contains more information than this; here is the remainder of each line (with the device column added for easier reading):
Device:    avgqu-sz   await  svctm  %util
/dev/sda       0.24   20.86   3.80   0.43
/dev/sda1      0.00  141.18 122.73   0.03
/dev/sda2      0.00    6.00   6.00   0.00
/dev/sda3      0.12   12.84   2.68   0.24
/dev/sda4      0.11   57.47   8.94   0.17
/dev/sda2
is the system swap partition; it is obvious from the many fields reading 0.00
for this partition that swapping is not a problem on this system.
/dev/sda1
. The statistics for this partition are unusual; the overall activity seems low, but why are the average I/O request size (the avgrq-sz
field), average wait time (the await
field), and the average service time (the svctm
field) so much larger than the other partitions? The answer is that this partition contains the /boot/
directory, which is where the kernel and initial ramdisk are stored. When the system boots, the read I/Os (notice that only the rsec/s
and rkB/s
fields are non-zero; no writing is done here on a regular basis) used during the boot process are for large numbers of blocks, resulting in the relatively long wait and service times iostat
displays.
sar
for a longer-term overview of I/O statistics; for example, sar -b
displays a general I/O report:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (raptor.example.com)      07/21/2003

12:00:00 AM       tps      rtps      wtps   bread/s   bwrtn/s
12:10:00 AM      0.51      0.01      0.50      0.25     14.32
12:20:01 AM      0.48      0.00      0.48      0.00     13.32
…
06:00:02 PM      1.24      0.00      1.24      0.01     36.23
Average:         1.11      0.31      0.80     68.14     34.79
iostat
's initial display, the statistics are grouped for all block devices.
sar -d
:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (raptor.example.com)      07/21/2003

12:00:00 AM       DEV       tps    sect/s
12:10:00 AM    dev8-0      0.51     14.57
12:10:00 AM    dev8-1      0.00      0.00
12:20:01 AM    dev8-0      0.48     13.32
12:20:01 AM    dev8-1      0.00      0.00
…
06:00:02 PM    dev8-0      1.24     36.25
06:00:02 PM    dev8-1      0.00      0.00
Average:       dev8-0      1.11    102.93
Average:       dev8-1      0.00      0.00
3.3.2. Monitoring CPU Utilization on Red Hat Enterprise Linux
sar
, it is possible to accurately determine how much CPU power is being consumed and by what.
top
is the first resource monitoring tool discussed in Chapter 2, Resource Monitoring to provide a more in-depth representation of CPU utilization. Here is a top
report from a dual-processor workstation:
  9:44pm  up 2 days, 2 min,  1 user,  load average: 0.14, 0.12, 0.09
90 processes: 82 sleeping, 1 running, 7 zombie, 0 stopped
CPU0 states:  0.4% user,  1.1% system,  0.0% nice, 97.4% idle
CPU1 states:  0.5% user,  1.3% system,  0.0% nice, 97.1% idle
Mem:  1288720K av, 1056260K used,  232460K free,       0K shrd,  145644K buff
Swap:  522104K av,       0K used,  522104K free                  469764K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
30997 ed        16   0  1100 1100   840 R     1.7  0.0   0:00 top
 1120 root       5 -10  249M 174M 71508 S     0.9 13.8 254:59 X
 1260 ed        15   0 54408  53M  6864 S     0.7  4.2  12:09 gnome-terminal
  888 root      15   0  2428 2428  1796 S     0.1  0.1   0:06 sendmail
 1264 ed        15   0 16336  15M  9480 S     0.1  1.2   1:58 rhn-applet-gui
    1 root      15   0   476  476   424 S     0.0  0.0   0:05 init
    2 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU0
    3 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU1
    4 root      15   0     0    0     0 SW    0.0  0.0   0:01 keventd
    5 root      34  19     0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU0
    6 root      34  19     0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU1
    7 root      15   0     0    0     0 SW    0.0  0.0   0:05 kswapd
    8 root      15   0     0    0     0 SW    0.0  0.0   0:00 bdflush
    9 root      15   0     0    0     0 SW    0.0  0.0   0:01 kupdated
   10 root      25   0     0    0     0 SW    0.0  0.0   0:00 mdrecoveryd
top
does), which represent the load average for the past 1, 5, and 15 minutes, indicating that the system in this example was not very busy.
top
itself; in other words, the one runnable process on this otherwise-idle system was top
taking a "picture" of itself.
Note
vmstat
, we obtain a slightly different understanding of our example system:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0      0 233276 146636 469808    0    0     7     7   14    27 10  3 87  0
 0  0      0 233276 146636 469808    0    0     0     0  523   138  3  0 96  0
 0  0      0 233276 146636 469808    0    0     0     0  557   385  2  1 97  0
 0  0      0 233276 146636 469808    0    0     0     0  544   343  2  0 97  0
 0  0      0 233276 146636 469808    0    0     0     0  517    89  2  0 98  0
 0  0      0 233276 146636 469808    0    0     0    32  518   102  2  0 98  0
 0  0      0 233276 146636 469808    0    0     0     0  516    91  2  1 98  0
 0  0      0 233276 146636 469808    0    0     0     0  516    72  2  0 98  0
 0  0      0 233276 146636 469808    0    0     0     0  516    88  2  0 97  0
 0  0      0 233276 146636 469808    0    0     0     0  516    81  2  0 97  0
This output was produced by the command vmstat 1 10, which samples the system once per second, ten times. At first, the CPU-related statistics (the us, sy, and id fields) seem similar to what top displayed, and maybe even appear a bit less detailed. However, unlike top, vmstat also gives us a bit of insight into how the CPU is being used.
system
fields, we notice that the CPU is handling about 500 interrupts per second on average and is switching between processes anywhere from 80 to nearly 400 times a second. If you think this seems like a lot of activity, think again, because the user-level processing (the us
field) is only averaging 2%, while system-level processing (the sy
field) is usually under 1%. Again, this is an idle system.
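The interrupt and context-switch figures quoted above can be checked against the sample output. This minimal sketch averages the in column and finds the range of the cs column; the variable names are illustrative. It skips the first row because, per the vmstat(8) man page, the first report gives averages since the last reboot rather than for the sampling interval.

```python
VMSTAT_ROWS = """\
1 0 0 233276 146636 469808 0 0 7 7 14 27 10 3 87 0
0 0 0 233276 146636 469808 0 0 0 0 523 138 3 0 96 0
0 0 0 233276 146636 469808 0 0 0 0 557 385 2 1 97 0
0 0 0 233276 146636 469808 0 0 0 0 544 343 2 0 97 0
0 0 0 233276 146636 469808 0 0 0 0 517 89 2 0 98 0
0 0 0 233276 146636 469808 0 0 0 32 518 102 2 0 98 0
0 0 0 233276 146636 469808 0 0 0 0 516 91 2 1 98 0
0 0 0 233276 146636 469808 0 0 0 0 516 72 2 0 98 0
0 0 0 233276 146636 469808 0 0 0 0 516 88 2 0 97 0
0 0 0 233276 146636 469808 0 0 0 0 516 81 2 0 97 0
"""

IN_COL, CS_COL = 10, 11  # zero-based positions of the "in" and "cs" columns

rows = [list(map(int, line.split())) for line in VMSTAT_ROWS.splitlines()]
interval_rows = rows[1:]  # drop the since-boot averages in the first row

avg_interrupts = sum(r[IN_COL] for r in interval_rows) / len(interval_rows)
cs_values = [r[CS_COL] for r in interval_rows]
print(round(avg_interrupts, 2), min(cs_values), max(cs_values))
```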
iostat
and mpstat
provide little additional information over what we have already experienced with top
and vmstat
. However, sar
produces a number of reports that can come in handy when monitoring CPU utilization.
sar -q
, which displays the run queue length, total number of processes, and the load averages for the past one and five minutes. Here is a sample:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (falcon.example.com)      07/21/2003

12:00:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5
12:10:00 AM         3       122      0.07      0.28
12:20:01 AM         5       123      0.00      0.03
…
09:50:00 AM         5       124      0.67      0.65
Average:            4       123      0.26      0.26
sar
report is produced by the command sar -u
:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (falcon.example.com)      07/21/2003

12:00:01 AM       CPU     %user     %nice   %system     %idle
12:10:00 AM       all      3.69     20.10      1.06     75.15
12:20:01 AM       all      1.73      0.22      0.80     97.25
…
10:00:00 AM       all     35.17      0.83      1.06     62.93
Average:          all      7.47      4.85      3.87     83.81
sar
makes the data available on an ongoing basis and is therefore more useful for obtaining long-term averages, or for the production of CPU utilization graphs.
sar -U
command can produce statistics for an individual processor or for all processors. Here is an example of output from sar -U ALL
:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (falcon.example.com)      07/21/2003

12:00:01 AM       CPU     %user     %nice   %system     %idle
12:10:00 AM         0      3.46     21.47      1.09     73.98
12:10:00 AM         1      3.91     18.73      1.03     76.33
12:20:01 AM         0      1.63      0.25      0.78     97.34
12:20:01 AM         1      1.82      0.20      0.81     97.17
…
10:00:00 AM         0     39.12      0.75      1.04     59.09
10:00:00 AM         1     31.22      0.92      1.09     66.77
Average:            0      7.61      4.91      3.86     83.61
Average:            1      7.33      4.78      3.88     84.02
The sar -w command reports on the number of context switches per second, making it possible to gain additional insight into where CPU cycles are being spent:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (falcon.example.com)      07/21/2003

12:00:01 AM   cswch/s
12:10:00 AM    537.97
12:20:01 AM    339.43
…
10:10:00 AM    319.42
Average:      1158.25
Two sar reports cover interrupt activity. The first (produced using the sar -I SUM command) displays a single "interrupts per second" statistic:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (falcon.example.com)      07/21/2003

12:00:01 AM      INTR    intr/s
12:10:00 AM       sum    539.15
12:20:01 AM       sum    539.49
…
10:40:01 AM       sum    539.10
Average:          sum    541.00
sar -I PROC
, it is possible to break down interrupt activity by processor (on multiprocessor systems) and by interrupt level (from 0 to 15):
Linux 2.4.21-1.1931.2.349.2.2.entsmp (pigdog.example.com)      07/21/2003

12:00:00 AM  CPU  i000/s  i001/s  i002/s  i008/s  i009/s  i011/s  i012/s
12:10:01 AM    0  512.01    0.00    0.00    0.00    3.44    0.00    0.00

12:10:01 AM  CPU  i000/s  i001/s  i002/s  i008/s  i009/s  i011/s  i012/s
12:20:01 AM    0  512.00    0.00    0.00    0.00    3.73    0.00    0.00
…
10:30:01 AM  CPU  i000/s  i001/s  i002/s  i003/s  i008/s  i009/s  i010/s
10:40:02 AM    0  512.00    1.67    0.00    0.00    0.00   15.08    0.00
Average:       0  512.00    0.42    0.00     N/A    0.00    6.03     N/A
i002/s
field illustrating the rate for interrupt level 2). If this were a multiprocessor system, there would be one line per sample period for each CPU.
sar adds or removes specific interrupt fields if no data is collected for that field. The example report above illustrates this: the end of the report includes interrupt levels (3 and 10) that were not present at the start of the sampling period.
Note
sar
reports -- sar -I ALL
and sar -I XALL
. However, the default configuration for the sadc
data collection utility does not collect the information necessary for these reports. This can be changed by editing the file /etc/cron.d/sysstat
, and changing this line:
*/10 * * * * root /usr/lib/sa/sa1 1 1
*/10 * * * * root /usr/lib/sa/sa1 -I 1 1
sadc
, and results in larger data file sizes. Therefore, make sure your system configuration can support the additional space consumption.
3.4. Additional Resources
3.4.1. Installed Documentation
- vmstat(8) man page -- Learn how to display a concise overview of process, memory, swap, I/O, system, and CPU utilization.
- iostat(1) man page -- Learn how to display CPU and I/O statistics.
- sar(1) man page -- Learn how to produce system resource utilization reports.
- sadc(8) man page -- Learn how to collect system utilization data.
- sa1(8) man page -- Learn about a script that runs sadc periodically.
- top(1) man page -- Learn how to display CPU utilization and process-level statistics.
3.4.2. Useful Websites
- http://people.redhat.com/alikins/system_tuning.html -- System Tuning Info for Linux Servers. A stream-of-consciousness approach to performance tuning and resource monitoring for servers.
- http://www.linuxjournal.com/article.php?sid=2396 -- Performance Monitoring Tools for Linux. This Linux Journal page is geared more toward the administrator interested in writing a customized performance graphing solution. Written several years ago, some of the details may no longer apply, but the overall concept and execution are sound.
3.4.3. Related Books
- The System Administrators Guide; Red Hat, Inc -- Includes a chapter on many of the resource monitoring tools described here.
- Linux Performance Tuning and Capacity Planning by Jason R. Fink and Matthew D. Sherer; Sams -- Provides more in-depth overviews of the resource monitoring tools presented here, and includes others that might be appropriate for more specific resource monitoring needs.
- Linux Administration Handbook by Evi Nemeth, Garth Snyder, and Trent R. Hein; Prentice Hall -- Provides a short chapter similar in scope to this book, but includes an interesting section on diagnosing a system that has suddenly slowed down.
- Linux System Administration: A User's Guide by Marcel Gagne; Addison Wesley Professional -- Contains a small chapter on performance monitoring and tuning.
Chapter 4. Physical and Virtual Memory
4.1. Storage Access Patterns
- Access tends to be sequential
- Access tends to be localized
4.2. The Storage Spectrum
- CPU registers
- Cache memory
- RAM
- Hard drives
- Off-line backup storage (tape, optical disk, etc.)
- Very fast (access times of a few nanoseconds)
- Low capacity (usually less than 200 bytes)
- Very limited expansion capabilities (a change in CPU architecture would be required)
- Expensive (more than one dollar/byte)
- Very slow (access times may be measured in days, if the backup media must be shipped long distances)
- Very high capacity (10s - 100s of gigabytes)
- Essentially unlimited expansion capabilities (limited only by the floorspace needed to house the backup media)
- Very inexpensive (fractional cents/byte)
4.2.1. CPU Registers
4.2.2. Cache Memory
4.2.2.1. Cache Levels
- L1 cache is often located directly on the CPU chip itself and runs at the same speed as the CPU
- L2 cache is often part of the CPU module, runs at CPU speeds (or nearly so), and is usually a bit larger and slower than L1 cache
4.2.3. Main Memory -- RAM
- Power connections (to operate the circuitry within the chip)
- Data connections (to enable the transfer of data into or out of the chip)
- Read/Write connections (to control whether data is to be stored into or retrieved from the chip)
- Address connections (to determine where in the chip the data should be read/written)
- The data to be stored is presented to the data connections.
- The address at which the data is to be stored is presented to the address connections.
- The read/write connection is set to write mode.
- The address of the desired data is presented to the address connections.
- The read/write connection is set to read mode.
- The desired data is read from the data connections.
Note
4.2.4. Hard Drives
- Access arm movement (5.5 milliseconds)
- Disk rotation (.1 milliseconds)
- Heads reading/writing data (.00014 milliseconds)
- Data transfer to/from the drive's electronics (.003 milliseconds)
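Adding up the four phases listed above shows where the time goes. The sketch below uses only the figures from the list; the dictionary keys are shortened labels chosen for illustration.

```python
# Per-phase times in milliseconds, taken from the list above.
phases_ms = {
    "access arm movement": 5.5,
    "disk rotation": 0.1,
    "heads reading/writing data": 0.00014,
    "data transfer to/from electronics": 0.003,
}

total_ms = sum(phases_ms.values())
slowest = max(phases_ms, key=phases_ms.get)
share = 100 * phases_ms[slowest] / total_ms
print(f"total {total_ms:.5f} ms; {slowest} accounts for {share:.1f}%")
```

The mechanical movement of the access arm accounts for nearly all of the roughly 5.6 milliseconds.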
Note
4.2.5. Off-Line Backup Storage
- Magnetic tape
- Optical disk
4.3. Basic Virtual Memory Concepts
4.3.1. Virtual Memory in Simple Terms
- The instruction is read from memory.
- The data required by the instruction is read from memory.
- After the instruction completes, the results of the instruction are written back to memory.
4.3.2. Backing Store -- the Central Tenet of Virtual Memory
4.4. Virtual Memory: The Details
4.4.1. Page Faults
- Find where the desired page resides on disk and read it in (this is normally the case if the page fault is for a page of code)
- Determine that the desired page is already in RAM (but not allocated to the current process) and reconfigure the MMU to point to it
- Point to a special page containing only zeros, and allocate a new page for the process only if the process ever attempts to write to the special page (this is called a copy on write page, and is often used for pages containing zero-initialized data)
- Get the desired page from somewhere else (which is discussed in more detail later)
4.4.2. The Working Set
- Writing modified pages to a dedicated area on a mass storage device (usually known as swapping or paging space)
- Marking unmodified pages as being free (there is no need to write these pages out to disk as they have not changed)
4.4.3. Swapping
- Pages from a process are swapped
- The process becomes runnable and attempts to access a swapped page
- The page is faulted back into memory (most likely forcing some other processes' pages to be swapped out)
- A short time later, the page is swapped out again
4.5. Virtual Memory Performance Implications
4.5.1. Worst Case Performance Scenario
- RAM -- It stands to reason that available RAM is low (otherwise there would be no need to page fault or swap).
- Disk -- While disk space might not be impacted, I/O bandwidth (due to heavy paging and swapping) would be.
- CPU -- The CPU is expending cycles doing the processing required to support memory management and setting up the necessary I/O operations for paging and swapping.
4.5.2. Best Case Performance Scenario
- RAM -- Sufficient RAM for all working sets with enough left over to handle any page faults[12]
- Disk -- Because of the limited page fault activity, disk I/O bandwidth would be minimally impacted
- CPU -- The majority of CPU cycles are dedicated to actually running applications, instead of running the operating system's memory management code
4.6. Red Hat Enterprise Linux-Specific Information
free
, it is possible to get a concise (if somewhat simplistic) overview of memory and swap utilization. Here is an example:
             total       used       free     shared    buffers     cached
Mem:       1288720     361448     927272          0      27844     187632
-/+ buffers/cache:     145972    1142748
Swap:       522104          0     522104
             total       used       free     shared    buffers     cached
Mem:        255088     246604       8484          0       6492     111320
-/+ buffers/cache:     128792     126296
Swap:       530136     111308     418828
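The -/+ buffers/cache row in both free listings can be reproduced from the Mem: row: buffers and cache are subtracted from the used figure and added to the free figure. The following minimal sketch (the function name is illustrative) performs that arithmetic on the two samples.

```python
def buffers_cache_adjusted(used, free, buffers, cached):
    """Return (used, free) with buffers and cache counted as available memory."""
    return used - buffers - cached, free + buffers + cached

# Values in kilobytes, copied from the Mem: rows of the two samples.
first = buffers_cache_adjusted(361448, 927272, 27844, 187632)
second = buffers_cache_adjusted(246604, 8484, 6492, 111320)
print(first, second)
```

The second system looks nearly out of memory on the Mem: row (8484 KB free), yet about 126 MB is reclaimable once buffers and cache are taken into account.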
free
, vmstat
has the benefit of displaying more than memory utilization statistics. Here is the output from vmstat 1 10
:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  0 111304   9728   7036 107204    0    0     6    10  120    24 10  2 89  0
 2  0 111304   9728   7036 107204    0    0     0     0  526  1653 96  4  0  0
 1  0 111304   9616   7036 107204    0    0     0     0  552  2219 94  5  1  0
 1  0 111304   9616   7036 107204    0    0     0     0  624   699 98  2  0  0
 2  0 111304   9616   7052 107204    0    0     0    48  603  1466 95  5  0  0
 3  0 111304   9620   7052 107204    0    0     0     0  768   932 90  4  6  0
 3  0 111304   9440   7076 107360   92    0   244     0  820  1230 85  9  6  0
 2  0 111304   9276   7076 107368    0    0     0     0  832  1060 87  6  7  0
 3  0 111304   9624   7092 107372    0    0    16     0  813  1655 93  5  2  0
 2  0 111304   9624   7108 107372    0    0     0   972 1189  1165 68  9 23  0
free
field) varies somewhat, and there is a bit of swap-related I/O (the si
and so
fields), but overall this system is running well. It is doubtful, however, how much additional workload it could handle, given the current memory utilization.
sar
, it is possible to examine this aspect of system performance in much more detail.
sar -r
report, we can examine memory and swap utilization more closely:
Linux 2.4.20-1.1931.2.231.2.10.ent (pigdog.example.com)      07/22/2003

12:00:01 AM kbmemfree kbmemused  %memused kbmemshrd kbbuffers  kbcached
12:10:00 AM    240468   1048252     81.34         0    133724    485772
12:20:00 AM    240508   1048212     81.34         0    134172    485600
…
08:40:00 PM    934132    354588     27.51         0     26080    185364
Average:       324346    964374     74.83         0     96072    467559
kbmemfree
and kbmemused
fields show the typical free and used memory statistics, with the percentage of memory used displayed in the %memused
field. The kbbuffers
and kbcached
fields show how many kilobytes of memory are allocated to buffers and the system-wide data cache.
The kbmemshrd field is always zero for systems (such as Red Hat Enterprise Linux) using the 2.4 Linux kernel.
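The %memused column follows directly from the two kilobyte columns beside it. This minimal sketch (the function name is illustrative) recomputes it for the three samples shown in the report.

```python
def percent_mem_used(kbmemfree, kbmemused):
    """Used memory as a percentage of total (free + used) memory."""
    return round(100 * kbmemused / (kbmemfree + kbmemused), 2)

# (kbmemfree, kbmemused) pairs copied from the sar -r sample rows.
samples = [(240468, 1048252), (240508, 1048212), (934132, 354588)]
percentages = [percent_mem_used(free, used) for free, used in samples]
print(percentages)
```

The results match the 81.34, 81.34, and 27.51 printed in the %memused column.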
12:00:01 AM kbswpfree kbswpused  %swpused
12:10:00 AM    522104         0      0.00
12:20:00 AM    522104         0      0.00
…
08:40:00 PM    522104         0      0.00
Average:       522104         0      0.00
kbswpfree
and kbswpused
fields show the amount of free and used swap space, in kilobytes, with the %swpused
field showing the swap space used as a percentage.
sar -W
report. Here is an example:
Linux 2.4.20-1.1931.2.231.2.10.entsmp (raptor.example.com)      07/22/2003

12:00:01 AM  pswpin/s pswpout/s
12:10:01 AM      0.15      2.56
12:20:00 AM      0.00      0.00
…
03:30:01 PM      0.42      2.56
Average:         0.11      0.37
On average, roughly one third as many pages were brought in from swap (pswpin/s) as were going out to swap (pswpout/s).
sar -B
report:
Linux 2.4.20-1.1931.2.231.2.10.entsmp (raptor.example.com)      07/22/2003

12:00:01 AM  pgpgin/s pgpgout/s  activepg  inadtypg  inaclnpg  inatarpg
12:10:00 AM      0.03      8.61    195393     20654     30352     49279
12:20:00 AM      0.01      7.51    195385     20655     30336     49275
…
08:40:00 PM      0.00      7.79     71236      1371      6760     15873
Average:       201.54    201.54    169367     18999     35146     44702
pgpgin/s
) and paged out to disk (pgpgout/s
). These statistics serve as a barometer of overall virtual memory activity.
The total of active pages (the activepg field) averages approximately 660MB[13].
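The conversion from a page count to megabytes is simple arithmetic. The sketch below assumes a page size of 4096 bytes, which is an assumption made here for illustration and should be checked against the system in question.

```python
PAGE_SIZE_BYTES = 4096  # assumed page size; verify for the system in question

def pages_to_mb(pages, page_size=PAGE_SIZE_BYTES):
    """Convert a page count to megabytes (1 MB = 1024 * 1024 bytes)."""
    return pages * page_size / (1024 * 1024)

average_active_mb = pages_to_mb(169367)  # Average: row of the activepg column
print(round(average_active_mb, 2))
```

Under that assumption the average of 169367 active pages works out to a little over 660MB.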
inadtypg
field shows how many inactive pages are dirty (modified) and may need to be written to disk. The inaclnpg
field, on the other hand, shows how many inactive pages are clean (unmodified) and do not need to be written to disk.
inatarpg
field represents the desired size of the inactive list. This value is calculated by the Linux kernel and is sized such that the inactive list remains large enough to act as a pool for page replacement purposes.
sar -R
report. Here is a sample report:
Linux 2.4.20-1.1931.2.231.2.10.entsmp (raptor.example.com)      07/22/2003

12:00:01 AM   frmpg/s   shmpg/s   bufpg/s   campg/s
12:10:00 AM     -0.10      0.00      0.12     -0.07
12:20:00 AM      0.02      0.00      0.19     -0.07
…
08:50:01 PM     -3.19      0.00      0.46      0.81
Average:         0.01      0.00     -0.00     -0.00
The statistics in this sar report are unique, in that they may be positive, negative, or zero. When positive, the value indicates the rate at which pages of this type are increasing. When negative, the value indicates the rate at which pages of this type are decreasing. A value of zero indicates that pages of this type are neither increasing nor decreasing.
frmpg/s
field) and nearly 1 page per second added to the page cache (the campg/s
field). The list of pages used as buffers (the bufpg/s
field) gained approximately one page every two seconds, while the shared memory page list (the shmpg/s
field) neither gained nor lost any pages.
4.7. Additional Resources
4.7.1. Installed Documentation
- free(1) man page -- Learn how to display free and used memory statistics.
- vmstat(8) man page -- Learn how to display a concise overview of process, memory, swap, I/O, system, and CPU utilization.
- sar(1) man page -- Learn how to produce system resource utilization reports.
- sa2(8) man page -- Learn how to produce daily system resource utilization report files.
4.7.2. Useful Websites
- http://people.redhat.com/alikins/system_tuning.html -- System Tuning Info for Linux Servers. A stream-of-consciousness approach to performance tuning and resource monitoring for servers.
- http://www.linuxjournal.com/article.php?sid=2396 -- Performance Monitoring Tools for Linux. This Linux Journal page is geared more toward the administrator interested in writing a customized performance graphing solution. Written several years ago, some of the details may no longer apply, but the overall concept and execution are sound.
4.7.3. Related Books
- The System Administrators Guide; Red Hat, Inc -- Includes a chapter on many of the resource monitoring tools described here.
- Linux Performance Tuning and Capacity Planning by Jason R. Fink and Matthew D. Sherer; Sams -- Provides more in-depth overviews of the resource monitoring tools presented here and includes others that might be appropriate for more specific resource monitoring needs.
- Red Hat Linux Security and Optimization by Mohammed J. Kabir; Red Hat Press -- Approximately the first 150 pages of this book discuss performance-related issues. This includes chapters dedicated to performance issues specific to network, Web, email, and file servers.
- Linux Administration Handbook by Evi Nemeth, Garth Snyder, and Trent R. Hein; Prentice Hall -- Provides a short chapter similar in scope to this book, but includes an interesting section on diagnosing a system that has suddenly slowed down.
- Linux System Administration: A User's Guide by Marcel Gagne; Addison Wesley Professional -- Contains a small chapter on performance monitoring and tuning.
- Essential System Administration (3rd Edition) by Aeleen Frisch; O'Reilly & Associates -- The chapter on managing system resources contains good overall information, with some Linux specifics included.
- System Performance Tuning (2nd Edition) by Gian-Paolo D. Musumeci and Mike Loukides; O'Reilly & Associates -- Although heavily oriented toward more traditional UNIX implementations, there are many Linux-specific references throughout the book.
Chapter 5. Managing Storage
5.1. An Overview of Storage Hardware
- Disk platters
- Data reading/writing device
- Access arms
5.1.1. Disk Platters
5.1.2. Data Reading/Writing Device
5.1.3. Access Arms
- Moving very quickly
- Moving very precisely
5.2. Storage Addressing Concepts
5.2.1. Geometry-Based Addressing
- Cylinder
- Head
- Sector
5.2.1.1. Cylinder
Cylinder | Head | Sector |
---|---|---|
1014 | X | X |
5.2.1.2. Head
Cylinder | Head | Sector |
---|---|---|
1014 | 2 | X |
5.2.1.3. Sector
Cylinder | Head | Sector |
---|---|---|
1014 | 2 | 12 |
5.2.1.4. Problems with Geometry-Based Addressing
5.2.2. Block-Based Addressing
5.3. Mass Storage Device Interfaces
- There are many different (mostly incompatible) interfaces
- Different interfaces have different performance and price characteristics
5.3.1. Historical Background
- FD-400
- An interface originally designed for the original 8-inch floppy disk drives in the mid-70s. Used a 44-conductor cable with a circuit board edge connector that supplied both power and data.
- SA-400
- Another floppy disk drive interface (this time originally developed in the late-70s for the then-new 5.25 inch floppy drive). Used a 34-conductor cable with a standard socket connector. A slightly modified version of this interface is still used today for 5.25 inch floppy and 3.5 inch diskette drives.
- IPI
- Standing for Intelligent Peripheral Interface, this interface was used on the 8 and 14-inch disk drives deployed on minicomputers of the 1970s.
- SMD
- A successor to IPI, SMD (which stands for Storage Module Device) was used on 8 and 14-inch minicomputer hard drives in the 70s and 80s.
- ST506/412
- A hard drive interface dating from the early 80s. Used in many personal computers of the day, this interface used two cables -- one 34-conductor and one 20-conductor.
- ESDI
- Standing for Enhanced Small Device Interface, this interface was considered a successor to ST506/412 with faster transfer rates and larger supported drive sizes. Dating from the mid-80s, ESDI used the same two-cable connection scheme as its predecessor.
5.3.2. Present-Day Industry-Standard Interfaces
- IDE/ATA
- SCSI
5.3.2.1. IDE/ATA
Note
Note
5.3.2.2. SCSI
- Bus width
- Bus speed
- Electrical characteristics
Note
Note
5.4. Hard Drive Performance Characteristics
- The hard drive's mechanical and electrical limitations
- The I/O load imposed by the system
5.4.1. Mechanical/Electrical Limitations
Note
5.4.1.1. Command Processing Time
- Interacting with the outside world via the hard drive's interface
- Controlling the operation of the rest of the hard drive's components, recovering from any error conditions that might arise
- Processing the raw data read from and written to the actual storage media
5.4.1.2. Heads Reading/Writing Data
5.4.1.3. Rotational Latency
5.4.1.4. Access Arm Movement
5.4.2. I/O Loads and Performance
- The amount of reads versus writes
- The number of current readers/writers
- The locality of reads/writes
5.4.2.1. Reads Versus Writes
5.4.2.2. Multiple Readers/Writers
5.4.2.3. Locality of Reads/Writes
Adjacent Cylinder Seek Time (ms) | Full-Stroke Seek Time (ms) |
---|---|
0.6 | 8.2 |
5.5. Making the Storage Usable
5.5.1. Partitions/Slices
5.5.1.1. Partition Attributes
- Partition geometry
- Partition type
- Partition type field
5.5.1.1.1. Geometry
5.5.1.1.2. Partition Type
- Primary partitions
- Extended partitions
- Logical partitions
5.5.1.1.2.1. Primary Partitions
5.5.1.1.2.2. Extended Partitions
5.5.1.1.3. Partition Type Field
5.5.2. File Systems
- File-based data storage
- Hierarchical directory (sometimes known as "folder") structure
- Tracking of file creation, access, and modification times
- Some level of control over the type of access allowed for a specific file
- Some concept of file ownership
- Accounting of space utilized
5.5.2.1. File-Based Storage
5.5.2.2. Hierarchical Directory Structure
5.5.2.3. Tracking of File Creation, Access, Modification Times
5.5.2.4. Access Control
- User identification
- Permitted action list
- Reading the file
- Writing the file
- Executing the file
5.5.2.5. Accounting of Space Utilized
5.5.3. Directory Structure
- More easily understood
- More flexibility in the future
accounting, people that work in engineering would have their directories under engineering, and so on.
engineering directory, it would be a straightforward process to:
- Procure the additional storage necessary to support engineering
- Back up everything under the engineering directory
- Restore the backup onto the new storage
- Rename the engineering directory on the original storage to something like engineering-archive (before deleting it entirely after running smoothly with the new configuration for a month)
- Make the necessary changes so that all engineering personnel can access their files on the new storage
5.5.4. Enabling Storage Access
5.6. Advanced Storage Technologies
5.6.1. Network-Accessible Storage
- Consolidation of storage
- Simplified administration
- Minimal client integration issues
- Minimal work on each client system
- Low per-client cost of entry
5.6.2. RAID-Based Storage
5.6.2.1. Basic Concepts
5.6.2.1.1. RAID Levels
- Level 0
- Level 1
- Level 5
5.6.2.1.1.1. RAID 0
- The first 4KB would be written to the first drive, into the first chunk
- The second 4KB would be written to the second drive, into the first chunk
- The last 4KB would be written to the first drive, into the second chunk
- Larger total size -- RAID 0 arrays can be constructed that are larger than a single disk drive, making it easier to store larger data files
- Better read/write performance -- The I/O load on a RAID 0 array is spread evenly among all the drives in the array (assuming all the I/O is not concentrated on a single chunk)
- No wasted space -- All available storage on all drives in the array is available for data storage
- Less reliability -- Every drive in a RAID 0 array must be operative for the array to be available; a single drive failure in an N-drive RAID 0 array results in the removal of 1/Nth of all the data, rendering the array useless
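The chunk-by-chunk striping described at the start of this list can be sketched in a few lines. This is an illustration only: it assumes a two-drive array, a 4KB chunk size, and a 12KB write that starts at the beginning of the array (matching the three 4KB pieces listed above). The function name raid0_layout is hypothetical and does not correspond to any real RAID tool.

```python
def raid0_layout(write_kb, chunk_kb=4, drives=2):
    """Return 1-based (drive, chunk) pairs for each chunk-sized piece of a write."""
    pieces = -(-write_kb // chunk_kb)  # round up to a whole number of chunks
    layout = []
    for piece in range(pieces):
        drive = piece % drives + 1   # pieces rotate across the drives in turn
        chunk = piece // drives + 1  # position of the chunk on that drive
        layout.append((drive, chunk))
    return layout


for drive, chunk in raid0_layout(12):
    print(f"4KB piece -> drive {drive}, chunk {chunk}")
```

Because consecutive pieces land on different drives, a large transfer keeps every drive busy. The same rotation also explains why losing any one drive leaves gaps throughout the data.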
Note
5.6.2.1.1.2. RAID 1
- Improved redundancy -- Even if one drive in the array were to fail, the data would still be accessible
- Improved read performance -- With both drives operational, reads can be evenly split between them, reducing per-drive I/O loads
- Maximum array size is limited to the largest single drive available.
- Reduced write performance -- Because both drives must be kept up-to-date, all write I/Os must be performed by both drives, slowing the overall process of writing data to the array
- Reduced cost efficiency -- With one entire drive dedicated to redundancy, the cost of a RAID 1 array is at least double that of a single drive
Note
5.6.2.1.1.3. RAID 5
- Improved redundancy -- If one drive in the array fails, the parity information can be used to reconstruct the missing data chunks, all while keeping the array available for use[22]
- Improved read performance -- Due to the RAID 0-like way data is divided between drives in the array, read I/O activity is spread evenly between all the drives
- Reasonably good cost efficiency -- For a RAID 5 array of n drives, only 1/nth of the total available storage is dedicated to redundancy
- Reduced write performance -- Because each write to the array results in at least two writes to the physical drives (one write for the data and one for the parity), write performance is worse than that of a single drive[23]
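To illustrate how parity makes reconstruction possible, the sketch below uses byte-wise XOR, the usual basis for RAID 5 parity. It is a simplified model under stated assumptions: one stripe, three small data chunks with made-up contents, and hypothetical helper names (xor_chunks, usable_drives). Real implementations also rotate the parity chunk among the drives, which is not shown here.

```python
from functools import reduce


def xor_chunks(chunks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def usable_drives(n):
    """With 1/nth of the storage holding parity, n - 1 drives' worth holds data."""
    return n - 1


# Three data chunks from one stripe (illustrative contents only)
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_chunks(data)

# Simulate losing the drive holding data[1]: XOR the survivors with the parity
rebuilt = xor_chunks([data[0], data[2], parity])
print(rebuilt == data[1])
```

The same XOR that produces the parity also recovers a missing chunk, which is why the array can stay available with one failed drive. Every data write also requires a parity update, which is the cost noted in the last item above.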
5.6.2.1.1.4. Nested RAID Levels
- RAID 1+0
- RAID 5+0
- RAID 5+1
- Order matters -- The order in which RAID levels are nested can have a large impact on reliability. In other words, RAID 1+0 and RAID 0+1 are not the same.
- Costs can be high -- If there is any disadvantage common to all nested RAID implementations, it is one of cost; for example, the smallest possible RAID 5+1 array consists of six disk drives (and even more drives are required for larger arrays).
5.6.2.1.2. RAID Implementations
- Dividing incoming I/O requests to the individual disks in the array
- For RAID 5, calculating parity and writing it to the appropriate drive in the array
- Monitoring the individual disks in the array and taking the appropriate action should one fail
- Controlling the rebuilding of an individual disk in the array, when that disk has been replaced or repaired
- Providing a means to allow administrators to maintain the array (removing and adding drives, initiating and halting rebuilds, etc.)
5.6.2.1.2.1. Hardware RAID
- Specialized utility programs that run as applications under the host operating system, presenting a software interface to the controller card
- An on-board interface using a serial port that is accessed using a terminal emulator
- A BIOS-like interface that is only accessible during the system's power-up testing
5.6.2.1.2.2. Software RAID
5.6.3. Logical Volume Management
5.6.3.1. Physical Storage Grouping
5.6.3.2. Logical Volume Resizing
5.6.3.3. Data Migration
5.6.3.4. With LVM, Why Use RAID?
5.7. Storage Management Day-to-Day
- Monitoring free space
- Disk quota issues
- File-related issues
- Directory-related issues
- Backup-related issues
- Performance-related issues
- Adding/removing storage
5.7.1. Monitoring Free Space
- Excessive usage by a user
- Excessive usage by an application
- Normal growth in usage
5.7.1.1. Excessive Usage by a User
- Some people are very frugal in their storage usage and never leave any unneeded files hanging around.
- Some people never seem to find the time to get rid of files that are no longer needed.
5.7.1.1.1. Handling a User's Excessive Usage
- Provide temporary space
- Make archival backups
- Give up
Warning
Note
5.7.1.2. Excessive Usage by an Application
- Enhancements in the application's functionality require more storage
- An increase in the number of users using the application
- The application fails to clean up after itself, leaving no-longer-needed temporary files on disk
- The application is broken, and the bug is causing it to use more storage than it should
5.7.1.3. Normal Growth in Usage
5.7.2. Disk Quota Issues
5.7.3. File-Related Issues
- File Access
- File Sharing
5.7.3.1. File Access
- User #1 makes the necessary changes to allow user #2 to access the file wherever it currently exists.
- A file exchange area is created for such purposes; user #1 places a copy of the file there, which can then be copied by user #2.
- User #1 uses email to give user #2 a copy of the file.
5.7.4. Adding/Removing Storage
Note
5.7.4.1. Adding Storage
- Installing the hardware
- Partitioning
- Formatting the partition(s)
- Updating system configuration
- Modifying backup schedule
5.7.4.1.1. Installing the Hardware
Note
5.7.4.1.1.1. Adding ATA Disk Drives
- There is a channel with only one disk drive connected to it
- There is a channel with no disk drive connected to it
- There is no space available
- Acquire an ATA controller card, and install it
- Replace one of the installed disk drives with the newer, larger one
- Write the data to a backup device and restore it after installing the new disk drive
- Use your network to copy the data to another system with sufficient free space, restoring the data after installing the new disk drive
- Use the space physically occupied by a third disk drive by:
- Temporarily removing the third disk drive
- Temporarily installing the new disk drive in its place
- Copying the data to the new disk drive
- Removing the old disk drive
- Replacing it with the new disk drive
- Reinstalling the temporarily-removed third disk drive
- Temporarily install the original disk drive and the new disk drive in another computer, copy the data to the new disk drive, and then install the new disk drive in the original computer
5.7.4.1.1.2. Adding SCSI Disk Drives
- Narrow (8-bit) SCSI bus -- 7 devices (plus controller)
- Wide (16-bit) SCSI bus -- 15 devices (plus controller)
- There is a bus with less than the maximum number of disk drives connected to it
- There is a bus with no disk drives connected to it
- There is no space available on any bus
- Acquire and install a SCSI controller card
- Replace one of the installed disk drives with the new, larger one
- Write the data to a backup device, and restore it after installing the new disk drive
- Use your network to copy the data to another system with sufficient free space, and restore after installing the new disk drive
- Use the space physically occupied by a third disk drive by:
- Temporarily removing the third disk drive
- Temporarily installing the new disk drive in its place
- Copying the data to the new disk drive
- Removing the old disk drive
- Replacing it with the new disk drive
- Reinstalling the temporarily-removed third disk drive
- Temporarily install the original disk drive and the new disk drive in another computer, copy the data to the new disk drive, and then install the new disk drive in the original computer
5.7.4.1.2. Partitioning
- Select the new disk drive
- View the disk drive's current partition table, to ensure that the disk drive to be partitioned is, in fact, the correct one
- Delete any unwanted partitions that may already be present on the new disk drive
- Create the new partition(s), being sure to specify the desired size and partition type
- Save your changes and exit the partitioning program
Warning
5.7.4.1.3. Formatting the Partition(s)
5.7.4.1.4. Updating System Configuration
5.7.4.1.5. Modifying the Backup Schedule
- Consider what the optimal backup frequency should be
- Determine what backup style would be most appropriate (full backups only, full with incrementals, full with differentials, etc.)
- Consider the impact of the additional storage on your backup media usage, particularly as it starts to fill up
- Judge whether the additional backup could cause the backups to take too long and start using time outside of your allotted backup window
- Make sure that these changes are communicated to the people that need to know (other system administrators, operations personnel, etc.)
5.7.4.2. Removing Storage
- Move any data to be saved off the disk drive
- Modify the backup schedule so that the disk drive is no longer backed up
- Update the system configuration
- Erase the contents of the disk drive
- Remove the disk drive
5.7.4.2.1. Moving Data Off the Disk Drive
Note
5.7.4.2.2. Erase the Contents of the Disk Drive
Important
5.8. A Word About Backups…
5.9. Red Hat Enterprise Linux-Specific Information
5.9.1. Device Naming Conventions
Note
5.9.1.1. Device Files
/dev/
directory. The format for each file name depends on several aspects of the actual hardware and how it has been configured. The important points are as follows:
- Device type
- Unit
- Partition
5.9.1.1.1. Device Type
- sd -- The device is SCSI-based
- hd -- The device is ATA-based
5.9.1.1.2. Unit
hda or sda.
Note
sda through sdz, the next 26 would be named sdaa through sdaz, and so on.
5.9.1.1.3. Partition
- /dev/hda1 -- The first partition on the first ATA drive
- /dev/sdb12 -- The twelfth partition on the second SCSI drive
- /dev/sdad4 -- The fourth partition on the thirtieth SCSI drive
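The naming scheme in these examples (device type, unit letters, optional partition number) can be decoded mechanically. The following is a minimal sketch that handles only the sd and hd prefixes discussed in this section. The function names are hypothetical, and the sketch parses names only; it does not inspect any real hardware.

```python
import re


def unit_number(letters):
    """Convert unit letters to a 1-based drive number: a -> 1, z -> 26, aa -> 27."""
    number = 0
    for ch in letters:
        number = number * 26 + (ord(ch) - ord("a") + 1)
    return number


def parse_device(path):
    """Split a device file name into (type, unit number, partition or None)."""
    match = re.fullmatch(r"/dev/(sd|hd)([a-z]+)(\d*)", path)
    if match is None:
        raise ValueError(f"unrecognized device file name: {path}")
    dev_type, letters, partition = match.groups()
    return dev_type, unit_number(letters), int(partition) if partition else None


for name in ("/dev/hda1", "/dev/sdb12", "/dev/sdad4", "/dev/hdc"):
    print(name, parse_device(name))
```

A name with no trailing digits, such as /dev/hdc, yields None for the partition, which corresponds to accessing the entire device.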
5.9.1.1.4. Whole-Device Access
- /dev/hdc -- The entire third ATA device
- /dev/sdb -- The entire second SCSI device
5.9.1.2. Alternatives to Device File Names
- The system administrator adds a new SCSI controller so that two new SCSI drives can be added to the system (the existing SCSI bus is completely full)
- The original SCSI drives (including the first drive on the bus: /dev/sda) are not changed in any way
- The system is rebooted
- The SCSI drive formerly known as /dev/sda now has a new name, because the first SCSI drive on the new controller is now /dev/sda
5.9.1.2.1. File System Labels
5.9.1.2.2. Using devlabel
The devlabel software attempts to address the device naming issue in a different manner than file system labels. The devlabel software is run by Red Hat Enterprise Linux whenever the system reboots (and whenever hotpluggable devices are inserted or removed).
When devlabel runs, it reads its configuration file (/etc/sysconfig/devlabel) to obtain the list of devices for which it is responsible. For each device on the list, there is a symbolic link (chosen by the system administrator) and the device's UUID (Universally Unique Identifier).
The devlabel command makes sure the symbolic link always refers to the originally-specified device -- even if that device's name has changed. In this way, a system administrator can configure a system to refer to /dev/projdisk instead of /dev/sda12, for example.
devlabel must only search the system for the matching UUID and update the symbolic link appropriately.
For more information on devlabel, refer to the System Administrators Guide.
5.9.2. File System Basics
- EXT2
- EXT3
- NFS
- ISO 9660
- MSDOS
- VFAT
5.9.2.1. EXT2
5.9.2.2. EXT3
5.9.2.3. ISO 9660
- CD-ROMs
- Files (usually referred to as ISO images) containing complete ISO 9660 file systems, meant to be written to CD-R or CD-RW media
- Rock Ridge -- Uses some fields undefined in ISO 9660 to provide support for features such as long mixed-case file names, symbolic links, and nested directories (in other words, directories that can themselves contain other directories)
- Joliet -- An extension of the ISO 9660 standard, developed by Microsoft to allow CD-ROMs to contain long file names, using the Unicode character set
5.9.2.4. MSDOS
5.9.3. Mounting File Systems
- A means of uniquely identifying the desired disk drive and partition, such as device file name, file system label, or devlabel-managed symbolic link
- A directory under which the mounted file system is to be made available (otherwise known as a mount point)
5.9.3.1. Mount Points
foo in its root directory; the full path to the directory would be /foo/. Next, assume that this system has a partition that is to be mounted, and that the partition's mount point is to be /foo/. If that partition had a file by the name of bar.txt in its top-level directory, after the partition was mounted you could access the file with the following full file specification:
/foo/bar.txt
/foo/ directory will be read from or written to that partition.
/home/ -- that is because the login directories for all user accounts are normally located under /home/. If /home/ is used as a mount point, all users' files are written to a dedicated partition and will not fill up the operating system's file system.
Note
5.9.3.2. Seeing What is Mounted
- Viewing /etc/mtab
- Viewing /proc/mounts
- Issuing the df command
5.9.3.2.1. Viewing /etc/mtab
/etc/mtab is a normal file that is updated by the mount program whenever file systems are mounted or unmounted. Here is a sample /etc/mtab:
/dev/sda3  /                         ext3         rw                  0 0
none       /proc                     proc         rw                  0 0
usbdevfs   /proc/bus/usb             usbdevfs     rw                  0 0
/dev/sda1  /boot                     ext3         rw                  0 0
none       /dev/pts                  devpts       rw,gid=5,mode=620   0 0
/dev/sda4  /home                     ext3         rw                  0 0
none       /dev/shm                  tmpfs        rw                  0 0
none       /proc/sys/fs/binfmt_misc  binfmt_misc  rw                  0 0
Note
The /etc/mtab file is meant to be used to display the status of currently-mounted file systems only. It should not be manually modified.
- The device specification
- The mount point
- The file system type
- Whether the file system is mounted read-only (ro) or read-write (rw), along with any other mount options
- Two unused fields with zeros in them (for compatibility with /etc/fstab[24])
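The field layout listed above lends itself to simple parsing. The sketch below splits one /etc/mtab-style line into named fields and checks whether the file system is mounted read-only. The names MountEntry, parse_mount_line, and is_read_only are illustrative and not part of any system library; the sample line is taken from the listing shown earlier.

```python
from collections import namedtuple

# Field names follow the order described above; "unused" holds the two zero fields
MountEntry = namedtuple("MountEntry", "device mount_point fs_type options unused")


def parse_mount_line(line):
    """Split one mount table line into its whitespace-separated fields."""
    device, mount_point, fs_type, options, *unused = line.split()
    return MountEntry(device, mount_point, fs_type, options.split(","), tuple(unused))


def is_read_only(entry):
    """A file system is read-only when 'ro' appears among its mount options."""
    return "ro" in entry.options


entry = parse_mount_line("none /dev/pts devpts rw,gid=5,mode=620 0 0")
print(entry.mount_point, entry.fs_type, entry.options, is_read_only(entry))
```

Splitting the options on commas keeps ro and rw distinct from option values such as mode=620, so a substring match on the raw line is not needed.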
5.9.3.2.2. Viewing /proc/mounts
The /proc/mounts file is part of the proc virtual file system. As with the other files under /proc/, the mounts "file" does not exist on any disk drive in your Red Hat Enterprise Linux system.
Using the command cat /proc/mounts, we can view the status of all mounted file systems:
rootfs     /                         rootfs       rw  0 0
/dev/root  /                         ext3         rw  0 0
/proc      /proc                     proc         rw  0 0
usbdevfs   /proc/bus/usb             usbdevfs     rw  0 0
/dev/sda1  /boot                     ext3         rw  0 0
none       /dev/pts                  devpts       rw  0 0
/dev/sda4  /home                     ext3         rw  0 0
none       /dev/shm                  tmpfs        rw  0 0
none       /proc/sys/fs/binfmt_misc  binfmt_misc  rw  0 0
The format of /proc/mounts is very similar to that of /etc/mtab. There are a number of file systems mounted that have nothing to do with disk drives. Among these are the /proc/ file system itself (along with two other file systems mounted under /proc/), pseudo-ttys, and shared memory.
/proc/mounts is the best way to be 100% sure of seeing what is mounted on your Red Hat Enterprise Linux system, as the kernel is providing this information. Other methods can, under rare circumstances, be inaccurate.
5.9.3.2.3. Issuing the df Command
While using /etc/mtab or /proc/mounts lets you know what file systems are currently mounted, it does little beyond that. Most of the time you are more interested in one particular aspect of the file systems that are currently mounted -- the amount of free space on them.
df command. Here is some sample output from df:
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/sda3              8428196   4280980   3719084  54% /
/dev/sda1               124427     18815     99188  16% /boot
/dev/sda4              8428196   4094232   3905832  52% /home
none                    644600         0    644600   0% /dev/shm
/etc/mtab and /proc/mounts are immediately obvious:
- An easy-to-read heading is displayed
- With the exception of the shared memory file system, only disk-based file systems are shown
- Total size, used space, free space, and percentage in use figures are displayed
df it is very easy to see where the problem lies.
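The Use% figures in the sample df output can be reproduced from the Used and Available columns. The sketch below assumes that the percentage is calculated as used / (used + available) and rounded up to the next whole number. That assumption matches every row of the sample, but it is an inference from the figures rather than something stated in this guide. The function name use_percent is hypothetical.

```python
def use_percent(used_kb, available_kb):
    """Percentage in use, rounded up, based on used and available space only."""
    total = used_kb + available_kb
    if total == 0:
        return 0
    return -(-used_kb * 100 // total)  # integer ceiling division


# (file system, Used, Available) rows copied from the sample df output
sample = [
    ("/", 4280980, 3719084),
    ("/boot", 18815, 99188),
    ("/home", 4094232, 3905832),
    ("/dev/shm", 0, 644600),
]
for mount_point, used, available in sample:
    print(f"{mount_point:10} {use_percent(used, available):3d}%")
```

Note that Used plus Available is less than the 1k-blocks total for the ext3 file systems in the sample. One likely explanation is the space reserved for the super-user, which is why the calculation ignores the total column.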
5.9.4. Network-Accessible Storage Under Red Hat Enterprise Linux
- NFS
- SMB
5.9.4.1. NFS
/etc/exports. For more information, see the exports(5) man page and the System Administrators Guide.
5.9.4.2. SMB
5.9.5. Mounting File Systems Automatically with /etc/fstab
/etc/fstab file. This file is used to control what file systems are mounted when the system boots, as well as to supply default values for other file systems that may be mounted manually from time to time. Here is a sample /etc/fstab file:
LABEL=/        /            ext3     defaults               1 1
/dev/sda1      /boot        ext3     defaults               1 2
/dev/cdrom     /mnt/cdrom   iso9660  noauto,owner,kudzu,ro  0 0
/dev/homedisk  /home        ext3     defaults               1 2
/dev/sda2      swap         swap     defaults               0 0
- File system specifier -- For disk-based file systems, either a device file name (/dev/sda1), a file system label specification (LABEL=/), or a devlabel-managed symbolic link (/dev/homedisk)
- Mount point -- Except for swap partitions, this field specifies the mount point to be used when the file system is mounted (/boot)
- File system type -- The type of file system present on the specified device (note that auto may be specified to select automatic detection of the file system to be mounted, which is handy for removable media units such as diskette drives)
- Mount options -- A comma-separated list of options that can be used to control mount's behavior (noauto,owner,kudzu)
- Dump frequency -- If the dump backup utility is used, the number in this field controls dump's handling of the specified file system
- File system check order -- Controls the order in which the file system checker fsck checks the integrity of the file systems
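To see how these six fields work together, the sketch below parses the sample /etc/fstab shown earlier and answers two questions: which file systems are mounted at boot, and in what order fsck would check them. It relies on two conventions from the fstab and mount documentation that this section does not spell out: the noauto option excludes an entry from automatic mounting, and a check order of 0 means the file system is not checked. All function names are illustrative.

```python
SAMPLE_FSTAB = """\
LABEL=/        /            ext3     defaults               1 1
/dev/sda1      /boot        ext3     defaults               1 2
/dev/cdrom     /mnt/cdrom   iso9660  noauto,owner,kudzu,ro  0 0
/dev/homedisk  /home        ext3     defaults               1 2
/dev/sda2      swap         swap     defaults               0 0
"""


def parse_fstab(text):
    """Return one dict per non-blank, non-comment line of an fstab-style file."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        spec, mount_point, fs_type, options, dump, fsck_order = line.split()
        entries.append({
            "spec": spec,
            "mount_point": mount_point,
            "fs_type": fs_type,
            "options": options.split(","),
            "dump": int(dump),
            "fsck_order": int(fsck_order),
        })
    return entries


def mounted_at_boot(entries):
    """Mount points of entries mounted automatically (swap has no mount point)."""
    return [e["mount_point"] for e in entries
            if "noauto" not in e["options"] and e["fs_type"] != "swap"]


def fsck_sequence(entries):
    """Mount points in file system check order; entries with order 0 are skipped."""
    checked = [e for e in entries if e["fsck_order"] > 0]
    return [e["mount_point"] for e in sorted(checked, key=lambda e: e["fsck_order"])]


entries = parse_fstab(SAMPLE_FSTAB)
print(mounted_at_boot(entries))
print(fsck_sequence(entries))
```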
5.9.6. Adding/Removing Storage
5.9.6.1. Adding Storage
- Partitioning
- Formatting the partition(s)
- Updating /etc/fstab
5.9.6.1.1. Partitioning
- Using the command-line fdisk utility program
- Using parted, another command-line utility program
fdisk are included:
- Select the new disk drive (the drive's name can be identified by following the device naming conventions outlined in Section 5.9.1, “Device Naming Conventions”). Using fdisk, this is done by including the device name when you start fdisk:
  fdisk /dev/hda
- View the disk drive's partition table, to ensure that the disk drive to be partitioned is, in fact, the correct one. In our example, fdisk displays the partition table by using the p command:
  Command (m for help): p

  Disk /dev/hda: 255 heads, 63 sectors, 1244 cylinders
  Units = cylinders of 16065 * 512 bytes

     Device Boot    Start       End    Blocks   Id  System
  /dev/hda1   *         1        17    136521   83  Linux
  /dev/hda2            18        83    530145   82  Linux swap
  /dev/hda3            84       475   3148740   83  Linux
  /dev/hda4           476      1244   6176992+  83  Linux
- Delete any unwanted partitions that may already be present on the new disk drive. This is done using the d command in fdisk:
  Command (m for help): d
  Partition number (1-4): 1
  The process would be repeated for all unneeded partitions present on the disk drive.
- Create the new partition(s), being sure to specify the desired size and file system type. Using fdisk, this is a two-step process -- first, creating the partition (using the n command):
  Command (m for help): n
  Command action
     e   extended
     p   primary partition (1-4)
  p
  Partition number (1-4): 1
  First cylinder (1-767): 1
  Last cylinder or +size or +sizeM or +sizeK: +512M
  Second, by setting the file system type (using the t command):
  Command (m for help): t
  Partition number (1-4): 1
  Hex code (type L to list codes): 82
  Partition type 82 represents a Linux swap partition.
- Save your changes and exit the partitioning program. This is done in fdisk by using the w command:
  Command (m for help): w
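The Blocks column in the sample partition table follows directly from the geometry that fdisk reports. As a worked example, the sketch below computes the size of a partition in 1KB blocks from its start and end cylinders, using the 255 heads, 63 sectors, and 512-byte sector size shown in the sample output. The function names are hypothetical.

```python
HEADS = 255             # from the sample fdisk output
SECTORS_PER_TRACK = 63  # from the sample fdisk output
BYTES_PER_SECTOR = 512  # from the "Units" line of the sample output


def cylinder_bytes():
    """Bytes in one cylinder: every head's track, each holding 63 sectors."""
    return HEADS * SECTORS_PER_TRACK * BYTES_PER_SECTOR


def partition_blocks(start_cyl, end_cyl):
    """Size in 1KB blocks of a partition spanning whole cylinders (inclusive)."""
    return (end_cyl - start_cyl + 1) * cylinder_bytes() / 1024


for name, start, end in (("/dev/hda2", 18, 83),
                         ("/dev/hda3", 84, 475),
                         ("/dev/hda4", 476, 1244)):
    print(name, partition_blocks(start, end))
```

The result for /dev/hda4 ends in half a block, which is consistent with the + that fdisk prints after that partition's block count. The first partition, /dev/hda1, comes out slightly smaller than this formula predicts, presumably because it does not start at the very first track of the disk.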
Warning
5.9.6.1.2. Formatting the Partition(s)
mkfs utility program. However, mkfs does not actually do the work of writing the file-system-specific information onto a disk drive; instead it passes control to one of several other programs that actually create the file system.
mkfs.<fstype> man page for the file system you have selected. For example, look at the mkfs.ext3 man page to see the options available to you when creating a new ext3 file system. In general, the mkfs.<fstype> programs provide reasonable defaults for most configurations; however, here are some of the options that system administrators most commonly change:
- Setting a volume label for later use in /etc/fstab
- On very large hard disks, setting a lower percentage of space reserved for the super-user
- Setting a non-standard block size and/or bytes per inode for configurations that must support either very large or very small files
- Checking for bad blocks before formatting
5.9.6.1.3. Updating /etc/fstab
/etc/fstab”, you must add the necessary line(s) to /etc/fstab to ensure that the new file system(s) are mounted whenever the system reboots. Once you have updated /etc/fstab, test your work by issuing an "incomplete" mount, specifying only the device or mount point. Something similar to one of the following commands is sufficient:
mount /home
mount /dev/hda3
(Replace /home or /dev/hda3 with the mount point or device for your specific situation.)
If the /etc/fstab entry is correct, mount obtains the missing information from it and completes the mount operation.
/etc/fstab is configured properly to automatically mount the new storage every time the system boots (although if you can afford a quick reboot, it would not hurt to do so -- just to be sure).
5.9.6.2. Removing Storage
- Remove the disk drive's partitions from /etc/fstab
- Unmount the disk drive's active partitions
- Erase the contents of the disk drive
5.9.6.2.1. Remove the Disk Drive's Partitions From /etc/fstab
/etc/fstab file. You can identify the proper lines by one of the following methods:
- Matching the partition's mount point against the directories in the second column of /etc/fstab
- Matching the device's file name against the file names in the first column of /etc/fstab
Note
/etc/fstab that identify swap partitions on the disk drive to be removed; they can be easily overlooked.
5.9.6.2.2. Terminating Access With umount
umount command. If a swap partition exists on the disk drive, it must either be deactivated with the swapoff command, or the system should be rebooted.
The umount command requires you to specify either the device file name, or the partition's mount point:
umount /dev/hda2
umount /home
/etc/fstab entry.
When using swapoff to disable swapping to a partition, you must specify the device file name representing the swap partition:
swapoff /dev/hda4
swapoff, boot into rescue mode and remove the partition's /etc/fstab entry.
5.9.6.2.3. Erase the Contents of the Disk Drive
badblocks -ws <device-name>
<device-name> represents the file name of the disk drive you wish to erase, excluding the partition number. For example, /dev/hdb for the second ATA hard drive.
badblocks runs:
Writing pattern 0xaaaaaaaa: done
Reading and comparing: done
Writing pattern 0x55555555: done
Reading and comparing: done
Writing pattern 0xffffffff: done
Reading and comparing: done
Writing pattern 0x00000000: done
Reading and comparing: done
badblocks is actually writing four different data patterns to every block on the disk drive. For large disk drives, this process can take a long time -- quite often several hours.
Important
rm command. That is because when you delete a file using rm it only marks the file as deleted -- it does not erase the contents of the file.
5.9.7. Implementing Disk Quotas
5.9.7.1. Some Background on Disk Quotas
- Per-file-system implementation
- Per-user space accounting
- Per-group space accounting
- Tracks disk block usage
- Tracks disk inode usage
- Hard limits
- Soft limits
- Grace periods
5.9.7.1.1. Per-File-System Implementation
/home/ directory was on its own file system, disk quotas could be enabled there, enforcing equitable disk usage by all users. However, the root file system could be left without disk quotas, eliminating the complexity of maintaining disk quotas on a file system where only the operating system itself resides.
5.9.7.1.2. Per-User Space Accounting
5.9.7.1.3. Per-Group Space Accounting
5.9.7.1.4. Tracks Disk Block Usage
5.9.7.1.5. Tracks Disk Inode Usage
5.9.7.1.6. Hard Limits
5.9.7.1.7. Soft Limits
5.9.7.1.8. Grace Periods
5.9.7.2. Enabling Disk Quotas
Note
- Modifying
/etc/fstab
- Remounting the file system(s)
- Running
quotacheck
- Assigning quotas
/etc/fstab
file controls the mounting of file systems under Red Hat Enterprise Linux. Because disk quotas are implemented on a per-file-system basis, there are two options -- usrquota
and grpquota
-- that must be added to that file to enable disk quotas.
usrquota
option enables user-based disk quotas, while the grpquota
option enables group-based quotas. One or both of these options may be enabled by placing them in the options field for the desired file system.
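As a rough illustration of the options-field change described above, the following sketch appends both options to the entry for one mount point. The function name add_quota_opts and the sample entry are assumptions made for this example; the function reads fstab-format text on standard input and never touches the real /etc/fstab.

```shell
# Hypothetical helper: add usrquota and grpquota to the options field
# (fourth field) of the fstab entry whose mount point matches $1.
# Reads fstab-format text on stdin and writes the result to stdout.
add_quota_opts() {
    awk -v mp="$1" '
        # Skip comment lines; change only the matching entry, and only
        # if the options are not already present.
        $0 !~ /^#/ && $2 == mp && $4 !~ /usrquota/ {
            $4 = $4 ",usrquota,grpquota"
        }
        { print }
    '
}

# Sample entry (assumed for illustration):
printf '/dev/md3 /home ext3 defaults 1 2\n' | add_quota_opts /home
```

Because awk rebuilds a modified line with single spaces, any column alignment in that entry is lost, so the result is best reviewed by hand before it is copied into /etc/fstab. Once the file has been changed, the file system still has to be remounted (for example, with mount -o remount /home) before the options take effect.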
quotacheck
command is used to create the disk quota files and to collect the current usage information from already existing files. The disk quota files (named aquota.user
and aquota.group
for user- and group-based quotas) contain the necessary quota-related information and reside in the file system's root directory.
edquota
command is used.
edquota
command. Here is an example:
Disk quotas for user matt (uid 500):
  Filesystem      blocks     soft     hard   inodes   soft   hard
  /dev/md3       6618000        0        0    17397      0      0

Disk quotas for user matt (uid 500):
  Filesystem      blocks     soft     hard   inodes   soft   hard
  /dev/md3       6618000  6900000  7000000    17397      0      0
Note
edquota
program can also be used to set the per-file-system grace period by using the -t
option.
5.9.7.3. Managing Disk Quotas
- Generating disk usage reports at regular intervals (and following up with users that seem to be having trouble effectively managing their allocated disk space)
- Making sure that the disk quotas remain accurate
repquota
utility program. Using the command repquota /home
produces this output:
*** Report for user quotas on device /dev/md3
Block grace time: 7days; Inode grace time: 7days
                        Block limits                File limits
User            used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
root      --   32836       0       0              4     0     0
matt      -- 6618000 6900000 7000000          17397     0     0
repquota
can be found in the System Administrators Guide, in the chapter on disk quotas.
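When following up with users, it can help to filter a report like the one above down to the accounts that are over their soft block limit. The sketch below is one possible approach, not part of the quota tools: the function name over_soft_limit is hypothetical, and it assumes user lines follow the layout shown in the sample report (name, a two-character flags column, then used, soft, and hard block figures).

```shell
# Hypothetical helper: list users whose block usage exceeds a non-zero
# soft limit. Expects repquota-style output on stdin, where user lines
# look like:  name flags used soft hard ...
over_soft_limit() {
    awk '
        # User lines are recognized by the two-character flags column.
        # A soft limit of 0 means no limit, so it is never "exceeded".
        $2 ~ /^[-+][-+]$/ && $4 > 0 && $3 > $4 { print $1 }
    '
}

# Typical use (run as root, since repquota needs to read the quota files):
#   repquota /home | over_soft_limit
```

Comparing the numeric columns directly, rather than interpreting the flags, keeps the sketch tied only to the columns visible in the report header.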
quotacheck
. However, many system administrators recommend running quotacheck
on a regular basis, even if the system has not crashed.
quotacheck
when enabling disk quotas.
quotacheck
command:
quotacheck -avug
quotacheck
on a regular basis is to use cron
. Most system administrators run quotacheck
once a week, though there may be valid reasons to pick a longer or shorter interval, depending on your specific conditions.
5.9.8. Creating RAID Arrays
- While installing Red Hat Enterprise Linux
- After Red Hat Enterprise Linux has been installed
5.9.8.1. While Installing Red Hat Enterprise Linux
Note
5.9.8.2. After Red Hat Enterprise Linux Has Been Installed
mdadm
program (refer to man mdadm
for more information).
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hd[bc]1
mdadm --detail --scan > /etc/mdadm.conf
/dev/md0
is now ready to be formatted and mounted. The process at this point is no different than for formatting and mounting a single disk drive.
5.9.9. Day to Day Management of RAID Arrays
5.9.9.1. Checking Array Status With /proc/mdstat
/proc/mdstat
is the easiest way to check on the status of all RAID arrays on a particular system. Here is a sample mdstat
(view with the command cat /proc/mdstat
):
Personalities : [raid1]
read_ahead 1024 sectors
md1 : active raid1 hda3[0] hdc3[1]
      522048 blocks [2/2] [UU]

md0 : active raid1 hda2[0] hdc2[1]
      4192896 blocks [2/2] [UU]

md2 : active raid1 hda1[0] hdc1[1]
      128384 blocks [2/2] [UU]

unused devices: <none>
/proc/mdstat
and contains the following information:
- The RAID array device name (not including the /dev/ part)
- The status of the RAID array
- The RAID array's RAID level
- The physical partitions that currently make up the array (followed by the partition's array unit number)
- The size of the array
- The number of configured devices versus the number of operative devices in the array
- The status of each configured device in the array (U meaning the device is OK, and _ indicating that the device has failed)
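The per-device status field lends itself to a simple automated check. The following sketch, assuming the two-line-per-array layout that /proc/mdstat normally uses, prints the name of every array whose status field contains an underscore. The function name degraded_arrays is hypothetical.

```shell
# Hypothetical helper: print the names of RAID arrays that have at
# least one failed device. Expects /proc/mdstat-style text on stdin.
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md0 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }

        # The device status is the last field on the "blocks" line,
        # for example [UU] or [U_]; an underscore marks a failed device.
        /\[[U_]+\]/ {
            if (name != "" && $NF ~ /_/) print name
            name = ""
        }
    '
}

# Typical use on a live system:
#   degraded_arrays < /proc/mdstat
```

A check like this could be run from cron so that a failed device is noticed before a second one fails.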
5.9.9.2. Rebuilding a RAID Array
/proc/mdstat
show that a problem exists with one of the RAID arrays, you can rebuild it by performing the following steps:
- Remove the disk from the RAID array.
mdadm --manage /dev/md0 -r /dev/sdc3
- Remove the disk from the system.
- Replace the removed disk, then use fdisk to re-create the partitions on the replacement disk.
- Add the new disk back to the RAID array.
mdadm --manage /dev/md0 -a /dev/sdc3
- To restore the disk, perform a "software fail" on the previous spare slice:
mdadm --manage --set-faulty /dev/md0 /dev/sdc3
- The system will now attempt to rebuild the array on the replaced disk. Use the following command to monitor status:
watch -n 1 cat /proc/mdstat
- When the array is finished rebuilding, remove and then re-add the software-failed disk back to the array.
mdadm --manage /dev/md0 -r /dev/sdc3
mdadm --manage /dev/md0 -a /dev/sdc3
- Check the array.
mdadm --detail /dev/md0
5.9.10. Logical Volume Management
5.10. Additional Resources
5.10.1. Installed Documentation
- exports(5) man page -- Learn about the NFS configuration file format.
- fstab(5) man page -- Learn about the file system information configuration file format.
- swapoff(8) man page -- Learn how to disable swap partitions.
- df(1) man page -- Learn how to display disk space usage on mounted file systems.
- fdisk(8) man page -- Learn about this partition table maintenance utility program.
- mkfs(8), mke2fs(8) man pages -- Learn about these file system creation utility programs.
- badblocks(8) man page -- Learn how to test a device for bad blocks.
- quotacheck(8) man page -- Learn how to verify block and inode usage for users and groups and optionally create disk quota files.
- edquota(8) man page -- Learn about this disk quota maintenance utility program.
- repquota(8) man page -- Learn about this disk quota reporting utility program.
- raidtab(5) man page -- Learn about the software RAID configuration file format.
- mdadm(8) man page -- Learn about this software RAID array management utility program.
- lvm(8) man page -- Learn about Logical Volume Management.
- devlabel(8) man page -- Learn about persistent storage device access.
5.10.2. Useful Websites
- http://www.pcguide.com/ -- A good site for all kinds of information on various storage technologies.
- http://www.tldp.org/ -- The Linux Documentation Project has HOWTOs and mini-HOWTOs that provide good overviews of storage technologies as they relate to Linux.
5.10.3. Related Books
- The Installation Guide; Red Hat, Inc -- Contains instructions on partitioning hard drives during the Red Hat Enterprise Linux installation process as well as an overview of disk partitions.
- The Reference Guide; Red Hat, Inc -- Contains detailed information on the directory structure used in Red Hat Enterprise Linux and an overview of NFS.
- The System Administrators Guide; Red Hat, Inc -- Includes chapters on file systems, RAID, LVM, devlabel, partitioning, disk quotas, NFS, and Samba.
- Linux System Administration: A User's Guide by Marcel Gagne; Addison Wesley Professional -- Contains information on user and group permissions, file systems and disk quota, NFS and Samba.
- Linux Performance Tuning and Capacity Planning by Jason R. Fink and Matthew D. Sherer; Sams -- Contains information on disk, RAID, and NFS performance.
- Linux Administration Handbook by Evi Nemeth, Garth Snyder, and Trent R. Hein; Prentice Hall -- Contains information on file systems, handling disk drives, NFS, and Samba.
/etc/fstab
” for more information.
Chapter 6. Managing User Accounts and Resource Access
6.1. Managing User Accounts
6.1.1. The Username
6.1.1.1. Naming Conventions
- The size of your organization
- The structure of your organization
- The nature of your organization
- First name (john, paul, george, etc.)
- Last name (smith, jones, brown, etc.)
- First initial, followed by last name (jsmith, pjones, gbrown, etc.)
- Last name, followed by department code (smith029, jones454, brown191, etc.)
Note
6.1.1.1.1. Dealing with Collisions
- Adding sequence numbers to the colliding username (smith, smith1, smith2, etc.)
- Adding user-specific data to the colliding username (smith, esmith, eksmith, etc.)
- Adding organizational information to the colliding username (smith, smith029, smith454, etc.)
6.1.1.2. Dealing with Name Changes
- Make the change to all affected systems
- Keep any underlying user identification constant
- Change the ownership of all files and other user-specific resources (if necessary)
- Handle email-related issues
Important
- The new user never receives any email — it all goes to the original user.
- The original user suddenly stops receiving any email — it all goes to the new user.
6.1.2. Passwords
- The secrecy of the password
- The resistance of the password to guessing
- The resistance of the password to a brute-force attack
- The system administrator can create passwords for all users.
- The system administrator can let the users create their own passwords, while verifying that the passwords are acceptably strong.
6.1.2.1. Weak Passwords
- It is secret
- It is resistant to being guessed
- It is resistant to a brute-force attack
6.1.2.1.1. Short Passwords
Password Length | Potential Passwords |
---|---|
1 | 26 |
2 | 676 |
3 | 17,576 |
4 | 456,976 |
5 | 11,881,376 |
6 | 308,915,776 |
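The figures in the table are simply the size of the character set (26 lower-case letters) raised to the power of the password length. A minimal sketch of that arithmetic follows; the function name potential_passwords is made up for this example.

```shell
# Hypothetical helper: number of possible passwords for a character
# set of size $1 and a password length of $2 (that is, $1 to the
# power of $2), using plain shell integer arithmetic.
potential_passwords() {
    count=1
    i=0
    while [ "$i" -lt "$2" ]; do
        count=$((count * $1))
        i=$((i + 1))
    done
    echo "$count"
}

potential_passwords 26 6    # six lower-case letters, last row of the table
```

Calling the function with a larger first argument shows how quickly the search space grows when more kinds of characters are allowed, although very large results can overflow the shell's integer range.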
Warning
6.1.2.1.2. Limited Character Set
6.1.2.1.3. Recognizable Words
Note
6.1.2.1.4. Personal Information
6.1.2.1.5. Simple Word Tricks
- drowssaPdaB1
- R3allyP00r
6.1.2.1.6. The Same Password for Multiple Systems
6.1.2.1.7. Passwords on Paper
- In a desk drawer (locked or unlocked)
- Below a keyboard
- In a wallet
- Taped to the side of a monitor
6.1.2.2. Strong Passwords
6.1.2.2.1. Longer Passwords
6.1.2.2.2. Expanded Character Set
- t1Te-Bf,te
- Lb@lbhom
6.1.2.2.3. Memorable
Note
6.1.2.3. Password Aging
- User convenience
- Security
6.1.3. Access Control Information
- System-wide user-specific identification
- System-wide group-specific identification
- Lists of additional groups/capabilities of which the user is a member
- Default access information to be applied to all user-created files and resources
6.1.4. Managing Accounts and Resource Access Day-to-Day
6.1.4.1. New Hires
- Create a procedure where your organization's personnel department notifies you when a new person arrives.
- Create a form that the person's supervisor can fill out and use to request an account for the new person.
6.1.4.2. Terminations
- Disabling the user's access to all systems and related resources (usually by changing/locking the user's password)
- Backing up the user's files, in case they contain something that is needed at a later time
- Coordinating access to the user's files by the user's manager
Note
6.1.4.3. Job Changes
- You
- The user's original manager
- The user's new manager
6.2. Managing User Resources
- Who can access shared data
- Where users access this data
- What barriers are in place to prevent abuse of resources
6.2.1. Who Can Access Shared Data
6.2.1.1. Shared Groups and Data
accounts
, this information can then be placed in a shared directory (owned by the accounts
group) with group read and write permissions on the directory.
6.2.1.2. Determining Group Structure
- What groups to create
- Who to put in a given group
- What type of permissions should these shared resources have
finance
, and make all finance personnel members of that group. If the financial information is too sensitive for the company at large, but vital for senior officials within the organization, then grant the senior officials group-level permission to access the directories and data used by the finance department by adding all senior officials to the finance
group.
6.2.2. Where Users Access Shared Data
6.2.2.1. Global Ownership Issues
6.2.2.2. Home Directories
6.2.3. What Barriers Are in Place To Prevent Abuse of Resources
6.3. Red Hat Enterprise Linux-Specific Information
6.3.1. User Accounts, Groups, and Permissions
- r -- Indicates that a given category of user can read a file.
- w -- Indicates that a given category of user can write to a file.
- x -- Indicates that a given category of user can execute the contents of a file.
-
) indicates that no access is permitted.
- owner — The owner of the file or application.
- group — The group that owns the file or application.
- everyone — All users with access to the system.
ls -l
. For example, if the user juan
creates an executable file named foo
, the output of the command ls -l foo
would appear like this:
-rwxrwxr-x 1 juan juan 0 Sep 26 12:25 foo
rwx
. This first set of symbols defines owner access; in this example, the owner juan
has full access, and may read, write, and execute the file. The next set of rwx
symbols defines group access (again, with full access), while the last set of symbols defines the types of access permitted for all other users. Here, all other users may read and execute the file, but may not modify it in any way.
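To see how a mode string like the one above comes about, the following sketch creates an empty file in a scratch directory, sets the same permissions with chmod, and prints the first ten characters of the ls -l output. The helper name mode_string and the scratch directory are assumptions for this example only.

```shell
# Hypothetical helper: print the ten-character mode string that
# ls -l shows for a file or directory.
mode_string() {
    ls -ld "$1" | cut -c1-10
}

demo=$(mktemp -d)          # scratch directory, removed at the end
touch "$demo/foo"
chmod 775 "$demo/foo"      # rwx for owner and group, r-x for everyone else
mode_string "$demo/foo"    # should match the mode shown for foo above
rm -rf "$demo"
```

Each digit of the numeric mode covers one category (owner, group, everyone), with read, write, and execute worth 4, 2, and 1 respectively.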
juan
launches an application, the application runs using user juan
's context. However, in some cases the application may need a more privileged level of access in order to accomplish a task. Such applications include those that edit system settings or log in users. For this reason, special permissions have been created.
- setuid -- used only for binary files (applications), this permission indicates that the file is to be executed with the permissions of the owner of the file, and not with the permissions of the user executing the file (which is the case without setuid). This is indicated by the character s in place of the x in the owner category. If the owner of the file does not have execute permissions, a capital S reflects this fact.
- setgid -- used primarily for binary files (applications), this permission indicates that the file is executed with the permissions of the group owning the file and not with the permissions of the group of the user executing the file (which is the case without setgid). If applied to a directory, all files created within the directory are owned by the group owning the directory, and not by the group of the user creating the file. The setgid permission is indicated by the character s in place of the x in the group category. If the group owning the file or directory does not have execute permissions, a capital S reflects this fact.
- sticky bit -- used primarily on directories, this bit dictates that a file created in the directory can be removed only by the user that created the file. It is indicated by the character t in place of the x in the everyone category. If the everyone category does not have execute permissions, the T is capitalized to reflect this fact. Under Red Hat Enterprise Linux, the sticky bit is set by default on the /tmp/ directory for exactly this reason.
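The way these special permissions appear in a listing can be tried out safely on throwaway files. The sketch below is only an illustration: the helper mode_string and the file names app, shared, and scratch are invented for this example, and everything happens inside a temporary directory.

```shell
# Hypothetical helper: print the ten-character mode string that
# ls -l shows for a file or directory.
mode_string() {
    ls -ld "$1" | cut -c1-10
}

demo=$(mktemp -d)              # scratch directory, removed at the end
touch "$demo/app"
mkdir "$demo/shared" "$demo/scratch"

chmod 4755 "$demo/app"         # setuid, owner has execute permission
mode_string "$demo/app"        # owner execute position shows s

chmod 4644 "$demo/app"         # setuid, owner lacks execute permission
mode_string "$demo/app"        # owner execute position shows S

chmod 2775 "$demo/shared"      # setgid on a directory
mode_string "$demo/shared"     # group execute position shows s

chmod 1777 "$demo/scratch"     # sticky bit, as used on /tmp/
mode_string "$demo/scratch"    # everyone execute position shows t

rm -rf "$demo"
```

The leading digit of the four-digit mode selects the special permission: 4 for setuid, 2 for setgid, and 1 for the sticky bit.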
6.3.1.1. Usernames and UIDs, Groups and GIDs
Important
/etc/passwd
and /etc/group
files on a file server and a user's workstation differ in the UIDs or GIDs they contain, improper application of permissions can lead to security issues.
juan
has a UID of 500 on a desktop computer, files juan
creates on a file server will be created with owner UID 500. However, if user bob
logs in locally to the file server (or even some other computer), and bob
's account also has a UID of 500, bob
will have full access to juan
's files, and vice versa.
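One way to look for this kind of mismatch is to compare copies of the password files from the two systems. The following is a minimal sketch under that assumption; the function name uid_conflicts and the sample file names are hypothetical, and the sample data simply mirrors the juan and bob scenario above.

```shell
# Hypothetical helper: report UIDs that map to different usernames in
# two passwd-format files (for example, copies of /etc/passwd taken
# from a workstation and from a file server).
uid_conflicts() {
    awk -F: '
        # First file: remember which username owns each UID.
        NR == FNR { name[$3] = $1; next }

        # Second file: same UID, different username.
        ($3 in name) && name[$3] != $1 {
            print $3 ": " name[$3] " vs " $1
        }
    ' "$1" "$2"
}

# Self-contained demonstration with two sample files (assumed data):
demo=$(mktemp -d)
printf 'juan:x:500:500::/home/juan:/bin/bash\n' > "$demo/workstation"
printf 'bob:x:500:500::/home/bob:/bin/bash\n'   > "$demo/server"
uid_conflicts "$demo/workstation" "$demo/server"
rm -rf "$demo"
```

The same comparison could be applied to /etc/group by matching on the third field (the GID) there as well.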
root
user, and are treated specially by Red Hat Enterprise Linux — all access is automatically granted.
6.3.2. Files Controlling User Accounts and Groups
/etc/
directory. When a system administrator creates new user accounts, these files must either be edited manually or applications must be used to make the necessary changes.
/etc/
directory that store user and group information under Red Hat Enterprise Linux.
6.3.2.1. /etc/passwd
/etc/passwd
file is world-readable and contains a list of users, each on a separate line. On each line is a colon delimited list containing the following information:
- Username -- The name the user types when logging into the system.
- Password -- Contains the encrypted password (or an x if shadow passwords are being used; more on this later).
- User ID (UID) -- The numerical equivalent of the username which is referenced by the system and applications when determining access privileges.
- Group ID (GID) -- The numerical equivalent of the primary group name which is referenced by the system and applications when determining access privileges.
- GECOS -- Named for historical reasons, the GECOS[25] field is optional and is used to store extra information (such as the user's full name). Multiple entries can be stored here in a comma delimited list. Utilities such as finger access this field to provide additional user information.
- Home directory -- The absolute path to the user's home directory, such as /home/juan/.
- Shell -- The program automatically launched whenever a user logs in. This is usually a command interpreter (often called a shell). Under Red Hat Enterprise Linux, the default value is /bin/bash. If this field is left blank, /bin/sh is used. If it is set to a non-existent file, then the user will be unable to log into the system.
/etc/passwd
entry:
root:x:0:0:root:/root:/bin/bash
root
user has a shadow password, as well as a UID and GID of 0. The root
user has /root/
as a home directory, and uses /bin/bash
for a shell.
/etc/passwd
, see the passwd(5)
man page.
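Because the fields are colon delimited, an entry can be taken apart with nothing more than the shell's read built-in. The sketch below is illustrative only; the function name describe_passwd_line is invented, and it applies the /bin/sh fallback for a blank shell field as described above.

```shell
# Hypothetical helper: split one /etc/passwd-format line (passed as $1)
# into its seven fields and print a short summary.
describe_passwd_line() {
    # IFS=: applies only to this read; the here-document supplies the line.
    IFS=: read -r user pass uid gid gecos home shell <<EOF
$1
EOF
    # A blank shell field means /bin/sh is used.
    echo "user=$user uid=$uid gid=$gid home=$home shell=${shell:-/bin/sh}"
}

describe_passwd_line 'root:x:0:0:root:/root:/bin/bash'
```

On a live system the input would more typically come from getent passwd or from /etc/passwd itself rather than from a literal string.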
6.3.2.2. /etc/shadow
/etc/passwd
file must be world-readable (the main reason being that this file is used to perform the translation from UID to username), there is a risk involved in storing everyone's password in /etc/passwd
. True, the passwords are encrypted. However, it is possible to perform attacks against passwords if the encrypted password is available.
/etc/passwd
can be obtained by an attacker, an attack that can be carried out in secret becomes possible. Instead of risking detection by having to attempt an actual login with every potential password generated by a password cracker, an attacker can use a password cracker in the following manner:
- A password-cracker generates potential passwords
- Each potential password is then encrypted using the same algorithm as the system
- The encrypted potential password is then compared against the encrypted passwords in
/etc/passwd
/etc/shadow
file is readable only by the root user and contains the password (and optional password aging information) for each user. As in the /etc/passwd
file, each user's information is on a separate line. Each of these lines is a colon delimited list including the following information:
- Username -- The name the user types when logging into the system. This allows the login application to retrieve the user's password (and related information).
- Encrypted password -- The 13 to 24 character password. The password is encrypted using either the crypt(3) library function or the MD5 hash algorithm. In this field, values other than a validly-formatted encrypted or hashed password are used to control user logins and to show the password status. For example, if the value is ! or *, the account is locked and the user is not allowed to log in. If the value is !!, a password has never been set before (and the user, not having set a password, will not be able to log in).
- Date password last changed -- The number of days since January 1, 1970 (also called the epoch) that the password was last changed. This information is used in conjunction with the password aging fields that follow.
- Number of days before password can be changed -- The minimum number of days that must pass before the password can be changed.
- Number of days before a password change is required -- The number of days that must pass before the password must be changed.
- Number of days warning before password change -- The number of days before password expiration during which the user is warned of the impending expiration.
- Number of days before the account is disabled -- The number of days after a password expires before the account will be disabled.
- Date since the account has been disabled -- The date (stored as the number of days since the epoch) since the user account has been disabled.
- A reserved field -- A field that is ignored in Red Hat Enterprise Linux.
/etc/shadow
:
juan:$1$.QKDPc5E$SWlkjRWexrXYgc98F.:12825:0:90:5:30:13096:
juan
:
- The password was last changed February 11, 2005
- There is no minimum amount of time required before the password can be changed
- The password must be changed every 90 days
- The user will get a warning five days before the password must be changed
- The account will be disabled 30 days after the password expires if no login attempt is made
- The account will expire on November 9, 2005
/etc/shadow
file, see the shadow(5)
man page.
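The date fields are stored as day counts since the epoch, which are awkward to read by eye. A small sketch of the conversion follows, assuming GNU date (its -d option with an @seconds argument is a GNU extension); the function name shadow_days_to_date is made up for this example.

```shell
# Hypothetical helper: convert a day count since January 1, 1970
# (as stored in /etc/shadow) into a calendar date.
# Assumes GNU date; "-d @N" interprets N as seconds since the epoch.
shadow_days_to_date() {
    date -u -d "@$(( $1 * 86400 ))" +%Y-%m-%d
}

shadow_days_to_date 12825    # "date password last changed" field for juan
shadow_days_to_date 13096    # account expiration field for juan
```

Working in UTC (the -u option) avoids the result shifting by a day on systems whose local time zone is behind UTC.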
6.3.2.3. /etc/group
/etc/group
file is world-readable and contains a list of groups, each on a separate line. Each line is a four field, colon delimited list including the following information:
- Group name -- The name of the group. Used by various utility programs as a human-readable identifier for the group.
- Group password -- If set, this allows users that are not part of the group to join the group by using the newgrp command and typing the password stored here. If a lower case x is in this field, then shadow group passwords are being used.
- Group ID (GID) -- The numerical equivalent of the group name. It is used by the operating system and applications when determining access privileges.
- Member list -- A comma delimited list of the users belonging to the group.
/etc/group
:
general:x:502:juan,shelley,bob
general
group is using shadow passwords, has a GID of 502, and that juan
, shelley
, and bob
are members.
/etc/group
, see the group(5)
man page.
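As a quick illustration of the four-field layout, the member list can be pulled out of an entry with standard text tools. The function name group_members is hypothetical.

```shell
# Hypothetical helper: print the members named in one /etc/group-format
# line (passed as $1), one username per line.
group_members() {
    # Field 4 is the comma delimited member list.
    printf '%s\n' "$1" | cut -d: -f4 | tr ',' '\n'
}

group_members 'general:x:502:juan,shelley,bob'
```

Note that the member list does not include users who have the group as their primary group; that relationship is recorded in the GID field of /etc/passwd instead.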
6.3.2.4. /etc/gshadow
/etc/gshadow
file is readable only by the root user and contains an encrypted password for each group, as well as group membership and administrator information. Just as in the /etc/group
file, each group's information is on a separate line. Each of these lines is a colon delimited list including the following information:
- Group name -- The name of the group. Used by various utility programs as a human-readable identifier for the group.
- Encrypted password -- The encrypted password for the group. If set, non-members of the group can join the group by typing the password for that group using the newgrp command. If the value of this field is !, then no user is allowed to access the group using the newgrp command. A value of !! is treated the same as a value of !; however, it also indicates that a password has never been set before. If the value is null, only group members can log into the group.
- Group administrators -- Group members listed here (in a comma delimited list) can add or remove group members using the gpasswd command.
- Group members -- Group members listed here (in a comma delimited list) are regular, non-administrative members of the group.
/etc/gshadow
:
general:!!:shelley:juan,bob
general
group has no password and does not allow non-members to join using the newgrp
command. In addition, shelley
is a group administrator, and juan
and bob
are regular, non-administrative members.
6.3.3. User Account and Group Applications
- The graphical User Management Tool application
- A suite of command line tools
Application | Function |
---|---|
/usr/sbin/useradd | Adds user accounts. This tool is also used to specify primary and secondary group membership. |
/usr/sbin/userdel | Deletes user accounts. |
/usr/sbin/usermod | Edits account attributes including some functions related to password aging. For more fine-grained control, use the passwd command. usermod is also used to specify primary and secondary group membership. |
passwd | Sets passwords. Although primarily used to change a user's password, it also controls all aspects of password aging. |
/usr/sbin/chpasswd | Reads in a file consisting of username and password pairs, and updates each user's password accordingly. |
chage | Changes the user's password aging policies. The passwd command can also be used for this purpose. |
chfn | Changes the user's GECOS information. |
chsh | Changes the user's default shell. |
Application | Function |
---|---|
/usr/sbin/groupadd | Adds groups, but does not assign users to those groups. The useradd and usermod programs should then be used to assign users to a given group. |
/usr/sbin/groupdel | Deletes groups. |
/usr/sbin/groupmod | Modifies group names or GIDs, but does not change group membership. The useradd and usermod programs should be used to assign users to a given group. |
gpasswd | Changes group membership and sets passwords to allow non-group members who know the group password to join the group. It is also used to specify group administrators. |
/usr/sbin/grpck | Checks the integrity of the /etc/group and /etc/gshadow files. |
6.3.3.1. File Permission Applications
Application | Function |
---|---|
chgrp | Changes which group owns a given file. |
chmod | Changes access permissions for a given file. It is also capable of assigning special permissions. |
chown | Changes a file's ownership (and can also change group). |
6.4. Additional Resources
6.4.1. Installed Documentation
- User Management Tool menu entry -- Learn how to manage user accounts and groups.
- passwd(5) man page -- Learn more about the file format information for the /etc/passwd file.
- group(5) man page -- Learn more about the file format information for the /etc/group file.
- shadow(5) man page -- Learn more about the file format information for the /etc/shadow file.
- useradd(8) man page -- Learn how to create or update user accounts.
- userdel(8) man page -- Learn how to delete user accounts.
- usermod(8) man page -- Learn how to modify user accounts.
- passwd(1) man page -- Learn how to update a user's password.
- chpasswd(8) man page -- Learn how to update many users' passwords at one time.
- chage(1) man page -- Learn how to change user password aging information.
- chfn(1) man page -- Learn how to change a user's GECOS (finger) information.
- chsh(1) man page -- Learn how to change a user's login shell.
- groupadd(8) man page -- Learn how to create a new group.
- groupdel(8) man page -- Learn how to delete a group.
- groupmod(8) man page -- Learn how to modify a group.
- gpasswd(1) man page -- Learn how to administer the /etc/group and /etc/gshadow files.
- grpck(1) man page -- Learn how to verify the integrity of the /etc/group and /etc/gshadow files.
- chgrp(1) man page -- Learn how to change group-level ownership.
- chmod(1) man page -- Learn how to change file access permissions.
- chown(1) man page -- Learn how to change file owner and group.
6.4.2. Useful Websites
- http://www.bergen.org/ATC/Course/InfoTech/passwords.html — A good example of a document conveying information about password security to an organization's users.
- http://www.crypticide.org/users/alecm/ — Homepage of the author of one of the most popular password-cracking programs (Crack). You can download Crack from this page and see how many of your users have weak passwords.
- http://www.linuxpowered.com/html/editorials/file.html — a good overview of Linux file permissions.
6.4.3. Related Books
- The Security Guide; Red Hat, Inc — Provides an overview of the security-related aspects of user accounts, namely choosing strong passwords.
- The Reference Guide; Red Hat, Inc — Contains detailed information on the users and groups present in Red Hat Enterprise Linux.
- The System Administrators Guide; Red Hat, Inc — Includes a chapter on user and group configuration.
- Linux Administration Handbook by Evi Nemeth, Garth Snyder, and Trent R. Hein; Prentice Hall — Provides a chapter on user account maintenance, a section on security as it relates to user account files, and a section on file attributes and permissions.
Chapter 7. Printers and Printing
7.1. Types of Printers
7.1.1. Printing Considerations
7.1.1.1. Function
- letter -- (8 1/2" x 11")
- A4 -- (210mm x 297mm)
- JIS B5 -- (182mm x 257mm)
- legal -- (8 1/2" x 14")
7.1.1.2. Cost
7.2. Impact Printers
7.2.1. Dot-Matrix Printers
7.2.2. Daisy-Wheel Printers
7.2.3. Line Printers
7.2.4. Impact Printer Consumables
7.3. Inkjet Printers
7.3.1. Inkjet Consumables
Note
7.4. Laser Printers
7.4.1. Color Laser Printers
7.4.2. Laser Printer Consumables
7.5. Other Printer Types
- Thermal Wax Printers
- These printers are used mostly for business presentation transparencies and for color proofing (creating test documents and images for close quality inspection before sending off master documents to be printed on industrial four-color offset printers). Thermal wax printers use sheet-sized, belt driven CMYK ribbons and specially-coated paper or transparencies. The printhead contains heating elements that melt each wax color onto the paper as it is rolled through the printer.
- Dye-Sublimation Printers
- Used in organizations such as service bureaus -- where professional quality documents, pamphlets, and presentations are more important than consumables costs -- dye-sublimation (or dye-sub) printers are the workhorses of quality CMYK printing. The concepts behind dye-sub printers are similar to thermal wax printers except for the use of diffusive plastic dye film instead of colored wax. The printhead heats the colored film and vaporizes the image onto specially coated paper. Dye-sub is quite popular in the design and publishing world as well as the scientific research field, where precision and detail are required. Such detail and print quality comes at a price, as dye-sub printers are also known for their high costs-per-page.
- Solid Ink Printers
- Used mostly in the packaging and industrial design industries, solid ink printers are prized for their ability to print on a wide variety of paper types. Solid ink printers, as the name implies, use hardened ink sticks that are melted and sprayed through small nozzles on the printhead. The paper is then sent through a fuser roller which further forces the ink onto the paper. The solid ink printer is ideal for prototyping and proofing new designs for product packages; as such, most service-oriented businesses would not have a need for this type of printer.
7.6. Printer Languages and Technologies
7.7. Networked Versus Local Printers
7.8. Red Hat Enterprise Linux-Specific Information
system-config-printer
. This command automatically determines whether to run the graphical or text-based version depending on whether the command is executed in the graphical desktop environment or from a text-based console.
system-config-printer-tui
from a shell prompt.
Important
/etc/printcap
file or the files in the /etc/cups/
directory. Each time the printer daemon (cups
) is started or restarted, new configuration files are dynamically created. The files are dynamically created when changes are applied with the Printer Configuration Tool as well.
- Locally-connected — a printer attached directly to the computer through a parallel or USB port.
- Networked CUPS (IPP) — a printer that can be accessed over a TCP/IP network via the Internet Printing Protocol, also known as IPP (for example, a printer attached to another Red Hat Enterprise Linux system running CUPS on the network).
- Networked UNIX (LPD) — a printer attached to a different UNIX system that can be accessed over a TCP/IP network (for example, a printer attached to another Red Hat Enterprise Linux system running LPD on the network).
- Networked Windows (SMB) — a printer attached to a different system which is sharing a printer over an SMB network (for example, a printer attached to a Microsoft Windows™ machine).
- Networked Novell (NCP) — a printer attached to a different system which uses Novell's NetWare network technology.
- Networked JetDirect — a printer connected directly to the network through HP JetDirect instead of to a computer.
Important
7.9. Additional Resources
7.9.1. Installed Documentation
- lpr(1) man page -- Learn how to print selected files on the printer of your choice.
- lprm(1) man page -- Learn how to remove print jobs from a printer queue.
- cupsd(8) man page -- Learn about the CUPS printer daemon.
- cupsd.conf(5) man page -- Learn about the file format for the CUPS printer daemon configuration file.
- classes.conf(5) man page -- Learn about the file format for the CUPS class configuration file.
- Files in /usr/share/doc/cups-<version> -- Learn more about the CUPS printing system.
7.9.2. Useful Websites
- http://www.webopedia.com/TERM/p/printer.html -- General definitions of printers and descriptions of printer types.
- http://www.linuxprinting.org/ -- A database of documents about printing, along with a database of nearly 1000 printers compatible with Linux printing facilities.
- http://www.cups.org/ -- Documentation, FAQs, and newsgroups about CUPS.
7.9.3. Related Books
- Network Printing by Matthew Gast and Todd Radermacher; O'Reilly & Associates, Inc. -- Comprehensive information on using Linux as a print server in heterogeneous environments.
- The System Administrators Guide; Red Hat, Inc -- Includes a chapter on printer configuration.
Chapter 8. Planning for Disaster
8.1. Types of Disasters
- Hardware failures
- Software failures
- Environmental failures
- Human errors
8.1.1. Hardware Failures
8.1.1.1. Keeping Spare Hardware
- Someone on-site has the necessary skills to diagnose the problem, identify the failing hardware, and replace it.
- A replacement for the failing hardware is available.
8.1.1.1.1. Having the Skills
Note
- Is not still under warranty
- Is not under a service/maintenance contract of any kind
8.1.1.1.2. What to Stock?
- Maximum allowable downtime
- The skill required to make the repair
- Budget available for spares
- Storage space required for spares
- Other hardware that could utilize the same spares
8.1.1.1.2.1. How Much to Stock?
- Maximum allowable downtime
- Projected rate of failure
- Estimated time to replenish stock
- Budget available for spares
- Storage space required for spares
- Other hardware that could utilize the same spares
8.1.1.1.3. Spares That Are Not Spares
- Less money dedicated to "non-productive" spares
- The hardware is known to be operative
- Normal production of the lower-priority task is interrupted
- There is an exposure should the lower-priority hardware fail (leaving no spare for the higher-priority hardware)
8.1.1.2. Service Contracts
- Hours of coverage
- Response time
- Parts availability
- Available budget
- Hardware to be covered
8.1.1.2.1. Hours of Coverage
- Monday through Friday, 09:00 to 17:00
- Monday through Friday, 12/18/24 hours each day (with the start and stop times mutually agreed upon)
- Monday through Saturday (or Monday through Sunday), same times as above
8.1.1.2.1.1. Depot Service
Note
8.1.1.2.2. Response Time
Note
8.1.1.2.2.1. Zero Response Time -- Having an On-Site Technician
- Instant response to any problem
- A more proactive approach to system maintenance
8.1.1.2.3. Parts Availability
8.1.1.2.4. Available Budget
8.1.1.2.5. Hardware to be Covered
- The PC will be in use from 17:00 to 09:00 the next morning (not to mention weekends)
- A failure of this PC will be noticed, except between 09:00 and 17:00
Note
8.1.2. Software Failures
- Operating system
- Applications
8.1.2.1. Operating System Failures
- Crashes
- Hangs
8.1.2.1.1. Crashes
8.1.2.1.2. Hangs
8.1.2.2. Application Failures
8.1.2.3. Getting Help -- Software Support
- Documentation
- Self support
- Web or email support
- Telephone support
- On-site support
8.1.2.3.1. Documentation
8.1.2.3.2. Self Support
8.1.2.3.3. Web or Email Support
- Clearly describe the nature of the problem
- Include all pertinent version numbers
- Describe what you have already done in an attempt to address the problem (applied the latest patches, rebooted with a minimal configuration, etc.)
8.1.2.3.4. Telephone Support
8.1.2.3.5. On-Site Support
8.1.3. Environmental Failures
- Building integrity
- Electricity
- Air conditioning
- Weather and the outside world
8.1.3.1. Building Integrity
- Roofs can leak, allowing water into data centers.
- Various building systems (such as water, sewer, or air handling) can fail, rendering the building uninhabitable.
- Floors may have insufficient load-bearing capacity to hold the equipment you want to put in the data center.
8.1.3.2. Electricity
8.1.3.2.1. The Security of Your Power
Note
- The one servicing your area
- The one from the neighboring power company
- Damage from extreme weather conditions (ice, wind, lightning)
- Traffic accidents that damage the poles and/or transformers
- Animals straying into the wrong place and shorting out the lines
- Damage from construction workers digging in the wrong place
- Flooding
- Lightning (though much less so than above-ground lines)
8.1.3.2.2. Power Quality
- Voltage
- The voltage of the incoming power must be stable, with no voltage reductions (often called sags, droops, or brownouts) or voltage increases (often known as spikes and surges).
- Waveform
- The waveform must be a clean sine wave, with minimal THD (Total Harmonic Distortion).
- Frequency
- The frequency must be stable (most countries use a power frequency of either 50Hz or 60Hz).
- Noise
- The power must not include any RFI (Radio Frequency Interference) or EMI (Electro-Magnetic Interference) noise.
- Current
- The power must be supplied at a current rating sufficient to run the data center.
- Surge Protectors
- Surge protectors do just what their name implies -- they filter surges from the power supply. Most do nothing else, leaving equipment vulnerable to damage from other power-related problems.
- Power Conditioners
- Power conditioners attempt a more comprehensive approach; depending on the sophistication of the unit, power conditioners often can take care of most of the types of problems outlined above.
- Motor-Generator Sets
- A motor-generator set is essentially a large electric motor powered by your normal power supply. The motor is attached to a large flywheel, which is, in turn, attached to a generator. The motor turns the flywheel and generator, which generates electricity in sufficient quantities to run the data center. In this way, the data center power is electrically isolated from outside power, meaning that most power-related problems are eliminated. The flywheel also provides the ability to maintain power through short outages, as it takes several seconds for the flywheel to slow to the point at which it can no longer generate power.
- Uninterruptible Power Supplies
- Some types of Uninterruptible Power Supplies (more commonly known as UPSs) include most (if not all) of the protection features of a power conditioner[27].
8.1.3.2.3. Backup Power
Note
8.1.3.2.3.1. Providing Power For the Next Few Seconds
- Very short time to switch to backup power (known as transfer time)
- A runtime (the time that backup power will last) measured in seconds to minutes
8.1.3.2.3.2. Providing Power For the Next Few Minutes
- A transfer switch for switching from the primary power supply to the backup power supply
- A battery, for providing backup power
- An inverter, which converts the DC current from the battery into the AC current required by the data center hardware
- The offline UPS uses its inverter to generate power only when the primary power supply fails.
- The online UPS uses its inverter to generate power all the time, powering the inverter via its battery only when the primary power supply fails.
Note
8.1.3.2.3.3. Providing Power For the Next Few Hours (and Beyond)
Note
Note
8.1.3.2.4. Planning for Extended Outages
- What if there is no power to maintain environmental control in the data center?
- What if there is no power to maintain environmental control in the entire building?
- What if there is no power to operate personal workstations, the telephone system, the lights?
8.1.3.3. Heating, Ventilation, and Air Conditioning
- The air handling units (essentially large fans driven by large electric motors) can fail due to electrical overload, bearing failure, belt/pulley failure, etc.
- The cooling units (often called chillers) can lose their refrigerant due to leaks, or they can have their compressors and/or motors seize.
8.1.3.4. Weather and the Outside World
- Heavy snow and ice can prevent personnel from getting to the data center, and can even clog air conditioning condensers, resulting in elevated data center temperatures just when no one is able to get to the data center to take corrective action.
- High winds can disrupt power and communications, with extremely high winds actually doing damage to the building itself.
8.1.4. Human Errors
8.1.4.1. End-User Errors
8.1.4.1.1. Improper Use of Applications
- Files inadvertently overwritten
- Wrong data used as input to an application
- Files not clearly named and organized
- Files accidentally deleted
- Educate users in the proper use of their applications and in proper file management techniques
- Make sure backups of users' files are made regularly and that the restoration process is as streamlined and quick as possible
8.1.4.2. Operations Personnel Errors
8.1.4.2.1. Failure to Follow Procedures
- The environment was changed at some time in the past, and the procedures were never updated. Now the environment changes again, rendering the operator's memorized procedure invalid. At this point, even if the procedures were updated (which is unlikely, given the fact that they were not updated before) the operator will not be aware of it.
- The environment was changed, and no procedures exist. This is just a more out-of-control version of the previous situation.
- The procedures exist and are correct, but the operator will not (or cannot) follow them.
8.1.4.2.2. Mistakes Made During Procedures
8.1.4.3. System Administrator Errors
8.1.4.3.1. Misconfiguration Errors
- Email
- User accounts
- Network
- Applications
8.1.4.3.1.1. Change Control
- Preliminary research
- Preliminary research attempts to clearly define:
- The nature of the change to take place
- Its impact, should the change succeed
- A fallback position, should the change fail
- An assessment of what types of failures are possible
Preliminary research might include testing the proposed change during a scheduled downtime, or it may go so far as to include implementing the change first on a special test environment run on dedicated test hardware.
- Scheduling
- The change is examined with an eye toward the actual mechanics of implementation. The scheduling being done includes outlining the sequencing and timing of the change (along with the sequencing and timing of any steps necessary to back the change out should a problem arise), as well as ensuring that the time allotted for the change is sufficient and does not conflict with any other system-level activity. The product of this process is often a checklist of steps for the system administrator to use while making the change. Included with each step are instructions to perform in order to back out the change should the step fail. Estimated times are often included, making it easier for the system administrator to determine whether the work is on schedule or not.
- Execution
- At this point, the actual execution of the steps necessary to implement the change should be straightforward and anti-climactic. The change is either implemented, or (if trouble crops up) it is backed out.
- Monitoring
- Whether the change is implemented or not, the environment is monitored to make sure that everything is operating as it should.
- Documenting
- If the change has been implemented, all existing documentation is updated to reflect the changed configuration.
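The execution and back-out steps described above can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the configuration file, the max_clients setting, and the validate check are hypothetical placeholders, and a real validation step would test the affected service itself.

```shell
#!/bin/sh
# Sketch of the execute / back-out pattern applied to one configuration file.
# All paths, the setting name, and the validation rule are hypothetical.
set -eu

WORK=$(mktemp -d)
CONF="$WORK/app.conf"
echo "max_clients=50" > "$CONF"

# validate: stand-in check that the file still has the expected form.
validate() {
    grep -Eq '^max_clients=[0-9]+$' "$CONF"
}

# apply_change NEW_VALUE: save a fallback copy, make the change,
# validate it, and restore the fallback copy if validation fails.
apply_change() {
    new_value=$1
    cp -p "$CONF" "$CONF.fallback"              # fallback position
    echo "max_clients=$new_value" > "$CONF"     # the change itself
    if validate; then
        echo "change applied: max_clients=$new_value"
    else
        cp -p "$CONF.fallback" "$CONF"          # back the change out
        echo "change backed out"
        return 1
    fi
}

apply_change 200            # valid value: the change is kept
apply_change "lots" || true # invalid value: the change is backed out
cat "$CONF"
```

Keeping the fallback copy next to the original makes the back-out step a single copy, which is the kind of instruction that fits well on a change checklist.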
8.1.4.3.2. Mistakes Made During Maintenance
8.1.4.4. Service Technician Errors
8.1.4.4.1. Improperly-Repaired Hardware
8.1.4.4.2. Fixing One Thing and Breaking Another
8.2. Backups
- To permit restoration of individual files
- To permit wholesale restoration of entire file systems
8.2.1. Different Data: Different Backup Needs
- A backup is nothing more than a snapshot of the data being backed up. It is a reflection of that data at a particular moment in time.
- Data that changes infrequently can be backed up infrequently, while data that changes often must be backed up more frequently.
- Operating System
- This data normally only changes during upgrades, the installation of bug fixes, and any site-specific modifications.
Note
Should you even bother with operating system backups? This is a question that many system administrators have pondered over the years. On the one hand, if the installation process is relatively easy, and if the application of bug fixes and customizations is well documented and easily reproducible, reinstalling the operating system may be a viable option.
On the other hand, if there is the least doubt that a fresh installation can completely recreate the original system environment, backing up the operating system is the best choice, even if the backups are performed much less frequently than the backups for production data. Occasional operating system backups also come in handy when only a few system files must be restored (for example, due to accidental file deletion).
- Application Software
- This data changes whenever applications are installed, upgraded, or removed.
- Application Data
- This data changes as frequently as the associated applications are run. Depending on the specific application and your organization, this could mean that changes take place second-by-second or once at the end of each fiscal year.
- User Data
- This data changes according to the usage patterns of your user community. In most organizations, this means that changes take place all the time.
Note
8.2.2. Backup Software: Buy Versus Build
- Schedules backups to run at the proper time
- Manages the location, rotation, and usage of backup media
- Works with operators (and/or robotic media changers) to ensure that the proper media is available
- Assists operators in locating the media containing a specific backup of a given file
- Purchase a commercially-developed solution
- Create an in-house developed backup system from scratch (possibly integrating one or more open source technologies)
- Changing backup software is difficult; once implemented, you will be using the backup software for a long time. After all, you will have long-term archive backups that you must be able to read. Changing backup software means you must either keep the original software around (to access the archive backups), or you must convert your archive backups to be compatible with the new software. Depending on the backup software, the effort involved in converting archive backups may be as straightforward (though time-consuming) as running the backups through an already-existing conversion program, or it may require reverse-engineering the backup format and writing custom software to perform the task.
- The software must be 100% reliable -- it must back up what it is supposed to, when it is supposed to.
- When the time comes to restore any data -- whether a single file or an entire file system -- the backup software must be 100% reliable.
8.2.3. Types of Backups
- Full backups
- Incremental backups
- Differential backups
8.2.3.1. Full Backups
8.2.3.2. Incremental Backups
8.2.3.3. Differential Backups
8.2.4. Backup Media
8.2.4.1. Tape
8.2.4.2. Disk
- Disk drives are not normally removable. One key factor to an effective backup strategy is to get the backups out of your data center and into off-site storage of some sort. A backup of your production database sitting on a disk drive two feet away from the database itself is not a backup; it is a copy. And copies are not very useful should the data center and its contents (including your copies) be damaged or destroyed by some unfortunate set of circumstances.
- Disk drives are expensive (at least compared to other backup media). There may be situations where money truly is no object, but in all other circumstances, the expenses associated with using disk drives for backup mean that the number of backup copies must be kept low to keep the overall cost of backups low. Fewer backup copies mean less redundancy should a backup not be readable for some reason.
- Disk drives are fragile. Even if you spend the extra money for removable disk drives, their fragility can be a problem. If you drop a disk drive, you have lost your backup. It is possible to purchase specialized cases that can reduce (but not entirely eliminate) this hazard, but that makes an already-expensive proposition even more so.
- Disk drives are not archival media. Even assuming you are able to overcome all the other problems associated with performing backups onto disk drives, you should consider the following. Most organizations have various legal requirements for keeping records available for certain lengths of time. The chance of getting usable data from a 20-year-old tape is much greater than the chance of getting usable data from a 20-year-old disk drive. For instance, would you still have the hardware necessary to connect it to your system? Another thing to consider is that a disk drive is much more complex than a tape cartridge. When a 20-year-old motor spins a 20-year-old disk platter, causing 20-year-old read/write heads to fly over the platter surface, what are the chances that all these components will work flawlessly after sitting idle for 20 years?
Note
Some data centers back up to disk drives and then, when the backups have been completed, the backups are written out to tape for archival purposes. This allows for the fastest possible backups during the backup window. Writing the backups to tape can then take place during the remainder of the business day; as long as the "taping" finishes before the next day's backups are done, time is not an issue.
8.2.4.3. Network
8.2.5. Storage of Backups
- Small, ad-hoc restoration requests from users
- Massive restorations to recover from a disaster
- Archival storage unlikely to ever be used again
- A data center pool used strictly for ad-hoc restoration requests
- An off-site pool used for off-site storage and disaster recovery
8.2.6. Restoration Issues
8.2.6.1. Restoring From Bare Metal
- Reinstall, followed by restore
- Here the base operating system is installed just as if a brand-new computer were being initially set up. Once the operating system is in place and configured properly, the remaining disk drives can be partitioned and formatted, and all backups restored from backup media.
- System recovery disks
- A system recovery disk is bootable media of some kind (often a CD-ROM) that contains a minimal system environment, able to perform most basic system administration tasks. The recovery environment contains the necessary utilities to partition and format disk drives, the device drivers necessary to access the backup device, and the software necessary to restore data from the backup media.
Note
8.2.6.2. Testing Backups
8.3. Disaster Recovery
8.3.1. Creating, Testing, and Implementing a Disaster Recovery Plan
- What events denote possible disasters
- What people in the organization have the authority to declare a disaster and thereby put the plan into effect
- The sequence of events necessary to prepare the backup site once a disaster has been declared
- The roles and responsibilities of all key personnel with respect to carrying out the plan
- An inventory of the necessary hardware and software required to restore production
- A schedule listing the personnel to staff the backup site, including a rotation schedule to support ongoing operations without burning out the disaster team members
- The sequence of events necessary to move operations from the backup site to the restored/new data center
Note
Note
8.3.2. Backup Sites: Cold, Warm, and Hot
- Cold backup sites
- Warm backup sites
- Hot backup sites
- Companies specializing in providing disaster recovery services
- Other locations owned and operated by your organization
- A mutual agreement with another organization to share data center facilities in the event of a disaster
8.3.3. Hardware and Software Availability
8.3.4. Availability of Backups
- To have the last backups brought to the backup site
- To arrange regular backup pickup and dropoff to the backup site (in support of normal backups at the backup site)
Note
8.3.5. Network Connectivity to the Backup Site
8.3.6. Backup Site Staffing
8.3.7. Moving Back Toward Normalcy
8.4. Red Hat Enterprise Linux-Specific Information
8.4.1. Software Support
8.4.2. Backup Technologies
Note
8.4.2.1. tar
The tar utility is well known among UNIX system administrators. It is the archiving method of choice for sharing ad-hoc bits of source code and files between systems. The tar implementation included with Red Hat Enterprise Linux is GNU tar, one of the more feature-rich tar implementations.
Using tar, backing up the contents of a directory can be as simple as issuing a command similar to the following:
tar cf /mnt/backup/home-backup.tar /home/
This command creates an archive file called home-backup.tar in /mnt/backup/. The archive contains the contents of the /home/ directory. Adding the z option compresses the archive with gzip:
tar czf /mnt/backup/home-backup.tar.gz /home/
There are many other options to tar; to learn more about them, read the tar(1) man page.
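As a minimal sketch of the commands above, the following script archives a scratch directory instead of /home/, lists the archive to confirm that it is readable, and then restores it to a separate location for comparison. The directory and file names are placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch: create a gzip-compressed tar archive and confirm it can be read back.
# SRC, DEST, and RESTORE are throwaway directories (hypothetical names) so the
# example can be run without touching real data.
set -eu

SRC=$(mktemp -d)
DEST=$(mktemp -d)
RESTORE=$(mktemp -d)
mkdir -p "$SRC/home/alice"
echo "quarterly report" > "$SRC/home/alice/report.txt"

# -C changes into $SRC first, so the archive stores the relative path "home/...".
tar czf "$DEST/home-backup.tar.gz" -C "$SRC" home

# Listing the archive (t) is a cheap readability check; nothing is restored.
tar tzf "$DEST/home-backup.tar.gz"

# Restore into a separate directory and compare it against the source.
tar xzf "$DEST/home-backup.tar.gz" -C "$RESTORE"
diff -r "$SRC/home" "$RESTORE/home" && echo "restore matches source"
```

Restoring to a separate directory rather than over the original data keeps the check harmless even if the archive turns out to be incomplete.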
8.4.2.2. cpio
The cpio utility is another traditional UNIX program. It is an excellent general-purpose program for moving data from one place to another and, as such, can serve well as a backup program.
The behavior of cpio is a bit different from tar. Unlike tar, cpio reads the names of the files it is to process via standard input. A common method of generating a list of files for cpio is to use programs such as find, whose output is then piped to cpio:
find /home/ | cpio -o > /mnt/backup/home-backup.cpio
This command creates a cpio archive file (containing everything in /home/) called home-backup.cpio and residing in the /mnt/backup/ directory.
Note
Because find has a rich set of file selection tests, sophisticated backups can easily be created. For example, the following command performs a backup of only those files that have not been accessed within the past year:
find /home/ -atime +365 | cpio -o > /mnt/backup/home-backup.cpio
There are many other options to cpio (and find); to learn more about them, read the cpio(1) and find(1) man pages.
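The find-to-cpio pipeline shown above passes one file name per line, which can misbehave when a name contains unusual characters such as newlines. The sketch below, which assumes the GNU versions of find and cpio, passes NUL-terminated names instead and then restores the archive into a scratch directory. All directory and file names are placeholders for illustration.

```shell
#!/bin/sh
# Sketch: back up a directory with find + cpio using NUL-terminated file
# names, then restore it elsewhere. Assumes GNU find and GNU cpio.
set -eu

if ! command -v cpio >/dev/null 2>&1; then
    echo "cpio is not installed; skipping example"
    exit 0
fi

SRC=$(mktemp -d)
DEST=$(mktemp -d)
RESTORE=$(mktemp -d)
mkdir -p "$SRC/home/bob"
echo "notes" > "$SRC/home/bob/my notes.txt"

# Run from inside $SRC so the archive stores relative paths.
# -depth lists a directory's contents before the directory itself;
# -print0 and --null exchange NUL-terminated names.
( cd "$SRC" && find home -depth -print0 | cpio --null -o > "$DEST/home-backup.cpio" )

# -t lists the archive contents without extracting anything.
cpio -it < "$DEST/home-backup.cpio"

# -i extracts; -d creates leading directories as needed.
( cd "$RESTORE" && cpio -id < "$DEST/home-backup.cpio" )
```

Storing relative paths makes it possible to restore the archive somewhere other than its original location, which is useful when testing a backup.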
8.4.2.3. dump/restore: Not Recommended for Mounted File Systems!
The dump and restore programs are Linux equivalents to the UNIX programs of the same name. As such, many system administrators with UNIX experience may feel that dump and restore are viable candidates for a good backup program under Red Hat Enterprise Linux. However, one method of using dump can cause problems. Here is Linus Torvalds's comment on the subject:
From: Linus Torvalds
To: Neil Conway
Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.
Date: Fri, 27 Apr 2001 09:59:46 -0700 (PDT)
Cc: Kernel Mailing List <linux-kernel At vger Dot kernel Dot org>

[ linux-kernel added back as a cc ]

On Fri, 27 Apr 2001, Neil Conway wrote:
>
> I'm surprised that dump is deprecated (by you at least ;-)). What to
> use instead for backups on machines that can't umount disks regularly?

Note that dump simply won't work reliably at all even in 2.4.x: the buffer cache and the page cache (where all the actual data is) are not coherent. This is only going to get even worse in 2.5.x, when the directories are moved into the page cache as well.

So anybody who depends on "dump" getting backups right is already playing Russian roulette with their backups. It's not at all guaranteed to get the right results - you may end up having stale data in the buffer cache that ends up being "backed up".

Dump was a stupid program in the first place. Leave it behind.

> I've always thought "tar" was a bit undesirable (updates atimes or
> ctimes for example).

Right now, the cpio/tar/xxx solutions are definitely the best ones, and will work on multiple filesystems (another limitation of "dump"). Whatever problems they have, they are still better than the _guaranteed_(*) data corruptions of "dump".

However, it may be that in the long run it would be advantageous to have a "filesystem maintenance interface" for doing things like backups and defragmentation..

Linus

(*) Dump may work fine for you a thousand times. But it _will_ fail under the right circumstances. And there is nothing you can do about it.
Given this problem, using dump/restore on mounted file systems is strongly discouraged. However, dump was originally designed to back up unmounted file systems; therefore, in situations where it is possible to take a file system offline with umount, dump remains a viable backup technology.
8.4.2.4. The Advanced Maryland Automatic Network Disk Archiver (AMANDA)
AMANDA uses tar or dump to do the actual backups (although under Red Hat Enterprise Linux using tar is preferable, due to the issues with dump raised in Section 8.4.2.3, “dump/restore: Not Recommended for Mounted File Systems!”). As such, AMANDA backups do not require AMANDA in order to restore files -- a decided plus.
To learn more about AMANDA, read the amanda(8) man page.
8.5. Additional Resources
8.5.1. Installed Documentation
- tar(1) man page -- Learn how to archive data.
- dump(8) man page -- Learn how to dump file system contents.
- restore(8) man page -- Learn how to retrieve file system contents saved by dump.
- cpio(1) man page -- Learn how to copy files to and from archives.
- find(1) man page -- Learn how to search for files.
- amanda(8) man page -- Learn more about the commands that are part of the AMANDA backup system.
- Files in /usr/share/doc/amanda-server-<version>/ -- Learn more about AMANDA by reviewing these various documents and example files.
8.5.2. Useful Websites
- http://www.redhat.com/apps/support/ -- The Red Hat support homepage provides easy access to various resources related to the support of Red Hat Enterprise Linux.
- http://www.disasterplan.com/ -- An interesting page with links to many sites related to disaster recovery. Includes a sample disaster recovery plan.
- http://web.mit.edu/security/www/isorecov.htm -- The Massachusetts Institute of Technology Information Systems Business Continuity Planning homepage contains several informative links.
- http://www.linux-backup.net/ -- An interesting overview of many backup-related issues.
- http://www.linux-mag.com/1999-07/guru_01.html -- A good article from Linux Magazine on the more technical aspects of producing backups under Linux.
- http://www.amanda.org/ -- The Advanced Maryland Automatic Network Disk Archiver (AMANDA) homepage. Contains pointers to the various AMANDA-related mailing lists and other online resources.
8.5.3. Related Books
- The System Administrators Guide; Red Hat, Inc -- Includes a chapter on system recovery, which could be useful in bare metal restorations.
- Unix Backup & Recovery by W. Curtis Preston; O'Reilly & Associates -- Although not written specifically for Linux systems, this book provides in-depth coverage of many backup-related issues, and even includes a chapter on disaster recovery.
The .gz extension is traditionally used to signify that the file has been compressed with gzip. Sometimes .tar.gz is shortened to .tgz to keep file names reasonably sized.
Appendix A. Revision History
Revision History
Revision 2-7.400 | 2013-10-31
Revision 2-7 | 2012-07-18
Revision 1.0-0 | Tue Sep 23 2008
Index
Symbols
- /etc/fstab file
- mounting file systems with, Mounting File Systems Automatically with /etc/fstab
- updating, Updating /etc/fstab
- /etc/group file
- group, role in, /etc/group
- user account, role in, /etc/group
- /etc/gshadow file
- group, role in, /etc/gshadow
- user account, role in, /etc/gshadow
- /etc/mtab file, Viewing /etc/mtab
- /etc/passwd file
- group, role in, /etc/passwd
- user account, role in, /etc/passwd
- /etc/shadow file
- group, role in, /etc/shadow
- user account, role in, /etc/shadow
- /proc/mdstat file, Checking Array Status With /proc/mdstat
- /proc/mounts file, Viewing /proc/mounts
A
- abuse, resource, What Barriers Are in Place To Prevent Abuse of Resources
- account (see user account)
- ATA disk drive
- adding, Adding ATA Disk Drives
- automation, Automation
- overview of, Automate Everything
B
- backups
- AMANDA backup software, The Advanced Maryland Automatic Network Disk Archiver (AMANDA)
- building software, Backup Software: Buy Versus Build
- buying software, Backup Software: Buy Versus Build
- data-related issues surrounding, Different Data: Different Backup Needs
- introduction to, Backups
- media types, Backup Media
- restoration issues, Restoration Issues
- bare metal restorations, Restoring From Bare Metal
- testing restoration, Testing Backups
- schedule, modifying, Modifying the Backup Schedule
- storage of, Storage of Backups
- technologies used, Backup Technologies
- cpio, cpio
- dump, dump/restore: Not Recommended for Mounted File Systems!
- tar, tar
- types of, Types of Backups
- differential backups, Differential Backups
- full backups, Full Backups
- incremental backups, Incremental Backups
- bandwidth-related resources (see resources, system, bandwidth)
- bash shell, automation and, Automation
- business, knowledge of, Know Your Business
C
- cache memory, Cache Memory
- capacity planning, Monitoring System Capacity
- CD-ROM
- file system (see ISO 9660 file system)
- centralized home directory, Home Directories
- chage command, User Account and Group Applications
- change control, Change Control
- chfn command, User Account and Group Applications
- chgrp command, File Permission Applications
- chmod command, File Permission Applications
- chown command, File Permission Applications
- chpasswd command, User Account and Group Applications
- color laser printers, Color Laser Printers
- communication
- necessity of, Communicate as Much as Possible
- Red Hat Enterprise Linux-specific information, Documentation and Communication
- CPU power (see resources, system, processing power)
D
- daisy-wheel printers (see impact printers)
- data
- shared access to, Who Can Access Shared Data, Where Users Access Shared Data
- global ownership issues, Global Ownership Issues
- device
- alternative to device names, Alternatives to Device File Names
- device names, alternatives to, Alternatives to Device File Names
- devlabel, naming with, Using devlabel
- file names, Device Files
- file system labels, File System Labels
- labels, file system, File System Labels
- naming convention, Device Naming Conventions
- naming with devlabel, Using devlabel
- partition, Partition
- type, Device Type
- unit, Unit
- whole-device access, Whole-Device Access
- devlabel, Using devlabel
- df command, Issuing the df Command
- disaster planning, Planning for Disaster
- power, backup, Backup Power
- generator, Providing Power For the Next Few Hours (and Beyond)
- motor-generator set, Providing Power For the Next Few Seconds
- outages, extended, Planning for Extended Outages
- UPS, Providing Power For the Next Few Minutes
- types of disasters, Types of Disasters
- air conditioning, Heating, Ventilation, and Air Conditioning
- application failures, Application Failures
- building integrity, Building Integrity
- electrical, Electricity
- electricity, quality of, Power Quality
- electricity, security of, The Security of Your Power
- environmental failures, Environmental Failures
- hardware failures, Hardware Failures
- heating, Heating, Ventilation, and Air Conditioning
- human errors, Human Errors
- HVAC, Heating, Ventilation, and Air Conditioning
- improper repairs, Improperly-Repaired Hardware
- improperly-used applications, Improper Use of Applications
- maintenance-related errors, Mistakes Made During Maintenance
- misconfiguration errors, Misconfiguration Errors
- mistakes during procedures, Mistakes Made During Procedures
- operating system crashes, Crashes
- operating system failures, Operating System Failures
- operating system hangs, Hangs
- operator errors, Operations Personnel Errors
- procedural errors, Failure to Follow Procedures
- service technician errors, Service Technician Errors
- software failures, Software Failures
- system administrator errors, System Administrator Errors
- user errors, End-User Errors
- ventilation, Heating, Ventilation, and Air Conditioning
- weather-related, Weather and the Outside World
- disaster recovery
- backup site, Backup Sites: Cold, Warm, and Hot
- network connectivity to, Network Connectivity to the Backup Site
- staffing of, Backup Site Staffing
- backups, availability of, Availability of Backups
- end of, Moving Back Toward Normalcy
- hardware availability, Hardware and Software Availability
- introduction to, Disaster Recovery
- plan, creating, testing, implementing, Creating, Testing, and Implementing a Disaster Recovery Plan
- software availability, Hardware and Software Availability
- disk drives, Hard Drives
- disk quotas
- enabling, Enabling Disk Quotas
- introduction to, Implementing Disk Quotas
- management of, Managing Disk Quotas
- overview of, Some Background on Disk Quotas
- block usage tracking, Tracks Disk Block Usage
- file-system specific, Per-File-System Implementation
- grace period, Grace Periods
- group specific, Per-Group Space Accounting
- hard limits, Hard Limits
- inode usage tracking, Tracks Disk Inode Usage
- soft limits, Soft Limits
- user specific, Per-User Space Accounting
- disk space (see storage)
- documentation
- Red Hat Enterprise Linux-specific information, Documentation and Communication
- documentation, necessity of, Document Everything
- dot-matrix printers (see impact printers)
E
- engineering, social, The Risks of Social Engineering
- execute permission, User Accounts, Groups, and Permissions
- EXT2 file system, EXT2
- EXT3 file system, EXT3
F
- file names
- device, Device Files
- file system
- labels, File System Labels
- free command, free, Red Hat Enterprise Linux-Specific Information
G
- GID, Usernames and UIDs, Groups and GIDs
- gnome-system-monitor command, The GNOME System Monitor -- A Graphical top
- gpasswd command, User Account and Group Applications
- group
- files controlling, Files Controlling User Accounts and Groups
- /etc/group, /etc/group
- /etc/gshadow, /etc/gshadow
- /etc/passwd, /etc/passwd
- /etc/shadow, /etc/shadow
- GID, Usernames and UIDs, Groups and GIDs
- management of, Managing User Accounts and Resource Access
- permissions related to, User Accounts, Groups, and Permissions
- shared data access using, Shared Groups and Data
- structure, determining, Determining Group Structure
- system GIDs, Usernames and UIDs, Groups and GIDs
- system UIDs, Usernames and UIDs, Groups and GIDs
- tools for managing, User Account and Group Applications
- gpasswd command, User Account and Group Applications
- groupadd command, User Account and Group Applications
- groupdel command, User Account and Group Applications
- groupmod command, User Account and Group Applications
- grpck command, User Account and Group Applications
- UID, Usernames and UIDs, Groups and GIDs
- group ID (see GID)
- groupadd command, User Account and Group Applications
- groupdel command, User Account and Group Applications
- groupmod command, User Account and Group Applications
- grpck command, User Account and Group Applications
H
- hard drives, Hard Drives
- hardware
- failures of, Hardware Failures
- service contracts, Service Contracts
- availability of parts, Parts Availability
- budget for, Available Budget
- coverage hours, Hours of Coverage
- depot service, Depot Service
- drop-off service, Depot Service
- hardware covered, Hardware to be Covered
- on-site technician, Zero Response Time -- Having an On-Site Technician
- response time, Response Time
- walk-in service, Depot Service
- skills necessary to repair, Having the Skills
- spares
- keeping, Keeping Spare Hardware
- stock, quantities, How Much to Stock?
- stock, selection of, What to Stock?
- swapping hardware, Spares That Are Not Spares
- home directory
- centralized, Home Directories
I
- IDE interface
- overview of, IDE/ATA
- impact printers, Impact Printers
- consumables, Impact Printer Consumables
- daisy-wheel, Impact Printers
- dot-matrix, Impact Printers
- line, Impact Printers
- inkjet printers, Inkjet Printers
- consumables, Inkjet Consumables
- intrusion detection systems, Security
- iostat command, The Sysstat Suite of Resource Monitoring Tools, Monitoring Bandwidth on Red Hat Enterprise Linux
- ISO 9660 file system, ISO 9660
L
- laser printers, Laser Printers
- color, Color Laser Printers
- consumables, Laser Printer Consumables
- line printers (see impact printers)
- logical volume management (see LVM)
- LVM
- contrasted with RAID, With LVM, Why Use RAID?
- data migration, Data Migration
- logical volume resizing, Logical Volume Resizing
- migration, data, Data Migration
- overview of, Logical Volume Management
- resizing, logical volume, Logical Volume Resizing
- storage grouping, Physical Storage Grouping
M
- managing
- printers, Printers and Printing
- memory
- monitoring of, Monitoring Memory
- resource utilization of, Physical and Virtual Memory
- virtual memory, Basic Virtual Memory Concepts
- backing store, Backing Store -- the Central Tenet of Virtual Memory
- overview of, Virtual Memory in Simple Terms
- page faults, Page Faults
- performance of, Virtual Memory Performance Implications
- performance, best case, Best Case Performance Scenario
- performance, worst case, Worst Case Performance Scenario
- swapping, Swapping
- virtual address space, Virtual Memory: The Details
- working set, The Working Set
- monitoring
- resources, Resource Monitoring
- system performance, System Performance Monitoring
- monitoring statistics
- bandwidth-related, Monitoring Bandwidth
- CPU-related, Monitoring CPU Power
- memory-related, Monitoring Memory
- selection of, What to Monitor?
- storage-related, Monitoring Storage
- mount points (see storage, file system, mount point)
- mounting file systems (see storage, file system, mounting)
- mpstat command, The Sysstat Suite of Resource Monitoring Tools
- MSDOS file system, MSDOS
N
- NFS, NFS
P
- page description languages (PDL), Printer Languages and Technologies
- Interpress, Printer Languages and Technologies
- PCL, Printer Languages and Technologies
- PostScript, Printer Languages and Technologies
- page faults, Page Faults
- PAM, Security
- partition, Partition
- attributes of, Partition Attributes
- geometry, Geometry
- type, Partition Type
- type field, Partition Type Field
- creation of, Partitioning, Partitioning
- extended, Extended Partitions
- logical, Logical Partitions
- overview of, Partitions/Slices
- primary, Primary Partitions
- passwd command, User Account and Group Applications
- password, Passwords
- aging, Password Aging
- big character set used in, Expanded Character Set
- longer, Longer Passwords
- memorable, Memorable
- personal information used in, Personal Information
- repeatedly used, The Same Password for Multiple Systems
- shortness of, Short Passwords
- small character set used in, Limited Character Set
- strong, Strong Passwords
- weak, Weak Passwords
- word tricks used in, Simple Word Tricks
- words used in, Recognizable Words
- written, Passwords on Paper
- perl, automation and, Automation
- permissions, User Accounts, Groups, and Permissions
- tools for managing
- chgrp command, File Permission Applications
- chmod command, File Permission Applications
- chown command, File Permission Applications
- philosophy of system administration, The Philosophy of System Administration
- physical memory (see memory)
- planning, importance of, Plan Ahead
- Pluggable Authentication Modules (see PAM)
- printers
- additional resources, Additional Resources
- color, Inkjet Printers
- CMYK, Inkjet Printers
- inkjet, Inkjet Printers
- laser, Color Laser Printers
- considerations, Printing Considerations
- duplex, Function
- languages (see page description languages (PDL))
- local, Networked Versus Local Printers
- managing, Printers and Printing
- networked, Networked Versus Local Printers
- types, Types of Printers
- color laser, Color Laser Printers
- daisy-wheel, Impact Printers
- dot-matrix, Impact Printers
- dye-sublimation, Other Printer Types
- impact, Impact Printers
- inkjet, Inkjet Printers
- laser, Laser Printers
- line, Impact Printers
- solid ink, Other Printer Types
- thermal wax, Other Printer Types
- processing power, resources related to (see resources, system, processing power)
Q
- quota, disk (see disk quotas)
R
- RAID
- arrays
- management of, Day to Day Management of RAID Arrays
- raidhotadd command, use of, Rebuilding a RAID array
- rebuilding, Rebuilding a RAID array
- status, checking, Checking Array Status With /proc/mdstat
- arrays, creating, Creating RAID Arrays
- after installation time, After Red Hat Enterprise Linux Has Been Installed
- at installation time, While Installing Red Hat Enterprise Linux
- contrasted with LVM, With LVM, Why Use RAID?
- creating arrays (see RAID, arrays, creating)
- implementations of, RAID Implementations
- hardware RAID, Hardware RAID
- software RAID, Software RAID
- introduction to, RAID-Based Storage
- levels of, RAID Levels
- nested RAID, Nested RAID Levels
- overview of, Basic Concepts
- raidhotadd command, use of, Rebuilding a RAID array
- RAM, Main Memory -- RAM
- read permission, User Accounts, Groups, and Permissions
- recursion (see recursion)
- Red Hat Enterprise Linux-specific information
- automation, Automation
- backup technologies
- overview of, Backup Technologies
- bash shell, Automation
- communication, Documentation and Communication
- disaster recovery, Red Hat Enterprise Linux-Specific Information
- documentation, Documentation and Communication
- intrusion detection systems, Security
- PAM, Security
- perl, Automation
- resource monitoring
- resource monitoring tools, Red Hat Enterprise Linux-Specific Information
- free, Red Hat Enterprise Linux-Specific Information, Red Hat Enterprise Linux-Specific Information
- iostat, Monitoring Bandwidth on Red Hat Enterprise Linux
- OProfile, Red Hat Enterprise Linux-Specific Information
- sar, Monitoring Bandwidth on Red Hat Enterprise Linux, Monitoring CPU Utilization on Red Hat Enterprise Linux, Red Hat Enterprise Linux-Specific Information
- Sysstat, Red Hat Enterprise Linux-Specific Information
- top, Red Hat Enterprise Linux-Specific Information, Monitoring CPU Utilization on Red Hat Enterprise Linux
- vmstat, Red Hat Enterprise Linux-Specific Information, Monitoring Bandwidth on Red Hat Enterprise Linux, Monitoring CPU Utilization on Red Hat Enterprise Linux, Red Hat Enterprise Linux-Specific Information
- RPM, Security
- security, Security
- shell scripts, Automation
- software support, Software Support
- support, software, Software Support
- resource abuse, What Barriers Are in Place To Prevent Abuse of Resources
- resource monitoring, Resource Monitoring
- bandwidth, Monitoring Bandwidth
- capacity planning, Monitoring System Capacity
- concepts behind, Basic Concepts
- CPU power, Monitoring CPU Power
- memory, Monitoring Memory
- storage, Monitoring Storage
- system capacity, Monitoring System Capacity
- system performance, System Performance Monitoring
- tools
- free, free
- GNOME System Monitor, The GNOME System Monitor -- A Graphical top
- iostat, The Sysstat Suite of Resource Monitoring Tools
- mpstat, The Sysstat Suite of Resource Monitoring Tools
- OProfile, OProfile
- sa1, The Sysstat Suite of Resource Monitoring Tools
- sa2, The Sysstat Suite of Resource Monitoring Tools
- sadc, The Sysstat Suite of Resource Monitoring Tools
- sar, The Sysstat Suite of Resource Monitoring Tools, The sar command
- Sysstat, The Sysstat Suite of Resource Monitoring Tools
- top, top
- vmstat, vmstat
- tools used, Red Hat Enterprise Linux-Specific Information
- what to monitor, What to Monitor?
- resources, importance of, Know Your Resources
- resources, system
- bandwidth, Bandwidth and Processing Power
- buses, role in, Buses
- buses, examples of, Examples of Buses
- capacity, increasing, Increase the Capacity
- datapaths, examples of, Examples of Datapaths
- datapaths, role in, Datapaths
- load, reducing, Reduce the Load
- load, spreading, Spread the Load
- monitoring of, Monitoring Bandwidth
- overview of, Bandwidth
- problems related to, Potential Bandwidth-Related Problems
- solutions to problems with, Potential Bandwidth-Related Solutions
- memory (see memory)
- processing power, Bandwidth and Processing Power
- application overhead, reducing, Reducing Application Overhead
- application use of, Applications
- applications, eliminating, Eliminating Applications Entirely
- capacity, increasing, Increasing the Capacity
- consumers of, Consumers of Processing Power
- CPU, upgrading, Upgrading the CPU
- facts related to, Facts About Processing Power
- load, reducing, Reducing the Load
- monitoring of, Monitoring CPU Power
- O/S overhead, reducing, Reducing Operating System Overhead
- operating system use of, The Operating System
- overview of, Processing Power
- shortage of, improving, Improving a CPU Shortage
- SMP, Is Symmetric Multiprocessing Right for You?
- symmetric multiprocessing, Is Symmetric Multiprocessing Right for You?
- upgrading, Upgrading the CPU
- storage (see storage)
- RPM, Security
- RPM Package Manager (see RPM)
S
- sa1 command, The Sysstat Suite of Resource Monitoring Tools
- sa2 command, The Sysstat Suite of Resource Monitoring Tools
- sadc command, The Sysstat Suite of Resource Monitoring Tools
- sar command, The Sysstat Suite of Resource Monitoring Tools, The sar command, Monitoring Bandwidth on Red Hat Enterprise Linux, Monitoring CPU Utilization on Red Hat Enterprise Linux, Red Hat Enterprise Linux-Specific Information
- reports, reading, Reading sar Reports
- SCSI disk drive
- adding, Adding SCSI Disk Drives
- SCSI interface
- overview of, SCSI
- security
- importance of, Security Cannot be an Afterthought
- Red Hat Enterprise Linux-specific information, Security
- setgid permission, Security, User Accounts, Groups, and Permissions
- setuid permission, Security, User Accounts, Groups, and Permissions
- shell scripts, Automation
- SMB, SMB
- SMP, Is Symmetric Multiprocessing Right for You?
- social engineering, risks of, The Risks of Social Engineering
- software
- support for
- documentation, Documentation
- email support, Web or Email Support
- on-site support, On-Site Support
- overview, Getting Help -- Software Support
- self support, Self Support
- telephone support, Telephone Support
- Web support, Web or Email Support
- sticky bit permission, User Accounts, Groups, and Permissions
- storage
- adding, Adding Storage, Adding Storage
- /etc/fstab, updating, Updating /etc/fstab
- ATA disk drive, Adding ATA Disk Drives
- backup schedule, modifying, Modifying the Backup Schedule
- configuration, updating, Updating System Configuration
- formatting, Formatting the Partition(s), Formatting the Partition(s)
- hardware, installing, Installing the Hardware
- partitioning, Partitioning, Partitioning
- SCSI disk drive, Adding SCSI Disk Drives
- deploying, Making the Storage Usable
- disk quotas, Disk Quota Issues (see disk quotas)
- file system, File Systems, File System Basics
- /etc/mtab file, Viewing /etc/mtab
- /proc/mounts file, Viewing /proc/mounts
- access control, Access Control
- access times, Tracking of File Creation, Access, Modification Times
- accounting, space, Accounting of Space Utilized
- creation times, Tracking of File Creation, Access, Modification Times
- df command, using, Issuing the df Command
- directories, Hierarchical Directory Structure
- display of mounted, Seeing What is Mounted
- enabling access to, Enabling Storage Access
- EXT2, EXT2
- EXT3, EXT3
- file-based, File-Based Storage
- hierarchical directory, Hierarchical Directory Structure
- ISO 9660, ISO 9660
- modification times, Tracking of File Creation, Access, Modification Times
- mount point, Mount Points
- mounting, Mounting File Systems
- mounting with /etc/fstab file, Mounting File Systems Automatically with /etc/fstab
- MSDOS, MSDOS
- space accounting, Accounting of Space Utilized
- structure, directory, Directory Structure
- VFAT, VFAT
- file-related issues, File-Related Issues
- file access, File Access
- file sharing, File Sharing
- management of, Managing Storage, Storage Management Day-to-Day
- application usage, Excessive Usage by an Application
- excessive use of, Excessive Usage by a User
- free space monitoring, Monitoring Free Space
- growth, normal, Normal Growth in Usage
- user issues, Handling a User's Excessive Usage
- mass-storage devices
- access arm movement, Access Arm Movement
- access arms, Access Arms
- addressing concepts, Storage Addressing Concepts
- addressing, block-based, Block-Based Addressing
- addressing, geometry-based, Geometry-Based Addressing
- block-based addressing, Block-Based Addressing
- command processing, Command Processing Time
- cylinder, Cylinder
- disk platters, Disk Platters
- electrical limitations of, Mechanical/Electrical Limitations
- geometry, problems with, Problems with Geometry-Based Addressing
- geometry-based addressing, Geometry-Based Addressing
- head, Head
- heads, Data reading/writing device
- heads reading, Heads Reading/Writing Data
- heads writing, Heads Reading/Writing Data
- I/O loads, performance, I/O Loads and Performance
- I/O loads, reads, Reads Versus Writes
- I/O loads, writes, Reads Versus Writes
- I/O locality, Locality of Reads/Writes
- IDE interface, IDE/ATA
- industry-standard interfaces, Present-Day Industry-Standard Interfaces
- interfaces for, Mass Storage Device Interfaces
- interfaces, historical, Historical Background
- interfaces, industry-standard, Present-Day Industry-Standard Interfaces
- latency, rotational, Rotational Latency
- mechanical limitations of, Mechanical/Electrical Limitations
- movement, access arm, Access Arm Movement
- overview of, An Overview of Storage Hardware
- performance of, Hard Drive Performance Characteristics
- platters, disk, Disk Platters
- processing, command, Command Processing Time
- readers versus writers, Multiple Readers/Writers
- rotational latency, Rotational Latency
- SCSI interface, SCSI
- sector, Sector
- monitoring of, Monitoring Storage
- network-accessible, Network-Accessible Storage, Network-Accessible Storage Under Red Hat Enterprise Linux
- partition
- attributes of, Partition Attributes
- extended, Extended Partitions
- geometry of, Geometry
- logical, Logical Partitions
- overview of, Partitions/Slices
- primary, Primary Partitions
- type field, Partition Type Field
- type of, Partition Type
- patterns of access, Storage Access Patterns
- RAID-based (see RAID)
- removing, Removing Storage, Removing Storage
- /etc/fstab, removing from, Remove the Disk Drive's Partitions From /etc/fstab
- data, removing, Moving Data Off the Disk Drive
- erasing contents, Erase the Contents of the Disk Drive, Erase the Contents of the Disk Drive
- umount command, use of, Terminating Access With umount
- technologies, The Storage Spectrum
- backup storage, Off-Line Backup Storage
- cache memory, Cache Memory
- CPU registers, CPU Registers
- disk drive, Hard Drives
- hard drive, Hard Drives
- L1 cache, Cache Levels
- L2 cache, Cache Levels
- main memory, Main Memory -- RAM
- off-line storage, Off-Line Backup Storage
- RAM, Main Memory -- RAM
- technologies, advanced, Advanced Storage Technologies
- swapping, Swapping
- symmetric multiprocessing, Is Symmetric Multiprocessing Right for You?
- Sysstat, Red Hat Enterprise Linux-Specific Information, The Sysstat Suite of Resource Monitoring Tools
- system administration
- philosophy of, The Philosophy of System Administration
- automation, Automate Everything
- business, Know Your Business
- communication, Communicate as Much as Possible
- documentation, Document Everything
- planning, Plan Ahead
- resources, Know Your Resources
- security, Security Cannot be an Afterthought
- social engineering, risks of, The Risks of Social Engineering
- unexpected occurrences, Expect the Unexpected
- users, Know Your Users
- system performance monitoring, System Performance Monitoring
- system resources (see resources, system)
T
- tools
- groups, managing (see group, tools for managing)
- resource monitoring, Red Hat Enterprise Linux-Specific Information
- free, free
- GNOME System Monitor, The GNOME System Monitor -- A Graphical top
- iostat, The Sysstat Suite of Resource Monitoring Tools
- mpstat, The Sysstat Suite of Resource Monitoring Tools
- OProfile, OProfile
- sa1, The Sysstat Suite of Resource Monitoring Tools
- sa2, The Sysstat Suite of Resource Monitoring Tools
- sadc, The Sysstat Suite of Resource Monitoring Tools
- sar, The Sysstat Suite of Resource Monitoring Tools, The sar command
- Sysstat, The Sysstat Suite of Resource Monitoring Tools
- top, top
- vmstat, vmstat
- user accounts, managing (see user account, tools for managing)
- top command, Red Hat Enterprise Linux-Specific Information, top, Monitoring CPU Utilization on Red Hat Enterprise Linux
U
- UID, Usernames and UIDs, Groups and GIDs
- unexpected, preparation for, Expect the Unexpected
- user account
- access control, Access Control Information
- files controlling, Files Controlling User Accounts and Groups
- /etc/group, /etc/group
- /etc/gshadow, /etc/gshadow
- /etc/passwd, /etc/passwd
- /etc/shadow, /etc/shadow
- GID, Usernames and UIDs, Groups and GIDs
- home directory
- centralized, Home Directories
- management of, Managing User Accounts and Resource Access, Managing User Accounts, Managing Accounts and Resource Access Day-to-Day
- job changes, Job Changes
- new hires, New Hires
- terminations, Terminations
- password, Passwords
- aging, Password Aging
- big character set used in, Expanded Character Set
- longer, Longer Passwords
- memorable, Memorable
- personal information used in, Personal Information
- repeatedly used, The Same Password for Multiple Systems
- shortness of, Short Passwords
- small character set used in, Limited Character Set
- strong, Strong Passwords
- weak, Weak Passwords
- word tricks used in, Simple Word Tricks
- words used in, Recognizable Words
- written, Passwords on Paper
- permissions related to, User Accounts, Groups, and Permissions
- resources, management of, Managing User Resources
- shared data access, Who Can Access Shared Data
- system GIDs, Usernames and UIDs, Groups and GIDs
- system UIDs, Usernames and UIDs, Groups and GIDs
- tools for managing, User Account and Group Applications
- chage command, User Account and Group Applications
- chfn command, User Account and Group Applications
- chpasswd command, User Account and Group Applications
- passwd command, User Account and Group Applications
- useradd command, User Account and Group Applications
- userdel command, User Account and Group Applications
- usermod command, User Account and Group Applications
- UID, Usernames and UIDs, Groups and GIDs
- username, The Username
- changes to, Dealing with Name Changes
- collisions in naming, Dealing with Collisions
- naming convention, Naming Conventions
- user ID (see UID)
- useradd command, User Account and Group Applications
- userdel command, User Account and Group Applications
- usermod command, User Account and Group Applications
- username, The Username
- changing, Dealing with Name Changes
- collisions between, Dealing with Collisions
- naming convention, Naming Conventions
- users
- importance of, Know Your Users
V
- VFAT file system, VFAT
- virtual address space, Virtual Memory: The Details
- virtual memory (see memory)
- vmstat command, Red Hat Enterprise Linux-Specific Information, vmstat, Monitoring Bandwidth on Red Hat Enterprise Linux, Monitoring CPU Utilization on Red Hat Enterprise Linux, Red Hat Enterprise Linux-Specific Information
W
- watch command, free
- working set, The Working Set
- write permission, User Accounts, Groups, and Permissions