- Problem statement: describe what the problem (or perceived problem) is. What are the effects of the problem?
Environmental changes: list all changes made before the issue started or was first seen. This includes network changes, system/OS changes, DS configuration changes, application changes, application load changes, etc.
DS Data (required*):
Full DS Version
32-bit versions
cd <install root>/lib
./ns-slapd -D <path to slapd instance> -V
64-bit versions
cd <install root>/lib/64
./ns-slapd -D <path to slapd instance> -V
DS Install Type: zip/pkg
cd <install
./carpet | grep carpetIsNativePkg
found in <path to slapd
DS Access & Error Logs
found in <path to slapd
DS Schema
found in <path to slapd
OS Data:
Get one of the following, based on your OS version:
uname -a
prtconf -v
pkginfo -l
- Hung or Unresponsive Process
NOTE: See DirTracer's Configurator Option 1: Process Hung [1]
A hung or unresponsive directory server process is one that has stopped responding to client requests.
In some cases the DS process is not actually hung, frozen or wedged; instead, one thread is holding the rest in a locked state until that single thread frees up. If this happens, the directory server will continue to accept new connections but will not process their requests. As the directory server exhausts all of its file descriptors (for connections), you will also see the following error (see nsslapd-maxdescriptors):
[24/Jul/2009:3:45:13 +0000] - ERROR<12293> - Connection - conn=-1 op=-1 msgId=-1 - fd limit exceeded
Too many open file descriptors - not listening on new connection
Data Capture (required):
It is good to capture the following every 1-5 seconds. This helps show process movement and/or trends, i.e. whether the process is really hung or is changing over time.
pstack <pid>
prstat -L -n <# of ds threads>,<# of ds threads> -p <pid> 0 1
netstat -an
iostat -xnMCz -T d
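The repeated capture described above can be scripted so that every sample is timestamped. This is a minimal sketch, assuming bash (or ksh) is available; `capture_loop` and the `cmdN.out` file naming are hypothetical and not part of DirTracer. It runs each quoted command once per interval and appends its output to a separate file per command, so successive pstack snapshots can be compared side by side.

```sh
#!/bin/bash
# Hypothetical helper (not part of DirTracer): run each quoted command
# COUNT times, INTERVAL seconds apart, appending timestamped output to
# OUTDIR/cmd1.out, OUTDIR/cmd2.out, ... (one file per command).
capture_loop() {
    local outdir=$1 interval=$2 count=$3
    shift 3
    mkdir -p "$outdir" || return 1
    local i=1 n cmd stamp
    while [ "$i" -le "$count" ]; do
        stamp=$(date '+%Y-%m-%d %H:%M:%S')
        n=0
        for cmd in "$@"; do
            n=$((n + 1))
            {
                echo "=== $stamp: $cmd ==="
                sh -c "$cmd" 2>&1
            } >> "$outdir/cmd$n.out"
        done
        # no sleep after the final sample
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

For example, with PID set to the ns-slapd process ID, `capture_loop /tmp/ds-hang 2 30 "pstack $PID" "netstat -an" "iostat -xnMCz -T d"` would take 30 samples two seconds apart. Quote each command so its arguments stay together.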
If space is available, the best way to get to the root cause of a DS hang is to grab one or more gcore files (where gcore is available for your OS version). This allows Sun Support to debug the actual process contents and see which thread(s) are holding the process in a lock.
gcore -o <gcore file name and
When you manually snap a gcore, Sun will require the libraries used by the slapd server process in order to debug the gcore contents. This is important because library versions vary across the numerous OS variations. Using Pkgapp allows the customer to quickly snapshot the server's OS, DS libraries, pstack output, etc., which speeds up the debugging process.
Pkgapp: for full usage, see the PDF document that comes with Pkgapp.
32-bit process
./pkgapp -c <gcore> -p <install root>/lib
64-bit process
./pkgapp -c <gcore> -p <install root>/lib/64
df -k
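Because pkgapp's -p argument differs between 32-bit and 64-bit processes, a small helper can choose it from the output of the standard `file` command run on the ns-slapd binary. This is a hypothetical sketch: `slapd_lib_dir` is not a Sun tool, and it assumes the lib vs. lib/64 layout shown above and that `file` reports "32-bit" or "64-bit" for ELF binaries, as it does on Solaris and Linux.

```sh
#!/bin/bash
# Hypothetical helper: read `file` output for the ns-slapd binary on stdin
# and print the directory to pass to pkgapp's -p option, assuming 32-bit
# libraries live in <install root>/lib and 64-bit ones in <install root>/lib/64.
slapd_lib_dir() {
    local install_root=$1 desc
    desc=$(cat)
    case "$desc" in
        *64-bit*) echo "$install_root/lib/64" ;;
        *32-bit*) echo "$install_root/lib" ;;
        *)
            echo "cannot tell 32-bit from 64-bit: $desc" >&2
            return 1
            ;;
    esac
}
```

Usage (paths are placeholders): `./pkgapp -c <gcore> -p "$(file /path/to/ns-slapd | slapd_lib_dir /path/to/install-root)"`.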
- High CPU
NOTE: See DirTracer's Configurator Option 2: High CPU [2]
High CPU issues can occur when the directory server is handling many expensive LDAP operations, for example operations subject to very restrictive or excessive ACIs, unindexed searches, group-based searches, etc.
Data Capture (required *):
It is good to capture the following every 1-5 seconds. This helps show process movement and/or trends, i.e. whether CPU usage is steady, growing or shrinking.
pms.sh <slapd pid> <interval> <numberofchecks>
pstack <pid>
prstat -L -n <# of ds threads>,<# of ds threads> -p <pid> 0 1
netstat -an
iostat -xnMCz -T d
cn=monitor searches
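When you review saved prstat -L output, it helps to list the LWPs using the most CPU so they can be matched against the "lwp# N" thread headers in the pstack output taken at the same moment. This is a minimal sketch, assuming prstat's default -L column layout (CPU second to last, PROCESS/LWPID last); `top_lwps` is a hypothetical name.

```sh
#!/bin/bash
# Hypothetical helper: read `prstat -L` output on stdin and print
# "LWPID CPU" for the N busiest LWPs (default 5), busiest first.
# Header and "Total:" lines are skipped because their last field is
# not of the form name/number.
top_lwps() {
    local n=${1:-5}
    awk '$NF ~ /\/[0-9]+$/ {
             cpu = $(NF - 1); sub(/%$/, "", cpu)
             split($NF, parts, "/")
             print parts[2], cpu
         }' | LC_ALL=C sort -k2,2nr | head -n "$n"
}
```

Usage: `top_lwps 10 < prstat.out` lists the ten busiest LWPs from a saved sample.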
If space is available, the best way to get to the root cause of a High CPU issue is to grab one or more gcore files (where gcore is available for your OS version). This allows Sun Support to debug the actual process contents and see which thread(s) are using the most CPU, as well as what each thread is actually processing.
gcore -o <gcore file name and
When you manually snap a gcore, Sun will require the libraries used by the slapd server process in order to debug the gcore contents. This is important because library versions vary across the numerous OS variations. Using Pkgapp allows the customer to quickly snapshot the server's OS, DS libraries, pstack output, etc., which speeds up the debugging process.
Pkgapp: for full usage, see the PDF document that comes with Pkgapp.
32-bit process
./pkgapp -c <gcore> -p <install root>/lib
64-bit process
./pkgapp -c <gcore> -p <install root>/lib/64
df -k
ACI search. Listing all ACIs will show us
NOTE: See DirTracer's Configurator Option 3: Replication [3]
Replication issues can show up in many ways:
Updates not going through, i.e. replication is broken between one or more servers.
Updates slow to get through.
Replication debug logging for 20 minutes on each of the affected servers. For example, if replication is slow or broken from one master to a single consumer, get debug logging from each of these servers at the same time. Remember to note the current nsslapd-infolog-area value (and any related logging settings) before you change it to replication debug logging (8192). Once you have gathered the logs for 20 minutes, change this back to the old value. On 5.1, the attribute is nsslapd-errorlog-level rather than nsslapd-infolog-area.
$ ldapmodify -h host -p port -D "cn=Directory Manager" -w password
dn: cn=config
changetype: modify
replace: nsslapd-infolog-area
nsslapd-infolog-area: 8192
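The set-then-restore cycle above can be wrapped in a small script so the 20-minute window is the same on every server. This is a sketch, assuming ldapmodify reads LDIF from standard input when no -f file is given; `loglevel_ldif` and `debug_window` are hypothetical names, and the value to restore must be the one you noted beforehand.

```sh
#!/bin/bash
# Hypothetical helper: print an LDIF change record that replaces a logging
# attribute on cn=config (nsslapd-infolog-area, or nsslapd-errorlog-level on 5.1).
loglevel_ldif() {
    local attr=$1 value=$2
    printf 'dn: cn=config\nchangetype: modify\nreplace: %s\n%s: %s\n' \
        "$attr" "$attr" "$value"
}

# Hypothetical wrapper: set DEBUG_VALUE, wait SECONDS, then restore
# RESTORE_VALUE. Remaining arguments go straight to ldapmodify,
# e.g. -h host -p port -D "cn=Directory Manager" -w password.
debug_window() {
    local attr=$1 debug_value=$2 restore_value=$3 seconds=$4
    shift 4
    loglevel_ldif "$attr" "$debug_value" | ldapmodify "$@" || return 1
    sleep "$seconds"
    loglevel_ldif "$attr" "$restore_value" | ldapmodify "$@"
}
```

Usage: `debug_window nsslapd-infolog-area 8192 <old value> 1200 -h host -p port -D "cn=Directory Manager" -w password` (1200 seconds is 20 minutes), started on each affected server at the same time.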
RUV searches from the broken backend on each of the affected servers.
$ ldapsearch -h host -p port -D "cn=Directory Manager" -w password -b "<replicated suffix>" -s base
cn=config search
$ ldapsearch -h host -p port -D "cn=Directory Manager" -w password -b "cn=config" -s base
insync: note the output from insync and whether the delay(s) are getting worse, getting better or staying the same.
The insync command indicates the state of synchronization between a master replica and one or more consumer replicas. The following command shows the state over a period of 30 seconds.
server-root/shared/bin/insync -s "cn=Directory Manager:password@hostname1:ldap-port" -c "cn=Directory Manager:password@hostname2:ldap-port" 30
repldisc: Repldisc, or "Replication Discovery", displays the replication topology in a text-based matrix.
server-root/shared/bin/repldisc -D "cn=Directory Manager" -w password -b <replicated suffix> -s
NOTE: See DirTracer's Configurator Option 4: Crashing [4]
When the directory server process has died unexpectedly, gather the following data. For instructions on preparing your system to produce core files or crash dumps in the event of a crash, see 1.6 Configuring the Operating System to Generate Core Files.
Core file (Unix) / crash dump (Windows): use pkgapp with the -i switch on Unix to "Include" the
pkgapp (Unix-based systems only).
./pkgapp -i -c <corefile> -p <full path (path only) to process binary> -s <path to write tar
Pkgapp will gather the following:
OS info
file corefile
pstack corefile
pmap corefile
pldd corefile
pflags corefile
Execute pkgapp to see its full usage.
df -k (Unix-based systems only)
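If pkgapp cannot be run, the per-core commands it gathers can at least be captured by hand. This is a rough, hypothetical sketch (`core_report` is not a Sun tool) and it does not replace pkgapp, which also collects the libraries Sun Support needs. Tools missing from the PATH are noted in the report and skipped.

```sh
#!/bin/bash
# Hypothetical helper: run each tool against CORE and append the output
# to REPORT. With no tools given, defaults to the per-core commands
# pkgapp is described as gathering: file, pstack, pmap, pldd, pflags.
core_report() {
    local core=$1 report=$2
    shift 2
    [ -f "$core" ] || { echo "no such core file: $core" >&2; return 1; }
    [ "$#" -gt 0 ] || set -- file pstack pmap pldd pflags
    local tool
    for tool in "$@"; do
        {
            echo "=== $tool $core ==="
            if command -v "$tool" >/dev/null 2>&1; then
                "$tool" "$core" 2>&1
            else
                echo "$tool: not found, skipped"
            fi
        } >> "$report"
    done
}
```

Usage: `core_report /var/cores/core.ns-slapd.1234 /tmp/core-report.txt`, where both paths are placeholders.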
NOTE: See DirTracer's Configurator Option 5: Memory Leak [5]
Memory leaks are a particularly troublesome problem to gather data for.
Unix OSes:
Before proceeding with the next set of steps, use pms.sh to determine the memory growth profile of your directory server process. Sun Support can plot the data from pms.sh to show whether, over time, there is a real issue with memory not being freed.
1) Start the directory server.
2) Prime the directory server's caches by using the following search.
$ ldapsearch -h host -p port -D "cn=Directory Manager" -w password -b "<suffix>" -s sub "(&(objectClass=*))" "*" >> /dev/null
3) Launch pms.sh (or perfmon) with a number of checks large enough that the process size can be seen to increase significantly.
Ex: ./pms.sh 60 10000000000 >> /tmp/pms.mem.out
4) Test/use the LDAP applications.
pms.sh for various Unix OSes can be found in <DirTracer Install Location>dirtracertools
pms.sh -
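Where pms.sh is not at hand, a comparable growth profile can be approximated with the standard ps command. This is a hypothetical stand-in, not pms.sh itself: it assumes `ps -o vsz= -o rss= -p <pid>` is supported (these are POSIX options that work on Solaris and Linux) and prints one "date time VSZ RSS" line per sample, with sizes in kilobytes.

```sh
#!/bin/bash
# Hypothetical pms.sh stand-in: print "date time VSZ RSS" (sizes in KB)
# for PID every INTERVAL seconds, COUNT times. Stops if PID disappears.
mem_sample() {
    local pid=$1 interval=$2 count=$3 i=1 sizes
    while [ "$i" -le "$count" ]; do
        sizes=$(ps -o vsz= -o rss= -p "$pid") || return 1
        # $sizes is deliberately unquoted so ps's column padding collapses
        echo "$(date '+%Y-%m-%d %H:%M:%S')" $sizes
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

Usage: `mem_sample <slapd pid> 60 100000 >> /tmp/pms.mem.out` samples once a minute; the steadily rising VSZ/RSS columns are what to look for.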
Once it has been determined that there is a leak, you can use one of the following methods to determine which function(s) are not freeing memory.
7) Server Down
NOTE: See DirTracer's Configurator for the Server Down option.
Same as 0 - Basic.
8) Startup issues
NOTE: See DirTracer's Configurator Option 7: Basic Capture [7]
Most startup issues can be dealt with using truss/strace/tusc/DebugView, etc., together with trace debugging from the directory server.
truss -fea -rall -wall -o /tmp/truss.log ./start-slapd
tusc -v -fealT -rall -wall -o /tmp/truss.out ./start-slapd
strace -fv -o /tmp/strace.out ./start-slapd
DebugView is available at http://www.sysinternals.com/Utilities/DebugView.html
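Each tracer above belongs to one platform: truss on Solaris, tusc on HP-UX, strace on Linux, with DebugView covering Windows. A hypothetical helper, `trace_prefix`, can pick the right Unix tracer from `uname -s`; the options are the ones listed above, with truss given a single -o output file.

```sh
#!/bin/bash
# Hypothetical helper: print the system-call tracer command for an OS name
# as reported by `uname -s`; the caller appends the command to trace.
trace_prefix() {
    case "$1" in
        SunOS) echo "truss -fea -rall -wall -o /tmp/truss.log" ;;
        HP-UX) echo "tusc -v -fealT -rall -wall -o /tmp/truss.out" ;;
        Linux) echo "strace -fv -o /tmp/strace.out" ;;
        *)
            echo "no tracer configured for $1" >&2
            return 1
            ;;
    esac
}
```

Usage: `prefix=$(trace_prefix "$(uname -s)") && $prefix ./start-slapd`. The unquoted `$prefix` is intentional, so the shell splits it into the tracer and its options.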
It may also be beneficial to gather directory server debug logging during the startup process.
Once you have gathered the debug logs (after the directory server starts), change this back to the old value. On 5.1, the attribute is nsslapd-errorlog-level rather than nsslapd-infolog-area.
$ ldapmodify -h host -p port -D "cn=Directory Manager" -w password
dn: cn=config
changetype: modify
replace: nsslapd-infolog-area
nsslapd-infolog-area: 1
9) High IO
NOTE: See DirTracer's Configurator Option 7 for Basic Capture w/ gcores.
High I/O issues can occur when the directory server is handling many I/O-heavy LDAP operations, such as massive writes, purging, unindexed searches, group-based searches, etc.
Data Capture (required *):
It is good to capture the following every 1-5 seconds. This helps show process movement and/or trends, i.e. whether the I/O load is steady, growing or shrinking.
pms.sh <slapd pid> <interval> <numberofchecks>
pstack <pid>
prstat -L -n <# of ds threads>,<# of ds threads> -p <pid> 0 1
netstat -an
iostat -xnMCz -T d
cn=monitor searches
If space is available, the best way to get to the root cause of a High I/O issue is to grab one or more gcore files (where gcore is available for your OS version). This allows Sun Support to debug the actual process contents and see which thread(s) are busiest, as well as what each thread is actually processing.
gcore -o <gcore file name and
When you manually snap a gcore, Sun will require the libraries used by the slapd server process in order to debug the gcore contents. This is important because library versions vary across the numerous OS variations. Using Pkgapp allows the customer to quickly snapshot the server's OS, DS libraries, pstack output, etc., which speeds up the debugging process.
Pkgapp: for full usage, see the PDF document that comes with Pkgapp.
32-bit process
./pkgapp -c <gcore> -p <install root>/lib
64-bit process
./pkgapp -c <gcore> -p <install root>/lib/64
df -k
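To spot saturated disks in saved `iostat -xnMCz` output, the %b (percent of time busy) column can be filtered. This is a minimal sketch, assuming the 11-column -xn layout (r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device) without the -e error columns; `busy_devices` and its 50% default threshold are hypothetical choices.

```sh
#!/bin/bash
# Hypothetical helper: read `iostat -xn...` output on stdin and print
# "device %b" for every device at or above THRESHOLD percent busy.
# Header lines and -T d timestamps are skipped because they are not
# 11 fields beginning with a number.
busy_devices() {
    local threshold=${1:-50}
    awk -v t="$threshold" '
        NF == 11 && $1 ~ /^[0-9.]+$/ && $10 + 0 >= t { print $11, $10 }
    '
}
```

Usage: `busy_devices 80 < iostat.out` lists every sample in which a device was at least 80% busy.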
11) SSL Cert issues
certutil -L -N -W
12) Install Issues
DS6 http://docs.sun.com/app/docs/doc/820-2768/install?a=view
install logs
truss output
typescript or screen captures
truss -fea -rall -wall -o /tmp/truss.log <install command>
tusc -v -fealT -rall -wall -o /tmp/truss.out <install command>
strace -fv -o /tmp/strace.out <install command>
DebugView is available at
Install Logs:
For Java Enterprise System installations, collect the installation error logs.
The log file is named after the date and time that the installation failed. For example, a log file for an installation that failed on December 16 at 3:32 p.m. would have a name like
On Solaris systems, installation logs are located under /var/sadm/install/logs.
On Red Hat and HP-UX systems, installation logs are located under /var/opt/sun/install/logs.
On Windows systems, installation logs are located
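Because each installation log is named after the date and time of the failed install, the most recent file is usually the one to send. The following hypothetical helper assumes the Unix log directories listed above and that the newest log is also the most recently modified file in the directory.

```sh
#!/bin/bash
# Hypothetical helper: print the most recently modified file in an
# install log directory (default: the Solaris location named above).
newest_log() {
    local dir=${1:-/var/sadm/install/logs} name
    # ls -t lists entries newest first
    name=$(ls -t "$dir" 2>/dev/null | head -n 1)
    [ -n "$name" ] || return 1
    echo "$dir/$name"
}
```

Usage: `newest_log` on Solaris, or `newest_log /var/opt/sun/install/logs` on Red Hat and HP-UX.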
13) DSCC/Console issues
See the following for gathering data on the DSCC
14) Schema issues
See 0 - Basic
changes to the directory server
examples of the problem schema
errors/user info etc.