- Problem statement: what the problem (or perceived problem) is, and what its effects are.
Environmental changes: list all changes made before the issue started or was first seen. This includes network changes, system/OS changes, DS configuration changes, application changes, application load changes, etc.
DS Data (required*):
Full DS Version
32-bit versions:
cd <install root>/lib
./ns-slapd -D <path to slapd instance> -V
64-bit versions:
cd <install root>/lib/64
./ns-slapd -D <path to slapd instance> -V
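The 32-bit and 64-bit commands above differ only in the library directory. A minimal sketch that wraps that choice in a shell function; ds_version is a hypothetical helper name, the -D and -V flags are simply the ones shown above, and you still have to know whether your install is 32-bit or 64-bit:

```shell
#!/bin/sh
# Hypothetical helper: print the full DS version using the 32- or 64-bit binary.
# Usage: ds_version <install root> <path to slapd instance> <32|64>
ds_version() {
    root=$1
    instance=$2
    bits=$3
    case $bits in
        32) libdir="$root/lib" ;;
        64) libdir="$root/lib/64" ;;
        *)  echo "ds_version: bits must be 32 or 64" >&2
            return 2 ;;
    esac
    # Run in a subshell so the caller's working directory is left unchanged.
    (cd "$libdir" && ./ns-slapd -D "$instance" -V)
}
```

For example: ds_version "<install root>" "<path to slapd instance>" 64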
DS Install Type (zip or pkg)
cd <install root>/dsee6/lib/bin
./carpet | grep carpetIsNativePkg
dse.ldif
found in <path to slapd instance>/config
DS Access & Error Logs
found in <path to slapd instance>/logs
DS Schema
found in <path to slapd instance>/config/schema
OS Data:
Get one of the following, based on your OS:
/etc/release (Solaris)
/etc/redhat-release (Red Hat)
/etc/SuSE-release (SuSE)
uname -a
prtconf -v
pkginfo -l
- Hung or Unresponsive Process
NOTE: See DirTracer's Configurator Option 1
Process Hung [1]
A hung or unresponsive directory server process is one that has stopped responding to client requests.
In some cases the DS process is not actually hung, frozen or wedged; instead, one thread is holding the rest in a locked state until that single thread frees up. If this happens, the directory server will continue to accept new connections but will not process their requests. As the directory server exhausts all file descriptors (for connections), you will also see the following error. See: nsslapd-maxdescriptors
[24/Jul/2009:3:45:13 +0000] - ERROR<12293> - Connection - conn=-1 op=-1 msgId=-1 - fd limit exceeded Too many open file descriptors - not listening on new connection
References:
DS5
http://docs.sun.com/app/docs/doc/820-2768/gehxm?l=en&a=view
http://docs.sun.com/app/docs/doc/820-0437/6nc66m9qh?a=view
Data Capture (required):
It is good to get the following every 1-5 seconds. This helps show process movement and/or trends, i.e. whether the process is really hung or is changing over time.
pstack <pid>
prstat -L -n <# of ds threads>,<# of ds threads> -p <pid> 0 1
netstat -an
iostat -xnMCz -T d <INTERVAL> <NUMBEROFCHECKS>
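The "every 1-5 seconds" advice can be scripted. Below is a rough sketch, not part of DirTracer: capture_hang is a hypothetical helper that appends timestamped pstack, prstat and netstat samples to one file per tool. It assumes the Solaris tools are on the PATH and records a skip when one is missing; iostat is left out because it already takes its own interval and count.

```shell
#!/bin/sh
# Sketch: sample a suspect ns-slapd process repeatedly so that movement
# (or the lack of it) is visible across samples.
# Usage: capture_hang <pid> <interval seconds> <number of samples> <output dir>
capture_hang() {
    pid=$1
    interval=$2
    count=$3
    outdir=$4
    mkdir -p "$outdir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        stamp=$(date '+%Y-%m-%d %H:%M:%S')
        for tool in pstack prstat netstat; do
            log="$outdir/$tool.out"
            echo "=== sample $i at $stamp ===" >> "$log"
            # Tools differ per OS; record a skip instead of failing.
            if ! command -v "$tool" >/dev/null 2>&1; then
                echo "$tool not found, skipped" >> "$log"
                continue
            fi
            # A failed sample is noted in the log and collection continues.
            case $tool in
                pstack)  pstack "$pid" >> "$log" 2>&1 ||
                             echo "pstack exited with status $?" >> "$log" ;;
                # One 1-second prstat sample; add the -n counts shown above if needed.
                prstat)  prstat -L -p "$pid" 1 1 >> "$log" 2>&1 ||
                             echo "prstat exited with status $?" >> "$log" ;;
                netstat) netstat -an >> "$log" 2>&1 ||
                             echo "netstat exited with status $?" >> "$log" ;;
            esac
        done
        i=$((i + 1))
        # No sleep after the final sample.
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, capture_hang <pid> 2 30 <output dir> takes 30 samples two seconds apart, leaving pstack.out, prstat.out and netstat.out in the output directory.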
If space is available, the best way to get to the root cause of a DS hang is to grab one or more gcore files (when gcore is available on your OS version). This allows Sun Support to debug the actual process contents to see which thread(s) are holding the process in a lock.
gcore -o <gcore file name and date> <pid>
When you manually take a gcore, Sun Support will need the libraries used by the slapd server process in order to debug the gcore contents. This is important because library versions vary across the many OS variations. Using Pkgapp allows the customer to quickly snapshot the server's OS, DS libraries, pstack output, etc., which speeds up the debugging process.
Pkgapp: for full usage, see the PDF document that comes with Pkgapp.
Examples:
32-bit process
./pkgapp -c <gcore> -p <install root>/lib
64-bit process
./pkgapp -c <gcore> -p <install root>/lib/64
df -k
/var/adm/messages
3) High CPU
NOTE: See DirTracer's Configurator Option 2
High CPU [2]
High CPU issues can occur when the directory server is handling many expensive LDAP operations, such as those subject to very restrictive or excessive ACIs, unindexed searches, group-based searches, etc.
Data Capture (required *):
It is good to get the following every 1-5 seconds. This helps show process movement and/or trends, i.e. whether the CPU usage is steady, growing or shrinking.
pms.sh <slapd pid> <interval> <numberofchecks>
pstack <pid>
prstat -L -n <# of ds threads>,<# of ds threads> -p <pid> 0 1
netstat -an
iostat -xnMCz -T d <INTERVAL> <NUMBEROFCHECKS>
cn=monitor searches
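When reading the prstat -L output, the useful step is to pick out the LWPs using the most CPU and then find the matching lwp# section in a pstack taken at the same time. A sketch of the filtering half, assuming prstat's default column layout (a header row with CPU and PROCESS/LWPID columns); top_lwps is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: list the busiest LWPs in saved `prstat -L` output, highest CPU first.
# Assumes prstat's default layout: a header row containing CPU and PROCESS/LWPID.
# Usage: top_lwps <min CPU percent> < prstat.out
top_lwps() {
    awk -v min="$1" '
        # Header row (repeated for every sample): locate the two columns we need.
        /PROCESS\/LWPID/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "CPU") cpu = i
                if ($i == "PROCESS/LWPID") proc = i
            }
            next
        }
        # Thread rows end in name/lwpid; the "Total:" summary line does not.
        cpu && proc && NF >= proc && $proc ~ /\// {
            pct = $cpu
            sub(/%$/, "", pct)
            n = split($proc, parts, "/")
            if (pct + 0 >= min + 0) print parts[n], pct
        }
    ' | sort -k2,2 -rn
}
```

Each output line is "<lwpid> <cpu percent>", for example from top_lwps 5 < prstat.out; the LWP IDs can then be looked up in the pstack output.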
If space is available, the best way to get to the root cause of a high CPU issue is to grab one or more gcore files (when gcore is available on your OS version). This allows Sun Support to debug the actual process contents to see which thread(s) are using the most CPU, as well as what each thread is actually processing.
gcore -o <gcore file name and date> <pid>
When you manually take a gcore, Sun Support will need the libraries used by the slapd server process in order to debug the gcore contents. This is important because library versions vary across the many OS variations. Using Pkgapp allows the customer to quickly snapshot the server's OS, DS libraries, pstack output, etc., which speeds up the debugging process.
Pkgapp: for full usage, see the PDF document that comes with Pkgapp.
Examples:
32-bit process
./pkgapp -c <gcore> -p <install root>/lib
64-bit process
./pkgapp -c <gcore> -p <install root>/lib/64
df -k
/var/adm/messages
ACI search. Listing all ACIs will show us
4) Replication
NOTE: See DirTracer's Configurator Option 3
Replication [3]
Replication issues can be seen in many ways:
Updates not going through, i.e. replication is broken between one or more servers.
Updates slow to get through.
References:
DS5
http://docs.sun.com/app/docs/doc/820-0437/6nc66m9qj?a=view
DS6
http://docs.sun.com/app/docs/doc/820-2768/replication?a=view
Replication debug logging for 20 minutes on each of the affected servers. For example, if replication is slow or broken from one master to a single consumer, then get debug logging from each of these servers "at the same time". Remember to note the current nsslapd-infolog-area (etc.) values before you change it to replication debug logging (8192). Once you have gathered the logs for 20 minutes, change this back to the old value.
In 5.1, the attribute is nsslapd-errorlog-level instead of nsslapd-infolog-area.
$ ldapmodify -h host -p port -D "cn=Directory Manager" -w password
dn: cn=config
changetype: modify
replace: nsslapd-infolog-area
nsslapd-infolog-area: 8192
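Remembering and restoring the old value is easy to get wrong by hand. A minimal sketch, assuming ldapsearch prints the attribute as "name: value" on one line; infolog_ldif and current_level are hypothetical helper names, and the attribute defaults to nsslapd-infolog-area (pass nsslapd-errorlog-level on 5.1):

```shell
#!/bin/sh
# Hypothetical helper: emit LDIF that sets the debug level on cn=config.
# Usage: infolog_ldif <level> [attribute]
infolog_ldif() {
    level=$1
    attr=${2:-nsslapd-infolog-area}
    printf 'dn: cn=config\nchangetype: modify\nreplace: %s\n%s: %s\n' \
        "$attr" "$attr" "$level"
}

# Hypothetical helper: read the current value from ldapsearch output on stdin.
# Prints nothing if the attribute is absent (i.e. not explicitly set).
current_level() {
    attr=${1:-nsslapd-infolog-area}
    awk -v a="$attr" -F': ' '$1 == a { print $2; exit }'
}

# Example (host, port and password are placeholders):
#   old=$(ldapsearch -h host -p port -D "cn=Directory Manager" -w password \
#         -b "cn=config" -s base "objectClass=*" nsslapd-infolog-area | current_level)
#   infolog_ldif 8192 | ldapmodify -h host -p port -D "cn=Directory Manager" -w password
#   ... gather logs for 20 minutes ...
#   infolog_ldif "$old" | ldapmodify -h host -p port -D "cn=Directory Manager" -w password
```

If current_level prints nothing, the attribute was not set beforehand, so restoring an empty value is not appropriate; put the server's previous setting back instead.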
RUV searches from the broken backend on each of the affected servers.
$ ldapsearch -h host -p port -D "cn=Directory Manager" -w password -b "<replicated suffix>" -s base "(&(objectclass=nstombstone)(nsUniqueId=ffffffff-ffffffff-ffffffff-ffffffff))"
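The RUV entry's nsds50ruv values end with the minimum and maximum CSNs seen for each replica. Comparing the max CSN for the same replica ID on a master and on a consumer gives a rough idea of how far behind the consumer is. A sketch, assuming values of the form "{replica <id> <url>} <min csn> <max csn>" on unfolded lines, and the usual 20-hex-digit CSN whose first 8 hex digits are the change time in seconds since the epoch; max_csn, csn_time and csn_lag are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helper: print the max CSN for one replica ID from RUV search
# output on stdin. Replicas with no CSNs yet print nothing.
# Usage: max_csn <replica id> < ruv.out
max_csn() {
    awk -v id="$1" '$1 == "nsds50ruv:" && $2 == "{replica" && $3 == id && NF >= 6 { print $NF }'
}

# Hypothetical helper: seconds since the epoch from the first 8 hex digits of a CSN.
csn_time() {
    hex=$(printf '%s' "$1" | cut -c1-8)
    echo $((0x$hex))
}

# Hypothetical helper: seconds between two CSNs (e.g. master max CSN minus
# consumer max CSN for the same replica ID).
csn_lag() {
    echo $(( $(csn_time "$1") - $(csn_time "$2") ))
}
```

For example, run the RUV search above against both servers, take max_csn for the master's replica ID from each, and pass the master value then the consumer value to csn_lag. A lag that keeps growing between runs points the same way as a worsening insync delay.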
cn=config search
$ ldapsearch -h host -p port -D "cn=Directory Manager" -w password -b "cn=config" -s base "objectClass=*"
insync. Note the output from insync and whether the delay(s) are getting worse, getting better or staying the same.
The insync command indicates the state of synchronization between a master replica and one or more consumer replicas. The following command shows the state over a period of 30 seconds.
server-root/shared/bin/insync -s "cn=Directory Manager:password@hostname1:ldap-port" -c "cn=Directory Manager:password@hostname2:ldap-port" 30
repldisc. Repldisc, or "Replication Discovery", displays the replication topology as a text-based matrix.
server-root/shared/bin/repldisc -D "cn=Directory Manager" -w password -b <replicated suffix> -s host:ldap-port
5) Crashing
NOTE: See DirTracer's Configurator Option 4
Crashing [4]
When the directory server process has died unexpectedly, gather the following data. For instructions on preparing your system to produce core files or crash dumps in the event of a crash, see 1.6 Configuring the Operating System to Generate Core Files.
References:
DS5
http://docs.sun.com/app/docs/doc/820-0437/data-for-crash-problems?a=view
OS
http://docs.sun.com/app/docs/doc/820-0437/6nc66m9ql?a=view
corefile (Unix)/crash dump (Windows): on Unix, run pkgapp with the -i switch to "include" the corefile.
pkgapp (Unix-based systems only):
./pkgapp -i -c <corefile> -p <full path (path only) to process binary> -s <path to write tar file>
Pkgapp will gather the following automatically:
OS info
file corefile
pstack corefile
pmap corefile
pldd corefile
pflags corefile
Execute pkgapp to see its full usage.
df -k (Unix-based systems only)
6) Memory Leaks
NOTE: See DirTracer's Configurator Option 5
Memory Leak [5]
Memory leaks are a very troublesome problem to gather data for.
References:
DS6
http://docs.sun.com/app/docs/doc/820-2768/gegyp?a=view
Unix OSes:
Before proceeding with the next set of steps, use pms.sh to determine the memory growth profile of your directory server process. Sun Support can plot the data from pms.sh to show whether, over time, there is a real issue with memory not being freed.
1) Start the directory server instance.
2) Prime the directory server's caches by using the following search.
$ ldapsearch -h host -p port -D "cn=Directory Manager" -w password -b "<suffix>" -s sub "(&(objectClass=*))" "*" > /dev/null
3) Launch pms.sh (or perfmon) with a large enough number of checks that we can see the process size increase significantly.
Ex: ./pms.sh <slapd pid> 60 10000000000 >> /tmp/pms.mem.out
4) Test/use the LDAP applications.
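If pms.sh is not to hand, a rough stand-in (this is not pms.sh, just a swapped-in ps-based sampler) is to log the process's virtual and resident size at a fixed interval and compare the first and last samples. mem_sample and mem_growth are hypothetical helper names; the sketch assumes a ps that supports -o vsz= -o rss= -p, as Solaris and Linux ps do.

```shell
#!/bin/sh
# Sketch (stand-in, not pms.sh): print "<timestamp> <vsz> <rss>" per sample.
# Usage: mem_sample <pid> <interval seconds> <number of samples>
mem_sample() {
    pid=$1
    interval=$2
    count=$3
    i=1
    while [ "$i" -le "$count" ]; do
        # vsz and rss are reported in kilobytes by Solaris and Linux ps.
        sizes=$(ps -o vsz= -o rss= -p "$pid") || return 1
        echo "$(date '+%Y-%m-%dT%H:%M:%S') $sizes"
        i=$((i + 1))
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Growth in KB of the vsz column between the first and last sample on stdin.
mem_growth() {
    awk 'NR == 1 { first = $2 } { last = $2 } END { print last - first }'
}
```

For example, mem_sample <slapd pid> 60 1440 > /tmp/mem.out samples once a minute for a day; mem_growth < /tmp/mem.out then gives the overall vsz change. Steady growth under a steady load is the pattern worth sending to Sun Support along with the pms.sh data.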
pms.sh can be found in <DirTracer Install Location>/dirtracertools for various Unix OSes.
pms.sh - http://www.sun.com/bigadmin/scripts/indexSjs.html
Once it has been determined there is a leak, you can use one of the following methods for determining which function(s) are not freeing memory.
7) Server Down
NOTE: See DirTracer's Configurator for the Server Down option.
Same as 0 - Basic
8) Startup issues
NOTE: See DirTracer's Configurator Option 7
Basic Capture [7]
Most startup issues can be dealt with using truss/strace/tusc/DebugView etc. and trace debugging from the directory server.
truss etc.
debug logging
Solaris
truss -fea -rall -wall -o /tmp/truss.log ./start-slapd
HP-UX
tusc -v -fealT -rall -wall -o /tmp/truss.out ./start-slapd
Redhat/SuSE
strace -fv -o /tmp/strace.out ./start-slapd
Windows
DebugView is available at http://www.sysinternals.com/Utilities/DebugView.html
It may also be beneficial to gather directory server debug logging during the startup process.
Once you have gathered the debug logs (after the directory server starts), change this back to the old value.
In 5.1, the attribute is nsslapd-errorlog-level instead of nsslapd-infolog-area.
$ ldapmodify -h host -p port -D "cn=Directory Manager" -w password
dn: cn=config
changetype: modify
replace: nsslapd-infolog-area
nsslapd-infolog-area: 1
9) High IO
NOTE: See DirTracer's Configurator Option 7 for Basic Capture w/ gcores.
High IO issues can occur when the directory server is handling many LDAP operations such as massive writes, purging, unindexed searches, group-based searches, etc.
Data Capture (required *):
It is good to get the following every 1-5 seconds. This helps show process movement and/or trends, i.e. whether the load is steady, growing or shrinking.
pms.sh <slapd pid> <interval> <numberofchecks>
pstack <pid>
prstat -L -n <# of ds threads>,<# of ds threads> -p <pid> 0 1
netstat -an
iostat -xnMCz -T d <INTERVAL> <NUMBEROFCHECKS>
cn=monitor searches
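With iostat -xn style output, the %b (percent busy) column is usually the quickest way to see which devices are saturated. A sketch that filters saved iostat output, assuming a header row that ends in "device" and names a "%b" column; busy_devices is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: print "<device> <%b>" for each row at or above a %b threshold.
# Usage: busy_devices <min %b> < iostat.out
busy_devices() {
    awk -v min="$1" '
        # Header row (repeated every interval): remember the layout.
        $NF == "device" {
            ncol = NF
            for (i = 1; i <= NF; i++) if ($i == "%b") b = i
            next
        }
        # Data rows have as many columns as the header; the title line and
        # the -T d timestamp lines do not.
        b && NF == ncol && $b + 0 >= min + 0 { print $NF, $b }
    '
}
```

For example, busy_devices 80 < iostat.out lists devices (and, with -C, controllers) that were at least 80% busy in any interval; comparing that with the pstack and prstat samples from the same time helps tie the IO to specific threads.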
If space is available, the best way to get to the root cause of a high IO issue is to grab one or more gcore files (when gcore is available on your OS version). This allows Sun Support to debug the actual process contents to see which thread(s) are using the most CPU, as well as what each thread is actually processing.
gcore -o <gcore file name and date> <pid>
When you manually take a gcore, Sun Support will need the libraries used by the slapd server process in order to debug the gcore contents. This is important because library versions vary across the many OS variations. Using Pkgapp allows the customer to quickly snapshot the server's OS, DS libraries, pstack output, etc., which speeds up the debugging process.
Pkgapp: for full usage, see the PDF document that comes with Pkgapp.
Examples:
32-bit process
./pkgapp -c <gcore> -p <install root>/lib
64-bit process
./pkgapp -c <gcore> -p <install root>/lib/64
df -k
/var/adm/messages
11) SSL Cert issues
certutil -L -N -W
trusted_db_passwd
12) Install Issues
References:
DS5
http://docs.sun.com/app/docs/doc/820-0437/6nc66m9qg?a=view/
DS6 http://docs.sun.com/app/docs/doc/820-2768/install?a=view
install logs
truss output
typescript or screen captures help.
Truss:
Solaris
truss -fea -rall -wall -o /tmp/truss.log <install command>
HP-UX
tusc -v -fealT -rall -wall -o /tmp/truss.out <install command>
Redhat/SuSE
strace -fv -o /tmp/strace.out <install command>
Windows
DebugView is available at http://www.sysinternals.com/Utilities/DebugView.html.
Install Logs:
For Java Enterprise System installations, collect the installation error logs.
The log file is named after the date and time that the installation failed. For example, a log file for an installation that failed on December 16 at 3:32 p.m. would have a name like Java_Enterprise_System*_install.B12161532.
On Solaris systems, installation logs are located under /var/sadm/install/logs.
On Red Hat and HP-UX systems, installation logs are located under /var/opt/sun/install/logs.
On Windows systems, installation logs are located
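Working out the log name from the failure time can be scripted. A sketch based on the naming pattern and directories described above; install_log_suffix and install_log_dir are hypothetical helper names, the hour is taken on a 24-hour clock (as in the 3:32 p.m. example), and Windows is not covered:

```shell
#!/bin/sh
# Hypothetical helper: build the B<MM><DD><HH><mm> suffix seen in the example
# log name (3:32 p.m. on December 16 -> 12 16 15 32).
# Usage: install_log_suffix <month> <day> <hour> <minute>
install_log_suffix() {
    # awk reads "08" as decimal 8; the shell's printf may reject it as octal.
    awk -v m="$1" -v d="$2" -v h="$3" -v mi="$4" \
        'BEGIN { printf "B%02d%02d%02d%02d\n", m, d, h, mi }'
}

# Hypothetical helper: the install log directory for the OSes listed above
# (Red Hat reports itself as Linux).
install_log_dir() {
    case $(uname -s) in
        SunOS)       echo /var/sadm/install/logs ;;
        Linux|HP-UX) echo /var/opt/sun/install/logs ;;
        *)           return 1 ;;
    esac
}

# Example: list the logs for an install that failed on December 16 at 3:32 p.m.
#   ls "$(install_log_dir)"/Java_Enterprise_System*_install."$(install_log_suffix 12 16 15 32)"
```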
13) DSCC/Console issues
See the following for gathering data on the DSCC:
http://docs.sun.com/app/docs/doc/820-2768/gexfm?a=view
14) Schema issues
See 0 - Basic
changes to the directory server schema
examples of the problem schema errors/user info etc.