My last blog entry spoke of ways that you can prepare an HP ProLiant server running Red Hat Linux to capture data in the event of a server hang or crash. Click here to view my last blog update.
This time I’ll speak a bit further on what you can do to gather data from your server if you actually encounter a crash.
Let’s first assume you have an outright panic and the server crashes. If the server has been configured to capture dumps (see my last blog entry), then you should be well positioned to gather the data you need for your Linux service vendor to determine the cause of the crash and identify a solution. In future blog updates I hope to speak further on how you yourself can examine this data and locate solutions. For now though, let’s just concentrate on what you’ll need to gather.
1. Run sosreport. The Red Hat sosreport tool captures a great deal of information about your server, including logs, server configuration, and some key hardware information. It’s an easy-to-run tool that creates a compressed tar file that you can supply to your Linux support vendor.
2. If your kernel crash dump software was configured properly in advance, there should be a new directory in /var/crash containing a dump and a log. Please tar up and gzip this data. Note that dump files can be very large, so ask your service vendor to set up an FTP site that you can transfer the dump to.
3. For HP ProLiant hardware you can gather additional information from the Integrated Lights-Out (iLO) interface. Capture the Integrated Management Log and the iLO log. This is especially important if either log contains recent messages.
If you have HP agents and tools installed you can gather this information from the command line.
The following command will capture the hardware log, temperature, fan, and Automatic Server Recovery (ASR) status to myhplogs.txt:
# hplog -v -t -f -a STATUS > myhplogs.txt
Having the HP agents installed also gives you access to a number of additional diagnostic tools that can be run from the HP System Management Homepage or from the command line. Here are a few examples:
The following command will capture detailed hardware configuration and diagnostics and write them to a file named hpdiags.txt in your home directory:
# /opt/hp/hpdiags/hpdiags -p -o $HOME/hpdiags.txt
This command will capture the detailed configuration of every storage array controller to hpadu.txt:
# hpacucli "ctrl all show config detail" > hpadu.txt
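If you would rather run the command-line collectors in one go, here is a rough sketch of a wrapper. It assumes nothing beyond the hplog and hpacucli commands shown above; the collect helper and the hp-support-data output directory are names I made up for illustration. A collector whose tool is not installed is skipped instead of stopping the script.

```shell
#!/bin/sh
# Run a collector only if its tool is installed, saving stdout and stderr
# to a file. "collect" is a hypothetical helper, not part of the HP tools.
# Usage: collect OUTFILE COMMAND [ARGS...]
collect() {
    collect_out=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$collect_out" 2>&1 || echo "warning: $1 exited with status $?" >&2
    else
        # Leave a marker so the support engineer can see what was skipped.
        echo "skipped: $1 not found" >"$collect_out"
    fi
}

# The output directory is an assumption; override it with OUTDIR if needed.
outdir=${OUTDIR:-./hp-support-data}
mkdir -p "$outdir"

# Commands taken from the text above; run this as root.
collect "$outdir/myhplogs.txt" hplog -v -t -f -a STATUS
collect "$outdir/hpadu.txt" hpacucli "ctrl all show config detail"
```

On a server without the HP agents, each output file simply records that the tool was not found, which is still useful information for your support vendor.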
Gathering the above information in advance of placing a support call will help speed the diagnostic process.
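For step 2, here is a minimal sketch of packing up the newest dump directory. It assumes each crash lands in its own subdirectory of /var/crash, as described above; the pack_latest_crash function name, the /var/tmp default destination, and the crash- archive prefix are my own choices for illustration. Colons in the directory name are replaced in the archive name, since they can confuse tar and some file transfer tools.

```shell
#!/bin/sh
# Pack the most recently modified dump directory under a crash root into a
# gzip-compressed tar file, then print the archive path.
# "pack_latest_crash" is a hypothetical helper name.
# Usage: pack_latest_crash [CRASH_ROOT] [DEST_DIR]
pack_latest_crash() {
    crash_root=${1:-/var/crash}
    dest_dir=${2:-/var/tmp}
    # ls -td lists the subdirectories newest first; keep only the first.
    latest=$(ls -td "$crash_root"/*/ 2>/dev/null | head -n 1)
    if [ -z "$latest" ]; then
        echo "no dump directory found under $crash_root" >&2
        return 1
    fi
    name=$(basename "$latest")
    # Replace colons (common in timestamped directory names) with dashes.
    safe_name=$(printf '%s' "$name" | tr ':' '-')
    archive="$dest_dir/crash-$safe_name.tar.gz"
    # -C keeps the paths inside the archive relative to the crash root.
    tar -czf "$archive" -C "$crash_root" "$name" || return 1
    echo "$archive"
}
```

Calling pack_latest_crash with no arguments reads from /var/crash and writes to /var/tmp. Dumps can be very large, so check that the destination has enough free space before you run it.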
My next blog entry will talk about steps you can take to capture information and a dump if your server is experiencing hangs, but doesn’t crash on its own.