Virtual Machine Logs
As I described in my last post, after unpacking a vm-support you’ll have several subdirectories:
bootbank etc locker proc tmp usr var vmfs
Just as on a real ESXi host, the VM logs are contained under the vmfs directory. One key difference is that the vm-support only captures logs for VMs that are registered on the host the vm-support was taken from. If you need the logs from a specific VM, you’ll first need to identify which host is currently running that VM, then capture a vm-support from that host. Under the vmfs directory you’ll see one or more LUN IDs. You can translate these LUN IDs to real names by looking in the file volume_list.vpa.txt under the tmp directory.
Typically the directories under each volume are named after the VM they contain. For example:
Let’s burrow down further into one of these VM directories. The vm-support captures the VM definition file and the files describing the virtual disks, along with the logs and a file named filelist.txt, which contains the names of all the files that were in the directory.
VM1.vmdk VM1.vmx filelist.127677508.txt vmware-1.log vmware-3.log vmware.log
VM1.vmsd VM1.vmxf vmdk-types.txt vmware-2.log vmware-4.log
The logs roll periodically. The most current VM log is vmware.log; vmware-1.log is the next oldest, and so on.
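If you want to read the rotated logs in order, a small script can sort them for you. This is only a minimal sketch in Python: the function name order_vm_logs is my own, and it assumes the logs follow the vmware.log / vmware-N.log naming shown in the listing above.

```python
import re

# Matches vmware.log (current) and vmware-N.log (rotated copies).
_VM_LOG = re.compile(r"^vmware(?:-(\d+))?\.log$")


def order_vm_logs(filenames):
    """Return the VM log names sorted from newest to oldest."""
    logs = []
    for name in filenames:
        match = _VM_LOG.match(name)
        if match:
            # The un-numbered log is the current one, so treat it as index 0.
            logs.append((int(match.group(1) or 0), name))
    return [name for _, name in sorted(logs)]


files = ["VM1.vmdk", "VM1.vmx", "vmware-1.log", "vmware-3.log",
         "vmware.log", "vmware-2.log", "vmware-4.log"]
print(order_vm_logs(files))
# ['vmware.log', 'vmware-1.log', 'vmware-2.log', 'vmware-3.log', 'vmware-4.log']
```

Sorting on the number rather than on the file name keeps vmware-10.log after vmware-9.log if a VM keeps that many logs.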
You can get a good picture of how the VM is configured by looking at the .vmx file. This is where all the answers are stored when you create a VM, and this file basically defines the VM for VMware. The logs contain messages specific to that VM from a VMware perspective. You’ll be able to determine when the VM has been powered on, powered off, migrated, etc. You can also examine these logs for any VM-specific errors. These logs should not be confused with the OS logs within the VM itself; rather, they should be viewed as virtual hardware logs.
ESX/ESXi host logs
The logs for the actual physical host are located under the var/log directory.
Like the VM logs, these logs also rotate, with the current log having no number and older logs labeled .1 .2 .3 etc. On an ESX host there are two sets of logs. The messages log contains the console VM log information and the vmkernel log contains the ESX host hypervisor information. On an ESXi host, since there is no separate console VM, all messages are logged to the messages file.
As with the VM logs, all of the entries are time stamped, which helps when you are looking for a specific event. However, VMware is not exactly consistent with its time stamp format. The VM logs list milliseconds, whereas the host logs don’t. It gets even worse when we look at the agent logs later: the format is completely different. Anyway, with these stamps it’s possible to correlate events from VMs recorded in the VM logs with host events recorded in /var/log/messages.
A typical approach to looking at a problem is to get a rough time frame for an event, such as a host locking up or a VM locking up, then use this to search the logs for time stamps just prior to that event. Even without a strong knowledge of the error logging messages you can often locate a solution by picking out suspicious events or items specifically labeled as errors, then searching for KB articles and other web resources to explain these messages.
Sometimes the error message will even contain an explanation and a pointer to a KB article such as this example:
May 06 20:06:47.186: vcpu-0| [msg.hbacommon.locklost] The lock protecting /vmfs/volumes/
May 06 20:06:47.186: vcpu-0| This is most likely due to underlying storage having
May 06 20:06:47.186: vcpu-0| problems,resulting in this virtual machine getting powered-on
May 06 20:06:47.186: vcpu-0| on another ESX host as well.This virtual machine needs
May 06 20:06:47.186: vcpu-0| to be powered off on this host now. Kindly confirm that
May 06 20:06:47.186: vcpu-0| the virtual machine is running successfully on another host
May 06 20:06:47.186: vcpu-0| before clicking the OK button
May 06 20:06:47.186: vcpu-0|
May 06 20:06:47.186: vcpu-0| For more information regarding this problem kindly refer to
May 06 20:06:47.186: vcpu-0| http://kb.vmware.com/kb/1006936
May 07 05:53:26.144: vcpu-0| Msg_Question: msg.hbacommon.locklost reply=0
May 07 05:53:26.145: vcpu-0| Exiting because of failed disk operation.
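To pull out the entries logged just before an event, you can parse the VM log time stamps and filter on a time window. The following is a rough Python sketch, not a finished tool. The line format is inferred only from the sample above, the helper names parse_vm_line and lines_before are my own, and because the VM log omits the year you have to supply it yourself (2012 in the usage lines is an assumption). Month names are parsed with %b, so this assumes an English locale.

```python
import re
from datetime import datetime, timedelta

# Format assumed from the sample above: "May 06 20:06:47.186: vcpu-0| text"
_VM_LINE = re.compile(
    r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d{3}): ([^|]+)\| ?(.*)$"
)


def parse_vm_line(line, year):
    """Return (timestamp, thread, message), or None if the line does not match."""
    match = _VM_LINE.match(line)
    if not match:
        return None
    # The log has no year, so the caller-supplied year is prepended.
    stamp = datetime.strptime(f"{year} {match.group(1)}",
                              "%Y %b %d %H:%M:%S.%f")
    return stamp, match.group(2), match.group(3)


def lines_before(lines, event_time, minutes, year):
    """Yield parsed entries logged in the `minutes` leading up to event_time."""
    start = event_time - timedelta(minutes=minutes)
    for line in lines:
        parsed = parse_vm_line(line, year)
        if parsed and start <= parsed[0] <= event_time:
            yield parsed


sample = [
    "May 06 20:06:47.186: vcpu-0| http://kb.vmware.com/kb/1006936",
    "May 07 05:53:26.144: vcpu-0| Msg_Question: msg.hbacommon.locklost reply=0",
    "May 07 05:53:26.145: vcpu-0| Exiting because of failed disk operation.",
]
# Hypothetical event time: the VM was noticed down at 05:54 on May 7.
for stamp, thread, message in lines_before(sample, datetime(2012, 5, 7, 5, 54), 5, 2012):
    print(stamp, thread, message)
```

With the sample lines above, only the two entries from May 07 fall inside the five-minute window.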
VMware Agent Logs
Besides the VM and host logs, there are also important logs from the VMware agents that are captured by vm-support. These logs are located under var/log/vmware. They record communication between vCenter, the vSphere client, and the hosts. They also record any errors these agents encounter.
The hostd logs, located under /var/log/vmware, contain information on the agent that manages and configures the ESX host and virtual machines. Depending on the version of ESX/ESXi you may find the older logs have been gzipped. You’ll need to uncompress these with gunzip before examining them. The time stamp on these logs is also formatted differently than the messages and VM logs.
[2012-05-02 15:09:34.836 10FA1B90 verbose 'ha-license-manager'] Load: Loading existing file: /etc/vmware/license.cfg
[2012-05-02 15:09:51.936 112A8B90 verbose 'DvsManager'] PersistAllDvsInfo called
[2012-05-02 15:09:52.682 10FA1B90 verbose 'DvsTracker'] FetchSwitches: added 1 items
[2012-05-02 15:09:52.682 10FA1B90 verbose 'DvsTracker'] FetchDVPortgroups: added 10 items
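Since the hostd time stamp format differs from the other logs, it can help to split each line into fields before comparing it with anything else. Here is a minimal Python sketch based only on the four sample lines above. The function name parse_hostd_line and the field names are my own, and calling the second field a thread ID is a guess on my part.

```python
import re
from datetime import datetime

# Layout assumed from the sample: [date time id level 'component'] message
_HOSTD_LINE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(\w+) (\w+) '([^']*)'\] (.*)$"
)


def parse_hostd_line(line):
    """Split one hostd log line into a dict, or return None if it does not match."""
    match = _HOSTD_LINE.match(line)
    if not match:
        return None
    return {
        "time": datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"),
        "thread_id": match.group(2),  # assumption: the hex value looks like a thread ID
        "level": match.group(3),
        "component": match.group(4),
        "message": match.group(5),
    }


entry = parse_hostd_line(
    "[2012-05-02 15:09:51.936 112A8B90 verbose 'DvsManager'] PersistAllDvsInfo called"
)
print(entry["time"], entry["level"], entry["component"], entry["message"])
```

Lines that do not fit the pattern, such as wrapped continuation lines, come back as None so you can decide whether to skip them or attach them to the previous entry.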
As with the VM and host logs, you’ll want to identify the time of an event and look at the messages surrounding it. Once again, you can search for key words such as “error”, “warning” or “failed”, and even without a complete understanding of all the messages you can look for matches in KB articles and on the web.
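The key word search can be scripted so that it works on both the current log and the older gzipped copies without a separate gunzip step. This is a small Python sketch under my own assumptions: the helper names open_log and find_suspicious are hypothetical, compressed logs are assumed to end in .gz, and the key word list is just the three words mentioned above.

```python
import gzip
import re

# Key words taken from the text above; extend the pattern as needed.
KEYWORDS = re.compile(r"error|warning|failed", re.IGNORECASE)


def open_log(path):
    """Open a plain or gzip-compressed log file as text."""
    opener = gzip.open if str(path).endswith(".gz") else open
    return opener(path, "rt", encoding="utf-8", errors="replace")


def find_suspicious(path, context=2):
    """Return (line_number, surrounding_lines) for every key word hit."""
    with open_log(path) as handle:
        lines = handle.read().splitlines()
    hits = []
    for index, line in enumerate(lines):
        if KEYWORDS.search(line):
            first = max(0, index - context)
            last = min(len(lines), index + context + 1)
            hits.append((index + 1, lines[first:last]))
    return hits
```

Calling find_suspicious on a hostd log from the unpacked bundle returns each matching line number along with a couple of lines on either side, which is usually enough context to decide whether the message is worth a KB search.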
Another important set of agent logs is the vpxa logs, located under /var/log/vmware/vpx. These logs record the activity of the agent that communicates with vCenter. They have a format similar to the hostd logs. You can examine these logs to obtain more detail from the host side about any warnings or messages you see in vCenter.
Section for VMware VirtualCenter Agent, pid=126218680, version=5.0.0, build=build-623373, option=Release
2012-05-02T17:42:36.910Z [17A78B90 verbose 'Default' opID=SWI-a4932b8] [VpxaMoVm::CheckMoVm] did not find a VM with ID 39 in the vmList
2012-05-02T17:42:36.922Z [17A78B90 verbose 'Default' opID=SWI-a4932b8] [VpxaAlarm] VM with vmid = 39 not found
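The vpxa entries can be split up in much the same way. The Python sketch below is based only on the two sample entries above. The function name parse_vpxa_line and the field names are mine, and the opID field is treated as optional because I can’t say whether every line carries one. The trailing Z on the time stamp is the ISO 8601 marker for UTC, so the parsed time is tagged as UTC.

```python
import re
from datetime import datetime, timezone

# Layout assumed from the sample: time [id level 'component' opID=...] message
_VPXA_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z "
    r"\[(\w+) (\w+) '([^']*)'(?: opID=([^\]\s]+))?\] (.*)$"
)


def parse_vpxa_line(line):
    """Split one vpxa log line into a dict, or return None if it does not match."""
    match = _VPXA_LINE.match(line)
    if not match:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
    return {
        "time": stamp.replace(tzinfo=timezone.utc),  # "Z" means UTC
        "thread_id": match.group(2),  # assumption: the hex value looks like a thread ID
        "level": match.group(3),
        "component": match.group(4),
        "op_id": match.group(5),  # None when the line has no opID
        "message": match.group(6),
    }


entry = parse_vpxa_line(
    "2012-05-02T17:42:36.922Z [17A78B90 verbose 'Default' opID=SWI-a4932b8] "
    "[VpxaAlarm] VM with vmid = 39 not found"
)
print(entry["time"], entry["op_id"], entry["message"])
```

Collecting the entries that share an opID is one way to follow a single operation through the log, though I haven’t confirmed how consistently that ID appears.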
Almost all VMware administrators have encountered problems where the agents, for one reason or another, have had to be restarted. Examination of the hostd or vpxa logs might explain what led up to the problem.
There are other logs and files contained in the vm-support diagnostic bundle. You could almost write a book about just the diagnostics. If your host has crashed, vm-support will also capture a dump file. I’ll be away for a few weeks on vacation, but when I return my next post will show how to convert this file to text and make some sense of it.