Exporting Data From a VMware vSphere Environment For Fun And Profit

In this post we will see how to export data from a VMware vSphere environment from the command line and then plot some nice graphs of it, because everybody loves graphs, right? :)

To export the data from a VMware vSphere environment we are going to use vPoller (VMware vSphere Distributed Pollers), and to plot the graphs we will be using matplotlib.

For installation and configuration of vPoller please refer to the vPoller GitHub repository, which contains all the details you need in order to get vPoller installed and configured.

So, without further ado let's start with the interesting stuff.

Using vPoller we can export data in various formats using the vPoller Helpers. One such helper is vpoller.helpers.csvhelper, which returns data in CSV format and makes it easy for us to then use that data for plotting graphs.

Let's first check the overall CPU usage of our ESXi hosts from our vSphere environment:

    $ vpoller-client -m host.discover -p summary.quickStats.overallCpuUsage -V vc01.example.org
    {
        "msg": "Successfully discovered objects",
        "result": [
            {
                "summary.quickStats.overallCpuUsage": 7065,
                "name": "esx-1"
            },
            {
                "summary.quickStats.overallCpuUsage": 12383,
                "name": "esx-2"
            },
            {
                "summary.quickStats.overallCpuUsage": 3945,
                "name": "esx-3"
            },
            {
                "summary.quickStats.overallCpuUsage": 4776,
                "name": "esx-4"
            },
            {
                "summary.quickStats.overallCpuUsage": 4638,
                "name": "esx-5"
            },
            {
                "summary.quickStats.overallCpuUsage": 8520,
                "name": "esx-6"
            },
            {
                "summary.quickStats.overallCpuUsage": 10001,
                "name": "esx-7"
            },
            {
                "summary.quickStats.overallCpuUsage": 2481,
                "name": "esx-8"
            },
            {
                "summary.quickStats.overallCpuUsage": 5064,
                "name": "esx-9"
            }
        ],
        "success": 0
    }

The result returned by default from vPoller is in JSON format. In order to convert this result to CSV we will use the vpoller.helpers.csvhelper helper. Let's do that now:

    $ vpoller-client -H vpoller.helpers.csvhelper -m host.discover -p summary.quickStats.overallCpuUsage -V vc01.example.org
    name,summary.quickStats.overallCpuUsage
    esx-1,7065
    esx-2,12383
    esx-3,3945
    esx-4,4776
    esx-5,4638
    esx-6,8520
    esx-7,10001
    esx-8,2481
    esx-9,5064
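
If you are curious what such a JSON-to-CSV conversion boils down to, here is a minimal sketch using only the Python standard library. It is not the actual code of vpoller.helpers.csvhelper; the function name result_to_csv and the sample string are made up for illustration, and sorting the column names alphabetically is an assumption based on the output shown above.

```python
import csv
import io
import json

def result_to_csv(raw_json):
    """Convert a vPoller-style JSON result into CSV text (illustrative only)."""
    result = json.loads(raw_json)["result"]
    # Collect every property name seen and sort for a stable column order
    fields = sorted({key for item in result for key in item})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(result)
    return out.getvalue()

# Shortened sample taken from the JSON result above
sample = (
    '{"msg": "Successfully discovered objects", "result": ['
    '{"summary.quickStats.overallCpuUsage": 7065, "name": "esx-1"}, '
    '{"summary.quickStats.overallCpuUsage": 12383, "name": "esx-2"}], '
    '"success": 0}'
)
print(result_to_csv(sample))
```

In practice you would simply let the helper do this for you, as in the command above.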

We also want to save this data to a file, so that we can use it later to plot our graph of overall CPU usage.

    $ vpoller-client -H vpoller.helpers.csvhelper -m host.discover -p summary.quickStats.overallCpuUsage -V vc01.example.org > hosts-cpu-usage.csv

Great, we've just exported the overall CPU usage of our ESXi hosts, but that is just some raw data.

The following script uses matplotlib in order to plot the graph for us.

    #!/usr/bin/env python

    import numpy as np
    from matplotlib import pyplot as plt

    def main():
        # Read the CSV file and skip the header row. dtype=None lets NumPy
        # infer a type per column; encoding='utf-8' (NumPy 1.14+) makes the
        # host names come back as str rather than bytes on Python 3
        data = np.genfromtxt(
            fname='hosts-cpu-usage.csv',
            delimiter=',',
            dtype=None,
            encoding='utf-8',
            skip_header=1
        )

        hosts = [row[0] for row in data]
        usage = [row[1] for row in data]

        index = np.arange(len(data))
        width = 0.85

        plt.xlabel('ESXi Host')
        plt.ylabel('MHz')

        plt.title('Overall CPU Usage')
        # Bars are centered on their x position, so the ticks go there too
        plt.xticks(index, hosts, rotation=45)

        plt.bar(index, usage, width, color='y', align='center')
        plt.savefig('esx-cpu-usage.png', bbox_inches='tight')

    if __name__ == '__main__':
        main()

And here is the result from our script:

Wait, it would be really useful if we could see how the CPU usage compares to our total CPU resources. Can we do that?

Sure, we can!

But first let's export some additional properties from our vSphere environment:

    $ vpoller-client -H vpoller.helpers.csvhelper -m host.discover -p summary.quickStats.overallCpuUsage,hardware.cpuInfo.numCpuCores,hardware.cpuInfo.hz -V vc01.example.org > hosts-cpu-usage-2.csv

The command above will export the following information about our ESXi hosts:

  • Number of CPU cores
  • CPU speed per core
  • Overall CPU usage

This is the data we've just exported from our VMware vSphere environment using vPoller:

    hardware.cpuInfo.hz,hardware.cpuInfo.numCpuCores,name,summary.quickStats.overallCpuUsage
    2800098945,8,esx-1,7065
    2800098378,8,esx-2,12383
    2800098399,12,esx-3,3945
    2800098483,12,esx-4,4776
    2800099092,8,esx-5,4638
    2800098504,8,esx-6,8520
    2800098903,16,esx-7,10001
    2800098252,16,esx-8,2481
    2800098399,8,esx-9,5064
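
To make the arithmetic behind the comparison explicit: the per-core speed is reported in Hz and the usage in MHz, so we convert the speed to MHz and multiply it by the number of cores to get the total CPU capacity of a host. The small sketch below works this out for esx-1 from the data above; the function names total_mhz and usage_percent are made up for this example, and it assumes Python 3 division.

```python
def total_mhz(hz, cores):
    """Total CPU capacity of a host in MHz: per-core speed times core count."""
    return hz / 1000000.0 * cores

def usage_percent(usage_mhz, hz, cores):
    """CPU usage as a percentage of the total CPU capacity of the host."""
    return usage_mhz / total_mhz(hz, cores) * 100.0

# esx-1 from the CSV above: 2800098945 Hz per core, 8 cores, 7065 MHz used
capacity = total_mhz(2800098945, 8)
percent = usage_percent(7065, 2800098945, 8)
print('capacity: %.1f MHz, usage: %.1f%%' % (capacity, percent))
```

So esx-1 has roughly 22400 MHz in total and is using a bit under a third of it.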

Now we can plot our graph which shows the overall CPU usage compared to the total CPU resources we have:

    #!/usr/bin/env python

    import numpy as np
    import matplotlib.pyplot as plt

    def main():
        # encoding='utf-8' (NumPy 1.14+) gives us str host names on Python 3
        data = np.genfromtxt(
            fname='hosts-cpu-usage-2.csv',
            delimiter=',',
            dtype=None,
            encoding='utf-8',
            skip_header=1
        )

        # Total CPU per host in MHz: per-core speed (Hz -> MHz) times cores
        cpu_speed = [row[0] / 1000000.0 * row[1] for row in data]
        cpu_usage = [row[3] for row in data]
        hosts     = [row[2] for row in data]

        index = np.arange(len(data))
        width = 0.85

        plt.xlabel('ESXi Host')
        plt.ylabel('MHz')

        plt.title('CPU Usage')
        # Bars are centered on their x position, so the ticks go there too
        plt.xticks(index, hosts, rotation=45)

        # Draw the usage bars on top of the total CPU bars
        p1 = plt.bar(index, cpu_speed, width, color='g', align='center')
        p2 = plt.bar(index, cpu_usage, width, color='y', align='center')

        plt.legend((p1[0], p2[0]), ('CPU Speed', 'CPU Usage'), loc='center left', bbox_to_anchor=(1, 0.5))

        plt.savefig('esx-cpu-usage-2.png', bbox_inches='tight')

    if __name__ == '__main__':
        main()

And here is what the result looks like:

Pretty nice, isn't it? Now we can get an overview of how our ESXi hosts are performing.

Okay, let's try out something different now. Let's check our datastores.

First we export Datastore properties using vPoller for capacity and free space.

    $ vpoller-client -H vpoller.helpers.csvhelper -m datastore.discover -V vc01.example.org -p summary.capacity,info.freeSpace > datastores.csv

And this is the CSV file we got from the above command:

    info.freeSpace,name,summary.capacity
    55582916608,datastore-1,898721906688
    949532753920,datastore-2,1599149680640
    1631043190784,datastore-3,2299149680640
    653556449280,datastore-4,751350841344
    254283874304,datastore-5,1098721906688
    128705363968,datastore-6,898721906688
    296936527360,datastore-7,1198721906688

We are going to use this Python script in order to plot a graph for our datastores:

    #!/usr/bin/env python

    import numpy as np
    import matplotlib.pyplot as plt

    def main():
        # encoding='utf-8' (NumPy 1.14+) gives us str names on Python 3
        data = np.genfromtxt(
            fname='datastores.csv',
            delimiter=',',
            dtype=None,
            encoding='utf-8',
            skip_header=1
        )

        # The CSV file contains the datastore free space and capacity in bytes,
        # so we convert this to GB first (1 GB = 1073741824 bytes here)
        free       = [(row[0] / 1073741824.0) for row in data]
        capacity   = [(row[2] / 1073741824.0) for row in data]
        datastores = [row[1] for row in data]

        index = np.arange(len(data))
        width = 0.85

        plt.xlabel('Datastore')
        plt.ylabel('GB')

        plt.title('Datastores')
        # Bars are centered on their x position, so the ticks go there too
        plt.xticks(index, datastores, rotation=45)

        # Draw the free space bars on top of the capacity bars
        p1 = plt.bar(index, capacity, width, color='g', align='center')
        p2 = plt.bar(index, free, width, color='y', align='center')

        plt.legend((p1[0], p2[0]), ('Capacity', 'Free Space'), loc='center left', bbox_to_anchor=(1, 0.5))

        plt.savefig('datastores.png', bbox_inches='tight')

    if __name__ == '__main__':
        main()

And here is the result from our script:

From the graph above we can see that some datastores would require attention soon, as we are running out of free disk space :)
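
If you prefer a list over eyeballing the graph, the same CSV data can also tell us which datastores are low on space. The snippet below is only a sketch: the function name low_space and the 15% threshold are arbitrary choices made for this example, the sample is a subset of the rows above, and it assumes Python 3.

```python
import csv
import io

def low_space(csv_text, threshold=15.0):
    """Return (name, free %) for datastores with less free space than threshold."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        free = int(row['info.freeSpace'])
        capacity = int(row['summary.capacity'])
        percent = free / capacity * 100.0
        if percent < threshold:
            flagged.append((row['name'], round(percent, 1)))
    return flagged

# A few rows from datastores.csv as shown above
sample = """info.freeSpace,name,summary.capacity
55582916608,datastore-1,898721906688
949532753920,datastore-2,1599149680640
128705363968,datastore-6,898721906688
"""

for name, percent in low_space(sample):
    print('%s has only %.1f%% free' % (name, percent))
```

To run it against the real export, read the contents of datastores.csv and pass them in instead of the sample string.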

Let's check our Virtual Machines now. Using the Python script below we can get an overview of our environment and see how many of the Virtual Machines are running VMware Tools and how many are not.

First let's export the Virtual Machines data from our vSphere environment using vPoller:

    $ vpoller-client -H vpoller.helpers.csvhelper -m vm.discover -p guest.toolsRunningStatus -V vc01.example.org > vms-tools-state.csv

And here is the Python script we are going to use:

    #!/usr/bin/env python

    import numpy as np
    import matplotlib.pyplot as plt

    def main():
        # encoding='utf-8' (NumPy 1.14+) gives us str values on Python 3
        data = np.genfromtxt(
            fname='vms-tools-state.csv',
            delimiter=',',
            dtype=None,
            encoding='utf-8',
            skip_header=1
        )

        # Group the Virtual Machines by their VMware Tools state
        vm_states = {}
        for item in data:
            name, state = item[1], item[0]

            if state in vm_states:
                vm_states[state].append(name)
            else:
                vm_states[state] = [name]

        # Calculate percentage
        total_vms = len(data)
        labels    = sorted(vm_states)
        sizes     = []

        for state in labels:
            num_vms    = len(vm_states[state])
            percentage = float(num_vms) / float(total_vms) * 100.0
            sizes.append(percentage)

        # Pull the largest slice slightly out of the pie
        explode   = [0 for n in range(len(labels))]
        max_index = sizes.index(max(sizes))
        explode[max_index] = 0.1

        plt.pie(
            sizes,
            explode=explode,
            labels=labels,
            autopct='%1.1f%%',
            shadow=True,
            startangle=90
        )

        plt.axis('equal')
        plt.savefig('vms-tools-state.png')

    if __name__ == '__main__':
        main()

And here is the result from our script:

Oh boy, seems like we do have a significant number of Virtual Machines that we need to take care of and get VMware Tools up and running there.
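
By the way, if all we need are the percentages per state, the grouping done in the script can be written more compactly with collections.Counter from the standard library. This is just a sketch of an alternative, not a change to the script above; the function name state_percentages and the sample list of states are made up for illustration.

```python
from collections import Counter

def state_percentages(states):
    """Map each distinct state to its share of the total, in percent."""
    counts = Counter(states)
    total = sum(counts.values())
    return {state: count / total * 100.0 for state, count in sorted(counts.items())}

# Made-up sample: one entry per Virtual Machine
sample = [
    'guestToolsRunning',
    'guestToolsNotRunning',
    'guestToolsRunning',
    'guestToolsRunning',
]
print(state_percentages(sample))
```

The keys and values of the returned dictionary can then be passed to plt.pie() as the labels and sizes.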

As a last example we will see how to get an overview of the Operating Systems we run in our environment.

First let's export some data from our VMware vSphere environment using vPoller:

    $ vpoller-client -H vpoller.helpers.csvhelper -m vm.discover -p config.guestFullName -V vc01.example.org > vms-os-type.csv

Now we will use this Python script, which gives us an overview of the Operating Systems we run in our environment:

    #!/usr/bin/env python

    import numpy as np
    import matplotlib.pyplot as plt

    def main():
        # encoding='utf-8' (NumPy 1.14+) gives us str values on Python 3
        data = np.genfromtxt(
            fname='vms-os-type.csv',
            delimiter=',',
            dtype=None,
            encoding='utf-8',
            skip_header=1
        )

        # Group the Virtual Machines by their guest Operating System
        os_types = {}
        for item in data:
            guest_os, vm = item[0], item[1]

            if guest_os in os_types:
                os_types[guest_os].append(vm)
            else:
                os_types[guest_os] = [vm]

        # Calculate percentage
        total_vms = len(data)
        labels    = sorted(os_types.keys())
        sizes     = []

        for guest_os in labels:
            num_vms    = len(os_types[guest_os])
            percentage = float(num_vms) / float(total_vms) * 100.0
            sizes.append(percentage)

        # Pull the largest slice slightly out of the pie
        explode   = [0 for n in range(len(labels))]
        max_index = sizes.index(max(sizes))
        explode[max_index] = 0.1

        plt.pie(
            sizes,
            labels=labels,
            explode=explode,
            shadow=True,
            startangle=90,
            autopct='%1.1f%%'
        )

        plt.axis('equal')
        plt.savefig('vms-os-types.png', bbox_inches='tight')

    if __name__ == '__main__':
        main()

And this is what the result from our script looks like:

And that was all, hope you liked it!