vSAN – Check VM Storage Policy & Compliance

As I continue to work with vSAN I discover there’s way more to do than just move some VMs over and you’re on your way. With multiple vSAN clusters each with different configurations I needed a way to monitor the current setup and check for changes. While creating a simple script to check which VM Storage Policy is assigned to each VM isn’t very difficult, a creating a script to check the storage policy of VMs across multiple vSAN datastores proved to be a little more difficult.

We run multiple PowerCLI scripts to check health and configuration drift (thanks to a special tool created by Nick Farmer) in our environment. In the event that a new vCenter is added or new vSAN datastore is deployed, we needed a simple script that can be run without any intervention or modification. Now we can be alerted when the proper VM storage policies isn’t assigned or the current policy is out of compliance.

To further complicate things in our setup, we create a new VM Storage Policy that contains the name of the cluster in which it’s assigned.  Due to the potential differences in each vSAN cluster (stripes, failures to tolerate, replication factor, RAID, etc) having a single Storage Policy does not work for us. In the event a VM is migrated from one vSAN cluster to another we need to check that the VM storage policy matches vSAN datastore cluster policy.

What this script does is grab all the clusters in a vCenter that have vSAN enabled. For each cluster that is found with vSAN enabled, it is filtering only the VMs that live on vSAN storage (with the name of “<cluster>-vsan”. Then we get the storage based policy management (Get-SpbmEntityConfiguration) of those VMs. The script then filters for a storage policy that doesn’t contain the cluster name OR a compliance status that is compliant.

$vsanClusters = Get-cluster | Where-Object {$_.vsanenabled -eq "True"}
foreach ($cluster in $vsanClusters)
{
$Cluster | get-vm |?{($_.extensiondata.config.datastoreurl|%{$_.name}) -like "*-vsan*"} |
Get-SpbmEntityConfiguration | Where-Object {$_.storagepolicy -notlike "*$Cluster*" -or $_.compliancestatus -notlike "*compliant*"} |
Select-Object Entity,storagepolicy,compliancestatus
}

Once this is run we can see the output below. I’ve obscured the names of the VMs, but we can see that there are still 12 VMs that are using the default vSAN Storage Policy instead of the cluster-specific storage policy they should be using. In addition, we see that the compliance status is currently out of date on most of these VMs. These VMs reside on 2 separate clusters and there are also 2 VMs that were filtered because they are on local storage in these clusters instead of vSAN.

storagepolicy01-12202016

 

vSAN – Check VM Storage Policy & Compliance

VSAN – Compliance Status is Out of Date

Occasionally the Compliance status of the performance service will go to the “out of date” status. This is not an alert that is thrown anywhere within vCenter. You will have to check this status by logging into the vSphere web client, locating your vCenter, choose the cluster, clicking on “Manage” then choosing “Health and Performance” under “Virtual SAN”
ComplianceStatus-a

As I have recently fixed this issue the above screenshot shows the “Compliant” status. Below are the steps to get to that point.

1. In the box for “Performance Service” click “Edit storage policy”
ComplianceStatus-01

2. If there is a storage policy available in the drop down, select it and click “OK”. This will apply that policy and perform the compliance check.
ComplianceStatus-02

For the lucky few where that works, that’s all you need to do. If the storage policy list is empty you’ll need to restart the vsanmgmtd service on each of the hosts.

3. Enable SSH on each of the hosts in the VSAN cluster and using an SSH client (like putty), SSH to a host and run the following command to restart the vsanmgmtd service (this is a non-impactful operation and should be able to be performed during production hours with no impact)
a. /etc/init.d/vsanmgmtd restart

4. Repeat that command on each of the hosts in the cluster until they have all restarted their services
ComplianceStatus-04

5. Wait 5 minutes and then check to see if you are able to select a storage policy for the performance service. If not, move on to step 6

6. Now we’ll need to restart the vSphere Profile-Driven Storage Service on the vCenter server. This is also non-impactful and should be able to be performed in the middle of the day. If you’re using vCenter on windows, connect to the Windows server and restart the “Vmware vSphere Profile-Driven Storage Service”. If using VCSA (like this example) you’ll need to SSH to the VCSA and run the command below
a. Service vmware-sps restart

7. After the vmware-sps service restarts, log out of the web client and wait for 5 minutes while the storage profile service  completes its restart.

8. Log back in to the web client, navigate to the vCenter server, click “Manage” then choose the “Storage Providers” tab
ComplianceStatus-08

9. Click the Synchronize Providers button to resync the state of the environment
ComplianceStatus-09

10. Wait another 5 minutes while these synchronize completes. After 5 minutes, navigate to the VSAN cluster in the web client. Click on “Manage” then choose “Settings” and locate “Health and Performance” under the “Virtual SAN” section
ComplianceStatus-10

11. In the Performance Service box, click the “Edit Storage Policy” button
ComplianceStatus-11

12. From the drop down list you should be able to select the appropriate VSAN storage policy and then click “OK”
ComplianceStatus-12

13. After this is selected the compliance status should change to “Compliant” and you should be all set.

So far these are the only steps that I have needed to follow in order to fix this issue. Let me know if there are any other fixes available.

 

 

 

VSAN – Compliance Status is Out of Date

Deploy VSAN Witness Appliance

The VSAN Witness Host is a virtual appliance that is deployed into an existing vCenter server. When deploying a 2 node or stretched cluster, the witness appliance acts as a tie breaker to determine which node(s) are still available in the event the nodes lose communication with each other. The witness The witness is deployed just like any other virtual appliance, but will require access to the management network and the network you’ve designated as your VSAN network. This appliance must be run OUTSIDE your VSAN cluster. This means that you cannot add this host as a member of the existing VSAN cluster and you also should not run it as a virtual appliance inside your existing VSAN cluster.

1. Choose the cluster that will host the appliance. Click on “File” then “Deploy OVF Template”
step01

2. Browse to select the .OVA file and click “Next”
step02

3. Review the details of the appliance and click “Next”
step03

4. Review the license agreement and click “Accept” followed by “Next”
step04

5. Enter the name of the Witness Appliance and its location then click “Next”
step05

6. Choose the appropriate size of the appliance and click “Next”
step06
a. As this is a test, I’m choose the “Tiny” size. You can ignore the disk component requires for any size. As this is a virtual appliance, it will deploy the appropriately sized drives that will designated as SSD and spinning disl

7. Choose the provisioning type and click “Next”
step07
a. This is appliance is being deployed to a separate VSAN cluster than the one it will be acting as the witness for. This appliance can be deployed on shared storage, local storage, or another VSAN datastore.

8. Choose the appropriate networks for management and witness (VSAN). In this deployment, management lives on the “VM Network” and witness (VSAN) traffic is on the “VM-VSANnetwork”. This network is shared with the vMotion network and just needed an additional VM Portgroup created on each of the hosts in the cluster where this appliance is being deployed. Click “Next”
step08

9. Enter a root password for this appliance. Remember, this is a host that you will need to login to in order to administer so if there is a standard root password that you use it would be a good idea to use that here. Click “Next”
step09

10. Review the deployment settings and then click “Finish”
step10

11. Once deployed, you will need to configure the appliance like any other host. Power on the appliance and open the console, press F2 and login as root with the password you assigned in step 9
step11

12. Scroll to “Configure Management Network” and press “Enter”
step12

13. Ensure the Network Adapter assigned to your management network is “vmnic0”
step13
a. Set a VLAN (if necessary) for the management network, then assign your IPv4 and/or IPv6 settings for the management network to make it accessible on your network. Assigned DNS as needed as well. Press “ESC” and then press “Y” to configure settings and restart the management network
step13a

14. Once the host can communicate on the network, add it as a new host in vCenter.
a. Remember that this host should not be part of your VSAN cluster or any other cluster. It should be a standalone host in your datacenter.

15. Select the host in the vCenter client and configure networking for it. Locate the “witnessSwitch” and click “Properties”
step15

16. Select the “witnessPg” and click “Edit”
step16

17. On the “IP Settings” tab, enter the IP and subnet mask for the VSAN traffic network. Click “OK” at the bottom”
step17

18. Once you have confirmed that network settings are successful, login to the vSphere web client and navigate to the VSAN cluster to be configured

19. Click the “Manage” tab, then choose “Fault Domains & Stretched Cluster” under “Virtual SAN”
step19

20. In the “Streteched Cluster” box click “Configure”
step20

21. Name the fault domains and place the hosts into the appropriate fault domain. This is a 2 node cluster with 1 host in each fault domain. Click “Next”
step21

22. Locate the VSAN witness appliance host that was added to this vcenter and click “Next”
step22

23. Choose the flash drive for and the HDD for cache and capacity and click “Next”
step23

24. Review the settings and click “Finish”
step24

25. Once completed, you will now see the status of the stretched cluster as “Enabled”, the preferred fault domain and the designated witness host.
step25

Deploy VSAN Witness Appliance

VSAN – Host Not Contributing Stats

After an upgrade or maintenance on one or more of the nodes in a VSAN cluster one of the hosts can stop contributing performance stats. This is not a production down issue, but should be addressed to see the most up-to-date stats across all the nodes.

The fix for this is one of three things, but each of them involves turning off performance statistics on the cluster which will cause all historical performance stats to be removed. My hope is that VMware will fix this issue in an upcoming release because a loss of historical is not tolerable in all environments.

1. View the health of the VSAN by logging into the vCenter web client. Navigate to the appropriate vCenter and cluster, then click the “Monitor” tab, followed by “Virtual SAN” then click on “Health.” Expanded “Performance service” and click the warning for “All hosts contributing stats”
step01

2. At the bottom you will now see the list of hosts that are not contributing stats
step02

3. Now that we’ve identified the problem host, we need to disable VSAN performance service temporarily. Navigate to the “Manage” tab for this cluster then click on “Health and Performance” under “Virtual SAN”
step03

4. Click “Turn off” in the “Performance Service” box

a. Click “OK” to confirm stopping the service which will erase all existing performance data
step04a

5. Confirm Perform Service has been disabled by refreshing the page
step05

6. SSH to the affected host (using putty or similar SSH client) we identified in step 2 (you may have to enable SSH on the host before you can connect).
step06

7. Run the command below to restart the VSAN management agent. This should have no production impact so it is safe to perform outside of a maintenance window.
a. /etc/init.d/vsanmgmtd restart
step07a

8. Once the service has been restarted, go back to the vCenter web client and the click the “Edit” button for the Performance Service box
step08

9. Select the appropriate storage policy from the drop down list, ensure the “Turn ON Virtual SAN performance service” box is checked and click “OK”
step09

10. Confirm that the performance service is turned on and reporting healthy
step10

11. Navigate back to the “Monitor” tab and then “Virtual SAN” clicking the on the “Health” section. Click “Retest” to verify that all hosts are contributing stats.
step11

If this does not fix the issue, you can restart the process, but this time instead of restarting the vsanmgmt service on the one node, do it on all of the nodes in the cluster. Once the services have been restarted across all nodes then restart the performance service and all nodes should be contributing stats.

I have also seen a case where restarting the service on all nodes didn’t fix the problem. In that scenario I was able to fix the problem by entering maintenance mode on the problem node and choose “full data migration” so all the data would be removed from the cluster. After that was complete I completely rebuilt the host from scratch (including wiping the disks claimed by VSAN) then moving it back into the cluster. I haven’t heard from VMware of any other ways to fix this issue.

VSAN – Host Not Contributing Stats