VSAN – Compliance Status is Out of Date

Occasionally the Compliance status of the performance service will go to the “out of date” status. This is not an alert that is thrown anywhere within vCenter. You will have to check this status by logging into the vSphere web client, locating your vCenter, choose the cluster, clicking on “Manage” then choosing “Health and Performance” under “Virtual SAN”
ComplianceStatus-a

As I have recently fixed this issue the above screenshot shows the “Compliant” status. Below are the steps to get to that point.

1. In the box for “Performance Service” click “Edit storage policy”
ComplianceStatus-01

2. If there is a storage policy available in the drop down, select it and click “OK”. This will apply that policy and perform the compliance check.
ComplianceStatus-02

For the lucky few where that works, that’s all you need to do. If the storage policy list is empty you’ll need to restart the vsanmgmtd service on each of the hosts.

3. Enable SSH on each of the hosts in the VSAN cluster and using an SSH client (like putty), SSH to a host and run the following command to restart the vsanmgmtd service (this is a non-impactful operation and should be able to be performed during production hours with no impact)
a. /etc/init.d/vsanmgmtd restart

4. Repeat that command on each of the hosts in the cluster until they have all restarted their services
ComplianceStatus-04

5. Wait 5 minutes and then check to see if you are able to select a storage policy for the performance service. If not, move on to step 6

6. Now we’ll need to restart the vSphere Profile-Driven Storage Service on the vCenter server. This is also non-impactful and should be able to be performed in the middle of the day. If you’re using vCenter on windows, connect to the Windows server and restart the “Vmware vSphere Profile-Driven Storage Service”. If using VCSA (like this example) you’ll need to SSH to the VCSA and run the command below
a. Service vmware-sps restart

7. After the vmware-sps service restarts, log out of the web client and wait for 5 minutes while the storage profile service  completes its restart.

8. Log back in to the web client, navigate to the vCenter server, click “Manage” then choose the “Storage Providers” tab
ComplianceStatus-08

9. Click the Synchronize Providers button to resync the state of the environment
ComplianceStatus-09

10. Wait another 5 minutes while these synchronize completes. After 5 minutes, navigate to the VSAN cluster in the web client. Click on “Manage” then choose “Settings” and locate “Health and Performance” under the “Virtual SAN” section
ComplianceStatus-10

11. In the Performance Service box, click the “Edit Storage Policy” button
ComplianceStatus-11

12. From the drop down list you should be able to select the appropriate VSAN storage policy and then click “OK”
ComplianceStatus-12

13. After this is selected the compliance status should change to “Compliant” and you should be all set.

So far these are the only steps that I have needed to follow in order to fix this issue. Let me know if there are any other fixes available.

 

 

 

VSAN – Compliance Status is Out of Date

Deploy VSAN Witness Appliance

The VSAN Witness Host is a virtual appliance that is deployed into an existing vCenter server. When deploying a 2 node or stretched cluster, the witness appliance acts as a tie breaker to determine which node(s) are still available in the event the nodes lose communication with each other. The witness The witness is deployed just like any other virtual appliance, but will require access to the management network and the network you’ve designated as your VSAN network. This appliance must be run OUTSIDE your VSAN cluster. This means that you cannot add this host as a member of the existing VSAN cluster and you also should not run it as a virtual appliance inside your existing VSAN cluster.

1. Choose the cluster that will host the appliance. Click on “File” then “Deploy OVF Template”
step01

2. Browse to select the .OVA file and click “Next”
step02

3. Review the details of the appliance and click “Next”
step03

4. Review the license agreement and click “Accept” followed by “Next”
step04

5. Enter the name of the Witness Appliance and its location then click “Next”
step05

6. Choose the appropriate size of the appliance and click “Next”
step06
a. As this is a test, I’m choose the “Tiny” size. You can ignore the disk component requires for any size. As this is a virtual appliance, it will deploy the appropriately sized drives that will designated as SSD and spinning disl

7. Choose the provisioning type and click “Next”
step07
a. This is appliance is being deployed to a separate VSAN cluster than the one it will be acting as the witness for. This appliance can be deployed on shared storage, local storage, or another VSAN datastore.

8. Choose the appropriate networks for management and witness (VSAN). In this deployment, management lives on the “VM Network” and witness (VSAN) traffic is on the “VM-VSANnetwork”. This network is shared with the vMotion network and just needed an additional VM Portgroup created on each of the hosts in the cluster where this appliance is being deployed. Click “Next”
step08

9. Enter a root password for this appliance. Remember, this is a host that you will need to login to in order to administer so if there is a standard root password that you use it would be a good idea to use that here. Click “Next”
step09

10. Review the deployment settings and then click “Finish”
step10

11. Once deployed, you will need to configure the appliance like any other host. Power on the appliance and open the console, press F2 and login as root with the password you assigned in step 9
step11

12. Scroll to “Configure Management Network” and press “Enter”
step12

13. Ensure the Network Adapter assigned to your management network is “vmnic0”
step13
a. Set a VLAN (if necessary) for the management network, then assign your IPv4 and/or IPv6 settings for the management network to make it accessible on your network. Assigned DNS as needed as well. Press “ESC” and then press “Y” to configure settings and restart the management network
step13a

14. Once the host can communicate on the network, add it as a new host in vCenter.
a. Remember that this host should not be part of your VSAN cluster or any other cluster. It should be a standalone host in your datacenter.

15. Select the host in the vCenter client and configure networking for it. Locate the “witnessSwitch” and click “Properties”
step15

16. Select the “witnessPg” and click “Edit”
step16

17. On the “IP Settings” tab, enter the IP and subnet mask for the VSAN traffic network. Click “OK” at the bottom”
step17

18. Once you have confirmed that network settings are successful, login to the vSphere web client and navigate to the VSAN cluster to be configured

19. Click the “Manage” tab, then choose “Fault Domains & Stretched Cluster” under “Virtual SAN”
step19

20. In the “Streteched Cluster” box click “Configure”
step20

21. Name the fault domains and place the hosts into the appropriate fault domain. This is a 2 node cluster with 1 host in each fault domain. Click “Next”
step21

22. Locate the VSAN witness appliance host that was added to this vcenter and click “Next”
step22

23. Choose the flash drive for and the HDD for cache and capacity and click “Next”
step23

24. Review the settings and click “Finish”
step24

25. Once completed, you will now see the status of the stretched cluster as “Enabled”, the preferred fault domain and the designated witness host.
step25

Deploy VSAN Witness Appliance

VSAN – Host Not Contributing Stats

After an upgrade or maintenance on one or more of the nodes in a VSAN cluster one of the hosts can stop contributing performance stats. This is not a production down issue, but should be addressed to see the most up-to-date stats across all the nodes.

The fix for this is one of three things, but each of them involves turning off performance statistics on the cluster which will cause all historical performance stats to be removed. My hope is that VMware will fix this issue in an upcoming release because a loss of historical is not tolerable in all environments.

1. View the health of the VSAN by logging into the vCenter web client. Navigate to the appropriate vCenter and cluster, then click the “Monitor” tab, followed by “Virtual SAN” then click on “Health.” Expanded “Performance service” and click the warning for “All hosts contributing stats”
step01

2. At the bottom you will now see the list of hosts that are not contributing stats
step02

3. Now that we’ve identified the problem host, we need to disable VSAN performance service temporarily. Navigate to the “Manage” tab for this cluster then click on “Health and Performance” under “Virtual SAN”
step03

4. Click “Turn off” in the “Performance Service” box

a. Click “OK” to confirm stopping the service which will erase all existing performance data
step04a

5. Confirm Perform Service has been disabled by refreshing the page
step05

6. SSH to the affected host (using putty or similar SSH client) we identified in step 2 (you may have to enable SSH on the host before you can connect).
step06

7. Run the command below to restart the VSAN management agent. This should have no production impact so it is safe to perform outside of a maintenance window.
a. /etc/init.d/vsanmgmtd restart
step07a

8. Once the service has been restarted, go back to the vCenter web client and the click the “Edit” button for the Performance Service box
step08

9. Select the appropriate storage policy from the drop down list, ensure the “Turn ON Virtual SAN performance service” box is checked and click “OK”
step09

10. Confirm that the performance service is turned on and reporting healthy
step10

11. Navigate back to the “Monitor” tab and then “Virtual SAN” clicking the on the “Health” section. Click “Retest” to verify that all hosts are contributing stats.
step11

If this does not fix the issue, you can restart the process, but this time instead of restarting the vsanmgmt service on the one node, do it on all of the nodes in the cluster. Once the services have been restarted across all nodes then restart the performance service and all nodes should be contributing stats.

I have also seen a case where restarting the service on all nodes didn’t fix the problem. In that scenario I was able to fix the problem by entering maintenance mode on the problem node and choose “full data migration” so all the data would be removed from the cluster. After that was complete I completely rebuilt the host from scratch (including wiping the disks claimed by VSAN) then moving it back into the cluster. I haven’t heard from VMware of any other ways to fix this issue.

VSAN – Host Not Contributing Stats

Track Datastore Add & Removes With PowerCLI

While working with the data protection team at my job I was asked if there was any way to track new datastores being added to a vSphere cluster. When new LUNs are allocated to our vSphere clusters, the data protection team isn’t always made aware ahead of time. Normally this isn’t a big deal, but in our case we have a product that requires access to specified datastores for backups. In order to maintain access to these virtual machines for backup purposes, we need to be notified when new datastores are added.

As I sat and thought about how I could accomplish this task I came up with a couple ideas, but figured a scheduled task with PowerCLI/PowerShell would be the easiest to implement. In this script we will connect to the vCenter server, get all the datastores in the cluster, write a file daily with a date stamp, then compare the current and previous day’s datastore output files and write that to a third file that will only display the new datastores that have been added or the datastores that have been removed.

I’ve broken down the script so I can explain each section making it easy to understand. Before I had any knowledge of PowerShell/PowerCLI, modifying something to fit my environment when I didn’t understand what was happening at each step was time consuming and frustrating.

1. This is where we define the name of the vCenter instance we’ll be connecting to and the name of the cluster we’re interested in.

$vCenter = "LabvCenter.domain.com"
$Cluster = "LabCluster"

2. This is where we define the output location for our datastores and difference file. I chose to drop it into a folder named for the cluster, but that can be removed.

$filePath = "C:\test\" + $Cluster + "\"

3. This is where we connect to vCenter and then immediately wait 15 seconds which can fix issues of commands running before security warnings are displayed

Connect-VIserver $vCenter
Start-Sleep -s 15

4. This will gather all the datastores in the cluster and exclude any datastore that has a name containing “*-local”. The wildcard is important because the local datastores contain “servername + -local” and if the wildcard wasn’t there all of the local datastores would be included because no datastore is named exactly “-local”

$Datastores = Get-Cluster -Name $Cluster | Get-Datastore | Where {$_.Name -notlike "*-local"}

5. I prefer the format of 2 digit month, 2 digit day, 2 digit year. This will get the current date of the system running this script, then convert it to this format of 051415 for example.

$today = (Get-Date).ToString("MMddyy")
$yesterday = (Get-Date).AddDays(-1).ToString("MMddyy")
$2DaysAgo = (Get-Date).AddDays(-2).ToString("MMddyy")

6. This will set the file name and location for the output from 2 days ago. If that file exists, it will be removed. Rather than have an output from every day saved until I manually remove it, this process seemed better. I chose to delete the file from 2 days ago as opposed to deleting yesterday’s file after we run the comparison in case we see a huge change in the difference file we can manually compare the 2 files to try to find the error.

$2DayOldFile = $filepath + $Cluster + $2DaysAgo + ".txt"
If (Test-Path $2DayOldFile){Remove-Item $2DayOldFile}

7. This will set the file path and name to the file path defined at the top, plus the cluster name, plus the date and add .txt to the end.

$CurrentFile = $filePath + $Cluster + $today + ".txt"
$YesterdaysFile = $filePath + $Cluster + $yesterday + ".txt"

8. Here we are exporting all the datastores from Step 4 by name and outputting to the file name/path defined in Step 7.

$Datastores | Select Name | Out-File $CurrentFile

9. This is where we set the name and path for the difference file that will track the datastore add/remove.

$DifferenceFile = $filePath + "Datastore-Changes" + ".txt"

10. This will read the content from today’s content and yesterday’s content.

$YesterdaysContent = Get-Content $YesterdaysFile
$CurrentContent = Get-Content $CurrentFile

11. Here we are comparing the content we just read in step 10.

$Compare = Compare-Object $YesterdaysContent $CurrentContent

12. The standard way “Compare-Object” outputs this data shows difference with a side indicator of <= or => depending on where the difference exists. Rather than remember which file was read first to determine whether a datastore was added or removed, we change the column names. If a datastore existed yesterday, but is missing today it is labeled as “Removed”. If a datastore didn’t exist yesterday, but does today it is labeled as “Added”.

$compare | foreach {
if ($_.sideindicator -eq '<=')
{$_.sideindicator = "Removed"}

if ($_.sideindicator -eq '=>')
{$_.sideindicator = "Added"}
}

13. This will take the results from step 11 with formatting of step 12 then change the column names. The list of objects compared is normally named “InputObject” and then “Added or Removed” is normally “SideIndicator”. Maybe that’s fine, but I prefer something a little easier to read. I’ve renamed “InputObject” to “Datastore” but also I add the current date and we change “SideIndicator” to “Added or Removed”. Once that is done, we output that file to the path and name defined in Step 9. The reason why we include the current date in the “Datastore” column is because we are using “-Append” with the “Out-File” command. This will add a dated entry of changes that occurred to the bottom of the existing (or new) output file. This means we aren’t overwriting the same file every day, we are just adding to it. In case you forget to check this file for a few days you won’t lose that data.

$Compare |
select @{l='Datastore' + ' - ' + (Get-Date);e={$_.InputObject}},@{l='Added or Removed';e={$_.SideIndicator}} |
Out-File -Append $DifferenceFile

Now that we know what this thing does, let’s see it in action. I have run the output over 3 days and this is how the output file is displayed. We can see that on 05-14-15 we added Lab-Datastore-10 which didn’t exist on 05-13-15. Then on 05-15-15 we removed Lab-Datastore-03 and we added -11 and -12.
image

When running the script I commented out the removal of the 2 day old file so we could compare manually. Now we have an output file created (Datastore-Changes.txt) that should show the differences.
image

Inside Datastore-Changes.txt we see that on 5/14 the datastore “Lab-Datastore-10” was added and on 5/15 we lost Lab-Datastore-03, but added 11 and 12.

image

We can delete this file at any time and the next time this script runs we’ll create a brand new file. This means there is no dependency on this file already existing in order for the script to run and doesn’t require us to keep a long list of all the datastore add/removes for all eternity. Now you’ll just need to save the script schedule it to run using Windows Task Scheduler.

Below is the full scripts with comments.

#Define the vCenter Server and Cluster
$vCenter = "LabvCenter.domain.com"
$Cluster = "LabCluster"

#Set the path location for the output files
$filePath = "C:\test\" + $Cluster + "\"

#Connect to the vCenter Server and sleep for 15 seconds (necessary for security warnings)
Connect-VIserver $vCenter
Start-Sleep -s 15

#Get a list of all the datastores
$Datastores = Get-Cluster -Name $Cluster | Get-Datastore | Where {$_.Name -notlike "*-local"}

#Get the current date in the correct format
$today = (Get-Date).ToString("MMddyy")
$yesterday = (Get-Date).AddDays(-1).ToString("MMddyy")
$2DaysAgo = (Get-Date).AddDays(-2).ToString("MMddyy")

#Delete the output from 2 days ago (Remove this section if you want to keep the history)
$2DayOldFile = $filepath + $Cluster + $2DaysAgo + ".txt"
If (Test-Path $2DayOldFile){Remove-Item $2DayOldFile}

#Set the filename to include today's date
$CurrentFile = $filePath + $Cluster + $today + ".txt"
$YesterdaysFile = $filePath + $Cluster + $yesterday + ".txt"

#Export those datastores to a TXT file
$Datastores | Select Name | Out-File $CurrentFile

#Set file name & path for difference file
$DifferenceFile = $filePath + "Datastore-Changes" + ".txt"

#Get the content for yesterday and today's files
$YesterdaysContent = Get-Content $YesterdaysFile
$CurrentContent = Get-Content $CurrentFile

#Compare yesterday's and today's files
$Compare = Compare-Object $YesterdaysContent $CurrentContent

#Change the source/target column to "Removed" and "Added"
$compare | foreach { 
      if ($_.sideindicator -eq '')
        {$_.sideindicator = "Added"}
     }

#Change the column name output to "Datastore + Date" and "Added or Removed" then output to file
 $Compare | 
   select @{l='Datastore' + ' - ' + (Get-Date);e={$_.InputObject}},@{l='Added or Removed';e={$_.SideIndicator}} |
   Out-File -Append $DifferenceFile
Track Datastore Add & Removes With PowerCLI

Create New NFS Project on Tegile

The basis of a Project on the Tegile array is applying permissions and policies to a single volume or group of volumes. This means that changes made at the Project level can propagate to the volumes that live inside that project. If new IP addresses need to be added for read/write and root access for all the volumes, that can be handled at the Project-level instead of having to modify each export. However, you still have the ability to make changes at the individual volume level if that’s required.

In this setup, I’ll create a new project to host my Windows workloads in VMware. I’ll create a volume for Windows 2012 Operating System files and allow all the host on my NFS network read/write and root access to this volume.

1. Login to the web interface of the Tegile array
tegileproject020415-step1
2. Click on “Data”
tegileproject020415-step2
3. Click on the Pool that will host this new project
tegileproject020415-step3
4. In the “Project” window, click “Add Project”
tegileproject020415-step4
5. Enter the name of the Project , choose the Purpose, and select “NFS” for access type. Click “Next”
tegileproject020415-step5
6. Enter the Share Name, enter the number of mount points (more can be added later), and enter any Share limits or reservations. Click “Next”
tegileproject020415-step6
7. Set “NFS Sharing” to “on”. Set “Access Mode” to “Read-Write”, set “Access Type” to “IP” and enter the individual IP addresses or the subnet that will have access to this share. Check the box for “Root Access” then click “Add”. Repeat for each IP/Subnet then click “Next”
tegileproject020415-step7
8. Set your snapshot policy (if required). This can be configured at a later time as well. Click “Next”
tegileproject020415-step8
9. Review your settings and click “Finish”
tegileproject020415-step9
10. Click on the newly created Project and then you will see the volume share name and the mountpoint
tegileproject020415-step10

At this point, we just need to mount this new Volume on our ESXi hosts which can be done manually through the vSphere client on each host or we can do it through PowerCLI which will do it a lot faster.

In the example above, the name of our Volume is “2012_OS”, the path is our share mountpoint (/export/PDX_Windows/2012_OS) and the IP of the Tegile is 192.168.1.15. You’ll need to define the hostname as it appears in vCenter. To mount to a single host we can use the following PowerCLI command:

New-Datastore -NFS -VMHost "Hostname" -Name "2012_OS" -Path /export/PDX_Windows/2012_OS -NfsHost 192.168.1.15

To mount to an entire cluster, we can use this command after defining the name of the cluster as it appears in vCenter:

Get-Cluster "ClusterName" | Get-VMHost | New-Datastore -NFS -Name "2012_OS" -Path /export/PDX_Windows/2012_OS -NfsHost 192.168.1.15
Create New NFS Project on Tegile

Provision New Floating IP on Tegile

As I begin the process of reconfiguring my Tegile from a test/lab array into a production array I thought it would be a great opportunity to document more of the setup and provisioning steps involved in administering the array. In our environment we are using 10gbe without configuring LACP on the switches and letting the Tegile handle the network availability. Obviously, every environment is different, this is just the approach we took for this array.

These steps walk you through the process of provisioning an additional VLAN on the 10gbe interfaces and then creating a floating IP address that is owned by the node running the disk pool.

1. Login to the non-shared management IP of each HA Node
2. Login as “admin” with the correct password
tegileIP012715-step2
3. Click on the “Settings” tab and then click “Network”
tegileIP012715-step3
4. Under “Network Settings” on the left column, click on “Interface”
tegileIP012715-step4
5. Under “Physical Network Interfaces”, click on one of the 10gbe interfaces (named ixgbe2 and ixgbe3 on this array). Click the “+” to add a VLAN
tegileIP012715-step5
6. Enter the name of the VLAN following the guidelines below and the VLAN number and click “OK”.
a. Our naming convention is protocol + interface number + _ + VLAN number. We are using cifs on interface “ixgbe3” and the VLAN is 100
tegileIP012715-step6
7. Click “OK” to this message about saving the config
tegileIP012715-step7
8. Repeat step 5 for the other 10gbe interface changing the name to reflect the number of the other interface.
tegileIP012715-step8
9. Click “Save” to bring these new VLAN online
tegileIP012715-step9
    a. Notice that the state changes to “up” after saving
tegileIP012715-step9a
10. Now we need to assign an IP address to these interfaces. We are not using LACP, so under “IP Groups” click the “Add IP Group” button
tegileIP012715-step10
11. Click the arrow next to “Network Properties”. Enter the name, check the boxes next to the newly created VLANs we added to each interface, then enter the IP address and subnet of this new subnet. Click “OK”
     a. The naming convention is “ipmp + _ + protocol + filer node number. IPMP is “IP Multipathing”, cifs is the protocol, and this is node “A” which is the first node
tegileIP012715-step11a
     b. Click “OK” to this message about saving the config
tegileIP012715-step11b
12. Now we see the IPMP group has been created, but isn’t up.
tegileIP012715-step12
13. Click the “Save” button at the button
tegileIP012715-step13
14. Click “OK” for confirmation
tegileIP012715-step14
15. Now we can see that the interface is up
tegileIP012715-step15
16. Repeat these steps on the other node of the HA pair. Changing the IP Group name to “ipmp_cifs2” and choosing a different IP address
tegileIP012715-step16
17. Back on the primary node, click on “Settings” then “HA”
tegileIP012715-step17
18. On the active resource group (we only have 1 which is “Resource Group A” click “Add Floating IP”
tegileIP012715-step18
19. Enter the shared IP address and netmask (this is a unique IP and different than either of the IP addresses entered earlier) then choose the IP Groups we created on each node. Click “OK”
tegileIP012715-step19
20. Now we have a new IP address that will be used by whichever node owns the Resource Group
tegileIP012715-step20

The steps are pretty straightforward, but can be confusing in the beginning. Our local SE from Tegile walked us through this config when we were evaluating, but it was important for me to know how to do these things on my own.

Provision New Floating IP on Tegile

Change IP of vCSA

While changing the IP address of my vCenter Server is not something I’ve ever had to do before that changed this week. In my quest to separate networks into more logical groupings instead of everything living on the same subnet I had to change the IP address of my vCenter Server Appliance to place it on a new network along with the hosts it was managing. There is apparently a right way and a wrong way to do this.

I logged into the vCSA web interface (vCenterIP:5480), clicked on the “Network” tab and then click on “Address” and assumed this would be the correct place. So I changed the IP address and clicked “Save Settings” then rebooted the appliance.

changeip012315-step1

Yeah…that wasn’t right. As I watched the appliance boot from the console I saw a lot of errors being thrown trying to access services running on the old address and failing. Then I decided to shut down (not reboot) the vCSA and try a different method. This is a pretty simple process, but in case you’re looking for the right way of doing it, this is what worked for me.

Once the appliance is powered off, right click and choose “Edit Settings”
changeip012315-step2

Click the “Options” tab then choose “Properties” under “vApp Options”
changeip012315-step3

Enter the new IP address, gateway, and any other information that is changing. If you’re moving it to a new portgroup, update that now as well and click “OK”
changeip012315-step4

Once the changes have been made, power on the appliance and you should see the new addresses being referenced during start up.
changeip012315-step5

And now that start up is complete, we see the new IPs listed for managing the appliance and you should be able to connect on the new IP.
changeip012315-step6

Like I said, this is a very simple process. Once the vCSA was running, my hosts were notified of the change and were still in their cluster. Nothing bad happened and the lab continued to function as expected.

Change IP of vCSA