Failed to change NetApp volume to data-protection (volume busy)

This SnapMirror relationship had been intentionally broken for some testing on the destination volume, and when the resync was issued it failed with a volume busy error.

                            Healthy: false
                   Unhealthy Reason: Scheduled update failed to start. (Destination volume must be a data-protection volume.)
           Constituent Relationship: false
            Destination Volume Node: 
                    Relationship ID: aa9b0b54-64d9-11e5-be3f-00a0984ad3aa
               Current Operation ID: 1bed480d-1554-11e7-aa85-00a098a230de
                      Transfer Type: resync
                     Transfer Error: -
                   Current Throttle: 103079214
          Current Transfer Priority: normal
                 Last Transfer Type: resync
                Last Transfer Error: Failed to change the volume to data-protection. (Volume busy)

To check the snapshots on the volume for busy status and dependency:
snapshot show -vserver 'vserver_name' -volume 'volume_name' -fields busy,owners  

In this case, a running NDMP backup session was preventing the resync.

To list NDMP backup sessions:
system services ndmp status  

The system services ndmp status command lists all the NDMP sessions in the cluster, along with summary details about each active session.

To list details for an NDMP backup session:
system services ndmp status -node 'node_name' -session-id 'session-id'  

From here you can confirm this is the NDMP session you need to kill by referencing the ‘Data Path’ field. This should be the path to the volume that is failing the resync.

To kill an NDMP backup session:
system services ndmp kill 'session-id' -node 'node_name'  

The system services ndmp kill command is used to terminate a specific NDMP session on a particular node in the cluster. This command is not supported on Infinite Volumes.

After killing the NDMP session and clearing the snapshot's busy/application dependency, I was able to issue the resync successfully as per normal operations.
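
For reference, the resync is then issued from the destination cluster; a typical invocation (vserver and volume names below are placeholders) looks like:
snapmirror resync -destination-path 'vserver_name:volume_name'  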

Remote session was disconnected because no Remote Desktop client access licenses available

Received this error today trying to RDP into a server.

Solution was to delete the registry key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSLicensing

In Windows Powershell (Run as Admin):
Remove-Item HKLM:\SOFTWARE\Microsoft\MSLicensing  

You’ll be prompted to confirm the deletion:

Confirm

The item at HKLM:\SOFTWARE\Microsoft\MSLicensing has children and the Recurse parameter was not specified. If you continue, all
children will be removed with the item. Are you sure you want to continue?
[Y] Yes [A] Yes to All [N] No [L] No to All [S] Suspend [?] Help (default is “Y”):
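
If you'd rather skip the confirmation prompt, Remove-Item's -Recurse switch removes the key and its children in one step:
Remove-Item HKLM:\SOFTWARE\Microsoft\MSLicensing -Recurse  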

After this, you need to re-run the Remote Desktop Connection (Run as Admin) to recreate the key.
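
To confirm the key was recreated, a quick check from the same elevated PowerShell session (returns True once the key exists again):
Test-Path HKLM:\SOFTWARE\Microsoft\MSLicensing  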

VMware SRM Error: “Failed to create snapshots of replica devices”

Today I encountered the VMware SRM error “Failed to create snapshots of replica device Cause: SRA command ‘testFailoverStart’ failed. Storage port not found Either Storage port information provided in NFS list is incorrect else Verify the 'isPv4' option in ontap_config file matches the ipaddress in NFS field.”


Found the solution in kb article 000016026: https://kb.netapp.com/support/s/article/ka11A0000001BN5QAM/sra-command-testfailoverstart-failed-storage-port-not-found-either-storage-port-information-provided-in-nfs-list-is-incorrect?language=en_US

Looks to be caused by the firewall-policy of the SVM data LIFs. These were set to “mgmt”, which is not detected by the SRA according to the KB article.

To change the firewall-policy from “mgmt” to “data”:
net int modify -vserver [vserver_name] -lif [data_lif_name] -firewall-policy data  

To list LIFs by firewall-policy:
net int show -fields firewall-policy  
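
To find any LIFs still set to the mgmt policy (show commands can be filtered on this field), something like:
net int show -firewall-policy mgmt -fields firewall-policy  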

The article also advises checking the ontap_config file on the SRM server to:

ensure that the NFS IP address on the controller is correct and the IP address format mentioned in the NFS address field matches the value set for the isipv4 option in the ontap_config file

By default, the configuration file is located at install_dir\Program Files\VMware\VMware vCenter Site Recovery Manager\storage\sra\ONTAP\ontap_config.txt. You’ll look for the “isPv4” option.

NetApp OCUM Error: “Cluster cannot be deleted when discovery is in progress.”

I opened a case recently as I had a cluster in OnCommand Unified Manager that was no longer polling. When I tried to delete and re-add it, I received the error “Cluster cannot be deleted when discovery is in progress.” I am running version 7.0.

Turns out I had hit a BURT (a NetApp bug report). The BURT number is 1053008, and you can view it by logging into the support site with your credentials.

The fix is here: https://kb.netapp.com/support/s/article/An-incomplete-removal-of-a-cluster-via-the-UM-dashboard-prevents-further-collection-when-the-cluster-is-re-added

The link doesn’t seem to be working right now, but I copied the details below.

Note: for vApps you will need to use diag shell, instructions can be found here: https://kb.netapp.com/support/s/article/ka31A00000012qfQAA/How-to-access-the-OnCommand-Virtual-Machine-DIAG-shell

Symptom
• Error unable to discover cluster, Cluster already exists.
• When a cluster is added to the OnCommand Unified Manager (UM) Dashboard, this event gets logged:

Failed to add cluster 172.16.42.16. An internal error has occurred. Contact technical support. Details: Cannot update server (com.netapp.oci.server.UpdateTaskException [68-132-983])

• When the user attempts to remove the cluster, it fails, indicating that it is being acquired.
• Navigating to Health, Settings, Manage Data Sources shows that the datasource is failing.
• For UM, the ocumserver-debug log may contain:
2016-12-19 08:06:14,583 DEBUG [oncommand] [reconcile-0] [c.n.dfm.collector.OcieJmsListener] OCIE JMS notification message received: {DatasourceName=Unknown, DatasourceID=-1, ClusterId=3387647, ChangeType=ADDED, UpdateTime=1482152623430, MessageType=CHANGE}
• When a cluster is added to the UM Dashboard this message may be displayed indicating that the issue is in the OnCommand Performance Manager (OPM) database:
Cluster in a MetroCluster configuration is added only to Unified Manager. Cluster add failed for Performance Manager.

Note: The MetroCluster part of the message is not relevant here; it is included because that is the full message that may appear.
• For OPM, the ocfserver-debug log may contain:
2016-12-19 09:15:00,013 ERROR [system] [taskScheduler-5] [o.s.s.s.TaskUtils$LoggingErrorHandler] Unexpected error occurred in scheduled task.
com.netapp.ocf.collector.OcieException: com.onaro.sanscreen.acquisition.sessions.AcquisitionUnitException [35]
Failed to getById id:-1
<>
Caused by: com.onaro.sanscreen.acquisition.sessions.AcquisitionUnitException: Failed to getById id:-1
<>

Cause
This is under investigation in Documented Issue 1053008.

Because the cluster is successfully removed from the datasource tables but not the inventory tables, when the cluster is re-added there is a disconnect between these two tables. Attempts to re-add the inventory and performance data fail due to duplicate entries tied to the old objects in the database, as the values are not unique.
Solution
1. Shut down the OPM host.
2. Shut down the UM host.
3. Take a VMware snapshot or other backup per your company policy.
4. Boot UM.
5. When the UM WebUI is accessible, boot OPM.
6. Check MySQL to determine which hosts have the invalid datasource ID.
For vApps
Use KB 000030068 to get to the diag shell.
diag@OnCommand:~# sudo mysql -e "select datasourceId, name, managementIp from netapp_model.cluster where datasourceId = -1;"
+--------------+-------------+--------------+
| datasourceId | name        | managementIp |
+--------------+-------------+--------------+
|           -1 | clusterName | 10.0.0.2     |
+--------------+-------------+--------------+
diag@OnCommand:~#

For RHEL
diag@OnCommand:~# sudo mysql -e "select datasourceId, name, managementIp from netapp_model.cluster where datasourceId = -1;"
+--------------+-------------+--------------+
| datasourceId | name        | managementIp |
+--------------+-------------+--------------+
|           -1 | clusterName | 10.0.0.2     |
+--------------+-------------+--------------+
diag@OnCommand:~#

Windows
A. Open a Windows Command Prompt window
B. Browse to the MySQL\MySQL Server 5.6\bin directory
EXAMPLE: > cd "Program Files\MySQL\MySQL Server 5.6\bin"
Authenticate MySQL to access the database: > mysql -u <user> -p
C. When you press [ENTER] the system will prompt you to enter the user’s password.
a. NOTE: The user and password were created when MySQL was first installed on the Windows host. There is not a NetApp default user that can be used to authenticate MySQL.
mysql> select datasourceId, name, managementIp from netapp_model.cluster where datasourceId = -1;
+--------------+-------------+--------------+
| datasourceId | name        | managementIp |
+--------------+-------------+--------------+
|           -1 | clusterName | 10.0.0.1     |
+--------------+-------------+--------------+

  1. Download the attached script appropriate for the version of UM and OPM.
    For vApps,
    A. Use an application such as FileZilla or WinSCP to upload the script to the /upload directory on the vApp.
    B. Use KB 000030068 to get to the diag shell.
    C. Add the execute attribute to the script.
    a. Syntax: # sudo chmod +x /jail/upload/BURT1053008_um_70_2016-12-27.sh
    For RHEL
    A. Be sure to have sudo or root access to the host.
    B. Move the script to /var/logs/ocum/
    C. Add the execute attribute to the script.
    a. Syntax: # sudo chmod +x /var/logs/ocum/BURT1053008_um_70_2016-12-27.sh
    For Windows (UM only)
    A. Browse to the directory where MySQL is installed: MySQL\MySQL Server 5.6\bin
    a. EXAMPLE: C:\Program Files\MySQL\MySQL Server 5.6\bin
    B. Save a copy of the following script in the ‘MySQL Server 5.6\bin’ directory: BURT1053008_um_70_Windows

  2. Execute the script.
    A. For vApps: sudo /jail/upload/scriptname
    B. For RHEL: sudo /var/logs/ocum/scriptname
    C. For Windows: from the MySQL\MySQL Server 5.6\bin directory, run mysql.exe -u <user> -p < scriptname

  3. Confirm that the datasources with ID -1 are gone:
    vApps and RHEL: diag@OnCommand:~# sudo mysql -e "select datasourceId, name, managementIp from netapp_model.cluster where datasourceId = -1;"
    Windows:
    A. Authenticate MySQL as per the steps outlined above, then run:
    select datasourceId, name, managementIp from netapp_model.cluster where datasourceId = -1;
    B. Type exit to exit MySQL.
  4. Reboot the host
  5. Once UM is back up, perform this same process with OPM.
  6. Once UM and OPM have been corrected, perform a discovery of the cluster and verify that it is showing up within the WebUI.
  7. If the same failure occurs, please contact NetApp Technical Support for further assistance.

After I ran the scripts provided in the BURT on both the OnCommand Unified Manager and OnCommand Performance Manager servers, the cluster no longer showed up in inventory and I could re-add it successfully.

Add NFS datastore using VMWare PowerCLI
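
These examples assume an active PowerCLI session; if you're not already connected to vCenter, connect first (the server name below is a placeholder):

Connect-VIServer -Server "VCENTER_HOSTNAME"  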

To add an NFS datastore to a single VMHost:

Get-VMHost "HOSTNAME" | New-Datastore -Nfs -Name "DATASTORE_NAME" -Path "VOLUME_MOUNT_PATH" -NfsHost "IP_OR_HOSTNAME_OF_NFS_HOST"  

To add an NFS datastore to all VMHosts in a cluster:

Get-Cluster "CLUSTERNAME" | Get-VMHost "HOSTNAME" | New-Datastore -Nfs -Name "DATASTORE_NAME" -Path "VOLUME_MOUNT_PATH" -NfsHost "IP_OR_HOSTNAME_OF_NFS_HOST"