SQL Server Availability Groups – items to check

When there is an availability group issue

Run the following set of queries on the primary:

-- Cluster name and quorum state
SELECT cluster_name, quorum_type_desc, quorum_state_desc
FROM sys.dm_hadr_cluster;

-- Cluster members and their quorum votes
SELECT member_name, member_type_desc, member_state_desc, number_of_quorum_votes
FROM sys.dm_hadr_cluster_members
ORDER BY member_name;

-- Current primary and overall health of each availability group
SELECT primary_replica, primary_recovery_health_desc, synchronization_health_desc
FROM sys.dm_hadr_availability_group_states;

-- Cluster node hosting each replica
SELECT * FROM sys.dm_hadr_availability_replica_cluster_nodes
ORDER BY replica_server_name;

-- Role, connection state and health of each replica
SELECT A.replica_server_name, A.join_state_desc, B.role_desc, B.operational_state_desc,
       B.connected_state_desc, B.recovery_health_desc, B.synchronization_health_desc
FROM sys.dm_hadr_availability_replica_cluster_states A
JOIN sys.dm_hadr_availability_replica_states B
  ON A.replica_id = B.replica_id AND A.group_id = B.group_id
ORDER BY A.replica_server_name;

-- Synchronization state and failover readiness of each database on each replica
SELECT A.replica_server_name, B.database_name, B.is_failover_ready, B.is_database_joined,
       C.synchronization_state_desc, C.synchronization_health_desc, C.database_state_desc
FROM sys.dm_hadr_availability_replica_cluster_states A
JOIN sys.dm_hadr_database_replica_cluster_states B
  ON A.replica_id = B.replica_id
JOIN sys.dm_hadr_database_replica_states C
  ON B.replica_id = C.replica_id AND B.group_database_id = C.group_database_id
ORDER BY A.replica_server_name;

and check the following items:

  • SQL Server Errorlogs
  • Windows cluster log – run the PowerShell cmdlet Get-ClusterLog, which writes Cluster.log to %WINDIR%\cluster\reports
  • Windows System event log
  • Cluster diagnostic log files in the SQL Server \LOG directory, named SRVNAME_SQLINSTANCENAME_SQLDIAG_XXX.XEL. Their contents can be viewed and filtered by opening the files in SQL Server Management Studio.

Also check items in

https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/troubleshoot-always-on-availability-groups-configuration-sql-server

  1. Accounts – one of: the same domain account with a login in master on both servers; OR different domain accounts, each with a login in master on both servers and CONNECT permission granted on the mirroring endpoint; OR certificates.
  2. Check that the mirroring endpoints use the correct port and are in STATE=STARTED
  3. Check that the login on the other server has CONNECT permission on the mirroring endpoint
  4. Check the endpoint URL; a fully qualified domain name is guaranteed to work
  5. Check connectivity to the endpoint port from the other machine in both directions
  6. Check READ_ONLY_ROUTING_URL port connectivity.
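
Items 2, 3, 4 and 6 can be inspected from T-SQL. The following is a minimal sketch using the standard catalog views; run it on each replica and compare the results. It only reads configuration, so connectivity to the ports (items 5 and 6) still has to be tested from the other machine. Note that the permission query lists explicit grants only; the endpoint owner and sysadmin members can connect without appearing here.

```sql
-- Item 2: mirroring endpoint name, port and state (expect state_desc = STARTED)
SELECT e.name, e.state_desc, e.role_desc, e.connection_auth_desc, t.port
FROM sys.database_mirroring_endpoints AS e
JOIN sys.tcp_endpoints AS t
  ON t.endpoint_id = e.endpoint_id;

-- Item 3: logins with an explicit CONNECT permission on the mirroring endpoint
SELECT e.name AS endpoint_name, p.name AS login_name,
       sp.permission_name, sp.state_desc
FROM sys.server_permissions AS sp
JOIN sys.database_mirroring_endpoints AS e
  ON sp.major_id = e.endpoint_id
JOIN sys.server_principals AS p
  ON sp.grantee_principal_id = p.principal_id
WHERE sp.class_desc = 'ENDPOINT'
  AND sp.permission_name = 'CONNECT';

-- Items 4 and 6: endpoint URL and read-only routing URL configured for each replica
SELECT replica_server_name, endpoint_url, read_only_routing_url
FROM sys.availability_replicas
ORDER BY replica_server_name;
```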

and

https://blogs.msdn.microsoft.com/alwaysonpro/2014/11/26/diagnose-unexpected-failover-or-availability-group-in-resolving-state/

  • Open the cluster diagnostic log files in SSMS and filter on state_desc=error
  • Open the cluster diagnostic log files and check for events named component_health_result and availability_group_is_alive_failure
  • Open the Cluster Log and check for “is not healthy” and “SQL Server Availability Group”
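
As an alternative to opening the files in SSMS, the diagnostic log files can be read with sys.fn_xe_file_target_read_file. This is a sketch only: the path is a placeholder that must be replaced with the instance's actual \LOG directory, and the XML field names (component, state_desc) are assumed to match the component_health_result event layout.

```sql
-- Placeholder path (an assumption): point this at the instance's \LOG directory.
DECLARE @path nvarchar(260) = N'C:\Path\To\LOG\*_SQLDIAG_*.xel';

SELECT x.xml_data.value('(event/@timestamp)[1]', 'datetime2') AS event_time_utc,
       x.object_name AS event_name,
       x.xml_data.value('(event/data[@name="component"]/text)[1]', 'varchar(100)') AS component,
       x.xml_data.value('(event/data[@name="state_desc"]/value)[1]', 'varchar(100)') AS state_desc
FROM (
    -- event_data is returned as text; convert it to XML so it can be queried
    SELECT object_name, CONVERT(xml, event_data) AS xml_data
    FROM sys.fn_xe_file_target_read_file(@path, NULL, NULL, NULL)
) AS x
WHERE x.object_name IN ('component_health_result', 'availability_group_is_alive_failure')
ORDER BY event_time_utc;
```

Rows with state_desc = error are the ones to look at first; availability_group_is_alive_failure events are expected to show NULL in the component and state_desc columns.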

and

https://support.microsoft.com/en-gb/help/2833707/troubleshooting-automatic-failover-problems-in-sql-server-2012-alwayso

  • Check the Windows Cluster Log for failoverCount and check Failover Cluster Manager->Roles->Properties->Failover tab->Maximum Failures in the Specified Period
  • The SQL Server Database Engine resource DLL connects to the instance of SQL Server that hosts the primary replica by using ODBC in order to monitor health. The NT AUTHORITY\SYSTEM login needs the ALTER ANY AVAILABILITY GROUP, CONNECT SQL and VIEW SERVER STATE permissions on the secondary replicas. Check the Windows Cluster Log for messages like “Failed to run diagnostics command” and “The user does not have permission to perform this action”
  • Use the queries above to check that the secondary replica is in SYNCHRONIZED status and is_failover_ready=1.
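
The permission and failover-readiness checks can be sketched in T-SQL as follows. The first query and the GRANT statements are meant for each replica that may become primary; the last query is a filtered variant of the per-database query from the start of this post and is meant for the primary. Treat it as an illustration, not a complete diagnosis.

```sql
-- Server-level permissions currently held by NT AUTHORITY\SYSTEM
SELECT pr.name, pe.permission_name, pe.state_desc
FROM sys.server_principals AS pr
JOIN sys.server_permissions AS pe
  ON pe.grantee_principal_id = pr.principal_id
WHERE pr.name = N'NT AUTHORITY\SYSTEM'
  AND pe.class_desc = 'SERVER';

-- Grant the three permissions if any are missing
GRANT ALTER ANY AVAILABILITY GROUP TO [NT AUTHORITY\SYSTEM];
GRANT CONNECT SQL TO [NT AUTHORITY\SYSTEM];
GRANT VIEW SERVER STATE TO [NT AUTHORITY\SYSTEM];

-- Databases that are not SYNCHRONIZED or not failover ready, per replica
SELECT A.replica_server_name, B.database_name,
       B.is_failover_ready, C.synchronization_state_desc
FROM sys.dm_hadr_availability_replica_cluster_states AS A
JOIN sys.dm_hadr_database_replica_cluster_states AS B
  ON A.replica_id = B.replica_id
JOIN sys.dm_hadr_database_replica_states AS C
  ON B.replica_id = C.replica_id AND B.group_database_id = C.group_database_id
WHERE B.is_failover_ready = 0
   OR C.synchronization_state_desc <> 'SYNCHRONIZED'
ORDER BY A.replica_server_name, B.database_name;
```

An empty result from the last query suggests every database on every replica is ready for failover; asynchronous-commit replicas are expected to appear because they stay in SYNCHRONIZING.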

Also https://social.msdn.microsoft.com/Forums/sqlserver/en-US/d9d4589f-2cb5-405d-a8b9-10e9f1230e13/can-not-create-listner-for-high-availability-group-of-always-on-in-sql-2012-on-cluster-environment?forum=sqldisasterrecovery

  • The attempt to create the network name and IP address for the listener failed.
  • Check that the ‘Primary DNS suffix of this computer’ is configured correctly
  • Add the startup account of the cluster service as a SQL Server login and grant it the sysadmin role (the startup account of the cluster service is NT AUTHORITY\SYSTEM by default).
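
The last item can be sketched in T-SQL as below, assuming the cluster service runs under the default NT AUTHORITY\SYSTEM account; substitute the actual startup account if it differs. ALTER SERVER ROLE ... ADD MEMBER requires SQL Server 2012 or later.

```sql
-- 1 = member of sysadmin, 0 = not a member, NULL = login or role not valid
SELECT IS_SRVROLEMEMBER('sysadmin', 'NT AUTHORITY\SYSTEM') AS is_sysadmin;

-- Create the login if it does not exist yet
IF NOT EXISTS (SELECT 1 FROM sys.server_principals
               WHERE name = N'NT AUTHORITY\SYSTEM')
    CREATE LOGIN [NT AUTHORITY\SYSTEM] FROM WINDOWS;

-- Add the login to the sysadmin fixed server role
ALTER SERVER ROLE [sysadmin] ADD MEMBER [NT AUTHORITY\SYSTEM];
```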

Also, in Failover Cluster Manager->Services and applications->AG Properties, increase VerboseLogging.
