Friday, 20 March 2015

Exchange 2013 DAG with Dynamic Quorum (Part 2)

Now that we know the importance of quorum in a Windows cluster and how it affects Database Availability Groups (DAGs), let us look at an example of the new dynamic quorum model introduced in Windows Server 2012.

Let us consider a three-node DAG. As previously discussed, because this DAG has an odd number of members, no witness server is required. To start, we confirm that everything is up and running with no failures:
Figure 2.1: Cluster Status
By looking at the cluster summary in the Failover Cluster Manager, we can see that the quorum mode is set to Node Majority:
Figure 2.2: Cluster Summary
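The same information can be read from the Shell. The following is a minimal sketch, assuming the FailoverClusters module is available on the DAG member where it is run:

```powershell
# Load the failover clustering cmdlets.
Import-Module FailoverClusters

# Show the quorum configuration of the local cluster, including the quorum type.
Get-ClusterQuorum | Format-List *
```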
Using the Shell, we can confirm that Dynamic Quorum is enabled for this particular cluster (this is the default behavior):
Figure 2.3: Dynamic Quorum Enabled
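One way to run this check is sketched below, assuming the FailoverClusters module is available on the server. The DynamicQuorum property of the cluster object returns 1 when dynamic quorum is enabled and 0 when it is disabled:

```powershell
# Load the failover clustering cmdlets.
Import-Module FailoverClusters

# 1 = dynamic quorum enabled (the default), 0 = disabled.
(Get-Cluster).DynamicQuorum
```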
In this DAG, there are three mailbox databases (DB02, DB03 and DB04) with database copies across all three members. At this stage, all servers are operational and all active database copies are mounted on server EXMBX1, with their passive copies all healthy:
Figure 2.4: DAG and Database Status
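A comparable view can be obtained from the Exchange Management Shell. This is only a sketch of one possible query, and the choice of columns is an assumption made for illustration:

```powershell
# Run from the Exchange Management Shell.
# List every copy of every mailbox database with its status and queue lengths.
Get-MailboxDatabase | Get-MailboxDatabaseCopyStatus |
    Sort-Object Name |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState -AutoSize
```

Active copies are reported with a status of Mounted, while passive copies in good condition are reported as Healthy.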
Using the Shell, we see the properties for each of the members of the cluster, including their dynamic weight:
Figure 2.5: Cluster Node Properties
As the DynamicWeight property in the screenshot above shows, each node currently has one vote. For this DAG, two out of three votes are required to achieve and maintain quorum, which is the case at the moment.
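A command along the following lines lists these properties. It is a minimal sketch, assuming the FailoverClusters module is available, and the selection of columns is a choice made for illustration:

```powershell
# Load the failover clustering cmdlets.
Import-Module FailoverClusters

# NodeWeight is the configured vote of each node;
# DynamicWeight is the vote currently assigned by dynamic quorum.
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight -AutoSize
```

Running the same command after each failure in the tests that follow shows how the cluster adjusts DynamicWeight.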
Now, let us shut down one of the DAG members. To make this test more interesting, let us shut down the member that is currently hosting all active copies of the three databases, server EXMBX1:
Figure 2.6: Cluster Status – One Node Offline
As expected, all databases were failed over to one of the remaining servers, in this case server EXMBX2:
Figure 2.7: DAG and Database Status - One Node Offline
So far, this is exactly the behavior one would expect with previous versions of Exchange and Windows Server. The difference with dynamic quorum is that the cluster now removes the vote from the node with the lowest ID, so that only one node keeps a vote. With two voting nodes remaining, a majority would require both votes (the majority of two is two), so the loss of either node would cost the cluster its quorum. To prevent the cluster from shutting down in that case, one of the votes is removed, so that only one vote is required to maintain the cluster:
Figure 2.8: Cluster Node Properties - One Node Offline
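The majority arithmetic behind this adjustment can be sketched in a few lines. Get-MajorityThreshold is a hypothetical helper written for this illustration and is not part of any Windows or Exchange module:

```powershell
# Hypothetical helper: number of votes needed for a majority among the given voters.
function Get-MajorityThreshold {
    param([int]$VoterCount)
    # A strict majority means more than half of the votes.
    return [math]::Floor($VoterCount / 2) + 1
}

Get-MajorityThreshold -VoterCount 3   # 2: the cluster tolerates the loss of one voter
Get-MajorityThreshold -VoterCount 2   # 2: losing either voter would break quorum
Get-MajorityThreshold -VoterCount 1   # 1: the single remaining voter is enough
```

With two voters the threshold equals the number of voters, which is why the cluster is better off with a single vote.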
On a Windows Server 2008 or 2008 R2 cluster, quorum would still be maintained at this point and the DAG would continue to operate without any issues. The difference with Windows Server 2012 lies in what happens when we lose another node. Without dynamic quorum, the whole DAG would typically go offline, as the single remaining node would not be able to achieve a majority. With dynamic quorum, this is not the case!
Let us now shut down EXMBX2, which has all databases mounted. We could equally shut down EXMBX3, the only node with a vote; the outcome would be the same.
Figure 2.9: Cluster Status – Two Nodes Offline
In this case, the vote remains on EXMBX3 (if we were to shut down EXMBX3, the vote would be transferred to EXMBX2) and the cluster remains up and running even with just one node remaining!
Figure 2.10: Cluster Node Properties - Two Nodes Offline
With previous versions of Windows Server, the DAG would be brought down in such a scenario, but not with dynamic quorum. All databases are successfully failed over to the remaining node and the DAG remains online:
Figure 2.11: DAG and Database Status - Two Nodes Offline
If we look at the cluster summary in the Failover Cluster Manager, we are presented with a warning that if this remaining node fails, the entire cluster will fail. This is to be expected, as EXMBX3 is the only node of the cluster that remains operational, but it is still a very useful warning.
Figure 2.12: Cluster Summary
Remember that dynamic quorum only works if the following two conditions are met:
  1. Nodes fail one after another (sequentially) rather than all at the same time;
  2. The cluster has already achieved quorum.
If multiple servers fail at the same time (for example, in a disaster recovery scenario where an entire datacenter with more than one cluster member becomes unavailable), the cluster will not be able to dynamically adjust the quorum majority requirement.
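The difference between the two cases can be illustrated with a deliberately simplified model. Test-QuorumAfterFailure is a hypothetical function written for this sketch. It only checks whether the surviving voters still hold a majority of the votes counted at the moment of failure, and it is not the cluster service's actual algorithm:

```powershell
# Simplified model for illustration only; not the cluster service's actual algorithm.
function Test-QuorumAfterFailure {
    param(
        [int]$VoterCount,        # nodes holding a vote just before the failure
        [int]$FailedVoterCount   # voting nodes lost in one simultaneous event
    )
    # A strict majority of the current votes is needed to keep quorum.
    $required = [math]::Floor($VoterCount / 2) + 1
    return ($VoterCount - $FailedVoterCount) -ge $required
}

# Sequential: one of three voters fails; two of three votes remain.
Test-QuorumAfterFailure -VoterCount 3 -FailedVoterCount 1   # True

# Simultaneous: two of three voters fail at once; one of three votes remains.
Test-QuorumAfterFailure -VoterCount 3 -FailedVoterCount 2   # False
```

In the sequential case the cluster keeps quorum and has the opportunity to recalculate the votes before the next failure. In the simultaneous case quorum is lost before any adjustment can take place.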

Node Vote

A feature we did not cover in this article is node votes. Since Windows Server 2008 R2 SP1, administrators have been able to stop a node from participating in the voting process, meaning servers can be configured not to have a vote. Regardless of vote assignment, all nodes continue to function in the cluster, receive cluster database updates and can host applications.
This might be useful in certain disaster recovery scenarios. For example, in a multisite cluster, administrators could remove votes from the nodes in a backup site so that those nodes do not affect quorum calculations. This configuration, however, is only recommended for manual failover across sites and even then there are Exchange features and properties more appropriate to deal with these scenarios.
The configured vote of a node can be verified by looking at the NodeWeight property returned by the Get-ClusterNode PowerShell cmdlet, as we can see in Figure 2.10. A value of 0 indicates that the node does not have a quorum vote configured, while a value of 1 indicates that the quorum vote of the node is assigned and managed by the cluster. The vote assignment for all cluster nodes can also be verified by using the Validate Cluster Quorum validation test.
Note that it is not recommended to change the node weight of cluster members. If dynamic quorum management is enabled, only the nodes that are configured to have node votes assigned can have their votes assigned or removed dynamically.
If, for some reason, you still want to change this property, you can do so by using the Shell. The following example removes the quorum vote from node EXMBX1 on the local cluster:
(Get-ClusterNode EXMBX1).NodeWeight=0
And the following example adds the quorum vote back to node EXMBX1:
(Get-ClusterNode EXMBX1).NodeWeight=1

Conclusion

This simple scenario shows the great improvement Windows Server 2012 Failover Clustering Dynamic Quorum can bring to Database Availability Groups. Some failure scenarios will still cause the entire DAG to go offline, such as when multiple servers fail simultaneously, but when designed and maintained appropriately, dynamic quorum can increase the availability of any Exchange environment.
