Now that we know the importance of quorum
in a Windows cluster and how it affects Database Availability Groups
(DAGs), let us look at an example of the new dynamic quorum model
introduced in Windows Server 2012.
Let us consider a three-node DAG. As
previously discussed, because this DAG has an odd number of members, no
witness server is required. To start, we ensure that everything
is up and running with no failures:
Figure 2.1: Cluster Status
By looking at the cluster summary in the Failover Cluster Manager, we can see that the quorum mode is set to Node Majority:
Figure 2.2: Cluster Summary
Using the Shell, we can confirm that Dynamic Quorum is enabled for this particular cluster (this is the default behavior):
Figure 2.3: Dynamic Quorum Enabled
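The screenshot is not reproduced here, but the check can be sketched as follows. This assumes the FailoverClusters PowerShell module is available and that the command is run on a member of the cluster; the exact command used for the figure is not known.

```powershell
# Assumption: run on a cluster member with the FailoverClusters module installed.
Import-Module FailoverClusters

# DynamicQuorum is a property of the cluster object:
# 1 = dynamic quorum enabled (the default in Windows Server 2012), 0 = disabled.
(Get-Cluster).DynamicQuorum
```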
In this DAG, there are three mailbox
databases (DB02, DB03 and DB04) with database copies across all three
members. At this stage, all servers are operational, all active database
copies are mounted on server EXMBX1 and all passive copies are healthy:
Figure 2.4: DAG and Database Status
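A similar view can be produced from the Exchange Management Shell. The following is a minimal sketch, not necessarily the command behind the figure; the choice of columns is an assumption made for readability.

```powershell
# Run from the Exchange Management Shell.
# Lists every copy of every mailbox database with its status and queue lengths.
Get-MailboxDatabase |
    Get-MailboxDatabaseCopyStatus |
    Sort-Object Name |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState -AutoSize
```

The active copy of each database reports a status of Mounted, while healthy passive copies report Healthy.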
Using the Shell, we see the properties for each of the members of the cluster, including their dynamic weight:
Figure 2.5: Cluster Node Properties
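For readers following along without the screenshot, the node properties can be listed with something like the command below. This is a sketch that assumes it is run on a cluster member; the selected columns are an assumption.

```powershell
Import-Module FailoverClusters

# NodeWeight    = the vote configured for the node (0 or 1)
# DynamicWeight = the vote the node currently holds under dynamic quorum
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight -AutoSize
```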
As the DynamicWeight property in the screenshot above shows,
each node currently has one vote. For this DAG, two out of
three votes are required to achieve and maintain quorum, which is the
case at the moment.
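The "two out of three" figure is simply a majority of the current votes. As a small illustration, the helper below (a hypothetical function, not a built-in cmdlet) computes the number of votes needed for a given total:

```powershell
# Hypothetical helper for illustration only: votes needed for a majority.
function Get-VotesRequired {
    param([int]$TotalVotes)
    # More than half of the votes: floor(total / 2) + 1
    return [int][math]::Floor($TotalVotes / 2) + 1
}

Get-VotesRequired -TotalVotes 3   # 2
Get-VotesRequired -TotalVotes 2   # 2
Get-VotesRequired -TotalVotes 1   # 1
```

Note that two votes out of two are needed in a two-node cluster, which is why losing either node would cost it quorum.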
Now, let us shut down one of the DAG
members. To make this test more interesting, let us shut down the member
that is currently hosting all active copies of the three databases,
server EXMBX1:
Figure 2.6: Cluster Status – One Node Offline
As expected, all databases were failed over to one of the remaining servers, in this case server EXMBX2:
Figure 2.7: DAG and Database Status - One Node Offline
So far, this is exactly the behavior
one would expect with previous versions of Exchange and Windows Server.
The difference with dynamic quorum is that the cluster now removes the vote from
the node with the lowest ID, so that only one node keeps a vote. This is
because a cluster with only two voting nodes cannot lose either of them
and still have a majority, as the majority of two is two. So, to prevent the
cluster from shutting down if another node is lost, one of the votes is
removed, and only one vote is then required to maintain the cluster:
Figure 2.8: Cluster Node Properties - One Node Offline
If this were a Windows Server 2008 or 2008
R2 cluster, quorum would still be maintained at this point and the DAG would
continue to operate without any issues. The difference with Windows Server 2012
lies in what happens when we lose another node. Without dynamic quorum, the whole
DAG would go offline, as the remaining node would not be able
to achieve a majority. With dynamic quorum, however, this is not the case!
Let us now shut down EXMBX2, which has all databases mounted. We could just as well shut down EXMBX3, which is the only node with a vote; it does not matter.
Figure 2.9: Cluster Status – Two Nodes Offline
In this case, the vote remains on EXMBX3 (if we were to shut down EXMBX3, the vote would be transferred to EXMBX2) and the cluster remains up and running even with just one node remaining!
Figure 2.10: Cluster Node Properties - Two Nodes Offline
With previous editions of Windows
Server, the DAG would be brought down in such a scenario, but not with
dynamic quorum. All databases are successfully failed over to the
remaining node and the DAG remains online:
Figure 2.11: DAG and Database Status - Two Nodes Offline
If we look at the cluster summary in the
Failover Cluster Manager, we are presented with a warning that if this
remaining node fails, the entire cluster will fail. This is
obvious, as EXMBX3 is the only node of the cluster that remains operational, but it is still a very useful warning.
Figure 2.12: Cluster Summary
Remember that dynamic quorum only works if the following two conditions are met:
- Nodes fail one at a time (sequentially), not simultaneously;
- The cluster has already achieved quorum.
If multiple servers fail at the same time
(for example, in a disaster recovery scenario where an entire datacenter
with more than one cluster member becomes unavailable), the cluster
will not be able to dynamically adjust the quorum majority requirement.
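The difference between sequential and simultaneous failures can be illustrated with a deliberately simplified model. The function below is hypothetical and does not reflect how the cluster service is implemented; it only checks whether the votes still running form a majority of the votes that existed just before the failure.

```powershell
# Simplified, hypothetical model for illustration only.
function Test-QuorumSurvives {
    param(
        [int]$CurrentVotes,  # dynamic votes held just before the failure
        [int]$VotesLost      # votes that disappear at the same moment
    )
    $required = [int][math]::Floor($CurrentVotes / 2) + 1
    return (($CurrentVotes - $VotesLost) -ge $required)
}

# Sequential: 3 votes, EXMBX1 fails -> 2 of 3 votes remain, quorum holds.
Test-QuorumSurvives -CurrentVotes 3 -VotesLost 1   # True

# Dynamic quorum then leaves a single vote (on EXMBX3). EXMBX2 holds no vote,
# so losing it costs 0 votes and quorum still holds.
Test-QuorumSurvives -CurrentVotes 1 -VotesLost 0   # True

# Simultaneous: 3 votes, two nodes fail at once -> 1 of 3 remains, quorum is lost.
Test-QuorumSurvives -CurrentVotes 3 -VotesLost 2   # False
```

In the sequential case the cluster has time to recalculate votes between failures; in the simultaneous case it does not, so the original majority requirement still applies.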
Node Vote
A feature we did not cover in this article
is node votes. Since Windows Server 2008 R2 SP1, administrators have been
able to prevent a node from participating in the voting
process, meaning servers can be configured not to have a vote.
Regardless of vote assignment, all nodes continue to function in the
cluster, receive cluster database updates and can host applications.
This might be useful in certain disaster
recovery scenarios. For example, in a multisite cluster, administrators
could remove votes from the nodes in a backup site so that those nodes
do not affect quorum calculations. This configuration, however, is only
recommended for manual failover across sites and even then there are
Exchange features and properties more appropriate to deal with these
scenarios.
The configured vote of a node can be verified by checking the NodeWeight property returned by the Get-ClusterNode
PowerShell cmdlet, as we can see in Figure 2.10. A value of 0 indicates
that the node does not have a quorum vote configured, while a value of 1
indicates that the node has a quorum vote assigned and that the vote is
managed by the cluster. The vote assignment for all cluster nodes can
also be verified by using the Validate Cluster Quorum validation test.
Note that it is not recommended to change
the node weight of cluster members. If dynamic quorum management is
enabled, only the nodes that are configured to have node votes assigned
can have their votes assigned or removed dynamically.
If, for some reason, you still want to
change this property, you can do so by using the Shell. The following
example removes the quorum vote from node EXMBX1 on the local cluster:
(Get-ClusterNode EXMBX1).NodeWeight=0
And the following example adds the quorum vote back to node EXMBX1:
(Get-ClusterNode EXMBX1).NodeWeight=1
Conclusion
This simple scenario shows the great improvement Windows Server 2012 Failover Clustering Dynamic Quorum
can bring to Database Availability Groups. Some failure scenarios will
still cause the entire DAG to go offline, such as when multiple servers
fail simultaneously, but when designed and maintained appropriately,
dynamic quorum can increase the availability of any Exchange
environment.