Friday, 20 March 2015

Exchange 2013 DAG with Dynamic Quorum (Part 1)

When an administrator creates a Database Availability Group [DAG], it is initially created as an empty object in Active Directory [AD]. This object stores relevant information about the DAG, such as server membership. When the first server is added to the DAG, a failover cluster is automatically created for the DAG and used exclusively by it. DAGs make limited use of Windows failover clustering technology: the cluster heartbeat, cluster networks and the cluster database (which stores data such as database state changes from active to passive or from mounted to dismounted, and vice versa). As each subsequent server is added to the DAG, it is joined to the underlying cluster, the cluster's quorum model is automatically adjusted by Exchange, and the server is added to the DAG object in AD.

Failover clusters use the concept of quorum, which uses a consensus of voters to ensure that only one subset of the cluster members (which could be all members or a majority of members) is functioning at one time. Highly available Mailbox servers in previous versions of Exchange also use failover clustering and its concept of quorum, so this is not a new concept. Quorum represents a shared view of members and resources, and the term quorum is also used to describe the physical data that represents the configuration within the cluster that is shared between all cluster members. As a result, all DAGs require their underlying failover cluster to have quorum. If the cluster loses quorum, all DAG operations terminate and all mounted databases hosted in the DAG are dismounted.
Quorum is important for two reasons: 1) it ensures consistency, so that each member always has a view of the cluster that is consistent with the other members; and 2) it acts as a tie-breaker to avoid partitioning (such as split-brain scenarios), making sure that only one collection of members in the DAG is considered official.

Majority Node Set Clustering

Majority Node Set [MNS] is a Windows Clustering model used since early versions of Exchange. This model requires more than 50% of the voters (servers and, where used, one file share witness) to be up and running.
DAGs with an even number of members use the failover cluster's Node and File Share Majority quorum mode, which uses an external witness server that acts as a tie-breaker. In this quorum mode, each DAG member gets a vote. In addition, the witness server is used to provide one DAG member with a weighted vote. The cluster quorum data is stored by default on the system disk of each member of the DAG and is kept consistent across those disks. A file on the witness server (hence the name File Share) is used to keep track of which member has the most up-to-date copy of the data. The witness server itself does not hold a copy of the cluster quorum data.
In this mode, a majority of the voters must be operational and able to communicate with each other to maintain quorum. If a majority of the voters cannot communicate with each other, the DAG's underlying cluster loses quorum and the DAG will require administrator intervention to become operational again. When the witness server is needed for quorum, any member of the DAG that can communicate with the witness server can place a Server Message Block [SMB] lock on the witness server's witness.log file. The DAG member that locks the witness server (the locking node) retains an additional vote for quorum purposes. The DAG members in contact with the locking node are in the majority and maintain quorum. Any DAG members that cannot contact the locking node are in the minority and therefore lose quorum.
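The tie-breaking effect of the witness lock can be sketched in a few lines of Python. This is only an illustrative model of the vote counting described above, not anything Exchange or the cluster service exposes; the function name `has_quorum` and its parameters are made up for the example.

```python
def has_quorum(members_in_partition: int, total_members: int,
               holds_witness_lock: bool) -> bool:
    """Model of Node and File Share Majority: does this partition keep quorum?

    Hypothetical helper for illustration only. Every DAG member has one vote,
    and the member that locks the witness contributes one extra vote.
    """
    total_votes = total_members + 1                    # members plus the witness vote
    votes = members_in_partition + (1 if holds_witness_lock else 0)
    return votes > total_votes / 2                     # strict majority required


# A four-member DAG split evenly by a network failure.
locking_side = has_quorum(2, 4, holds_witness_lock=True)    # 3 of 5 votes
other_side = has_quorum(2, 4, holds_witness_lock=False)     # 2 of 5 votes
print(locking_side, other_side)
```

In this model the side that holds the lock keeps quorum and the other side loses it, so the two halves can never both consider themselves official.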
Consider a DAG with four members. Because this DAG has an even number of members, an external witness server is used to provide one of the cluster members with a fifth, tie-breaking vote. To maintain a majority of voters (and therefore quorum), at least three voters must be able to communicate with each other. At any time, a maximum of two voters can be offline without disrupting service and data access. If three or more voters are offline, the DAG loses quorum and all databases are dismounted.
Figure 1.1: Database Availability Group with an Even Number of Members
The following formula helps administrators calculate the minimum number of voters that must be available for the cluster to stay online: (n / 2) + 1, where n is the number of voters (the DAG members plus the witness server, when one is used) and n / 2 is always rounded down. In this example, we have (5 / 2) + 1 = 2 + 1 = 3.
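As a quick check of the formula, the arithmetic can be written as a small Python sketch. The function names are invented for this example; integer division (`//`) provides the rounding down.

```python
def votes_needed(voters: int) -> int:
    """Minimum voters that must be online: (n / 2) + 1, with n / 2 rounded down."""
    return voters // 2 + 1


def failures_tolerated(voters: int) -> int:
    """How many voters can be offline before quorum is lost."""
    return voters - votes_needed(voters)


for voters in (3, 5, 7):
    print(voters, votes_needed(voters), failures_tolerated(voters))
```

For five voters (four DAG members plus the witness) this gives three voters needed and two failures tolerated, matching the example above.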
DAGs with an odd number of members use the failover cluster's Node Majority quorum mode. In this mode, each member gets a vote and each member's local system disk is used to store the cluster quorum data. If the configuration of the DAG changes, that change is reflected across the different disks. The change is only considered to have been committed and made persistent if that change is made to the disks on half the members (rounding down) plus one. For example, in a three-member DAG, the change must be made on one plus one members, or two members in total. In this scenario, and using the formula above, only one server can be down at one time. If a second server is also offline, the entire cluster will be brought offline.
Figure 1.2: Database Availability Group with an Odd Number of Members
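The relationship between member count and quorum mode can be summarised in a short Python sketch. It only restates the rules described in this section (an even member count uses a witness, an odd one does not); the `quorum_plan` function and its dictionary keys are hypothetical names chosen for the example.

```python
def quorum_plan(dag_members: int) -> dict:
    """Illustrative summary of the quorum mode a DAG of this size would use."""
    if dag_members < 1:
        raise ValueError("a DAG needs at least one member")
    uses_witness = dag_members % 2 == 0               # even member count -> witness vote
    voters = dag_members + (1 if uses_witness else 0)
    needed = voters // 2 + 1                          # (n / 2) + 1, rounded down
    return {
        "mode": "Node and File Share Majority" if uses_witness else "Node Majority",
        "voters": voters,
        "needed": needed,
        "tolerated": voters - needed,
    }


for members in (3, 4):
    print(members, quorum_plan(members))
```

A three-member DAG comes out as Node Majority with one tolerated failure, and a four-member DAG as Node and File Share Majority with two.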

Windows Server 2012

Windows Server 2012 introduced a new model called Failover Clustering Dynamic Quorum, which we can use with Exchange. When using Dynamic Quorum, the cluster dynamically manages the vote assignment to nodes based on the state of each node. When a node shuts down or crashes, it loses its quorum vote. When a node successfully re-joins the cluster, it regains its quorum vote. By dynamically adjusting the assignment of quorum votes, the cluster can increase or decrease the number of quorum votes that are required to keep it running. This enables the cluster to maintain availability during sequential node failures or shutdowns.
With a dynamic quorum, the cluster quorum majority is determined by the set of nodes that are active members of the cluster at any time. This is an important distinction from the cluster quorum in Windows Server 2008 R2 where the quorum majority is fixed, based on the initial cluster configuration.
Important:
The advantage is that a cluster can now keep running even when fewer than 50% of its original nodes remain. By dynamically adjusting the quorum majority requirement, the cluster can sustain sequential node shutdowns down to a single node and still keep running. It cannot, however, sustain a simultaneous failure of a majority of voting members. To continue running, the cluster must always have a quorum majority at the time of a node shutdown or failure.
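The difference between sequential and simultaneous failures can be illustrated with a deliberately simplified Python model. It assumes a cluster without a witness, requires a strict majority of the currently counted votes after each batch of failures, and ignores any further vote adjustments Windows may make. Because of that, it does not reproduce the final step down to a single node. The function name and parameters are made up for the example.

```python
from typing import Sequence


def survives(initial_voters: int, failures: Sequence[int], dynamic: bool) -> bool:
    """Apply batches of simultaneous failures; True if quorum is never lost.

    Simplified model only: each entry in `failures` is the number of nodes
    that go down at the same moment.
    """
    online = initial_voters
    votes = initial_voters                 # votes currently counted toward quorum
    for batch in failures:
        online -= batch
        if online <= votes // 2:           # survivors are not a strict majority
            return False
        if dynamic:
            votes = online                 # departed nodes lose their votes
    return True


print(survives(5, [1, 1, 1], dynamic=False))  # fixed majority of 3 of 5
print(survives(5, [1, 1, 1], dynamic=True))   # majority recalculated each time
print(survives(5, [3], dynamic=True))         # three nodes lost at once
```

With a fixed majority, the third sequential failure leaves two of five voters and the cluster goes down. With dynamic votes, the same sequence keeps the cluster up on two of the original five nodes, while losing three nodes at once still breaks quorum.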
The cluster-assigned dynamic vote of a node can be verified with the DynamicWeight property of the cluster node by using the Get-ClusterNode cmdlet. A value of 0 indicates that the node does not have a quorum vote, while a value of 1 indicates that the node has a quorum vote:
Figure 1.3: Dynamic Weight Property of a Dynamic Quorum
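To show how the DynamicWeight values translate into a vote count, here is a small Python sketch. The node names and weights are invented sample data, not output from a real cluster, and `vote_summary` is a hypothetical helper.

```python
# Hypothetical DynamicWeight values, in the spirit of what Get-ClusterNode reports.
dynamic_weight = {"MBX1": 1, "MBX2": 1, "MBX3": 1, "MBX4": 0, "MBX5": 0}


def vote_summary(weights: dict) -> dict:
    """Count nodes that currently hold a vote and the majority they must reach."""
    voters = [node for node, weight in weights.items() if weight == 1]
    return {
        "voters": voters,
        "votes": len(voters),
        "needed": len(voters) // 2 + 1,    # strict majority of current votes
    }


print(vote_summary(dynamic_weight))
```

In this sample, three of the five nodes hold a vote, so two of them must stay in contact to keep quorum.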
To change the quorum configuration in a failover cluster by using the Failover Cluster Manager, follow these steps:
  1. In Failover Cluster Manager, select the cluster that you want to change;
  2. With the cluster selected, under Actions, click More Actions, and then click Configure Cluster Quorum Settings:
Figure 1.4: Configure Cluster Quorum Settings Option
  3. The Configure Cluster Quorum Wizard appears. Click Next:
Figure 1.5: Configure Cluster Quorum Wizard
  4. On the Select Quorum Configuration Option page, the default is to allow the cluster to automatically configure the quorum settings that are optimal for our current cluster configuration (Use typical settings). To configure quorum management settings and to add or change the quorum witness, click Advanced quorum configuration and witness selection and then click Next:
Figure 1.6: Select Quorum Configuration Option
  5. On the Select Voting Configuration page, select All Nodes and click Next. For certain scenarios, you might want to assign votes only to a subset of the nodes or even to No Nodes. The No Nodes option is generally not recommended, because it does not allow any node to participate in quorum voting and it requires configuring a disk witness, which becomes a single point of failure for the cluster.
Figure 1.7: Select Voting Configuration
  6. On the Configure Quorum Management page, you can enable or disable the Allow cluster to dynamically manage the assignment of node votes option. Selecting this option enables dynamic quorum, which increases the availability of the cluster by allowing it to continue running in failure scenarios that are not possible when this option is disabled. This option is enabled by default and it is strongly recommended not to disable it:
Figure 1.8: Configure Quorum Management
  7. On the Select Quorum Witness page, select an option to configure a disk witness or a file share witness. The wizard indicates the witness selection options that are recommended for our cluster. In this case, because the current DAG has an odd number of members, no witness is required:
Figure 1.9: Select Quorum Witness
  8. Click Next. Confirm your selections on the confirmation page that appears and then click Next:
Figure 1.10: Confirmation
After the wizard runs and the Summary page appears, click View Report if you want to view a report of the tasks that the wizard performed. The most recent report remains in the %systemroot%\Cluster\Reports folder with the name QuorumConfiguration.mht.
You can also use the Shell to check if dynamic quorum is being used by running the following cmdlet:
Figure 1.11: Checking Dynamic Quorum Configuration
To enable or disable dynamic quorum through the Shell, simply set the DynamicQuorum property to 1 (enabled) or to 0 (disabled) by running:
(Get-Cluster "cluster_name").DynamicQuorum = 0

Conclusion

In the first part of this article series, we took a high-level look at the importance of quorum in a Windows cluster and how it affects Database Availability Groups. We also looked at the advantages of the new Dynamic Quorum in Windows Server 2012.
