Friday, 20 March 2015

Transport High Availability in Exchange 2013 (Part 1)

Transport high availability is one of those areas that rarely worries Exchange administrators: they simply assume that an e-mail, once in Exchange, will always be delivered as long as the target mailbox is available. However, what if a server suddenly fails while sending a newsletter to 40 recipients? What happens to those 40 e-mails? High availability for the Exchange transport pipeline is crucial to guarantee that no e-mails are lost while in transit. It matters even more in large deployments with dozens of servers, where e-mails sometimes cross countries before being delivered to their destination mailbox.

Transport High Availability

Let us look at how high availability is achieved to ensure no e-mails are lost while in transit. This is done by keeping redundant copies of e-mails both before and after they are successfully delivered. Transport Dumpster was introduced in Exchange 2007 and Shadow Redundancy in Exchange 2010. Exchange 2013 took these two features a step further.
In brief, these are the main improvements made to transport high availability:
  • Shadow Redundancy generates a redundant copy of an e-mail on a different server before it is accepted. If the sending server does not support Shadow Redundancy, that is not a problem, as we will shortly see;
  • Shadow Redundancy uses both DAGs and AD sites as boundaries for transport high availability, which eliminates unnecessary redundant e-mail traffic across DAGs or AD sites;
  • The Transport Dumpster feature, now Safety Net, has been further improved. Safety Net temporarily stores e-mails processed successfully by the Transport service;
  • Safety Net is now made redundant on a different server, avoiding a single point of failure as mailbox databases and the Transport service are located on the same server.
The diagram below shows, at a high level, an overview of transport high availability in action in Exchange 2013:
Figure 1.1: Exchange 2013 Transport High Availability
  1. Mailbox server MBX1 receives an e-mail from a server outside the boundary for transport high availability;
  2. Before acknowledging the e-mail was received, MBX1 starts an SMTP session to another Exchange 2013 Mailbox server named MBX3 within the boundary for transport high availability and MBX3 creates a shadow copy of the e-mail. In this scenario, MBX1 holds the primary e-mail (primary server) and MBX3 holds the shadow e-mail (shadow server);
  3. On MBX1, the Transport service processes the primary e-mail:
    • As the recipient’s mailbox is hosted by MBX1, the Transport service passes the e-mail to the local Mailbox Transport service;
    • The e-mail is delivered to the local database by the Mailbox Transport service;
    • MBX1 queues a discard status for MBX3 indicating the primary e-mail was processed successfully, and moves a copy of the primary e-mail to the local Primary Safety Net (the e-mail remains in the same queue database, but moves to a different queue).
  4. MBX3 polls MBX1 periodically for a status of the primary e-mail;
  5. MBX3 eventually determines that MBX1 processed the primary e-mail successfully and moves the shadow e-mail to the local Shadow Safety Net (again, the e-mail remains in the same queue database, but moves to a different queue).
The e-mail is kept in both the Primary Safety Net and the Shadow Safety Net until it expires, based on a configurable timeout value (2 days by default). If a database failover happens before this timeout elapses, the Primary Safety Net on MBX1 resubmits the e-mail. If MBX1 is not available, MBX3 resubmits the e-mail from its Shadow Safety Net.
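The flow in Figure 1.1 can be modelled with a small Python sketch. This is only a conceptual illustration, not Exchange code: the `Server` class, the function names and the `"250 OK"` return value are all hypothetical, and the queues are plain in-memory collections standing in for the queue database. The only value taken from the article is the 2-day default hold time.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Default Safety Net retention mentioned in the article (configurable in Exchange).
SAFETY_NET_HOLD_TIME = timedelta(days=2)


@dataclass
class Server:
    """Hypothetical stand-in for a Mailbox server's transport queues."""
    name: str
    available: bool = True
    delivery_queue: list = field(default_factory=list)  # primary e-mails awaiting delivery
    shadow_queue: list = field(default_factory=list)    # shadow e-mails held for a primary server
    discard_status: set = field(default_factory=set)    # ids the server processed as primary
    safety_net: dict = field(default_factory=dict)      # message id -> time it entered Safety Net


def receive(primary, shadow, msg_id):
    """Steps 1-2: create the shadow copy first, only then acknowledge the sender."""
    shadow.shadow_queue.append(msg_id)
    primary.delivery_queue.append(msg_id)
    return "250 OK"  # illustrative acknowledgement sent after the shadow copy exists


def deliver(primary, msg_id, now):
    """Step 3: deliver, queue a discard status, move the e-mail to the Primary Safety Net."""
    primary.delivery_queue.remove(msg_id)
    primary.discard_status.add(msg_id)
    primary.safety_net[msg_id] = now


def poll(shadow, primary, now):
    """Steps 4-5: the shadow server asks the primary which e-mails were processed."""
    for msg_id in list(shadow.shadow_queue):
        if msg_id in primary.discard_status:
            shadow.shadow_queue.remove(msg_id)
            shadow.safety_net[msg_id] = now  # Shadow Safety Net


def expire(server, now):
    """Drop Safety Net entries older than the hold time."""
    for msg_id, stored_at in list(server.safety_net.items()):
        if now - stored_at >= SAFETY_NET_HOLD_TIME:
            del server.safety_net[msg_id]


def resubmit_source(primary, shadow, msg_id):
    """After a database failover: prefer the Primary Safety Net, fall back to the shadow."""
    if primary.available and msg_id in primary.safety_net:
        return primary.name
    if shadow.available and msg_id in shadow.safety_net:
        return shadow.name
    return None  # the e-mail has expired or both servers are down
```

Calling `receive`, `deliver` and `poll` in that order for MBX1 and MBX3 leaves one copy of the e-mail in each server's Safety Net, and `resubmit_source` then returns MBX1 unless it is marked unavailable, in which case it returns MBX3.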

Shadow Redundancy

Shadow Redundancy was introduced in Exchange 2010 as a mechanism to generate redundant copies of e-mails before they are delivered to mailboxes. It delayed deleting an e-mail from the transport database on a Hub Transport server until that server had confirmation that the next hop in the delivery path had completed delivery. If the next hop failed before it could report a successful delivery back to the transport server, the transport server would resubmit the e-mail to that next hop. The XSHADOW verb was used to advertise Shadow Redundancy support.
As we know, CAS servers do not queue e-mails locally, and it does not matter whether the sending server supports Shadow Redundancy. The Front End Transport service keeps the SMTP session with the sending server open while the e-mail is transmitted to the Transport service and a shadow copy is created on a different Mailbox server. Only then is the SMTP session with the sending server terminated, ensuring that the sending server receives an acknowledgement only once the e-mail has been made redundant in Exchange.
The main Shadow Redundancy enhancement in Exchange 2013 is that a redundant copy of every e-mail is created before its receipt is acknowledged to the sending server. Because the sending server does not need to support Shadow Redundancy, any e-mail entering the transport pipeline in Exchange 2013 is made redundant while in transit.
As we saw in Figure 1.1, more than one Exchange 2013 Mailbox server (standalone or multi-role) is required for Shadow Redundancy:
  • If not a DAG member, the other server has to be in the local AD site;
  • If a DAG member, the other server has to be a member of the same DAG. In this case, the other server can be in the local or in a remote AD site, with preference given to a server in a remote site for site resiliency.
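These two rules can be expressed as a short selection routine. The sketch below is a simplified Python model under stated assumptions: `MailboxServer`, `shadow_candidates` and `pick_shadow` are hypothetical names, and for a server outside a DAG it applies only the requirement stated above (same AD site). It does not reproduce the actual selection logic inside Exchange.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MailboxServer:
    """Hypothetical description of an Exchange 2013 Mailbox server."""
    name: str
    ad_site: str
    dag: Optional[str] = None  # None means the server is not a DAG member


def shadow_candidates(primary, servers):
    """Return the servers eligible to hold a shadow copy, most preferred first."""
    others = [s for s in servers if s.name != primary.name]
    if primary.dag is None:
        # Not a DAG member: the other server has to be in the local AD site.
        return [s for s in others if s.ad_site == primary.ad_site]
    # DAG member: the other server has to belong to the same DAG.
    same_dag = [s for s in others if s.dag == primary.dag]
    # Prefer a remote AD site for site resiliency (False sorts before True,
    # so servers in a different site come first; the sort is stable).
    return sorted(same_dag, key=lambda s: s.ad_site == primary.ad_site)


def pick_shadow(primary, servers):
    """Return the preferred shadow server, or None when no server qualifies."""
    candidates = shadow_candidates(primary, servers)
    return candidates[0] if candidates else None
```

When `pick_shadow` returns `None`, no other server qualifies and the e-mail has no second copy while in transit.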
Unfortunately, there are three scenarios in which Shadow Redundancy cannot protect e-mails in transit: in an environment with a single Exchange server, when a DAG is under-provisioned, or when both the primary and shadow servers fail simultaneously.

Creating Shadow E-mails

The purpose of Shadow Redundancy is to have two copies of an e-mail within the boundary of transport high availability while it is in transit. On which server and at what stage the redundant copy gets created depends on where the e-mail comes from and where it is going, according to the following scenarios:
  • E-mails arriving from outside the boundary of transport high availability;
  • E-mails sent outside the boundary of transport high availability;
  • E-mails arriving from a Mailbox server within the boundary of transport high availability.
As explained in the Exchange 2013 Mail Flow article, a boundary for transport high availability is either a DAG for servers that are DAG members (including DAGs across multiple AD sites) or an AD site for servers that are not DAG members. This means that if an e-mail crosses a boundary, Shadow Redundancy is initiated or restarted, as it never tracks e-mails beyond a boundary.
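The boundary definition above amounts to a simple rule, sketched here in Python. The `Server` type and the `ha_boundary` and `crosses_boundary` functions are hypothetical helpers written for illustration; they only encode the definition given in this article.

```python
from typing import NamedTuple, Optional


class Server(NamedTuple):
    """Hypothetical description of a Mailbox server."""
    name: str
    ad_site: str
    dag: Optional[str] = None  # None means the server is not a DAG member


def ha_boundary(server):
    """A DAG member's boundary is its DAG; otherwise it is the server's AD site."""
    if server.dag is not None:
        return ("dag", server.dag)
    return ("site", server.ad_site)


def crosses_boundary(sender, receiver):
    """True when Shadow Redundancy would be initiated or restarted for this hop."""
    return ha_boundary(sender) != ha_boundary(receiver)
```

Under this model, two members of the same DAG share a boundary even when they sit in different AD sites, while two standalone servers share a boundary only when they are in the same AD site.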
Coexistence scenarios with Exchange 2010 Hub Transport servers are a particular case, which we will explore shortly.
E-mails arriving from outside the boundary of transport high availability
When the Transport service receives an e-mail arriving from outside the boundary of transport high availability, it does not matter whether the sending server supports Shadow Redundancy. The primary server creates a redundant copy of the e-mail on a different Mailbox server within the boundary while the initial SMTP session with the sending server remains active. Only when the primary server gets acknowledgement that the shadow e-mail was created successfully does it acknowledge receipt of the e-mail back to the sender and close the SMTP session.
E-mails sent outside the boundary of transport high availability
When an e-mail is sent outside the boundary of transport high availability and the receiving SMTP server acknowledges successfully receiving the e-mail, the sending Exchange 2013 server moves the e-mail into Safety Net. After the primary e-mail is transmitted successfully across the boundary, it cannot be resubmitted from Safety Net.
In the following picture, we can see an internal user sending an e-mail to an external recipient and the Mailbox server EXMBX1 creating a shadow copy of that e-mail on server EXMBX2:
Figure 1.2: Creating a Shadow Copy
E-mails arriving from a Mailbox server within the boundary of transport high availability
When the Transport service on a server accepts an e-mail whose destination is the same DAG or AD site, the next hop for the e-mail is generally the final destination itself. Shadow redundancy is achieved by keeping another copy of the e-mail anywhere in the same DAG or AD site.
Shadow redundancy with legacy Hub Transport servers
In a coexistence scenario with Exchange 2010 servers (not 2007), Shadow Redundancy is maintained as well. For example, when an e-mail is sent within the same AD site from a Hub Transport server to a 2013 Mailbox server, the Hub Transport server uses the XSHADOW command to advertise it supports Shadow Redundancy. However, the 2013 server does not advertise its support in order to prevent the 2010 server from creating a shadow copy of the e-mail on a 2013 Mailbox server.
When an e-mail is sent within the same AD site from a 2013 Mailbox server to a 2010 Hub Transport server, the 2013 server shadows the e-mail on behalf of the 2010 server. When the 2013 server receives acknowledgement from the 2010 server that the e-mail was successfully received, the 2013 server moves the e-mail into Safety Net. However, once the e-mail is moved into Safety Net, it is never resubmitted to the 2010 Hub Transport server.

Conclusion

In this first article, we started exploring transport high availability in Exchange 2013, namely Shadow Redundancy. In the next article, we will finish covering Shadow Redundancy before moving on to Safety Net.
