Transport high availability seems to be one
of those areas that does not really worry Exchange administrators -
they just assume an e-mail, once in Exchange, will always be delivered
as long as the target mailbox is available. However, what if a server
trying to send a newsletter to 40 recipients fails all of a sudden? What
happens to those 40 e-mails? High availability for the Exchange
transport pipeline is crucial to guarantee that no e-mails are lost
while in transit. This is also very important in large deployments with
dozens of servers where e-mails sometimes cross countries before being
delivered to their destination mailbox.
Transport High Availability
Let us look at how high availability is
achieved to ensure no e-mails are lost while in transit. This is done by
keeping redundant copies of e-mails both before and after they are
successfully delivered. Transport Dumpster was introduced in Exchange 2007 and Shadow Redundancy in Exchange 2010. Exchange 2013 took these two features a step further.
In summary, the following are the main improvements made to transport high availability:
- Shadow Redundancy generates a redundant copy of an e-mail on a different server before the e-mail is accepted. If the sending server does not support Shadow Redundancy, that is not a problem, as we will see shortly;
- Shadow redundancy uses both DAGs and AD sites as boundaries for transport high availability, which eliminates unnecessary redundant e-mail traffic across DAGs or AD sites;
- The Transport Dumpster feature, now Safety Net, has been further improved. Safety Net temporarily stores e-mails processed successfully by the Transport service;
- Safety Net is now made redundant on a different server, avoiding a single point of failure as mailbox databases and the Transport service are located on the same server.
The diagram below shows, at a high level, an overview of transport high availability in action in Exchange 2013:
Figure 1.1: Exchange 2013 Transport High Availability
- Mailbox server MBX1 receives an e-mail from a server outside the boundary for transport high availability;
- Before acknowledging the e-mail was received, MBX1 starts an SMTP session to another Exchange 2013 Mailbox server named MBX3 within the boundary for transport high availability and MBX3 creates a shadow copy of the e-mail. In this scenario, MBX1 holds the primary e-mail (primary server) and MBX3 holds the shadow e-mail (shadow server);
- On MBX1, the Transport service processes the primary e-mail:
  - As the recipient’s mailbox is hosted by MBX1, the Transport service passes the e-mail to the local Mailbox Transport service;
  - The e-mail is delivered to the local database by the Mailbox Transport service;
  - MBX1 queues a discard status for MBX3 indicating the primary e-mail was processed successfully, and moves a copy of the primary e-mail to the local Primary Safety Net (the e-mail remains in the same queue database, but moves to a different queue).
- MBX3 polls MBX1 periodically for a status of the primary e-mail;
- MBX3 eventually determines that MBX1 processed the primary e-mail successfully and moves the shadow e-mail to the local Shadow Safety Net (again, the e-mail remains in the same queue database, but moves to a different queue).
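The hand-off between the primary and shadow servers described above can be sketched as a small Python model. This is only an illustration of the sequence, not how Exchange is implemented: the `TransportServer` class, its queue names, and the `deliver` and `poll` methods are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TransportServer:
    """Toy model of one Mailbox server's queue database (hypothetical names)."""
    name: str
    primary_queue: set = field(default_factory=set)       # primary e-mails in transit
    shadow_queue: set = field(default_factory=set)        # shadow copies held for a peer
    primary_safety_net: set = field(default_factory=set)
    shadow_safety_net: set = field(default_factory=set)
    discard_status: dict = field(default_factory=dict)    # shadow server -> processed ids

    def deliver(self, msg_id, shadow_server_name):
        """Process a primary e-mail: keep a Safety Net copy and queue a discard status."""
        self.primary_queue.remove(msg_id)
        self.primary_safety_net.add(msg_id)
        self.discard_status.setdefault(shadow_server_name, set()).add(msg_id)

    def poll(self, primary):
        """Ask the primary server which shadowed e-mails it has finished processing."""
        done = primary.discard_status.pop(self.name, set())
        for msg_id in done & self.shadow_queue:
            # Same queue database, different queue.
            self.shadow_queue.remove(msg_id)
            self.shadow_safety_net.add(msg_id)


mbx1 = TransportServer("MBX1")
mbx3 = TransportServer("MBX3")

# MBX1 holds the primary e-mail and MBX3 holds the shadow e-mail.
mbx1.primary_queue.add("msg-1")
mbx3.shadow_queue.add("msg-1")

# MBX1 delivers the e-mail and queues a discard status for MBX3.
mbx1.deliver("msg-1", "MBX3")

# MBX3 polls MBX1 and moves its shadow copy into the Shadow Safety Net.
mbx3.poll(mbx1)
```

After the poll, the e-mail sits in the Primary Safety Net on MBX1 and in the Shadow Safety Net on MBX3, and neither server still holds it in an active queue.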
The e-mail is kept in both the Primary
Safety Net and Shadow Safety Net until it expires based on a
configurable timeout value (2 days by default). In case a database
failover happens before this timeout period is reached, the Primary
Safety Net on MBX1 resubmits the e-mail. If MBX1 is not available, then
MBX3 resubmits the e-mail from its Shadow Safety Net.
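As a rough sketch of the retention and resubmission rules just described, the following Python snippet models the timeout and the choice between the two Safety Nets. The function and constant names are hypothetical, and the only value taken from Exchange is the 2-day default mentioned above.

```python
from datetime import datetime, timedelta

# Default retention stated above; configurable in a real deployment.
SAFETY_NET_HOLD_TIME = timedelta(days=2)


def is_expired(stored_at, now, hold_time=SAFETY_NET_HOLD_TIME):
    """Return True once an e-mail has been held for the full retention period."""
    return now - stored_at >= hold_time


def pick_resubmitter(primary_available, shadow_available):
    """Decide which Safety Net resubmits e-mails after a database failover."""
    if primary_available:
        return "primary"   # the Primary Safety Net is used when its server is up
    if shadow_available:
        return "shadow"    # otherwise fall back to the Shadow Safety Net
    return None            # both copies are unreachable


stored = datetime(2013, 1, 1, 9, 0)
print(is_expired(stored, stored + timedelta(hours=36)))
print(pick_resubmitter(primary_available=False, shadow_available=True))
```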
Shadow Redundancy
Shadow redundancy was introduced in
Exchange 2010 as a mechanism to generate redundant copies of e-mails
before being delivered to mailboxes. It delayed deleting an e-mail from
the transport database on a Hub Transport server until the server
confirmed the next hop in the delivery path for the e-mail had completed
its delivery. If the next hop crashed before it could report a successful
delivery back to the transport server, the transport server would
resubmit the e-mail to that next hop. The XSHADOW verb was used to advertise Shadow Redundancy support.
In Exchange 2013, CAS servers do not queue e-mails locally, and it does
not matter whether the sending server supports Shadow Redundancy. The
Front End Transport service keeps the SMTP session with the sending
server open while the e-mail is transmitted to the Transport service and
a shadow copy is created on a different Mailbox server. Only then is the
SMTP session with the sending server terminated, which ensures that the
e-mail is acknowledged only once it has been made redundant in Exchange.
The main Shadow Redundancy enhancement in Exchange 2013 is the creation
of a redundant copy of every e-mail before acknowledging receipt to the
sending server. Because the sending server does not need to support
Shadow Redundancy, every e-mail entering the transport pipeline in
Exchange 2013 is made redundant while in transit.
As we saw in Figure 1.1, more than one Exchange 2013 Mailbox server
(standalone or multi-role) is required for Shadow Redundancy:
- If not a DAG member, the other server has to be in the local AD site;
- If a DAG member, the other server has to be a member of the same DAG. In this case, the other server can be in the local or in a remote AD site, with preference given to a server in a remote site for site resiliency.
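The two placement rules above can be sketched in Python. This is only an illustration under simplifying assumptions: `MailboxServer` and `shadow_candidates` are hypothetical names rather than Exchange APIs, and the real selection logic takes more into account than DAG membership and AD site.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MailboxServer:
    name: str
    ad_site: str
    dag: Optional[str] = None   # None means the server is not a DAG member


def shadow_candidates(primary: MailboxServer,
                      servers: List[MailboxServer]) -> List[MailboxServer]:
    """Return possible shadow servers for `primary`, most preferred first."""
    others = [s for s in servers if s.name != primary.name]
    if primary.dag is None:
        # Not a DAG member: the other server has to be in the local AD site.
        return [s for s in others if s.ad_site == primary.ad_site]
    # DAG member: the other server has to be in the same DAG,
    # with servers in a remote AD site preferred for site resiliency.
    same_dag = [s for s in others if s.dag == primary.dag]
    return sorted(same_dag, key=lambda s: s.ad_site == primary.ad_site)


servers = [
    MailboxServer("MBX1", "SiteA", "DAG1"),
    MailboxServer("MBX2", "SiteA", "DAG1"),
    MailboxServer("MBX3", "SiteB", "DAG1"),
]
print([s.name for s in shadow_candidates(servers[0], servers)])
```

An empty result corresponds to the single-server case, where no shadow copy can be created at all.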
Unfortunately there are three scenarios
under which Shadow Redundancy is not able to protect e-mails in transit:
in an environment with a single Exchange server, when a DAG is
under-provisioned, or when both primary and shadow servers fail
simultaneously.
Creating Shadow E-mails
The purpose of Shadow Redundancy is to have two copies of an e-mail
within the boundary of transport high availability while it is in
transit. On which server and at what stage the redundant copy is created
depends on where the e-mail comes from and where it is going, which
gives the following scenarios:
- E-mails arriving from outside the boundary of transport high availability;
- E-mails sent outside the boundary of transport high availability;
- E-mails arriving from a Mailbox server within the boundary of transport high availability.
As explained in the Exchange 2013 Mail Flow
article, a boundary for transport high availability is either a DAG for
servers that are DAG members (including DAGs across multiple AD sites)
or an AD site for servers that are not DAG members. This means that
whenever an e-mail crosses a boundary, Shadow Redundancy is initiated or
restarted, because it never tracks e-mails beyond a boundary.
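The boundary definition can be expressed as a short Python sketch. The helper names `ha_boundary` and `crosses_boundary` are hypothetical, and a server is reduced here to a `(dag, ad_site)` pair purely for illustration.

```python
def ha_boundary(dag, ad_site):
    """Return the transport high availability boundary for a server.

    DAG members are bounded by their DAG (even when it spans AD sites);
    all other servers are bounded by their AD site.
    """
    return ("dag", dag) if dag else ("site", ad_site)


def crosses_boundary(sender, receiver):
    """True when an e-mail moving from sender to receiver leaves the boundary.

    Each argument is a (dag, ad_site) tuple; dag is None for non-members.
    """
    return ha_boundary(*sender) != ha_boundary(*receiver)


# Two members of the same DAG in different AD sites share one boundary.
print(crosses_boundary(("DAG1", "SiteA"), ("DAG1", "SiteB")))
# Two standalone servers in different AD sites do not.
print(crosses_boundary((None, "SiteA"), (None, "SiteB")))
```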
Coexistence scenarios with Exchange 2010 Hub Transport servers are a particular case, which we will explore shortly.
E-mails arriving from outside the boundary of transport high availability
When the Transport service receives an e-mail arriving from outside the
boundary of transport high availability, it does not matter whether the
sending server supports Shadow Redundancy. Shadow Redundancy creates a
redundant copy of the e-mail on a different Mailbox server within the
boundary while the initial SMTP session with the sending server remains
active. Only when the primary server receives acknowledgement that the
shadow e-mail was created successfully does it acknowledge receipt of
the e-mail to the sender and close the SMTP session.
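The ordering described above (receive, create the shadow copy, and only then acknowledge) can be sketched as follows. The `receive_message` function and the `create_shadow` callback are hypothetical. How Exchange behaves when no shadow copy can be created is not covered in this article, so the sketch simply stops without acknowledging in that case, which is an assumption.

```python
def receive_message(msg_id, create_shadow):
    """Return the ordered events for one inbound SMTP session.

    `create_shadow` is a hypothetical callback that returns True once another
    Mailbox server inside the boundary holds a shadow copy of the e-mail.
    """
    events = [f"received {msg_id}"]          # session with the sender stays open
    if create_shadow(msg_id):
        events.append(f"shadow created {msg_id}")
        events.append(f"acknowledged {msg_id}")   # sent only after the shadow exists
        events.append("session closed")
    return events


for event in receive_message("msg-1", lambda msg_id: True):
    print(event)
```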
E-mails sent outside the boundary of transport high availability
When an e-mail is sent outside the boundary
of transport high availability and the receiving SMTP server
acknowledges successfully receiving the e-mail, the sending Exchange
2013 server moves the e-mail into Safety Net. After the primary e-mail
is transmitted successfully across the boundary, it cannot be
resubmitted from Safety Net.
In the following picture, we can see an
internal user sending an e-mail to an external recipient and the Mailbox
server EXMBX1 creating a shadow copy of that e-mail on server EXMBX2:
Figure 1.2: Creating a Shadow Copy
E-mails arriving from a Mailbox server within the boundary of transport high availability
When the Transport service on a server
accepts an e-mail whose destination is the same DAG or AD site, the next
hop for the e-mail is generally the final destination itself. Shadow
redundancy is achieved by keeping another copy of the e-mail anywhere in
the same DAG or AD site.
Shadow redundancy with legacy Hub Transport servers
In a coexistence scenario with Exchange
2010 servers (not 2007), Shadow Redundancy is maintained as well. For
example, when an e-mail is sent within the same AD site from a Hub
Transport server to a 2013 Mailbox server, the Hub Transport server uses
the XSHADOW command to advertise it supports Shadow Redundancy.
However, the 2013 server does not advertise its support in order to
prevent the 2010 server from creating a shadow copy of the e-mail on a
2013 Mailbox server.
When an e-mail is sent within the same AD
site from a 2013 Mailbox server to a 2010 Hub Transport server, the 2013
server shadows the e-mail on behalf of the 2010 server. When the 2013
server receives acknowledgement from the 2010 server that the e-mail was
successfully received, the 2013 server moves the e-mail into Safety
Net. However, once the e-mail is moved into Safety Net, it is never
resubmitted to the 2010 Hub Transport server.
Conclusion
In this first article, we started exploring
transport high availability in Exchange 2013, namely Shadow Redundancy.
In the next article we will finish Shadow Redundancy before going into
Safety Net.