"It is
impossible for ideas to compete in the marketplace if no forum for
their presentation is provided or available." - Thomas Mann, 1896
Scaling your
services with ZXTM Global Load Balancer
Contributed by:
Zeus Technology, Inc.
Introduction
"The average multinational
corporation loses more than 1 million hours of productivity because of
applications failure. Depending on the industry, each hour of downtime
can cost businesses �3 million or more" [1]
"Each hour of application
downtime costs Fortune 1000 companies in excess of $300,000, according
to nearly one-third of respondents at companies that track the business
cost and impact" [2]
However you measure it, the cost of application
downtime can be very high for many organizations. For organizations that
provide applications and services over the Internet, the probability of
downtime is even higher.
There are two commonly used techniques to
minimize
the chance of a failure causing downtime in network-based applications.
These are Server Load Balancing and
Global Server Load Balancing.
Server Load Balancing within a Datacenter
Techniques like server load balancing and
clustering are often used within a datacenter to build clusters of
fault-tolerant, scalable applications. These clusters are resilient to
isolated failures - for example, a server machine developing a hardware
fault - and they allow the administrator to add more capacity to the
application when required.
However, a clustered, fault-tolerant application
running in a single datacenter is still vulnerable to downtime:
- The application may fail because of a single,
  critical point of failure such as a database or SAN, or it may fail
  because of administrator error.
- The datacenter may be disrupted due to a
  catastrophic natural or man-made disaster - power failure because of
  rolling blackouts, maintenance errors or even terrorist attack.
- The datacenter may become unavailable because
  of a denial-of-service attack mounted against a different service
  running in that datacenter, or because of a failure in its local
  internet connectivity.
Organizations that wish to protect against these
risks often choose to deploy a Global Server Load Balancing solution, which
routes application traffic to multiple distinct datacenters and removes the
single point of failure.
[1] Yankee Group, April 2006, "Overcoming
Applications Ignorance: New Services to Enable Agility"
[2] mValent Market Survey - Challenges and Priorities
for Fortune 1000 companies
Global Server Load Balancing between Datacenters
Global Server Load Balancing (GSLB) systems manage
how clients are connected to a datacenter when a service is hosted in
multiple distinct datacenters.
- In an Active-Passive configuration, one
  datacenter is nominated the active one for each service. The other
  datacenters are idle for that service. If the active datacenter becomes
  unavailable, one of the passive datacenters becomes active and all
  clients are directed to it.
- In an Active-Active configuration, all
  datacenters are used and clients are load-balanced between them based on
  datacenter performance and proximity.
The primary purpose of a GSLB system is
Business Continuity - to ensure that services
are always available, even when one or more service locations (datacenters)
become unavailable.
A second purpose of GSLB is to
Improve Customer Experience - to load-balance
each user to the best datacenter from a choice of several. The choice can be
based on datacenter performance and proximity, so that clients are directed
to the datacenter that is closest and is performing the best. This way, the
client gets the best possible level of service.
Who might use a Global Server Load Balancing
solution?
A GSLB solution is relevant to any
organization that:
1. Provides or depends on an internet-based
service, such as a public-facing web site, or a network-based application
for internal use.
2. Cannot countenance service failure, whether
this results in lost productivity, lost revenue or lost customers.
3. Wishes to establish an advantageous SLA
(service level agreement) with its users or customers, providing them with a
superior and competitive level of service.
This white paper discusses the implementation
details of a DNS-based Global Server Load Balancing solution, with
particular reference to Zeus’ ZXTM GLB product.
Examples
Disaster Recovery
A specialist music and book retailer turns over
orders in excess of $10,000 per day. Any period where users could not access
the online shop would result in significant loss of revenue and reputation.
The retailer hosts their primary website in a
hosting facility in New York, and replicates all database transactions to a
second backup website in Boston. During normal operation, users are directed
to the New York website, but if that website becomes unavailable, a GSLB
system directs all users to the backup site in Boston.
When a contractor severed a fiber optic cable in
the New York hosting facility, the GSLB device detected that the site was no
longer accessible and immediately started directing users to the backup site
in Boston instead. Because the database was continually replicated, users
were able to continue with their transactions and complete their purchases.
Providing high levels of service
A UK-based publishing company publishes several
prestigious scientific journals. Universities and research institutions
across the world pay a subscription to access the content of these journals
electronically.
A disaster recovery solution is required because
the paid subscribers will not tolerate downtime. In addition, many of the
subscribers in the US, Far East and Australasia report that the website is
slow, and it can take too long to download the PDF content they have paid
for.
The publishing company establishes mirror sites in
the US and Japan and uses a GSLB device to seamlessly direct each user to
the site that is geographically closest to them. Download times for many
customers drop by up to 75%.
Upselling services to Hosting Customers
An innovative ISP was seeking additional services
they could provide to their hosting customers.
Using data replication to a server platform located
in a different datacenter, the ISP was able to synchronize customers’ web
content between two locations. With a GSLB device, the ISP was able to direct
traffic for some customer sites to the City North datacenter, and other
sites to the City South datacenter, and thus control and manage the
bandwidth used by each datacenter.
The ISP’s customers’ SLA contracts contained
exclusions for major datacenter failure caused by elements outside the ISP’s
control. For an additional fee, the ISP was able to upsell a premium hosting
package that included a datacenter failover service to minimize the risk of
a datacenter failure rendering a customer’s site inaccessible.
How does Global Server Load Balancing work?
DNS-based Global Server Load Balancing
The majority of GSLB devices function by
manipulating the DNS (Domain Name System) resolution process.
An application such as a web browser needs to
locate a service on the Internet before it can use it. Services are
published using a Domain Name, such as www.zeus.com.
Behind the scenes, the application uses a process
called ‘DNS Resolution’ to find out the
IP Address of the Internet server that provides
the service with the given domain name. The DNS system is very much like a
global Internet phone book - you may know an individual by their full name
(for example, "Tim Berners-Lee"), but you need to look up their phone number
before you can get in touch with them.
Different servers in different locations will have
different IP addresses. A GSLB device controls how domain names are resolved
to IP addresses, and thus controls which datacenter clients are directed to.

Several users access
http://www.zeus.com, but are
directed to different datacenters:
- When users in the US try to access
  www.zeus.com, they are directed to IP address 45.6.1.12
- Users in other locations are directed to IP
  address 103.12.253.4
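The behaviour in this example can be pictured as a lookup whose answer depends on where the request comes from. The sketch below is purely illustrative and is not how any real GSLB device is implemented: the `resolve` function and the region labels are hypothetical, and only the domain name and the two IP addresses are taken from the example.

```python
# Hypothetical sketch: the answer for a name depends on the client's region.
# The two IP addresses come from the example; the rest is illustrative.
DATACENTER_FOR_REGION = {"US": "45.6.1.12"}
DEFAULT_DATACENTER = "103.12.253.4"


def resolve(name: str, client_region: str) -> str:
    """Return the IP address a client in client_region is given for name."""
    if name != "www.zeus.com":
        raise KeyError(f"no record for {name}")
    # US clients get the US datacenter; everyone else gets the default one.
    return DATACENTER_FOR_REGION.get(client_region, DEFAULT_DATACENTER)
```

A real device would derive the region from the source address of the DNS request rather than receive it as an argument.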
In order to effectively deploy a GSLB solution, you
need a good understanding of how the DNS system functions. For background
reading, you may find the Zeus publication "A Layman’s Guide to DNS" useful.
Other GSLB designs
Other techniques are sometimes used to load balance
users across several globally-distributed datacenters.
Application Level Redirection
Some application protocols, such as HTTP, allow for
‘redirection’ messages. A user accesses
www.zeus.com, but receives a
redirect sending him to us.zeus.com, which resolves to just one of the
datacenters.
This method is effective at controlling precisely
which datacenter a user is sent to, but it does not cater for datacenter
failure, and users may bookmark or distribute links to us.zeus.com,
bypassing the load-balancing decision.
Generally, this method needs to be combined with a
DNS-based GSLB system, to ensure that
www.zeus.com is always available, and a
traffic management device, to control how and when users are redirected.
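As a rough illustration of application-level redirection, the sketch below builds the status code and header of an HTTP redirect that sends a request for www.zeus.com to a datacenter-specific host name. The function name is hypothetical, and the choice of a temporary (302) redirect is an assumption made for the example.

```python
def redirect_response(path: str, datacenter_host: str):
    """Build the status and headers of an HTTP redirect to one datacenter.

    A 302 (temporary) redirect is an assumption; it lets the
    load-balancing decision change on later requests.
    """
    location = f"http://{datacenter_host}{path}"
    return 302, {"Location": location}
```

For example, `redirect_response("/index.html", "us.zeus.com")` yields a 302 with a `Location` of `http://us.zeus.com/index.html`, which is exactly the URL a user might later bookmark.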
Triangulation
With triangulation, incoming network traffic is
distributed across one or more datacenters using round-robin DNS. When a
datacenter receives a request, it determines whether it is best suited to
respond to the request, or whether it should forward the request to a
different datacenter.
With Layer 4 triangulation, the first datacenter
forwards the request to the second datacenter, and the second responds
directly to the remote client. The request and response data take three
hops across the network. Layer 4 triangulation may not be possible if one of
the service providers deploys egress filtering to defeat connection
source-address spoofing (such filtering is often used to prevent spam email).
With Layer 7 triangulation, the first datacenter
forwards the request to the second, and the second datacenter replies back
to the first. The first datacenter then relays the response back to the
client. The request and response data take four hops over the network.
Triangulation can load-balance very
compute-intensive application requests, but it generally does not improve
response time, it is bandwidth-intensive, and it does not cater for primary
datacenter failure.
BGP Routing Control
BGP (Border Gateway Protocol) is the core routing
protocol of the Internet. By manipulating BGP routing tables, it is possible
to move blocks of IP addresses from one physical network location to another
in a very different location.
BGP routing control can be used by an ISP to
provide large-scale failover, but it is too expensive and coarse to provide
fine-grained load balancing control for an individual service.
Introducing ZXTM Global Load Balancer
ZXTM Global Load Balancer (ZXTM GLB) is a DNS-based
global server load balancing system.
Typical deployment procedure
ZXTM GLB can be deployed in a step-by-step, low
risk manner with minimal interference or disruption to existing
infrastructure.
The ZXTM GLB devices work alongside the existing
DNS infrastructure, taking the DNS responses and manipulating them to
control where each remote user is directed to. The ZXTM GLB devices do not
replace any existing DNS servers, and all DNS information is stored on the
DNS servers as before.
Begin with Round-Robin DNS
For example, suppose that the service www.zeus.com
is hosted in two different locations, with IP addresses 21.2.12.1 and
45.4.54.5. Without a GSLB device, the DNS server would normally be
configured to return both of these IP addresses when queried about
www.zeus.com. The IP addresses would be returned in a different order each
time using a process called Round-Robin DNS, and clients would connect to
one of the datacenters.
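Round-robin DNS can be sketched in a few lines. The class below is a simplified model, not the behaviour of any particular DNS server: it returns every address on each query and rotates the order between queries. The class and method names are hypothetical; the IP addresses are those from the example.

```python
from collections import deque


class RoundRobinRecord:
    """Model of a DNS name with several addresses, served round-robin."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        """Return all addresses, then rotate so the next answer starts
        with a different one."""
        response = list(self._addresses)
        self._addresses.rotate(-1)
        return response


record = RoundRobinRecord(["21.2.12.1", "45.4.54.5"])
first = record.answer()   # starts with 21.2.12.1
second = record.answer()  # starts with 45.4.54.5
```

Because most clients connect to the first address they receive, rotating the order spreads connections roughly evenly across the datacenters, regardless of their health or load.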
Add in ZXTM GLB
ZXTM GLB builds on this standard configuration by
manipulating the round-robin DNS responses:

1. The end user makes a DNS request for
www.zeus.com.
2. ZXTM GLB forwards the DNS request to the
existing DNS server.
3. The DNS server responds with all IP addresses in
a round-robin fashion.
4. ZXTM GLB chooses one IP address and masks out
the others from the response.
The key load-balancing decision that ZXTM GLB
performs is to decide which IP address(es) should be returned to each remote
user. This decision directly controls which datacenter each remote user
uses.
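The masking step can be sketched as a filter over the DNS server's answer. This is a minimal model under stated assumptions, not the product's actual logic: the function name, the `available` set and the `preference` list are hypothetical, and passing the answer through unchanged when no datacenter is known to be healthy is an assumption made for the sketch.

```python
def mask_response(upstream_answer, available, preference):
    """Reduce a round-robin DNS answer to a single chosen address.

    upstream_answer: addresses returned by the existing DNS server
    available:       set of addresses whose datacenter is currently healthy
    preference:      addresses ordered from most to least preferred
    """
    candidates = [ip for ip in upstream_answer if ip in available]
    if not candidates:
        # Assumption: with nothing known healthy, leave the answer unchanged.
        return list(upstream_answer)

    def rank(ip):
        # Addresses missing from the preference list rank last.
        return preference.index(ip) if ip in preference else len(preference)

    return [min(candidates, key=rank)]
```

Note that the function only removes addresses from the answer; it never invents one, which mirrors the point that all DNS information stays on the existing DNS servers.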
Just one change needs to be made to the DNS
information so that clients make DNS lookups through the GLB device rather
than directly to the DNS servers. This change can be made by altering the NS
record for the domain, or by adding a CNAME. Please refer to the ZXTM GLB
documentation for more information.
DNS TTLs
DNS information is commonly cached (remembered) by
intermediaries across the network. This caching behavior is advantageous
because it reduces the amount of DNS traffic, but can impede the operation
of a DNS-based Global Server Load Balancing device.
An important element in a DNS response is the TTL
(time to live) value. This value informs any intermediaries as to how long
the DNS response can be cached for. ZXTM GLB can rewrite TTL values in the
DNS responses it has managed, overwriting a long default value with a much
shorter one. The effect of the change (increased DNS traffic) can be easily
observed using the real-time visualization tools in ZXTM GLB, so you can
choose a suitable value that balances traffic rates with responsive failover.
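TTL rewriting amounts to capping the value carried in each record. The sketch below assumes a simplified record type (`ARecord`) and a hypothetical `cap_ttl` helper; neither is part of any real DNS library or of ZXTM GLB.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ARecord:
    name: str
    address: str
    ttl: int  # seconds the answer may be cached


def cap_ttl(records, max_ttl):
    """Return copies of the records with any TTL above max_ttl lowered to it."""
    return [replace(r, ttl=min(r.ttl, max_ttl)) for r in records]


# A one-day TTL is cut to 30 seconds; shorter TTLs are left alone.
rewritten = cap_ttl([ARecord("www.zeus.com", "21.2.12.1", 86400)], 30)
```

The trade-off is direct: a cached answer can outlive a datacenter failure by up to one TTL, so a lower cap shortens failover time at the cost of more frequent DNS lookups.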
How does ZXTM GLB work in practice?
One or more ZXTM GLB devices are deployed in each
datacenter. The ZXTM GLB devices monitor the performance and availability of
their own datacenter, and broadcast that information to the other ZXTM GLB
devices in the other datacenters.

This way, every ZXTM GLB device knows the
availability and performance of every datacenter.
Active-Active load balancing configurations
Any ZXTM GLB device may receive a DNS request for a
service running in the datacenters. When the datacenters are running in
active-active mode, the ZXTM GLB device chooses
which datacenter the user should be directed to. This decision is based on
three criteria:
- Datacenter Availability: If a datacenter has failed, users are
  not directed there.
- Datacenter Performance:
  Datacenters with better response times are preferred over slower, more
  overloaded datacenters.
- Geographic Proximity:
  ZXTM GLB uses a comprehensive database that maps IP addresses to
  geographic locations, and calculates the geographic distance between the
  end user and each datacenter.
The decision can be tuned so that it is based
purely on load, purely on geographic location, or on a mixture of the two.
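One way to picture this tuning is a score that blends distance and load with a single weight. The sketch below is an assumption made for illustration and is not the algorithm ZXTM GLB uses: the scoring formula, the `geo_weight` parameter and the dictionary fields are all hypothetical. Distance is computed with the standard haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def choose_datacenter(client_lat, client_lon, datacenters, geo_weight):
    """Pick the available datacenter with the lowest blended score.

    geo_weight = 1.0 decides purely on distance, 0.0 purely on load.
    Each datacenter is a dict with name, lat, lon, load (0..1), available.
    """
    live = [dc for dc in datacenters if dc["available"]]
    if not live:
        return None
    max_distance = math.pi * EARTH_RADIUS_KM  # half the Earth's circumference

    def score(dc):
        distance = distance_km(client_lat, client_lon, dc["lat"], dc["lon"])
        return (geo_weight * distance / max_distance
                + (1.0 - geo_weight) * dc["load"])

    return min(live, key=score)["name"]
```

Availability acts as a hard filter before any scoring, matching the first criterion: a failed datacenter is never chosen, however close or lightly loaded it appears.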

The benefits of an active-active load balancing
mode are that it gives better datacenter utilization, that users get the
best possible level of service from the closest, best performing datacenter,
and that the configuration provides full failover in the event of a
datacenter failure.
However, you may not wish to use an active-active
configuration if the applications you are balancing cannot be run in
multiple datacenters simultaneously - for example, because they depend on a
single database or SAN that cannot be continuously replicated over multiple
sites. In this case, an active-passive configuration is more appropriate.
Additionally, one side-effect of an active-active
load balancing mode is that an end user may spontaneously be redirected from
one datacenter to another when his client software makes a fresh DNS
request. For example, the datacenter he is accessing may become overloaded
and the load-balancing algorithm may assign him to a different datacenter.
If this behavior is undesirable, you can overcome
it in several ways. You can use the fully deterministic ‘Geo’
load-balancing method, or you can use Application-level redirection to
detect a user’s session and direct that user to a particular datacenter
when required. Please consult the ‘Multi-site session persistence with ZXTM
GLB and ZXTM’ document for a full description of this technique.
Active-Passive load balancing configurations
When the datacenters are running in
active-passive mode, the load balancing
decision is much simpler. You first specify the order in which the
datacenters should be used, for example Hudson first and Cambridge second.

All users are directed to the first datacenter
(Hudson in this case) so long as that datacenter is available.
If the first datacenter fails, all users are
directed to the second datacenter (Cambridge); you can build arbitrarily
long chains of datacenters for multiple levels of failover.
If the first datacenter recovers, you can specify
how the service should fail back. If automatic failback is enabled, users
will immediately be directed to the first datacenter again. If it is
disabled, users continue to use the second datacenter until the
administrator manually indicates that the first datacenter is ready to
receive traffic again.
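The failover and failback rules can be captured in a small state machine. The sketch below is a simplified model with hypothetical class and method names, not the product's implementation; the datacenter names are those used in the example.

```python
class ActivePassiveSelector:
    """Direct all users to the first available datacenter in a fixed order."""

    def __init__(self, order, auto_failback=True):
        self.order = list(order)
        self.auto_failback = auto_failback
        self.current = self.order[0]

    def select(self, available):
        """Return the datacenter to use, given the set of available ones."""
        first_up = next((dc for dc in self.order if dc in available), None)
        if first_up is None:
            return None  # every datacenter in the chain is down
        # Move if the current datacenter has failed, or if automatic
        # failback lets an earlier datacenter reclaim traffic.
        if self.auto_failback or self.current not in available:
            self.current = first_up
        return self.current

    def manual_failback(self):
        """Administrator signals that the first datacenter is ready again."""
        self.current = self.order[0]


selector = ActivePassiveSelector(["Hudson", "Cambridge"], auto_failback=False)
selector.select({"Cambridge"})            # Hudson is down: Cambridge
selector.select({"Hudson", "Cambridge"})  # Hudson is back: still Cambridge
```

With `auto_failback=False`, traffic stays on Cambridge after Hudson recovers until `manual_failback()` is called, which gives the administrator time to check that replicated data is consistent before switching back.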
The benefit of this configuration is that it gives
a very deterministic, controllable disaster recovery solution, ideally
suited for complex, stateful applications.
Availability and Performance Checking
ZXTM GLB checks the performance and correct
operation of the services in the local datacenter using a range of
application monitors. These monitors can run simple tests like network
pings, or complex tests like HTTP GETs to verify that returned pages match
particular criteria.
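The HTTP style of check can be illustrated with a small predicate that passes only when the status code is 200 and the page body matches an expected pattern. This is a sketch, not one of the product's monitors: the function name and the choice of a regular expression as the matching criterion are assumptions.

```python
import re


def page_matches(status: int, body: str, pattern: str) -> bool:
    """Pass only if the response is a 200 and the body contains the pattern."""
    return status == 200 and re.search(pattern, body) is not None
```

Checking the body as well as the status catches the case where a web server is up and returns 200, but the page it serves is an error message from a failed back end.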
Performance data can optionally be deduced from the
response times from selected monitors, or it can be supplied separately
using a standards-compliant SOAP interface. This performance data is used to
weight how much each datacenter is used when the Load or
Adaptive load balancing algorithm is selected.
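To make the idea of weighting by performance concrete, the sketch below turns measured response times into traffic shares that are inversely proportional to response time. This inverse weighting is an assumption chosen for illustration; the actual Load and Adaptive algorithms are not described here, and the function name is hypothetical.

```python
def weights_from_response_times(response_ms):
    """Map each datacenter to a traffic share inversely proportional to its
    response time, with the shares normalised to sum to 1."""
    inverse = {dc: 1.0 / ms for dc, ms in response_ms.items() if ms > 0}
    total = sum(inverse.values())
    return {dc: value / total for dc, value in inverse.items()}
```

Under this scheme a datacenter that responds in 100 ms receives three times the share of one that responds in 300 ms.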
ZXTM GLB can also run an external connectivity
monitor to verify that its datacenter has connectivity to an upstream
location on the Internet.
ZXTM GLB broadcasts the health and performance data
to the other ZXTM GLB devices in the other datacenters. It deduces that
other datacenters are available if it hears the health and performance
information from the ZXTM GLBs in those datacenters. For this reason,
organizations typically operate a pair of ZXTM GLB devices in each
datacenter, thus removing a possible single-point-of-failure within each
datacenter.
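Deducing availability from received broadcasts can be modelled as a table of last-heard timestamps with a timeout. The sketch below is illustrative only: the class name, the method names and the timeout value are hypothetical and do not describe the product's protocol.

```python
class PeerHealthTable:
    """Track when each datacenter's health broadcast was last heard."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._last_heard = {}

    def record_broadcast(self, datacenter, now):
        """Note that a broadcast from datacenter arrived at time now."""
        self._last_heard[datacenter] = now

    def available(self, now):
        """Datacenters heard from within the last timeout_s seconds."""
        return {dc for dc, heard in self._last_heard.items()
                if now - heard <= self.timeout_s}


table = PeerHealthTable(timeout_s=5.0)
table.record_broadcast("Hudson", now=0.0)
table.record_broadcast("Cambridge", now=4.0)
```

In this model a datacenter with a single device looks failed as soon as that one device stops broadcasting, which is why running a pair of devices per datacenter removes the single point of failure.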
Conclusion
ZXTM GLB is a complete DNS-based Global Server Load
Balancing solution that provides:
- Business Continuity in the event of catastrophic datacenter
  failure
- Improved Customer Experience
  by routing users to the closest, best performing
  datacenter
ZXTM GLB is very easy to deploy, with minimal
infrastructure changes and very little operational risk.
The rich real-time visualization and reporting in
ZXTM GLB gives a clear picture of the effectiveness of the Global Server
Load Balancing configuration and the activity of your users globally at any
time.

The Global Map view in ZXTM GLB shows real-time
site activity. It is ideal for public display in a network operations center
or boardroom!
For Further Information
To find out more about ZXTM Global
Load Balancer or to arrange a demonstration or product evaluation, please
visit
http://www.zeus.com/products/zxtmglb/
The ZXTM KnowledgeHub is a key resource for
developers and system administrators wishing to learn about ZXTM and Zeus’
Traffic Management solutions. It is located at
http://knowledgehub.zeus.com/
Editorial Policy: Nothing you read in
The Business Forum Journal
should ever be construed to be the
opinion of, statements condoned by, or advice from,
The Business Forum Institute, its staff, workers,
officers, members, directors, sponsors or
shareholders. We pass no opinion whatsoever on the
content of what we publish, nor do we accept any
responsibility for the claims, or any of the
statements made, within anything published herein.
We merely aim to provide an academic forum and an
information sourcing vehicle for the benefit of the
business and the academic communities of the Pacific
States of America and the World. Therefore, readers
must always determine for themselves where the
statistics, comments, statements and advice that are
published herein are gained from and act, or not
act, upon such entirely and always at their own
risk. We accept absolutely no liability
whatsoever, nor take any responsibility for what
anyone does, or does not do, based upon what is
published herein, or information gained through the
use of links to other web sites included herein.
The Business
Forum Beverly Hills, California, United States of America