Cluster Basics
Intro to Microsoft Cluster Server
High Availability
Manageability
Scalability
Application and Service Support
Microsoft Cluster Server and Windows NT Load Balancing Service
What is a server "cluster"?
A server cluster is a group of independent servers managed as a single system
for higher availability, easier manageability, and greater scalability.
What does it take to create a server cluster?
The minimum requirements for a server cluster are (a) two servers connected by a
network, (b) a method for each server to access the other's disk data, and (c)
special cluster software like Microsoft® Cluster Server (MSCS). The
special software provides services such as failure detection, recovery, and the
ability to manage the servers as a single system. In addition, the hardware must be
on the Microsoft Hardware Compatibility List (see http://www.microsoft.com/hwtest/hcl
under clustering).
What are the benefits of server clustering?
There are three primary benefits to server clustering: improved availability,
easier manageability, and more cost-effective scalability. Using Microsoft
Cluster Server as an example:
What are clusters used for?
Customer surveys indicate that MSCS clusters will be used as highly available
multipurpose platforms, mirroring the current uses of the Microsoft Windows NT®
Server operating system. Surveyed customers suggested that the most common uses
of MSCS clusters will be mission-critical database management, file/intranet
data sharing, messaging, and general business applications.
When a cluster is recovering from a server failure, how does the surviving
server get access to the failed server's disk data?
There are basically three techniques that clusters use to make disk data
available to more than one server:
Intro to Microsoft Cluster Server
What is "Wolfpack"?
"Wolfpack" was the code name for Microsoft Cluster Server.
What is Microsoft Cluster Server (MSCS)?
MSCS is a built-in feature of Windows NT Server, Enterprise Edition. It is
software that supports the connection of two servers into a "cluster"
for higher availability and easier manageability of data and applications. MSCS
can automatically detect and recover from server or application failures. It can
be used to move server workload to balance utilization and to provide for
planned maintenance without downtime. And, over time, MSCS will also become a
platform for highly scalable, cluster-aware applications.
How many servers can be in an MSCS cluster?
The initial release of MSCS supports clusters with two servers. A future version
referred to as MSCS "Phase 2" will support larger clusters, and will
include enhanced services to simplify the creation of highly scalable,
cluster-aware applications.
When will MSCS be available?
MSCS is available now in Windows NT Server 4.0 Enterprise Edition.
What other companies were involved in the development of MSCS?
Microsoft worked closely with leading hardware vendors, software vendors, and
customers in the specification and development of MSCS and its API. These other
companies participated through five different programs:
In what languages will MSCS be available?
Microsoft Windows NT Server, Enterprise Edition 4.0, which includes MSCS
1.0, is available in English, French, German, Japanese, and Spanish.
Through what channels is Windows NT Server, Enterprise Edition
available?
Microsoft Windows NT Server, Enterprise Edition is available to customers
through all standard channels: reseller, retail, OEM, and the Microsoft Select
licensing program.
What versions of Windows NT Server does MSCS support?
MSCS software is only available as a built-in feature of Windows NT Server
4.0, Enterprise Edition.
Will MSCS be extended beyond Windows NT Server to Windows NT Workstation?
There is currently no plan to extend cluster support to Windows NT
Workstation. MSCS software has been designed and written to closely integrate
with the architecture and features of Windows NT Server, including its
server-oriented networking and directory services capabilities.
What clients can connect to an MSCS cluster?
Any client that can connect to Windows NT Server through TCP/IP will work
with MSCS. This includes Microsoft MS-DOS®, Microsoft Windows®
3.x, Windows 95, Windows NT, Apple Macintosh, and UNIX. MSCS does not
require any special software on the client for transparent recovery of services
that connect to clients through standard IP protocols.
High Availability
How does MSCS provide high availability?
MSCS uses software "heartbeats" to detect failed applications or
servers. In the event of a server failure, it employs a "shared
nothing" clustering architecture that automatically transfers ownership of
resources (such as disk drives and IP addresses) from a failed server to a
surviving server. It then restarts the failed server's workload on the surviving
server. All of this—from detection to restart—typically takes under a
minute. If an individual application fails (but the server does not), MSCS will
typically try to restart the application on the same server; if that fails, it
moves the application's resources and restarts it on the other server. The
cluster administrator can use a graphical console to set various recovery
policies, such as dependencies between applications, whether or not to restart
an application on the same server, and whether or not to automatically "failback"
(rebalance) workloads when a failed server comes back online.
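The detection-and-transfer sequence described above can be illustrated with a minimal Python sketch. It is not MSCS code: the `Node` class, the `monitor` function, and the 5-second timeout are hypothetical names and values chosen for illustration, and heartbeats are modeled as plain timestamps rather than messages on the cluster interconnect.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    last_heartbeat: float                       # time of last heartbeat, in seconds
    groups: list = field(default_factory=list)  # resource groups this node owns

def is_alive(node, now, timeout=5.0):
    """A node counts as alive if its last heartbeat is recent enough."""
    return now - node.last_heartbeat <= timeout

def fail_over(failed, survivor):
    """Transfer ownership of every group from the failed node to the survivor."""
    survivor.groups.extend(failed.groups)
    failed.groups.clear()

def monitor(a, b, now, timeout=5.0):
    """Check both nodes once and fail over if exactly one has gone silent."""
    a_up, b_up = is_alive(a, now, timeout), is_alive(b, now, timeout)
    if a_up and not b_up:
        fail_over(b, a)
    elif b_up and not a_up:
        fail_over(a, b)

# Server A was last heard from 10 seconds ago, Server B 2 seconds ago.
node_a = Node("Server A", last_heartbeat=100.0, groups=["SQL", "FileShare"])
node_b = Node("Server B", last_heartbeat=108.0, groups=["Web"])
monitor(node_a, node_b, now=110.0)
print(node_b.groups)  # Server B now owns all three groups
```

When both nodes appear silent to each other, this sketch deliberately does nothing; resolving that case is the job of the quorum resource discussed later in this section.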
Can MSCS provide "zero downtime"?
No. MSCS can dramatically reduce planned and unplanned downtime. However, even
with MSCS, a server could still experience downtime from the following events:
Microsoft recommends that clusters be used as one element in customers' overall programs to provide high integrity and high availability for their mission-critical server-based data and applications.
Is MSCS failover transparent to users?
MSCS does not require any special software on client computers, so the user
experience during failover depends on the nature of the client side of their
client-server application. Client reconnection is often transparent, because
MSCS has restarted the applications, file shares, and so on, at exactly the same
IP address.
If a client is using "state-less" connections such as a standard browser connection, then it would be unaware of a failover if it occurred between server requests. If a failure occurs while a client is connected to the failed resource, then the client will receive whatever standard notification is provided by the client side of the application in use when the server side becomes unavailable. This might be, for example, the standard "Abort, Retry, or Cancel?" prompt you get when using Windows Explorer to download a file at the time a server or network goes down. In this case, client reconnection is not automatic (the user must choose "Retry"), but the user is fully informed of what's happening and has a simple, well-understood method of reestablishing contact with the server. Of course, in the meantime, MSCS is busily restarting the service or application so that, when the user chooses "Retry," it reappears as if it never went away.
For client-side applications that have "state-full" connections to the server, a new logon is typically required following a server failure. In many cases, this approach is required for security purposes. For example, this is how SAP R/3 works—if the server connection is lost, the user is prompted to log on again to make sure it's the same user accessing the application.
Even with state-full connections, it's possible for an application to automatically reconnect following a failover. For example, when Microsoft demonstrated SAP R/3 failover at Microsoft Scalability Day in New York City on May 20, it was accessed through an Active browser application that had automatically (and securely) cached the user's ID and password from the initial logon. Thus, when the server connection was momentarily lost during the failover demo, the client application automatically logged on again using the cached ID and password. This was done using standard IP connections, running a simple Microsoft Visual Basic® development system program within an HTML document through the Microsoft ActiveX® technology.
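The automatic reconnection idea can be sketched in Python as a client-side retry loop. This is an illustration only, not the Visual Basic program used in the demonstration: `call_with_relogin`, `request`, and `login` are hypothetical names, and a lost server connection is modeled as Python's built-in `ConnectionError`.

```python
import time

def call_with_relogin(request, login, credentials, retries=3, delay=0.0):
    """Run `request(session)`; on a lost connection, log on again and retry."""
    session = login(credentials)
    for attempt in range(retries + 1):
        try:
            return request(session)
        except ConnectionError:
            if attempt == retries:
                raise                     # give up after the last retry
            time.sleep(delay)             # give the cluster time to fail over
            session = login(credentials)  # reuse the cached ID and password
```

Because the service comes back at the same IP address after failover, the client only needs to repeat the logon and the request; it does not need to know which physical server now hosts the application.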
When a server comes back online following a failure, is there any human
intervention required to get it back "up and running," or is the
heartbeat enough for the other server to include it once again?
No manual intervention is required. When a server running Microsoft Cluster
Server, say "Server A," boots, it starts the MSCS service
automatically. MSCS in turn checks the interconnect (and network if necessary)
to find the other server in its cluster, say "Server B." If Server A
finds Server B, then Server A rejoins the cluster and Server B updates it with
current cluster status info. Server A then initiates "failback,"
moving back failed-over workload from Server B to Server A at an appropriate
time.
What is "failback," and how does it work in MSCS?
"Failback" is the ability to automatically rebalance the workload in a
cluster when a failed server comes back online. This is a standard feature of
MSCS. For example, say "Server A" has crashed and its workload
failed-over to "Server B." When Server A reboots, it automatically
finds Server B and rejoins the cluster. It then checks to see if any of the
cluster groups running on Server B would "prefer" to be running on
Server A. If so, it automatically moves those groups from Server B to Server A
as soon as the time is right. Failback properties—that is, which groups can
failback, which is their preferred server, and during what hours the time is
"right" for failback—are all set from the cluster administration
console.
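The failback properties listed above (which groups may fail back, their preferred server, and the permitted hours) can be modeled in a few lines of Python. The `Group` class and its field names are hypothetical and do not correspond to the actual MSCS property names; the sketch only shows how the three settings combine into one decision.

```python
from dataclasses import dataclass

@dataclass
class Group:
    name: str
    owner: str                  # node currently running the group
    preferred: str              # node the group would "prefer" to run on
    allow_failback: bool = True
    window: tuple = (0, 24)     # hours [start, end) in which failback may happen

def in_window(hour, window):
    """Return True if `hour` falls inside the failback window."""
    start, end = window
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end   # window wraps past midnight

def failback(groups, rejoined, hour):
    """Move eligible groups back to the node that just rejoined the cluster."""
    moved = []
    for g in groups:
        if (g.allow_failback and g.preferred == rejoined
                and g.owner != rejoined and in_window(hour, g.window)):
            g.owner = rejoined
            moved.append(g.name)
    return moved
```

For example, a group with `preferred="A"` and `window=(22, 6)` that is running on Server B would move back to Server A at 23:00 but stay on Server B at noon.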
Can the servers in an MSCS cluster be located at separate locations for
recovery from site disasters?
Not at this time. All of the cluster configurations currently being considered
for validation use SCSI connections to storage resources, which limits the
distance between clustered servers to the distance supported by standard SCSI.
This is typically no more than 25 meters, though there are SCSI extender
technologies that can potentially stretch the connection up to 1,000 meters.
Note that Windows NT Server customers already have several choices for software that can mirror data to remote disaster recovery sites, including solutions from N.S.I., Octopus, Veritas, and Vinca. Most of these vendors have already announced that their disaster site mirroring solutions will also work with MSCS clusters.
Can MSCS restore registry keys for an application from one server to the
other when doing failover?
Yes. Recovery of an application's registry information is a configurable feature
that is available to the Generic Application and Generic Service resource types.
Basically, you tell it what registry keys to log and recover, and that's all
there is to it. This capability should be used if the application or service
stores volatile information in specific registry keys. If this is done, when the
resource comes online on another node, it will have the same registry
information as the previously online resource.
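As a rough illustration of the idea, the following Python sketch models each server's registry as a dictionary. The `checkpoint` and `restore` functions and the key names are hypothetical; the real feature is configured on the resource rather than coded by hand.

```python
def checkpoint(registry, tracked_keys):
    """Copy the tracked keys out of a node's registry (modeled as a dict)."""
    return {k: registry[k] for k in tracked_keys if k in registry}

def restore(registry, saved):
    """Apply checkpointed values to the registry of the node taking over."""
    registry.update(saved)
    return registry

# Hypothetical registry contents on the node that owns the resource.
server1 = {"App/Port": 8080, "App/Mode": "fast", "Unrelated": 1}
saved = checkpoint(server1, ["App/Port", "App/Mode"])

# The other node keeps its own unrelated keys and gains the tracked ones.
server2 = restore({"Unrelated": 2}, saved)
print(server2)
```

Only the keys named for tracking are carried across; everything else in the second server's registry is left untouched.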
When an application restarts on another server following a failure, does
it re-start from a copy of the application?
No. The new server (say, "Server 2") would start the application from
the same physical disks as Server 1, since ownership of the application's disks
on the shared SCSI bus had been moved from Server 1 to Server 2 as one of the
first steps in the failover process. This approach assures that the application
always restarts from its last known state, as recorded on its disk drives (and,
if you use the available option, as recorded in its registry keys).
Can MSCS restore an application's "state" at the time of its
failure rather than requiring a complete restart?
MSCS can restore the state of an application's registry keys, but any other
state information must be managed and restored by the application. Applications
need to provide some model for persistence to ensure that state can be
recaptured. For example, Microsoft SQL Server™ uses transaction
logs to provide this assurance. If a server running Microsoft SQL Server
crashes, upon restart the application uses its transaction logs to bring the
database back to a known state. With a cluster, just as with a single server,
good application design and the use of ACID (Atomic, Consistent, Isolated, and
Durable) transaction properties are important.
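The role of a transaction log in restoring a known state can be shown with a simplified Python sketch. This is a teaching model, not the recovery algorithm used by Microsoft SQL Server: the log format (tuples of `"set"` and `"commit"` records) and the `recover` function are assumptions made for illustration.

```python
def recover(log):
    """Rebuild state from a log, applying only transactions that committed."""
    committed = {txn for op, txn, *_ in log if op == "commit"}
    state = {}
    for op, txn, *rest in log:
        if op == "set" and txn in committed:
            key, value = rest
            state[key] = value
    return state

# Transaction 2 was still in flight when the server failed, so it is ignored.
log = [
    ("set", 1, "balance", 100),
    ("commit", 1),
    ("set", 2, "balance", 50),
]
print(recover(log))
```

Because the log lives on the application's disks, and those disks move to the surviving server during failover, the same replay works whether the application restarts on the original server or on the other one.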
What is the granularity of resource failover?
MSCS supports failover of "virtual servers," which usually correspond
to applications, Web sites, print queues, or file shares (including their disk
spindles, files, IP addresses, and so on). MSCS also provides cluster-wide
services that are simultaneously available on all servers in the cluster,
including cluster administration, performance monitoring, event viewing, a
cluster name, and cluster time synchronization.
What is a "quorum disk" and how does it help MSCS provide high
availability?
It's a disk spindle that MSCS uses to determine whether or not another server is
up or down. Technically, it's a resource that can only be owned by one server at
a time, and for which servers can negotiate for ownership. Negotiating for the
quorum drive allows MSCS to avoid "split brain" situations where both
servers are active and think the other server is down. (This can happen when,
for example, the cluster interconnect is lost and network response time is
problematic.) The use of a quorum resource is one of the sophisticated
algorithms that Microsoft adopted by working with clustering pioneers such as
Digital and Tandem.
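The arbitration idea can be sketched in Python as a resource that at most one node can hold. This is a conceptual model only: `QuorumResource` and `on_interconnect_lost` are hypothetical names, and an in-process lock stands in for whatever mechanism the cluster uses to negotiate ownership of the physical disk.

```python
import threading

class QuorumResource:
    """A resource that at most one node can own at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None

    def try_acquire(self, node):
        """Return True if `node` owns the quorum resource after this call."""
        with self._lock:
            if self.owner is None or self.owner == node:
                self.owner = node
                return True
            return False

    def release(self, node):
        """Give up ownership, but only if `node` is the current owner."""
        with self._lock:
            if self.owner == node:
                self.owner = None

def on_interconnect_lost(node, quorum):
    """Decide what a node does when it can no longer hear its peer."""
    return "take over workload" if quorum.try_acquire(node) else "stand down"
```

If both servers lose contact with each other at the same moment, only the first one to win the quorum resource takes over the workload; the other stands down, which is what prevents the "split brain" situation.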
Manageability
How does MSCS improve the manageability of servers?
MSCS gives administrators a graphical console from which they can monitor and
manage all of the resources in a cluster as if it were a single system. Using the
familiar standards of a Microsoft Windows graphical user interface, an
administrator can use the cluster console to:
Audit the status of all servers and applications in the cluster.
Set up new applications, file shares, print queues, and so on, for high availability.
Administer the recovery policies for applications and resources.
Take applications offline, bring them back online, and move them from one server to another.
The ability to graphically move workload from one server to another with only a momentary pause in service (typically less than a minute) means administrators can easily unload servers for planned maintenance without taking important data and applications offline for long periods of time.
Does MSCS provide administrators with a "single system image"?
Yes. MSCS provides administrators a single graphical console to manage all of
the applications and resources in a cluster. The MSCS console presents cluster
resources by physical server, and by "virtual server" (or
"cluster group"). This allows administrators to centrally manage the
cluster as a collection of virtual application-oriented servers, or as a
collection of physical resources when appropriate.
Can MSCS be remotely managed?
Yes. An authorized user can run the MSCS administration console from any Windows NT
Workstation or Windows NT Server on the network. In the version of MSCS
accompanying Windows NT Server, Enterprise Edition 5.0, the cluster
administration console will be a "snap-in" to the Microsoft Management
Console, providing scriptable, remoteable access, including access through
Internet protocols from a browser.
How does MSCS help administrators do "rolling upgrades" of their
servers?
With MSCS, server administrators no longer have to do all their maintenance
within those rare windows of opportunity when no users are online. Instead, they
can simply wait until a convenient off-peak time when one of the servers in the
cluster has enough horsepower for all of the cluster workload. They then
point-and-click to move all the workload onto one server, and they're ready to
perform maintenance on the unloaded server. Once the maintenance is complete and
tested, they bring that server back online and it automatically rejoins the
cluster, ready for work. When convenient, the administrator repeats the process
to perform maintenance on the other server in the cluster. This ability to keep
applications and data online while performing server maintenance is often
referred to as doing "rolling upgrades" to your servers.
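The procedure above can be expressed as a short Python sketch for a two-server cluster. The `rolling_upgrade` function, the dictionary layout, and the `upgrade` callback are hypothetical; in practice the administrator performs these moves from the cluster console at a time of their choosing.

```python
def rolling_upgrade(cluster, upgrade):
    """Upgrade each node in turn while the other node carries all groups.

    `cluster` maps a node name to the list of groups it runs;
    `upgrade` is called once per node, after its workload has been moved.
    """
    nodes = list(cluster)
    for node in nodes:
        peer = next(n for n in nodes if n != node)
        cluster[peer].extend(cluster[node])  # move workload to the other node
        cluster[node] = []                   # this node is now unloaded
        upgrade(node)                        # perform and test the maintenance
    return cluster
```

In this sketch all workload ends up on the first server once both have been upgraded; moving groups back to their preferred servers afterward is a separate rebalancing step.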
Will Microsoft support "rolling upgrades" of future server
products using MSCS clusters?
It is Microsoft's goal to support "rolling upgrades" between releases
of Microsoft server software using MSCS clusters. However, we cannot commit to
this for all releases of all products. Persistent storage formats must
occasionally change to accommodate new capabilities, and changes in persistent
storage occasionally require applications to be taken offline while storage or
indices are restructured. Microsoft will commit to always providing smooth
upgrades between releases of all our products, and we'll use MSCS to provide
seamless rolling upgrades whenever possible.
Scalability
How will MSCS enhance server scalability?
The manageability benefits of the initial version of MSCS will simplify many of
the processes currently used to improve scalability, such as upgrading server
hardware and installing new versions of applications.
A future version of MSCS, "Phase 2," will support clusters containing large numbers of servers, and will provide enhanced abilities that simplify the creation of highly scalable, cluster-aware applications.
The Microsoft cluster strategy White Paper said MSCS is already
architected for multiple nodes. Has MSCS been tested on multinode clusters? If
so, why is Microsoft waiting to deliver multinode support?
Yes, Microsoft and other vendors have tested MSCS clusters with more than two
servers. These clusters "work" in that they are stable and the
administrator's console provides basic management for the multiserver
environment. However, the algorithms and features in the current software must
be extended and thoroughly tested on larger clusters before customers can
reliably use a multinode MSCS cluster for production work, or gain enhanced
cluster benefits. In addition, Microsoft will have to extend the cluster
hardware validation procedures to accommodate the additional requirements of
multinode clusters.
Microsoft has architected MSCS for multinode support in preparation for the coming "Phase 2" version. Today's multinode tests have proven the architecture is correct. However, there are two key reasons Microsoft is limiting the initial release to two-server clusters:
How will MSCS help do load balancing?
"Load balancing" is the ability to move work from a very busy server
to a less-busy server. MSCS will support load balancing in four ways over time:
Should cluster-aware applications developed for MSCS use a shared-disk or
shared-nothing architecture for greatest scalability?
Microsoft recommends a shared-nothing architecture for cluster-aware
applications because of its greater scalability potential. With shared-disk
applications, copies of the application running on two or more servers in the
cluster share concurrent read/write access to a single set of disk files,
mediating ownership of the files using a "distributed lock manager" (DLM).
A shared-nothing application, on the other hand, avoids the potential bottleneck
of shared resources and a DLM by partitioning or replicating the data so that
each server in the cluster works primarily with its own data and disk resources.
In theory, MSCS can support either type of application. However, Microsoft has
no plans at this time to include a DLM in the MSCS cluster services, so vendors
would have to develop or license a DLM to implement a shared-disk application on
MSCS. Microsoft has chosen to use the shared-nothing architecture for future
versions of Microsoft BackOffice® family applications because of
that architecture's greater potential for cluster-enabled scalability.
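The partitioning approach behind a shared-nothing design can be illustrated with a small Python sketch. It assumes simple hash partitioning over a fixed list of servers; `owner_of` and `partition` are hypothetical names, and real applications may partition by key range or replicate data instead.

```python
import zlib

def owner_of(key, servers):
    """Map a key to exactly one server using a stable hash."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]

def partition(keys, servers):
    """Group keys by the server that owns them."""
    parts = {s: [] for s in servers}
    for k in keys:
        parts[owner_of(k, servers)].append(k)
    return parts

servers = ["node1", "node2"]
keys = [f"customer-{i}" for i in range(100)]
parts = partition(keys, servers)
```

Each key has exactly one owner, so the servers never need to coordinate locks on the same data, which is the bottleneck a shared-disk design has to manage with a DLM.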
Will MSCS ever have a Distributed Lock Manager (DLM)?
Microsoft will not include a distributed lock manager in the first release of
MSCS. Enhancements in future releases will be determined based on customer
requirements.
When will Microsoft offer a parallel version of Microsoft SQL Server that
runs on multiple servers at the same time for automatic load balancing and
scalability?
The next major release after Microsoft SQL Server 7.0 is planned to offer
cluster-enabled scalability on MSCS clusters. It will use a scalable
"shared nothing" architecture to spread a single database across
multiple servers. Although this is an important direction for Microsoft SQL
Server, it must be kept in perspective: it will be needed by only a small
percentage of customers. Cluster-enabled scalability will be needed only by
extremely large enterprise applications that (a) are too large to run on a
single high-end SMP server (for example, eight-processor SMP with 4 GB of RAM),
and (b) cannot be partitioned to run on a distributed network using MTS.
What are Microsoft's plans for supporting Distributed Message Passing (DMP)?
Distributed Message Passing is one of the intracluster communications techniques
that are planned for Phase 2 of MSCS. (Another is I/O shipping.) Applications
will be able to access MSCS DMP services through extensions to the Cluster API.
MSCS in turn will host the DMP services over a variety of interconnect
technologies including new low-latency drivers based on the Virtual Interface
(VI) architecture. The result will be a standard infrastructure for supporting a
new generation of scalable, cluster-aware applications.
Application and Service Support
What types of applications and services will benefit from MSCS clustering?
There are three types of server applications that will benefit from MSCS
clusters:
What software vendors will offer cluster-aware applications for MSCS?
Software vendors that have already announced plans to offer products for MSCS
clusters include Baan, Cheyenne, Computer Associates (CA/Unicenter TNG), HP (ClusterView),
IBM (DB2), NetIQ, Octopus, Oracle (Oracle 7 Failsafe), SAP, Vinca, and, of
course, Microsoft (Microsoft SQL Server, Enterprise Edition, and Exchange
Server, Enterprise Edition). For an up-to-date list of announced products that
support MSCS, refer to the Microsoft Windows NT Server, Enterprise Edition
Solutions Directory.
Will Microsoft validate or logo software products that work with MSCS?
Microsoft will not have a validation program for MSCS-based software products at
first. It is expected that once MSCS clusters are deployed in volume and there
are sufficient examples of cluster-aware application products to evaluate,
Microsoft will extend its Microsoft BackOffice logo program to include, at a
minimum, validation of support for basic failover operation on an MSCS cluster.
What are Microsoft's plans for supporting Microsoft SQL Server on MSCS
clusters?
Microsoft SQL Server, Enterprise Edition version 6.5 is available now and
provides "active/active" cluster support (that is, both servers
can be running SQL Server, with each server supporting its own databases).
Microsoft SQL Server 7.0, currently in beta test, will include additional
cluster-aware enhancements that provide for faster recovery in the event of a
server or application failure. The version of Microsoft SQL Server that follows
release 7.0 will include new features for shared-nothing scalability on MSCS
clusters (that is, a single database will be able to span multiple servers).
What are Microsoft's plans for supporting Microsoft Exchange Server on
MSCS clusters?
Microsoft will support the Enterprise Edition of Microsoft Exchange Server on
MSCS clusters. Exchange Server 5.5 Enterprise Edition provides
"active/passive" failover on an MSCS cluster. This means Exchange
Server 5.5 Enterprise Edition is able to run on one server in the cluster at a
time, and MSCS will be able to automatically restart Exchange Server on the
other server following an application or server failure. Future versions of
Exchange Server will be enhanced for active/active failover (that is, the
ability to run Exchange Server simultaneously on both servers).
Can the standard versions of Microsoft SQL Server 6.5 or Exchange Server
5.0 be set up for failover on a cluster using the "generic
application" capability of MSCS?
Technically proficient customers who want to test Microsoft SQL Server 6.5 or
Exchange Server 5.0 on a cluster may do so using the generic application
capability of MSCS. However, the setup can be complex, and will not be supported
by Microsoft support services. Therefore, customers should only do so for
testing purposes, not for production deployments. Microsoft SQL Server,
Enterprise Edition version 6.5, and the "Osmium" release of Exchange
Server, Enterprise Edition will feature a simplified cluster setup procedure,
and will be fully supported for failover on MSCS clusters.
Will Microsoft SNA Server benefit from MSCS?
No, because Microsoft SNA Server already provides a hot failover capability
independent of MSCS.
Will Microsoft Proxy Server benefit from MSCS?
No, because the current version of Microsoft Proxy Server has its own capability
for chaining together multiple servers for high availability and scalability.
Will Microsoft Systems Management Server benefit from MSCS?
No, MSCS will not provide high availability for the current release of Microsoft
Systems Management Server. Microsoft intends to provide cluster-enabled high
availability for Systems Management Server in a future release.
Can MSCS fail over a Windows NT Server Directory (Domain) Controller?
No, because it is already possible to have backup directory service controllers
for high availability. Servers in an MSCS cluster may be either primary or
backup directory controllers for Windows NT Directory Services.
Can MSCS fail over a WINS (Windows Internet Name Service) server?
No, because it is already possible to have backup WINS servers for high
availability.
Can MSCS fail over Remote Access Services (RAS)?
Remote Access Services cannot benefit from MSCS at this time since there is no
standard method for doing software failover of modem connections. For higher
reliability of dial-up connections, you can use the RAS Multi-Link capability
first introduced in Windows NT Server 4.0.
Can MSCS fail over Microsoft Distributed File System (Dfs) directories?
Not in Windows NT Server, Enterprise Edition 4.0. The version of Dfs in Windows
2000 Server will provide directory replication for fault tolerance. When used on
the Enterprise Edition of Windows 2000 Server, Dfs will also work with MSCS
failover for fast recovery from server crashes.
What versions of Oracle will benefit from MSCS clusters?
Oracle has announced that Oracle Failsafe 2.0 is available for Oracle7 customers
at no extra cost. It provides "active/active" database failover on
MSCS clusters (that is, it can run on both servers at the same time, and either
can fail over to the other server in the event of an application or server
failure).
Does Tandem NonStop SQL/MX use MSCS?
Tandem NonStop SQL/MX uses MSCS clustering services when running on a two-server
cluster. NonStop SQL/MX uses its own single-application clustering services when
running on a cluster with more than two servers. Customers who want high
availability plus database scalability up to the performance provided by two
high-end SMP servers will benefit by running NonStop SQL/MX on MSCS to gain the
additional benefits of high availability for other services and applications on
the cluster. Customers who require additional scalability would use the built-in
single-application cluster services of NonStop SQL/MX, trading off general
availability services for the ability to scale on more than two servers.
Microsoft Cluster Server and Windows NT Load Balancing Service
How does Microsoft Cluster Server work with Windows NT Load Balancing
Service?
Windows NT Load Balancing Service is fully complementary to Microsoft
Cluster Server. Microsoft Cluster Server provides a reliable, nonstop
platform for database, messaging, and related application services through
failover clustering of two nodes. Windows NT Load Balancing Service
balances and distributes client connections (TCP/IP connections) over multiple
servers. In a three-tier model, MSCS handles the application layer and the data
layer, while Convoy, or Windows NT Load Balancing Service, handles the
front-end connections. When used together, Microsoft Cluster Server
and Windows NT Load Balancing Service provide customers with a highly
scalable, reliable, and available system. This is an industry-leading way to
combine transactional systems with a Web-based front end, and to deliver the
scale, availability, and robustness demanded by enterprise-class customers.
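To make the front-end role concrete, here is a minimal Python sketch of distributing client connections over several servers. It uses a least-connections policy purely for illustration; this is an assumption and not a description of how Windows NT Load Balancing Service chooses a server. The `Balancer` class and server names are hypothetical.

```python
class Balancer:
    """Assign each new client connection to the least-loaded front-end server."""

    def __init__(self, servers):
        self.connections = {s: 0 for s in servers}

    def connect(self):
        """Pick the server with the fewest connections and count the new one."""
        server = min(self.connections, key=self.connections.get)
        self.connections[server] += 1
        return server

    def remove(self, server):
        """Stop sending new connections to a server that has gone offline."""
        self.connections.pop(server, None)

balancer = Balancer(["web1", "web2", "web3"])
assigned = [balancer.connect() for _ in range(6)]
```

In the three-tier picture described above, servers like these would handle the front-end connections, while the application and data layers behind them rely on MSCS failover.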