Microsoft Cluster Service (MSCS)
    Hardware Validation
    Servers
    Storage
    Interconnect
    Networking
Windows Load Balancing Service (WLBS)
    Scalability
    Web Server Applications
    WLBS Hardware
    Configuration and Support
HARDWARE VALIDATION
How is MSCS cluster hardware validated?
Complete cluster configurations (two particular servers plus a storage solution)
are tested and validated using the Microsoft Cluster Hardware Compatibility Test
(HCT). Anyone with an appropriate lab setup can run the test. The test procedure
takes at least four days and requires roughly one-half of a full-time-equivalent
Microsoft Certified Professional. The result of a successful test is an encrypted file
that is returned to the Microsoft Windows Hardware Quality Lab (WHQL). Upon
validation of the test results, WHQL posts the tested configuration on the
Microsoft Hardware Compatibility List (HCL).
Are there restrictions on who can validate configurations, or to how many
configurations they can validate for MSCS?
There is no limit to the number of cluster configurations anyone can validate
using the Microsoft Cluster Hardware Compatibility Test (HCT). Because of the
lab setup, personnel, and time required to validate a cluster configuration, it
is likely that system vendors, component vendors, system integration firms, and
professional test labs will primarily do validations.
Where is the Microsoft Hardware Compatibility List (HCL) for MSCS?
The Microsoft HCL can be queried from the Microsoft web site at
http://www.microsoft.com/hwdq/hcl/.
To view validated cluster configurations, choose Category "Cluster".
The HCL also has Categories "Cluster/SCSI Adapter" and
"Cluster/Raid". What are the devices in these categories?
To help vendors more rapidly validate cluster configurations for customers,
components in those categories have passed Cluster Component Candidate testing.
Customers should note that inclusion in those HCL Categories does NOT qualify a
component for Microsoft Cluster support services. Those services are only
available for validated configurations (Category "Cluster" on the
Hardware Compatibility List).
How can the Cluster Hardware Compatibility Test (HCT) be obtained?
The Cluster HCT is now included in the standard Windows NT HCT, or may be
obtained from the Microsoft Windows Hardware Quality Lab (WHQL). Email requests
for the Cluster HCT to [email protected].
What are the general requirements for MSCS cluster hardware?
The most important criterion for MSCS hardware is that it be included in a
validated Cluster configuration on the Microsoft Hardware Compatibility List,
indicating it has passed the Microsoft Cluster Hardware Compatibility Test.
Microsoft will only support MSCS when used on a validated cluster configuration.
Validation is only available for complete configurations that were tested
together, not on individual components.
A cluster configuration is composed of two servers, storage, and networking. Here are the general requirements for MSCS cluster hardware for Windows NT Server, Enterprise Edition 4.0:
Servers
Two PCI-based machines running Windows NT Server, Enterprise Edition. MSCS can run on Intel and compatible systems (Pentium 90 or higher processor) or on RISC-based systems with an Alpha processor. However, you cannot mix Intel Architecture and RISC servers in the same cluster.
Each server needs at least 64 MB of RAM; at least 500 MB of available hard disk space; a CD-ROM drive; a Microsoft Mouse or compatible pointing device; and a VGA, Super VGA, or other video graphics adapter compatible with Windows NT Server 4.0.
Each server must be attached to a shared, external SCSI bus that is separate from the system disk bus. The SCSI adapters must be PCI. Applications and data are stored on one or more disks attached to this bus, and there must be enough storage capacity on this bus for all of the applications running in the cluster environment. This configuration allows MSCS to migrate applications between machines.
Microsoft recommends hardware RAID for all disks on the shared SCSI bus, to eliminate disk drives as a potential single point of failure. This means using either a RAID storage unit or a SCSI host adapter that implements RAID across "dumb" disks.
Each server needs at least two network cards. Typically, one connects to the public/corporate network and the other to a private network between the two nodes. The network adapters must be PCI.
A static IP address is needed for each group of applications that moves as a unit between nodes. MSCS can project the identity of multiple servers from a single cluster by using multiple IP addresses and computer names.
The Microsoft Hardware Compatibility List (HCL) includes 3 cluster
Categories: "Cluster", "Cluster/SCSI Adapter", and
"Cluster/Raid". What are these and how do they compare?
The most important Category for customers is "Cluster". This is the
list of validated cluster configurations on which Microsoft can support Windows
NT Server Enterprise Edition clustering. The other two categories are for
vendors, system integrators, and test labs that are validating cluster
configurations for customers. They list candidate components that have been
partially tested by Microsoft within a particular cluster configuration.
Microsoft does this candidate testing to help vendors and others more rapidly
validate complete configurations for customers. When you query the HCL for these
Categories ("Cluster/SCSI adapter" and "Cluster/Raid"), a
message is displayed which explains this situation.
[Diagram: how the various Hardware Compatibility Tests (HCTs) are used to validate each component of a cluster, and then the entire configuration together.]
SERVERS
What system vendors are offering MSCS cluster configurations?
For a current list of validated cluster configurations, refer to the Microsoft
Hardware Compatibility List (HCL) at
http://microsoft.com/hwtest/hcl.
Choose Category "Cluster".
Can an MSCS cluster be built from servers that use Digital Alpha
processors?
Yes. Windows NT Server Enterprise Edition is available for both Intel and Alpha
processors. Digital has already validated several cluster configurations for
MSCS using Alpha-based servers. Note that MSCS does not permit mixing
Intel-based and Alpha-based servers in the same cluster. This is because there
are some differences in the on-disk format used by each type of processor, so
MSCS would not be able to failover disks from one to the other.
Is it necessary that both servers within a cluster be identical?
The Cluster Hardware Compatibility Test does not require that both servers in a
validated configuration be identical. MSCS runs on Windows NT Server, Enterprise
Edition so a validated MSCS cluster can potentially contain any two servers that
are validated to run that version of Windows NT. (One exception: you cannot mix
Alpha and Intel Architecture processors in the same cluster.) Note that MSCS
hardware validation only applies to a complete cluster configuration (two
particular servers and a storage solution), so it is unlikely that system vendors
will validate clusters containing servers from more than one system
manufacturer. However, it is conceivable that system integrators or component
vendors might validate mixed-vendor clusters in response to customer demand.
Will MSCS run on our existing servers?
This depends on whether or not your existing servers have been validated within
a complete cluster configuration. There is a hardware validation process for
MSCS clusters, just as there is for other Microsoft system software. An MSCS
validation tests a complete cluster configuration, including specific models of
servers and storage systems. Customers concerned about whether a server they buy
today will work in an MSCS cluster in the future should check the Microsoft
Hardware Compatibility List, and see if the server appears in any of the
validated cluster configurations (choose Category "Cluster"). If not,
the customer should question their hardware vendor about the vendor's plans to
validate that server in MSCS cluster configurations.
Do you expect customers to implement clusters on their existing equipment?
This is potentially possible, and could eventually become quite common, but most
of the initial customers will probably acquire new cluster systems. The
validation process for MSCS will test complete cluster configurations (servers
and storage together), not just individual components. Thus, if customers
are already using selected servers and/or storage subsystems that have been
validated within a complete MSCS cluster configuration, then they would be able
to implement a cluster with those components by adding the rest of the hardware
included in the validated configuration.
Is there any limit on the number of processors in today's 2-server
clusters?
The architectural maximum is 64 processors in a 2-server cluster. Validated
2-server configurations at the time this FAQ was written could potentially have
from 2 to 20 processors. Here's where those numbers come from:
A validated cluster configuration can potentially include any server that is on the Microsoft Hardware Compatibility List for Windows NT Server.
The Enterprise Edition of Windows NT Server is architected for up to 32 processors per server, so the architectural maximum is 64 processors in a 2-server cluster.
The largest server available for Windows NT Server/E at the time this FAQ was written has 10 processors.
The Microsoft Cluster Hardware Compatibility Test does not require that the two servers in a validated configuration have the same number of processors.
Thus, a validated MSCS cluster configuration today can theoretically have any combination of servers with from one to ten processors each, for a potential current maximum of 20 processors in a 2-server cluster.
STORAGE
What storage connection techniques does MSCS support?
MSCS is architected to work with standard Windows NT Server storage
drivers, so it can potentially support any of the current or anticipated storage
interconnections available through Win32 or Windows Driver Model. All of the
cluster configurations currently validated for MSCS use standard PCI-based SCSI
connections (including SCSI over fibre channel).
Does MSCS support fibre channel disk connections?
Yes. In reality this doesn't fundamentally change the way MSCS uses disks. Fibre
Channel connections still use SCSI devices, simply hosted on a Fibre Channel bus
instead of a SCSI bus. Conceptually, this encapsulates the SCSI commands within
Fibre Channel. Therefore, the SCSI commands upon which MSCS relies
(Reserve/Release and Bus Reset) still function as they do over standard
(non-fibre) SCSI.
Does MSCS prefer one type of SCSI signaling over the other (differential versus
single-ended)?
When not using Fibre Channel, MSCS works best with differential SCSI and
"Y" cables. The termination should be outside the systems so that
losing power in one system does not cause termination on the SCSI bus to be
lost. Good drives in good electrical/mechanical enclosures also help.
Does MSCS support RAID on disks in a cluster?
Yes. Hardware RAID should be used to protect disks connected to the shared
multi-initiator SCSI bus or Fibre Channel bus. Other disks in the cluster may be
protected by either hardware RAID or by the built-in software RAID ("FTDISK")
capability of Windows NT Server.
Why doesn't MSCS support Windows NT Server software RAID ("FTDISK")
for disks connected to the shared disk bus?
The current FTDISK capability in Windows NT Server provides excellent,
cost-effective protection of disks connected to a single server. However, its
architecture is not well suited to some situations that can occur when doing
failover of disk resources connected to two servers through multi-initiator
SCSI. Microsoft plans to enhance FTDISK in a future release to address this
issue. In the meantime, disks connected to a Windows NT Server machine
through multi-initiator SCSI can be fully protected by widely available hardware
RAID.
Which hardware RAID devices does MSCS support?
Support for any particular RAID device depends on its inclusion in a validated
cluster configuration. Validated cluster configurations are listed on the
Microsoft Hardware Compatibility List.
Does MSCS support PCI RAID controllers?
Selected PCI RAID controllers may be validated within an MSCS cluster
configuration. Some of these controllers store information about the state of
the array on the card (not on the drives themselves), so it is possible that the
cards in the two servers might not be in sync at the moment a failover occurs.
For this reason, RAID controllers that store information in the controller will
not work with MSCS. MSCS cluster configurations will only be validated with RAID
solutions that store the meta-data for RAID sets on the disks themselves so that
it is independent of the controllers.
Are there any plans to support a shared solid state drive?
No shared solid state drives have yet been tested, but there is nothing that
would preclude their use. As long as the SCSI-2 Reserve/Release and Bus Reset
functions are available, these devices should work with MSCS.
Is it possible to add hard drives to an MSCS cluster without rebooting?
It depends on whether the drive cabinet supports this, since Windows NT
will not do so until the Microsoft Windows 2000 Server release. There are
examples of RAID cabinets validated for Windows NT that support changing
volumes on the fly (with RAID parity).
Can CD-ROM or removable-media disks such as a Jaz™ Drive be
used as a cluster disk resource?
No. All devices on the shared SCSI bus must be "physical disk" class
devices, so no CD-ROM, and no removable media.
Can an MSCS cluster include dual-path SCSI to eliminate the SCSI bus as a
single point of failure?
Yes, Microsoft Cluster Server will work with dual-path SCSI controllers and
hardware RAID solutions provided, of course, that the dual-path solution is
included within a validated cluster configuration on the Microsoft Hardware
Compatibility List.
INTERCONNECT
What is a cluster "interconnect"?
It is recommended that MSCS clusters have a private network between the servers
in the cluster. This private network is generally called an
"interconnect," or a "system area network" (SAN). The
interconnect is used for cluster-related communications. Carrying this
communication over a private network provides dependable response time, which
can enhance cluster performance. It also enhances reliability by providing an
alternate communication path between the servers. This assures MSCS services
will continue to function even if one of the servers in the cluster loses its
network connections.
What type of information is carried over the cluster interconnect?
The interconnect in an MSCS cluster will potentially carry the following five
types of information:
Server "heartbeats": These tell MSCS that another server is up and running. | |
Replicated state information: MSCS does this so that every server in the cluster knows which cluster groups and resources are running on every other server. | |
Cluster commands: MSCS software on one server can issue a command to the MSCS software on another server. For example, when moving an application, MSCS actually tells its current server to take it offline, and then tells the new server to bring it online. | |
Application commands: A cluster-aware application might use the interconnect to communicate among copies of the application running on multiple servers. This is generally referred to as "function shipping" or "distributed message passing". | |
Application data: A cluster-aware application might use the interconnect to transfer data between servers. This is generally called "input/output (I/O) shipping." |
Can a cluster have more than one interconnect?
An MSCS cluster can only have a single private network, but MSCS will
automatically revert to a public network connection for heartbeat and other
cluster communications should it ever lose the heartbeat over the interconnect.
Also, note that some vendors offer high-performance interconnect products that
include redundant paths for fault tolerance.
What type of network is required for an MSCS cluster interconnect?
A validated MSCS cluster configuration can use as its interconnect virtually any
network technology that is validated for Windows NT Server. This includes,
for example, 10BaseT ethernet, 100BaseT ethernet, and specialized interconnect
technologies such as Tandem ServerNet.
When is it necessary to have a high-performance interconnect such as
100BaseT Ethernet or Tandem ServerNet?
Interconnect performance can potentially affect cluster performance under two
scenarios: (1) the cluster is running thousands of cluster groups and/or
resources, or (2) the cluster is running a scalable, cluster-aware application
that uses the interconnect to transfer high volumes of transactions or data. In
either of these cases, customers should choose a cluster configuration with a
higher-speed interconnect such as 100BaseT, or Tandem ServerNet. Cluster-aware
applications that use MSCS to achieve very high levels of scalability will most
likely become common in the MSCS "Phase 2" timeframe. Thus
higher-speed interconnects are likely to become more important in larger, Phase
2 clusters.
There has been a lot of talk about "man in the middle" and
"replay" attacks on machines connected across the Internet. Will MSCS
clusters be vulnerable to this same type of attack if someone illegally connects
to the interconnect between the servers?
No. MSCS employs packet signing for intracluster communications to protect
against replay attacks.
When will MSCS support interconnects based on the Virtual Interface
Architecture?
Microsoft expects to support interconnects based on the VI Architecture
specification in Phase 2 of MSCS, which is scheduled for beta test in 1998. For
more information on VI Architecture, refer to
http://developer.intel.com/design/servers/vi/the_spec/specification.htm.
NETWORKING
Does MSCS support the failover of IP addresses?
Yes.
Does MSCS support other network protocols such as IPX?
MSCS network failover is based on IETF standard IP, so it supports all IP-based
protocols such as TCP, UDP, and NBT. Non-IP protocols such as IPX are not
supported.
How does MSCS do IP failover?
MSCS has the ability to failover (move) an IP address from one cluster node to
another. The ability to failover an IP address depends on two things: 1) support
for dynamic registration and deregistration of IP addresses, and 2) the ability
to update the physical network address translation caches of other systems
attached to the subnet on which an address is registered.
Dynamic address (de)registration is already implemented in Windows NT Server to support leasing IP addresses using the Dynamic Host Configuration Protocol (DHCP). To bring an IP Address resource online, the MSCS software issues a command to the TCP/IP driver to register the specified address. A similar command exists to deregister an address when the corresponding MSCS resource is taken offline.
The procedure for updating the address translation caches of other systems on a LAN is contained in the Address Resolution Protocol (ARP) procedure, which is implemented by Windows NT Server. ARP is an IETF standard, RFC 826. RFC 826 can be obtained on the Internet from http://www.netsys.com/rfc/rfc826.txt.
How does MSCS update router tables when doing IP failover?
As part of its automatic recovery procedures, MSCS will issue IETF standard ARP
"flush" commands to routers to flush the machine addresses (MACs)
related to IP addresses that are being moved to a different server.
How does the Address Resolution Protocol (ARP) cause systems on a LAN to
update their tables that translate IP addresses to physical machine (MAC)
addresses?
The ARP specification states that all systems receiving an ARP request must
update their physical address mapping for the source of the request. (The source
IP address and physical network address are contained in the request.) As part
of the IP address registration process, the Windows NT TCP/IP driver
broadcasts an ARP request on the appropriate LAN several times. This request
asks the owner of the specified IP address to respond with its physical network
address. By issuing a request for the IP address being registered, Windows NT
Server can detect IP address conflicts; if a response is received, the address
cannot be safely used. When it issues this request, though, Windows NT
Server specifies the IP address being registered as the source of the request.
Thus, all systems on the network will update their ARP cache entries for the
specified address, and the registering system becomes the new owner of the
address. Note that if an address conflict does occur, the responding system can
send out another ARP request for the same address, forcing the other systems on
the subnet to update their caches again. Windows NT Server does this when
it detects a conflict with an address that it has successfully registered.
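To make the mechanism above concrete, here is a minimal Python sketch (not MSCS or Windows code; the MAC and IP values are invented for illustration) of the kind of broadcast ARP request described, laid out per RFC 826, with the address being registered used as both the sender and target IP so that every host on the LAN updates its cache entry for that address:

```python
import struct

def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP request (RFC 826).

    For the registration broadcast described above, sender_ip is the IP
    address being registered, so every host that sees the request updates
    its ARP cache entry for that address to point at sender_mac.
    """
    eth_dst = b"\xff" * 6                          # Ethernet broadcast
    eth_src = sender_mac
    eth_type = struct.pack("!H", 0x0806)           # EtherType: ARP

    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                                         # hardware type: Ethernet
        0x0800,                                    # protocol type: IPv4
        6,                                         # hardware address length
        4,                                         # protocol address length
        1,                                         # opcode: request
        sender_mac,                                # sender MAC
        bytes(map(int, sender_ip.split("."))),     # sender IP (address being registered)
        b"\x00" * 6,                               # target MAC: unknown
        bytes(map(int, target_ip.split("."))),     # target IP: "who owns this address?"
    )
    return eth_dst + eth_src + eth_type + arp

# Hypothetical values: register 192.168.1.50 on the adapter with this MAC.
frame = build_arp_request(b"\x00\x11\x22\x33\x44\x55", "192.168.1.50", "192.168.1.50")
print(frame.hex())
```

If another system answers this request, the address is already in use and, as noted above, cannot be safely registered.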
MSCS uses ARP broadcasts to update IP-to-MAC address mappings, but ARP
broadcasts don't pass through routers. So what about clients behind the routers?
If the clients were behind routers, they would be using the router(s) to access
the subnet where the MSCS servers were located. Accordingly, the clients would
use their router (gateway) to pass the packets to the routers through whatever
route (OSPF, RIP, and so on) is designated. The end result is that their packet
is forwarded to a router on the same subnet as the MSCS cluster. This router's
ARP cache is consistent with the MAC address(es) that have been modified during
a failover. Packets thereby get to the correct Virtual server, without the
remote clients ever having seen the original ARP broadcast.
Can an MSCS cluster be connected to different IP subnets?
(This is possible with a single Windows NT-based server, even with a single
NIC, by binding different IP addresses to the NIC and by letting Windows NT
Server route between them.) For example, can MSCS support a configuration in
which both nodes are connected to two external (client) subnets, Subnet1 and
Subnet2, plus a private interconnect subnet, Subnet3?
Yes, MSCS permits servers in a cluster to be connected to multiple subnets. MSCS supports physical multi-homing no differently than Windows NT Server does. The scenario just described is perfectly acceptable. The two external subnets (1 and 2) could connect the same clients (redundant fabrics) or two different sets of clients. In this scenario, one of the external subnets (#1 or #2) would also have to be a backup for intracluster communication (that is, back up the private subnet #3), in order to eliminate all single points of failure that could split the cluster. Note that MSCS would not support a slightly different scenario: NodeA only on Subnet1, NodeB only on Subnet2, with Subnet1 and Subnet2 connected by a router. This is because there is no way for MSCS to failover an IP address resource between two different subnets.
Can MSCS use a second Network Interface Card (NIC) as a hot backup to a
primary NIC?
MSCS can only do this for the cluster interconnect. That is, it provides the
ability to use an alternate network for the cluster interconnect if the primary
network fails. This eliminates an interconnect NIC from being a single point of
failure. There are vendors who offer fault tolerant NICs for Windows NT
Server, and these can be used for the NICs that connect the servers to the
client network.
How do you specify to MSCS which NIC to use for the interconnect, and
which NIC(s) to use as backup interconnects?
The MSCS setup allows administrators to specify the exact role that a NIC
provides to the cluster. There are three possible roles for each NIC in a
cluster:
Use for all communications (Cluster and client)
Use only for internal cluster communications (cluster only)
Use only for client access
Examples of client-only NICs include a LAN/WAN/Internet connection where it would be ineffective or inappropriate to carry heartbeats and cluster traffic.
Can MSCS work with "smart switches" that maintain a 1-to-1
mapping of MAC addresses to IP addresses? Will MSCS be continually forcing these
devices to flush and reset their MAC-to-IP maps due to its use of multiple IPs
per MAC, plus the ARP flushes when doing IP failover?
These switches are quite common in VLAN configurations in which the level 2
network fabric uses level 3 address information for switching packets. These
switches only cache one IP address for each MAC address. Such a layering
"violation" allows switch vendors to do better lookups and use
existing routing protocols to distribute host routes plus MAC addresses. MSCS
can work with these switches, but it might affect their performance. If
customers experience this problem, there are two possible solutions: (1) have a
router sit between the cluster and the switch, or (2) disable the
"smarts" on the smart switches.
Can an MSCS cluster work on switched ethernet, token ring, and ATM?
So long as the network protocol is IP, MSCS can accomplish IP address failover
using any network interface cards validated for Windows NT. This is true
whether the media is ethernet, switched ethernet, token ring, ATM, etc.
Does MSCS let you map multiple IP addresses to a single network name for
multi-homing an application?
Yes. Simply create resources in the application's cluster group for the network
name and each of the IP addresses, and then make the network name resource
dependent on all of the IP addresses.
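The dependency behavior just described can be illustrated with a small sketch. This is a hypothetical model, not the Cluster API: it only shows that a network name resource depending on several IP Address resources comes online after all of them do.

```python
# Hypothetical model of cluster resource dependencies (not the Cluster API).
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str
    depends_on: list = field(default_factory=list)
    online: bool = False

    def bring_online(self):
        # Resources that this resource depends on are brought online first.
        for dep in self.depends_on:
            if not dep.online:
                dep.bring_online()
        self.online = True
        print(f"{self.name} is online")

# Illustrative names and addresses.
ip1 = Resource("IP Address 10.1.1.10")
ip2 = Resource("IP Address 10.1.2.10")
netname = Resource("Network Name WEBSRV", depends_on=[ip1, ip2])

netname.bring_online()   # both IP addresses come online before the network name
```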
SCALABILITY
How is Windows NT Load Balancing Service different from other
clustering and load balancing solutions that I have heard about?
Windows NT Load Balancing Service scales the performance of TCP/IP
services, such as a Web, proxy, or FTP servers, in addition to ensuring their
high availability. Other clustering solutions improve fault-tolerance but do not
scale the performance of individual applications. However, they complement
Windows NT Load Balancing Service's capabilities by making back-end
database servers highly available. Most other load balancing solutions introduce
single points of failure or performance bottlenecks and cost significantly more.
For additional information on how WLBS compares to other TCP/IP load balancing
solutions, please see the competitive matrix.
Why is scaled performance important to my Internet services?
As Internet services have become essential to conducting daily business
worldwide, they need to be able to handle a large volume of client requests
without creating unwanted delays. By scaling performance using Windows NT
Load Balancing Service, you can add up to thirty-two Windows NT servers to
your cluster to keep up with the demand placed on these services.
How does Windows NT Load Balancing Service Software scale
performance?
To scale performance, you run a copy of a TCP/IP service, such as a Web server,
on each host within the cluster. Windows NT Load Balancing Service
transparently distributes the client requests among the hosts and lets the
clients access the cluster using one or more "virtual" IP addresses.
Windows NT Load Balancing Service automatically load balances the number of
connections handled by each host. Windows NT Load Balancing Service's
unique, fully distributed software architecture avoids the use of a centralized
dispatcher, which enables it to deliver the industry's best performance and
fault-tolerance.
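The "fully distributed" idea above can be pictured with a short sketch. This is not the actual WLBS filtering algorithm (which is not described here); it is an assumption-laden illustration of the concept that every host applies the same deterministic function to each incoming connection and exactly one host accepts it, so no central dispatcher is needed.

```python
import hashlib

HOSTS = [1, 2, 3, 4]   # illustrative host IDs in the cluster

def owner(client_ip: str, client_port: int, hosts=HOSTS) -> int:
    """Deterministically map one connection to one host.

    Every host evaluates the same function on the same packet header,
    so all hosts agree on which single host should accept the connection
    without exchanging any per-packet messages.
    """
    key = f"{client_ip}:{client_port}".encode()
    digest = int.from_bytes(hashlib.md5(key).digest()[:4], "big")
    return hosts[digest % len(hosts)]

def should_accept(my_id: int, client_ip: str, client_port: int) -> bool:
    return owner(client_ip, client_port) == my_id

# Each host independently evaluates the same incoming connection:
for host in HOSTS:
    print(host, should_accept(host, "203.0.113.7", 51234))
```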
Do I need a shared disk or disk replication to use Windows NT Load
Balancing Service?
No. Since all copies of the TCP/IP service run independently and access only
their local disks (for example, to fetch Web pages), they usually do not share
information except by accessing a back-end database in some applications (such
as electronic commerce).
Web Server Applications
I would like to use Windows NT Load Balancing Service to scale a Web
server. How can I keep the Web pages on all of the cluster hosts up-to-date?
Since Web pages change relatively infrequently, you can simply copy the new Web
pages to the local disks on all of the hosts. You can also use commercial file
replication software to automatically reflect changes throughout the cluster.
Microsoft provides this capability with Content Replication Server (CRS).
Does WLBS support multihomed Web servers?
Yes. WLBS treats all of the IP addresses assigned to it as virtual except for
the single dedicated IP address assigned to each server.
My Web server calls a networked database server to store and retrieve
information for clients. Can I still use Windows NT Load Balancing Service?
Yes. Since all of the Web servers in the cluster access a shared database
server, updates will be synchronized by this server, which will keep the shared
data consistent. Most of the performance bottleneck typically is in the Web
server. Windows NT Load Balancing Service scales the performance of this
front-end service and ensures that it provides continuous service to clients.
Note that the back-end database server can be made highly available by using
Microsoft Cluster Service (MSCS).
Does Windows NT Load Balancing Service support client sessions and
SSL?
Yes. You can optionally specify that all network connections from the same
client IP address be handled by a single server unless that server fails. In
addition, you can direct all client requests from a TCP/IP Class C address range
to a single server. This feature ensures that clients which use multiple proxy
servers to access the cluster will have their TCP connections directed to the
same cluster host.
WLBS HARDWARE
Do I need special hardware to interconnect my cluster hosts?
No. The cluster hosts are interconnected over a single local area network using
standard Ethernet (10, 100, or gigabit) or FDDI adapter cards. In addition, a
separate cluster interconnect is not needed; the cluster hosts communicate over
the same subnet that connects to clients.
Can I use a switch to interconnect the cluster hosts?
Yes.
CONFIGURATION AND SUPPORT
Does Windows NT Load Balancing Service generate a lot of network
traffic?
No. Each Windows NT Load Balancing Service host broadcasts a packet about
once a second to tell the other hosts its status. Almost all of the network
bandwidth is available for client/server communications.
How do I configure WLBS to work with my XYZ application?
WLBS port rules for common applications:
HTTP: Web servers typically listen on port 80. Affinity should be set to 'None', unless the Web server maintains client state in its memory, in which case set affinity to 'Single' or 'Class C'.
HTTPS: HTTP over SSL (encrypted Web traffic) is usually handled on port 443. Affinity should be set to 'Single' or 'Class C' to ensure that client connections are always handled by the server that has the SSL session established.
FTP: FTP uses port 21 for the control connection from the client and port 20 for the return data connection from the server. Create two port rules that cover ports 20-21 and 1024-65,535 with affinity 'Single' or 'Class C' to ensure that both data and control connections are handled by the same server.
TFTP: TFTP servers (BOOTP, etc.) use port 69 and can easily be load balanced with WLBS. Affinity should be set to 'None' when creating the port rule covering port 69.
SMTP: WLBS can be used effectively for scaling high-volume SMTP mailers. In this case you should use port 25 and set affinity to 'None'.
NBT: NetBIOS over TCP/IP uses port 139 on the server. Affinity can be set to either 'None' or 'Single', but we recommend 'Single' for maximum compatibility with server applications.
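The rules above can be restated as a small data structure. The sketch below is not a WLBS configuration format; it simply summarizes the list and shows, under that assumption, how the three affinity settings change which part of the client address pins a connection to one server.

```python
import ipaddress

# Restatement of the port rules above (illustrative, not a WLBS config format).
PORT_RULES = [
    {"service": "HTTP",  "ports": (80, 80),      "affinity": "None"},
    {"service": "HTTPS", "ports": (443, 443),    "affinity": "Single"},
    {"service": "FTP",   "ports": (20, 21),      "affinity": "Single"},
    {"service": "FTP",   "ports": (1024, 65535), "affinity": "Single"},
    {"service": "TFTP",  "ports": (69, 69),      "affinity": "None"},
    {"service": "SMTP",  "ports": (25, 25),      "affinity": "None"},
    {"service": "NBT",   "ports": (139, 139),    "affinity": "Single"},
]

def affinity_key(client_ip: str, client_port: int, affinity: str):
    """Return the part of the client address that selects the server."""
    if affinity == "None":
        return (client_ip, client_port)          # balance individual connections
    if affinity == "Single":
        return client_ip                         # all connections from one client IP
    if affinity == "Class C":
        net = ipaddress.ip_network(f"{client_ip}/24", strict=False)
        return str(net)                          # all clients behind one /24 (proxy farms)
    raise ValueError(f"unknown affinity: {affinity}")

print(affinity_key("203.0.113.7", 51234, "Class C"))   # -> 203.0.113.0/24
```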
How can I obtain technical support?
The Windows NT Load Balancing Service is supported by Microsoft Product
Support Services (PSS). You can get more information on PSS at the Microsoft
Product Support Services Web site.
How do I obtain Windows NT Load Balancing Service Software?
WLBS is currently available for download from the Windows
NT Server download page.
An application can run on a server cluster under the following conditions:
If the application communicates with clients over a network, the connection with clients must be configured to use an IP-based protocol. Examples are TCP/IP, Distributed Component Object Model (DCOM), Named Pipes, and remote procedure call (RPC) over TCP/IP.
The application must be able to specify where the application data is stored.
The application must be able to restart and recover from failover (the process of taking resources, individually or in a group, offline on one node and bringing them back online on another node, with resources that depend on other resources taken offline before, and brought online after, the resources upon which they depend).
Clients that connect to applications on a server cluster must be able to attempt to reconnect in the event of network failure.
Choosing network hardware
The nodes of a cluster must be connected by one or more physically independent networks (sometimes referred to as interconnects). Although your server cluster can function with only one interconnect, two interconnects are strongly recommended; two interconnects eliminate any single point of failure that could disrupt communication between nodes.
Before you install the Cluster service, you must configure all nodes to use the TCP/IP protocol over all interconnects. Each network adapter must have an assigned Internet protocol (IP) address that is on the same network as the corresponding network adapter on the other nodes. Therefore, there can be no routers between two cluster nodes. However, routers can be placed between the cluster and its clients.
Network roles
You must configure each cluster network to have one of four roles. Each network can support:
Only node-to-node communication.
Only client-to-cluster communication.
Both node-to-node communication and client-to-cluster communication.
No cluster-related communication.
Networks that support only node-to-node communication are referred to as private networks. Networks that support only client-to-cluster communication are known as public networks. Networks that support both are known as mixed networks. For more information on networks and network roles, see Server cluster networks.
Windows 2000 supports the use of DHCP addresses as private node addresses (you configure private node addresses through the Network and Dial-up Connections folder, not through cluster software). The Cluster service uses the Plug and Play network support to handle the events that occur when a DHCP-allocated IP address changes.
However, using short-term DHCP leases for a private node address has some disadvantages. First, when the address lease expires for a network, the node cannot communicate over that network until it obtains a new address from the DHCP server, which makes the node's availability dependent on the availability of the DHCP server. Second, if an address lease expires on a network configured for public communication (communication between cluster nodes and clients), it might trigger failover for resource groups that used an IP Address resource on that network.
If possible, use either static IP addresses or a permanently leased DHCP address for private node addresses.
The Cluster service does not support the use of IP addresses assigned from a DHCP server for the cluster administration address (which is associated with the cluster name) or any IP Address resources.
Installing the Cluster service on computers with logically multi-homed adapters
A logically multihomed adapter is one that has two IP addresses assigned to it. These adapters can be used only for node-to-node cluster communication if their primary addresses are on the same IP subnet. If the primary addresses on all nodes are not on the same IP subnet, reorder the IP addresses assigned to the adapter by using the Network and Dial-up Connections folder.
Choosing IP addresses for private networks
If an interconnect connects only the cluster nodes and does not support any other network clients, you can assign it a private Internet protocol (IP) network address instead of using one of your enterprise's official IP network addresses.
By agreement with the Internet Assigned Numbers Authority (IANA), several IP networks are always left available for private use within an enterprise. These reserved numbers are:
10.0.0.0 through 10.255.255.255 (Class A)
172.16.0.0 through 172.31.255.255 (Class B)
192.168.0.0 through 192.168.255.255 (Class C)
You can use any of these networks or one of their subnets to configure a private interconnect for a cluster. For example, address 10.0.0.1 can be assigned to the first node with a subnet mask of 255.0.0.0. Address 10.0.0.2 can be assigned to a second node, and so on. No default gateway or WINS servers should be specified for this network.
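As a quick check of the example above, the following standard-library Python sketch (addresses taken from the text) confirms that the node addresses fall within a reserved private range and share the same subnet:

```python
import ipaddress

# Reserved private ranges listed above.
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),       # 10.0.0.0 through 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),    # 172.16.0.0 through 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),   # 192.168.0.0 through 192.168.255.255
]

node_a = ipaddress.ip_interface("10.0.0.1/255.0.0.0")
node_b = ipaddress.ip_interface("10.0.0.2/255.0.0.0")

print(any(node_a.ip in net for net in PRIVATE_NETS))   # True: the address is private
print(node_a.network == node_b.network)                # True: both nodes share one subnet
```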
Ask your network administrator which of the private networks or subnets you can use within your enterprise before configuring your cluster.
These private network addresses should never be routed. For more information on private IP network addresses, see Resources.
A network (sometimes called an interconnect) performs one of the following roles in a cluster:
A private network carries internal cluster communication. The Cluster service authenticates all internal communication, but administrators who are particularly concerned about security can restrict internal communication to physically secure networks. Network adapters on private networks should not point to name resolution servers on the public network. Otherwise, a client might inadvertently receive an Internet protocol (IP) address from a name resolution server and not be able to use the address because there is no physical route from the client to the computer with which the IP address is associated.
A public network provides client systems with access to cluster application services. IP Address resources are created on networks that provide clients with access to cluster services.
A mixed (public-and-private) network carries internal cluster communication and connects client systems to cluster application services.
A network that is designated as neither public nor private carries traffic unrelated to cluster operation.
The Cluster service uses all available private and mixed networks for internal communication. Configure multiple networks as private or mixed to protect the cluster from a single network failure.
If there is only one such network available and it fails, the cluster nodes stop communicating with each other. When two nodes are unable to communicate, they are said to be partitioned. After two nodes become partitioned, the Cluster service automatically shuts down on one node to guarantee the consistency of application data and the cluster configuration. This can lead to the unavailability of all cluster resources.
For example, if each node has only one network adapter, and the network cable on one of the nodes fails, each node (because it is unable to communicate with the other) attempts to take control of the quorum resource. There is no guarantee that the node with a functioning network connection will gain control of the quorum resource. If the node with the failed network cable gains control, the entire cluster is unavailable to network clients.
However, if each node has at least two networks configured, one as public and the other as private, the Cluster service can detect a public network failure and fail over all resources that depend on a particular network adapter (through its IP address) to a node where this network is available. This is accomplished because the private network is still functioning properly.
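The partition scenario above can be pictured with a toy model. This is not the Cluster service's actual arbitration code; it is a hedged sketch of the behavior described: when the nodes cannot reach each other, whichever node reserves the quorum resource first keeps running and the Cluster service shuts down on the other, regardless of whose network actually failed.

```python
import random

class QuorumDisk:
    """Toy stand-in for the shared quorum resource: only one node can reserve it."""
    def __init__(self):
        self.owner = None

    def try_reserve(self, node: str) -> bool:
        if self.owner is None:
            self.owner = node
            return True
        return False

def partition(nodes, quorum):
    """Both nodes lose contact with each other and race for the quorum resource."""
    survivors = []
    for node in random.sample(nodes, len(nodes)):   # arbitration order is not guaranteed
        if quorum.try_reserve(node):
            survivors.append(node)                  # this node keeps the cluster running
        else:
            print(f"{node}: quorum already owned; Cluster service shuts down here")
    return survivors

print("surviving node:", partition(["NodeA", "NodeB"], QuorumDisk()))
```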
Note that all nodes must have at least one subnet in common.
The Cluster service does not use public networks for internal communication, even if a public network is the only available network. For example, suppose a cluster has Network A configured as private and Network B configured as public, and Network A fails. The Cluster service does not use Network B because it is public; thus, the nodes stop communicating and the cluster breaks apart.
For more information on the concepts in this topic, see:
Nodes
Quorum resource
Network state management and failure detection
Server clusters provide high availability (for server clusters, the restarting of a
failed application or the dispersion of the work to remaining computers when a
computer or application in the server cluster fails), scalability, and manageability
for resources and applications by clustering multiple servers, called nodes
(for server clusters, a computer system running Windows 2000 Advanced Server or
Windows 2000 Datacenter Server that is an active or inactive member of a cluster).
Server clusters can combine up to four nodes. Windows 2000 Datacenter Server supports up to four nodes in a cluster. Windows 2000 Advanced Server is limited to two nodes. A server cluster cannot be made up of nodes running both Windows 2000 Advanced Server and Windows 2000 Datacenter Server. In a three-node or four-node server cluster, all nodes must run Windows 2000 Datacenter Server. Similarly, a two-node cluster must be made up of computers running either Windows 2000 Advanced Server or Windows 2000 Datacenter Server, but not both.
Server clusters achieve high availability by:
Detecting node or application failures. In the event of a failure, failover occurs: ownership of resources, such as disk drives and Internet Protocol (IP) addresses, is automatically transferred from a failed node to a surviving node. The part of the workload that is capable of restarting on the surviving node is then restarted. For example, a print queue might be restarted, but an SQL query that was running when the failure occurred might not restart. When a failed node comes back online, the cluster automatically rebalances the workload through a process called failback.
Minimizing planned downtime caused by maintenance or upgrades. Administrators can move a node's workload onto other nodes and maintain or upgrade applications or hardware on the unloaded node. Once the maintenance or upgrade is completed and tested, the node is brought back online by the administrator, and it automatically rejoins the cluster. This process can be repeated when maintaining or upgrading other nodes in the cluster. You can perform a rolling upgrade from Windows NT 4.0, Enterprise Edition to Windows 2000 Advanced Server only if Service Pack 4 or later has been installed; it is recommended that you install the latest released Service Pack. You cannot upgrade to Windows 2000 Datacenter Server from Windows 2000 Advanced Server, Windows 2000 Server, Windows NT Server 4.0, or Windows NT Server 4.0, Enterprise Edition. Rolling upgrades for Service Packs and future versions of Windows 2000 Datacenter Server will be supported.
Server clusters achieve scalability by:
Every node in the cluster running its own workload. Server clusters use active/active clustering, in which all computers are running their own workload. This means every node in the cluster is available to do real work, and every node in the cluster is also available to recover the resources and workload of any other node in the cluster. There is no need to have a wasted, idle server waiting for a failure.
Server clusters improve manageability by:
Managing a cluster as a single system. You use cluster management applications, such as Cluster Administrator, to configure, control, and monitor resources for the entire cluster. You can install copies of Cluster Administrator, available through Windows 2000 Administration Tools (included on the Windows 2000 Server, Windows 2000 Advanced Server, and Windows 2000 Datacenter Server compact disc sets), on other computers on your network running Windows 2000. You can also administer a Windows 2000 server cluster remotely from a computer running Windows NT 4.0 Service Pack 3 or later, using the Windows NT Server, Enterprise Edition 4.0 Cluster Administrator tool. Cluster Administrator allows you to manage cluster objects, establish groups, initiate failover, handle maintenance, and monitor cluster activity through a graphical console. For more information, see Using Cluster Administrator. (Cluster Administrator is automatically installed on a cluster node when you install the Cluster service in Windows 2000.)
The majority of this information came from the Microsoft website.