Detecting dead applications while the server is alive in NLB

Windows NLB works fine and removes the computer from the cluster when the computer is dead.

But what happens if the application dies while the server itself is still running fine? How have you solved this problem?

Thanks



4 answers


Not using NLB.

Hardware balancers often have configurable "probe" functions to determine whether a server is responding to requests. The probe can hit the application's real port / URL, or a dedicated "healthcheck" URL that returns success only if the application is healthy.

Other settings on them look at queue length or the time taken to respond to requests.
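As a rough illustration of the "healthcheck" URL idea, here is a minimal Python sketch. The `/healthcheck` path, the 200/503 status codes, and the `is_healthy` callback are assumptions made for the example, not something a particular balancer requires; `probe` stands in for what the balancer would do on its side.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def start_health_server(is_healthy, port=0):
    """Serve /healthcheck on localhost; is_healthy is a hypothetical callback
    that the application supplies to report its own state."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthcheck":
                status, body = 404, b"NOT FOUND"
            elif is_healthy():
                status, body = 200, b"OK"
            else:
                status, body = 503, b"UNHEALTHY"
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep probe traffic out of stderr

    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(host, port, path="/healthcheck", timeout=2.0):
    """Balancer-side check: True only if the URL answers with HTTP 200."""
    try:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        conn.request("GET", path)
        healthy = conn.getresponse().status == 200
        conn.close()
        return healthy
    except (OSError, http.client.HTTPException):
        return False
```

A balancer would call something like `probe` on a timer and take the server out of rotation after a few consecutive failures; how many failures to tolerate is a tuning decision.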



Cisco puts it this way:

The Cisco CSM continuously monitors the availability of servers and applications using a variety of probes, in-band health monitoring, return code checking, and Dynamic Feedback Protocol (DFP). When a real server or gateway fails, the Cisco CSM redirects traffic to a different location. Servers can be added and removed without disrupting service, and systems are easily scaled up or down.

(from here: http://www.cisco.com/en/US/products/hw/modules/ps2706/products_data_sheet09186a00800887f3.html#wp1002630 )



Presumably with Windows NLB there is some way to set node weights programmatically? The nodes would then monitor themselves, and if a node detects a problem (for example, it is running low on disk space), it sets its own weight to zero so that it does not receive any more traffic.

However, this needs to be carefully designed, and further monitoring by humans is required to make sure that a single fault does not cause the entire cluster to declare itself down.
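A minimal sketch of that self-monitoring decision in Python. It only computes the weight a node would like to have; how the weight is actually applied to NLB is left out, because that API is not described here. The names `disk_is_healthy`, `desired_weight`, `healthy_peer_count`, and the default values are all hypothetical. The guard against the whole cluster withdrawing is one possible design, not an established rule.

```python
import shutil


def disk_is_healthy(path, min_free_bytes):
    """Example self-check: is there at least min_free_bytes free on path?"""
    return shutil.disk_usage(path).free >= min_free_bytes


def desired_weight(node_healthy, healthy_peer_count,
                   normal_weight=50, min_healthy_peers=1):
    """Return the weight this node should ask for.

    healthy_peer_count is assumed to be supplied by some external monitor.
    If too few peers are healthy, the node keeps its normal weight instead
    of dropping to zero, so one shared fault cannot empty the cluster.
    """
    if node_healthy:
        return normal_weight
    if healthy_peer_count < min_healthy_peers:
        return normal_weight  # stay in rotation and let humans investigate
    return 0
```

For example, `desired_weight(disk_is_healthy("/", 1_000_000_000), healthy_peer_count=3)` would give 0 on a node with less than roughly 1 GB free, and the normal weight otherwise.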



You can't really hope to deal with the "Byzantine generals" situation in NLB: a node that is broken in some subtle way may think it is OK and look fine from outside, yet be completely unable to do any actual work. The trick is to minimize the chance of such situations occurring in production.



There are several levels of checking the health of a network application.

  1. Is the server machine running?
  2. Is the application (service) running?
  3. Is the service accepting network connections?
  4. Does the service respond correctly to an "are you ok" query?
  5. Is the service really doing real work? (This also checks the systems behind the service you are probing.)

My experience with NLB may be incomplete, but I'll describe what I know. NLB can do 1 and 2. With custom coding you can add the other levels, with varying degrees of difficulty. With some network architectures this can be very hard.

Most hardware balancers from vendors such as Cisco or F5 can easily be configured to perform checks 3 or 4. A level 5 test still requires custom coding.
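To make levels 3 and 4 concrete, here is a small Python sketch. Level 3 is just a TCP connect; level 4 sends a request and compares the reply. The `PING`/`PONG` exchange is an invented protocol for illustration only, so substitute whatever "are you ok" query your service actually understands.

```python
import socket


def accepts_connections(host, port, timeout=2.0):
    """Level 3: can a TCP connection to the service be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def answers_are_you_ok(host, port, request=b"PING\n", expected=b"PONG",
                       timeout=2.0):
    """Level 4: does the service give the expected reply to a query?

    The request/expected bytes are placeholders for a real protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(request)
            return conn.recv(1024).startswith(expected)
    except OSError:  # includes timeouts and refused connections
        return False


def check_levels(host, port, timeout=2.0):
    """Run level 3, then level 4 only if level 3 passed."""
    level3 = accepts_connections(host, port, timeout)
    level4 = level3 and answers_are_you_ok(host, port, timeout=timeout)
    return {3: level3, 4: level4}
```

A service that is hung but still listening passes level 3 and fails level 4, which is exactly the case a plain port check misses.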



We start from a state where all nodes are part of the cluster but inactive. We run our own service monitor, which makes a request to the service locally through the front end. If the response is successful, we start the node (let it begin handling NLB traffic). If the request fails, we stop the node so that it no longer receives traffic.

The intermediate steps Darron described are irrelevant to us. Whether the request worked or not is the only thing we care about. If the machine itself becomes unavailable, the rest of the NLB cluster will consider it failed.
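A rough Python sketch of such a monitor. The use of `nlb.exe start` / `nlb.exe stop` to control the local node is an assumption; check which command or cmdlet applies to your Windows version. The function names and the idea of passing `probe` and `run` as parameters are illustrative choices, not the poster's actual implementation.

```python
import http.client
import subprocess
import urllib.request

# Assumed commands for controlling the local NLB node; verify for your system.
START_COMMAND = ["nlb.exe", "start"]
STOP_COMMAND = ["nlb.exe", "stop"]


def service_responds(url, timeout=5.0):
    """End-to-end check: request the service locally through the front end."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        return False


def next_command(healthy, node_active):
    """Return the command to run, or None if the node state is already right."""
    if healthy and not node_active:
        return START_COMMAND
    if not healthy and node_active:
        return STOP_COMMAND
    return None


def monitor_step(url, node_active, probe=service_responds, run=subprocess.run):
    """Do one check and return the new node_active state."""
    healthy = probe(url)
    command = next_command(healthy, node_active)
    if command is not None:
        run(command, check=True)
    return healthy
```

In use, the monitor would start with `node_active = False` and call `node_active = monitor_step(url, node_active)` in a loop with a sleep between iterations. Only state changes trigger a command, so a healthy node is not restarted on every pass.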







