Google Compute Engine health check keeps failing on one instance

I have a node.js application on two VM instances that I am load balancing with Network Load Balancing. To check that my servers are up, the load balancer requests /health.txt on the application's listening port. The two instances are configured identically, with the same tags, firewall rules, etc., but the health check consistently fails for one of them. I can fetch the file with curl both from inside my internal network and externally, and it works fine in both cases, yet the load balancer always reports that one instance is unhealthy.
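
For reference, the manual check that succeeds from both networks is along these lines (my.pub.ip.addr is a placeholder, and 3000 is the application's listening port shown in the traces below):

curl -i http://my.pub.ip.addr:3000/health.txt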

Running ngrep on the healthy instance, I see:

T 169.254.169.254:65374 -> my.pub.ip.addr:3000 [S]
#
T my.pub.ip.addr:3000 -> 169.254.169.254:65374 [AS]
#
T 169.254.169.254:65374 -> my.pub.ip.addr:3000 [A]
#
T 169.254.169.254:65374 -> my.pub.ip.addr:3000 [AP]
GET /health.txt HTTP/1.1.
Host: my.pub.ip.addr:3000.
.

#
T my.pub.ip.addr:3000 -> 169.254.169.254:65374 [A]
#
T my.pub.ip.addr:3000 -> 169.254.169.254:65374 [AP]
HTTP/1.1 200 OK.
X-Powered-By: NitroPCR.
Accept-Ranges: bytes.
Date: Fri, 14 Nov 2014 20:00:40 GMT.
Cache-Control: public, max-age=86400.
Last-Modified: Thu, 24 Jul 2014 17:58:46 GMT.
ETag: W/"2198506076".
Content-Type: text/plain; charset=UTF-8.
Content-Length: 13.
Connection: keep-alive.
.

#
T 169.254.169.254:65374 -> my.pub.ip.addr:3000 [AR]

But on the instance that GCE claims is unhealthy, I see only repeated SYNs from the health checker, never answered with a SYN-ACK:

T 169.254.169.254:61179 -> my.pub.ip.addr:3000 [S]
#
T 169.254.169.254:61179 -> my.pub.ip.addr:3000 [S]
#
T 169.254.169.254:61180 -> my.pub.ip.addr:3000 [S]
#
T 169.254.169.254:61180 -> my.pub.ip.addr:3000 [S]
#
T 169.254.169.254:61180 -> my.pub.ip.addr:3000 [S]

But if I curl the same file from my healthy instance to the "unhealthy" instance, the "unhealthy" instance responds perfectly.



1 answer


I got this working after talking to the Google Compute Engine team. There is a maintenance process on GCE VMs that should start at boot and keep running as long as the VM is alive. This process is called google-address-manager, and it manages the local routes for load-balanced IPs; it should be running at runlevels 0-6. For some reason, this service had stopped on one of my VMs and would not start when the VM booted/rebooted. Starting the service manually fixed it. Here are the steps we took to determine what was wrong (this is a Debian VM):

sudo ip route list table all

      

This lists every routing table. The local table should contain a route to the load balancer's public IP address:

local lb.pub.ip.addr dev eth0  table local  proto 66  scope host
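
A quick way to test for that route (same placeholder address) is to list the local table and filter for it:

sudo ip route list table local | grep lb.pub.ip.addr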

      

If that route is missing, check whether google-address-manager is running:

sudo service google-address-manager status

      



If it is not running, start it:

sudo service google-address-manager start

      

If it starts up OK, check your route table again; you should now have the route to your load balancer. You can also add the route manually (note that a manually added route will not persist across reboots):

sudo /sbin/ip route add to local lb.pub.ip.addr/32 dev eth0 proto 66

      

We still have not worked out why the address manager stopped and will not start at boot, but at least the LB pool is healthy now.
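
As a stopgap until the root cause is found, one assumption worth testing (we did not confirm this was our problem) is that the service's boot-time init links are disabled; on Debian they can be re-enabled with:

sudo update-rc.d google-address-manager enable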







