Analyzing the availability of single-region Google Cloud Spanner

Single-region Cloud Spanner is advertised with a 99.99% SLA. The US configuration will have exactly three replicas per node, all in Council Bluffs, Iowa. Can you share information that justifies why 99.99% (roughly one hour of downtime per year) is believable, especially in the case of geographically localized disasters? I assume that Google has done a thorough analysis, otherwise it would not advertise the SLA, but I cannot find a detailed document.
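For reference, the "roughly one hour per year" figure can be checked with a quick calculation. This is only a sketch of the arithmetic; the function name is made up for illustration, and it assumes downtime is averaged over a 365.25-day year.

```python
# Convert an availability target into the downtime it allows per year.
# Assumption: a year of 365.25 days; the helper name is hypothetical.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def allowed_downtime_minutes(availability: float) -> float:
    """Return the minutes of downtime per year permitted by `availability`."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # 99.99% leaves about 52.6 minutes per year, i.e. a little under an hour.
    print(round(allowed_downtime_minutes(0.9999), 1))
```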

In the event of a regional failure, what recovery procedures would Google perform, and with what recovery time and expected data loss?

(I understand that a multi-region configuration may become available and have seen some pricing data, but I am not discussing that here.)

1 answer


Cloud Spanner automatically replicates data for high availability. As you said, regional instances have three complete copies of the data. The key is that they are replicated across three zones in a region, which have independent power, cooling, networking, etc. Zones usually fail independently of each other, so your other replicas can continue to serve reads and writes even if one zone goes down. Multi-region configurations provide even greater availability by replicating across regions.
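To illustrate why independent zone failures matter, here is a rough back-of-the-envelope model, not an official availability analysis. It assumes zones fail independently and that the database stays available as long as a majority of replicas (two of three) is up. The per-zone availability of 99.9% is a made-up number used purely for illustration.

```python
from math import comb


def quorum_availability(p_zone: float, replicas: int = 3) -> float:
    """Probability that a majority of replicas is up.

    Assumptions (illustrative only): each zone is up independently with
    probability `p_zone`, and a majority quorum is enough to serve requests.
    """
    quorum = replicas // 2 + 1  # 2 of 3 for a regional instance
    return sum(
        comb(replicas, k) * p_zone**k * (1.0 - p_zone) ** (replicas - k)
        for k in range(quorum, replicas + 1)
    )


if __name__ == "__main__":
    # Hypothetical per-zone availability of 99.9%.
    print(quorum_availability(0.999))
```

Under these assumptions the combined availability is far higher than that of any single zone, because an outage requires two zones to be down at the same time. A correlated, region-wide event breaks the independence assumption, which is exactly the case the question asks about.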

Zone outages are very rare and will be transparent to your application; Cloud Spanner automatically redirects requests to replicas that can serve them. It would be even rarer for an entire region to fail with data loss. Google takes many precautions against natural disasters.



We plan to offer managed backups next, but they will still be stored in Google datacenters. We are also working on a Dataflow connector to help you import and export data if you want to manage your own backups.
