Balancing fails on server exit
I wrote a simple set of microservices with the following architecture:
To all of them I added spring-boot-starter-actuator to expose a /health endpoint.
In the Zuul / Ribbon config, I added:
zuul:
  ignoredServices: "*"
  routes:
    home-service:
      path: /service/**
      serviceId: home-service
      retryable: true
home-service:
  ribbon:
    listOfServers: localhost:8080,localhost:8081
    eureka.enabled: false
    ServerListRefreshInterval: 1
This way, each time a client calls GET http://localhost:7070/service/home, the load balancer selects one of the two HomeService instances, running on port 8080 or 8081, and calls its /home endpoint.
But when one of the HomeService instances shuts down, the load balancer doesn't seem to notice (despite the ServerListRefreshInterval setting) and fails with error=500 if it tries to invoke the stopped instance.
How can I fix this?
I got a solution from the spring-cloud team and tested it.
The solution is here on GitHub.
Summarizing:
- I added org.springframework.retry.spring-retry to my Zuul classpath
- I added @EnableRetry to my Zuul application
- I set the following properties in my Zuul config

application.yml
server:
  port: ${PORT:7070}
spring:
  application:
    name: gateway
endpoints:
  health:
    enabled: true
    sensitive: true
  restart:
    enabled: true
  shutdown:
    enabled: true
zuul:
  ignoredServices: "*"
  routes:
    home-service:
      path: /service/**
      serviceId: home-service
      retryable: true
  retryable: true
home-service:
  ribbon:
    listOfServers: localhost:8080,localhost:8081
    eureka.enabled: false
    ServerListRefreshInterval: 100
    retryableStatusCodes: 500
    MaxAutoRetries: 2
    MaxAutoRetriesNextServer: 1
    OkToRetryOnAllOperations: true
    ReadTimeout: 10000
    ConnectTimeout: 10000
    EnablePrimeConnections: true
ribbon:
  eureka:
    enabled: false
hystrix:
  command:
    default:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 30000
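
To make the effect of MaxAutoRetries and MaxAutoRetriesNextServer easier to picture, here is a rough plain-Java model of the retry order. It is only a sketch and not Ribbon's actual code: the class RetrySketch, the method call and the "down" set are hypothetical names, and it assumes each server gets 1 + MaxAutoRetries attempts and that at most 1 + MaxAutoRetriesNextServer servers are tried.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified model of Ribbon-style retries; this is NOT Ribbon's real implementation.
class RetrySketch {

    /**
     * Returns the server that finally answered, or null if every attempt failed.
     * Assumption: each server gets 1 + maxAutoRetries attempts, and at most
     * 1 + maxAutoRetriesNextServer servers are tried in list order.
     */
    static String call(List<String> servers, Set<String> down,
                       int maxAutoRetries, int maxAutoRetriesNextServer,
                       List<String> attemptLog) {
        for (int s = 0; s <= maxAutoRetriesNextServer; s++) {
            String server = servers.get(s % servers.size()); // move on to the next server
            for (int attempt = 0; attempt <= maxAutoRetries; attempt++) {
                attemptLog.add(server);                      // record every attempt made
                if (!down.contains(server)) {
                    return server;                           // request succeeded
                }
            }
        }
        return null; // all attempts exhausted
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        String winner = call(List.of("localhost:8080", "localhost:8081"),
                Set.of("localhost:8080"), 2, 1, log);
        System.out.println(winner + " answered after attempts " + log);
    }
}
```

With the values from the config above (MaxAutoRetries: 2, MaxAutoRetriesNextServer: 1) and localhost:8080 stopped, this model tries 8080 three times and then succeeds on 8081.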
Debugging timeouts can be tricky, given that there are already three routing levels (Zuul -> Hystrix -> Ribbon), not counting the async execution layers and the retry mechanism. The following diagram is valid for Spring Cloud releases Camden.SR6 and newer (I tested this on Dalston.SR1):
Zuul routes the request through RibbonRoutingFilter, which creates a Ribbon command with the request context. The Ribbon command then creates a LoadBalancer command that uses spring-retry to execute the command, choosing a retry policy for RetryTemplate according to Zuul's settings. @EnableRetry does nothing in this case, because that annotation only enables wrapping methods annotated with @Retryable in retrying proxies.
This means that the duration of your command is limited to the lesser of these two values (see this post):
- [HystrixTimeout], the timeout of the invoked Hystrix command
- [RibbonTimeout * (MaxAutoRetries + 1) * (MaxAutoRetriesNextServer + 1)] (retries take effect only if Zuul has them enabled in its configuration), where [RibbonTimeout = ConnectTimeout + ReadTimeout] on the HTTP client.
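
As a back-of-the-envelope check, the two limits can be computed by hand. The sketch below is not Spring Cloud code: the class and method names are made up for illustration, and it assumes the initial attempt counts in addition to the configured retries, so the number of attempts is (MaxAutoRetries + 1) * (MaxAutoRetriesNextServer + 1).

```java
// Rough estimate of the two time limits; an illustrative sketch, not Spring Cloud code.
class TimeoutSketch {

    /** Worst-case time Ribbon may spend, assuming every attempt uses its full timeout. */
    static long ribbonBudgetMillis(long connectTimeout, long readTimeout,
                                   int maxAutoRetries, int maxAutoRetriesNextServer) {
        long perAttempt = connectTimeout + readTimeout;
        // Assumption: the first attempt on each server is counted on top of the retries.
        long attempts = (long) (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
        return perAttempt * attempts;
    }

    /** The command is cut off by whichever limit is reached first. */
    static long effectiveLimitMillis(long hystrixTimeout, long ribbonBudget) {
        return Math.min(hystrixTimeout, ribbonBudget);
    }

    public static void main(String[] args) {
        // Values taken from the application.yml in the answer above.
        long ribbon = ribbonBudgetMillis(10_000, 10_000, 2, 1);
        System.out.println("Ribbon budget:   " + ribbon + " ms");
        System.out.println("Effective limit: " + effectiveLimitMillis(30_000, ribbon) + " ms");
    }
}
```

Under these assumptions the Ribbon budget for that configuration is 120000 ms, so the 30000 ms Hystrix timeout would be the limit that actually applies if the attempts run slowly.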
For debugging purposes it is convenient to set a breakpoint in the RetryableRibbonLoadBalancingHttpClient#executeWithRetry or RetryableRibbonLoadBalancingHttpClient#execute method. At that point you have:
- a ContextAwareRequest instance (for example, RibbonApacheHttpRequest or OkHttpRibbonRequest) with a request context that contains the Zuul property retryable;
- a LoadBalancedRetryPolicy instance with a load balancing context that contains the Ribbon properties maxAutoRetries, maxAutoRetriesNextServer and okToRetryOnAllOperations;
- a RetryCallback instance with a requestConfig that contains the HttpClient connectTimeout and socketTimeout;
- a RetryTemplate instance with the selected retry policy.
If the breakpoint is not hit, it means that the org.springframework.cloud.netflix.ribbon.apache.RetryableRibbonLoadBalancingHttpClient bean was not created. This happens when the spring-retry library is not on the classpath.
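
One quick way to confirm whether spring-retry is visible at runtime is to try loading one of its classes. This is only a sketch: the helper class SpringRetryCheck is a hypothetical name, and it uses org.springframework.retry.support.RetryTemplate as the probe class. Run it with the same classpath as the gateway for the result to be meaningful.

```java
// Minimal classpath probe; SpringRetryCheck is a hypothetical helper, not part of Spring.
class SpringRetryCheck {

    /** Returns true if the named class can be found by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // 'false' avoids running static initializers; we only care about visibility.
            Class.forName(className, false, SpringRetryCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("spring-retry present: "
                + isOnClasspath("org.springframework.retry.support.RetryTemplate"));
    }
}
```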