Spark worker unable to connect to the master (invalid association) on the same machine, even though the URL is correct
The master log includes the following:
15/05/19 21:05:19 INFO Remoting: Remoting started; listening on addresses: [akka.tcp://sparkMaster@mellyrn.local:7077]
But the worker cannot connect:
15/05/19 21:27:13 INFO Worker: Connecting to master akka.tcp://sparkMaster@mellyrn.local:7077/user/master...
15/05/19 21:27:13 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@mellyrn.local:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: mellyrn.local/25.101.19.24:7077
15/05/19 21:27:25 INFO Worker: Retrying connection to master (attempt # 1)
15/05/19 21:27:25 INFO Worker: Connecting to master akka.tcp://sparkMaster@mellyrn.local:7077/user/master...
15/05/19 21:27:25 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@mellyrn.local:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: mellyrn.local/25.101.19.24:7077
Any hints on what to try here?
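Since the failure reason in the log is "Connection refused", one thing that may be worth checking independently of Spark is whether anything accepts TCP connections at the address the worker resolved. The following is only a minimal sketch using the Python standard library; the function names `resolve` and `can_connect` are made up for illustration and are not part of Spark or Akka.

```python
import socket


def resolve(host):
    """Return the IPv4 address the OS resolves `host` to, or None on failure."""
    try:
        return socket.gethostbyname(host)
    except OSError:
        return None


def can_connect(host, port, timeout=3.0):
    """Try a plain TCP connection; return (True, None) or (False, error)."""
    try:
        # create_connection resolves the host and connects, honoring the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers connection refused, timeouts, and name resolution errors.
        return False, exc
```

Calling `resolve("mellyrn.local")` and `can_connect("mellyrn.local", 7077)` on the machine running the worker would show which address the hostname maps to and whether the port is accepting connections at that moment. A successful TCP connection does not prove the Akka association will succeed, only that something is listening.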
It looks like these errors were intermittent and were caused by the host machine having almost no free memory at the time. After shutting down some unrelated memory-hungry processes, the above errors mostly went away.
There is still a delay of several tens of seconds before the master/worker association is established, which I would like to understand.
Note that there were no log messages describing the low memory situation.
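To put a number on the remaining delay, one option is to time how long it takes for the master port to start accepting TCP connections after the master is launched. This is a rough sketch, not a Spark facility: `wait_for_port` is a hypothetical helper, and it measures only TCP reachability, not the Akka association itself.

```python
import socket
import time


def wait_for_port(host, port, deadline=60.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds.

    Returns the elapsed seconds on success, or None if `deadline`
    seconds pass without a successful connection.
    """
    start = time.monotonic()
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            pass  # not accepting connections yet; retry below
        if time.monotonic() - start >= deadline:
            return None
        time.sleep(interval)
```

Running `wait_for_port("mellyrn.local", 7077)` right after starting the master gives an approximate figure to compare against the timestamps in the worker log.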