Hortonworks Nodemanager starts but then fails: Connection refused: 8042

Question

I am trying to resolve an issue with a recently added datanode in our Hortonworks cluster. The YARN NodeManager on that node fails shortly after starting. The following error message appears:

Connection failed to http://(ipaddress):8042/ws/v1/node/info (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 166, in execute
    connection_timeout=curl_connection_timeout, kinit_timer_ms = kinit_timer_ms)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/curl_krb_request.py", line 198, in curl_krb_request
    _, curl_stdout, curl_stderr = get_user_call_output(curl_command, user=user, env=kerberos_env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output
    raise ExecutionFailed(err_msg, code, files_output[0], files_output[1])
ExecutionFailed: Execution of 'curl --location-trusted -k --negotiate -u : -b /var/lib/ambari-agent/tmp/cookies/4268dd36-9f72-4be0-8d82-5f0a124a3a72 -c /var/lib/ambari-agent/tmp/cookies/4268dd36-9f72-4be0-8d82-5f0a124a3a72 http://gdcdrwhdb821.dir.ucb-group.com:8042/ws/v1/node/info --connect-timeout 5 --max-time 7 1>/tmp/tmp7pZrbM 2>/tmp/tmpgM4wdg' returned 7.   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed connect to (ipaddress):8042; Connection refused
)

This doesn't really tell me why the connection was refused, other than that no YARN process is listening on port 8042; the following returns nothing:

netstat -tulpn | grep 8042

I've searched for another NodeManager log, possibly with more information, but can't find anything useful in /var/log/hadoop-yarn or in the yarn.nodemanager.local-dirs / yarn.nodemanager.log-dirs directories.

Are there other places where I can look for NodeManager error logs? Does anyone know what could be causing this?

Edit: After re-checking, I found this useful bit in /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-(ipaddress).log:

2017-04-19 14:01:14,670 FATAL nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(549)) - Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
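
A quick way to surface entries like the one above is to search the NodeManager log for FATAL lines and print a few lines of context after each. This is only a minimal sketch: the function name show_fatal and the default of five context lines are illustrative choices, not part of any Hadoop tooling.

```shell
# Print every FATAL entry in a log file together with the lines that follow it.
# show_fatal is a hypothetical helper name, not a Hadoop command.
#   $1 = path to the log file
#   $2 = number of context lines after each match (optional, defaults to 5)
show_fatal() {
    grep -n -A "${2:-5}" 'FATAL' "$1"
}
```

Called with the path of the NodeManager log file, it prints each FATAL line with its line number, followed by the exception text that comes after it.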

+3

yarn

Koen de couck Apr 19 17 at 11:51

4 answers

Were you able to fix it?

I faced a similar problem today.

I stopped YARN on my HDP cluster, deleted the /var/log/hadoop-yarn/nodemanager/recovery-state directory, and started YARN again.

Now the NodeManager works fine.
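
A more cautious variation on this step is to rename the recovery-state directory instead of deleting it, so it can be restored if needed. The sketch below assumes the NodeManager on that node has already been stopped (for example through Ambari). The function name set_aside_recovery_state and the .bak.<timestamp> suffix are illustrative, not something used in the answer.

```shell
# Rename a NodeManager recovery-state directory so the NodeManager
# starts without the old state. Hypothetical helper; run it only
# while the NodeManager on this node is stopped.
#   $1 = path to the recovery-state directory
set_aside_recovery_state() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # Timestamped name so repeated runs do not overwrite an older backup.
    backup="$dir.bak.$(date +%Y%m%d%H%M%S)"
    mv "$dir" "$backup" && echo "$backup"
}
```

For the path mentioned above this would be called as set_aside_recovery_state /var/log/hadoop-yarn/nodemanager/recovery-state, after which YARN can be started again. The function prints the backup location so it can be removed once the NodeManager is confirmed healthy.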

+1

thanuja Dec 20. 17 at 9:24 am

This also worked on my side. Stop the YARN service only on the affected node, not the entire YARN service.

0

Khairul June 10. 18 at 10:24

I stopped YARN on my HDP cluster and deleted the /var/log/hadoop-yarn/nodemanager/recovery-state directory and started YARN again.

It worked for me too. I think it was a file permission issue.

0

Danillo Gontijo June 24. 19 at 13:53

PRASANNA SARAF · Accepted Answer · 2018-07-06T20:54:46+0000

Not sure if this helps now. You may have already solved this.

You are using an external shuffle service. It runs as an auxiliary service inside the NodeManager. Currently the NodeManager cannot find the shuffle service jar on its classpath, which is why startup fails with the ClassNotFoundException for org.apache.spark.network.yarn.YarnShuffleService.

Please add the location of the shuffle jar file to yarn.application.classpath in yarn-site.xml.
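
Before editing the classpath it helps to know where the shuffle jar actually lives on the node. The sketch below searches a directory tree for it. The file-name pattern *yarn-shuffle*.jar is an assumption based on how Spark distributions commonly name this jar, and find_shuffle_jars is a hypothetical helper name.

```shell
# List candidate Spark YARN shuffle service jars under a directory tree.
# Assumption: the jar's file name contains "yarn-shuffle"; adjust the
# pattern if your distribution names it differently.
#   $1 = directory to search
find_shuffle_jars() {
    # Errors such as unreadable directories are suppressed.
    find "$1" -type f -name '*yarn-shuffle*.jar' 2>/dev/null
}
```

Called with the root of the Hadoop or Spark installation, it prints the paths of matching jars. The directory that contains the jar is what would be added to the classpath setting.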
