Storm supervisor connection error downloading jar from nimbus

I am setting up a storm cluster. I have 3 zookeeper nodes, 1 nimbus, 2 supervisors and 1 storm client node. When I look at my setup, zookeeper and nimbus as well as zookeeper and supervisor look good. But when the supervisor tries to pull the jar file out of the nimbus data directory, it gets a "Connection refused" message. Out of frustration, I even opened all tcp and udp ports (0-65535) between the boxes, but I still get the connection refused.

I have verified that the permissions on the nimbus data directory are pretty open, so the supervisor should be able to get to the directory and pull the file out ok. Here are the logs.

Nimbus.log:

2014-11-23 07:07:50 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
2014-11-23 07:07:50 o.a.z.ClientCnxn [INFO] EventThread shut down
2014-11-23 07:07:50 o.a.z.ZooKeeper [INFO] Session: 0x249d964a3c20008 closed
2014-11-23 07:07:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-11-23 07:07:50 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=172.31.40.214:2181,172.31.45.110:2181,172.31.47.13:2181/storm sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@40160f3d
2014-11-23 07:07:50 o.a.z.ClientCnxn [INFO] Opening socket connection to server /172.31.40.214:2181
2014-11-23 07:07:50 o.a.z.ClientCnxn [INFO] Socket connection established to ip-172-31-40-214.us-west-2.compute.internal/172.31.40.214:2181, initiating session
2014-11-23 07:07:50 o.a.z.ClientCnxn [INFO] Session establishment complete on server ip-172-31-40-214.us-west-2.compute.internal/172.31.40.214:2181, sessionid = 0x149d964a86c001d, negotiated timeout = 20000
2014-11-23 07:07:50 b.s.d.nimbus [INFO] Delaying event :remove for 30 secs for TestingStormClusterTopology-1-1416724578
2014-11-23 07:07:50 b.s.d.nimbus [INFO] Starting Nimbus server...

2014-11-23 07:08:20 b.s.d.nimbus [INFO] Killing topology: TestingStormClusterTopology-1-1416724578
2014-11-23 07:08:22 b.s.d.nimbus [INFO] Cleaning up TestingStormClusterTopology-1-1416724578

2014-11-23 07:09:39 b.s.d.nimbus [INFO] Uploading file from client to /home/ubuntu/data/storm/nimbus/inbox/stormjar-dc265069-ebde-482f-abee-ccb7915fa663.jar
2014-11-23 07:09:39 b.s.d.nimbus [INFO] Finished uploading file from client: /home/ubuntu/data/storm/nimbus/inbox/stormjar-dc265069-ebde-482f-abee-ccb7915fa663.jar
2014-11-23 07:09:39 b.s.d.nimbus [INFO] Received topology submission for TestingStormClusterTopology with conf {"topology.max.task.parallelism" nil, "topology.acker.executors" nil, "topology.kryo.register" nil, "topology.kryo.decorators" (), "topology.name" "TestingStormClusterTopology", "storm.id" "TestingStormClusterTopology-1-1416726579", "topology.workers" 3}
2014-11-23 07:09:39 b.s.d.nimbus [INFO] Activating TestingStormClusterTopology: TestingStormClusterTopology-1-1416726579
2014-11-23 07:09:39 b.s.s.EvenScheduler [INFO] Available slots: (["30d36d53-ee60-4667-8a37-44c674da23e7" 6703] ["30d36d53-ee60-4667-8a37-44c674da23e7" 6702] ["30d36d53-ee60-4667-8a37-44c674da23e7" 6701] ["30d36d53-ee60-4667-8a37-44c674da23e7" 6700])
2014-11-23 07:09:39 b.s.d.nimbus [INFO] Setting new assignment for topology id TestingStormClusterTopology-1-1416726579: #backtype.storm.daemon.common.Assignment{:master-code-dir "/home/ubuntu/data/storm/nimbus/stormdist/TestingStormClusterTopology-1-1416726579", :node->host {"30d36d53-ee60-4667-8a37-44c674da23e7" "ip-172-31-43-254.us-west-2.compute.internal"}, :executor->node+port {[2 2] ["30d36d53-ee60-4667-8a37-44c674da23e7" 6702], [3 3] ["30d36d53-ee60-4667-8a37-44c674da23e7" 6701], [4 4] ["30d36d53-ee60-4667-8a37-44c674da23e7" 6703], [5 5] ["30d36d53-ee60-4667-8a37-44c674da23e7" 6702], [6 6] ["30d36d53-ee60-4667-8a37-44c674da23e7" 6701], [7 7] ["30d36d53-ee60-4667-8a37-44c674da23e7" 6703], [8 8] ["30d36d53-ee60-4667-8a37-44c674da23e7" 6702], [9 9] ["30d36d53-ee60-4667-8a37-44c674da23e7" 6701], [1 1] ["30d36d53-ee60-4667-8a37-44c674da23e7" 6703]}, :executor->start-time-secs {[1 1] 1416726579, [9 9] 1416726579, [8 8] 1416726579, [7 7] 1416726579, [6 6] 1416726579, [5 5] 1416726579, [4 4] 1416726579, [3 3] 1416726579, [2 2] 1416726579}}
2014-11-23 07:11:42 b.s.d.nimbus [INFO] Executor TestingStormClusterTopology-1-1416726579:[2 2] not alive
2014-11-23 07:11:42 b.s.d.nimbus [INFO] Executor TestingStormClusterTopology-1-1416726579:[3 3] not alive
2014-11-23 07:11:42 b.s.d.nimbus [INFO] Executor TestingStormClusterTopology-1-1416726579:[4 4] not alive
2014-11-23 07:11:42 b.s.d.nimbus [INFO] Executor TestingStormClusterTopology-1-1416726579:[5 5] not alive
2014-11-23 07:11:42 b.s.d.nimbus [INFO] Executor TestingStormClusterTopology-1-1416726579:[6 6] not alive
2014-11-23 07:11:42 b.s.d.nimbus [INFO] Executor TestingStormClusterTopology-1-1416726579:[7 7] not alive
2014-11-23 07:11:42 b.s.d.nimbus [INFO] Executor TestingStormClusterTopology-1-1416726579:[8 8] not alive
2014-11-23 07:11:42 b.s.d.nimbus [INFO] Executor TestingStormClusterTopology-1-1416726579:[9 9] not alive
2014-11-23 07:11:42 b.s.d.nimbus [INFO] Executor TestingStormClusterTopology-1-1416726579:[1 1] not alive
2014-11-23 07:11:42 b.s.d.nimbus [INFO] Setting new assignment for topology id TestingStormClusterTopology-1-1416726579: #backtype.storm.daemon.common.Assignment{:master-code-dir "/home/ubuntu/data/storm/nimbus/stormdist/TestingStormClusterTopology-1-1416726579", :node->host {}, :executor->node+port {}, :executor->start-time-secs {[1 1] 1416726579, [9 9] 1416726579, [8 8] 1416726579, [7 7] 1416726579, [6 6] 1416726579, [5 5] 1416726579, [4 4] 1416726579, [3 3] 1416726579, [2 2] 1416726579}}
2014-11-23 07:11:52 b.s.d.nimbus [INFO] Executor TestingStormClusterTopology-1-1416726579:[2 2] not alive
2014-11-23 07:11:52 b.s.d.nimbus [INFO] Executor TestingStormClusterTopology-1-1416726579:[3 3] not alive

      

And here is the supervisor.log file.

Supervisor.log

2014-11-23 07:08:55 b.s.d.supervisor [INFO] Starting Supervisor with conf {"dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil, "topology.builtin.metrics.bucket.size.secs" 60, "topology.fall.back.on.java.serialization" true, "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000, "topology.skip.missing.kryo.registrations" false, "storm.messaging.netty.client_worker_threads" 1, "ui.childopts" "-Xmx768m", "storm.zookeeper.session.timeout" 20000, "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500, "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m", "java.library.path" "/opt/jdk", "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/home/ubuntu/data/storm", "storm.messaging.netty.buffer_size" 5242880, "supervisor.worker.start.timeout.secs" 120, "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs" 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64, "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "localhost", "storm.messaging.netty.min_wait_ms" 100, "storm.zookeeper.port" 2181, "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size" 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root" "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000, "supervisor.enable" true, "storm.messaging.netty.server_worker_threads" 1, "storm.zookeeper.servers" ["172.31.40.214" "172.31.45.110" "172.31.47.13"], "transactional.zookeeper.root" "/transactional", "topology.acker.executors" nil, "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil, "drpc.queue.size" 128, "worker.childopts" "-Xmx768m", "supervisor.heartbeat.frequency.secs" 5, "topology.error.throttle.interval.secs" 10, "nimbus.host.ip" "172.31.47.40", "zmq.hwm" 0, "drpc.port" 3772, "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m", "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3, 
"topology.tasks" nil, "storm.messaging.netty.max_retries" 30, "topology.spout.wait.strategy" "backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending" nil, "storm.zookeeper.retry.interval" 1000, "topology.sleep.spout.wait.strategy.time.ms" 1, "nimbus.topology.validator" "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports" [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs" 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10, "topology.workers" 1, "supervisor.childopts" "-Xmx256m", "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05, "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer" "backtype.storm.serialization.types.ListDelegateSerializer", "topology.disruptor.wait.strategy" "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30, "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port" 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times" 5, "storm.thrift.transport" "backtype.storm.security.auth.SimpleTransportPlugin", "topology.state.synchronization.timeout.secs" 60, "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" 600, "storm.messaging.transport" "backtype.storm.messaging.netty.Context", "logviewer.appender.name" "A1", "storm.messaging.netty.max_wait_ms" 1000, "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port" 8080, "nimbus.childopts" "-Xmx1024m", "storm.cluster.mode" "distributed", "topology.optimize" true, "topology.max.task.parallelism" nil}
2014-11-23 07:08:56 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-11-23 07:08:56 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=172.31.40.214:2181,172.31.45.110:2181,172.31.47.13:2181 sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@76a78717
2014-11-23 07:08:56 o.a.z.ClientCnxn [INFO] Opening socket connection to server /172.31.47.13:2181
2014-11-23 07:08:56 o.a.z.ClientCnxn [INFO] Socket connection established to ip-172-31-47-13.us-west-2.compute.internal/172.31.47.13:2181, initiating session
2014-11-23 07:08:56 o.a.z.ClientCnxn [INFO] Session establishment complete on server ip-172-31-47-13.us-west-2.compute.internal/172.31.47.13:2181, sessionid = 0x349d964c0d30018, negotiated timeout = 20000
2014-11-23 07:08:56 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
2014-11-23 07:08:56 o.a.z.ClientCnxn [INFO] EventThread shut down
2014-11-23 07:08:56 o.a.z.ZooKeeper [INFO] Session: 0x349d964c0d30018 closed
2014-11-23 07:08:56 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-11-23 07:08:56 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=172.31.40.214:2181,172.31.45.110:2181,172.31.47.13:2181/storm sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@603043f6
2014-11-23 07:08:56 o.a.z.ClientCnxn [INFO] Opening socket connection to server /172.31.40.214:2181
2014-11-23 07:08:56 o.a.z.ClientCnxn [INFO] Socket connection established to ip-172-31-40-214.us-west-2.compute.internal/172.31.40.214:2181, initiating session
2014-11-23 07:08:56 o.a.z.ClientCnxn [INFO] Session establishment complete on server ip-172-31-40-214.us-west-2.compute.internal/172.31.40.214:2181, sessionid = 0x149d964a86c001f, negotiated timeout = 20000
2014-11-23 07:08:56 b.s.d.supervisor [INFO] Starting supervisor with id 30d36d53-ee60-4667-8a37-44c674da23e7 at host ip-172-31-43-254.us-west-2.compute.internal


2014-11-23 07:09:39 b.s.d.supervisor [INFO] Downloading code for storm id TestingStormClusterTopology-1-1416726579 from /home/ubuntu/data/storm/nimbus/stormdist/TestingStormClusterTopology-1-1416726579
2014-11-23 07:09:39 b.s.event [ERROR] Error when processing event
java.lang.RuntimeException: org.apache.thrift7.transport.TTransportException: java.net.ConnectException: Connection refused
    at backtype.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:21) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.utils.Utils.downloadFromMaster(Utils.java:226) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.daemon.supervisor$fn__6326.invoke(supervisor.clj:396) ~[storm-core-0.9.0.1.jar:na]
    at clojure.lang.MultiFn.invoke(MultiFn.java:172) ~[clojure-1.4.0.jar:na]
    at backtype.storm.daemon.supervisor$mk_synchronize_supervisor$this__6251.invoke(supervisor.clj:290) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.event$event_manager$fn__3072.invoke(event.clj:24) ~[storm-core-0.9.0.1.jar:na]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
Caused by: org.apache.thrift7.transport.TTransportException: java.net.ConnectException: Connection refused
    at org.apache.thrift7.transport.TSocket.open(TSocket.java:183) ~[libthrift7-0.7.0-2.jar:0.7.0-2]
    at org.apache.thrift7.transport.TFramedTransport.open(TFramedTransport.java:81) ~[libthrift7-0.7.0-2.jar:0.7.0-2]
    at backtype.storm.security.auth.SimpleTransportPlugin.connect(SimpleTransportPlugin.java:66) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.security.auth.ThriftClient.<init>(ThriftClient.java:46) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.utils.NimbusClient.<init>(NimbusClient.java:30) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.utils.NimbusClient.<init>(NimbusClient.java:26) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:19) ~[storm-core-0.9.0.1.jar:na]
    ... 7 common frames omitted
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.7.0_65]
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) ~[na:1.7.0_65]
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) ~[na:1.7.0_65]
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) ~[na:1.7.0_65]
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[na:1.7.0_65]
    at java.net.Socket.connect(Socket.java:579) ~[na:1.7.0_65]
    at org.apache.thrift7.transport.TSocket.open(TSocket.java:178) ~[libthrift7-0.7.0-2.jar:0.7.0-2]
    ... 13 common frames omitted
2014-11-23 07:09:39 b.s.util [INFO] Halting process: ("Error when processing an event")

      

So, I'm trying to figure out if I need to distribute public and/or private keys across these boxes. I know how to generate public/private keys (ssh-keygen), but I'm not sure what the strategy should be for sharing keys between the boxes.

I'm not even sure if this is the problem, I was just confused about what the connection failure might mean. Apologies for the long post, but I wanted to provide as much information as possible.

+3




3 answers


The problem was on the nimbus side: the supervisor was unable to open a connection to the nimbus port it had been given to connect to.
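
One way to check this from the supervisor box: the conf dump in the supervisor log in the question shows "nimbus.host" "localhost" and "nimbus.thrift.port" 6627, so the supervisor may be trying to reach nimbus on its own machine rather than on the nimbus box. Below is a minimal sketch that pulls those two values out of a conf dump and tries a plain TCP connection to them. The helper names conf_value and can_connect are made up for this illustration and are not part of Storm, and the dump string is only a short fragment of the real log line.

```python
import re
import socket


def conf_value(conf_dump, key):
    """Return the raw value printed after "key" in a supervisor conf dump, or None."""
    # Values in the dump are either quoted strings or bare tokens (numbers, nil, true).
    pattern = r'"%s"\s+("([^"]*)"|[^,\s}]+)' % re.escape(key)
    match = re.search(pattern, conf_dump)
    if match is None:
        return None
    return match.group(2) if match.group(2) is not None else match.group(1)


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Fragment of the conf dump from the supervisor log in the question.
    dump = ('{"nimbus.host" "localhost", "nimbus.host.ip" "172.31.47.40", '
            '"nimbus.thrift.port" 6627}')
    host = conf_value(dump, "nimbus.host")
    port = int(conf_value(dump, "nimbus.thrift.port"))
    print("supervisor will contact nimbus at %s:%d" % (host, port))
    print("reachable:", can_connect(host, port))
```

If this prints "reachable: False" when run on the supervisor box, the refusal is happening at the TCP level before any jar transfer starts, which would point at the host/port settings rather than at directory permissions or ssh keys.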



+1




Haha, I finally found the answer to this question.

Most of the time this is caused by a nimbus thrift buffer overflow, so you can set a larger value in storm.yaml, for example:



nimbus.thrift.max_buffer_size: 20480000

Hope this helps :)

0




I ran into this problem today, and finally found that it was because of nimbus.seeds. In versions 1.x and later, the parameter used to define the nimbus host is nimbus.seeds. However, in older versions such as the 0.9.5 I was using, it is nimbus.host. Check it out, maybe it can help you.
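
To make the difference concrete, here is a small sketch that builds the storm.yaml line for either case. The function name nimbus_setting is hypothetical, and using 172.31.47.40 as the nimbus address is an assumption taken from the "nimbus.host.ip" value in the question's supervisor log.

```python
def nimbus_setting(storm_version, nimbus_address):
    """Build the storm.yaml line that names the nimbus host for a Storm version."""
    major = int(storm_version.split(".")[0])
    if major >= 1:
        # 1.x and later: nimbus.seeds takes a list of hosts.
        return 'nimbus.seeds: ["%s"]' % nimbus_address
    # 0.9.x and earlier: nimbus.host takes a single host.
    return 'nimbus.host: "%s"' % nimbus_address


if __name__ == "__main__":
    print(nimbus_setting("0.9.0.1", "172.31.47.40"))  # nimbus.host: "172.31.47.40"
    print(nimbus_setting("1.2.3", "172.31.47.40"))    # nimbus.seeds: ["172.31.47.40"]
```

The stack trace in the question comes from storm-core-0.9.0.1, so under this reading the first form would be the one that applies there.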

0

