I have a strange problem that I have been working on for days now,
without a solution so far:
I set up a new cluster consisting of only one machine. I can submit
topologies and they run fine for hours, but as soon as I let a second
machine connect to the nimbus, the topologies crash and just disappear.
When I try to resubmit them, they show up again for a few seconds
and then disappear again. The topologies all worked fine on
another cluster (Ubuntu 10.04, Zookeeper 3.4.3 & Storm 0.8.0). I
reinstalled everything several times from scratch, but I'm still getting
the same problem. Here's the system and versions I am using:
* Ubuntu 12.04
* Storm 0.8.1
* Zookeeper 3.4.5 (also tried 3.4.3)
* zeromq 2.1.7
* jzmq (latest official branch, also tried the branch linked in the wiki
tutorial, but I had problems compiling it on 12.04 without tricks)
* python 2.6
I have attached the logs, but I cannot find anything helpful in them
at the moment. An NPE happens in nimbus, but I think that is a consequence
of a crashing supervisor and not the cause itself. I would appreciate any
help; I have no real idea what else to try.
Best regards,
Thomas