Hi,
I'm trying to deploy a Trident topology to a production cluster (just
running on my development desktop at the moment, while I'm getting it up
and running).
However, I seem to be having a problem when I deploy the topology to the
cluster.
I have a number of stateQuery and partitionPersist stages in the topology,
along with a single spout (reading tuples off a HornetQ queue).
While the entire Storm topology works properly in local mode, when I deploy
it to the production Storm instance it behaves in a peculiar way.
The spout always seems to emit tuples properly. However, once I add more
than a couple of stateQuery stages to my topology,
the topology fails to process any of the emitted tuples. With fewer stages in
the topology, it works fine in production mode, but as soon as
I add one more stage beyond a particular tipping point,
it no longer processes any tuples emitted from the spout.
As I said, it always works in local mode.
I've gone through my topology line by line, commenting out the different
stages in an attempt to figure out whether there's a particular
stateQuery stage that causes the hang. I haven't been able to identify
anything in particular that may be the cause, however: it seems as if adding
even a dummy, do-nothing stateQuery to the topology after a few stages
causes the topology in production mode to not process any tuples.
Naturally, I suspected that some particular part of my topology was at fault
(and it probably is), but the fact that adding a dummy stateQuery
causes the same symptoms is confusing me. I'm no expert at reading the
Storm logs, but I haven't spotted anything suspicious in any of the nimbus,
supervisor or worker logs (no errors or warnings of any relevance).
I don't have access to the code from where I am right now (I can post later
if it will be useful) but I've spent a long time on this at this stage and
am thinking that perhaps the behavior I'm seeing might ring a bell with
someone.
Thanks in advance
Denis