I'm writing a topology to parallelize a very long computation we're
doing -- it takes about 30 minutes on average.

The calculation requires several rounds of partitioning and grouping
of the tuples, and I'm having a problem with the groupings -- I have
yet to find the best way to do that in Storm. (Another hurdle is that
most of the code is written in Ruby, and not all of it is portable to
JRuby.)

In detail, the process is like this:

Start with a single tuple, first partition it into 10-20 tuples, then
partition each of those again into another 120 tuples.
Process each of the ~2000 tuples independently of the others, group
them all back, calculate some more data for each tuple based on the
rest, partition into the same ~2000 tuples again, do some independent
work on each, and group the results back.
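
To make the data flow concrete, here is a rough sketch of the stages in plain Python. It is not Storm code and runs sequentially; the function names, the `root/i/j` naming of the pieces, and the default of 15 first-level partitions are all made up for illustration. `work_a`, `enrich` and `work_b` stand in for the real calculations.

```python
from typing import Any, Callable, List


def split_first(root: str, n: int = 15) -> List[str]:
    """First partition: one tuple becomes n tuples (10-20 in practice)."""
    return [f"{root}/{i}" for i in range(n)]


def split_second(part: str, m: int = 120) -> List[str]:
    """Second partition: each first-level tuple becomes m tuples."""
    return [f"{part}/{j}" for j in range(m)]


def run_pipeline(
    root: str,
    work_a: Callable[[str], Any],
    enrich: Callable[[List[Any]], List[Any]],
    work_b: Callable[[Any], Any],
) -> List[Any]:
    # Two-level fan-out: n * m leaf tuples.
    leaves = [leaf for part in split_first(root) for leaf in split_second(part)]
    # Independent work on each leaf (the part that should run in parallel).
    stage_a = [work_a(leaf) for leaf in leaves]
    # First group-back: each item is recomputed with access to all the others.
    enriched = enrich(stage_a)
    # Fan out again for more independent work, then the final group-back.
    return [work_b(item) for item in enriched]
```

The two list comprehensions over `leaves` and `enriched` are the embarrassingly parallel parts; `enrich` and the final `return` are the two grouping points I'm asking about.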

For obvious reasons, I'd rather not have the grouping mechanics done
inside the bolt, as that introduces state into it.
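
As an illustration of the state I mean, a hand-rolled grouping step would have to buffer partial results per job until all of them have arrived. A minimal sketch in plain Python follows; `GroupCollector`, `add`, `job_id` and `expected` are hypothetical names, not part of any Storm API, and the sketch ignores failures, replays and timeouts.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


class GroupCollector:
    """Buffers values per job and releases them once all have arrived."""

    def __init__(self) -> None:
        # This dictionary is exactly the in-bolt state I would like to avoid.
        self._pending: Dict[str, List[Any]] = defaultdict(list)

    def add(self, job_id: str, expected: int, value: Any) -> Optional[List[Any]]:
        """Record one value; return the full group when it is complete."""
        bucket = self._pending[job_id]
        bucket.append(value)
        if len(bucket) < expected:
            return None  # still waiting for the rest of the group
        # Group complete: hand it over and drop the buffered state.
        return self._pending.pop(job_id)
```

If the worker holding `_pending` dies halfway through a group, the buffered values are lost, which is why I'd prefer the framework to handle the grouping.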

I've been thinking about using Trident, but I'm not sure it is
tailored to my kind of computation (i.e. a long one), and it would
require some more lifting to get it working with shell processes doing
the calculation.
Am I wrong in assuming that?

CoordinatedBolt might work, but I don't need DRPC -- there isn't any
listener waiting for an immediate answer.

What approach would you recommend I take? (Have a Trident topology that
calls shell processes? Find a way to use DRPC? Something else?)



group storm-user @
posted Dec 11, '12 at 8:17p
active Dec 11, '12 at 8:17p

1 user in discussion

Shay Elkin: 1 post


