FAQ
(Storm version 0.8.1)

I'm performance-tuning some topologies we recently converted to Trident, and I'm having trouble getting the resulting bolts parallelized the way I want.

In plain Storm, it's pretty straightforward. For example, if I set a bolt in a topology like so:

builder.setBolt(SPLIT_BOLT_ID, splitBolt, 3).shuffleGrouping(SENTENCE_SPOUT_ID);

Then the "splitBolt" will be assigned 3 tasks in the topology.

With Trident, however, it's not as clear (at least not to me), since you set parallelism on the Stream class. We have a Trident topology that's not unlike the one depicted here:

https://raw.github.com/wiki/nathanmarz/storm/images/trident-to-storm2.png

So, looking at the first spout and bolt in that diagram (upper left), if I wanted to assign the spout a parallelism hint of 1 and the first bolt a parallelism hint of 3, I would expect to do something like the following:

Stream stream = topology.newStream("myStream", spout);
stream = stream.parallelismHint(1).each(…).each(…).parallelismHint(3).groupBy(…);

But I'm not seeing the results I'm expecting. I've tried moving the "parallelismHint()" calls around within the topology definition, and am completely baffled by how it plays out when deployed to a cluster. I'm using storm-ui to determine how each resulting bolt got parallelized (which may be the problem). In some cases attempting to set the parallelism of a bolt actually altered the parallelism of the spout.

I'm assuming (perhaps wrongly) that if a Trident topology compiles down to 5 bolts, they will be numbered ("bolt0" through "bolt4") consistently between topology submissions. That is, if a topology is submitted and killed multiple times, can I safely assume that "bolt0" always represents the same bolt?

Am I missing something simple? I can't share the actual topology code, but could put together a simple example if that would help.

Thanks in advance,

- Taylor

--
You received this message because you are subscribed to the Google Groups "storm-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to storm-user+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


  • Nathan Marz at Mar 10, 2013 at 8:58 pm
    I recommend using the "name" function to name portions of your stream so
    that the UI shows you what bolts correspond to what sections.

    Trident packs operations into as few bolts as possible. In addition, it
    *never* repartitions your stream unless you've done an operation that
    explicitly involves a repartitioning (e.g. shuffle, groupBy, partitionBy,
    global aggregation, etc). This property of Trident ensures that you can
    control the ordering/semi-ordering of how things are processed. So in this
    case, everything before the groupBy has to have the same parallelism or
    else Trident would have to repartition the stream. And since you didn't say
    you wanted the stream repartitioned, it can't do that. You can give the
    spout a different parallelism from the each() calls that follow it by
    introducing a repartitioning operation, like so:

    stream.parallelismHint(1).shuffle().each(…).each(…).parallelismHint(3).groupBy(…);
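
To make the rule above concrete, here is a toy model in plain Java. It is only a sketch and not Trident's planner: it does not call the Storm API, and the class name `SegmentParallelism`, the string tokens, and the `resolve` method are all made up for illustration. It cuts a chain of operations into segments at the repartitioning operations named above and reports one parallelism per segment. Two things are assumptions of the sketch rather than anything confirmed in this thread: a segment with no hint defaults to 1, and when several hints land in the same segment the largest one wins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model only (hypothetical, not part of Storm/Trident).
 * Tokens: "hint:N" stands for parallelismHint(N); any other non-repartitioning
 * token ("spout", "each", ...) is an operation that stays in the current segment.
 */
class SegmentParallelism {
    // Repartitioning operations mentioned in the reply above.
    private static final Set<String> REPARTITIONING =
            new HashSet<>(Arrays.asList("shuffle", "groupBy", "partitionBy"));

    /** Returns one parallelism value per segment, in chain order. */
    static List<Integer> resolve(String... ops) {
        List<Integer> result = new ArrayList<>();
        int current = 1;          // assumption: default when a segment has no hint
        boolean nonEmpty = false; // true once the current segment holds any token
        for (String op : ops) {
            if (REPARTITIONING.contains(op)) {
                // A repartitioning closes the current segment and starts a new one.
                if (nonEmpty) {
                    result.add(current);
                }
                current = 1;
                nonEmpty = false;
            } else {
                if (op.startsWith("hint:")) {
                    int hint = Integer.parseInt(op.substring("hint:".length()));
                    // assumption: the largest hint in a segment wins
                    current = Math.max(current, hint);
                }
                nonEmpty = true;
            }
        }
        if (nonEmpty) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        // Original attempt: spout and each's share one segment, so one value.
        System.out.println(resolve("spout", "hint:1", "each", "each", "hint:3", "groupBy"));
        // prints [3]

        // With shuffle(): the spout and the each's fall into separate segments.
        System.out.println(resolve("spout", "hint:1", "shuffle", "each", "each", "hint:3", "groupBy"));
        // prints [1, 3]
    }
}
```

In the first chain the spout ends up with the same value as the each's because nothing separates them, which matches the report that setting a bolt's parallelism changed the spout's. In the second chain the shuffle token is what lets the two hints apply independently.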





    --
    Twitter: @nathanmarz
    http://nathanmarz.com

  • Simon Cooper at Jul 24, 2013 at 2:28 pm
    What are the rules used to decide the parallelism of each bolt? What is the
    scope of the effect of the parallelismHint method, and what happens if
    more than one is applied to the same bolt?

    Thanks,
    SimonC
  • Ak at Dec 10, 2013 at 8:47 am
    Clearly, doing it this way doesn't work.

  • Ak at Dec 10, 2013 at 9:13 am
    I tested this under Storm 0.9 rc3, and it has no effect.


Discussion Overview
group: storm-user
posted: Mar 9, '13 at 7:17p
active: Dec 10, '13 at 9:13a
posts: 5
users: 4
website: storm-project.net
irc: #storm-user
