I'm new to Go, so please bear with me.

I have a master process that coordinates a bunch of worker
processes. The master keeps a list of pending tasks. Workers fetch tasks and
reply when they are done. Workers also send a heartbeat from time to time.

Should I have one goroutine in the master per worker? How should I structure
the RPC? Should I keep one connection per worker?

Any help is appreciated.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


  • Benjamin Measures at Mar 1, 2014 at 12:12 am

    This is modelled in the Google I/O 2010 talk:
    http://talks.golang.org/2010/io/talk.pdf (with the associated source at
    http://talks.golang.org/2010/io). That should be a good starting point.

  • Igor Gatis at Mar 1, 2014 at 1:46 pm
    Hi Benjamin,

    That's a nice example, but it has no RPC.

    I'm looking for an idiomatic Go way to build a one-thread-per-RPC-client
    design. I know how to write it in, say, C++ or Java.

    For example, I found the Go advanced patterns presentation quite nice. But
    the Go app there is a client, not a server.
  • Egon at Mar 1, 2014 at 2:29 pm
    What's the full story? The recommended way varies depending on what you are
    trying to accomplish.

    + egon
  • Igor Gatis at Mar 1, 2014 at 3:16 pm
    Fair enough. I'm a bit frustrated with Hadoop Streaming, for a handful of
    reasons. When I heard of Go, I thought it would be the perfect tool.

    So I want to implement MapReduce in Go. The master creates tasks (e.g.
    calculate splits, map splits into temp data, reduce temp data) and keeps a
    queue of pending tasks. A worker sends a GetNextTask RPC request and the
    master assigns a task to it. From time to time, the worker sends a
    heartbeat with counters. Once done, the worker sends TaskDone so the master
    can mark the task as done and assign a pending task. The master needs to
    keep track of heartbeat timeouts in order to reassign tasks.
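
The bookkeeping described above can be sketched without any networking, which makes the state transitions easier to see. This is only an illustration under stated assumptions: the names `Master`, `Task`, `Reap`, and `Finished`, the mutex-guarded in-memory queue, and the use of Unix seconds for timestamps are all hypothetical and do not come from the thread. The sketch also assumes a worker calls TaskDone before asking for its next task.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Task is a hypothetical unit of work, e.g. one map split.
type Task struct{ ID int }

// lease records which task a worker holds and when it last reported in.
type lease struct {
	task     Task
	lastBeat int64 // Unix seconds
}

// Master owns the pending queue and the tasks currently leased to workers.
type Master struct {
	mu      sync.Mutex
	pending []Task
	leases  map[string]lease // worker ID -> leased task
	timeout int64            // seconds of silence before a task is reassigned
}

// ErrNoTask is returned when the pending queue is empty.
var ErrNoTask = errors.New("no pending task")

// NewMaster copies the task list so the caller's slice is not modified.
func NewMaster(tasks []Task, timeout int64) *Master {
	return &Master{
		pending: append([]Task(nil), tasks...),
		leases:  make(map[string]lease),
		timeout: timeout,
	}
}

// GetNextTask hands the oldest pending task to the worker.
func (m *Master) GetNextTask(worker string, now int64) (Task, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if len(m.pending) == 0 {
		return Task{}, ErrNoTask
	}
	t := m.pending[0]
	m.pending = m.pending[1:]
	m.leases[worker] = lease{task: t, lastBeat: now}
	return t, nil
}

// Heartbeat records that the worker is still alive.
func (m *Master) Heartbeat(worker string, now int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if l, ok := m.leases[worker]; ok {
		l.lastBeat = now
		m.leases[worker] = l
	}
}

// TaskDone releases the worker's lease; the task is not requeued.
func (m *Master) TaskDone(worker string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.leases, worker)
}

// Reap requeues tasks whose workers have been silent for longer than the
// timeout and reports how many tasks were requeued.
func (m *Master) Reap(now int64) int {
	m.mu.Lock()
	defer m.mu.Unlock()
	n := 0
	for w, l := range m.leases {
		if now-l.lastBeat > m.timeout {
			m.pending = append(m.pending, l.task)
			delete(m.leases, w)
			n++
		}
	}
	return n
}

// Finished reports whether nothing is pending or leased.
func (m *Master) Finished() bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.pending) == 0 && len(m.leases) == 0
}

func main() {
	m := NewMaster([]Task{{ID: 1}, {ID: 2}}, 10)
	now := time.Now().Unix()
	t, _ := m.GetNextTask("worker-a", now)
	fmt.Println("assigned task", t.ID)
	fmt.Println("requeued after silence:", m.Reap(now+11))
}
```

In a real master, `Reap` would probably be called from a ticker goroutine, and the three methods would be wrapped by RPC handlers. A channel-owning goroutine could replace the mutex; the mutex is used here only to keep the sketch short.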

    So, I'm mostly concerned about the master's code. I thought of keeping a
    goroutine per worker, but I'm not sure how RPCs should be handled. One
    implementation I wrote has an RPC handler that receives a request with a
    worker ID and routes it to that worker's channel. I was wondering whether I
    could avoid that by keeping a TCP connection per worker always open, so
    that RPCs from each worker process reach its goroutine directly.

    I'm not 100% sure about this design though.
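
One way to approximate the "RPC reaches its worker goroutine directly" idea with the standard `net/rpc` package is to give every accepted connection its own `rpc.Server` and its own state object, so handlers never need a worker ID argument. The following is a sketch under that assumption, not a recommendation from the thread; `Session`, `Beat`, `serveWorker`, `acceptLoop`, and `demo` are hypothetical names, and the demo listens on a loopback port.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
	"sync"
)

// Beat is a hypothetical heartbeat payload carrying one counter.
type Beat struct{ Records int }

// Session is the master's state for exactly one worker connection.
type Session struct {
	name  string
	mu    sync.Mutex // net/rpc may run handlers concurrently
	total int
}

// Heartbeat adds the reported records to this connection's total. No worker
// ID is passed because the Session itself identifies the worker.
func (s *Session) Heartbeat(b Beat, total *int) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.total += b.Records
	*total = s.total
	return nil
}

// serveWorker runs as the per-worker goroutine: it registers a fresh Session
// on a private rpc.Server and serves the connection until it closes.
func serveWorker(conn net.Conn, name string) {
	srv := rpc.NewServer()
	if err := srv.RegisterName("Worker", &Session{name: name}); err != nil {
		log.Printf("%s: register: %v", name, err)
		conn.Close()
		return
	}
	srv.ServeConn(conn) // blocks until the worker disconnects
	// A real master would requeue this worker's task here.
}

// acceptLoop starts one goroutine per accepted worker connection.
func acceptLoop(ln net.Listener) {
	for i := 0; ; i++ {
		conn, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		go serveWorker(conn, fmt.Sprintf("worker-%d", i))
	}
}

// demo opens two worker connections, sends n1 and n2 heartbeats on them, and
// returns the total each connection reported back.
func demo(n1, n2 int) (int, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, 0, err
	}
	defer ln.Close()
	go acceptLoop(ln)

	var totals [2]int
	for i, n := range []int{n1, n2} {
		c, err := rpc.Dial("tcp", ln.Addr().String())
		if err != nil {
			return 0, 0, err
		}
		defer c.Close()
		for j := 0; j < n; j++ {
			if err := c.Call("Worker.Heartbeat", Beat{Records: 1}, &totals[i]); err != nil {
				return 0, 0, err
			}
		}
	}
	return totals[0], totals[1], nil
}

func main() {
	a, b, err := demo(3, 1)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("records per connection:", a, b)
}
```

The trade-off is that shared state, such as the pending queue, still has to be reached from each `Session`, for example through a pointer to a shared master object or a channel. The per-connection server only removes the routing-by-ID step.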
  • Egon at Mar 1, 2014 at 4:25 pm

    You are still not telling the entire story.

    What hardware are you running it on? Single computer, cluster or some
    service? If it's a cluster how are they communicating?
    What kind of data? How much data (1MB, 1GB, 1TB, 1PB)? How often do you
    need to process it?

    What do you want to calculate? For example: here is input X, I need to calculate Y.

    Also, your problem is not "how to implement map-reduce"... your problem is
    "how to process data X to get result Y in an environment Z efficiently".

    + egon

  • Igor Gatis at Mar 1, 2014 at 6:58 pm
    Commodity hardware, cluster with 100s, 1 GB to 10 TB. Frequency depends. And
    yes, it's MapReduce.
  • Egon at Mar 1, 2014 at 9:11 pm

    Sorry if that seemed a little nitpicky. We get everyone from programmers who have been programming for half a year to those who have been programming for over 30 years: those who assume that just making things parallel makes them go faster, those who over-engineer, and those who are experts in their field. Hence, to properly answer the questions, the full story is necessary. With the full story there can also be subtleties, like batching multiple jobs if things are IO bound. Basically, it makes it easier to give a better answer. Maybe a package already exists, or maybe some other language is even more appropriate. Try to give the full story the first time; it avoids a lot of back-and-forth questions, and you'll get your answer faster.

    Anyways...

    You should be able to hold those connections open, although I haven't tried it myself. https://groups.google.com/forum/?nomobile=true#!golang-nuts/coc6bAl2kPM/ypNLG3I4mk0J
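
From the worker's side, holding the connection open with the standard `net/rpc` package can be as simple as dialing once and reusing the same `*rpc.Client` for every call. The sketch below assumes a single shared `Master` service that takes the worker ID as an argument; `Master`, `countingListener`, `runWorker`, and `demo` are hypothetical names. The listener wrapper exists only to show that many heartbeats share one accepted connection.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
	"sync/atomic"
	"time"
)

// Master is a hypothetical RPC service shared by all worker connections.
type Master struct{ beats int64 }

// Heartbeat counts heartbeats from any worker; the worker ID is the argument.
func (m *Master) Heartbeat(workerID string, ack *bool) error {
	atomic.AddInt64(&m.beats, 1)
	*ack = true
	return nil
}

// countingListener counts accepted connections so the demo can show that
// many calls travel over a single connection.
type countingListener struct {
	net.Listener
	accepted int64
}

func (l *countingListener) Accept() (net.Conn, error) {
	c, err := l.Listener.Accept()
	if err == nil {
		atomic.AddInt64(&l.accepted, 1)
	}
	return c, err
}

// runWorker dials once and sends n heartbeats over the same client.
func runWorker(addr, id string, n int, every time.Duration) error {
	client, err := rpc.Dial("tcp", addr) // one long-lived TCP connection
	if err != nil {
		return err
	}
	defer client.Close()

	tick := time.NewTicker(every)
	defer tick.Stop()
	for i := 0; i < n; i++ {
		<-tick.C
		var ack bool
		if err := client.Call("Master.Heartbeat", id, &ack); err != nil {
			return err // connection lost; the caller may decide to redial
		}
	}
	return nil
}

// demo returns the number of heartbeats received and connections accepted.
func demo(n int) (int64, int64, error) {
	m := &Master{}
	srv := rpc.NewServer()
	if err := srv.Register(m); err != nil {
		return 0, 0, err
	}
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, 0, err
	}
	defer ln.Close()
	cl := &countingListener{Listener: ln}
	go srv.Accept(cl) // serves each accepted connection in its own goroutine

	if err := runWorker(ln.Addr().String(), "worker-1", n, time.Millisecond); err != nil {
		return 0, 0, err
	}
	return atomic.LoadInt64(&m.beats), atomic.LoadInt64(&cl.accepted), nil
}

func main() {
	beats, conns, err := demo(5)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("heartbeats:", beats, "connections:", conns)
}
```

A failed `Call` is the worker's signal that the connection is gone. On the master's side, the heartbeat timeout remains the way to notice a worker that has stopped responding without closing its connection.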

    Whether the architecture will hold up easily for 10 TB is hard to tell. For example, what if your master goes down? You may want to replicate the master so that another node can take over. At those scales, restarting the whole process would be annoying.

    + egon

Discussion Overview
group: golang-nuts
categories: go
posted: Feb 28, '14 at 8:52p
active: Mar 1, '14 at 9:11p
posts: 8
users: 3
website: golang.org
