FAQ
A very basic question: where should I store my personal global variables so
that the map and/or reduce functions can see them?

Thanks,
James


  • James Yu at Oct 12, 2007 at 3:27 am
    Problem solved. Please ignore.
  • Bob Futrelle at Oct 12, 2007 at 3:31 am
    Yeah, but what's the answer?

    - rpf
  • James Yu at Oct 12, 2007 at 4:55 am
    For example:

    I put all user global variables in a class I called MyGlobals:

        public class MyGlobals {
            public static int var1;
            ...
        }

    Then, in whatever map function I have, I can refer to my globals like this:

        public void map(LongWritable key, Text value, OutputCollector output,
                        Reporter reporter) throws IOException {
            ...
            int i = MyGlobals.var1;
            ...
        }

    Sorry about the stupid question and answer.
  • Dennis Kubes at Oct 12, 2007 at 12:10 pm
    You can also use a MapRunnable implementation, but the variables would
    then be "global" only within each map task.

    Dennis Kubes

  • Peter W. at Oct 12, 2007 at 11:17 pm
    James,

    I think you can put those variables inside the mapper or reducer
    without creating a separate public class.

    untested code follows...

        public static class R extends MapReduceBase implements Reducer
        {
            // static: one set per JVM, not one set for the whole job
            private static Set s = new HashSet();

            public void reduce(WritableComparable wc, Iterator it,
                               OutputCollector o, Reporter r) throws IOException
            {
                while (it.hasNext())
                {
                    if (...)
                        ...;
                    else
                        s.add(((IntWritable) it.next()).get());
                }
                // loop, append then add set values to output collector (wc key)
            }
        }

    Bye,

    Peter W.

  • Owen O'Malley at Oct 12, 2007 at 3:44 pm

    On Oct 11, 2007, at 9:54 PM, James Yu wrote:

    > I put all user global variables in a class I called MyGlobals.

    Since map/reduce is distributed in general, you should be careful about
    using global variables. I find it better practice to keep all of the
    state variables in either the Mapper or Reducer itself, to remind myself
    that the state is _not_ shared between Mappers, Reducers, and the
    launching program.

    -- Owen
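
A minimal sketch of Owen's advice in plain Java, with no Hadoop classes involved. CountingMapper and its map method are hypothetical stand-ins for a real Mapper, and the two objects only model two map tasks. In a real job the tasks would normally run in separate JVMs, so a static field such as MyGlobals.var1 would be just as unshared as the instance field here, even though a single-JVM demo would make a static look shared.

```java
// Hypothetical stand-in for a Hadoop Mapper; this is not the real interface.
class CountingMapper {
    // State lives in the instance, so each task object has its own copy.
    private int recordsSeen = 0;

    // Returns "value<TAB>n", where n counts records seen by THIS instance only.
    String map(String value) {
        recordsSeen++;
        return value + "\t" + recordsSeen;
    }

    int getRecordsSeen() {
        return recordsSeen;
    }
}

class InstanceStateDemo {
    public static void main(String[] args) {
        // Two instances model two map tasks of the same job.
        CountingMapper taskA = new CountingMapper();
        CountingMapper taskB = new CountingMapper();
        taskA.map("x");
        taskA.map("y");
        taskB.map("z");
        System.out.println(taskA.getRecordsSeen() + " " + taskB.getRecordsSeen());
    }
}
```

Running InstanceStateDemo prints "2 1": each task object counted only its own records, and neither saw the other's state.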
  • Benjamin Reed at Oct 12, 2007 at 5:17 pm
    You could put the variables in ZooKeeper and then they would be shared :)

    ben
  • Benjamin Reed at Oct 12, 2007 at 5:49 pm
    Oops, sorry I didn't realize this was going to an external list. Please don't
    think about this comment until we release ZooKeeper :) (Hopefully, by the end
    of next week.)

    thanx
    ben
  • James Yu at Oct 13, 2007 at 12:10 am
    What is the best practice if I DO need to have some global variables
    accessible to ALL mappers and ALL reducers, which are distributed? Are
    there any recommendations?

    -- James
  • Ted Dunning at Oct 13, 2007 at 6:15 am
    If you can make do with read-only constants, then you can define static
    finals somewhere or other. They won't really be global, but since you never
    change them, that won't matter.

    If you just want global status indicators, then look at what the Reporter
    provides.

    If you really want read/write global variables, then you have a real
    problem. In fact, that is the shared-memory emulation problem all over
    again, and that is what map-reduce is intended to sidestep. Such programs
    can often be rewritten so that you have an extra map-reduce step, or you
    have additional input that gets sorted out to the mapper or reducer that
    needs the values.

    If you really, really can't restate your program in this fashion, then you
    probably don't have a problem that is suitable for map-reduce. You might be
    able to make use of something like HBase to give you database-like
    operations, but you may just have a different kind of problem. You might be
    surprised at what a wide variety of problems are amenable to a map-reduce
    formulation.

    What is it that makes you want these global variables?

  • James Yu at Oct 13, 2007 at 7:32 am
    Ted,

    Thanks for your explanation.
    Actually, I ran into a coding situation where my map function (or rather
    all map functions on the distributed machines) needs to use (read-only in
    my case) an ArrayList that I populate from the contents of a file when the
    whole program is launched. I need to make sure all map functions (and even
    reduce functions) can see the same copy of that ArrayList.
    What is the proper way to do this?

    --James
  • Ted Dunning at Oct 13, 2007 at 5:30 pm
    The easy way is to put this initialization into the construction of the map
    or reduce object. Each map would have a private copy separate from every
    other private copy, but since maps get called many, many times this
    construction cost is, on average, small.
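
One way Ted's suggestion might look, sketched in plain Java rather than against the Hadoop API. LookupMapper, its constructor argument, and the tab-separated output are assumptions made for illustration, and the list file is assumed to be readable from wherever the task runs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a Mapper that needs a read-only list.
class LookupMapper {
    // Private, read-only copy owned by this task object.
    private final List<String> allowed;

    // Load the list once, when the task object is constructed.
    LookupMapper(Path listFile) {
        try {
            allowed = Collections.unmodifiableList(Files.readAllLines(listFile));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called once per record; it only reads the list, never modifies it.
    String map(String value) {
        return value + "\t" + (allowed.contains(value) ? 1 : 0);
    }
}
```

The file is read once per task object, while map runs once per record, which is why the construction cost averages out to very little.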

  • Dennis Kubes at Oct 13, 2007 at 9:50 pm

    James Yu wrote:

    > I needed to make sure all map functions
    > (and even reduce functions) can see the same copy of that ArrayList.
    > What is the proper way to do this?

    If it needs to be available to both maps and reduces, you may want to
    consider writing the data out to a temp directory on DFS, then
    initializing from it in a MapRunnable for your map tasks and once at
    the beginning of your reduce task.

    Dennis Kubes
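
A rough sketch of the pattern Dennis describes, again in plain Java. A local temp directory stands in for DFS, and SharedList, publish, and load are hypothetical names, not Hadoop APIs. The launching program writes the list once; every map or reduce task then loads its own read-only copy at startup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; a local path stands in for a file on DFS.
class SharedList {
    // Launcher side: write the list once, before the job starts.
    static void publish(Path sharedFile, List<String> items) {
        try {
            Files.write(sharedFile, items);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Task side: each map or reduce task reads its own copy once at startup.
    static List<String> load(Path sharedFile) {
        try {
            return Collections.unmodifiableList(Files.readAllLines(sharedFile));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class SharedListDemo {
    public static void main(String[] args) throws IOException {
        Path shared = Files.createTempDirectory("job-tmp").resolve("list.txt");
        SharedList.publish(shared, Arrays.asList("a", "b", "c"));
        // Two loads model one map task and one reduce task.
        List<String> seenByMapTask = SharedList.load(shared);
        List<String> seenByReduceTask = SharedList.load(shared);
        System.out.println(seenByMapTask.equals(seenByReduceTask));
    }
}
```

Running SharedListDemo prints "true": both sides see the same contents, although each holds a separate in-memory copy.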

Discussion Overview
group: common-user @
categories: hadoop
posted: Oct 12, '07 at 2:17a
active: Oct 13, '07 at 9:50p
posts: 14
users: 7
website: hadoop.apache.org...
irc: #hadoop
