FAQ
I have a new-API Partitioner
<http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapreduce/Partitioner.html>
object whose behavior needs to change based on values passed in from the job
configuration. It looks like there's no way to do this, because the
Partitioner never gets passed a context, and the Hadoop framework creates
the actual Partitioner instance, so I can't write a constructor. How do I
parameterize my Partitioner?
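A minimal sketch of the approach Harsh J suggests in the thread: the partitioner implements Configurable, and the framework hands it the job's Configuration through setConf() right after instantiating it. So that the sketch compiles without Hadoop on the classpath, Conf, ConfigurableLike, PartitionerLike and Factory are hypothetical stand-ins for Configuration, Configurable, Partitioner and ReflectionUtils.newInstance; the PrefixPartitioner class and the "example.prefix.length" property are also made up for illustration.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration.
class Conf {
    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) { props.put(key, value); }

    int getInt(String key, int defaultValue) {
        String v = props.get(key);
        return v == null ? defaultValue : Integer.parseInt(v.trim());
    }
}

// Hypothetical stand-in for org.apache.hadoop.conf.Configurable.
interface ConfigurableLike {
    void setConf(Conf conf);
    Conf getConf();
}

// Hypothetical stand-in for the new-API Partitioner<KEY, VALUE>.
abstract class PartitionerLike<K, V> {
    abstract int getPartition(K key, V value, int numPartitions);
}

// Hypothetical partitioner: routes keys by their first N characters,
// where N is read from the job configuration in setConf().
class PrefixPartitioner extends PartitionerLike<String, Integer> implements ConfigurableLike {
    static final String PREFIX_LENGTH_KEY = "example.prefix.length";

    private Conf conf;
    private int prefixLength = 1;

    @Override
    public void setConf(Conf conf) {
        this.conf = conf;
        // Parse the parameter once here instead of on every getPartition() call.
        this.prefixLength = conf.getInt(PREFIX_LENGTH_KEY, 1);
    }

    @Override
    public Conf getConf() { return conf; }

    @Override
    int getPartition(String key, Integer value, int numPartitions) {
        String prefix = key.substring(0, Math.min(prefixLength, key.length()));
        // Mask the sign bit so the partition index is never negative.
        return (prefix.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Mimics the reflection-based creation described in the thread: build the
// object with its no-arg constructor, then call setConf() if it is configurable.
final class Factory {
    private Factory() {}

    static <T> T newInstance(Class<T> cls, Conf conf) {
        try {
            Constructor<T> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            T obj = ctor.newInstance();
            if (conf != null && obj instanceof ConfigurableLike) {
                ((ConfigurableLike) obj).setConf(conf);
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}
```

In a real job the class would instead extend org.apache.hadoop.mapreduce.Partitioner and implement org.apache.hadoop.conf.Configurable, be registered with job.setPartitionerClass(...), and read a value set on job.getConfiguration() (for example with setInt) before the job is submitted.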


  • Harsh J at May 4, 2011 at 6:35 am
    Hello,
    On Wed, May 4, 2011 at 5:13 AM, W.P. McNeill wrote:
    > I have a new-API Partitioner
    > <http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapreduce/Partitioner.html>
    > object whose behavior needs to change based on values passed in from the job
    > configuration. It looks like there's no way to do this, because the
    > Partitioner never gets passed a context, and the Hadoop framework creates
    > the actual Partitioner instance, so I can't write a constructor. How do I
    > parameterize my Partitioner?
    You can make your class implement Configurable and then use the
    Configuration object that the framework passes to its setConf() method
    automatically when it instantiates the class. This way you can at least
    use the parameters available in the Configuration of the submitted Job.

    I'm not sure how you could pass a complete 'context' object to it.

    --
    Harsh J
  • W.P. McNeill at May 4, 2011 at 3:56 pm
    I only wanted the context as a way of getting at the configuration, so
    making the class implement Configurable will solve my problem.
  • Harsh J at May 5, 2011 at 9:17 am
    Hello,
    On Wed, May 4, 2011 at 9:25 PM, W.P. McNeill wrote:
    > I only wanted the context as a way of getting at the configuration, so
    > making the class implement Configurable will solve my problem.
    Good to know. I believe this ought to be documented, so I've opened
    https://issues.apache.org/jira/browse/MAPREDUCE-2474 for it.

    --
    Harsh J
  • 王志强 at May 5, 2011 at 9:43 am
    Hi, guys
    As the subject says: how can I process data differently in the map function depending on the input file name? That is, can I get the name of the input file that the line currently being processed belongs to?
    Austin
  • Harsh J at May 5, 2011 at 9:57 am
    Moving this to mapreduce-user@, since that is more appropriate for
    hadoop-mapreduce questions (bcc: common-user@).

    2011/5/5 王志强 <wangzhiqiang@360.cn>:
    > Hi, guys
    > As the subject says: how can I process data differently in the map function depending on the input file name? That is, can I get the name of the input file that the line currently being processed belongs to?
    > Austin
    You can get the file name from the additional properties that the
    framework sets for each MapTask:
    http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Task+JVM+Reuse
    (see the table right below that anchor for the full list, and
    map.input.file in particular).

    --
    Harsh J
  • Itzhak Pan at May 5, 2011 at 10:14 am
    Hi,

    I think MultipleInputs may fit your needs.

    Yaozhen

  • W.P. McNeill at May 5, 2011 at 4:03 pm
    The other thing you should document is that setConf() is called by the
    Hadoop reflection utilities whenever they create a Configurable object. I
    guess that's the only way it could happen, but I still wasn't sure how
    this worked until I stepped through it in the debugger.

    You should also document under what circumstances getConf() is called.
    I'm still not clear what the scenario would be.
  • W.P. McNeill at May 6, 2011 at 4:18 pm
    Here is a configurable custom partitioner template, along with a discussion
    of when the Configurable interface methods are called:
    http://cornercases.wordpress.com/2011/05/06/an-example-configurable-partitioner/

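For the input-file question in the thread, a minimal sketch of dispatching on the file name inside a map function. It assumes the full path of the current input file is already available as a string: in an old-API (0.20) mapper that would be the map.input.file property Harsh J points to, and with the new API it can typically be obtained from ((FileSplit) context.getInputSplit()).getPath() when the input format is file-based. The class name, the "clicks"/"orders" prefixes and the processing routines are hypothetical.

```java
// Hypothetical per-file dispatch for a map function. inputPath is assumed to
// hold the full path of the current input file (e.g. the map.input.file value).
final class InputFileDispatch {
    private InputFileDispatch() {}

    // Returns only the last path component, e.g. "clicks-2011-05-05.log"
    // for "hdfs://namenode:8020/logs/clicks-2011-05-05.log".
    static String fileName(String inputPath) {
        int slash = inputPath.lastIndexOf('/');
        return slash >= 0 ? inputPath.substring(slash + 1) : inputPath;
    }

    // Chooses a processing routine based on the file name prefix.
    static String process(String inputPath, String line) {
        String name = fileName(inputPath);
        if (name.startsWith("clicks")) {
            return processClick(line);
        } else if (name.startsWith("orders")) {
            return processOrder(line);
        }
        return "unknown:" + line;
    }

    // Hypothetical routine: keeps the first tab-separated field.
    private static String processClick(String line) {
        return "click:" + line.split("\t")[0];
    }

    // Hypothetical routine: strips surrounding whitespace.
    private static String processOrder(String line) {
        return "order:" + line.trim();
    }
}
```

The MultipleInputs approach suggested in the thread avoids this branching altogether by assigning a separate mapper class to each input path, which keeps each mapper focused on a single file format.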
Discussion Overview
group: common-user@
categories: hadoop
posted: May 3, '11 at 11:43p
active: May 6, '11 at 4:18p
posts: 9
users: 4
website: hadoop.apache.org...
irc: #hadoop
