Hi,

I would like to write an example that reads the index file and the data
file produced as map output. Can anyone give me an example, please?

Thanks,
--
Pedro

  • Gregory Lawrence at Oct 14, 2010 at 5:23 pm
    Pedro,

    I'm not sure I fully understand your question, but if you are asking how to read in an index file in addition to the standard job input, you should look into writing your own setup function. It may look something like the following:

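    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    @Override
    public void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        initialize(conf); // your own helper; assumed to set the fileName field from conf
        Path path = new Path(fileName);
        FileSystem fs = path.getFileSystem(conf);
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(path)))) {
            ... // read the file into in-memory structures; the stream is closed automatically
        }
    }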

    The setup function should also initialize any necessary data structures (e.g., hash tables). This, of course, assumes that your index file is small enough to fit in memory. You should also look into using the distributed cache (DistributedCache): it copies the file to each node's local disk once per job, which should speed things up, especially when multiple Mapper/Reducer tasks run in sequence on the same machine and can all read the same local copy.
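    As a minimal sketch of the "initialize any necessary data structures" step, the class below loads a small side file into a HashMap using only java.io. The line format ("key<TAB>value") and the class name SideFileLoader are hypothetical; inside a Mapper, the Reader passed to load() would wrap the fs.open(path) stream from the setup() code above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class SideFileLoader {

    // Loads lines of the hypothetical form "key<TAB>value" into a HashMap.
    // In a Mapper's setup(), the Reader would wrap new InputStreamReader(fs.open(path)).
    static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> table = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // tolerate blank lines
                }
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    throw new IOException("malformed line: " + line);
                }
                table.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> table = load(new StringReader("apple\t1\nbanana\t2\n"));
        System.out.println(table.get("banana"));
    }
}
```

    The whole table lives in the task's heap, which is why this only works when the side file is small, as noted above.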

    Regards,
    Greg Lawrence
  • Pedro Costa at Oct 14, 2010 at 6:36 pm
    - I'm asking because I would like to read the map output data file,
    and I don't know how.
    By "I don't know how" I mean the following: I know that the index file
    contains the start offset, the raw length, and the compressed length
    of the data file, and that to read the data file I also have to pay
    attention to the key and value types stored in it. I would just like
    to build an example that reads the data file with the help of the
    index file, and I don't know how to do it.

    - What is the difference between
    org.apache.hadoop.mapred.IFile.Reader and
    org.apache.hadoop.fs.FSDataInputStream?
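    A minimal sketch of parsing such an index file, assuming the layout written by SpillRecord in Hadoop 0.20/0.21-era code: one 24-byte entry per reduce partition (startOffset, rawLength, partLength as big-endian longs), followed by an 8-byte CRC32 of those entries. That layout is an assumption to check against the source of the Hadoop version in use; MapOutputIndex, parse() and build() are hypothetical names, and build() exists only to produce test input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

public class MapOutputIndex {

    // One entry per reduce partition (assumed layout, see lead-in).
    static final class Entry {
        final long startOffset; // where the partition's segment starts in the data file
        final long rawLength;   // uncompressed length of the segment
        final long partLength;  // length of the segment as stored on disk

        Entry(long startOffset, long rawLength, long partLength) {
            this.startOffset = startOffset;
            this.rawLength = rawLength;
            this.partLength = partLength;
        }
    }

    static final int ENTRY_BYTES = 3 * 8; // three longs per entry
    static final int CHECKSUM_BYTES = 8;  // CRC32 value stored as a long

    // Parses the index bytes and verifies the trailing CRC32.
    static List<Entry> parse(byte[] index) throws IOException {
        int dataLen = index.length - CHECKSUM_BYTES;
        if (dataLen < 0 || dataLen % ENTRY_BYTES != 0) {
            throw new IOException("unexpected index length: " + index.length);
        }
        CRC32 crc = new CRC32();
        crc.update(index, 0, dataLen);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(index));
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < dataLen / ENTRY_BYTES; i++) {
            entries.add(new Entry(in.readLong(), in.readLong(), in.readLong()));
        }
        if (in.readLong() != crc.getValue()) {
            throw new IOException("index checksum mismatch");
        }
        return entries;
    }

    // Writes entries in the same assumed layout; used here only to build test input.
    static byte[] build(long[][] entries) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (long[] e : entries) {
            out.writeLong(e[0]);
            out.writeLong(e[1]);
            out.writeLong(e[2]);
        }
        CRC32 crc = new CRC32();
        crc.update(bytes.toByteArray());
        out.writeLong(crc.getValue());
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] index = build(new long[][] {{0, 100, 60}, {60, 50, 30}});
        for (Entry e : parse(index)) {
            System.out.println(e.startOffset + " " + e.rawLength + " " + e.partLength);
        }
    }
}
```

    Under these assumptions, entry i locates partition i's segment: the partLength bytes starting at startOffset in the data file, which an FSDataInputStream can reach with seek(startOffset) before reading.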

    Thanks,
    --
    Pedro
  • Gregory Lawrence at Oct 14, 2010 at 7:20 pm
    Pedro,

    Could you explain what you mean by index file? Generally speaking, mapper output files are written as text files, sequence files, or some other format. What format uses an additional index file? In my experience, examining the contents of a text or sequence file can be accomplished by typing:

    hadoop fs -text filename.txt

    This should print out the contents in a human-readable format.
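    On the format question: the class named in Pedro's message, org.apache.hadoop.mapred.IFile, is the format Hadoop uses for intermediate map output, which is written to the task nodes' local disks rather than as a text or sequence file in HDFS, so hadoop fs -text is unlikely to decode it. Roughly, FSDataInputStream only gives seekable raw bytes, while IFile.Reader also understands the record framing on top of them. The sketch below decodes one uncompressed segment, assuming the 0.20/0.21-era layout: each record is a vint key length, a vint value length, the key bytes, and the value bytes; two -1 lengths mark the end, followed by a checksum that this sketch does not verify. The vint encoding is modeled on WritableUtils.writeVLong/readVLong; SegmentDecoder and its methods are hypothetical names, and the layout should be checked against the Hadoop source in use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SegmentDecoder {

    static final int EOF_MARKER = -1; // assumed: lengths of -1/-1 end the segment

    // Variable-length decoding modeled on WritableUtils.readVLong.
    static long readVLong(DataInput in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first; // values in [-112, 127] are stored in a single byte
        }
        boolean negative = first < -120;
        int extraBytes = negative ? -(first + 120) : -(first + 112);
        long value = 0;
        for (int i = 0; i < extraBytes; i++) {
            value = (value << 8) | (in.readByte() & 0xFF); // big-endian payload
        }
        return negative ? ~value : value;
    }

    // Matching encoder modeled on WritableUtils.writeVLong (builds test input only).
    static void writeVLong(DataOutput out, long i) throws IOException {
        if (i >= -112 && i <= 127) {
            out.writeByte((int) i);
            return;
        }
        int len = -112;
        if (i < 0) {
            i = ~i; // store the complement; the prefix byte records the sign
            len = -120;
        }
        for (long tmp = i; tmp != 0; tmp >>= 8) {
            len--; // one step per payload byte
        }
        out.writeByte(len);
        int payload = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = payload; idx > 0; idx--) {
            out.writeByte((int) ((i >> ((idx - 1) * 8)) & 0xFF));
        }
    }

    // Splits an uncompressed segment into raw (key, value) byte pairs.
    static List<byte[][]> decode(byte[] segment) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(segment));
        List<byte[][]> records = new ArrayList<>();
        while (true) {
            int keyLen = (int) readVLong(in);
            int valueLen = (int) readVLong(in);
            if (keyLen == EOF_MARKER && valueLen == EOF_MARKER) {
                return records; // trailing checksum bytes, if any, are not verified
            }
            if (keyLen < 0 || valueLen < 0) {
                throw new IOException("corrupt record lengths");
            }
            byte[] key = new byte[keyLen];
            byte[] value = new byte[valueLen];
            in.readFully(key);
            in.readFully(value);
            records.add(new byte[][] {key, value});
        }
    }

    // Builds a segment in the same assumed layout (test input only).
    static byte[] encode(List<byte[][]> records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (byte[][] r : records) {
            writeVLong(out, r[0].length);
            writeVLong(out, r[1].length);
            out.write(r[0]);
            out.write(r[1]);
        }
        writeVLong(out, EOF_MARKER);
        writeVLong(out, EOF_MARKER);
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        List<byte[][]> input = new ArrayList<>();
        input.add(new byte[][] {"apple".getBytes(StandardCharsets.UTF_8), new byte[300]});
        for (byte[][] r : decode(encode(input))) {
            System.out.println(new String(r[0], StandardCharsets.UTF_8) + ": " + r[1].length + " bytes");
        }
    }
}
```

    The decoded keys and values are still serialized bytes; turning them into objects means deserializing them with the job's map output key and value classes (for example, Text.readFields on a DataInputStream over the bytes), which is where the key and value types Pedro mentions come in. A compressed segment would first have to be decompressed with the job's codec.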

    Regards,
    Greg Lawrence

Discussion Overview
group: mapreduce-user @
categories: hadoop
posted: Oct 13, '10 at 7:00p
active: Oct 14, '10 at 7:20p
posts: 4
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion

Pedro Costa: 2 posts
Gregory Lawrence: 2 posts