FAQ
I need to use the output of the reduce step, but I don't know how to do it.
Taking the wordcount program as an example: if I want to collect the word counts
into a hashtable for further use, how can I do that?
The example only shows how to write the result to disk.
My email is: andy2005cst@gmail.com
Looking forward to your help. Thanks a lot.
--
View this message in context: http://www.nabble.com/HELP%3A-I-wanna-store-the-output-value-into-a-list-not-write-to-the-disk-tp22844277p22844277.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

  • Rasit OZDAS at Apr 2, 2009 at 1:02 pm
    Hi, Hadoop is normally designed to write to disk. There is a special file
    format that writes output to RAM instead of disk,
    but I don't know whether it's what you're looking for.
    For what you describe to exist, there would have to be a mechanism that sends
    output as objects rather than file content across computers. As far as I know
    there is no such feature yet.

    Good luck.

    --
    M. Raşit ÖZDAŞ
  • Andy2005cst at Apr 2, 2009 at 1:40 pm
    Thanks for your reply. Let me explain more clearly: since MapReduce is just
    one step of my program, I need to use the output of reduce for further
    computation, so I do not want to write the output to disk. I want to get
    the collection or list of the output in RAM. If it is written directly
    to disk, I have to read it back into RAM again.
    You mentioned a special file format. Could you please tell me what it is,
    and give an example if possible?

    Thank you so much.
  • Rasit OZDAS at Apr 2, 2009 at 3:22 pm
    Andy, I haven't tried this feature, but I know that Yahoo set a
    performance record with this file format.
    I came across a file system included in the Hadoop code (probably that
    one) when searching the source code.
    Luckily I found it: org.apache.hadoop.fs.InMemoryFileSystem
    But if you have a lot of big files, I don't think this approach will be suitable.

    Maybe someone can give further info.

    --
    M. Raşit ÖZDAŞ
  • Farhan Husain at Apr 2, 2009 at 5:12 pm
    Is there a way to implement some OutputCollector that can do what Andy wants
    to do?

    --
    Mohammad Farhan Husain
    Research Assistant
    Department of Computer Science
    Erik Jonsson School of Engineering and Computer Science
    University of Texas at Dallas
  • He Chen at Apr 2, 2009 at 6:59 pm
    It seems that the InMemoryFileSystem class has been deprecated in Hadoop
    0.19.1. Why?

    I want to reuse the result of reduce as the input of the next map. Cascading
    does not work, because each step depends on the data of the step before it.
    I run each timestep's MapReduce job synchronously. If InMemoryFileSystem is
    deprecated, how can I reduce the I/O for each timestep's MapReduce job?

    --
    Chen He
    RCF CSE Dept.
    University of Nebraska-Lincoln
    US
  • Owen O'Malley at Apr 2, 2009 at 3:39 pm

    You can use an output format and then an input format that uses a
    database, but in practice, the cost of writing to HDFS and reading it
    back is not a problem, especially if you set the replication of the
    output files to 1. (You'll need to re-run the job if you lose a node,
    but it will be fast.)

    -- Owen
  • Rasit OZDAS at Apr 2, 2009 at 3:45 pm
    That seems interesting; we have a replication factor of 3 by default.
    Is there a way to set, let's say, a replication factor of 1 for job-specific files only?

    --
    M. Raşit ÖZDAŞ
  • Bryan Duxbury at Apr 2, 2009 at 4:25 pm
    I don't really see what the downside of reading it from disk is. A
    list of word counts should be pretty small on disk so it shouldn't
    take long to read it into a HashMap. Doing anything else is going to
    cause you to go a long way out of your way to end up with the same
    result.

    -Bryan
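
As a rough sketch of what Bryan describes, the word counts could be read back into a HashMap along these lines. It assumes the job's output has already been copied to a local text file (for example with `hadoop fs -getmerge`) and that each line has the default `word<TAB>count` layout. The class name WordCountLoader and its methods are made up for illustration and are not part of Hadoop.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Hadoop class: loads "word<TAB>count" lines into a map.
final class WordCountLoader {

    // Parses lines of the form "word<TAB>count"; counts for a repeated word are summed.
    static Map<String, Integer> parse(Iterable<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip blank lines and lines without a tab separator
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.merge(word, count, Integer::sum);
        }
        return counts;
    }

    // Reads a local copy of the job output (assumed to be UTF-8 text).
    static Map<String, Integer> load(Path file) throws IOException {
        return parse(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the local output file.
        Map<String, Integer> counts = load(Paths.get(args[0]));
        System.out.println(counts.size() + " distinct words loaded");
    }
}
```

Summing with `merge` means the same code also works if several part files with overlapping words are concatenated into one input.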

Discussion Overview
group: common-user @
categories: hadoop
posted: Apr 2, '09 at 9:41a
active: Apr 2, '09 at 6:59p
posts: 9
users: 6
website: hadoop.apache.org...
irc: #hadoop
