Temporary Files to be sent to DistributedCache
I need to write information retrieved from a database to a series of
files that must be made available to my mappers. Because each mapper
needs access to all of these files, I want to put them in the
DistributedCache. Is there a preferred method for writing new information to
the DistributedCache? I can use Java's File.createTempFile(String prefix,
String suffix), but that uses the system's default temporary folder. While
that should usually work, I'd rather have a method that doesn't depend on
writing to the local file system before copying files to the
DistributedCache. As I'm extremely new to Hadoop, I hope I'm not missing
something obvious.

Thank you for your time.
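
One way to avoid the local temporary file entirely is to write straight to
HDFS with the FileSystem API and register the result with the
DistributedCache before submitting the job. A minimal sketch, assuming the
old-style org.apache.hadoop.filecache API of that era; the class name and
the /tmp/job-aux path are illustrative, not anything prescribed by Hadoop:

    import java.io.BufferedWriter;
    import java.io.OutputStreamWriter;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CachePublisher {
        // Write database rows to a file in HDFS, then register it with the
        // DistributedCache so every task gets a node-local copy at startup.
        public static void publish(Configuration conf, Iterable<String> rows)
                throws Exception {
            FileSystem fs = FileSystem.get(conf);
            Path target = new Path("/tmp/job-aux/lookup.txt"); // illustrative path
            BufferedWriter out = new BufferedWriter(
                    new OutputStreamWriter(fs.create(target, true)));
            try {
                for (String row : rows) {
                    out.write(row);
                    out.newLine();
                }
            } finally {
                out.close();
            }
            // Must be called before job submission.
            DistributedCache.addCacheFile(target.toUri(), conf);
        }
    }

Tasks can then pick up the node-local copies via
DistributedCache.getLocalCacheFiles, as the replies below discuss.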

  • GOEKE, MATTHEW (AG/1000) at Sep 27, 2011 at 10:01 pm
    The simplest route I can think of is to ingest the data directly into HDFS using Sqoop, if there is a driver currently available for your database. At that point it would be relatively simple to read directly from HDFS in your MR code.

    Matt
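
    If the data lands in HDFS that way, a mapper can also read it directly in
    setup() rather than going through the DistributedCache at all. A rough
    sketch, assuming the new mapreduce API; the class name and the
    /tmp/job-aux/lookup.txt path are illustrative:

        import java.io.BufferedReader;
        import java.io.IOException;
        import java.io.InputStreamReader;
        import java.util.ArrayList;
        import java.util.List;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;

        public class HdfsSideFileMapper extends Mapper<LongWritable, Text, Text, Text> {
            private final List<String> lookup = new ArrayList<String>();

            @Override
            protected void setup(Context context) throws IOException {
                // Read the side file straight from HDFS. Every task re-reads
                // it over the network, which is reasonable for small files.
                FileSystem fs = FileSystem.get(context.getConfiguration());
                Path side = new Path("/tmp/job-aux/lookup.txt"); // illustrative
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(fs.open(side)));
                try {
                    String line;
                    while ((line = in.readLine()) != null) {
                        lookup.add(line);
                    }
                } finally {
                    in.close();
                }
            }
        }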

  • Lessonz at Sep 28, 2011 at 4:28 am
    So, I thought about that, and I'd considered writing to HDFS and then
    copying the file into the DistributedCache so each mapper/reducer doesn't
    have to reach into HDFS for these files. Is that the "best" way to
    handle this?
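
    That is essentially what the DistributedCache does: a file registered
    before job submission is copied to each node once, and tasks read a
    node-local copy rather than reaching into HDFS. A hedged sketch of the
    mapper side, assuming the old filecache API of that era; class and field
    names are illustrative:

        import java.io.BufferedReader;
        import java.io.FileReader;
        import java.io.IOException;
        import java.util.ArrayList;
        import java.util.List;
        import org.apache.hadoop.filecache.DistributedCache;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;

        public class CachedLookupMapper extends Mapper<LongWritable, Text, Text, Text> {
            private final List<String> lookup = new ArrayList<String>();

            @Override
            protected void setup(Context context) throws IOException {
                // Node-local copies of everything registered with
                // DistributedCache.addCacheFile before job submission.
                Path[] cached = DistributedCache.getLocalCacheFiles(
                        context.getConfiguration());
                if (cached == null) {
                    return;
                }
                for (Path p : cached) {
                    BufferedReader in = new BufferedReader(
                            new FileReader(p.toString()));
                    try {
                        String line;
                        while ((line = in.readLine()) != null) {
                            lookup.add(line);
                        }
                    } finally {
                        in.close();
                    }
                }
            }
        }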
  • Linden Hillenbrand at Sep 28, 2011 at 4:30 am
    That is most likely the easiest and fastest way, as you will be leveraging
    Sqoop's distributed ingestion rather than a single-threaded import done
    some other way.

    --
    Linden Hillenbrand
    Customer Operations Engineer

    Phone: 650.644.3900 x4946
    Email: linden@cloudera.com
    Twitter: @lhillenbrand
    Data: http://www.cloudera.com

Discussion Overview
group: common-user
categories: hadoop
posted: Sep 27, '11 at 9:48p
active: Sep 28, '11 at 4:30a
posts: 4
users: 3
website: hadoop.apache.org...
irc: #hadoop
