Is it recommended to install a Hadoop cluster on a set of VMs that are all
connected to a SAN?

Thanks,
Travis


  • GOEKE, MATTHEW (AG/1000) at Aug 15, 2011 at 6:32 pm
    Is this just for testing purposes, or are you planning on going into production with this? If it is the latter, then I would STRONGLY advise you not to give it a second thought, due to how the framework handles I/O. However, if you are just trying to test out a distributed daemon setup and get some ops documentation, then have at it :)

    Matt

    This e-mail message may contain privileged and/or confidential information, and is intended to be received only by persons entitled
    to receive such information. If you have received this e-mail in error, please notify the sender immediately. Please delete it and
    all attachments from any servers, hard drives or any other media. Other use of this e-mail by you is strictly prohibited.

    All e-mails and attachments sent and received are subject to monitoring, reading and archival by Monsanto, including its
    subsidiaries. The recipient of this e-mail is solely responsible for checking for the presence of "Viruses" or other "Malware".
    Monsanto, along with its subsidiaries, accepts no liability for any damage caused by any such code transmitted by or accompanying
    this e-mail or any attachment.


    The information contained in this email may be subject to the export control laws and regulations of the United States, potentially
    including but not limited to the Export Administration Regulations (EAR) and sanctions regulations issued by the U.S. Department of
    Treasury, Office of Foreign Asset Controls (OFAC). As a recipient of this information you are obligated to comply with all
    applicable U.S. export laws and regulations.
  • Liam Friel at Aug 15, 2011 at 8:04 pm

    Could you expand on that? Do you mean multiple VMs on a single server are a
    no-no?
    Or do you mean running Hadoop on something like Amazon EC2 for production is
    also a no-no?
    With some pointers to background if the latter please ...

    Just for my education. I have run some (test I guess you could call them)
    Hadoop clusters on EC2 and it was working OK.
    However I didn't have the equivalent pile of physical hardware lying around
    to do a comparison ... which I guess is why EC2 is so attractive.

    Ta
    Liam
  • Travis Camechis at Aug 15, 2011 at 8:06 pm
    So my suspicion was correct: it is not a good idea. Mainly, though, I was
    talking about a VMware vSphere setup with the whole vMotion thing.
  • GOEKE, MATTHEW (AG/1000) at Aug 15, 2011 at 8:16 pm
    I was referring to multiple VMs on a single in-house machine in my previous comment, not EC2. FWIW, I would rather see a single heavy data node than a single box partitioned into multiple machines, unless you are trying to do more on that server than just Hadoop. Obviously every person / company has their own constraints, but if this box is solely for Hadoop then don't partition it; otherwise you will incur a decent loss in possible map/reduce slots.

    Matt
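The slot loss Matt describes can be illustrated with some back-of-the-envelope arithmetic. The sketch below is only an illustration: the RAM size, the per-OS overhead, the heap per slot, and the VM count are all assumed numbers, and `SlotEstimate` is a hypothetical helper, not part of Hadoop. It also ignores the hypervisor's own overhead.

```java
// Illustrative arithmetic only: every number below is an assumption, not a measurement.
final class SlotEstimate {
    /** Task slots that fit in one OS instance after reserving memory for the OS and daemons. */
    static int slots(int ramGb, int reservedGb, int perSlotGb) {
        return Math.max(0, (ramGb - reservedGb) / perSlotGb);
    }

    public static void main(String[] args) {
        int boxRamGb = 48;   // assumed physical RAM of the box
        int reservedGb = 4;  // assumed OS + DataNode + TaskTracker overhead per OS instance
        int perSlotGb = 2;   // assumed heap per map/reduce slot
        int vms = 4;         // assumed number of equal-sized VMs

        // One big node pays the reserved overhead once; each VM pays it again.
        int single = slots(boxRamGb, reservedGb, perSlotGb);
        int split = vms * slots(boxRamGb / vms, reservedGb, perSlotGb);
        System.out.println("one node: " + single + " slots, " + vms + " VMs: " + split + " slots");
    }
}
```

Under these assumed figures the unpartitioned box fits 22 slots and the four VMs together fit 16, because the per-instance overhead is paid four times instead of once.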

  • Travis Camechis at Aug 15, 2011 at 8:17 pm
    agreed
  • Liam Friel at Aug 15, 2011 at 8:24 pm

    OK. Makes sense (I wouldn't try that either, for production).
    Thanks
    Liam
  • GOEKE, MATTHEW (AG/1000) at Aug 15, 2011 at 9:01 pm
    Does anyone have any code examples for how they persist join data across multiple input splits and how they test it? Currently I populate a singleton in the setup method of my mapper (with JVM reuse turned on for this job), but with no way to inject dependencies into the mapper I am having a hard time wrapping a unit test around the code. I could add a package-scoped setter purely for testing purposes, but that just feels dirty, to be honest. Any help is greatly appreciated; I have both MRUnit and Mockito at my disposal.

    private BitPackedMarkerMap markerMap = BitPackedMarkerMapSingleton.getInstance().getMarkerMap();
    private int numberOfIndividuals = -999;
    private int numberOfAlleles = -999;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Parse the cached scores file only when the shared map is still empty.
        // With JVM reuse, later tasks in the same JVM find it already populated.
        // Note: in that case the two counters above keep their -999 defaults
        // in this mapper instance unless they are set elsewhere.
        if (markerMap.getSize() == 0) {
            FileInputStream scoresInputStream = null;
            try {
                Path[] cacheFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
                if (cacheFiles != null && cacheFiles.length > 0) {
                    // The first cache file is expected to hold the scores.
                    scoresInputStream = new FileInputStream(cacheFiles[0].toString());
                    LongPackedDoubleInteger inputSizes = markerMap.parse(scoresInputStream);
                    numberOfIndividuals = inputSizes.getInt1();
                    numberOfAlleles = inputSizes.getInt2();
                }
            } catch (IOException e) {
                System.err.println("Exception reading DistributedCache: " + e);
                throw e;
            } finally {
                if (scoresInputStream != null) {
                    scoresInputStream.close();
                }
            }
        }
    }




    Matt
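One possible way around the missing dependency injection, offered as a sketch rather than an answer from the thread: keep the no-arg constructor that the framework uses, and add a second constructor that accepts the collaborators, so a test can pass in a fresh map and an in-memory stream. Everything below is a plain-Java stand-in so that it runs without Hadoop on the classpath. `ScoresSource`, `MarkerTable`, `JoinMapperSketch`, and the `scores.dat` path are hypothetical names; `MarkerTable` merely counts lines in place of the real `BitPackedMarkerMap`. In a real mapper the source would also need the job configuration to locate the cached file, which this sketch leaves out.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical seam: supplies the stream that holds the join data. */
interface ScoresSource {
    InputStream open() throws IOException;
}

/** Hypothetical stand-in for BitPackedMarkerMap; it only counts input lines. */
final class MarkerTable {
    static final MarkerTable SHARED = new MarkerTable(); // plays the singleton's role
    private int size;

    int getSize() {
        return size;
    }

    void parse(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        while (reader.readLine() != null) {
            size++;
        }
    }
}

/** Plain-Java stand-in for the mapper; a real one would extend Hadoop's Mapper. */
final class JoinMapperSketch {
    private final MarkerTable table;
    private final ScoresSource source;

    /** For the framework, which instantiates mappers through a no-arg constructor. */
    JoinMapperSketch() {
        // Placeholder path; a real mapper would resolve the file from the DistributedCache.
        this(MarkerTable.SHARED, () -> new FileInputStream("scores.dat"));
    }

    /** For tests: pass in a fresh table and an in-memory source. */
    JoinMapperSketch(MarkerTable table, ScoresSource source) {
        this.table = table;
        this.source = source;
    }

    /** Mirrors setup(): load once, skip when the table is already populated. */
    void setup() throws IOException {
        if (table.getSize() != 0) {
            return;
        }
        try (InputStream in = source.open()) {
            table.parse(in);
        }
    }

    public static void main(String[] args) throws IOException {
        MarkerTable table = new MarkerTable();
        byte[] data = "m1\nm2\nm3\n".getBytes(StandardCharsets.UTF_8);
        JoinMapperSketch mapper = new JoinMapperSketch(table, () -> new ByteArrayInputStream(data));
        mapper.setup();
        mapper.setup(); // second call must not reload the data
        System.out.println(table.getSize());
    }
}
```

Because the test constructs its own `MarkerTable`, it never touches the shared instance, so state left over from one test cannot leak into the next. The second constructor serves the same purpose as the package-scoped setter mentioned above, but it keeps the fields final and makes the dependencies explicit at construction time.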

Discussion Overview
group: common-user @
categories: hadoop
posted: Aug 15, '11 at 5:45p
active: Aug 15, '11 at 9:01p
posts: 8
users: 3
website: hadoop.apache.org...
irc: #hadoop
