I am just getting started with HBase and am thinking about using it
for future Nutch development. I have successfully built it with the
ant scripts. In the src tree I see conf and bin directories similar to
Hadoop's, but in the build output I don't see those. Is there a build
that I can drop into a directory that contains the conf, webapps,
libs, etc., or do I need to pull all of that together myself?

Dennis


  • Michael Stack at Oct 18, 2007 at 6:52 pm
    Should we be making a runnable hbase at
    $HADOOP_HOME/build/contrib/hbase, Dennis?

    If you run the package target from $HADOOP_HOME/build.xml, under
    $HADOOP_HOME/build it makes a hadoop-X.X.X directory. In its src, there
    is a contrib/hbase with the lib, bin, etc. You can run hbase from here.
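
    In outline, that flow looks something like this (the version string
    and the exact listing here are illustrative):

        % cd $HADOOP_HOME
        % ant package
        % ls build/hadoop-X.X.X/src/contrib/hbase
        bin/  conf/  lib/  ...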

    Let us know and we'll fix...
    St.Ack


    Dennis Kubes wrote:
    I am just getting started with HBase and am thinking about using it
    for future Nutch development. I have successfully built it with the
    ant scripts. In the src tree I see conf and bin directories similar
    to Hadoop's, but in the build output I don't see those. Is there a
    build that I can drop into a directory that contains the conf,
    webapps, libs, etc., or do I need to pull all of that together myself?

    Dennis
  • Dennis Kubes at Oct 18, 2007 at 9:24 pm
    Yeah, except that still doesn't have the hadoop-0.16.0-dev-hbase.jar.
    Would that be dropped into the hadoop installation or the root hbase
    directory? From this build it looks like we are keeping the hbase
    build separate from the hadoop build, kinda. If we are keeping the
    HBase install separate, then it would be best to have a runnable
    HBase drop, i.e. copy and go. If you want, I can modify the ant
    script for this.

    Dennis Kubes

    Michael Stack wrote:
    Should we be making a runnable hbase at
    $HADOOP_HOME/build/contrib/hbase, Dennis?

    If you run the package target from $HADOOP_HOME/build.xml, under
    $HADOOP_HOME/build it makes a hadoop-X.X.X directory. In its src,
    there is a contrib/hbase with the lib, bin, etc. You can run hbase
    from here.

    Let us know and we'll fix...
    St.Ack


    Dennis Kubes wrote:
    I am just getting started with HBase and am thinking about using it
    for future Nutch development. I have successfully built it with the
    ant scripts. In the src tree I see conf and bin directories similar
    to Hadoop's, but in the build output I don't see those. Is there a
    build that I can drop into a directory that contains the conf,
    webapps, libs, etc., or do I need to pull all of that together myself?

    Dennis
  • Michael Stack at Oct 19, 2007 at 1:12 am
    It doesn't have the hadoop*hbase.jar, but it has the hbase classes.
    The pattern for contribs seems to be that contrib jars get built into
    the contrib directory (see the product of the package build). The
    presumption to date is that hbase is hosted inside of hadoop (the
    start scripts will look in the directories above for hadoop jars and
    libs, scripts, and configs).

    Have a go at the build script, Dennis. A patch is probably the best
    way of getting your point across.

    Good stuff,
    St.Ack


    Dennis Kubes wrote:
    Yeah, except that still doesn't have the hadoop-0.16.0-dev-hbase.jar.
    Would that be dropped into the hadoop installation or the root hbase
    directory? From this build it looks like we are keeping the hbase
    build separate from the hadoop build, kinda. If we are keeping the
    HBase install separate, then it would be best to have a runnable
    HBase drop, i.e. copy and go. If you want, I can modify the ant
    script for this.

    Dennis Kubes

    Michael Stack wrote:
    Should we be making a runnable hbase at
    $HADOOP_HOME/build/contrib/hbase, Dennis?

    If you run the package target from $HADOOP_HOME/build.xml, under
    $HADOOP_HOME/build it makes a hadoop-X.X.X directory. In its src,
    there is a contrib/hbase with the lib, bin, etc. You can run hbase
    from here.

    Let us know and we'll fix...
    St.Ack


    Dennis Kubes wrote:
    I am just getting started with HBase and am thinking about using it
    for future Nutch development. I have successfully built it with the
    ant scripts. In the src tree I see conf and bin directories similar
    to Hadoop's, but in the build output I don't see those. Is there a
    build that I can drop into a directory that contains the conf,
    webapps, libs, etc., or do I need to pull all of that together myself?

    Dennis
  • Andrzej Bialecki at Oct 19, 2007 at 10:54 am

    Dennis Kubes wrote:
    I am just getting started with HBase and am thinking about using it
    for future Nutch development. I have successfully built it with the
    ant scripts. In the src tree I see conf and bin directories similar
    to Hadoop's, but in the build output I don't see those. Is there a
    build that I can drop into a directory that contains the conf,
    webapps, libs, etc., or do I need to pull all of that together myself?
    I'm also planning to start experimenting with HBase.

    If I'm not mistaken, there is no way right now to use HBase in a
    "local" mode similar to the Hadoop "local" mode, where we don't have
    to start any daemons and all necessary infrastructure runs inside a
    single JVM. What would it take to implement such a mode? Would it
    require big changes to the codebase?

    The reason I'm asking is that we're considering using HBase in Nutch.
    However, Nutch is quite often used in small single-machine
    installations, where it uses the "local" mode, i.e. LocalFileSystem
    and LocalJobTracker. This simplified mode of operation is attractive
    in such small installations, where ease of use outweighs scalability
    and performance concerns.

    --
    Best regards,
    Andrzej Bialecki <><
    Information Retrieval, Semantic Web, Embedded Unix, System Integration
    http://www.sigram.com  Contact: info at sigram dot com
  • Michael Stack at Oct 19, 2007 at 4:57 pm

    Andrzej Bialecki wrote:
    If I'm not mistaken, there is no way right now to use HBase in a
    "local" mode similar to the Hadoop "local" mode, where we don't have
    to start any daemons and all necessary infrastructure runs inside a
    single JVM. What would it take to implement such a mode? Would it
    require big changes to the codebase?
    Check out MiniHBaseCluster. It's used in the bulk of the hbase unit
    tests. It runs a master and a configurable number of region servers,
    each in its own thread. It's modeled on MiniDFSCluster. If the
    default mode -- i.e. if hbase.master was set to 'local' in
    hbase-default.xml -- was to run a MiniHBaseCluster instance, would
    this suffice, Andrzej? Or do you need the master and regionservers
    talking to each other via direct in-process method invocations rather
    than over sockets, as is done in "local" mapreduce?

    Thanks,
    St.Ack
  • Andrzej Bialecki at Oct 19, 2007 at 6:07 pm

    Michael Stack wrote:
    Andrzej Bialecki wrote:
    If I'm not mistaken, there is no way right now to use HBase in a
    "local" mode similar to the Hadoop "local" mode, where we don't have
    to start any daemons and all necessary infrastructure runs inside a
    single JVM. What would it take to implement such a mode? Would it
    require big changes to the codebase?
    Check out MiniHBaseCluster. It's used in the bulk of the hbase unit
    tests. It runs a master and a configurable number of region servers,
    each in its own thread. It's modeled on MiniDFSCluster.
    That's excellent news - I just looked at the code, and I think it
    would require only minimal tweaks to use it together with other
    Hadoop services running in "local" mode. E.g., it would be more
    convenient to have the MiniHBaseCluster (or a modified version, let's
    call it LocalHBaseCluster) handle the startup / shutdown itself, so
    that user applications could assume that all necessary services are
    already running. I'm also going to check what the startup time of
    MiniHBaseCluster is.
    If the default mode -- i.e. if hbase.master was set to 'local' in
    hbase-default.xml -- was to run a MiniHBaseCluster instance, would
    this suffice, Andrzej? Or do you need the master and regionservers
    talking to each other via direct in-process method invocations rather
    than over sockets, as is done in "local" mapreduce?
    A direct in-process pseudo-protocol would probably be more efficient,
    and it would reduce the number of sockets in use, but we could
    implement it as a future enhancement if needed. For now I'm happy
    with them using sockets.

    --
    Best regards,
    Andrzej Bialecki <><
    Information Retrieval, Semantic Web, Embedded Unix, System Integration
    http://www.sigram.com  Contact: info at sigram dot com
  • Michael Stack at Oct 20, 2007 at 12:22 am

    Andrzej Bialecki wrote:
    That's excellent news - I just looked at the code, and I think it
    would require only minimal tweaks to use it together with other
    Hadoop services running in "local" mode. E.g., it would be more
    convenient to have the MiniHBaseCluster (or a modified version, let's
    call it LocalHBaseCluster) handle the startup / shutdown itself, so
    that user applications could assume that all necessary services are
    already running. I'm also going to check what the startup time of
    MiniHBaseCluster is.
    MiniHBaseCluster already has a shutdown method, and construction of a
    MiniHBaseCluster starts up the cluster instance. Would you like a
    different startup mechanism, Andrzej? If so, what would you suggest?

    Otherwise, I'm thinking we'd move MiniHBaseCluster from src/test to
    src/java so it makes it into the hadoop-*-hbase.jar, renaming it
    LocalHBaseCluster to follow the LocalJobRunner precedent. We should
    also make the hbase.master value default to "local" rather than
    "0.0.0.0:60000" and start a LocalHBaseCluster. How does that sound?

    See the content of src/test/hbase-default.xml if you want to tweak
    things; the properties therein are the ones to tune if you want to
    make startup and shutdown snappier.
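
    For the proposed 'local' default above, the client-side dispatch
    would look roughly like this (a hypothetical sketch -- the
    LocalHBaseCluster class does not exist yet, so the names here are
    illustrative only):

        // If hbase.master is "local", run the master and region server
        // threads inside this JVM rather than connecting to a remote
        // master; otherwise behave as today.
        Configuration conf = new HBaseConfiguration();
        String master = conf.get("hbase.master", "local");
        if ("local".equals(master)) {
          LocalHBaseCluster cluster = new LocalHBaseCluster(conf);
          // ... clients then talk to the in-process master ...
        }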
    If the default mode -- i.e. if hbase.master was set to 'local' in
    hbase-default.xml -- was to run a MiniHBaseCluster instance, would
    this suffice, Andrzej? Or do you need the master and regionservers
    talking to each other via direct in-process method invocations rather
    than over sockets, as is done in "local" mapreduce?
    A direct in-process pseudo-protocol would probably be more efficient,
    and it would reduce the number of sockets in use, but we could
    implement it as a future enhancement if needed. For now I'm happy
    with them using sockets.
    Agreed. I do not think it would be hard to have the master invoke
    region server methods directly (and vice versa) rather than going via
    RPC, but that can be done later.

    St.Ack
  • Dennis Kubes at Oct 20, 2007 at 12:57 am

    Michael Stack wrote:
    Andrzej Bialecki wrote:
    That's excellent news - I just looked at the code, and I think it
    would require only minimal tweaks to use it together with other
    Hadoop services running in "local" mode. E.g., it would be more
    convenient to have the MiniHBaseCluster (or a modified version, let's
    call it LocalHBaseCluster) handle the startup / shutdown itself, so
    that user applications could assume that all necessary services are
    already running. I'm also going to check what the startup time of
    MiniHBaseCluster is.
    MiniHBaseCluster already has a shutdown method, and construction of a
    MiniHBaseCluster starts up the cluster instance. Would you like a
    different startup mechanism, Andrzej? If so, what would you suggest?

    Otherwise, I'm thinking we'd move MiniHBaseCluster from src/test to
    src/java so it makes it into the hadoop-*-hbase.jar, renaming it
    LocalHBaseCluster to follow the LocalJobRunner precedent. We should
    also make the hbase.master value default to "local" rather than
    "0.0.0.0:60000" and start a LocalHBaseCluster. How does that sound?

    See the content of src/test/hbase-default.xml if you want to tweak
    things; the properties therein are the ones to tune if you want to
    make startup and shutdown snappier.
    That sounds great to me.
    If the default mode -- i.e. if hbase.master was set to 'local' in
    hbase-default.xml -- was to run a MiniHBaseCluster instance, would
    this suffice, Andrzej? Or do you need the master and regionservers
    talking to each other via direct in-process method invocations rather
    than over sockets, as is done in "local" mapreduce?
    A direct in-process pseudo-protocol would probably be more efficient,
    and it would reduce the number of sockets in use, but we could
    implement it as a future enhancement if needed. For now I'm happy
    with them using sockets.
    Agreed. I do not think it would be hard to have the master invoke
    region server methods directly (and vice versa) rather than going via
    RPC, but that can be done later.

    St.Ack
    I think the RPC method is fine for right now. Good stuff.

    Dennis Kubes
  • Andrzej Bialecki at Oct 22, 2007 at 6:46 pm

    Michael Stack wrote:
    Andrzej Bialecki wrote:
    That's excellent news - I just looked at the code, and I think it
    would require only minimal tweaks to use it together with other
    Hadoop services running in "local" mode. E.g., it would be more
    convenient to have the MiniHBaseCluster (or a modified version, let's
    call it LocalHBaseCluster) handle the startup / shutdown itself, so
    that user applications could assume that all necessary services are
    already running. I'm also going to check what the startup time of
    MiniHBaseCluster is.
    MiniHBaseCluster already has a shutdown method, and construction of a
    MiniHBaseCluster starts up the cluster instance. Would you like a
    different startup mechanism, Andrzej? If so, what would you suggest?

    Otherwise, I'm thinking we'd move MiniHBaseCluster from src/test to
    src/java so it makes it into the hadoop-*-hbase.jar, renaming it
    LocalHBaseCluster to follow the LocalJobRunner precedent. We should
    also make the hbase.master value default to "local" rather than
    "0.0.0.0:60000" and start a LocalHBaseCluster. How does that sound?
    Sounds great, just what we need.

    --
    Best regards,
    Andrzej Bialecki <><
    Information Retrieval, Semantic Web, Embedded Unix, System Integration
    http://www.sigram.com  Contact: info at sigram dot com
  • Michael Stack at Oct 23, 2007 at 8:45 pm

    Andrzej Bialecki wrote:
    Michael Stack wrote:
    ...
    Otherwise, I"m thinking we'd move MiniHBaseCluster from src/test to
    src/java so it makes it into the hadoop-*-hbase.jar renaming it
    LocalHBaseCluster to follow the LocalJobRunner precedent. We should
    also make the hbase.master value default "local" rather than
    "0.0.0.0:60000" and start a LocalHBaseCluster . How does that sound?
    Sounds great, just what we need.
    Hudson for some reason has stopped running hadoop patch builds. We're
    trying to figure out why. Unfortunately, his obstinacy is in the way
    of my committing the above pseudo-'local' support (2084) and the
    hbase-running-out-of-build-dir patch for Dennis (2088). Hopefully,
    he'll soon come around.

    St.Ack

Discussion Overview
group: common-user
category: hadoop
posted: Oct 18, '07 at 6:16p
active: Oct 23, '07 at 8:45p
posts: 11
users: 3
website: hadoop.apache.org...
irc: #hadoop
