Grokbase Groups Hive user May 2011
Subject: HIVE Server multiple instances
Hello,

I have one instance of the Hive JDBC server running on port 10000. Can I run another
instance on a different port? Would it cause a concurrency issue on the
underlying data warehouse files? Please clarify.

Thanks,
V.Senthil Kumar


  • Matthew Rathbone at May 3, 2011 at 2:47 pm
    Why would you want to run two? I think it is multithreaded, so you can query it from two different connections.

    --
    Matthew Rathbone
    Foursquare | Software Engineer | Server Engineering Team
    matthew@foursquare.com | @rathboma | 4sq

  • V.Senthil Kumar at May 3, 2011 at 5:41 pm
    Thanks Matthew. The wiki page http://wiki.apache.org/hadoop/Hive/HiveServer says
    it's single-threaded. I have a queue of queries that is added to dynamically all
    the time. While I run one query over one JDBC connection, more queries arrive
    and a backlog builds up. That's why I was wondering
    whether I can run two or more instances to avoid a big backlog in the queue.



  • Matthew Rathbone at May 3, 2011 at 6:00 pm
    Even if it is single-threaded, it certainly seems to support multiple connections.

    We run 5 workers, all connected at the same time and each executing a different query (with a different connection per worker).

    Hope that helps

    Matthew
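
A minimal sketch of the worker setup described above: a backlog of queries drained by 5 workers, each holding its own connection. `FakeHiveConnection` and `drain_backlog` are hypothetical names invented for illustration; the fake connection only echoes the query, where a real worker would open its own JDBC or Thrift connection to HiveServer.

```python
import queue
import threading


class FakeHiveConnection:
    """Hypothetical stand-in for a real client connection to HiveServer."""

    def __init__(self, worker_id):
        self.worker_id = worker_id

    def execute(self, query):
        # A real client would send the query to the server and wait here.
        return (self.worker_id, query)


def drain_backlog(queries, num_workers=5):
    """Run every queued query using num_workers workers, one connection each."""
    pending = queue.Queue()
    for query in queries:
        pending.put(query)

    results = []
    results_lock = threading.Lock()

    def worker(worker_id):
        conn = FakeHiveConnection(worker_id)  # never shared between workers
        while True:
            try:
                query = pending.get_nowait()
            except queue.Empty:
                return  # backlog is empty, so this worker stops
            outcome = conn.execute(query)
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

This sketch drains a fixed backlog and then stops. For a queue that keeps receiving queries, the workers would block on `pending.get()` instead of exiting when it is empty.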
  • V.Senthil Kumar at May 3, 2011 at 6:02 pm
    Thanks. That really helps and answers my question.




  • Paul Ingles at May 3, 2011 at 6:16 pm
    HiveServer does seem to support multiple connections, but I think it still has thread-safety problems (https://issues.apache.org/jira/browse/HIVE-80).

    We (www.forward.co.uk) have certainly had instability problems with the Thrift server in the past and now run 5 or so instances behind the HAProxy load balancer (http://haproxy.1wt.eu/). Since we did that, it's been significantly better.

    I think the JDBC server still uses Thrift to connect to the HiveServer, so I would expect it to have similar problems (but I may have got that wrong :)
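
A rough sketch of how several instances on different ports might be described before launching them. It assumes the HiveServer of that era is started with `hive --service hiveserver` and reads its port from the `HIVE_PORT` environment variable; check this against your own Hive version. `instance_launch_specs` is a hypothetical helper, and the port numbers are examples only. Nothing is started here: the function only builds the command and environment for each instance.

```python
import os


def instance_launch_specs(ports, base_env=None):
    """Build one (command, environment) pair per HiveServer instance."""
    base_env = dict(os.environ if base_env is None else base_env)
    specs = []
    for port in ports:
        env = dict(base_env)
        # Assumption: the server picks up its listening port from HIVE_PORT.
        env["HIVE_PORT"] = str(port)
        specs.append((["hive", "--service", "hiveserver"], env))
    return specs
```

Each pair could then be handed to `subprocess.Popen(command, env=env)` or to whatever process supervisor is already in use.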

  • Matthew Rathbone at May 3, 2011 at 6:18 pm
    Hey Paul,

    I'd be very interested in reading about your Hadoop/Hive setup. Do you have a blog post or anything describing it, or some of the issues you've had with Hive?

    --
    Matthew Rathbone
    Foursquare | Software Engineer | Server Engineering Team
    matthew@foursquare.com | @rathboma | 4sq

  • V.Senthil Kumar at May 3, 2011 at 6:23 pm
    Thanks Paul. That is really useful information.



  • Paul Ingles at May 3, 2011 at 7:02 pm
    Nothing specifically about our Hive setup, although some of us at Forward have blogged bits and pieces about Hive + Hadoop, and we have a few Hadoop/Hive-related libs on our GitHub account: https://github.com/forward.

    I've blogged a few bits (http://www.oobaloo.co.uk/) as has one of my colleagues (http://blog.fingertap.org/post/1255463384/hive-thrift-client).

    Another colleague also presented a little about our setup during a Hadoop meetup last summer (http://skillsmatter.com/podcast/home/hadoop-in-context-1591). The numbers Andy mentioned will be a little out of date, but it does include some screenshots of a few of the surrounding apps we built that connect to Hive and Hadoop (including a web-based Hive query tool + work queue).

    I had a quick search through the mailing lists when we had connection problems but I think most of it was discussed/resolved during a chat I had with Shevek from Karmasphere at a London pub following a Hadoop meetup :)

    If you're interested, I've posted a gist (https://gist.github.com/953926) that contains our HAProxy config; clients connect to port 10000 and are balanced across ports 10001 to 10005 on 2 servers (so actually 10 backend servers).

    Happy to talk more about our experience; feel free to email me off-list if you'd like.
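
To make the arithmetic in the post concrete, here is a small sketch that enumerates the backends (2 servers, ports 10001 to 10005 each, so 10 in total) and hands them out in turn. The hostnames are hypothetical, and plain round-robin is an assumption: the balancing algorithm actually used in the gist is not stated in this thread.

```python
from itertools import cycle


def backend_addresses(hosts, first_port=10001, last_port=10005):
    """List every (host, port) pair that sits behind the front-end port."""
    return [
        (host, port)
        for host in hosts
        for port in range(first_port, last_port + 1)
    ]


def round_robin(backends):
    """Yield backends in order forever, wrapping around at the end."""
    return cycle(backends)


# Hypothetical hostnames; the real ones are not given in the thread.
backends = backend_addresses(["hive-a.example", "hive-b.example"])
picker = round_robin(backends)
first_choice = next(picker)  # each new client connection would take the next one
```

In the setup described above HAProxy does this on the server side, so clients only ever need to know the single front-end port.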

  • Paul Ingles at May 4, 2011 at 11:48 am
    For future reference I've posted a little more about our setup here:
    http://oobaloo.co.uk/multiple-connections-with-hive

  • Marcos Ortiz at May 4, 2011 at 1:11 pm

    Wow, good piece of information.
    Thanks for sharing it.

    --
    Marcos Luís Ortíz Valmaseda
    Software Engineer (Large-Scaled Distributed Systems)
    University of Information Sciences,
    La Habana, Cuba
    Linux User # 418229
    http://about.me/marcosortiz
  • V.Senthil Kumar at May 4, 2011 at 6:06 pm
    This is great info. Thanks a lot for sharing :)




Discussion Overview
group: user @
categories: hive, hadoop
posted: May 2, '11 at 10:42p
active: May 4, '11 at 6:06p
posts: 12
users: 4
website: hive.apache.org
