Barry,

I believe we have a JIRA tracking this issue:
https://issues.cloudera.org/browse/IMPALA-37
The problem is that the connection is being closed in the middle of a
fetch, leaving all the queries in flight. Eventually, one of the backends
crashes, and you see the transport error. As a workaround, could you add a
loop to fetch all the data, and only then close the connection? This
works for me locally; it would be nice to know whether it solves your
problem as well.

while (results.next()) {
    // do something
}

should do the trick.
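
The workaround above could be wrapped up along these lines. This is only a minimal sketch using the standard java.sql interfaces: the class name ResultSetDrainer and the methods drain and runAndDrain are hypothetical, not part of the Hive driver or Impala. The idea is simply that every row is read before the result set and statement are closed, and the caller closes the connection afterwards.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, for illustration only.
public final class ResultSetDrainer {
    private ResultSetDrainer() {}

    /** Reads every remaining row and returns how many rows were consumed. */
    public static int drain(ResultSet results) throws SQLException {
        int rows = 0;
        while (results.next()) {
            rows++; // process the current row here
        }
        return rows;
    }

    /**
     * Runs the query, fetches all rows, and only then closes the result set
     * and the statement (in that order, via try-with-resources).
     * The caller remains responsible for closing the connection afterwards.
     */
    public static int runAndDrain(Connection connection, String sql) throws SQLException {
        try (Statement statement = connection.createStatement();
             ResultSet results = statement.executeQuery(sql)) {
            return drain(results);
        }
    }
}
```

With this shape, connection.close() is called only after runAndDrain has returned, so no fetch should still be in progress when the connection goes away.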


Additionally, the warning indicates that you don't have short-circuit
reads turned on. You can find more information on how to enable them here:
https://ccp.cloudera.com/display/IMPALA10BETADOC/Configuring+Impala+for+Performance
Which version of CDH are you using?

Thanks,

.. Ishaan



On Wed, Feb 27, 2013 at 12:38 PM, Barry Becker wrote:

We are having a problem where running the same Impala query many times
through Java eventually fails. Running it many times through impala-shell
does not fail. We suspect there may be an issue either with the Hive JDBC
driver or with how we are opening and closing connections.

Versions:
hive driver using hive-*-0.10.0-cdh4.3.0.jar and have tried others
impala : 0.5

In Java, I call the following 50 times. On the 18th call it fails with the
error below.

try {
    connection.connect();
    statement = connection.createStatement();
    ResultSet results = statement.executeQuery(
        "select * from pa_sales_fact, pa_product "
        + "where pa_sales_fact.material_id = pa_product.material_id "
        + " order by pa_sales_fact.material_id asc limit 10"
    );
}
catch (SQLException e)
{
    throw new SQLException(e);
}
finally
{
    if (results != null) results.close();
    if (statement != null) statement.close();
    if (connection != null) connection.close();
}

Here is the error:
Exception in thread "main" java.sql.SQLException: java.sql.SQLException: Couldn't open transport for 10.XXX.XXX.XX:22000(connect() failed: Connection refused)
    at com.pros.cricket.performance.impala.PerformanceRunner.recordTimeForQuery(PerformanceRunner.java:98)
    at com.pros.cricket.performance.impala.PerformanceRunner.collectPerformanceResults(PerformanceRunner.java:70)
    at com.pros.cricket.performance.impala.PerformanceRunner.main(PerformanceRunner.java:134)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: java.sql.SQLException: Couldn't open transport for 10.XXX.XXX.XX:22000(connect() failed: Connection refused)
    at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:159)
    at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:147)
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:182)
    at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:246)
    at org.apache.commons.dbcp.DelegatingStatement.executeQuery(DelegatingStatement.java:208)
    at org.apache.commons.dbcp.DelegatingStatement.executeQuery(DelegatingStatement.java:208)
    at com.pros.cricket.impala.ImpalaAccess.executeQuery(ImpalaAccess.java:59)
    at com.pros.cricket.performance.impala.PerformanceRunner.recordTimeForQuery(PerformanceRunner.java:94)
    ... 7 more

On the failed node, tmp/impalad.WARNING has:
W0226 16:51:12.098316 26078 hdfs-scan-node.cc:171] Unknown disk id. This will negatively affect performance. Check your hdfs settings to enable block location metadata.

but I do not see any other useful warning or error messages.
We have tried using dbcp, but still hit this problem after some number of
queries.


  • Barry Becker at Mar 19, 2013 at 8:50 pm
    I looked into this further and something does not seem right. I have a
    query that runs against a table of 10 million rows and returns in half a
    second. The result set has only 10 records because the query specifies
    limit 10. The very first call to results.next() takes 22 seconds, and the
    remaining 9 calls to results.next() take virtually no time. Why does the
    first call to results.next() take 22 seconds when the result set is so
    small? I realize that the result set requires an open connection to the
    server, but 22 seconds seems completely unreasonable. I am not sure
    whether this is the same issue as
    https://issues.cloudera.org/browse/IMPALA-37.
    -Barry
  • Alan Choi at Mar 19, 2013 at 9:22 pm
    Hi Barry,

    The query has an ORDER BY. That means the entire result set has to be
    fetched before the first row can be returned.

    Thanks,
    Alan

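
Alan's point about ORDER BY can be illustrated with a small, self-contained sketch. It is not how Impala implements sorting; the class TopNDemo, the method smallestN, and the counter rowsRead are hypothetical names made up for this example. It only shows why "order by ... limit 10" cannot hand back its first row until every input row has been examined: the smallest key may be the very last row read.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical demo class, for illustration only.
public final class TopNDemo {
    private TopNDemo() {}

    /** Input rows pulled by the most recent call to smallestN (demo bookkeeping only). */
    public static int rowsRead = 0;

    /** Returns the n smallest keys in ascending order, like "order by key asc limit n". */
    public static List<Integer> smallestN(Iterator<Integer> rows, int n) {
        rowsRead = 0;
        // Max-heap that keeps only the n smallest keys seen so far.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>(Math.max(1, n), Collections.reverseOrder());
        while (rows.hasNext()) {      // every input row must be consumed
            int key = rows.next();
            rowsRead++;
            heap.add(key);
            if (heap.size() > n) {
                heap.poll();          // evict the largest key currently held
            }
        }
        // Only now is the first output row known.
        List<Integer> out = new ArrayList<>(heap);
        Collections.sort(out);
        return out;
    }
}
```

In this sketch the output list is tiny, but rowsRead always equals the size of the input, which matches the observation that the first results.next() call is slow and the remaining ones are nearly free.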

Discussion Overview
group: impala-user
categories: hadoop
posted: Feb 27, '13 at 9:57p
active: Mar 19, '13 at 9:22p
posts: 3
users: 3
website: cloudera.com
irc: #hadoop
