I believe we have a JIRA tracking this issue:
https://issues.cloudera.org/browse/IMPALA-37
The problem is that the connection is being closed in the middle of a
fetch, which leaves the queries in flight. Eventually one of the backends
crashes, and you see the transport error. As a workaround, could you add a
loop that fetches all the data, and only then close the connection? This
works for me locally; it would be good to know whether it solves your
problem as well.
    while (results.next()) {
        // do something
    }
should do the trick.
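To make the workaround concrete, here is a minimal sketch, assuming Java 7 or later so that try-with-resources is available. The class and method names (`FetchAllThenClose`, `drain`, `fetchAllThenClose`) are only illustrative and are not part of the Hive JDBC driver or Impala. It reads every row before anything is closed, then closes the result set, statement, and connection in that order.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class; the names here are illustrative only.
final class FetchAllThenClose {

    /** Reads every remaining row so no fetch is still pending; returns the row count. */
    static int drain(ResultSet results) throws SQLException {
        int rows = 0;
        while (results.next()) {
            // do something with the current row
            rows++;
        }
        return rows;
    }

    /**
     * Runs one query, drains it completely, and only then closes the
     * ResultSet, Statement, and Connection (in that order), which
     * try-with-resources does automatically when the block exits.
     */
    static int fetchAllThenClose(Connection connection, String sql) throws SQLException {
        try (Connection conn = connection;
             Statement statement = conn.createStatement();
             ResultSet results = statement.executeQuery(sql)) {
            return drain(results);
        }
    }
}
```

Calling `fetchAllThenClose(connection, query)` once per iteration of the test loop would replace the explicit try/catch/finally. Whether this avoids the crash on a given cluster is something only a test run can confirm.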
Additionally, the warning indicates that you don't have short-circuit
reads turned on. You can find more information on how to enable them here:
https://ccp.cloudera.com/display/IMPALA10BETADOC/Configuring+Impala+for+Performance
Which version of CDH are you using?
Thanks,
.. Ishaan
On Wed, Feb 27, 2013 at 12:38 PM, Barry Becker wrote:
We are having a problem where running the same Impala query many times
through Java eventually fails. Running it many times through impala-shell
does not fail. We suspect there may be an issue with the Hive JDBC driver,
or that we are opening and closing connections improperly.
Versions:
Hive driver: hive-*-0.10.0-cdh4.3.0.jar (we have tried others as well)
Impala: 0.5
In Java, I call the following 50 times. On the 18th call it fails with the
error below.
try {
    connection.connect();
    statement = connection.createStatement();
    ResultSet results = statement.executeQuery(
        "select * from pa_sales_fact, pa_product "
        + "where pa_sales_fact.material_id = pa_product.material_id "
        + " order by pa_sales_fact.material_id asc limit 10"
    );
}
catch (SQLException e)
{
    throw new SQLException(e);
}
finally
{
    if (results != null) results.close();
    if (statement != null) statement.close();
    if (connection != null) connection.close();
}
Here is the error:
Exception in thread "main" java.sql.SQLException: java.sql.SQLException: Couldn't open transport for 10.XXX.XXX.XX:22000(connect() failed: Connection refused)
    at com.pros.cricket.performance.impala.PerformanceRunner.recordTimeForQuery(PerformanceRunner.java:98)
    at com.pros.cricket.performance.impala.PerformanceRunner.collectPerformanceResults(PerformanceRunner.java:70)
    at com.pros.cricket.performance.impala.PerformanceRunner.main(PerformanceRunner.java:134)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: java.sql.SQLException: Couldn't open transport for 10.XXX.XXX.XX:22000(connect() failed: Connection refused)
    at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:159)
    at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:147)
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:182)
    at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:246)
    at org.apache.commons.dbcp.DelegatingStatement.executeQuery(DelegatingStatement.java:208)
    at org.apache.commons.dbcp.DelegatingStatement.executeQuery(DelegatingStatement.java:208)
    at com.pros.cricket.impala.ImpalaAccess.executeQuery(ImpalaAccess.java:59)
    at com.pros.cricket.performance.impala.PerformanceRunner.recordTimeForQuery(PerformanceRunner.java:94)
    ... 7 more
On the failed node, tmp/impalad.WARNING has
W0226 16:51:12.098316 26078 hdfs-scan-node.cc:171] Unknown disk id. This will negatively affect performance. Check your hdfs settings to enable block location metadata.
but I do not see any other useful warning or error messages.
We have tried using dbcp, but still hit this problem after some number of
queries.