Hi,

I'm trying to read data from Impala to Qlikview using the ODBC driver 2.5.5
that I downloaded today. Connections work and I can get results on small
queries and aggregations.

CDH version is CDH 4.4.0-1.cdh4.4.0.p0.3 (basically default installation)
Impala version is IMPALA 1.1.1-1.p0.17

The data is in an external table and works fine while using Hue for all
sorts of calculations so the whole set is definitely there and works as
expected.

If I try to do a select * from the table, I get a few hundred thousand rows
and then get hit by:
Error: QVX_UNEXPECTED_END_OF_DATA: SQL##f - SqlState: S1000, ErrorCode:
120, ErrorMsg: [Cloudera][ImpalaODBC] (120) Error while retrieveing data
from in Impala: [HY000] : No more data to read.
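
For readers comparing this message with the Hive one further down, the fields can be pulled apart mechanically. The sketch below is only an illustration: the regular expression and the function name `parse_odbc_error` are assumptions written for this one message layout, not part of any driver or QlikView API. The message is copied verbatim, including the driver's own spelling. Note that S1000 is the ODBC 2.x spelling of the generic SQLSTATE HY000, so the two codes in the message say the same thing.

```python
import re

# Error text as reported in the post (verbatim, driver typo included).
MESSAGE = (
    "Error: QVX_UNEXPECTED_END_OF_DATA: SQL##f - SqlState: S1000, ErrorCode: "
    "120, ErrorMsg: [Cloudera][ImpalaODBC] (120) Error while retrieveing data "
    "from in Impala: [HY000] : No more data to read."
)

# Hypothetical pattern for this message layout only; not a driver API.
PATTERN = re.compile(
    r"Error: (?P<client_code>\w+): .*?"
    r"SqlState: (?P<sqlstate>\w+), ErrorCode:\s*(?P<error_code>\d+), "
    r"ErrorMsg: \[(?P<vendor>[^\]]+)\]\[(?P<component>[^\]]+)\]"
)


def parse_odbc_error(message):
    """Split the message into client code, SQLSTATE, error code and component."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("message does not match the expected layout")
    fields = match.groupdict()
    fields["error_code"] = int(fields["error_code"])
    return fields


parsed = parse_odbc_error(MESSAGE)
print(parsed)
```

The `component` field is what distinguishes the two failures in this thread: `ImpalaODBC` here, `Hardy` for the Hive driver.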

If I use fewer columns I get more rows but always up to a limit and then I
get hit by the same error.

This happens after ~18-20 seconds, so queries that take less than 20 secs
seem to complete and queries that take longer are stopped. This makes
me suspect that there is some sort of timeout setting somewhere that
could hopefully be tweaked to allow for longer-running queries?
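
One way to check the timeout theory outside QlikView would be to drain the same query through a Python DB-API cursor and log how far the fetch gets before it fails. The sketch below is a minimal version of that idea under stated assumptions: `timed_fetch` is a hypothetical helper, and an in-memory SQLite table stands in for Impala so the code runs anywhere. With an ODBC bridge such as pyodbc (not mentioned in the thread), the cursor would come from that connection instead.

```python
import sqlite3
import time


def timed_fetch(cursor, batch_size=10_000, clock=time.monotonic):
    """Drain a DB-API cursor in batches.

    Returns (total_rows, progress, error), where progress is a list of
    (elapsed_seconds, rows_so_far) checkpoints and error is the exception
    that interrupted the fetch, or None if the cursor was fully drained.
    """
    start = clock()
    total = 0
    progress = []
    error = None
    try:
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            total += len(rows)
            progress.append((clock() - start, total))
    except Exception as exc:  # broad on purpose: driver error classes differ
        error = exc
    return total, progress, error


# Stand-in data source (assumption): a small in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?)",
    ((str(i), "x") for i in range(25_000)),
)
cur = conn.cursor()
cur.execute("SELECT * FROM test_table")

total, progress, error = timed_fetch(cur)
print(total, [rows for _, rows in progress], error)
```

If the last checkpoint before an error always lands near the same elapsed time regardless of column count, that would point at a time-based cut-off rather than a row or byte limit.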

Trying the Hive driver for kicks, I get similar messages:
Error: QVX_UNEXPECTED_END_OF_DATA: SQL##f - SqlState: S1000, ErrorCode: 35,
ErrorMsg: [Cloudera][Hardy] (35) Error from Hive: error code: '0' error
message: 'No more data to read.'.

Checking the Hive log, the job is launched properly and then ended with:
2013-11-04 17:59:12,790 INFO org.apache.hadoop.hive.ql.Driver: MapReduce
Jobs Launched:
2013-11-04 17:59:12,791 INFO org.apache.hadoop.hive.ql.Driver: Job 0: Map:
15 Cumulative CPU: 183.11 sec HDFS Read: 3286791275 HDFS Write:
2864372836 SUCCESS
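
The job summary line above can be read mechanically to confirm what the server did. The sketch below is illustrative only: the regular expression and `parse_job_summary` are assumptions fitted to this one log line, not a Hive API. It converts the HDFS byte counters to GiB.

```python
import re

# The wrapped log line from the post, rejoined into one string.
LOG_LINE = (
    "2013-11-04 17:59:12,791 INFO org.apache.hadoop.hive.ql.Driver: Job 0: Map: "
    "15 Cumulative CPU: 183.11 sec HDFS Read: 3286791275 HDFS Write: "
    "2864372836 SUCCESS"
)

# Hypothetical pattern for this summary layout; \s+ tolerates extra spacing.
SUMMARY = re.compile(
    r"Job (?P<job>\d+): Map: (?P<maps>\d+)\s+"
    r"Cumulative CPU: (?P<cpu>[\d.]+) sec\s+"
    r"HDFS Read: (?P<read>\d+)\s+HDFS Write: (?P<written>\d+)\s+"
    r"(?P<status>\w+)"
)

GIB = 1024 ** 3


def parse_job_summary(line):
    """Extract the counters from a Hive 'MapReduce Jobs Launched' summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("not a job summary line")
    return {
        "job": int(match["job"]),
        "maps": int(match["maps"]),
        "cpu_seconds": float(match["cpu"]),
        "hdfs_read_bytes": int(match["read"]),
        "hdfs_write_bytes": int(match["written"]),
        "status": match["status"],
    }


summary = parse_job_summary(LOG_LINE)
read_gib = round(summary["hdfs_read_bytes"] / GIB, 2)
write_gib = round(summary["hdfs_write_bytes"] / GIB, 2)
print(summary["status"], read_gib, write_gib)
```

A SUCCESS status with several GiB written suggests the job itself finished on the cluster, so the failure would be on the result-fetch side rather than in query execution.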

The difference is that the Hive query does not return any rows at all before
giving me the error message, which makes me suspect that the same kind of
timeout issue is at play here, since the Hive job takes longer than 20 secs.

Thanks

To unsubscribe from this group and stop receiving emails from it, send an email to impala-user+unsubscribe@cloudera.org.


  • Fredrik Ragnar at Nov 7, 2013 at 11:27 am
    Update,

    I installed impala-shell on a remote machine and the same query is running
    fine from there, as it did when I ran it on the namenode.

    I'm stumped. Any ideas would be greatly appreciated.

    Thanks.

    On Tuesday, 5 November 2013 at 14:02:24 UTC+1, Fredrik Ragnar wrote:
    Hi, and thanks for taking the time.

    Impala shell runs the query happily, producing a 4.7 GB file in ~500 secs
    containing all rows in the source table:
    "
    Query: select *
    from test_table
    Query finished, fetching results ...
    Returned 9967593 row(s) in 506.24s
    "

    Regarding the ODBC driver, I tried to tweak the socket timeout in the
    driver settings but that didn't help me (didn't really think it would but
    I'm game for testing anything :)). Default is 0 which should be infinite
    according to the docs.
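
The impala-shell figures give a rough way to sanity-check the "few hundred thousand rows, then an error at ~18-20 seconds" observation. The arithmetic below is a back-of-the-envelope sketch only. It assumes the ODBC fetch streams rows at about the same rate as impala-shell did, which the thread does not establish, and it reads "4.7 GB" as decimal gigabytes.

```python
# Figures quoted in the thread.
ROWS = 9_967_593          # rows returned by impala-shell
SECONDS = 506.24          # wall-clock time reported by impala-shell
FILE_BYTES = 4.7e9        # "4.7 GB" output file (assumed decimal gigabytes)
CUTOFF_SECONDS = (18, 20)  # when the ODBC fetch was seen to fail

rows_per_second = ROWS / SECONDS
bytes_per_row = FILE_BYTES / ROWS

# Assumption: ODBC delivers rows at roughly the impala-shell rate.
expected_rows = tuple(round(rows_per_second * s) for s in CUTOFF_SECONDS)

print(f"{rows_per_second:,.0f} rows/s, about {bytes_per_row:.0f} bytes/row")
print(f"rows expected before an 18-20 s cut-off: {expected_rows}")
```

Under those assumptions the estimate lands in the few-hundred-thousand range, which is at least consistent with a time-based cut-off. It would also fit the observation that selecting fewer columns yields more rows before the error.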

    On Tuesday, 5 November 2013 at 03:03:43 UTC+1, Udai wrote:
    Fredrik,

    What happens when you run the same query (the one that fails) via the
    impala-shell? Do you see the query complete?
    That would help isolate the scope of the issue: whether it is with the
    ODBC driver or whether it is indeed with the Impalads.

    Thanks,
    Udai

    On Mon, Nov 4, 2013 at 9:09 AM, Fredrik Ragnar wrote:

  • Fredrik Ragnar at Nov 7, 2013 at 11:59 am
    Update again. I downloaded the Simba driver for Hive; I couldn't find one
    specifically for Impala.

    1. Tested a Hive query, which worked fine from QlikView.
    2. Pointed the Hive driver at the Impala port, which also worked fine from
    QlikView.

    So the Simba driver seems to work but the Cloudera driver does not.

  • Justin Erickson at Nov 7, 2013 at 6:10 pm
    Hi Fredrik,

    The Cloudera ODBC 2.5.x drivers
    <http://go.cloudera.com/odbc-driver-hive-impala.html> are drivers provided
    by Simba. Where did you download the Simba driver that worked, and what's
    the version number for that?

    Also did you test the same long-running query with the working Simba driver?

    Thanks,
    Justin

  • Fredrik Ragnar at Nov 8, 2013 at 10:42 am
    Hi Justin, and thanks for taking your time.

    I grabbed it from:
    http://www.simba.com/connectors/apache-hadoop-hive-odbc

    Hit "Evaluate software", fill out the form, and download.

    According to the file name it's SimbaHiveODBC_Windows-1.3.14.1007.

    And yes, I tested the same query and was able to download the whole dataset
    to QlikView using the Simba driver.

    Thanks.

  • James Duong at Nov 8, 2013 at 10:18 pm
    Hi Fredrik,

    A few questions about this problem:

    1. Which version of QlikView are you running? Is it 64-bit or 32-bit?
    2. What does the table you're querying look like? How many columns are in
    it and what are their data types?
    3. Are you applying any ordering or filtering operations, or is it just a
    straight SELECT * FROM <table> query?

    I tried out a table with 126 string columns and roughly 4 million rows and
    QlikView appeared to run out of memory after retrieving about 2 million
    rows.

    Thanks,
    James
  • James Duong at Nov 9, 2013 at 12:30 am
    Also, what authentication settings are you using?
  • Fredrik Ragnar at Nov 9, 2013 at 8:17 pm
    Hi James,

    1. It's 64 bit, latest release.
    2. 28 columns, all of them are string columns.
    3. Just a straight select * from.

    As for authentication, I'm not using any at all, and I'm not entering a
    username in the ODBC settings. The Hive driver seems to need a user but is
    quite happy without a password.

    The server has about 80 GB to spare (out of 196) when the query ends, so I
    would be very surprised if it's a memory issue on QlikView's side.

    Thanks.

    Den fredagen den 8:e november 2013 kl. 23:18:19 UTC+1 skrev James Duong:
    Hi Fredrik,

    A few questions about this problem:

    1. Which version of QlikView are you running? Is it 64-bit or 32-bit?
    2. What does the table you're querying look like? How many columns are in
    it and what are their data types?
    3. Are you applying any ordering or filtering operations, or is it just a
    straight SELECT * FROM <table> query?

    I tried out a table with 126 string columns and roughly 4 million rows and
    QlikView appeared to run out of memory after retrieving about 2 million
    rows.

    Thanks,
    James
    On Friday, 8 November 2013 02:42:17 UTC-8, Fredrik Ragnar wrote:

    Hi Justin, and thanks for taking your time.

    I grabbed it from:
    http://www.simba.com/connectors/apache-hadoop-hive-odbc

    Hit "Evaluate software", fill out the form, and download.

    According to the file name it's SimbaHiveODBC_Windows-1.3.14.1007.

    And yes I tested the same query and was able to download the whole
    dataset to Qlikview using the Simba driver.

    Thanks.

    On Thursday, 7 November 2013 at 19:10:52 UTC+1, Justin Erickson wrote:
    Hi Fredrik,

    The Cloudera ODBC 2.5.x drivers
    <http://go.cloudera.com/odbc-driver-hive-impala.html> are drivers provided
    by Simba. Where did you download the Simba driver that worked, and what's
    the version number for that?

    Also did you test the same long-running query with the working Simba
    driver?

    Thanks,
    Justin

    On Thu, Nov 7, 2013 at 3:59 AM, Fredrik Ragnar wrote:

    Update again. I downloaded the Simba driver for Hive; I found none
    available specifically for Impala.

    1. Tested a Hive query, which worked fine from QlikView.
    2. Pointed the Hive driver at the Impala port, which also worked fine
    from QlikView.

    So the Simba driver seems to work but the Cloudera driver does not.

    On Thursday, 7 November 2013 at 12:27:42 UTC+1, Fredrik Ragnar wrote:
    Update,

    I installed impala-shell on a remote machine and the same query is
    running fine from there, as it did when I ran it on the namenode.

    I'm stumped. Any ideas would be greatly appreciated.

    Thanks.

    On Tuesday, 5 November 2013 at 14:02:24 UTC+1, Fredrik Ragnar wrote:
    Hi, and thanks for taking your time.

    Impala shell runs the query happily, producing a 4.7 GB file in ~500
    secs containing all rows in the source table:
    "
    Query: select *
    from test_table
    Query finished, fetching results ...
    Returned 9967593 row(s) in 506.24s
    "

    Regarding the ODBC driver, I tried to tweak the socket timeout in the
    driver settings but that didn't help me (didn't really think it would but
    I'm game for testing anything :)). Default is 0 which should be infinite
    according to the docs.

    On Tuesday, 5 November 2013 at 03:03:43 UTC+1, Udai wrote:
    Fredrik,

    What happens when you run the same query (the one that fails) via the
    impala-shell? Do you see the query complete?
    That would help isolate whether the issue is with the ODBC driver or
    is indeed with the Impalads.

    Thanks,
    Udai

    On Mon, Nov 4, 2013 at 9:09 AM, Fredrik Ragnar wrote:

    Hi,

    I'm trying to read data from Impala to Qlikview using the ODBC
    driver 2.5.5 that I downloaded today. Connections work and I can get
    results on small queries and aggregations.

    CDH version is CDH 4.4.0-1.cdh4.4.0.p0.3 (basically default
    installation)
    Impala version is IMPALA 1.1.1-1.p0.17

    The data is in an external table and works fine while using Hue for
    all sorts of calculations so the whole set is definitely there and works as
    expected.

    If I try to do a select * from the table I get a few hundred thousand
    rows and then get hit by:
    Error: QVX_UNEXPECTED_END_OF_DATA: SQL##f - SqlState: S1000,
    ErrorCode: 120, ErrorMsg: [Cloudera][ImpalaODBC] (120) Error while
    retrieveing data from in Impala: [HY000] : No more data to read.

    If I use fewer columns I get more rows but always up to a limit and
    then I get hit by the same error.

    This happens after ~18-20 seconds. So queries that take less than
    20 secs seem to get completed and queries that take longer are stopped.
    This makes me suspect that there is some sort of timeout setting somewhere
    that hopefully could be tweaked to allow for longer-running queries?

    Trying the Hive driver for kicks I get similar messages:
    Error: QVX_UNEXPECTED_END_OF_DATA: SQL##f - SqlState: S1000,
    ErrorCode: 35, ErrorMsg: [Cloudera][Hardy] (35) Error from Hive: error
    code: '0' error message: 'No more data to read.'.

    Checking the Hive log, the job is launched properly and then ended
    with:
    2013-11-04 17:59:12,790 INFO org.apache.hadoop.hive.ql.Driver:
    MapReduce Jobs Launched:
    2013-11-04 17:59:12,791 INFO org.apache.hadoop.hive.ql.Driver: Job
    0: Map: 15 Cumulative CPU: 183.11 sec HDFS Read: 3286791275 HDFS Write:
    2864372836 SUCCESS

    Difference being that the Hive query does not return any rows at
    all before giving me the error message. Which makes me suspect there is the
    same kind of timeout issue at play here since the Hive job takes longer than
    20 secs.

    Thanks

  • James Duong at Nov 21, 2013 at 12:12 am
    Hi Fredrik,

    Can you see what happens if you change the driver's configuration to enable
    the Use Native Query option?
  • Fredrik Ragnar at Nov 27, 2013 at 11:04 am
    Hi James, and thanks again. And sorry for the late answer.

    Sorry no. Makes no difference at all, query is stopped at 20 seconds just
    as before.

    BR
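
    One way to narrow this down further would be to run the same query
    through the same ODBC DSN but outside QlikView, and record how many rows
    arrive and after how long the fetch stops. The following is only a minimal
    sketch of that idea, not something taken from the driver documentation: it
    assumes the third-party pyodbc package is installed, and the helper names
    (timed_fetch, run_against_dsn) and the DSN name used below are
    hypothetical. If the fetch also stops at about 20 seconds here, the cutoff
    is probably in the driver or the network path rather than in QlikView.

```python
import time


def timed_fetch(cursor, batch_size=10000, clock=time.monotonic):
    """Drain an already-executed DB-API cursor in batches.

    Returns (rows_fetched, elapsed_seconds, error), where error is None
    if the result set was read to the end.
    """
    start = clock()
    rows = 0
    try:
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:  # an empty batch means the result set is exhausted
                break
            rows += len(batch)
    except Exception as exc:  # report the driver error instead of raising it
        return rows, clock() - start, exc
    return rows, clock() - start, None


def run_against_dsn(dsn, query):
    """Run `query` through an ODBC DSN and report where the fetch stops."""
    import pyodbc  # third-party; imported lazily so timed_fetch has no dependencies

    # Assumption: autocommit=True, since Impala does not provide transactions.
    conn = pyodbc.connect("DSN=" + dsn, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        rows, elapsed, error = timed_fetch(cursor)
    finally:
        conn.close()
    status = "completed" if error is None else "failed: %s" % error
    print("%d rows in %.1fs, %s" % (rows, elapsed, status))
    return rows, elapsed, error
```

    For example, run_against_dsn("ImpalaTest", "select * from test_table"),
    where "ImpalaTest" stands in for whatever the DSN is actually called.
    Running it once against the Cloudera driver's DSN and once against the
    Simba driver's DSN would give directly comparable numbers.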

    On Thursday, 21 November 2013 at 01:12:53 UTC+1, James Duong wrote:
    Hi Fredrik,

    Can you see what happens if you change the driver's configuration to
    enable the Use Native Query option?

Discussion Overview
group: impala-user
categories: hadoop
posted: Nov 4, '13 at 5:09p
active: Nov 27, '13 at 11:04a
posts: 10
users: 3
website: cloudera.com
irc: #hadoop