Hi,

when I use Impala to query an HBase table with two WHERE conditions
(connected by AND), one of the conditions seems to be ignored: the combined
query returns the same count as the first condition alone.

select count(*) from customer_journey where customer_city is not null;
=> 13562
select count(*) from customer_journey where pi_search_phrase = 'shs viveon';
=> 3048

select count(*) from customer_journey where customer_city is not null and pi_search_phrase = 'shs viveon';
=> 13562
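A quick sanity check on the three counts: a row can only satisfy `A and B` if it satisfies both A and B, so the combined count can never exceed the smaller of the two individual counts (3048 here). The Python sketch is an illustration, not something from the thread. It emulates `select count(*) ... where ...` over a few hypothetical in-memory rows, then applies the same bound to the reported numbers. The helper names (`count_where`, `conjunction_is_plausible`) and the sample rows are made up for illustration.

```python
# Illustrative sketch only: emulates COUNT(*) ... WHERE over hypothetical rows.
# None stands in for SQL NULL; None == 'shs viveon' is False, so such rows are
# filtered out, just as a NULL comparison fails a WHERE clause.

def count_where(rows, predicate):
    """Count rows for which predicate(row) is true (like COUNT(*) ... WHERE)."""
    return sum(1 for row in rows if predicate(row))


def city_not_null(row):
    return row["customer_city"] is not None


def phrase_matches(row):
    return row["pi_search_phrase"] == "shs viveon"


# Hypothetical sample rows, not taken from the customer_journey table.
rows = [
    {"customer_city": "Berlin", "pi_search_phrase": "shs viveon"},
    {"customer_city": "Munich", "pi_search_phrase": None},
    {"customer_city": None, "pi_search_phrase": "shs viveon"},
]

a = count_where(rows, city_not_null)
b = count_where(rows, phrase_matches)
both = count_where(rows, lambda r: city_not_null(r) and phrase_matches(r))
assert both <= min(a, b)  # a conjunction never matches more rows than either part


def conjunction_is_plausible(count_a, count_b, count_a_and_b):
    """True if count(A AND B) respects the bound count <= min(count(A), count(B))."""
    return count_a_and_b <= min(count_a, count_b)


# Counts reported in the post: 13562 > 3048, so the bound is violated,
# which means the pi_search_phrase predicate cannot have been applied.
print(conjunction_is_plausible(13562, 3048, 13562))
```

With the reported counts the check fails. That is consistent with the combined query returning exactly the first query's result.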

This happens even if the two columns are in the same column family.
Bug or feature?

I use Impala 1.1.1 on CDH 4.4.0.

Regards,
Henrik

To unsubscribe from this group and stop receiving emails from it, send an email to impala-user+unsubscribe@cloudera.org.

  • Henrik B. at Oct 18, 2013 at 12:49 pm
    Hi Alex,

    here are the table definitions:

    HBase:

    WiredMinds 2 {NAME => 'WiredMinds', FAMILIES => [{NAME =>
    'customer'}, {NAME => 'pi'}, {NAME => 'visit'}]}

    Hive/Impala:

    create external table customer_journey (
    rowkey string,
    customer_agent_name string,
    customer_agent_os string,
    customer_agent_os_version string,
    customer_agent_version string,
    customer_campaign_datas string,
    customer_campaign_datas_str string,
    customer_cc2 string,
    customer_city string,
    customer_city_name string,
    customer_city_raw string,
    customer_color string,
    customer_company_name string,
    customer_company_name_raw string,
    customer_country_code2 string,
    customer_country_code2_raw string,
    customer_country_name string,
    customer_country_name_raw string,
    customer_ctime string,
    customer_ct_js string,
    customer_ct_jv string,
    customer_ct_ret_visitor string,
    customer_ct_user_cookie string,
    customer_ct_users_id string,
    customer_host_name string,
    customer_ip_addr string,
    customer_ip_addr_and_host_name string,
    customer_ip_from string,
    customer_ip_long string,
    customer_last_atime string,
    customer_latitude string,
    customer_longitude string,
    customer_milestone_datas string,
    customer_mundt_id string,
    customer_num string,
    customer_provider_name string,
    customer_region string,
    customer_region_name string,
    customer_region_raw string,
    customer_resolution string,
    customer_schober_id string,
    customer_schober_idcc2 string,
    customer_total_perc string,
    customer_zip string,
    customer_zip_raw string,
    pi_campaign_name string,
    pi_clicks_id string,
    pi_date string,
    pi_duration string,
    pi_duration_string string,
    pi_groups_id string,
    pi_group_string string,
    pi_page_name string,
    pi_pages_id string,
    pi_referrers_id string,
    pi_referrer_string string,
    pi_search_engine string,
    pi_search_phrase string,
    visit_ctime string,
    visit_ct_users_id string,
    visit_duration string,
    visit_duration_string string,
    visit_mtime string,
    visit_num_clicks string)
    stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    with serdeproperties
    ("hbase.columns.mapping"=":key,customer:agent_name,customer:agent_os,customer:agent_os_version,customer:agent_version,customer:campaign_datas,customer:campaign_datas_str,customer:cc2,customer:city,customer:city_name,customer:city_raw,customer:color,customer:company_name,customer:company_name_raw,customer:country_code2,customer:country_code2_raw,customer:country_name,customer:country_name_raw,customer:ctime,customer:ct_js,customer:ct_jv,customer:ct_ret_visitor,customer:ct_user_cookie,customer:ct_users_id,customer:host_name,customer:ip_addr,customer:ip_addr_and_host_name,customer:ip_from,customer:ip_long,customer:last_atime,customer:latitude,customer:longitude,customer:milestone_datas,customer:mundt_id,customer:num,customer:provider_name,customer:region,customer:region_name,customer:region_raw,customer:resolution,customer:schober_id,customer:schober_idcc2,customer:total_perc,customer:zip,customer:zip_raw,pi:campaign_name,pi:clicks_id,pi:date,pi:duration,pi:duration_string,pi:groups_id,pi:group_string,pi:page_name,pi:pages_id,pi:referrers_id,pi:referrer_string,pi:search_engine,pi:search_phrase,visit:ctime,visit:ct_users_id,visit:duration,visit:duration_string,visit:mtime,visit:num_clicks")
    tblproperties("hbase.table.name" = "WiredMinds");
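With Hive's HBaseStorageHandler, the entries of `hbase.columns.mapping` are matched to the table's columns by position. `:key` binds a column to the HBase row key, and each `family:qualifier` entry binds the next column to that cell. The Python sketch is not part of Henrik's message. It pairs the two lists so you can check which column family each queried column lives in. It uses a hypothetical four-column excerpt of the definition above, and the helper name `map_columns` is made up. It also deliberately ignores Hive's other mapping forms, such as mapping a whole family to a MAP column or storage-type suffixes.

```python
# Illustrative sketch: positional pairing of Hive columns with
# hbase.columns.mapping entries (simplified; only ":key" and "family:qualifier").

def map_columns(hive_columns, mapping):
    """Return {hive_column: (family, qualifier)}; the row key maps to ('<row key>', None)."""
    entries = [e.strip() for e in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError(
            f"{len(hive_columns)} columns but {len(entries)} mapping entries"
        )
    result = {}
    for column, entry in zip(hive_columns, entries):
        if entry == ":key":
            result[column] = ("<row key>", None)
        else:
            family, qualifier = entry.split(":", 1)
            result[column] = (family, qualifier)
    return result


# Abbreviated excerpt of the customer_journey definition (hypothetical subset).
columns = ["rowkey", "customer_city", "pi_search_phrase", "visit_num_clicks"]
mapping = ":key,customer:city,pi:search_phrase,visit:num_clicks"

for column, (family, qualifier) in map_columns(columns, mapping).items():
    print(column, "->", family, qualifier)
```

In the original query, `customer_city` maps to the `customer` family and `pi_search_phrase` maps to the `pi` family. The two predicates therefore touch different column families, although the thread reports the same problem when both columns share a family.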

    Attached are the log files. The cluster has two nodes, but no log entries
    were produced on the second node because both regions of the HBase table
    happen to reside on the first node.

    Henrik

  • Alex Behm at Oct 21, 2013 at 5:22 pm
    Hi Henrik,

    thanks for the detailed info package! I'm having a look.

    Cheers,

    Alex

  • Henrik B. at Oct 22, 2013 at 10:26 am
    Hi Alex,

    if you like, we can have a look at the machine together via Skype some
    evening (European time).

    Henrik

  • Alex Behm at Oct 24, 2013 at 10:48 pm
    Henrik, my apologies for the late reply.

    After some experimentation I was able to reproduce the issue in-house.
    It's definitely a bug, and unfortunately, I wasn't able to come up
    with a workaround. You'll need to wait for a fix.

    I've filed https://issues.cloudera.org/browse/IMPALA-642 to track the
    progress on this bug.

    Thanks for letting us know about it!

    Cheers,

    Alex


Discussion Overview
group: impala-user
categories: hadoop
posted: Oct 17, '13 at 1:31p
active: Oct 24, '13 at 10:48p
posts: 5
users: 2
website: cloudera.com
irc: #hadoop

2 users in discussion

Henrik B.: 3 posts, Alex Behm: 2 posts
