FAQ
Could you also post the runtime profiles (also found in the logs) for
both queries?
On Mon, Mar 11, 2013 at 7:41 PM, zjp... wrote:
Thanks Marcel !

My create table statement is:
CREATE TABLE gb_xdr( Online_ID bigint, Session_ID string, Start_Date_Time
string, End_Date_Time string, SGSN_IP bigint, SGSN string, BSC_IP bigint,
BSC string, LAC int, CI int, RAC int, BVCI int, IMSI bigint, IMEI string,
APN string, Protocol string, Event_Type int, IP_LEN_UL bigint, IP_LEN_DL
bigint, Count_Packet_UL int, Count_Packet_DL int, Duration int, SOURCE_IP
bigint, DEST_IP bigint, SOURCE_PORT int, DEST_PORT int, URI string, URI_Main
string, Content_Type_Part1 string, Content_Type_Part2 string,
User_Agent_Main string, Service_Type int, Sub_Service_Type int, De_PDP_Acc
int, De_PDP_Acc_Direction string, Detach_Acc int, Detach_Acc_Direction
string, Abnormal_reason int, Request int, Repeat int, Reject int,
Reject_Cause int, Accept int, Accept_DelayFirst int, Complete int,
Complete_DelayFirst int, ACK int, ACK_Delay bigint, Last_ACK_Delay bigint,
BVCI_OLD int, BVCI_NEW int, LLC_frame_Disc int, OPP_LAC int, OPP_RAC int,
Result int, Result_Delayfirst bigint, Disconnect int, Abort int, Query_Name
string, Query_Type int, Query_OP int, Resp int, Resp_delayFirst bigint,
Resp_Cause int, FromX string, ToX string, Subject string, MMS_Cause int,
ErrorPacketWired int, ErrorPacketWireless int, GPRS_multislot_E int,
EDGE_multislot_E int, Umts_128TDD_E int, Umts_FDD_E int, GPRS_multislot_1800
int, EDGE_multislot_1800 int, Umts_128TDD_1800 int, Umts_FDD_1800 int, Reset
int, Reset_Direction int, LLC_Frame_Count int, Abort_Reason_User int,
Abort_Reason_Provider int, Accept_Repeat int, Abnormal_Type int, De_Cause
int, RAU_Type int, RAU_Attr int, Event_Direction int, Prefix_IMEI string,
IsReassemble int, IsPrivate_Service int, User_IP_LEN_UL bigint,
User_IP_LEN_DL bigint, Bssgp_Status_Cause int, Weight_Rate_UL bigint,
Weight_Rate_DL bigint, Weight_Rate_Total bigint, Brand string, Type string,
ICMP_Type int, ICMP_Code int, Gob_Src_IP int, Gob_Dst_IP int, Gob_IP_LEN
int, Gob_Src_PORT int, Gob_Dst_PORT int, Gob_UDP_LEN int) row format
delimited fields terminated by '\t';


The INFO logs dumped by impalad when i issue the SELECT query are:
--------------------------------------------------------------------------------------------------------------------------
......
13/03/10 22:56:21 INFO metastore.HiveMetaStore: 0: get_all_databases
13/03/10 22:56:21 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr
cmd=get_all_databases
13/03/11 10:02:19 INFO service.Frontend: analyze query select count(*) from
mpos_gb.gb_xdr
13/03/11 10:02:19 INFO metastore.HiveMetaStore: 1: get_all_tables:
db=mpos_gb
13/03/11 10:02:19 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr
cmd=get_all_tables: db=mpos_gb
13/03/11 10:02:19 INFO metastore.HiveMetaStore: 1: Opening raw store with
implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
13/03/11 10:02:19 INFO metastore.ObjectStore: ObjectStore, initialize called
13/03/11 10:02:19 INFO metastore.ObjectStore: Initialized ObjectStore
13/03/11 10:02:19 INFO DataNucleus.Datastore: The class
"org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
"embedded-only" so d oes not have its own datastore table.
13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
Class : org.apache.hadoop.hive.metastore.model.MColumnDescriptor [Table
: `CDS`, InheritanceStrategy : new-table]
13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
Class : org.apache.hadoop.hive.metastore.model.MSerDeInfo [Table : `SER
DES`, InheritanceStrategy : new-table]
13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
Class : org.apache.hadoop.hive.metastore.model.MStringList [Table : `SK
EWED_STRING_LIST`, InheritanceStrategy : new-table]
13/03/11 10:02:19 INFO DataNucleus.Datastore: The class
"org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only"
so does no t have its own datastore table.
13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
Class : org.apache.hadoop.hive.metastore.model.MStorageDescriptor [Tabl
e : `SDS`, InheritanceStrategy : new-table]
13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
Class : org.apache.hadoop.hive.metastore.model.MTable [Table : `TBLS`,
InheritanceStrategy : new-table]
13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
Field : org.apache.hadoop.hive.metastore.model.MSerDeInfo.parameters [T
able : `SERDE_PARAMS`]
13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
Field : org.apache.hadoop.hive.metastore.model.MStringList.internalList
[Table : `SKEWED_STRING_LIST_VALUES`]
13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
Field : org.apache.hadoop.hive.metastore.model.MTable.parameters [Table
: `TABLE_PARAMS`]
13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
Field : org.apache.hadoop.hive.metastore.model.MTable.partitionKeys [Ta
ble : `PARTITION_KEYS`]
13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
Field : org.apache.hadoop.hive.metastore.model.MStorageDescriptor.bucke
tCols [Table : `BUCKETING_COLS`]
13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
Field : org.apache.hadoop.hive.metastore.model.MStorageDescriptor.param
eters [Table : `SD_PARAMS`]
13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
Field : org.apache.hadoop.hive.metastore.model.MStorageDescriptor.skewe
dColNames [Table : `SKEWED_COL_NAMES`]
13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
Field : org.apache.hadoop.hive.metastore.model.MStorageDescriptor.skewe
dColValueLocationMaps [Table : `SKEWED_COL_VALUE_LOC_MAP`]
13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
Field : org.apache.hadoop.hive.metastore.model.MStorageDescriptor.skewe
dColValues [Table : `SKEWED_VALUES`]
13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
Field : org.apache.hadoop.hive.metastore.model.MStorageDescriptor.sortC
ols [Table : `SORT_COLS`]
13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
Field : org.apache.hadoop.hive.metastore.model.MColumnDescriptor.cols [
Table : `COLUMNS_V2`]
13/03/11 10:02:20 INFO metastore.HiveMetaStore: 1: get_table : db=mpos_gb
tbl=gb_xdr
13/03/11 10:02:20 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr
cmd=get_table : db=mpos_gb tbl=gb_xdr
13/03/11 10:02:20 INFO metastore.HiveMetaStore: 1: Opening raw store with
implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
13/03/11 10:02:20 INFO metastore.ObjectStore: ObjectStore, initialize called
13/03/11 10:02:20 INFO metastore.ObjectStore: Initialized ObjectStore
13/03/11 10:02:20 INFO DataNucleus.MetaData: Listener found initialisation
for persistable class org.apache.hadoop.hive.metastore.model.MColumn
Descriptor
13/03/11 10:02:20 INFO DataNucleus.MetaData: Listener found initialisation
for persistable class org.apache.hadoop.hive.metastore.model.MSerDeI
nfo
13/03/11 10:02:20 INFO DataNucleus.MetaData: Listener found initialisation
for persistable class org.apache.hadoop.hive.metastore.model.MStorag
eDescriptor
13/03/11 10:02:20 INFO DataNucleus.MetaData: Listener found initialisation
for persistable class org.apache.hadoop.hive.metastore.model.MTable
13/03/11 10:02:20 INFO DataNucleus.MetaData: Listener found initialisation
for persistable class org.apache.hadoop.hive.metastore.model.MFieldS
chema
13/03/11 10:02:20 INFO metastore.HiveMetaStore: 1: get_config_value:
name=hive.exec.default.partition.name defaultValue=__HIVE_DEFAULT_PARTITIO
N__
13/03/11 10:02:20 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr
cmd=get_config_value: name=hive.exec.default.partition.name def
aultValue=__HIVE_DEFAULT_PARTITION__
13/03/11 10:02:20 INFO metastore.HiveMetaStore: 1: get_fields:
db=mpos_gbtbl=gb_xdr
13/03/11 10:02:20 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr
cmd=get_fields: db=mpos_gbtbl=gb_xdr
13/03/11 10:02:20 INFO metastore.HiveMetaStore: 1: get_table : db=mpos_gb
tbl=gb_xdr
13/03/11 10:02:20 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr
cmd=get_table : db=mpos_gb tbl=gb_xdr
13/03/11 10:02:21 INFO metastore.HiveMetaStore: 1: get_partitions :
db=mpos_gb tbl=gb_xdr
13/03/11 10:02:21 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr
cmd=get_partitions : db=mpos_gb tbl=gb_xdr
13/03/11 10:02:21 INFO DataNucleus.Datastore: The class
"org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
"embedded-only" so d oes not have its own datastore table.
13/03/11 10:02:21 INFO DataNucleus.Datastore: The class
"org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only"
so does no t have its own datastore table.
13/03/11 10:02:21 INFO DataNucleus.Persistence: Managing Persistence of
Class : org.apache.hadoop.hive.metastore.model.MPartition [Table : `PAR
TITIONS`, InheritanceStrategy : new-table]
13/03/11 10:02:21 INFO DataNucleus.Persistence: Managing Persistence of
Field : org.apache.hadoop.hive.metastore.model.MPartition.parameters [T
able : `PARTITION_PARAMS`]
13/03/11 10:02:21 INFO DataNucleus.Persistence: Managing Persistence of
Field : org.apache.hadoop.hive.metastore.model.MPartition.values [Table
: `PARTITION_KEY_VALS`]
13/03/11 10:02:21 INFO analysis.AggregateInfo: agg info:
AggregateInfo{grouping_exprs=, aggregate_exprs=(AggregateExpr{op=COUNT,
isStar=true, isDistinct=false, }), agg_tuple=TupleDescriptor{id=1, tbl=
null, byte_size=0, is_materialized=true, slots=[SlotDescriptor{id=0,
col=null, type=BIGINT, materialized=false, byteSize=0, byteOffset=-1, null
IndicatorByte=0, nullIndicatorBit=0, slotIdx=0}]},
smap=smap(AggregateExpr{op=COUNT, isStar=true, isDistinct=false,
}:SlotRef{tblName=null, col =null, id=0})}
mergeAggInfo:
AggregateInfo{grouping_exprs=, aggregate_exprs=(AggregateExpr{op=SUM,
isStar=false, isDistinct=false, (SlotRef{tblName=null, col=null, id=0})})
, agg_tuple=TupleDescriptor{id=1, tbl=null, byte_size=0,
is_materialized=true, slots=[SlotDescriptor{id=0, col=null, type=BIGINT,
materialized= false, byteSize=0, byteOffset=-1,
nullIndicatorByte=0, nullIndicatorBit=0, slotIdx=0}]},
smap=smap(AggregateExpr{op=COUNT, isStar=true, isDisti
nct=false, }:SlotRef{tblName=null, col=null, id=0})}
13/03/11 10:02:21 INFO service.Frontend: create plan
13/03/11 10:02:21 INFO planner.Planner: create single-node plan
13/03/11 10:02:21 INFO planner.Planner: create plan fragments
13/03/11 10:02:22 INFO planner.Planner: finalize plan fragments
13/03/11 10:02:22 INFO planner.HdfsScanNode: collecting partitions for table
gb_xdr
13/03/11 10:02:22 INFO service.Frontend: get scan range locations
13/03/11 10:02:26 INFO catalog.HdfsTable: loaded partiton
PartitionBlockMetadata{#blocks=864, #filenames=118, totalStringLen=11266}
13/03/11 10:02:27 INFO hdfs.BlockStorageLocationUtil: Failed to connect to
datanode 192.168.1.126:50020
13/03/11 10:02:27 INFO catalog.HdfsTable: loaded disk ids for
PartitionBlockMetadata{#blocks=864, #filenames=118, totalStringLen=11266}
13/03/11 10:02:27 INFO catalog.HdfsTable: block metadata cache:
CacheStats{hitCount=0, missCount=1, loadSuccessCount=1,
loadExceptionCount=0, t otalLoadTime=5080514950,
evictionCount=0}
13/03/11 10:02:27 INFO service.Frontend: create result set metadata
13/03/11 10:02:27 INFO service.JniFrontend: Plan Fragment 0
UNPARTITIONED
AGGREGATE
OUTPUT: SUM(<slot 0>)
GROUP BY:
TUPLE IDS: 1
EXCHANGE (2)
TUPLE IDS: 1

Plan Fragment 1
RANDOM
STREAM DATA SINK
EXCHANGE ID: 2
UNPARTITIONED

AGGREGATE
OUTPUT: COUNT(*)
GROUP BY:
TUPLE IDS: 1
SCAN HDFS table=mpos_gb.gb_xdr #partitions=1 size=50.50GB (0)
TUPLE IDS: 0

13/03/12 10:13:43 INFO service.Frontend: analyze query select * from
mpos_gb.gb_xdr where Start_Date_Time > '2012-09-05 13:00:00.000000' and
Start_Date_Time < '2012-09-05 15:30:10.000000' limit 10
13/03/12 10:13:43 INFO service.Frontend: create plan
13/03/12 10:13:43 INFO planner.Planner: create single-node plan
13/03/12 10:13:43 INFO planner.Planner: create plan fragments
13/03/12 10:13:43 INFO planner.Planner: finalize plan fragments
13/03/12 10:13:43 INFO planner.HdfsScanNode: collecting partitions for table
gb_xdr
13/03/12 10:13:43 INFO service.Frontend: get scan range locations
13/03/12 10:13:43 INFO catalog.HdfsTable: loaded partiton
PartitionBlockMetadata{#blocks=864, #filenames=118, totalStringLen=11210}
13/03/12 10:13:43 INFO catalog.HdfsTable: loaded disk ids for
PartitionBlockMetadata{#blocks=864, #filenames=118, totalStringLen=11210}
13/03/12 10:13:43 INFO catalog.HdfsTable: block metadata cache:
CacheStats{hitCount=0, missCount=2, loadSuccessCount=2,
loadExceptionCount=0, totalLoadTime=5617207689, evictionCount=1}
13/03/12 10:13:43 INFO service.Frontend: create result set metadata
13/03/12 10:13:43 INFO service.JniFrontend: Plan Fragment 0
UNPARTITIONED
EXCHANGE (1)
LIMIT: 10
TUPLE IDS: 0

Plan Fragment 1
RANDOM
STREAM DATA SINK
EXCHANGE ID: 1
UNPARTITIONED

SCAN HDFS table=mpos_gb.gb_xdr #partitions=1 size=50.50GB (0)
PREDICATES: Start_Date_Time > '2012-09-05 13:00:00.000000',
Start_Date_Time < '2012-09-05 15:30:10.000000'
LIMIT: 10
TUPLE IDS: 0
---------------------------------------------------------------------------------------------------------------------------------

And output from impala-shell:
[cloudera-host2:21000] > select count(*) from mpos_gb.gb_xdr;
Query: select count(*) from mpos_gb.gb_xdr
Query finished, fetching results ...
152193112
Returned 1 row(s) in 468.47s
[cloudera-host2:21000] >

[cloudera-host2:21000] > select * from mpos_gb.gb_xdr where Start_Date_Time
'2012-09-05 13:00:00.000000' and Start_Date_Time < '2012-09-05
15:30:10.000000' limit 10;
Query: select * from mpos_gb.gb_xdr where Start_Date_Time > '2012-09-05
13:00:00.000000' and Start_Date_Time < '2012-09-05 15:30:10.000000' limit 10
Query finished, fetching results ...

......
101509376 20906357 2012-09-05 13:00:46.841000 2012-09-05
13:00:47.202000 2796454154 SGSN206 52788746 JMSM8B2 9858
218821 3 460020502304239 2 20 NULL
NULL 2 1 361 1142202122 2885681162 48154 80
00 NULL NULL 0 1 NULL NULL
NULL NULL NULL NULL NULL 1 361 NULL NULL NULL
NULLNULL NULL 1 361 NULL NULL NULL NULL
1 6 NULL NULL NULL NULL 32
32 01 32 32 0 1 NULL NULL 1 NULL
NULL NULL 0 NULL NULL NULL 2 NULL 0
NULLNULL NULL 0 0 0 NULL NULL
NULL NULL NULL NULL NULL NULL
......

Returned 10 row(s) in 45.37s
[cloudera-host2:21000] >

PS: When I had finished my statement, press the enter key, the output "Query
finished, fetching results ..." appears immediately, But the results
returned very slow.
----------------------------------------------------

Thanks!

在 2013年3月12日星期二UTC上午1时04分18秒,Marcel Kornacker写道:
On Mon, Mar 11, 2013 at 1:13 AM, zjp... wrote:
Hi, All!
I have installed impala(version of impala is 0.6, CDH is 4.2) without
Cloudera Manager followed instructions on

https://ccp.cloudera.com/display/IMPALA10BETADOC/Installing+and+Using+Cloudera+Impala.
And my cluster consists of three data nodes and one name node, And each
one
machine hardware is: RAM: 8GB, CPU: 2 cores, 4 threads, Disk: 1TB.
OS is CentOS6.3 64bit.

I have installed impala on eache data node, and run process of
statestore on
one data node, Following is how I start statestored and impalad daemons:
GLOG_v=1 nohup statestored -state_store_port=24000 & #note: run on
cloudera-host3
GLOG_v=1 nohup impalad -state_store_host=cloudera-host3
-nn=cloudera-host1
-nn_port=8020 -hostname=cloudera-host2 -ipaddress=192.168.1.nn & #note:
nn
is part of ip address of datanode.

I have created a database and a table, And have loaded about 150 million
records(size of data is about 52GB, it is about 1.5 hours of data in my
business) into the table with Hive. When i run a query as "select
count(*)
from dbname.tbname" in Hive, The query takes approx 600~689 seconds to
come
back, When i run the same query in impala-shell, It takes approx 230~380
seconds to come back. And i run query as "select * from dbname.tbname
where
string_column>'xxxx' and string_column<'yyyy' limit 10", The Hive takes
approx 28~35 seconds, and the impala-shell takes approx 44~50 seconds,
Are
there expected? And i known that the same situation in Infobright data
warehouse application, The queries previous mentioned takes only about
10
seconds!

Who can give me some advises that whether have problems?

Thanks!
Could you post the exact statements you use to create and populate the
table, the SELECT statements, and the query profile from the Impala
INFO log?

Marcel

Search Discussions

  • Ricky Saltzer at Mar 14, 2013 at 2:37 pm
    If you have trouble finding the query profile in the logs, you can also
    access it via the daemon's web interface.

    for example: http://192.168.1.126:25000/queries

    There will be a list of recently executed queries below, find the
    problematic query and then click the "profile" button. The next page will
    contain the entire query profile (text).

    Hope this helps
    Ricky

    On Thu, Mar 14, 2013 at 10:30 AM, Marcel Kornacker wrote:

    Could you also post the runtime profiles (also found in the logs) for
    both queries?
    On Mon, Mar 11, 2013 at 7:41 PM, zjp... wrote:
    Thanks Marcel !

    My create table statement is:
    CREATE TABLE gb_xdr( Online_ID bigint, Session_ID string, Start_Date_Time
    string, End_Date_Time string, SGSN_IP bigint, SGSN string, BSC_IP bigint,
    BSC string, LAC int, CI int, RAC int, BVCI int, IMSI bigint, IMEI string,
    APN string, Protocol string, Event_Type int, IP_LEN_UL bigint, IP_LEN_DL
    bigint, Count_Packet_UL int, Count_Packet_DL int, Duration int, SOURCE_IP
    bigint, DEST_IP bigint, SOURCE_PORT int, DEST_PORT int, URI string, URI_Main
    string, Content_Type_Part1 string, Content_Type_Part2 string,
    User_Agent_Main string, Service_Type int, Sub_Service_Type int,
    De_PDP_Acc
    int, De_PDP_Acc_Direction string, Detach_Acc int, Detach_Acc_Direction
    string, Abnormal_reason int, Request int, Repeat int, Reject int,
    Reject_Cause int, Accept int, Accept_DelayFirst int, Complete int,
    Complete_DelayFirst int, ACK int, ACK_Delay bigint, Last_ACK_Delay bigint,
    BVCI_OLD int, BVCI_NEW int, LLC_frame_Disc int, OPP_LAC int, OPP_RAC int,
    Result int, Result_Delayfirst bigint, Disconnect int, Abort int,
    Query_Name
    string, Query_Type int, Query_OP int, Resp int, Resp_delayFirst bigint,
    Resp_Cause int, FromX string, ToX string, Subject string, MMS_Cause int,
    ErrorPacketWired int, ErrorPacketWireless int, GPRS_multislot_E int,
    EDGE_multislot_E int, Umts_128TDD_E int, Umts_FDD_E int,
    GPRS_multislot_1800
    int, EDGE_multislot_1800 int, Umts_128TDD_1800 int, Umts_FDD_1800 int, Reset
    int, Reset_Direction int, LLC_Frame_Count int, Abort_Reason_User int,
    Abort_Reason_Provider int, Accept_Repeat int, Abnormal_Type int, De_Cause
    int, RAU_Type int, RAU_Attr int, Event_Direction int, Prefix_IMEI string,
    IsReassemble int, IsPrivate_Service int, User_IP_LEN_UL bigint,
    User_IP_LEN_DL bigint, Bssgp_Status_Cause int, Weight_Rate_UL bigint,
    Weight_Rate_DL bigint, Weight_Rate_Total bigint, Brand string, Type string,
    ICMP_Type int, ICMP_Code int, Gob_Src_IP int, Gob_Dst_IP int, Gob_IP_LEN
    int, Gob_Src_PORT int, Gob_Dst_PORT int, Gob_UDP_LEN int) row format
    delimited fields terminated by '\t';


    The INFO logs dumped by impalad when i issue the SELECT query are:
    --------------------------------------------------------------------------------------------------------------------------
    ......
    13/03/10 22:56:21 INFO metastore.HiveMetaStore: 0: get_all_databases
    13/03/10 22:56:21 INFO HiveMetaStore.audit: ugi=root
    ip=unknown-ip-addr
    cmd=get_all_databases
    13/03/11 10:02:19 INFO service.Frontend: analyze query select count(*) from
    mpos_gb.gb_xdr
    13/03/11 10:02:19 INFO metastore.HiveMetaStore: 1: get_all_tables:
    db=mpos_gb
    13/03/11 10:02:19 INFO HiveMetaStore.audit: ugi=root
    ip=unknown-ip-addr
    cmd=get_all_tables: db=mpos_gb
    13/03/11 10:02:19 INFO metastore.HiveMetaStore: 1: Opening raw store with
    implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    13/03/11 10:02:19 INFO metastore.ObjectStore: ObjectStore, initialize called
    13/03/11 10:02:19 INFO metastore.ObjectStore: Initialized ObjectStore
    13/03/11 10:02:19 INFO DataNucleus.Datastore: The class
    "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
    "embedded-only" so d oes not have its own datastore table.
    13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.metastore.model.MColumnDescriptor [Table
    : `CDS`, InheritanceStrategy : new-table]
    13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.metastore.model.MSerDeInfo [Table : `SER
    DES`, InheritanceStrategy : new-table]
    13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.metastore.model.MStringList [Table : `SK
    EWED_STRING_LIST`, InheritanceStrategy : new-table]
    13/03/11 10:02:19 INFO DataNucleus.Datastore: The class
    "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
    "embedded-only"
    so does no t have its own datastore table.
    13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.metastore.model.MStorageDescriptor [Tabl
    e : `SDS`, InheritanceStrategy : new-table]
    13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.metastore.model.MTable [Table : `TBLS`,
    InheritanceStrategy : new-table]
    13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MSerDeInfo.parameters [T
    able : `SERDE_PARAMS`]
    13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MStringList.internalList
    [Table : `SKEWED_STRING_LIST_VALUES`]
    13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MTable.parameters [Table
    : `TABLE_PARAMS`]
    13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MTable.partitionKeys [Ta
    ble : `PARTITION_KEYS`]
    13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MStorageDescriptor.bucke
    tCols [Table : `BUCKETING_COLS`]
    13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MStorageDescriptor.param
    eters [Table : `SD_PARAMS`]
    13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MStorageDescriptor.skewe
    dColNames [Table : `SKEWED_COL_NAMES`]
    13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MStorageDescriptor.skewe
    dColValueLocationMaps [Table : `SKEWED_COL_VALUE_LOC_MAP`]
    13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MStorageDescriptor.skewe
    dColValues [Table : `SKEWED_VALUES`]
    13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MStorageDescriptor.sortC
    ols [Table : `SORT_COLS`]
    13/03/11 10:02:19 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MColumnDescriptor.cols [
    Table : `COLUMNS_V2`]
    13/03/11 10:02:20 INFO metastore.HiveMetaStore: 1: get_table : db=mpos_gb
    tbl=gb_xdr
    13/03/11 10:02:20 INFO HiveMetaStore.audit: ugi=root
    ip=unknown-ip-addr
    cmd=get_table : db=mpos_gb tbl=gb_xdr
    13/03/11 10:02:20 INFO metastore.HiveMetaStore: 1: Opening raw store with
    implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    13/03/11 10:02:20 INFO metastore.ObjectStore: ObjectStore, initialize called
    13/03/11 10:02:20 INFO metastore.ObjectStore: Initialized ObjectStore
    13/03/11 10:02:20 INFO DataNucleus.MetaData: Listener found
    initialisation
    for persistable class org.apache.hadoop.hive.metastore.model.MColumn
    Descriptor
    13/03/11 10:02:20 INFO DataNucleus.MetaData: Listener found
    initialisation
    for persistable class org.apache.hadoop.hive.metastore.model.MSerDeI
    nfo
    13/03/11 10:02:20 INFO DataNucleus.MetaData: Listener found
    initialisation
    for persistable class org.apache.hadoop.hive.metastore.model.MStorag
    eDescriptor
    13/03/11 10:02:20 INFO DataNucleus.MetaData: Listener found
    initialisation
    for persistable class org.apache.hadoop.hive.metastore.model.MTable
    13/03/11 10:02:20 INFO DataNucleus.MetaData: Listener found
    initialisation
    for persistable class org.apache.hadoop.hive.metastore.model.MFieldS
    chema
    13/03/11 10:02:20 INFO metastore.HiveMetaStore: 1: get_config_value:
    name=hive.exec.default.partition.namedefaultValue=__HIVE_DEFAULT_PARTITIO
    N__
    13/03/11 10:02:20 INFO HiveMetaStore.audit: ugi=root
    ip=unknown-ip-addr
    cmd=get_config_value: name=hive.exec.default.partition.name def
    aultValue=__HIVE_DEFAULT_PARTITION__
    13/03/11 10:02:20 INFO metastore.HiveMetaStore: 1: get_fields:
    db=mpos_gbtbl=gb_xdr
    13/03/11 10:02:20 INFO HiveMetaStore.audit: ugi=root
    ip=unknown-ip-addr
    cmd=get_fields: db=mpos_gbtbl=gb_xdr
    13/03/11 10:02:20 INFO metastore.HiveMetaStore: 1: get_table : db=mpos_gb
    tbl=gb_xdr
    13/03/11 10:02:20 INFO HiveMetaStore.audit: ugi=root
    ip=unknown-ip-addr
    cmd=get_table : db=mpos_gb tbl=gb_xdr
    13/03/11 10:02:21 INFO metastore.HiveMetaStore: 1: get_partitions :
    db=mpos_gb tbl=gb_xdr
    13/03/11 10:02:21 INFO HiveMetaStore.audit: ugi=root
    ip=unknown-ip-addr
    cmd=get_partitions : db=mpos_gb tbl=gb_xdr
    13/03/11 10:02:21 INFO DataNucleus.Datastore: The class
    "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
    "embedded-only" so d oes not have its own datastore table.
    13/03/11 10:02:21 INFO DataNucleus.Datastore: The class
    "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
    "embedded-only"
    so does no t have its own datastore table.
    13/03/11 10:02:21 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.metastore.model.MPartition [Table : `PAR
    TITIONS`, InheritanceStrategy : new-table]
    13/03/11 10:02:21 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MPartition.parameters [T
    able : `PARTITION_PARAMS`]
    13/03/11 10:02:21 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MPartition.values [Table
    : `PARTITION_KEY_VALS`]
    13/03/11 10:02:21 INFO analysis.AggregateInfo: agg info:
    AggregateInfo{grouping_exprs=, aggregate_exprs=(AggregateExpr{op=COUNT,
    isStar=true, isDistinct=false, }), agg_tuple=TupleDescriptor{id=1, tbl=
    null, byte_size=0, is_materialized=true, slots=[SlotDescriptor{id=0,
    col=null, type=BIGINT, materialized=false, byteSize=0, byteOffset=-1, null
    IndicatorByte=0, nullIndicatorBit=0, slotIdx=0}]},
    smap=smap(AggregateExpr{op=COUNT, isStar=true, isDistinct=false,
    }:SlotRef{tblName=null, col =null, id=0})}
    mergeAggInfo:
    AggregateInfo{grouping_exprs=, aggregate_exprs=(AggregateExpr{op=SUM,
    isStar=false, isDistinct=false, (SlotRef{tblName=null, col=null, id=0})})
    , agg_tuple=TupleDescriptor{id=1, tbl=null, byte_size=0,
    is_materialized=true, slots=[SlotDescriptor{id=0, col=null, type=BIGINT,
    materialized= false, byteSize=0, byteOffset=-1,
    nullIndicatorByte=0, nullIndicatorBit=0, slotIdx=0}]},
    smap=smap(AggregateExpr{op=COUNT, isStar=true, isDisti
    nct=false, }:SlotRef{tblName=null, col=null, id=0})}
    13/03/11 10:02:21 INFO service.Frontend: create plan
    13/03/11 10:02:21 INFO planner.Planner: create single-node plan
    13/03/11 10:02:21 INFO planner.Planner: create plan fragments
    13/03/11 10:02:22 INFO planner.Planner: finalize plan fragments
    13/03/11 10:02:22 INFO planner.HdfsScanNode: collecting partitions for table
    gb_xdr
    13/03/11 10:02:22 INFO service.Frontend: get scan range locations
    13/03/11 10:02:26 INFO catalog.HdfsTable: loaded partiton
    PartitionBlockMetadata{#blocks=864, #filenames=118, totalStringLen=11266}
    13/03/11 10:02:27 INFO hdfs.BlockStorageLocationUtil: Failed to connect to
    datanode 192.168.1.126:50020
    13/03/11 10:02:27 INFO catalog.HdfsTable: loaded disk ids for
    PartitionBlockMetadata{#blocks=864, #filenames=118, totalStringLen=11266}
    13/03/11 10:02:27 INFO catalog.HdfsTable: block metadata cache:
    CacheStats{hitCount=0, missCount=1, loadSuccessCount=1,
    loadExceptionCount=0, t otalLoadTime=5080514950,
    evictionCount=0}
    13/03/11 10:02:27 INFO service.Frontend: create result set metadata
    13/03/11 10:02:27 INFO service.JniFrontend: Plan Fragment 0
    UNPARTITIONED
    AGGREGATE
    OUTPUT: SUM(<slot 0>)
    GROUP BY:
    TUPLE IDS: 1
    EXCHANGE (2)
    TUPLE IDS: 1

    Plan Fragment 1
    RANDOM
    STREAM DATA SINK
    EXCHANGE ID: 2
    UNPARTITIONED

    AGGREGATE
    OUTPUT: COUNT(*)
    GROUP BY:
    TUPLE IDS: 1
    SCAN HDFS table=mpos_gb.gb_xdr #partitions=1 size=50.50GB (0)
    TUPLE IDS: 0

    13/03/12 10:13:43 INFO service.Frontend: analyze query select * from
    mpos_gb.gb_xdr where Start_Date_Time > '2012-09-05 13:00:00.000000' and
    Start_Date_Time < '2012-09-05 15:30:10.000000' limit 10
    13/03/12 10:13:43 INFO service.Frontend: create plan
    13/03/12 10:13:43 INFO planner.Planner: create single-node plan
    13/03/12 10:13:43 INFO planner.Planner: create plan fragments
    13/03/12 10:13:43 INFO planner.Planner: finalize plan fragments
    13/03/12 10:13:43 INFO planner.HdfsScanNode: collecting partitions for table
    gb_xdr
    13/03/12 10:13:43 INFO service.Frontend: get scan range locations
    13/03/12 10:13:43 INFO catalog.HdfsTable: loaded partiton
    PartitionBlockMetadata{#blocks=864, #filenames=118, totalStringLen=11210}
    13/03/12 10:13:43 INFO catalog.HdfsTable: loaded disk ids for
    PartitionBlockMetadata{#blocks=864, #filenames=118, totalStringLen=11210}
    13/03/12 10:13:43 INFO catalog.HdfsTable: block metadata cache:
    CacheStats{hitCount=0, missCount=2, loadSuccessCount=2,
    loadExceptionCount=0, totalLoadTime=5617207689, evictionCount=1}
    13/03/12 10:13:43 INFO service.Frontend: create result set metadata
    13/03/12 10:13:43 INFO service.JniFrontend: Plan Fragment 0
    UNPARTITIONED
    EXCHANGE (1)
    LIMIT: 10
    TUPLE IDS: 0

    Plan Fragment 1
    RANDOM
    STREAM DATA SINK
    EXCHANGE ID: 1
    UNPARTITIONED

    SCAN HDFS table=mpos_gb.gb_xdr #partitions=1 size=50.50GB (0)
    PREDICATES: Start_Date_Time > '2012-09-05 13:00:00.000000',
    Start_Date_Time < '2012-09-05 15:30:10.000000'
    LIMIT: 10
    TUPLE IDS: 0
    ---------------------------------------------------------------------------------------------------------------------------------
    And output from impala-shell:
    [cloudera-host2:21000] > select count(*) from mpos_gb.gb_xdr;
    Query: select count(*) from mpos_gb.gb_xdr
    Query finished, fetching results ...
    152193112
    Returned 1 row(s) in 468.47s
    [cloudera-host2:21000] >

    [cloudera-host2:21000] > select * from mpos_gb.gb_xdr where
    Start_Date_Time
    '2012-09-05 13:00:00.000000' and Start_Date_Time < '2012-09-05
    15:30:10.000000' limit 10;
    Query: select * from mpos_gb.gb_xdr where Start_Date_Time > '2012-09-05
    13:00:00.000000' and Start_Date_Time < '2012-09-05 15:30:10.000000' limit 10
    Query finished, fetching results ...

    ......
    101509376 20906357 2012-09-05 13:00:46.841000
    2012-09-05
    13:00:47.202000 2796454154 SGSN206 52788746 JMSM8B2 9858
    218821 3 460020502304239 2 20 NULL
    NULL 2 1 361 1142202122 2885681162 48154 80
    00 NULL NULL 0 1 NULL NULL
    NULL NULL NULL NULL NULL 1 361 NULL NULL NULL
    NULLNULL NULL 1 361 NULL NULL NULL NULL
    1 6 NULL NULL NULL NULL 32
    32 01 32 32 0 1 NULL NULL 1 NULL
    NULL NULL 0 NULL NULL NULL 2 NULL 0
    NULLNULL NULL 0 0 0 NULL NULL
    NULL NULL NULL NULL NULL NULL
    ......

    Returned 10 row(s) in 45.37s
    [cloudera-host2:21000] >

    PS: When I had finished my statement, press the enter key, the output "Query
    finished, fetching results ..." appears immediately, But the results
    returned very slow.
    ----------------------------------------------------

    Thanks!

    在 2013年3月12日星期二UTC上午1时04分18秒,Marcel Kornacker写道:
    On Mon, Mar 11, 2013 at 1:13 AM, zjp... wrote:
    Hi, All!
    I have installed impala(version of impala is 0.6, CDH is 4.2) without
    Cloudera Manager followed instructions on
    https://ccp.cloudera.com/display/IMPALA10BETADOC/Installing+and+Using+Cloudera+Impala
    .
    And my cluster consists of three data nodes and one name node, And
    each
    one
    machine hardware is: RAM: 8GB, CPU: 2 cores, 4 threads, Disk: 1TB.
    OS is CentOS6.3 64bit.

    I have installed impala on eache data node, and run process of
    statestore on
    one data node, Following is how I start statestored and impalad
    daemons:
    GLOG_v=1 nohup statestored -state_store_port=24000 & #note: run
    on
    cloudera-host3
    GLOG_v=1 nohup impalad -state_store_host=cloudera-host3
    -nn=cloudera-host1
    -nn_port=8020 -hostname=cloudera-host2 -ipaddress=192.168.1.nn &
    #note:
    nn
    is part of ip address of datanode.

    I have created a database and a table, And have loaded about 150
    million
    records(size of data is about 52GB, it is about 1.5 hours of data in
    my
    business) into the table with Hive. When i run a query as "select
    count(*)
    from dbname.tbname" in Hive, The query takes approx 600~689 seconds to
    come
    back, When i run the same query in impala-shell, It takes approx
    230~380
    seconds to come back. And i run query as "select * from dbname.tbname
    where
    string_column>'xxxx' and string_column<'yyyy' limit 10", The Hive
    takes
    approx 28~35 seconds, and the impala-shell takes approx 44~50 seconds,
    Are
    there expected? And i known that the same situation in Infobright data
    warehouse application, The queries previous mentioned takes only about
    10
    seconds!

    Who can give me some advises that whether have problems?

    Thanks!
    Could you post the exact statements you use to create and populate the
    table, the SELECT statements, and the query profile from the Impala
    INFO log?

    Marcel


    --
    Ricky Saltzer
    Customer Operations Engineer
    http://www.cloudera.com
  • Alan Choi at Mar 19, 2013 at 5:18 pm
    Hi,

    For the first query, it would of course depends on the where cause
    condition, isn't it?
    For the second query, it might due to buffer cache.

    We have a few known issues in RC file and it will be fixed in the next beta
    release. Please give it a shot when the next one is out.

    For the HBase error, can you share the log with us and table schema, sample
    data?

    Thanks,
    Alan



    On Tue, Mar 19, 2013 at 2:41 AM, zjp... wrote:

    Hi, All! Thanks for your responds!

    The running for query of "select * from tablename where ... limit 10",
    take different times according to the where cause condition. Here is a
    result shown:
    Query:
    select * from mpos_gb.gb_xdr where start_date_time >= '2012-09-05
    12:34:55.908000' and start_date_time <= '2012-09-05 16:54:55.000000' limit
    10
    Taken times:
    30 seconds
    Profile:
    Query (id=10011a282ff34109:b3cc3421d9e18bde):

    - PlanningTime: 70ms
    Query 10011a282ff34109:b3cc3421d9e18bde:(1m24s 0.00%)
    Aggregate Profile:
    Coordinator Fragment:(1m19s 0.00%)
    - RowsProduced: 10
    CodeGen:
    - CodegenTime: 0K clock cycles
    - CompileTime: 110ms
    - LoadTime: 8ms
    - ModuleFileSize: 44.61 KB
    EXCHANGE_NODE (id=1):(1m19s 100.00%)
    - BytesReceived: 9.37 KB
    - ConvertRowBatchTime: 10K clock cycles
    - DeserializeRowBatchTimer: 47K clock cycles
    - MemoryUsed: 0.00
    - RowsReturned: 10
    - RowsReturnedRate: 0
    Averaged Fragment 1:(1m17s 0.00%)
    completion times: min:28s962ms max:29s360ms mean: 29s102ms stddev:182ms
    execution rates: min:587.05 MB/sec max:595.38 MB/sec mean:592.30 MB/sec stddev:3.73 MB/sec
    split sizes: min: 16.81 GB, max: 16.85 GB, avg: 16.83 GB, stddev: 16.61 MB
    - RowsProduced: 3
    CodeGen:
    - CodegenTime: 161ms
    - CompileTime: 2s686ms
    - LoadTime: 15ms
    - ModuleFileSize: 42.09 KB
    DataStreamSender:
    - BytesSent: 3.12 KB
    - DataSinkTime: 13ms
    - SerializeBatchTime: 8ms
    - ThriftTransmitTime: 9ms
    HDFS_SCAN_NODE (id=0):(1m17s 100.00%)
    - BytesRead: 2.18 GB
    - DelimiterParseTime: 31s777ms
    - MaterializeTupleTime: 1s076ms
    - MemoryUsed: 0.00
    - PerDiskReadThroughput: 41.55 MB/sec
    - RowsReturned: 3
    - RowsReturnedRate: 0
    - ScanRangesComplete: 35
    - ScannerThreadsReadTime: 1m14s
    - TotalReadThroughput: 73.28 MB/sec
    Fragment 1:
    Instance 10011a282ff34109:b3cc3421d9e18be0:(1m20s 0.00%)
    Hdfs split stats (**:<# splits>/**): 0:290/18.10B
    - RowsProduced: 0
    CodeGen:
    - CodegenTime: 154ms
    - CompileTime: 2s667ms
    - LoadTime: 13ms
    - ModuleFileSize: 44.61 KB
    DataStreamSender:
    - BytesSent: 0.00
    - DataSinkTime: 0K clock cycles
    - SerializeBatchTime: 0K clock cycles
    - ThriftTransmitTime: 0K clock cycles
    HDFS_SCAN_NODE (id=0):(1m20s 100.00%)
    - BytesRead: 1.87 GB
    - DelimiterParseTime: 18s192ms
    - MaterializeTupleTime: 620ms
    - MemoryUsed: 0.00
    - PerDiskReadThroughput: 24.64 MB/sec
    - RowsReturned: 0
    - RowsReturnedRate: 0
    - ScanRangesComplete: 31
    - ScannerThreadsReadTime: 1m17s
    - TotalReadThroughput: 63.02 MB/sec
    Instance 10011a282ff34109:b3cc3421d9e18be1:(1m10s 0.00%)
    Hdfs split stats (**:<# splits>/**): 0:287/18.05B
    - RowsProduced: 10
    CodeGen:
    - CodegenTime: 179ms
    - CompileTime: 2s783ms
    - LoadTime: 12ms
    - ModuleFileSize: 37.04 KB
    DataStreamSender:
    - BytesSent: 9.37 KB
    - DataSinkTime: 41ms
    - SerializeBatchTime: 25ms
    - ThriftTransmitTime: 27ms
    HDFS_SCAN_NODE (id=0):(1m10s 100.00%)
    File Formats: TEXT/NONE:44
    - BytesRead: 2.76 GB
    - DelimiterParseTime: 59s413ms
    - MaterializeTupleTime: 1s989ms
    - MemoryUsed: 0.00
    - PerDiskReadThroughput: 75.25 MB/sec
    - RowsReturned: 10
    - RowsReturnedRate: 0
    - ScanRangesComplete: 44
    - ScannerThreadsReadTime: 1m6s
    - TotalReadThroughput: 93.45 MB/sec
    Instance 10011a282ff34109:b3cc3421d9e18be2:(1m21s 0.00%)
    Hdfs split stats (**:<# splits>/**): 0:287/18.07B
    - RowsProduced: 0
    CodeGen:
    - CodegenTime: 148ms
    - CompileTime: 2s608ms
    - LoadTime: 21ms
    - ModuleFileSize: 44.61 KB
    DataStreamSender:
    - BytesSent: 0.00
    - DataSinkTime: 0K clock cycles
    - SerializeBatchTime: 0K clock cycles
    - ThriftTransmitTime: 0K clock cycles
    HDFS_SCAN_NODE (id=0):(1m21s 100.00%)
    - BytesRead: 1.90 GB
    - DelimiterParseTime: 17s727ms
    - MaterializeTupleTime: 618ms
    - MemoryUsed: 0.00
    - PerDiskReadThroughput: 24.75 MB/sec
    - RowsReturned: 0
    - RowsReturnedRate: 0
    - ScanRangesComplete: 30
    - ScannerThreadsReadTime: 1m18s
    - TotalReadThroughput: 63.39 MB/sec************


    -----------------------------------------------------------------------------------------------------------------------
    The running for query of "select count(*) from tablename", take different
    times each time. Here is a result shown:
    Query:
    select count(*) from mpos_gb.gb_xdr
    Taken times:
    268 seconds
    Profile:

    Query (id=ed40fdac0efa42ec:86f84f79a1a16f15):
    - PlanningTime: 4s666ms
    Query ed40fdac0efa42ec:86f84f79a1a16f15:(376ms 0.00%)
    Aggregate Profile:
    Coordinator Fragment:(12m19s 0.00%)
    - RowsProduced: 1
    CodeGen:
    - CodegenTime: 856K clock cycles
    - CompileTime: 111ms
    - LoadTime: 9ms
    - ModuleFileSize: 44.61 KB
    AGGREGATION_NODE (id=3):(12m19s 0.00%)
    - BuildBuckets: 1.02K
    - BuildTime: 15K clock cycles
    - GetResultsTime: 14K clock cycles
    - MemoryUsed: 32.01 KB
    - RowsReturned: 1
    - RowsReturnedRate: 0
    EXCHANGE_NODE (id=2):(12m19s 100.00%)
    - BytesReceived: 48.00 B
    - ConvertRowBatchTime: 2ms
    - DeserializeRowBatchTimer: 34K clock cycles
    - MemoryUsed: 0.00
    - RowsReturned: 3
    - RowsReturnedRate: 0
    Averaged Fragment 1:(10m38s 0.00%)
    completion times: min:3m max:4m26s mean: 3m57s stddev:40s257ms
    execution rates: min:64.79 MB/sec max:95.39 MB/sec mean:75.04 MB/sec stddev:14.39 MB/sec
    split sizes: min: 16.81 GB, max: 16.85 GB, avg: 16.83 GB, stddev: 16.61 MB
    - RowsProduced: 1
    CodeGen:
    - CodegenTime: 2ms
    - CompileTime: 128ms
    - LoadTime: 14ms
    - ModuleFileSize: 42.09 KB
    DataStreamSender:
    - BytesSent: 16.00 B
    - DataSinkTime: 41ms
    - SerializeBatchTime: 32ms
    - ThriftTransmitTime: 450ms
    AGGREGATION_NODE (id=1):(10m38s 0.11%)
    - BuildBuckets: 1.02K
    - BuildTime: 551ms
    - GetResultsTime: 12K clock cycles
    - MemoryUsed: 32.01 KB
    - RowsReturned: 1
    - RowsReturnedRate: 0
    HDFS_SCAN_NODE (id=0):(10m38s 99.89%)
    - BytesRead: 16.83 GB
    - DelimiterParseTime: 2m8s
    - MaterializeTupleTime: 22ms
    - MemoryUsed: 0.00
    - PerDiskReadThroughput: 40.15 MB/sec
    - RowsReturned: 50.73M
    - RowsReturnedRate: 114.36 K/sec
    - ScanRangesComplete: 288
    - ScannerThreadsReadTime: 10m16s
    - TotalReadThroughput: 75.10 MB/sec
    Fragment 1:
    Instance ed40fdac0efa42ec:86f84f79a1a16f17:(12m19s 0.00%)
    Hdfs split stats (**:<# splits>/**): 0:290/18.10B
    - RowsProduced: 1
    CodeGen:
    - CodegenTime: 1ms
    - CompileTime: 117ms
    - LoadTime: 10ms
    - ModuleFileSize: 44.61 KB
    DataStreamSender:
    - BytesSent: 16.00 B
    - DataSinkTime: 15ms
    - SerializeBatchTime: 13ms
    - ThriftTransmitTime: 15ms
    AGGREGATION_NODE (id=1):(12m19s 0.11%)
    - BuildBuckets: 1.02K
    - BuildTime: 653ms
    - GetResultsTime: 11K clock cycles
    - MemoryUsed: 32.01 KB
    - RowsReturned: 1
    - RowsReturnedRate: 0
    HDFS_SCAN_NODE (id=0):(12m19s 99.89%)
    File Formats: TEXT/NONE:290
    - BytesRead: 16.85 GB
    - DelimiterParseTime: 1m5s
    - MaterializeTupleTime: 24ms
    - MemoryUsed: 0.00
    - PerDiskReadThroughput: 24.25 MB/sec
    - RowsReturned: 50.80M
    - RowsReturnedRate: 68.73 K/sec
    - ScanRangesComplete: 290
    - ScannerThreadsReadTime: 11m51s
    - TotalReadThroughput: 64.73 MB/sec
    Instance ed40fdac0efa42ec:86f84f79a1a16f18:(7m20s 0.00%)
    Hdfs split stats (**:<# splits>/**): 0:287/18.05B
    - RowsProduced: 1
    CodeGen:
    - CodegenTime: 1ms
    - CompileTime: 142ms
    - LoadTime: 12ms
    - ModuleFileSize: 37.04 KB
    DataStreamSender:
    - BytesSent: 16.00 B
    - DataSinkTime: 101ms
    - SerializeBatchTime: 80ms
    - ThriftTransmitTime: 353ms
    AGGREGATION_NODE (id=1):(7m20s 0.11%)
    - BuildBuckets: 1.02K
    - BuildTime: 347ms
    - GetResultsTime: 8K clock cycles
    - MemoryUsed: 32.01 KB
    - RowsReturned: 1
    - RowsReturnedRate: 0
    HDFS_SCAN_NODE (id=0):(7m19s 99.89%)
    File Formats: TEXT/NONE:287
    - BytesRead: 16.81 GB
    - DelimiterParseTime: 4m15s
    - MaterializeTupleTime: 19ms
    - MemoryUsed: 0.00
    - PerDiskReadThroughput: 72.01 MB/sec
    - RowsReturned: 50.68M
    - RowsReturnedRate: 205.39 K/sec
    - ScanRangesComplete: 287
    - ScannerThreadsReadTime: 7m6s
    - TotalReadThroughput: 95.56 MB/sec
    Instance ed40fdac0efa42ec:86f84f79a1a16f19:(12m16s 0.00%)
    Hdfs split stats (**:<# splits>/**): 0:287/18.07B
    - RowsProduced: 1
    CodeGen:
    - CodegenTime: 4ms
    - CompileTime: 126ms
    - LoadTime: 20ms
    - ModuleFileSize: 44.61 KB
    DataStreamSender:
    - BytesSent: 16.00 B
    - DataSinkTime: 6ms
    - SerializeBatchTime: 4ms
    - ThriftTransmitTime: 981ms
    AGGREGATION_NODE (id=1):(12m16s 0.11%)
    - BuildBuckets: 1.02K
    - BuildTime: 653ms
    - GetResultsTime: 17K clock cycles
    - MemoryUsed: 32.01 KB
    - RowsReturned: 1
    - RowsReturnedRate: 0
    HDFS_SCAN_NODE (id=0):(12m15s 99.89%)
    File Formats: TEXT/NONE:287
    - BytesRead: 16.83 GB
    - DelimiterParseTime: 1m4s
    - MaterializeTupleTime: 24ms
    - MemoryUsed: 0.00
    - PerDiskReadThroughput: 24.19 MB/sec
    - RowsReturned: 50.72M
    - RowsReturnedRate: 68.97 K/sec
    - ScanRangesComplete: 287
    - ScannerThreadsReadTime: 11m52s
    - TotalReadThroughput: 65.01 MB/sec************

    **------------------------------------------------------------**

    **
    **

    **By the way, When i try to use RCFILE format for data stored, the impalad crashed as long as i run query in impala-shell, So i want to use impala with HBase to improve efficiency, But i can't complete my query in impala-shell also, When i run a query on table data stored in HBase, It gave me message "Query aborted, unable to fetch data**" (I could run the same query in Hive with right results returned back), Why? Need your helps too! Thanks!
  • Zjp... at Mar 20, 2013 at 4:46 am
    Hi, Alan, Thank you!
    For the first query, when i run the query at first:
    select * from mpos_gb.gb_xdr where start_date_time >= '2012-09-05
    11:34:55.908000' and start_date_time <= '2012-09-05 16:54:55.000000' limit
    10
    it takes about 5 seconds, And when run query:
    select * from mpos_gb.gb_xdr where start_date_time >= '2012-09-05
    12:34:55.908000' and start_date_time <= '2012-09-05 16:54:55.000000' limit
    10
    it takes about 30 seconds. The different only is the first condition.

    For the second query, It take about 4 minutes at least, It is too long for
    me, Because it is only about 1.5 hours data, And in my business system,
    volume is about ~250millons/h(~250MB/s), And In the future will also
    increase.

    For the second query, when i run it, i found all of impalad processes
    running on data nodes get CPU usage and IO usage increased, So, I think the
    IO is the performance bottleneck, because of TEXTFILE format
    storage data row by row, And the types of query of "select count (*) from
    tbl","select max(col) from tbl" will be scanning the entire table. I said
    right?
    .
    For the HBase error, Here are the log, table schema, sample data:
    Impala-shell output:
    # impala-shell
    Welcome to the Impala shell. Press TAB twice to see a list of available
    commands.

    Copyright (c) 2012 Cloudera, Inc. All rights reserved.

    (Build version: Impala v0.6 (720f93c) built on Sat Feb 23 18:52:43 PST 2013)
    [Not connected] > connect cloudera-host2
    Connected to cloudera-host2:21000
    [cloudera-host2:21000] > select count(*) from gb_xdr_hb;
    Query: select count(*) from gb_xdr_hb
    Query aborted, unable to fetch data
    [cloudera-host2:21000] >
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Impalad output:
    13/03/20 12:13:55 INFO service.Frontend: analyze query select count(*) from
    gb_xdr_hb
    13/03/20 12:13:55 INFO metastore.HiveMetaStore: 1: get_all_tables:
    db=default
    13/03/20 12:13:55 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr
    cmd=get_all_tables: db=default
    13/03/20 12:13:55 INFO metastore.HiveMetaStore: 1: Opening raw store with
    implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    13/03/20 12:13:55 INFO metastore.ObjectStore: ObjectStore, initialize called
    13/03/20 12:13:55 INFO metastore.ObjectStore: Initialized ObjectStore
    13/03/20 12:13:55 INFO DataNucleus.Datastore: The class
    "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
    "embedded-only" so does not have its own datastore table.
    13/03/20 12:13:55 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.metastore.model.MColumnDescriptor [Table :
    `CDS`, InheritanceStrategy : new-table]
    13/03/20 12:13:55 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.metastore.model.MSerDeInfo [Table :
    `SERDES`, InheritanceStrategy : new-table]
    13/03/20 12:13:55 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.metastore.model.MStringList [Table :
    `SKEWED_STRING_LIST`, InheritanceStrategy : new-table]
    13/03/20 12:13:55 INFO DataNucleus.Datastore: The class
    "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
    "embedded-only" so does not have its own datastore table.
    13/03/20 12:13:55 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.metastore.model.MStorageDescriptor [Table :
    `SDS`, InheritanceStrategy : new-table]
    13/03/20 12:13:55 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.metastore.model.MTable [Table : `TBLS`,
    InheritanceStrategy : new-table]
    13/03/20 12:13:55 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MSerDeInfo.parameters [Table
    : `SERDE_PARAMS`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MStringList.internalList
    [Table : `SKEWED_STRING_LIST_VALUES`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MTable.parameters [Table :
    `TABLE_PARAMS`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MTable.partitionKeys [Table
    : `PARTITION_KEYS`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field :
    org.apache.hadoop.hive.metastore.model.MStorageDescriptor.bucketCols [Table
    : `BUCKETING_COLS`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field :
    org.apache.hadoop.hive.metastore.model.MStorageDescriptor.parameters [Table
    : `SD_PARAMS`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field :
    org.apache.hadoop.hive.metastore.model.MStorageDescriptor.skewedColNames
    [Table : `SKEWED_COL_NAMES`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field :
    org.apache.hadoop.hive.metastore.model.MStorageDescriptor.skewedColValueLocationMaps
    [Table : `SKEWED_COL_VALUE_LOC_MAP`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field :
    org.apache.hadoop.hive.metastore.model.MStorageDescriptor.skewedColValues
    [Table : `SKEWED_VALUES`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MStorageDescriptor.sortCols
    [Table : `SORT_COLS`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MColumnDescriptor.cols
    [Table : `COLUMNS_V2`]
    13/03/20 12:13:56 INFO metastore.HiveMetaStore: 1: get_table : db=default
    tbl=gb_xdr_hb
    13/03/20 12:13:56 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr
    cmd=get_table : db=default tbl=gb_xdr_hb
    13/03/20 12:13:56 INFO metastore.HiveMetaStore: 1: Opening raw store with
    implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    13/03/20 12:13:56 INFO metastore.ObjectStore: ObjectStore, initialize called
    13/03/20 12:13:56 INFO metastore.ObjectStore: Initialized ObjectStore
    13/03/20 12:13:56 INFO DataNucleus.MetaData: Listener found initialisation
    for persistable class
    org.apache.hadoop.hive.metastore.model.MColumnDescriptor
    13/03/20 12:13:56 INFO DataNucleus.MetaData: Listener found initialisation
    for persistable class org.apache.hadoop.hive.metastore.model.MSerDeInfo
    13/03/20 12:13:56 INFO DataNucleus.MetaData: Listener found initialisation
    for persistable class
    org.apache.hadoop.hive.metastore.model.MStorageDescriptor
    13/03/20 12:13:56 INFO DataNucleus.MetaData: Listener found initialisation
    for persistable class org.apache.hadoop.hive.metastore.model.MTable
    13/03/20 12:13:56 INFO DataNucleus.MetaData: Listener found initialisation
    for persistable class org.apache.hadoop.hive.metastore.model.MFieldSchema
    13/03/20 12:13:56 INFO analysis.AggregateInfo: agg info:
    AggregateInfo{grouping_exprs=, aggregate_exprs=(AggregateExpr{op=COUNT,
    isStar=true, isDistinct=false, }), agg_tuple=TupleDescriptor{id=1,
    tbl=null, byte_size=0, is_materialized=true, slots=[SlotDescriptor{id=0,
    col=null, type=BIGINT, materialized=false, byteSize=0, byteOffset=-1,
    nullIndicatorByte=0, nullIndicatorBit=0, slotIdx=0}]},
    smap=smap(AggregateExpr{op=COUNT, isStar=true, isDistinct=false,
    }:SlotRef{tblName=null, col=null, id=0})}
    mergeAggInfo:
    AggregateInfo{grouping_exprs=, aggregate_exprs=(AggregateExpr{op=SUM,
    isStar=false, isDistinct=false, (SlotRef{tblName=null, col=null, id=0})}),
    agg_tuple=TupleDescriptor{id=1, tbl=null, byte_size=0,
    is_materialized=true, slots=[SlotDescriptor{id=0, col=null, type=BIGINT,
    materialized=false, byteSize=0, byteOffset=-1, nullIndicatorByte=0,
    nullIndicatorBit=0, slotIdx=0}]}, smap=smap(AggregateExpr{op=COUNT,
    isStar=true, isDistinct=false, }:SlotRef{tblName=null, col=null, id=0})}
    13/03/20 12:13:56 INFO service.Frontend: create plan
    13/03/20 12:13:56 INFO planner.Planner: create single-node plan
    13/03/20 12:13:56 INFO planner.Planner: create plan fragments
    13/03/20 12:13:56 INFO planner.Planner: finalize plan fragments
    13/03/20 12:13:56 INFO service.Frontend: get scan range locations
    13/03/20 12:13:56 INFO service.Frontend: create result set metadata
    13/03/20 12:13:56 INFO service.JniFrontend: Plan Fragment 0
    UNPARTITIONED
    AGGREGATE
    OUTPUT: SUM(<slot 0>)
    GROUP BY:
    TUPLE IDS: 1
    EXCHANGE (2)
    TUPLE IDS: 1

    Plan Fragment 1
    RANDOM
    STREAM DATA SINK
    EXCHANGE ID: 2
    UNPARTITIONED

    AGGREGATE
    OUTPUT: COUNT(*)
    GROUP BY:
    TUPLE IDS: 1
    SCAN HBASE table=gb_xdr_hb (0)
    TUPLE IDS: 0

    E0320 12:13:57.587276 17966 impala-server.cc:1349] unknown query id:
    51899b8f7ceb485b:83f451a2e8daf1f9
    E0320 12:13:58.071900 17958 impala-server.cc:1349] unknown query id:
    51899b8f7ceb485b:83f451a2e8daf1f9
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Query profile from WEB interface:

    Query (id=51899b8f7ceb485b:83f451a2e8daf1f9):
    - PlanningTime: 2s969ms
    Query 51899b8f7ceb485b:83f451a2e8daf1f9:(1s848ms 0.00%)
    Aggregate Profile:
    Coordinator Fragment:(39ms 0.00%)
    - RowsProduced: 0
    CodeGen:
    - CodegenTime: 68ms
    - CompileTime: 223ms
    - LoadTime: 269ms
    - ModuleFileSize: 44.61 KB
    AGGREGATION_NODE (id=3):(107ms 100.00%)
    - BuildBuckets: 0
    - BuildTime: 0K clock cycles
    - GetResultsTime: 0K clock cycles
    - MemoryUsed: 0.00
    - RowsReturned: 0
    - RowsReturnedRate: 0
    EXCHANGE_NODE (id=2):(0K clock cycles 0.00%)
    - BytesReceived: 0.00
    - ConvertRowBatchTime: 0K clock cycles
    - DeserializeRowBatchTimer: 0K clock cycles
    - MemoryUsed: 0.00
    - RowsReturned: 0
    - RowsReturnedRate: 0
    Averaged Fragment 1:(49ms 0.00%)
    completion times: min:14ms max:396ms mean: 251ms stddev:169ms
    execution rates: min:0.00 /sec max:0.00 /sec mean:0.00 /sec stddev:0.00 /sec
    split sizes: min: 0.00 , max: 0.00 , avg: 0.00 , stddev: 0.00
    - RowsProduced: 0
    CodeGen:
    - CodegenTime: 7ms
    - CompileTime: 33ms
    - LoadTime: 9ms
    - ModuleFileSize: 12.35 KB
    DataStreamSender:
    - BytesSent: 0.00
    - DataSinkTime: 0K clock cycles
    - SerializeBatchTime: 0K clock cycles
    - ThriftTransmitTime: 0K clock cycles
    AGGREGATION_NODE (id=1):(56ms 52.51%)
    - BuildBuckets: 0
    - BuildTime: 0K clock cycles
    - GetResultsTime: 0K clock cycles
    - MemoryUsed: 0.00
    - RowsReturned: 0
    - RowsReturnedRate: 0
    HBASE_SCAN_NODE (id=0):(30ms 62.03%)
    - BytesRead: 0.00
    - HBaseTableScanner.ScanSetup: 13K clock cycles
    - MaterializeTupleTime: 0K clock cycles
    - MemoryUsed: 0.00
    - PerDiskReadThroughput: 0.00 /sec
    - RowsReturned: 0
    - RowsReturnedRate: 0
    - ScanRangesComplete: 0
    - ScannerThreadsReadTime: 0K clock cycles
    - TotalReadThroughput: 0.00 /sec
    Fragment 1:
    Instance 51899b8f7ceb485b:83f451a2e8daf1fb:
    Instance 51899b8f7ceb485b:83f451a2e8daf1fc:(147ms 0.00%)
    Hdfs split stats (:<# splits>/):
    - RowsProduced: 0
    CodeGen:
    - CodegenTime: 21ms
    - CompileTime: 99ms
    - LoadTime: 28ms
    - ModuleFileSize: 37.04 KB
    DataStreamSender:
    - BytesSent: 0.00
    - DataSinkTime: 0K clock cycles
    - SerializeBatchTime: 0K clock cycles
    - ThriftTransmitTime: 0K clock cycles
    AGGREGATION_NODE (id=1):(168ms 52.51%)
    - BuildBuckets: 0
    - BuildTime: 0K clock cycles
    - GetResultsTime: 0K clock cycles
    - MemoryUsed: 0.00
    - RowsReturned: 0
    - RowsReturnedRate: 0
    HBASE_SCAN_NODE (id=0):(91ms 62.03%)
    - BytesRead: 0.00
    - HBaseTableScanner.ScanSetup: 39K clock cycles
    - MaterializeTupleTime: 0K clock cycles
    - MemoryUsed: 0.00
    - PerDiskReadThroughput: 0.00 /sec
    - RowsReturned: 0
    - RowsReturnedRate: 0
    - ScanRangesComplete: 0
    - ScannerThreadsReadTime: 0K clock cycles
    - TotalReadThroughput: 0.00 /sec
    Instance 51899b8f7ceb485b:83f451a2e8daf1fd:

    ---------------------------------------------------------------------------------------------------------

    the table schema:

    hive>CREATE TABLE gb_xdr_hb(Online_ID bigint, Session_ID string, Start_Date_Time string, End_Date_Time string, SGSN_IP bigint, SGSN string, BSC_IP bigint, BSC string, LAC int, CI int, RAC int, BVCI int, IMSI bigint, IMEI string, APN string, Protocol string, Event_Type int, IP_LEN_UL bigint, IP_LEN_DL bigint, Count_Packet_UL int, Count_Packet_DL int, Duration int, SOURCE_IP bigint, DEST_IP bigint, SOURCE_PORT int, DEST_PORT int, URI string, URI_Main string, Content_Type_Part1 string, Content_Type_Part2 string, User_Agent_Main string, Service_Type int, Sub_Service_Type int, De_PDP_Acc int, De_PDP_Acc_Direction string, Detach_Acc int, Detach_Acc_Direction string, Abnormal_reason int, Request int, Repeat int, Reject int, Reject_Cause int, Accept int, Accept_DelayFirst int, Complete int, Complete_DelayFirst int, ACK int, ACK_Delay bigint, Last_ACK_Delay bigint, BVCI_OLD int, BVCI_NEW int, LLC_frame_Disc int, OPP_LAC int, OPP_RAC int, Result int, Result_Delayfirst bigint, Disconnect int, Abort int, Query_Name string, Query_Type int, Query_OP int, Resp int, Resp_delayFirst bigint, Resp_Cause int, FromX string, ToX string, Subject string, MMS_Cause int, ErrorPacketWired int, ErrorPacketWireless int, GPRS_multislot_E int, EDGE_multislot_E int, Umts_128TDD_E int, Umts_FDD_E int, GPRS_multislot_1800 int, EDGE_multislot_1800 int, Umts_128TDD_1800 int, Umts_FDD_1800 int, Reset int, Reset_Direction int, LLC_Frame_Count int, Abort_Reason_User int, Abort_Reason_Provider int, Accept_Repeat int, Abnormal_Type int, De_Cause int, RAU_Type int, RAU_Attr int, Event_Direction int, Prefix_IMEI string, IsReassemble int, IsPrivate_Service int, User_IP_LEN_UL bigint, User_IP_LEN_DL bigint, Bssgp_Status_Cause int, Weight_Rate_UL bigint, Weight_Rate_DL bigint, Weight_Rate_Total bigint, Brand string, Type string, ICMP_Type int, ICMP_Code int, Gob_Src_IP int, Gob_Dst_IP int, Gob_IP_LEN int, Gob_Src_PORT int, Gob_Dst_PORT int, Gob_UDP_LEN int) row format delimited fields terminated by '\t'
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,strc:sessid,strc:sdt,strc:edt,bintc:sgsnip,strc:sgsn,bintc:bscip,strc:bsc,intc:lac,intc:ci,intc:rac,intc:bvci,bintc:imsi,strc:imei,strc:apn,strc:protoc,intc:etype,bintc:iplen_ul,bintc:iplen_dl,intc:cntpkt_ul,intc:cntpkt_dl,intc:duration,bintc:srcip,bintc:dstip,intc:srcport,intc:dstport,strc:uri,strc:urimain,strc:contype_p1,strc:contype_p2,strc:usragnmain,intc:svctype,intc:subsvctype,intc:de_pdpacc,strc:de_pdpacc_dir,intc:detacc,strc:detacc_dir,intc:abnm_rsn,intc: req,intc:rep,intc:rej,intc:rejcau,intc:accept,intc:accept_df,intc:complete,intc:complete_df,intc:ack,bintc:ackdelay,bintc:last_ackdelay,intc:bvciold,intc:bvcinew,intc:llcfrmdisc,intc:opp_lac,intc:opp_rac,intc:rst,bintc:rst_df,intc:discon,intc:abort,strc:qryname,intc:qrytype,intc:qry_op,intc:res,bintc:res_df,intc:res_cau,strc:fromx,strc:tox,strc:subj,intc:mmscau,intc:errpktwir,intc:errpktwirl,intc:gprsmults_e,intc:edgemuls_e,intc:umts128tdd_e,intc:umtsfdd_e,intc:gprsmuls1800,intc:edgemuls1800,intc:umts128tdd1800,intc:umtsfdd1800,intc:reset,intc:resetdir,intc:llcfrmcnt,intc:abt_rsn_usr,intc:abt_rsn_prvd,intc:accrep,intc:abnmtype,intc:de_cau,intc:rautype,intc:rauattr,intc:evtdir,strc:prefiximei,intc:isreass,intc:ispriv_svc,bintc:usr_iplen_ul,bintc:usr_iplen_dl,intc:bssgp_stat_cau,bintc:wgt_rat_ul,bintc:wgt_rat_dl,bintc:wgt_rat_tot,strc:brand,strc:type,intc:icmptype,intc:icmpcode,intc:gobsrcip,intc:gobdstip,intc:gob_iplen,intc:gobsrcport,intc:gobdstport,intc:gob_udplen")
    TBLPROPERTIES ("hbase.table.name" = "gb_xdr_hb");

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    the samples of data(10 rows, columns split by tab):
    102310007 52861470 2012-09-05 16:02:13.915000 2012-09-05 16:02:14.715000
    2796454154 SGSN206 1042644490 JMSM8B2 9858 65512 1 217 460004970453832 CMNET
    2 20 2 1 800 2020798986 951359443 1069 80 0 0 0 1 1 800 1 800 1 44 10 0 0 0
    10 0 0 0 1 0 2 0 0 0 0
    104630971 52863105 2012-09-05 16:02:14.475000 2012-09-05 16:02:14.715000
    2796454154 SGSN206 137396234 JMSM10B1 9287 30123 1 57 460020317031561
    012849007156581 2 20 2 1 240 2875977738 4166520175 50695 80 0 0 0 1 1 240 1
    240 1 37 10 10 0 0 1 0 2 01284900 0 0 0 0 iphone IPHONE_4
    101592006 9171377 2012-09-05 16:02:14.950000 2012-09-05 16:02:14.952000
    2796454154 SGSN206 488996362 JMSM8B2 9858 23572 1 460020272769766 8 2 0 1 1
    2 110 111 12 0 0 0 12 0 0 0 0 1
    106524291 52862516 2012-09-05 16:02:14.273000 2012-09-05 16:02:14.716000
    2796454154 SGSN206 623148554 JMSM3B1 9659 65402 1 68 460025200602650
    860980011055457 CMWAP 2 20 2 1 443 2597062922 2885681162 63447 80 0 0 0 1 1
    443 1 6 12 12 0 0 12 12 0 0 1 2 1 0 2 86098001 0 0 0 0
    102431545 52862667 2012-09-05 16:02:14.328000 2012-09-05 16:02:14.716000
    2796454154 SGSN206 171409418 JMSM10B3 9725 21353 1 52 460005630577558
    355712029701925 CMWAP 2 16 617 262 2 4 388 3099116298 2885681162 39088 80
    http://conf.3g.qq.com/newConf/n conf.3g.qq.com application
    x-www-form-urlencoded 27 270694 0 1 1 380 380 1 31 1 31 200 32 32 0 1 32 32
    0 1 4 0 2 35571202 0 617 262 12280 2214 24923 nokia 6120C
    106042934 9171378 2012-09-05 16:02:14.951000 2012-09-05 16:02:14.952000
    2796454154 SGSN206 976584714 JMSM4B2 10064 20422 1 460001861330414
    353990032645348 CMNET 8 1 0 1 1 1 36 74 10 0 0 0 10 0 0 0 0 1 35399003 nokia
    5300
    105595895 52744950 2012-09-05 16:01:33.551000 2012-09-05 16:02:14.716000
    2796454154 SGSN206 2251325450 JMSM10B2 9614 22341 1 24 460020322868605
    359026041942491 CMWAP 2 15 850 1001 3 4 41165 1697466378 2885681162 33652 80
    http://hot.imtt.qq.com/h?c=0&key=1346813826 hot.imtt.qq.com text
    html;charset=utf-8 27 270694 0 1 1 620 620 1 133 1 133 200 32 32 0 0 32 32 0
    0 4 0 2 35902604 0 850 1001 5432 7533 25760 nokia 5233
    101642845 52006881 2012-09-05 15:57:14.717000 2012-09-05 15:57:14.717000
    2796454154 SGSN206 992247306 JMSM3B1 9659 26053 1 71 460020203688580 2 15
    361 1 0 2743708170 2885681162 33015 80
    http://3g.zol.com.cn/images/search.gif 3g.zol.com.cn 27 270922 15901 1 10 10
    0 1 10 10 0 1 2 2 0 361
    106574535 52861851 2012-09-05 16:02:14.037000 2012-09-05 16:02:14.717000
    2796454154 SGSN206 2603909130 JMSM4B1 10063 24181 1 89 460023198308923
    354393000215150 CMWAP 2 20 2 1 680 4291926794 2885681162 65386 80 0 0 0 1 1
    680 1 680 1 5 12 0 0 0 12 0 0 0 1 0 2 35439300 0 0 0 0 unknown unknown
    101542475 52863714 2012-09-05 16:02:14.690000 2012-09-05 16:02:14.717000
    2796454154 SGSN206 827162378 JMSM7B1 10067 21041 1 26 460020225821456 CMNET
    2 21 1340 1 1 27 538878730 991169135 56952 80 0 0 21906 1 1 27 12 12 0 1 12
    12 0 1 1 2 1 1 1 0 1340 0 66503 66503

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Why? Need your helps too! Thanks!
  • Alan at Mar 21, 2013 at 2:37 am
    Hi,

    Let's look at the perf issue first. Looking at the profile you sent us
    earlier, the disk throughput is very low:

    PerDiskReadThroughput: 24.25 MB/sec


    Did you see any warning/error in your log?


    Thanks,

    Alan


    On Tuesday, March 19, 2013 9:46:31 PM UTC-7, zjp... wrote:

    Hi, Alan, Thank you!
    For the first query, when i run the query at first:
    select * from mpos_gb.gb_xdr where start_date_time >= '2012-09-05
    11:34:55.908000' and start_date_time <= '2012-09-05 16:54:55.000000' limit
    10
    it takes about 5 seconds, And when run query:
    select * from mpos_gb.gb_xdr where start_date_time >= '2012-09-05
    12:34:55.908000' and start_date_time <= '2012-09-05 16:54:55.000000' limit
    10
    it takes about 30 seconds. The different only is the first condition.

    For the second query, It take about 4 minutes at least, It is too long for
    me, Because it is only about 1.5 hours data, And in my business system,
    volume is about ~250millons/h(~250MB/s), And In the future will also
    increase.

    For the second query, when i run it, i found all of impalad processes
    running on data nodes get CPU usage and IO usage increased, So, I think the
    IO is the performance bottleneck, because of TEXTFILE format
    storage data row by row, And the types of query of "select count (*) from
    tbl","select max(col) from tbl" will be scanning the entire table. I said
    right?
    .
    For the HBase error, Here are the log, table schema, sample data:
    Impala-shell output:
    # impala-shell
    Welcome to the Impala shell. Press TAB twice to see a list of available
    commands.

    Copyright (c) 2012 Cloudera, Inc. All rights reserved.

    (Build version: Impala v0.6 (720f93c) built on Sat Feb 23 18:52:43 PST
    2013)
    [Not connected] > connect cloudera-host2
    Connected to cloudera-host2:21000
    [cloudera-host2:21000] > select count(*) from gb_xdr_hb;
    Query: select count(*) from gb_xdr_hb
    Query aborted, unable to fetch data
    [cloudera-host2:21000] >

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Impalad output:
    13/03/20 12:13:55 INFO service.Frontend: analyze query select count(*)
    from gb_xdr_hb
    13/03/20 12:13:55 INFO metastore.HiveMetaStore: 1: get_all_tables:
    db=default
    13/03/20 12:13:55 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr
    cmd=get_all_tables: db=default
    13/03/20 12:13:55 INFO metastore.HiveMetaStore: 1: Opening raw store with
    implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    13/03/20 12:13:55 INFO metastore.ObjectStore: ObjectStore, initialize
    called
    13/03/20 12:13:55 INFO metastore.ObjectStore: Initialized ObjectStore
    13/03/20 12:13:55 INFO DataNucleus.Datastore: The class
    "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
    "embedded-only" so does not have its own datastore table.
    13/03/20 12:13:55 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.metastore.model.MColumnDescriptor [Table :
    `CDS`, InheritanceStrategy : new-table]
    13/03/20 12:13:55 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.metastore.model.MSerDeInfo [Table :
    `SERDES`, InheritanceStrategy : new-table]
    13/03/20 12:13:55 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.metastore.model.MStringList [Table :
    `SKEWED_STRING_LIST`, InheritanceStrategy : new-table]
    13/03/20 12:13:55 INFO DataNucleus.Datastore: The class
    "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
    "embedded-only" so does not have its own datastore table.
    13/03/20 12:13:55 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.metastore.model.MStorageDescriptor [Table :
    `SDS`, InheritanceStrategy : new-table]
    13/03/20 12:13:55 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.metastore.model.MTable [Table : `TBLS`,
    InheritanceStrategy : new-table]
    13/03/20 12:13:55 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MSerDeInfo.parameters [Table
    : `SERDE_PARAMS`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MStringList.internalList
    [Table : `SKEWED_STRING_LIST_VALUES`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MTable.parameters [Table :
    `TABLE_PARAMS`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MTable.partitionKeys [Table
    : `PARTITION_KEYS`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field :
    org.apache.hadoop.hive.metastore.model.MStorageDescriptor.bucketCols [Table
    : `BUCKETING_COLS`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field :
    org.apache.hadoop.hive.metastore.model.MStorageDescriptor.parameters [Table
    : `SD_PARAMS`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field :
    org.apache.hadoop.hive.metastore.model.MStorageDescriptor.skewedColNames
    [Table : `SKEWED_COL_NAMES`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field :
    org.apache.hadoop.hive.metastore.model.MStorageDescriptor.skewedColValueLocationMaps
    [Table : `SKEWED_COL_VALUE_LOC_MAP`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field :
    org.apache.hadoop.hive.metastore.model.MStorageDescriptor.skewedColValues
    [Table : `SKEWED_VALUES`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MStorageDescriptor.sortCols
    [Table : `SORT_COLS`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.metastore.model.MColumnDescriptor.cols
    [Table : `COLUMNS_V2`]
    13/03/20 12:13:56 INFO metastore.HiveMetaStore: 1: get_table : db=default
    tbl=gb_xdr_hb
    13/03/20 12:13:56 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr
    cmd=get_table : db=default tbl=gb_xdr_hb
    13/03/20 12:13:56 INFO metastore.HiveMetaStore: 1: Opening raw store with
    implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    13/03/20 12:13:56 INFO metastore.ObjectStore: ObjectStore, initialize
    called
    13/03/20 12:13:56 INFO metastore.ObjectStore: Initialized ObjectStore
    13/03/20 12:13:56 INFO DataNucleus.MetaData: Listener found initialisation
    for persistable class
    org.apache.hadoop.hive.metastore.model.MColumnDescriptor
    13/03/20 12:13:56 INFO DataNucleus.MetaData: Listener found initialisation
    for persistable class org.apache.hadoop.hive.metastore.model.MSerDeInfo
    13/03/20 12:13:56 INFO DataNucleus.MetaData: Listener found initialisation
    for persistable class
    org.apache.hadoop.hive.metastore.model.MStorageDescriptor
    13/03/20 12:13:56 INFO DataNucleus.MetaData: Listener found initialisation
    for persistable class org.apache.hadoop.hive.metastore.model.MTable
    13/03/20 12:13:56 INFO DataNucleus.MetaData: Listener found initialisation
    for persistable class org.apache.hadoop.hive.metastore.model.MFieldSchema
    13/03/20 12:13:56 INFO analysis.AggregateInfo: agg info:
    AggregateInfo{grouping_exprs=, aggregate_exprs=(AggregateExpr{op=COUNT,
    isStar=true, isDistinct=false, }), agg_tuple=TupleDescriptor{id=1,
    tbl=null, byte_size=0, is_materialized=true, slots=[SlotDescriptor{id=0,
    col=null, type=BIGINT, materialized=false, byteSize=0, byteOffset=-1,
    nullIndicatorByte=0, nullIndicatorBit=0, slotIdx=0}]},
    smap=smap(AggregateExpr{op=COUNT, isStar=true, isDistinct=false,
    }:SlotRef{tblName=null, col=null, id=0})}
    mergeAggInfo:
    AggregateInfo{grouping_exprs=, aggregate_exprs=(AggregateExpr{op=SUM,
    isStar=false, isDistinct=false, (SlotRef{tblName=null, col=null, id=0})}),
    agg_tuple=TupleDescriptor{id=1, tbl=null, byte_size=0,
    is_materialized=true, slots=[SlotDescriptor{id=0, col=null, type=BIGINT,
    materialized=false, byteSize=0, byteOffset=-1, nullIndicatorByte=0,
    nullIndicatorBit=0, slotIdx=0}]}, smap=smap(AggregateExpr{op=COUNT,
    isStar=true, isDistinct=false, }:SlotRef{tblName=null, col=null, id=0})}
    13/03/20 12:13:56 INFO service.Frontend: create plan
    13/03/20 12:13:56 INFO planner.Planner: create single-node plan
    13/03/20 12:13:56 INFO planner.Planner: create plan fragments
    13/03/20 12:13:56 INFO planner.Planner: finalize plan fragments
    13/03/20 12:13:56 INFO service.Frontend: get scan range locations
    13/03/20 12:13:56 INFO service.Frontend: create result set metadata
    13/03/20 12:13:56 INFO service.JniFrontend: Plan Fragment 0
    UNPARTITIONED
    AGGREGATE
    OUTPUT: SUM(<slot 0>)
    GROUP BY:
    TUPLE IDS: 1
    EXCHANGE (2)
    TUPLE IDS: 1

    Plan Fragment 1
    RANDOM
    STREAM DATA SINK
    EXCHANGE ID: 2
    UNPARTITIONED

    AGGREGATE
    OUTPUT: COUNT(*)
    GROUP BY:
    TUPLE IDS: 1
    SCAN HBASE table=gb_xdr_hb (0)
    TUPLE IDS: 0

    E0320 12:13:57.587276 17966 impala-server.cc:1349] unknown query id:
    51899b8f7ceb485b:83f451a2e8daf1f9
    E0320 12:13:58.071900 17958 impala-server.cc:1349] unknown query id:
    51899b8f7ceb485b:83f451a2e8daf1f9

    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Query profile from WEB interface:

    Query (id=51899b8f7ceb485b:83f451a2e8daf1f9):
    - PlanningTime: 2s969ms
    Query 51899b8f7ceb485b:83f451a2e8daf1f9:(1s848ms 0.00%)
    Aggregate Profile:
    Coordinator Fragment:(39ms 0.00%)
    - RowsProduced: 0
    CodeGen:
    - CodegenTime: 68ms
    - CompileTime: 223ms
    - LoadTime: 269ms
    - ModuleFileSize: 44.61 KB
    AGGREGATION_NODE (id=3):(107ms 100.00%)
    - BuildBuckets: 0
    - BuildTime: 0K clock cycles
    - GetResultsTime: 0K clock cycles
    - MemoryUsed: 0.00
    - RowsReturned: 0
    - RowsReturnedRate: 0
    EXCHANGE_NODE (id=2):(0K clock cycles 0.00%)
    - BytesReceived: 0.00
    - ConvertRowBatchTime: 0K clock cycles
    - DeserializeRowBatchTimer: 0K clock cycles
    - MemoryUsed: 0.00
    - RowsReturned: 0
    - RowsReturnedRate: 0
    Averaged Fragment 1:(49ms 0.00%)
    completion times: min:14ms max:396ms mean: 251ms stddev:169ms
    execution rates: min:0.00 /sec max:0.00 /sec mean:0.00 /sec stddev:0.00 /sec
    split sizes: min: 0.00 , max: 0.00 , avg: 0.00 , stddev: 0.00
    - RowsProduced: 0
    CodeGen:
    - CodegenTime: 7ms
    - CompileTime: 33ms
    - LoadTime: 9ms
    - ModuleFileSize: 12.35 KB
    DataStreamSender:
    - BytesSent: 0.00
    - DataSinkTime: 0K clock cycles
    - SerializeBatchTime: 0K clock cycles
    - ThriftTransmitTime: 0K clock cycles
    AGGREGATION_NODE (id=1):(56ms 52.51%)
    - BuildBuckets: 0
    - BuildTime: 0K clock cycles
    - GetResultsTime: 0K clock cycles
    - MemoryUsed: 0.00
    - RowsReturned: 0
    - RowsReturnedRate: 0
    HBASE_SCAN_NODE (id=0):(30ms 62.03%)
    - BytesRead: 0.00
    - HBaseTableScanner.ScanSetup: 13K clock cycles
    - MaterializeTupleTime: 0K clock cycles
    - MemoryUsed: 0.00
    - PerDiskReadThroughput: 0.00 /sec
    - RowsReturned: 0
    - RowsReturnedRate: 0
    - ScanRangesComplete: 0
    - ScannerThreadsReadTime: 0K clock cycles
    - TotalReadThroughput: 0.00 /sec
    Fragment 1:
    Instance 51899b8f7ceb485b:83f451a2e8daf1fb:
    Instance 51899b8f7ceb485b:83f451a2e8daf1fc:(147ms 0.00%)
    Hdfs split stats (:<# splits>/):
    - RowsProduced: 0
    CodeGen:
    - CodegenTime: 21ms
    - CompileTime: 99ms
    - LoadTime: 28ms
    - ModuleFileSize: 37.04 KB
    DataStreamSender:
    - BytesSent: 0.00
    - DataSinkTime: 0K clock cycles
    - SerializeBatchTime: 0K clock cycles
    - ThriftTransmitTime: 0K clock cycles
    AGGREGATION_NODE (id=1):(168ms 52.51%)
    - BuildBuckets: 0
    - BuildTime: 0K clock cycles
    - GetResultsTime: 0K clock cycles
    - MemoryUsed: 0.00
    - RowsReturned: 0
    - RowsReturnedRate: 0
    HBASE_SCAN_NODE (id=0):(91ms 62.03%)
    - BytesRead: 0.00
    - HBaseTableScanner.ScanSetup: 39K clock cycles
    - MaterializeTupleTime: 0K clock cycles
    - MemoryUsed: 0.00
    - PerDiskReadThroughput: 0.00 /sec
    - RowsReturned: 0
    - RowsReturnedRate: 0
    - ScanRangesComplete: 0
    - ScannerThreadsReadTime: 0K clock cycles
    - TotalReadThroughput: 0.00 /sec
    Instance 51899b8f7ceb485b:83f451a2e8daf1fd:

    ---------------------------------------------------------------------------------------------------------

    the table schema:

    hive>CREATE TABLE gb_xdr_hb(Online_ID bigint, Session_ID string, Start_Date_Time string, End_Date_Time string, SGSN_IP bigint, SGSN string, BSC_IP bigint, BSC string, LAC int, CI int, RAC int, BVCI int, IMSI bigint, IMEI string, APN string, Protocol string, Event_Type int, IP_LEN_UL bigint, IP_LEN_DL bigint, Count_Packet_UL int, Count_Packet_DL int, Duration int, SOURCE_IP bigint, DEST_IP bigint, SOURCE_PORT int, DEST_PORT int, URI string, URI_Main string, Content_Type_Part1 string, Content_Type_Part2 string, User_Agent_Main string, Service_Type int, Sub_Service_Type int, De_PDP_Acc int, De_PDP_Acc_Direction string, Detach_Acc int, Detach_Acc_Direction string, Abnormal_reason int, Request int, Repeat int, Reject int, Reject_Cause int, Accept int, Accept_DelayFirst int, Complete int, Complete_DelayFirst int, ACK int, ACK_Delay bigint, Last_ACK_Delay bigint, BVCI_OLD int, BVCI_NEW int, LLC_frame_Disc int, OPP_LAC int, OPP_RAC int, Result int, Result_Delayfirst bigint, Disconnect int, Abort int, Query_Name string, Query_Type int, Query_OP int, Resp int, Resp_delayFirst bigint, Resp_Cause int, FromX string, ToX string, Subject string, MMS_Cause int, ErrorPacketWired int, ErrorPacketWireless int, GPRS_multislot_E int, EDGE_multislot_E int, Umts_128TDD_E int, Umts_FDD_E int, GPRS_multislot_1800 int, EDGE_multislot_1800 int, Umts_128TDD_1800 int, Umts_FDD_1800 int, Reset int, Reset_Direction int, LLC_Frame_Count int, Abort_Reason_User int, Abort_Reason_Provider int, Accept_Repeat int, Abnormal_Type int, De_Cause int, RAU_Type int, RAU_Attr int, Event_Direction int, Prefix_IMEI string, IsReassemble int, IsPrivate_Service int, User_IP_LEN_UL bigint, User_IP_LEN_DL bigint, Bssgp_Status_Cause int, Weight_Rate_UL bigint, Weight_Rate_DL bigint, Weight_Rate_Total bigint, Brand string, Type string, ICMP_Type int, ICMP_Code int, Gob_Src_IP int, Gob_Dst_IP int, Gob_IP_LEN int, Gob_Src_PORT int, Gob_Dst_PORT int, Gob_UDP_LEN int) row format delimited fields terminated by '\t'
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,strc:sessid,strc:sdt,strc:edt,bintc:sgsnip,strc:sgsn,bintc:bscip,strc:bsc,intc:lac,intc:ci,intc:rac,intc:bvci,bintc:imsi,strc:imei,strc:apn,strc:protoc,intc:etype,bintc:iplen_ul,bintc:iplen_dl,intc:cntpkt_ul,intc:cntpkt_dl,intc:duration,bintc:srcip,bintc:dstip,intc:srcport,intc:dstport,strc:uri,strc:urimain,strc:contype_p1,strc:contype_p2,strc:usragnmain,intc:svctype,intc:subsvctype,intc:de_pdpacc,strc:de_pdpacc_dir,intc:detacc,strc:detacc_dir,intc:abnm_rsn,intc: req,intc:rep,intc:rej,intc:rejcau,intc:accept,intc:accept_df,intc:complete,intc:complete_df,intc:ack,bintc:ackdelay,bintc:last_ackdelay,intc:bvciold,intc:bvcinew,intc:llcfrmdisc,intc:opp_lac,intc:opp_rac,intc:rst,bintc:rst_df,intc:discon,intc:abort,strc:qryname,intc:qrytype,intc:qry_op,intc:res,bintc:res_df,intc:res_cau,strc:fromx,strc:tox,strc:subj,intc:mmscau,intc:errpktwir,intc:errpktwirl,intc:gprsmults_e,intc:edgemuls_e,intc:umts128tdd_e,intc:umtsfdd_e,intc:gprsmuls1800,intc:edgemuls1800,intc:umts128tdd1800,intc:umtsfdd1800,intc:reset,intc:resetdir,intc:llcfrmcnt,intc:abt_rsn_usr,intc:abt_rsn_prvd,intc:accrep,intc:abnmtype,intc:de_cau,intc:rautype,intc:rauattr,intc:evtdir,strc:prefiximei,intc:isreass,intc:ispriv_svc,bintc:usr_iplen_ul,bintc:usr_iplen_dl,intc:bssgp_stat_cau,bintc:wgt_rat_ul,bintc:wgt_rat_dl,bintc:wgt_rat_tot,strc:brand,strc:type,intc:icmptype,intc:icmpcode,intc:gobsrcip,intc:gobdstip,intc:gob_iplen,intc:gobsrcport,intc:gobdstport,intc:gob_udplen")
    TBLPROPERTIES ("hbase.table.name" = "gb_xdr_hb");


    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    the samples of data(10 rows, columns split by tab):
    102310007 52861470 2012-09-05 16:02:13.915000 2012-09-05 16:02:14.715000
    2796454154 SGSN206 1042644490 JMSM8B2 9858 65512 1 217 460004970453832
    CMNET 2 20 2 1 800 2020798986 951359443 1069 80 0 0 0 1 1 800 1 800 1 44
    10 0 0 0 10 0 0 0 1 0 2 0 0 0 0
    104630971 52863105 2012-09-05 16:02:14.475000 2012-09-05 16:02:14.715000
    2796454154 SGSN206 137396234 JMSM10B1 9287 30123 1 57 460020317031561
    012849007156581 2 20 2 1 240 2875977738 4166520175 50695 80 0 0 0 1 1 240
    1 240 1 37 10 10 0 0 1 0 2 01284900 0 0 0 0 iphone IPHONE_4
    101592006 9171377 2012-09-05 16:02:14.950000 2012-09-05 16:02:14.952000
    2796454154 SGSN206 488996362 JMSM8B2 9858 23572 1 460020272769766 8 2 0 1
    1 2 110 111 12 0 0 0 12 0 0 0 0 1
    106524291 52862516 2012-09-05 16:02:14.273000 2012-09-05 16:02:14.716000
    2796454154 SGSN206 623148554 JMSM3B1 9659 65402 1 68 460025200602650
    860980011055457 CMWAP 2 20 2 1 443 2597062922 2885681162 63447 80 0 0 0 1
    1 443 1 6 12 12 0 0 12 12 0 0 1 2 1 0 2 86098001 0 0 0 0
    102431545 52862667 2012-09-05 16:02:14.328000 2012-09-05 16:02:14.716000
    2796454154 SGSN206 171409418 JMSM10B3 9725 21353 1 52 460005630577558
    355712029701925 CMWAP 2 16 617 262 2 4 388 3099116298 2885681162 39088 80
    http://conf.3g.qq.com/newConf/n conf.3g.qq.com application
    x-www-form-urlencoded 27 270694 0 1 1 380 380 1 31 1 31 200 32 32 0 1 32
    32 0 1 4 0 2 35571202 0 617 262 12280 2214 24923 nokia 6120C
    106042934 9171378 2012-09-05 16:02:14.951000 2012-09-05 16:02:14.952000
    2796454154 SGSN206 976584714 JMSM4B2 10064 20422 1 460001861330414
    353990032645348 CMNET 8 1 0 1 1 1 36 74 10 0 0 0 10 0 0 0 0 1 35399003
    nokia 5300
    105595895 52744950 2012-09-05 16:01:33.551000 2012-09-05 16:02:14.716000
    2796454154 SGSN206 2251325450 JMSM10B2 9614 22341 1 24 460020322868605
    359026041942491 CMWAP 2 15 850 1001 3 4 41165 1697466378 2885681162 33652
    80 http://hot.imtt.qq.com/h?c=0&key=1346813826 hot.imtt.qq.com text
    html;charset=utf-8 27 270694 0 1 1 620 620 1 133 1 133 200 32 32 0 0 32 32
    0 0 4 0 2 35902604 0 850 1001 5432 7533 25760 nokia 5233
    101642845 52006881 2012-09-05 15:57:14.717000 2012-09-05 15:57:14.717000
    2796454154 SGSN206 992247306 JMSM3B1 9659 26053 1 71 460020203688580 2 15
    361 1 0 2743708170 2885681162 33015 80
    http://3g.zol.com.cn/images/search.gif 3g.zol.com.cn 27 270922 15901 1 10
    10 0 1 10 10 0 1 2 2 0 361
    106574535 52861851 2012-09-05 16:02:14.037000 2012-09-05 16:02:14.717000
    2796454154 SGSN206 2603909130 JMSM4B1 10063 24181 1 89 460023198308923
    354393000215150 CMWAP 2 20 2 1 680 4291926794 2885681162 65386 80 0 0 0 1
    1 680 1 680 1 5 12 0 0 0 12 0 0 0 1 0 2 35439300 0 0 0 0 unknown unknown
    101542475 52863714 2012-09-05 16:02:14.690000 2012-09-05 16:02:14.717000
    2796454154 SGSN206 827162378 JMSM7B1 10067 21041 1 26 460020225821456
    CMNET 2 21 1340 1 1 27 538878730 991169135 56952 80 0 0 21906 1 1 27 12 12
    0 1 12 12 0 1 1 2 1 1 1 0 1340 0 66503 66503


    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Why? Need your helps too! Thanks!
  • Greg Rahn at Mar 21, 2013 at 5:01 pm
    More in the docs:
    https://ccp.cloudera.com/display/IMPALA10BETADOC/Using+Impala+Logging

    On Wed, Mar 20, 2013 at 9:17 PM, zjp... wrote:

    Hi,Alan,Thanks for your response!

    Where are the logs I should look?

    Thanks!


    在 2013年3月21日星期四UTC上午2时37分39秒,Alan写道:
    Hi,

    Let's look at the perf issue first. Looking at the profile you sent us
    earlier, the disk throughput is very low:

    PerDiskReadThroughput: 24.25 MB/sec


    Did you see any warning/error in your log?


    Thanks,

    Alan


    On Tuesday, March 19, 2013 9:46:31 PM UTC-7, zjp... wrote:

    Hi, Alan, Thank you!
    For the first query, when i run the query at first:
    select * from mpos_gb.gb_xdr where start_date_time >= '2012-09-05
    11:34:55.908000' and start_date_time <= '2012-09-05 16:54:55.000000' limit
    10
    it takes about 5 seconds, And when run query:
    select * from mpos_gb.gb_xdr where start_date_time >= '2012-09-05
    12:34:55.908000' and start_date_time <= '2012-09-05 16:54:55.000000' limit
    10
    it takes about 30 seconds. The different only is the first condition.

    For the second query, It take about 4 minutes at least, It is too long
    for me, Because it is only about 1.5 hours data, And in my business system,
    volume is about ~250millons/h(~250MB/s), And In the future will also
    increase.

    For the second query, when i run it, i found all of impalad processes
    running on data nodes get CPU usage and IO usage increased, So, I think the
    IO is the performance bottleneck, because of TEXTFILE format
    storage data row by row, And the types of query of "select count (*) from
    tbl","select max(col) from tbl" will be scanning the entire table. I said
    right?
    .
    For the HBase error, Here are the log, table schema, sample data:
    Impala-shell output:
    # impala-shell
    Welcome to the Impala shell. Press TAB twice to see a list of available
    commands.

    Copyright (c) 2012 Cloudera, Inc. All rights reserved.

    (Build version: Impala v0.6 (720f93c) built on Sat Feb 23 18:52:43 PST
    2013)
    [Not connected] > connect cloudera-host2
    Connected to cloudera-host2:21000
    [cloudera-host2:21000] > select count(*) from gb_xdr_hb;
    Query: select count(*) from gb_xdr_hb
    Query aborted, unable to fetch data
    [cloudera-host2:21000] >
    ------------------------------**------------------------------**
    ------------------------------**------------------------------**
    ------------------------------**------------------------------**
    ------------------------------**----
    Impalad output:
    13/03/20 12:13:55 INFO service.Frontend: analyze query select count(*)
    from gb_xdr_hb
    13/03/20 12:13:55 INFO metastore.HiveMetaStore: 1: get_all_tables:
    db=default
    13/03/20 12:13:55 INFO HiveMetaStore.audit: ugi=root
    ip=unknown-ip-addr cmd=get_all_tables: db=default
    13/03/20 12:13:55 INFO metastore.HiveMetaStore: 1: Opening raw store
    with implemenation class:org.apache.hadoop.hive.**metastore.ObjectStore
    13/03/20 12:13:55 INFO metastore.ObjectStore: ObjectStore, initialize
    called
    13/03/20 12:13:55 INFO metastore.ObjectStore: Initialized ObjectStore
    13/03/20 12:13:55 INFO DataNucleus.Datastore: The class
    "org.apache.hadoop.hive.**metastore.model.MFieldSchema" is tagged as
    "embedded-only" so does not have its own datastore table.
    13/03/20 12:13:55 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.**metastore.model.**MColumnDescriptor
    [Table : `CDS`, InheritanceStrategy : new-table]
    13/03/20 12:13:55 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.**metastore.model.MSerDeInfo [Table :
    `SERDES`, InheritanceStrategy : new-table]
    13/03/20 12:13:55 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.**metastore.model.MStringList [Table :
    `SKEWED_STRING_LIST`, InheritanceStrategy : new-table]
    13/03/20 12:13:55 INFO DataNucleus.Datastore: The class
    "org.apache.hadoop.hive.**metastore.model.MOrder" is tagged as
    "embedded-only" so does not have its own datastore table.
    13/03/20 12:13:55 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.**metastore.model.**MStorageDescriptor
    [Table : `SDS`, InheritanceStrategy : new-table]
    13/03/20 12:13:55 INFO DataNucleus.Persistence: Managing Persistence of
    Class : org.apache.hadoop.hive.**metastore.model.MTable [Table :
    `TBLS`, InheritanceStrategy : new-table]
    13/03/20 12:13:55 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.**metastore.model.MSerDeInfo.**parameters
    [Table : `SERDE_PARAMS`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.**metastore.model.MStringList.**internalList
    [Table : `SKEWED_STRING_LIST_VALUES`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.**metastore.model.MTable.**parameters
    [Table : `TABLE_PARAMS`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.**metastore.model.MTable.**partitionKeys
    [Table : `PARTITION_KEYS`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.**metastore.model.**MStorageDescriptor.bucketCols
    [Table : `BUCKETING_COLS`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.**metastore.model.**MStorageDescriptor.parameters
    [Table : `SD_PARAMS`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.**metastore.model.**MStorageDescriptor.**skewedColNames
    [Table : `SKEWED_COL_NAMES`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.**metastore.model.**MStorageDescriptor.**skewedColValueLocationMaps
    [Table : `SKEWED_COL_VALUE_LOC_MAP`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.**metastore.model.**MStorageDescriptor.**skewedColValues
    [Table : `SKEWED_VALUES`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.**metastore.model.**MStorageDescriptor.sortCols
    [Table : `SORT_COLS`]
    13/03/20 12:13:56 INFO DataNucleus.Persistence: Managing Persistence of
    Field : org.apache.hadoop.hive.**metastore.model.**MColumnDescriptor.cols
    [Table : `COLUMNS_V2`]
    13/03/20 12:13:56 INFO metastore.HiveMetaStore: 1: get_table :
    db=default tbl=gb_xdr_hb
    13/03/20 12:13:56 INFO HiveMetaStore.audit: ugi=root
    ip=unknown-ip-addr cmd=get_table : db=default tbl=gb_xdr_hb
    13/03/20 12:13:56 INFO metastore.HiveMetaStore: 1: Opening raw store
    with implemenation class:org.apache.hadoop.hive.**metastore.ObjectStore
    13/03/20 12:13:56 INFO metastore.ObjectStore: ObjectStore, initialize
    called
    13/03/20 12:13:56 INFO metastore.ObjectStore: Initialized ObjectStore
    13/03/20 12:13:56 INFO DataNucleus.MetaData: Listener found
    initialisation for persistable class org.apache.hadoop.hive.**
    metastore.model.**MColumnDescriptor
    13/03/20 12:13:56 INFO DataNucleus.MetaData: Listener found
    initialisation for persistable class org.apache.hadoop.hive.**
    metastore.model.MSerDeInfo
    13/03/20 12:13:56 INFO DataNucleus.MetaData: Listener found
    initialisation for persistable class org.apache.hadoop.hive.**
    metastore.model.**MStorageDescriptor
    13/03/20 12:13:56 INFO DataNucleus.MetaData: Listener found
    initialisation for persistable class org.apache.hadoop.hive.**
    metastore.model.MTable
    13/03/20 12:13:56 INFO DataNucleus.MetaData: Listener found
    initialisation for persistable class org.apache.hadoop.hive.**
    metastore.model.MFieldSchema
    13/03/20 12:13:56 INFO analysis.AggregateInfo: agg info:
    AggregateInfo{grouping_exprs=, aggregate_exprs=(**AggregateExpr{op=COUNT,
    isStar=true, isDistinct=false, }), agg_tuple=TupleDescriptor{id=**1,
    tbl=null, byte_size=0, is_materialized=true, slots=[SlotDescriptor{id=0,
    col=null, type=BIGINT, materialized=false, byteSize=0, byteOffset=-1,
    nullIndicatorByte=0, nullIndicatorBit=0, slotIdx=0}]},
    smap=smap(AggregateExpr{op=**COUNT, isStar=true, isDistinct=false,
    }:SlotRef{tblName=null, col=null, id=0})}
    mergeAggInfo:
    AggregateInfo{grouping_exprs=, aggregate_exprs=(**AggregateExpr{op=SUM,
    isStar=false, isDistinct=false, (SlotRef{tblName=null, col=null, id=0})}),
    agg_tuple=TupleDescriptor{id=**1, tbl=null, byte_size=0,
    is_materialized=true, slots=[SlotDescriptor{id=0, col=null, type=BIGINT,
    materialized=false, byteSize=0, byteOffset=-1, nullIndicatorByte=0,
    nullIndicatorBit=0, slotIdx=0}]}, smap=smap(AggregateExpr{op=**COUNT,
    isStar=true, isDistinct=false, }:SlotRef{tblName=null, col=null, id=0})}
    13/03/20 12:13:56 INFO service.Frontend: create plan
    13/03/20 12:13:56 INFO planner.Planner: create single-node plan
    13/03/20 12:13:56 INFO planner.Planner: create plan fragments
    13/03/20 12:13:56 INFO planner.Planner: finalize plan fragments
    13/03/20 12:13:56 INFO service.Frontend: get scan range locations
    13/03/20 12:13:56 INFO service.Frontend: create result set metadata
    13/03/20 12:13:56 INFO service.JniFrontend: Plan Fragment 0
    UNPARTITIONED
    AGGREGATE
    OUTPUT: SUM(<slot 0>)
    GROUP BY:
    TUPLE IDS: 1
    EXCHANGE (2)
    TUPLE IDS: 1

    Plan Fragment 1
    RANDOM
    STREAM DATA SINK
    EXCHANGE ID: 2
    UNPARTITIONED

    AGGREGATE
    OUTPUT: COUNT(*)
    GROUP BY:
    TUPLE IDS: 1
    SCAN HBASE table=gb_xdr_hb (0)
    TUPLE IDS: 0

    E0320 12:13:57.587276 17966 impala-server.cc:1349] unknown query id:
    51899b8f7ceb485b:**83f451a2e8daf1f9
    E0320 12:13:58.071900 17958 impala-server.cc:1349] unknown query id:
    51899b8f7ceb485b:**83f451a2e8daf1f9
    ------------------------------**------------------------------**
    ------------------------------**------------------------------**
    ------------------------------**------------------------------**
    ---------------------
    Query profile from WEB interface:

    Query (id=51899b8f7ceb485b:**83f451a2e8daf1f9):
    - PlanningTime: 2s969ms
    Query 51899b8f7ceb485b:**83f451a2e8daf1f9:(1s848ms 0.00%)
    Aggregate Profile:
    Coordinator Fragment:(39ms 0.00%)
    - RowsProduced: 0
    CodeGen:
    - CodegenTime: 68ms
    - CompileTime: 223ms
    - LoadTime: 269ms
    - ModuleFileSize: 44.61 KB
    AGGREGATION_NODE (id=3):(107ms 100.00%)
    - BuildBuckets: 0
    - BuildTime: 0K clock cycles
    - GetResultsTime: 0K clock cycles
    - MemoryUsed: 0.00
    - RowsReturned: 0
    - RowsReturnedRate: 0
    EXCHANGE_NODE (id=2):(0K clock cycles 0.00%)
    - BytesReceived: 0.00
    - ConvertRowBatchTime: 0K clock cycles
    - DeserializeRowBatchTimer: 0K clock cycles
    - MemoryUsed: 0.00
    - RowsReturned: 0
    - RowsReturnedRate: 0
    Averaged Fragment 1:(49ms 0.00%)
    completion times: min:14ms max:396ms mean: 251ms stddev:169ms
    execution rates: min:0.00 /sec max:0.00 /sec mean:0.00 /sec stddev:0.00 /sec
    split sizes: min: 0.00 , max: 0.00 , avg: 0.00 , stddev: 0.00
    - RowsProduced: 0
    CodeGen:
    - CodegenTime: 7ms
    - CompileTime: 33ms
    - LoadTime: 9ms
    - ModuleFileSize: 12.35 KB
    DataStreamSender:
    - BytesSent: 0.00
    - DataSinkTime: 0K clock cycles
    - SerializeBatchTime: 0K clock cycles
    - ThriftTransmitTime: 0K clock cycles
    AGGREGATION_NODE (id=1):(56ms 52.51%)
    - BuildBuckets: 0
    - BuildTime: 0K clock cycles
    - GetResultsTime: 0K clock cycles
    - MemoryUsed: 0.00
    - RowsReturned: 0
    - RowsReturnedRate: 0
    HBASE_SCAN_NODE (id=0):(30ms 62.03%)
    - BytesRead: 0.00
    - HBaseTableScanner.ScanSetup: 13K clock cycles
    - MaterializeTupleTime: 0K clock cycles
    - MemoryUsed: 0.00
    - PerDiskReadThroughput: 0.00 /sec
    - RowsReturned: 0
    - RowsReturnedRate: 0
    - ScanRangesComplete: 0
    - ScannerThreadsReadTime: 0K clock cycles
    - TotalReadThroughput: 0.00 /sec
    Fragment 1:
    Instance 51899b8f7ceb485b:**83f451a2e8daf1fb:
    Instance 51899b8f7ceb485b:**83f451a2e8daf1fc:(147ms 0.00%)
    Hdfs split stats (:<# splits>/):
    - RowsProduced: 0
    CodeGen:
    - CodegenTime: 21ms
    - CompileTime: 99ms
    - LoadTime: 28ms
    - ModuleFileSize: 37.04 KB
    DataStreamSender:
    - BytesSent: 0.00
    - DataSinkTime: 0K clock cycles
    - SerializeBatchTime: 0K clock cycles
    - ThriftTransmitTime: 0K clock cycles
    AGGREGATION_NODE (id=1):(168ms 52.51%)
    - BuildBuckets: 0
    - BuildTime: 0K clock cycles
    - GetResultsTime: 0K clock cycles
    - MemoryUsed: 0.00
    - RowsReturned: 0
    - RowsReturnedRate: 0
    HBASE_SCAN_NODE (id=0):(91ms 62.03%)
    - BytesRead: 0.00
    - HBaseTableScanner.ScanSetup: 39K clock cycles
    - MaterializeTupleTime: 0K clock cycles
    - MemoryUsed: 0.00
    - PerDiskReadThroughput: 0.00 /sec
    - RowsReturned: 0
    - RowsReturnedRate: 0
    - ScanRangesComplete: 0
    - ScannerThreadsReadTime: 0K clock cycles
    - TotalReadThroughput: 0.00 /sec
    Instance 51899b8f7ceb485b:**83f451a2e8daf1fd:

    ------------------------------**------------------------------**------------------------------**---------------

    the table schema:

    hive>CREATE TABLE gb_xdr_hb(Online_ID bigint, Session_ID string, Start_Date_Time string, End_Date_Time string, SGSN_IP bigint, SGSN string, BSC_IP bigint, BSC string, LAC int, CI int, RAC int, BVCI int, IMSI bigint, IMEI string, APN string, Protocol string, Event_Type int, IP_LEN_UL bigint, IP_LEN_DL bigint, Count_Packet_UL int, Count_Packet_DL int, Duration int, SOURCE_IP bigint, DEST_IP bigint, SOURCE_PORT int, DEST_PORT int, URI string, URI_Main string, Content_Type_Part1 string, Content_Type_Part2 string, User_Agent_Main string, Service_Type int, Sub_Service_Type int, De_PDP_Acc int, De_PDP_Acc_Direction string, Detach_Acc int, Detach_Acc_Direction string, Abnormal_reason int, Request int, Repeat int, Reject int, Reject_Cause int, Accept int, Accept_DelayFirst int, Complete int, Complete_DelayFirst int, ACK int, ACK_Delay bigint, Last_ACK_Delay bigint, BVCI_OLD int, BVCI_NEW int, LLC_frame_Disc int, OPP_LAC int, OPP_RAC int, Result int, Result_Delayfirst bigint, Disconnect int, Abort int, Query_Name string, Query_Type int, Query_OP int, Resp int, Resp_delayFirst bigint, Resp_Cause int, FromX string, ToX string, Subject string, MMS_Cause int, ErrorPacketWired int, ErrorPacketWireless int, GPRS_multislot_E int, EDGE_multislot_E int, Umts_128TDD_E int, Umts_FDD_E int, GPRS_multislot_1800 int, EDGE_multislot_1800 int, Umts_128TDD_1800 int, Umts_FDD_1800 int, Reset int, Reset_Direction int, LLC_Frame_Count int, Abort_Reason_User int, Abort_Reason_Provider int, Accept_Repeat int, Abnormal_Type int, De_Cause int, RAU_Type int, RAU_Attr int, Event_Direction int, Prefix_IMEI string, IsReassemble int, IsPrivate_Service int, User_IP_LEN_UL bigint, User_IP_LEN_DL bigint, Bssgp_Status_Cause int, Weight_Rate_UL bigint, Weight_Rate_DL bigint, Weight_Rate_Total bigint, Brand string, Type string, ICMP_Type int, ICMP_Code int, Gob_Src_IP int, Gob_Dst_IP int, Gob_IP_LEN int, Gob_Src_PORT int, Gob_Dst_PORT int, Gob_UDP_LEN int) row format delimited fields terminated by '\t'
    STORED BY 'org.apache.hadoop.hive.hbase.**HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,strc:sessid,strc:sdt,**strc:edt,bintc:sgsnip,strc:**sgsn,bintc:bscip,strc:bsc,**intc:lac,intc:ci,intc:rac,**intc:bvci,bintc:imsi,strc:**imei,strc:apn,strc:protoc,**intc:etype,bintc:iplen_ul,**bintc:iplen_dl,intc:cntpkt_ul,**intc:cntpkt_dl,intc:duration,**bintc:srcip,bintc:dstip,intc:**srcport,intc:dstport,strc:uri,**strc:urimain,strc:contype_p1,**strc:contype_p2,strc:**usragnmain,intc:svctype,intc:**subsvctype,intc:de_pdpacc,**strc:de_pdpacc_dir,intc:**detacc,strc:detacc_dir,intc:**abnm_rsn,intc: req,intc:rep,intc:rej,intc:**rejcau,intc:accept,intc:**accept_df,intc:complete,intc:**complete_df,intc:ack,bintc:**ackdelay,bintc:last_ackdelay,**intc:bvciold,intc:bvcinew,**intc:llcfrmdisc,intc:opp_lac,**intc:opp_rac,intc:rst,bintc:**rst_df,intc:discon,intc:abort,**strc:qryname,intc:qrytype,**intc:qry_op,intc:res,bintc:**res_df,intc:res_cau,strc:**fromx,strc:tox,strc:subj,intc:**mmscau,intc:errpktwir,intc:**errpktwirl,intc:gprsmults_e,**intc:edgemuls_e,intc:**umts128tdd_e,intc:umtsfdd_e,**intc:gprsmuls1800,intc:**edgemuls1800,intc:**umts128tdd1800,intc:**umtsfdd1800,intc:reset,intc:**resetdir,intc:llcfrmcnt,intc:**abt_rsn_usr,intc:abt_rsn_prvd,**intc:accrep,intc:abnmtype,**intc:de_cau,intc:rautype,intc:**rauattr,intc:evtdir,strc:**prefiximei,intc:isreass,intc:**ispriv_svc,bintc:usr_iplen_ul,**bintc:usr_iplen_dl,intc:bssgp_**stat_cau,bintc:wgt_rat_ul,**bintc:wgt_rat_dl,bintc:wgt_**rat_tot,strc:brand,strc:type,**intc:icmptype,intc:icmpcode,**intc:gobsrcip,intc:gobdstip,**intc:gob_iplen,intc:**gobsrcport,intc:gobdstport,**intc:gob_udplen")
    TBLPROPERTIES ("hbase.table.name" = "gb_xdr_hb");

    ------------------------------**------------------------------**
    ------------------------------**------------------------------**
    ------------------------------**------------------------------**
    ------------------------------**----
    the samples of data(10 rows, columns split by tab):
    102310007 52861470 2012-09-05 16:02:13.915000 2012-09-05 16:02:14.715000
    2796454154 SGSN206 1042644490 JMSM8B2 9858 65512 1 217 460004970453832
    CMNET 2 20 2 1 800 2020798986 951359443 1069 80 0 0 0 1 1 800 1 800 1 44
    10 0 0 0 10 0 0 0 1 0 2 0 0 0 0
    104630971 52863105 2012-09-05 16:02:14.475000 2012-09-05 16:02:14.715000
    2796454154 SGSN206 137396234 JMSM10B1 9287 30123 1 57 460020317031561
    012849007156581 2 20 2 1 240 2875977738 4166520175 50695 80 0 0 0 1 1
    240 1 240 1 37 10 10 0 0 1 0 2 01284900 0 0 0 0 iphone IPHONE_4
    101592006 9171377 2012-09-05 16:02:14.950000 2012-09-05 16:02:14.952000
    2796454154 SGSN206 488996362 JMSM8B2 9858 23572 1 460020272769766 8 2 0
    1 1 2 110 111 12 0 0 0 12 0 0 0 0 1
    106524291 52862516 2012-09-05 16:02:14.273000 2012-09-05 16:02:14.716000
    2796454154 SGSN206 623148554 JMSM3B1 9659 65402 1 68 460025200602650
    860980011055457 CMWAP 2 20 2 1 443 2597062922 2885681162 63447 80 0 0 0
    1 1 443 1 6 12 12 0 0 12 12 0 0 1 2 1 0 2 86098001 0 0 0 0
    102431545 52862667 2012-09-05 16:02:14.328000 2012-09-05 16:02:14.716000
    2796454154 SGSN206 171409418 JMSM10B3 9725 21353 1 52 460005630577558
    355712029701925 CMWAP 2 16 617 262 2 4 388 3099116298 2885681162 39088
    80 http://conf.3g.qq.com/newConf/**n <http://conf.3g.qq.com/newConf/n>
    conf.3g.qq.com application x-www-form-urlencoded 27 270694 0 1 1 380 380
    1 31 1 31 200 32 32 0 1 32 32 0 1 4 0 2 35571202 0 617 262 12280 2214
    24923 nokia 6120C
    106042934 9171378 2012-09-05 16:02:14.951000 2012-09-05 16:02:14.952000
    2796454154 SGSN206 976584714 JMSM4B2 10064 20422 1 460001861330414
    353990032645348 CMNET 8 1 0 1 1 1 36 74 10 0 0 0 10 0 0 0 0 1 35399003
    nokia 5300
    105595895 52744950 2012-09-05 16:01:33.551000 2012-09-05 16:02:14.716000
    2796454154 SGSN206 2251325450 JMSM10B2 9614 22341 1 24 460020322868605
    359026041942491 CMWAP 2 15 850 1001 3 4 41165 1697466378 2885681162
    33652 80 http://hot.imtt.qq.com/h?c=0&**key=1346813826<http://hot.imtt.qq.com/h?c=0&key=1346813826>
    hot.imtt.qq.com text html;charset=utf-8 27 270694 0 1 1 620 620 1 133 1
    133 200 32 32 0 0 32 32 0 0 4 0 2 35902604 0 850 1001 5432 7533 25760
    nokia 5233
    101642845 52006881 2012-09-05 15:57:14.717000 2012-09-05 15:57:14.717000
    2796454154 SGSN206 992247306 JMSM3B1 9659 26053 1 71 460020203688580 2
    15 361 1 0 2743708170 2885681162 33015 80 http://3g.zol.com.cn/images/**
    search.gif <http://3g.zol.com.cn/images/search.gif> 3g.zol.com.cn 27
    270922 15901 1 10 10 0 1 10 10 0 1 2 2 0 361
    106574535 52861851 2012-09-05 16:02:14.037000 2012-09-05 16:02:14.717000
    2796454154 SGSN206 2603909130 JMSM4B1 10063 24181 1 89 460023198308923
    354393000215150 CMWAP 2 20 2 1 680 4291926794 2885681162 65386 80 0 0 0
    1 1 680 1 680 1 5 12 0 0 0 12 0 0 0 1 0 2 35439300 0 0 0 0 unknown
    unknown
    101542475 52863714 2012-09-05 16:02:14.690000 2012-09-05 16:02:14.717000
    2796454154 SGSN206 827162378 JMSM7B1 10067 21041 1 26 460020225821456
    CMNET 2 21 1340 1 1 27 538878730 991169135 56952 80 0 0 21906 1 1 27 12
    12 0 1 12 12 0 1 1 2 1 1 1 0 1340 0 66503 66503

    ------------------------------**------------------------------**
    ------------------------------**------------------------------**
    ------------------------------**------------------------------**
    ------------------------------**----
    Why? Need your helps too! Thanks!

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupimpala-user @
categorieshadoop
postedMar 14, '13 at 2:30p
activeMar 21, '13 at 5:01p
posts6
users5
websitecloudera.com
irc#hadoop

People

Translate

site design / logo © 2022 Grokbase